Michael E. Cotterell's Research
We are an interdisciplinary group, primarily focused on how open science, data science, and big data frameworks can be used to facilitate better learning outcomes in data science education.
I am always looking for highly motivated undergraduate and graduate students with strong backgrounds or interests in:
- open source software
- data science
- open science
- interdisciplinary applications
Are you interested in doing a Directed Study (CSCI 4950) with me? I'm currently working on three active projects and some grant proposals in the areas listed above. You should meet with me in person to discuss your interests. If we both agree to pursue the directed study, then whatever you decide to work on, I'll need you to provide a one-page outline of your project proposal, including a project description, scope, general roadmap, and deliverables. We'll attach this outline to the CSCI 4950 form.
My general research interests include Big Data, Functional Data Analysis (FDA), Regression, and Clustering, with interdisciplinary applications related to Statistics and Informatics:
Big Data:
Companies are collecting more data, and collecting it more frequently, than ever before. The collection and analysis of this data pose critical problems related to the scalability of new and existing algorithms. How and where do you store the data? Can the data be processed in a parallel and/or distributed fashion? What new problems occur when applying existing analytics modeling methodologies to big data? I'm particularly interested in how to adapt existing data analytics algorithms for use with big data.
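As a small illustration of the parallel/distributed question, the sketch below fits a simple linear regression from per-chunk summaries that can be computed independently and then merged. It is only a minimal example under stated assumptions: the names `Stats`, `summarize`, `merge`, and `fit` and the sample data are hypothetical, and the chunks are processed sequentially here rather than on separate workers.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for simple linear regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def summarize(chunk):
    """'Map' step: reduce one non-empty chunk of (x, y) pairs to its statistics."""
    xs, ys = zip(*chunk)
    return Stats(
        n=len(chunk),
        sx=sum(xs),
        sy=sum(ys),
        sxx=sum(x * x for x in xs),
        sxy=sum(x * y for x, y in chunk),
    )


def merge(a, b):
    """'Reduce' step: the statistics are plain sums, so merging is addition."""
    return Stats(a.n + b.n, a.sx + b.sx, a.sy + b.sy, a.sxx + b.sxx, a.sxy + b.sxy)


def fit(s):
    """Closed-form least-squares intercept and slope from merged statistics."""
    slope = (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)
    intercept = (s.sy - slope * s.sx) / s.n
    return intercept, slope


# Hypothetical data following y = 1 + 2x, split into three chunks.
data = [(float(x), 1.0 + 2.0 * x) for x in range(12)]
chunks = [data[0:4], data[4:8], data[8:12]]

partials = [summarize(c) for c in chunks]  # each call could run on a separate worker
intercept, slope = fit(reduce(merge, partials, Stats()))
```

Because each chunk is reduced to five numbers, only the summaries need to be moved between workers. Accumulating raw sums of squares can lose precision on large or poorly scaled data, so a production version would likely use a more numerically stable formulation.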
Functional Data Analysis (FDA):
Functional Data Analysis (FDA) is concerned with the analysis of data in continuous functional data spaces or data mapped into such spaces. When the data is treated as a set of functions, one can take advantage of information about the functions as a whole, including their derivatives. How do FDA techniques compare to non-functional techniques? How can new and existing analytics approaches benefit from FDA? What challenges are imposed with applying FDA to big data? I'm particularly interested in how to adapt existing data analytics methodologies and algorithms for use with FDA.
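To make the idea of treating data as functions concrete, the sketch below turns a handful of sampled observations into a smooth function through a least-squares basis expansion and then differentiates that function. It is a minimal illustration under stated assumptions: a monomial basis is used only to keep the example short (B-spline or Fourier bases are more common in practice), and the helper names and sample data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_polynomial(ts, ys, degree):
    """Least-squares coefficients c[k] of sum_k c[k] * t**k (monomial basis)."""
    basis = [[t ** k for k in range(degree + 1)] for t in ts]
    # Normal equations: (B^T B) c = B^T y
    gram = [[sum(row[i] * row[j] for row in basis) for j in range(degree + 1)]
            for i in range(degree + 1)]
    rhs = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(degree + 1)]
    return solve(gram, rhs)


def evaluate(coeffs, t):
    """Evaluate the fitted function at time t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def derivative(coeffs):
    """Coefficients of the derivative of the fitted function."""
    return [k * c for k, c in enumerate(coeffs)][1:]


# Hypothetical samples of x(t) = 1 + 3t - 2t^2 at six time points.
ts = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ys = [1 + 3 * t - 2 * t ** 2 for t in ts]

coeffs = fit_polynomial(ts, ys, degree=2)
velocity = derivative(coeffs)             # coefficients of x'(t) = 3 - 4t
slope_at_half = evaluate(velocity, 0.5)   # x'(0.5), approximately 1.0
```

Once the observations are represented by coefficients, derivative information is available at any point in the domain, not only at the original sample times.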
Regression:
For decades, multiple linear regression modeling has served as a powerful cornerstone for analytics. However, there are still questions of considerable interest in this area. How can algorithms for regression be adapted to accommodate big data and functional data? How does the introduction of functional data in regression-based models affect one's ability to draw inferences from those models? How do you accommodate different error models when dealing with big and/or functional data? I'm particularly interested in the areas of regression splines (e.g., smoothing splines) and functional regression (i.e., regression models involving functional covariates and functional responses).
Clustering:
Clustering is an unsupervised learning technique that aims to group similar objects into clusters based on a similarity/distance metric. Once clusters are formed, a domain expert can infer relationships between the objects in each cluster based on the distance metric that was used. How can algorithms like k-means and hierarchical clustering be adapted to work with functional data? What are the different distance metrics that can be used with functional data? I'm particularly interested in how to adapt new and existing clustering algorithms for use with big data and functional data.
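One simple way to see how k-means might carry over to functional data is to swap the usual Euclidean distance for an approximate L2 distance between curves. The sketch below assumes every curve is sampled on the same evenly spaced grid and uses a fixed, arbitrary initialization; the function names and sample curves are hypothetical, and this is only one of many possible distance choices.

```python
def l2_distance(f, g, dt):
    """Approximate L2 distance between two curves on a shared, evenly spaced grid (trapezoidal rule)."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    integral = dt * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return integral ** 0.5


def mean_curve(curves):
    """Pointwise mean of a non-empty list of sampled curves."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def k_means(curves, centers, dt, iterations=20):
    """Lloyd's algorithm using the L2 distance; 'centers' holds the initial curves."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each curve joins its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda k: l2_distance(c, centers[k], dt))
                  for c in curves]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [c for c, lab in zip(curves, labels) if lab == k]
            new_centers.append(mean_curve(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical curves on the grid t = 0, 0.25, ..., 1: two rising, two falling.
grid = [i / 4 for i in range(5)]
curves = [
    [t for t in grid],
    [t + 0.1 for t in grid],
    [1 - t for t in grid],
    [0.9 - t for t in grid],
]

# Deterministic initialization from the first and third curves (an arbitrary choice).
labels, centers = k_means(curves, [curves[0], curves[2]], dt=0.25)
```

With this data the two rising curves end up in one cluster and the two falling curves in the other. A distance based on derivatives of the curves could be substituted for `l2_distance` without changing the rest of the algorithm.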
See below for an overview of my previous research areas.
Keywords: functional data analysis, big data, clustering, predictive analytics, domain-specific embedded languages, ontologies, semantic web, algorithms
Previous Research Areas
This section outlines research that I've worked on in the past.
Domain-Specific Embedded Languages for Analytics, Simulation, and Optimization
I've contributed to the design and implementation of ScalaTion, a Domain-Specific Embedded Language (DSEL) written in Scala that serves as a testbed for exploring a modeling continuum that includes Analytics, Simulation, and Optimization. My early work on this project included the addition of Unicode operators within the ScalaTion DSEL, with the goal of making source code more concise, more readable, and closer to a form familiar to domain experts. The result, in many cases, is code that looks more like textbook formulas than traditional programming code. Related to this, I also worked on SimOptDSL, a simulation optimization package that can utilize ScalaTion to easily model and execute optimization problems. More recently, I've contributed to many of the components in ScalaTion, including coroutines, the process interaction simulation package, the linear algebra package, the analytics package, and various functions used in probability and statistics.
Ontologies & Semantic Algorithms for Service and Model Suggestion
I've applied ontologies and semantic algorithms to problems in the Bioinformatics, Energy Informatics, and Big Data Predictive Analytics domains.
Within Bioinformatics, I helped extend a service suggestion algorithm created by Rui Wang that utilized the Ontology for Biomedical Investigations (OBI) and made it available as the Service Suggestion Engine (SSE), a REST web service. I also worked on a plugin for Galaxy, a web application for creating and executing bioinformatics workflows, that provided an interface for Galaxy users to utilize the service suggestion algorithm. This interface helped users construct their workflows by providing suggestions based on the current state of the workflow design as well as user-provided goals.
In Energy Informatics, I created the Ontology for Energy Informatics (OEI) as part of my internship at the Department of Energy's National Renewable Energy Laboratory (NREL). Similar to OBI, this ontology was built on top of the Basic Formal Ontology (BFO) in the hope that it would facilitate easier integration with other ontologies and systems.
Within the domain of Big Data Predictive Analytics, I helped with the construction of the Analytics Ontology (AO) and its associated ScalaDash application. The ontology captured domain knowledge about different analytics modeling techniques and their underlying assumptions. The ScalaDash application utilized AO to provide users with modeling suggestions based on a description of their dataset. Users could then tweak and directly execute the models within the application using ScalaTion in order to facilitate rapid analytics.
The content and opinions expressed on this Web page do not necessarily reflect the views of nor are they endorsed by the University of Georgia or the University System of Georgia.