Michael E. Cotterell's Research
We are an interdisciplinary group, primarily focused on how open science, data science, pedagogical approaches, and big data frameworks can be used to facilitate better learning outcomes in computer science education and data science education. Collaboration and pair programming are encouraged.
I am always looking for highly motivated undergraduate and graduate students with strong backgrounds or interests in:
java, python, scala
data science, analytics
open science, jupyter
Are you interested in doing a Directed Study with me? Our department offers two courses for this: CSCI 4950 and CSCI 4960.
I'm currently working on multiple projects and grant proposals in the areas listed above. You should meet with me in person to discuss. I encourage you to look at the course description for 4960 if you want to explore your own research question and earn experiential learning credit.
If we both agree to pursue a directed study, I'll need you to provide a 1-2 page directed study project proposal using the ACM Master Article Template. This proposal should include the following sections:
Introduction — explain your project idea and why it is significant.
Related Work — provide a brief overview of similar and/or related projects.
Method and Design — explain your plan for the project, including a list of deliverables with proposed dates.
Significance and Conclusion — discuss the potential impact of the project.
References — provide a list of references using the ACM Citation Style and Reference Format.
We'll attach this to the CSCI 4950/4960 form. Students who wish to explore a research project are encouraged to read this article before drafting their proposal.
Current Directed Studies
Student names used with permission.
Michael Runyan (CSCI 4960; Fall 2018)
Past Directed Studies
Student names used with permission.
Michael Runyan (CSCI 4950; Summer 2018) - Started the first phase of a data science investigation that explores the relationship between social media sentiment and the price of cryptocurrencies. The code and derived dataset for this investigation will be posted on GitHub. This work was continued later in CSCI 4960.
Dat Le-Phan (CSCI 4950; Spring 2018) - Contributed to the ScalaTion Kernel project, fixing some known bugs and implementing new graphing/plotting capabilities for ScalaTion vectors. He also updated corresponding documentation.
My general research interests include Big Data, Functional Data Analysis (FDA), Regression, and Clustering, with interdisciplinary applications related to Statistics and Informatics:
Big Data:
Recently, companies are collecting more data more frequently than ever before. The collection and analysis of this data pose critical problems related to the scalability of new and existing algorithms. How and where do you store the data? Can the data be processed in a parallel and/or distributed fashion? What new problems occur when applying big data to existing analytics modeling methodologies? I'm particularly interested in how to adapt existing data analytics algorithms for use with big data.
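As one small illustration of adapting an existing algorithm to parallel or distributed processing, the sketch below computes a mean and variance from per-chunk summaries that can be merged in any order, so no worker needs to see the full dataset. It is a minimal sketch using only the Python standard library; the names `Summary`, `summarize`, and `merge` and the sample chunks are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Count, mean, and sum of squared deviations (m2) for one chunk."""
    n: int
    mean: float
    m2: float


def summarize(chunk):
    """Summarize one chunk in a single pass (Welford's update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in chunk:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Summary(n, mean, m2)


def merge(a, b):
    """Combine two summaries without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


def sample_variance(s):
    """Unbiased sample variance from a merged summary."""
    return s.m2 / (s.n - 1)


# Hypothetical data, split into chunks as if stored on separate workers.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
total = reduce(merge, map(summarize, chunks))
```

Because `merge` is associative, the per-chunk summaries could be produced by separate processes or machines and reduced in a tree, which is the property that makes this kind of statistic straightforward to scale.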
Functional Data Analysis (FDA):
Functional Data Analysis (FDA) is concerned with the analysis of data in continuous functional data spaces or data mapped into such spaces. When the data is treated as a set of functions, one can take advantage of information about the functions as a whole, including their derivatives. How do FDA techniques compare to non-functional techniques? How can new and existing analytics approaches benefit from FDA? What challenges arise when applying FDA to big data? I'm particularly interested in how to adapt existing data analytics methodologies and algorithms for use with FDA.
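To make the point about derivatives concrete, the sketch below treats two sampled curves as functions on a uniform grid, estimates their derivatives with finite differences, and compares the curves with a discretized L2 distance. It is a minimal sketch under stated assumptions: the curves, the grid, and the helper names `derivative` and `l2_distance` are hypothetical, and finite differences stand in for the basis-expansion smoothing that FDA work typically uses.

```python
import math


def derivative(values, step):
    """Estimate the derivative of a curve sampled on a uniform grid."""
    n = len(values)
    d = []
    for i in range(n):
        if i == 0:
            d.append((values[1] - values[0]) / step)        # forward difference
        elif i == n - 1:
            d.append((values[-1] - values[-2]) / step)      # backward difference
        else:
            d.append((values[i + 1] - values[i - 1]) / (2 * step))  # central
    return d


def l2_distance(f, g, step):
    """Trapezoidal approximation of sqrt(integral of (f - g)^2)."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    area = step * (sum(sq) - 0.5 * (sq[0] + sq[-1]))
    return math.sqrt(area)


# Two hypothetical curves on [0, 2*pi]: same shape, different level.
n_points = 101
step = 2 * math.pi / (n_points - 1)
grid = [i * step for i in range(n_points)]
f = [math.sin(t) for t in grid]
g = [math.sin(t) + 1.0 for t in grid]

level_gap = l2_distance(f, g, step)
shape_gap = l2_distance(derivative(f, step), derivative(g, step), step)
```

Here `level_gap` is clearly positive while `shape_gap` is essentially zero: comparing derivatives shows that the two curves differ only by a vertical shift, which a pointwise comparison of the raw values does not reveal.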
Regression:
For decades, multiple linear regression modeling has served as a powerful cornerstone for analytics. However, there are still questions of considerable interest in this area. How can algorithms for regression be adapted to accommodate big data and functional data? How does the introduction of functional data in regression-based models affect one's ability to infer things from those models? How do you accommodate different error models when dealing with big and/or functional data? I'm particularly interested in the areas of regression splines (e.g., smoothing splines) and functional regression (i.e., regression models involving functional covariates and functional responses).
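The sketch below shows one simple form of regression spline: ordinary least squares on a linear truncated-power basis with fixed knots. It is a minimal sketch using only the Python standard library, not a smoothing spline (there is no roughness penalty) and not a production solver; the function names, the knot location, and the noise-free sample data are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def basis(x, knots):
    """Linear truncated-power basis: 1, x, and max(0, x - k) per knot k."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def fit_spline(xs, ys, knots):
    """Least squares via the normal equations (X'X) b = X'y."""
    rows = [basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x, knots):
    """Evaluate the fitted spline at x."""
    return sum(c * b for c, b in zip(coef, basis(x, knots)))


# Hypothetical noise-free data whose slope changes from 2 to -1 at x = 5.
knots = [5.0]
xs = [i * 0.5 for i in range(21)]
ys = [1.0 + 2.0 * x - 3.0 * max(0.0, x - 5.0) for x in xs]
coef = fit_spline(xs, ys, knots)
```

On this data the fitted coefficients are close to 1, 2, and -3, the intercept, initial slope, and slope change used to generate it. Solving the normal equations directly is fine for a small, well-conditioned example like this one; a QR-based solver is usually preferred for larger or ill-conditioned bases.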
Clustering:
Clustering is an unsupervised learning technique that aims to group similar objects into clusters based on a similarity/distance metric. Once clusters are formed, a domain expert can infer relationships between the objects in each cluster based on the distance metric that was used. How can algorithms like k-means and hierarchical clustering be adapted to work with functional data? What are the different distance metrics that can be used with functional data? I'm particularly interested in how to adapt new and existing clustering algorithms for use with big data and functional data.
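One straightforward way to adapt k-means to functional data is to swap the Euclidean distance between points for a distance between curves. The sketch below uses a discretized L2 distance on curves sampled over a shared uniform grid. It is a minimal sketch, assuming hypothetical curves and caller-supplied initial centroids; the names `l2_distance`, `mean_curve`, and `kmeans_curves` are illustrative and do not come from any particular library.

```python
import math


def l2_distance(f, g, step):
    """Trapezoidal approximation of sqrt(integral of (f - g)^2)."""
    sq = [(a - b) ** 2 for a, b in zip(f, g)]
    return math.sqrt(step * (sum(sq) - 0.5 * (sq[0] + sq[-1])))


def mean_curve(curves):
    """Pointwise mean of curves sampled on the same grid."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def kmeans_curves(curves, centroids, step, max_iter=100):
    """Lloyd's algorithm with a functional (L2) distance."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: l2_distance(c, centroids[j], step))
            for c in curves
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean_curve(members)
    return labels, centroids


# Hypothetical curves on [0, 1]: two rising lines and two falling lines.
step = 0.25
grid = [i * step for i in range(5)]
curves = [
    [t for t in grid],
    [t + 0.1 for t in grid],
    [1.0 - t for t in grid],
    [1.1 - t for t in grid],
]
labels, centroids = kmeans_curves(curves, [curves[0], curves[2]], step)
```

With these inputs the rising curves end up in one cluster and the falling curves in the other. Other functional distances, such as one computed on derivatives, could be passed in the same role as `l2_distance`.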
See below for an overview of my previous research areas.
Keywords: functional data analysis, big data, clustering, predictive analytics, domain-specific embedded languages, ontologies, semantic web, algorithms
Previous Research Areas
This section outlines research that I've worked on in the past.
Domain-Specific Embedded Languages for Analytics, Simulation, and Optimization
I've helped contribute to the design and implementation of ScalaTion, a Domain-Specific Embedded Language (DSEL) written in Scala that serves as a testbed for exploring a modeling continuum that includes Analytics, Simulation, and Optimization. My early work on this project included the addition of Unicode operators within the ScalaTion DSEL with the goal of making source code more concise, readable, and in a form familiar to domain experts. The result, in many cases, is code that looks more similar to textbook formulas than to traditional programming code. Related to this, I also worked on SimOptDSL, a simulation optimization package that can utilize ScalaTion to easily model and execute optimization problems. More recently, I've contributed to many of the components in ScalaTion, including coroutines, the process interaction simulation package, the linear algebra package, the analytics package, and various functions used in probability and statistics.
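The idea of notation that reads like a textbook formula can be illustrated outside of Scala as well. The sketch below is a loose Python analogue, not ScalaTion code and not its API: Python does not allow new Unicode operators, so it relies on Unicode identifiers and the built-in `@` operator instead. The `Vec` class and the `Σ` helper are hypothetical and exist only for this example.

```python
import math


class Vec:
    """Tiny vector type; a hypothetical stand-in, not ScalaTion's API."""

    def __init__(self, *xs):
        self.xs = [float(x) for x in xs]

    def __matmul__(self, other):
        """x @ y is the dot product of two vectors."""
        return sum(a * b for a, b in zip(self.xs, other.xs))

    def __sub__(self, c):
        """Subtract a scalar from every element."""
        return Vec(*(a - c for a in self.xs))

    def __len__(self):
        return len(self.xs)


def Σ(v):
    """Sum of the elements of a vector."""
    return sum(v.xs)


x = Vec(2, 4, 4, 4, 5, 5, 7, 9)
n = len(x)
μ = Σ(x) / n                 # mean
d = x - μ                    # deviations from the mean
σ = math.sqrt((d @ d) / n)   # population standard deviation
```

The last four lines read much like the formulas for a mean and a standard deviation, which is the readability benefit that operator and identifier choices in an embedded language aim for.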
Ontologies & Semantic Algorithms for Service and Model Suggestion
I've applied ontologies and semantic algorithms towards problems in the Bioinformatics, Energy Informatics, and Big Data Predictive Analytics domains. Within Bioinformatics, I helped extend a service suggestion algorithm created by Rui Wang that utilized the Ontology for Biomedical Investigations (OBI) and made it available as the Service Suggestion Engine (SSE), a REST web service. I also worked on a plugin for Galaxy, a web application for creating and executing bioinformatics workflows, that provided an interface for Galaxy users to utilize the service suggestion algorithm. This interface allowed users to get help in constructing their workflows by providing suggestions based on the current state of the workflow design as well as user-provided goals. In Energy Informatics, I created the Ontology for Energy Informatics (OEI) as part of my internship at the Department of Energy's National Renewable Energy Lab (NREL). Similar to OBI, this ontology was built on top of the Basic Formal Ontology (BFO) in the hopes that it would facilitate easier integration with other ontologies and systems. Within the domain of Big Data Predictive Analytics, I helped with the construction of the Analytics Ontology (AO) and its associated ScalaDash application. The ontology captured domain knowledge about different analytics modeling techniques and their underlying assumptions. The ScalaDash application utilized AO to provide modeling suggestions based on a user's description of their dataset. Users could then tweak and directly execute the models within the application using ScalaTion in order to facilitate rapid analytics.
The content and opinions expressed on this Web page do not necessarily reflect the views of nor are they endorsed by the University of Georgia or the University System of Georgia.