Academic Requirements (56 total units)
15 Core Courses (48 units)
Core curriculum in data science, workflows, evaluation and analysis, and data visualization
Capstone Project (8 units)
MEDS capstone projects are designed to develop professional problem-solving skills.
Summer - Session B: EDS 212, 214, 215, 216, 221
Fall: EDS 211, 213, 220, 222, 223
Winter: EDS 231, 232, 240, 411A
Spring: EDS 230, 241, 411B
Science in general, and data science in particular, increasingly relies on team science approaches to address its most pressing questions. Managing team science projects is therefore becoming an essential skill for any scientist. This course will explore the principles and practical tools available for effective and efficient project management.
Review of quantitative methods that are commonly used in environmental science. The course will cover single- and multivariable functions and graphing, basic linear algebra, complex numbers, integral calculus, and simple differential equations.
This course will cover the concept of metadata and how it can be leveraged to integrate heterogeneous datasets into standardized data products. We will practice downloading data from data repositories both manually and programmatically via APIs. We will also discuss how to track the provenance of data, generate metadata that integrates data semantics to increase data discovery, and archive data products in data repositories to make them available to the broader community.
The generation and analysis of environmental data is often a complex, multi-step process that may involve the collaboration of many people. Increasingly, tools that document and help organize workflows are being used to ensure the reproducibility, shareability, and transparency of results. This course will introduce students to the conceptual organization of workflows (including code, documents, and data) as a way to conduct reproducible analyses. These concepts will be combined with hands-on practice using various software tools and collaborative coding techniques to develop and manage multi-step analytical workflows as a team.
This course explores using filesystems and relational databases to store and manage environmental information. Students will learn to use the `bash` command language to manipulate files, and the SQL language to create and query databases. The course will highlight the importance of filesystem organization and database structure to the survivability and usability of scientific information.
Synthesis tools in environmental science are rapidly evolving and becoming standard, formalized tools for review and assessment. Synthesis can include data aggregation, narrative reviews, systematic reviews, and meta-analysis. Meta-analyses in particular are often viewed as the gold standard for quantitatively summarizing the state of the art in a research domain. The analytics and assumptions behind these methods have changed significantly within the last five years. Key topics covered in this course include effect sizes, scope of inference, and statistical analyses using weighted measures.
This course introduces students to the broad range of datasets used to monitor and understand the human and natural systems relevant to environmental science and management. The course will cover field-based and station data, remote sensing products, and large-scale climate datasets, including climate model projections. Skills will include designing and evaluating data collection and data quality control, and working with existing databases of time-series and spatial information, including new repositories of environmentally relevant datasets (e.g., the NOAA Big Data Project). As environmental problems increasingly require the use of multiple datasets from disparate sources, students will learn the basic workflow involved in selecting, obtaining, and visualizing datasets, and best practices for ensuring the reliability of data intercomparisons. Students will also be introduced to emerging data products; examples may vary but include data from automated observational networks, unmanned aerial vehicles (UAVs), and new satellite data products.
This course teaches key scientific programming skills and demonstrates the application of these techniques to environmental data analysis and problem solving. Topics include structured programming and algorithm development, flow control, simple and advanced data input-output and representation, functions and objects, documentation, testing and debugging. The course will be taught using a combination of the R and Python programming languages.
This course teaches a variety of statistical techniques commonly used to analyze environmental datasets and address environmental questions. It also introduces foundational concepts of spatial and space-time dependency and their impacts on inference, using simple models to illustrate the effect of space-time dependence when analyzing data from environmental processes. Techniques include applied regression methods for environmental data, time series methods, spatial distance weighting methods, spatial covariances, spatial prediction using kriging, and multivariate statistics.
This course introduces the spatial modeling and analytic techniques of geographic information science to data science students. The emphasis is on deep understanding of spatial data models and the analytic operations they enable. In addition to this theoretical background, students will acquire facility with libraries, packages, and APIs that support spatial analysis in Python and R.
Computer-based modeling and simulation for practical environmental problem solving and environmental research. The course will cover both the selection and application of existing models and best practices for designing new models. Topics include conceptual models, static and dynamic models, and models of diffusion, growth and disturbance. Techniques include sensitivity analysis, calibration and model scenario design.
This course will cover foundations and applications of natural language processing. Problem sets and class projects will leverage common and emerging text-based data sources relevant to environmental problems, including but not limited to social media feeds (e.g., Twitter) and text documents (e.g., agency reports), and will build capacity and experience in common tools, including text processing and classification, semantics, and natural language parsing.
Machine learning can help process large, complex datasets and extract knowledge from them. It forms one of the foundations of data science. This course provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning (decision trees, random forests, support vector machines, neural networks) and unsupervised learning (clustering, dimensionality reduction, deep learning). Problems and exercises are framed within environmental science applications. The course will use programming languages such as R and Python to build the advanced scientific programming skills needed to solve real environmental problems.
Effective display and analysis of scientific information is a critical skill. This course will discuss the theory of good visual design and interactive analysis and present software tools and techniques that support visual analysis. Students will learn to pose an interesting data question through MySQL, use processing software to visualize the answer in 2D, build a 3D interactive visualization, and then complete a project using data of their choice. Additional topics will include dynamic and interactive visualization and web-based visualization frameworks.
This course will present state-of-the-art program evaluation techniques needed to evaluate the impact of environmental policies. The methods presented aim to identify and measure the causal effect of policies, regulations, and interventions on environmental outcomes of interest. Students will learn the research designs and methods for estimating causal effects with experimental and non-experimental data. This will prepare students to interpret and conduct high-quality empirical research, with applications in cross-sectional and panel data settings.
This course will focus on ethical considerations in collecting, using, and reporting environmental data, and how to recognize and account for biases in algorithms, training data, and methodologies. Students will also examine the human and societal implications of these issues within environmental data science.
First quarter of a two-quarter group study/analysis of how to apply data science and tools to an environmental problem. In this quarter, students are expected to work with their project client to finalize project plans, assign individual roles and responsibilities, develop a project design plan and deliverables, and make significant headway on implementing those plans.
Second quarter of a two-quarter group study/analysis of how to apply data science and tools to an environmental problem. In this quarter, students are expected to complete all project plans and deliverables, develop and submit a project repository and technical documentation, give an oral defense of the project, and present the research to a general audience.