A Reproducible Machine Learning Approach for Interpreting Ecohydrologic Model Outputs
The use of machine learning algorithms for predictive modeling is a growing area of study in ecology and hydrology. However, these methods have not been fully utilized to investigate the output of process-based ecohydrologic models. The purpose of this Capstone Project is to develop a framework that models, summarizes, and visualizes important variable relationships within data produced by the Regional Hydro-Ecologic Simulation System (RHESSys), an ecohydrologic model designed by the Tague Team Lab at Bren. Two machine learning techniques, random forest and gradient boosting, are used to rank variables by their importance in predicting a chosen response variable, giving users a better understanding of where to focus their research. The primary deliverables of this project are a reproducible workflow, with extensive documentation on the necessary data preparation and machine learning concepts, and an interactive application for viewing workflow results and exploring relationships within the data. Using these tools, researchers can more efficiently analyze, explore, and identify important variable relationships in RHESSys datasets.
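The variable-ranking idea described above can be illustrated with a minimal sketch. It is not the project's actual workflow: it assumes Python with scikit-learn, uses synthetic data in place of RHESSys output, and the predictor names (`precip`, `temperature`, `soil_depth`, `noise`) are hypothetical. It fits a random forest and a gradient boosting model to the same response variable and ranks predictors by each model's impurity-based importance score.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

# Hypothetical predictor names; real RHESSys output columns will differ.
PREDICTORS = ["precip", "temperature", "soil_depth", "noise"]


def make_synthetic_data(n_rows=500, seed=0):
    """Build a stand-in dataset (an assumption, not RHESSys output)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, len(PREDICTORS)))
    # The response depends strongly on column 0, weakly on column 1,
    # and not at all on the remaining columns.
    y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n_rows)
    return X, y


def rank_variables(model, X, y, names):
    """Fit the model and return (name, importance) pairs, most important first."""
    model.fit(X, y)
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]
    return [(names[i], float(importances[i])) for i in order]


X, y = make_synthetic_data()
rf_ranking = rank_variables(
    RandomForestRegressor(n_estimators=200, random_state=0), X, y, PREDICTORS
)
gb_ranking = rank_variables(
    GradientBoostingRegressor(random_state=0), X, y, PREDICTORS
)

for label, ranking in [("random forest", rf_ranking), ("gradient boosting", gb_ranking)]:
    print(label)
    for name, score in ranking:
        print(f"  {name:12s} {score:.3f}")
```

Comparing the two rankings side by side is one way to check whether a variable's importance is robust to the choice of model; impurity-based scores are only one of several possible importance measures.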
Acknowledgements
Bren School: Will Burke, PhD Student; Janet Choate, Lab Manager and Associate Specialist; Louis Graup, PhD Student; Allison Horst, Assistant Teaching Professor; Naomi Tague, Professor
Maureen Kennedy, Assistant Professor and Tague Team Lab Collaborator, University of Washington