2020 ESA Annual Meeting (August 3 - 6)

COS 57 Abstract - A lesson plan for incorporating machine learning into undergraduate biology education: Big data and bird songs

Lauren E. Schricker(1), Sam Donovan(1) and Justin Kitzes(2), (1) Biological Sciences, University of Pittsburgh, Pittsburgh, PA, (2) Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA
Background/Question/Methods

Despite the recent emergence of data science as a target for undergraduate education, little is known about how to successfully introduce these skills in disciplinary contexts such as lower-level biology courses. Building on ongoing research in the Kitzes Lab at the University of Pittsburgh, we set out to develop a lesson plan for teaching basic data science concepts, titled “Machine learning with bird songs.” Our goal was a modular, laboratory-based curriculum that would occupy approximately two two-hour lab sessions, with supplementary options for a lecture-based introduction and follow-up. The lesson centers on teaching students to use basic machine learning techniques to identify bird songs from audio recordings and their corresponding spectrograms (visual representations of sound frequency over time). Our hypothesis is that using a concrete system such as bird song recordings will enable diverse student populations to explore otherwise abstract data science concepts in a flexible and accessible manner.
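To make the audio-to-spectrogram step concrete, here is a minimal sketch of a short-time Fourier transform spectrogram in plain NumPy. The abstract does not say which software the lesson uses, so the function name `spectrogram`, the Hann window, the 512-sample window and 256-sample hop, and the synthetic rising-pitch "song" are all assumptions chosen for illustration.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_size=512, hop=256):
    """Hypothetical helper: return (frequencies, times, power) for a mono signal
    using a short-time Fourier transform with a Hann window."""
    window = np.hanning(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    # Cut the signal into overlapping, windowed frames: shape (n_frames, window_size).
    frames = np.stack([signal[i * hop : i * hop + window_size] * window
                       for i in range(n_frames)])
    # Power in each frequency bin of each frame: shape (n_frames, window_size // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + window_size / 2) / sample_rate  # frame centers (s)
    return freqs, times, power.T  # rows = frequency bins, columns = time frames

# Synthetic 1-second "song" whose pitch rises linearly from 2 kHz to 4 kHz (an assumption).
sr = 22050
t = np.arange(sr) / sr
song = np.sin(2 * np.pi * (2000 * t + 1000 * t**2))

freqs, times, power = spectrogram(song, sr)
peak_freq_per_frame = freqs[power.argmax(axis=0)]  # loudest frequency in each time frame
print(peak_freq_per_frame[0], peak_freq_per_frame[-1])  # roughly 2 kHz, then roughly 4 kHz
```

Plotting `power` (for example on a log scale) with `times` on the x-axis and `freqs` on the y-axis gives the kind of image students would inspect; dedicated audio libraries offer similar functions with more options.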

Results/Conclusions

We have designed and released our draft lesson plan as an Open Educational Resource on QUBESHub. The lesson consists of three learning tracks that an instructor can use individually or sequentially. The first track consists of a visual exercise in grouping bird songs from an unlabeled set of spectrograms (unsupervised learning), followed by an exercise in matching labeled templates to unlabeled example files (supervised learning) and the construction of a confusion matrix (model evaluation). In the second track, students receive or generate cross-correlation scores between a known template of a bird song and other training files (feature extraction), plot and examine these scores graphically (visualization), and train a simple k-nearest-neighbor classifier to identify unknown songs (model training, optimization, and prediction). In the third track, students can use an inexpensive field recorder to collect their own audio data and apply previously developed models to classify these recordings to species. Our next steps are to partner with the Quantitative Undergraduate Biology Education and Synthesis (QUBES) project to support educators in customizing these teaching resources for diverse institutional settings and student audiences. To reach these educators, we are recruiting a Faculty Mentoring Network (FMN) group through the QUBES community.
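The second track's pipeline (template cross-correlation as feature extraction, a k-nearest-neighbor classifier, and a confusion matrix for evaluation) can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the lesson's own code: the abstract does not specify software, so the helper names (`max_cross_correlation`, `knn_predict`, `confusion_matrix`, `fake_clip`, `features`), the synthetic two-species spectrograms, the use of one template per species, and k = 3 are all hypothetical.

```python
import numpy as np

def max_cross_correlation(template, clip):
    """Slide a template spectrogram (freq x time) along the time axis of a clip
    spectrogram with the same number of frequency rows, and return the highest
    Pearson correlation found (between -1 and 1)."""
    width = template.shape[1]
    t = (template - template.mean()) / template.std()
    best = -1.0
    for start in range(clip.shape[1] - width + 1):
        patch = clip[:, start:start + width]
        p = (patch - patch.mean()) / (patch.std() + 1e-12)  # guard against flat patches
        best = max(best, float((t * p).mean()))  # mean of z-score products = Pearson r
    return best

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(train_X - query, axis=1)
    labels, counts = np.unique(train_y[np.argsort(dists)[:k]], return_counts=True)
    return labels[np.argmax(counts)]

def confusion_matrix(true, pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(true, pred):
        m[index[t], index[p]] += 1
    return m

# --- Hypothetical synthetic data: two species whose calls occupy different frequency bands ---
rng = np.random.default_rng(0)
N_FREQ, CLIP_LEN, SONG_LEN = 32, 40, 10
BANDS = {"A": slice(4, 10), "B": slice(20, 26)}

def fake_clip(species):
    """Low-level noise plus one SONG_LEN-frame call at a random time in the species' band."""
    clip = rng.random((N_FREQ, CLIP_LEN)) * 0.2
    start = rng.integers(0, CLIP_LEN - SONG_LEN + 1)
    clip[BANDS[species], start:start + SONG_LEN] += 1.0
    return clip

# One idealized template per species (an assumption; the lesson may use a single template).
templates = {}
for species, band in BANDS.items():
    templates[species] = np.zeros((N_FREQ, SONG_LEN))
    templates[species][band, :] = 1.0

def features(clip):
    """Feature extraction: one max cross-correlation score per template."""
    return np.array([max_cross_correlation(templates[s], clip) for s in ("A", "B")])

classes = ["A", "B"]
train_y = np.array(classes * 10)
train_X = np.array([features(fake_clip(s)) for s in train_y])
test_y = np.array(classes * 6)
test_X = np.array([features(fake_clip(s)) for s in test_y])

# Model prediction and evaluation on held-out clips.
pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
cm = confusion_matrix(test_y, pred, classes)
print(cm)
```

Because the synthetic calls sit in non-overlapping frequency bands, the scores separate cleanly and the confusion matrix should come out diagonal. Real field recordings would typically produce overlapping scores, which is where plotting the features, trying different values of k, and reading off-diagonal cells of the confusion matrix become instructive.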