To gain expertise in a field is to recognize, understand, and effectively use underlying disciplinary principles. Too often, students rely on rote memorization rather than principle-based reasoning to solve problems. Students can list the steps in water transport or stomatal opening but cannot reason to a correct prediction when changes are introduced into the system, e.g., when drought or fusicoccin toxicity occurs. A learning progression (LP) helps us understand how students' mental models of key principles are refined and strengthened over a curriculum. LP-based assessments provide a nuanced measure of students' progress toward a goal rather than a binary right/wrong verdict. Our goal is to develop LP-based, computer-scorable constructed response (CR) assessments that describe how students reason about the physiological principles of flux (i.e., movement of substances down gradients) and mass balance (i.e., conservation of mass). CR assessments allow students to fully express their understanding of physiological mechanisms rather than merely recognize the correct answer among multiple-choice distractors.
Our current learning progression is: L1 (lowest level), story-telling or non-mechanistic reasoning; L2, beginning principle-based reasoning but with errors; L3, principle-based reasoning using components in isolation; L4, principle-based reasoning with consideration of the interacting components; and L5, principle-based reasoning with full consideration of interacting components and threshold values. We used this learning progression to design CR assessments in six contexts: plant tropism, plant transport, neuromuscular, renal, cardiovascular, and respiratory physiology. We collected short-answer responses from over 5,000 students ranging from freshmen to seniors. We created human scoring rubrics aligned with the LP, and experts scored the student responses. We used the human raters' scores as the training target and are applying machine learning techniques to create computer-automated scoring models.
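As an illustration only, and not the scoring pipeline used in this work, the sketch below shows how expert-assigned LP levels could serve as the training target for a simple text classifier; the responses, features, and model choice are all hypothetical.

```python
# Minimal sketch, assuming a bag-of-words classifier; the actual features
# and model in this study are not specified. All data below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical constructed responses with expert-assigned LP levels (1-5).
responses = [
    "Water just knows to move up the plant when it is dry.",            # L1
    "Water moves from high to low water potential through the xylem.",  # L3
]
human_scores = [1, 3]

# Train on human raters' scores, then predict LP levels for unscored responses.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(responses, human_scores)
print(model.predict(["Solutes lower the water potential, so water moves in."]))
```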
Results/Conclusions
We will present our constructed-response questions and our progress in creating computer-automated scoring models. For example, the computer-automated scoring for one item has an interrater reliability of Cohen's kappa = 0.67, which is categorized as good on Altman's (1991) scale but could be improved. Detailed analysis of human and machine scoring for this item indicates excellent agreement for L1 responses, good agreement for L2 and L3 responses, and weaker agreement for L4 and L5 responses. Because the wording students use for L4 and L5 reasoning is quite similar, more responses are needed to discriminate between these levels. When fully developed, this automated scoring tool will allow widespread use of the learning progression and associated CR assessments to monitor and inform undergraduate ecophysiology teaching.
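For reference, Cohen's kappa compares observed agreement with agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for hypothetical human and machine scores; the values shown are illustrative, not data from this study.

```python
# Minimal sketch of human-machine interrater reliability; the LP-level
# scores below are hypothetical, not data from this study.
from sklearn.metrics import cohen_kappa_score

human_scores   = [1, 1, 2, 2, 3, 3, 4, 5, 2, 4]  # expert raters
machine_scores = [1, 1, 2, 3, 3, 3, 4, 4, 2, 5]  # automated model
kappa = cohen_kappa_score(human_scores, machine_scores)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.61-0.80 is 'good' on Altman's (1991) scale
```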