Reliability, Validity and Loosely-Coupled Assessment

June 25th, 2009 jonmott

Last week Jeremy Brown wrote a thoughtful response to my post about “PLNs, Portfolios, and a Loosely-Coupled Gradebook.” Jeremy expressed concern that my notion of “loosely-coupled” assessment doesn’t adequately address the issues of validity and reliability. He also issued a “warning” about the “assessment minefield into which” I am marching.

While fully appreciating Jeremy’s evaluation bona fides, my working definitions of “reliability” and “validity” are slightly more straightforward (and conventional?) than those he uses. Simply put, I take validity to mean the accuracy with which I am measuring a variable. In layman’s terms, I ask, “Am I actually measuring what I think I’m measuring?” Reliability, on the other hand, refers to the consistency of those measurements: “Am I measuring the same variable consistently over time (at various points of observation) and across multiple subjects?” The old bathroom scale example has always helped me keep these two concepts straight. If a person weighs 200 pounds and the bathroom scale says they weigh 200 pounds, then the measurement is valid. However, if the scale indicates different weights each time a person steps on it (even though their weight hasn’t changed), the measurement isn’t reliable. On the other hand, if the scale consistently indicates that a 200 pound person weighs only 150 pounds every time they weigh themselves, the measure is reliable (consistent) but not valid (accurate).
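The bathroom scale example can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the `describe_scale` helper, and the one-pound tolerance are all made up for the example. Validity is approximated here as the average error against the known weight, and reliability as the spread of repeated readings.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 200.0  # pounds; the known weight from the example above


def describe_scale(readings, true_value=TRUE_WEIGHT, tolerance=1.0):
    """Return (bias, spread, is_valid, is_reliable) for repeated readings.

    The tolerance of one pound is an arbitrary assumption for illustration.
    """
    bias = mean(readings) - true_value  # systematic error: a validity problem
    spread = pstdev(readings)           # scatter between readings: a reliability problem
    return bias, spread, abs(bias) <= tolerance, spread <= tolerance


# Hypothetical readings from three different scales for the same 200-pound person.
accurate_scale = [200.0, 200.0, 200.0, 200.0]  # valid and reliable
erratic_scale = [185.0, 215.0, 190.0, 210.0]   # right on average, but inconsistent
biased_scale = [150.0, 150.0, 150.0, 150.0]    # consistent, but 50 pounds off

for name, readings in [("accurate", accurate_scale),
                       ("erratic", erratic_scale),
                       ("biased", biased_scale)]:
    bias, spread, is_valid, is_reliable = describe_scale(readings)
    print(f"{name}: bias={bias:+.1f}, spread={spread:.1f}, "
          f"valid={is_valid}, reliable={is_reliable}")
```

The biased scale comes out with a bias of -50 and a spread of 0 (reliable but not valid), while the erratic scale averages out to the right weight but has a spread of about 12.7 pounds, so no single reading can be trusted.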

It’s important to be clear about what we mean by validity and reliability because Jeremy’s central concern seems to be that greater student ownership of and responsibility for portfolios will degrade the “reliability of the judgments passed” on individual student work. He posits two reasons that this would be the case:

  1. The “degree of difficulty” versus “relative facility” of the work completed and submitted via a portfolio.
  2. Selection of artifacts: a student might inadvertently include artifacts which under-represent her or his actual expertise or skill level.

I concur that the variability in student facility with various digital technologies might result in unreliable measures of student ability and skill. While part of being digitally literate in 2009 means being able to create and publish content online at some minimum level of professional acceptability, evaluators should be careful not to conflate portfolio design prowess with content area expertise. The same is true when evaluators read papers: students are expected to write at a basic minimum level of professional acceptability, but eloquent prose is not the same as subject matter achievement. Consequently, it is critically important that those who require portfolios be abundantly clear about the purposes of the portfolio “assignment” and how the portfolio will be assessed. Portfolios should then be assessed accordingly.

When thinking about the purposes of portfolios, administrators and faculty members should be careful to distinguish between the various goals they might have for student portfolio creation and evaluation. On the one hand, portfolios might be encouraged or assigned to help students reflect on their learning, to engage with others about what they’ve learned and how they’ve learned it, to present their work to various audiences, and to develop essential digital literacy and communication skills. When pursuing such goals, validity and reliability concerns are secondary to the process of creating and maintaining a portfolio, so highly student-centric, student-owned and operated portfolios are desirable.

On the other hand, if the purpose of a portfolio is to provide a consistent, aggregated view of a student’s performance through their time at an institution or in an academic program, reliability and validity are central concerns and the cautions Jeremy offers are more immediately relevant. It is my belief, however, that we should strenuously avoid assigning portfolios for purely institutional or program assessment purposes. If our programs aren’t designed in such a way that examples of student work (whether compiled into portfolios or not) are assessed (and possibly collected and aggregated) along the way, then it seems appropriate to redesign the programs instead of bolting an artificial evidence gathering requirement on at the back end.

Once again, we need to begin with the end in mind. What is it we want our students to become? What experiences do they need to become such? What artifacts are the natural results (or natural extensions) of these experiences? How will we consistently evaluate these artifacts to give individual students feedback about their performance and growth? How will we aggregate these evaluations to determine our institutional or program level performance? These are the questions that should drive our portfolio and assessment strategies, not external accreditation requirements. If we focus on these student-centric questions, meeting even the most stringent accreditation requirements will be a relatively simple afterthought.

Demonstrating a Significant Difference

October 31st, 2008 jonmott

Larry Seawright and I made our presentation this morning at Educause 2008. Our slides are available here.

Together with Stephanie Allen and Whitney Ransom McGowan, Larry and I have been working on an alternative approach to evaluating the effectiveness of teaching & learning technology. Traditionally, evaluation takes the form of comparative-media studies in which one group of students learns via standard methods (control) and others learn with new, experimental methods (test). Over and over (and over) again, these kinds of studies have found differences that are not statistically significant.
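To make the comparative-media design concrete, here is a minimal sketch of the kind of comparison such a study might run. Everything in it is an assumption for illustration: the exam scores are invented, the `welch_t` helper is hypothetical, and Welch's t statistic is just one common way to compare two group means, not necessarily the test any particular study used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(control, test):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(variance(control) / len(control)
                          + variance(test) / len(test))
    return (mean(test) - mean(control)) / standard_error


# Invented exam scores for a control section and a section using the new technology.
control_scores = [72, 85, 78, 90, 66, 81, 75, 88]
test_scores = [74, 83, 80, 91, 69, 79, 77, 89]

t = welch_t(control_scores, test_scores)
print(f"mean difference = {mean(test_scores) - mean(control_scores):.3f}, t = {t:.2f}")
```

With groups this small, a t statistic would need to be roughly 2 or larger in absolute value to count as significant at the usual 0.05 level. The invented scores above fall well short of that, which is the familiar "no significant difference" outcome.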

The so-called “NSD” (no significant difference) problem is the bane of teaching & learning evaluators around the world. A growing group of influential scholars has rejected the comparative-media studies approach in favor of design-based research. Borrowing elements of this approach, we have implemented a goal-driven model of instructional design, technology integration, and evaluation at BYU.

Our approach to evaluating the impact of teaching & learning technology (and getting beyond the NSD problem) begins with the end in mind. The first and essential step in this approach is to start every teaching & learning technology project with a carefully articulated goal. Without such a goal, there is no clear, shared understanding of what “success” looks like. Hence, evaluation is virtually impossible: if you don’t know what success looks like, i.e., what should be better as the result of a project, what should you evaluate?

Measuring the impact of teaching & learning technology depends on a clear articulation of learning goals, strategies for accomplishing those goals, and tactics for implementing those strategies. The goals can then be re-formulated as teaching & learning “problems,” and strategies and tactics become “solutions.” Evaluation is then simply the process of measuring the results of implemented solutions, as illustrated below:


To facilitate the consistent articulation of teaching & learning goals, we’ve adopted Sloan-C’s Five Pillars: (1) Student Learning Outcomes, (2) Cost Effectiveness (Scalability), (3) Access, (4) Student Satisfaction, and (5) Faculty Satisfaction. By choosing to explicitly focus on one or more of these goals in every teaching and learning project, we identify what success should look like and, at the same time, establish an evaluation plan for each project.

As the examples in our slides suggest, there are often serendipitous results of teaching & learning technology implementation efforts. For example, a project aimed at improving access might also improve student learning outcomes and student satisfaction. However, by articulating and staying focused on a clear, shared rationale (and funding justification) for projects, we have been able to consistently measure and demonstrate the impact of our teaching & learning technology projects and get beyond the NSD problem.

It all begins by starting with the end in mind.