Testing, Assessment, Outcomes

A continuing focus of my work is helping others understand the complexities of evaluating student learning through testing and assessment, and of determining the impact of teaching on learning. When we talk about what makes a good test, the issues of validity and reliability always come up. The validity of a test is the degree to which it measures what it is meant to measure, and its reliability is the degree to which it produces consistent results across different test takers. A simple example is a vocabulary test.

Example of Valid and Reliable

The test is valid if the vocabulary on the test is directly related to the content of the curriculum. Suppose the curriculum is math, the vocabulary concerns quadratic equations, and the test includes:

“The name Quadratic comes from “quad” meaning square, because the variable gets squared (like x²).

It is also called an “Equation of Degree 2” (because of the “2” on the x)”

If students can apply the relevant vocabulary on the test, and that effectively measures their learning, the test is valid. If the test is helpful across all students in telling the teacher who is learning more and who is learning less about the curriculum, then the test is deemed reliable. The problem with testing arises from its misuse and from its use in high-stakes situations.

Consequences of High Stakes Tests

Bari Walsh: “Daniel Koretz has spent a career studying educational assessment and testing policy, weighing the consequences of high-stakes accountability tests. In a bracing new book that might be seen as a capstone to that work, Koretz excoriates our current reliance on high-stakes testing as a fraud — an expensive and harmful intervention that does little to improve the practices it purports to measure, instead feeding a vicious cycle of pointless test prep. The book’s title, The Testing Charade, captures his point; excessive high-stakes testing undermines the goals of instruction and meaningful learning.”

Bari Walsh: “For parents, teachers, school leaders, and advocates who want to understand how we got here, the book is an accessible exploration, charting a path toward more sensible assessment practices. We asked Koretz, a professor at the Harvard Graduate School of Education, to reflect on how current testing policies touch the lives of parents and teachers — and how they can advocate for change.” “From a parent’s point of view, the public conversation around testing can seem quite binary. There’s a pro-rigor and achievement camp, and there’s an anti-testing, opt-out camp. Can you offer a balanced framing of this for parents?”

Daniel Koretz: “As I stress in The Testing Charade, standardized tests themselves are not the problem; the problem is the misuse and sometimes outright abuse of testing. Testing done right can be valuable, sometimes irreplaceable. For example, how do we know that the performance gap between African-American and white students is slowly narrowing, or the gap between poor and well-off students has been growing at the same time? Standardized tests.”

Daniel Koretz: “And standardized tests, designed and used appropriately, can help teachers improve instruction. Indeed, the main use of standardized tests many years ago, when I was in school, was to improve instruction, not to hold teachers accountable. The pressure to raise test scores has become so strong that testing often degrades instruction rather than improving it. Many parents have encountered this — large amounts of teaching time lost to test prep that is boring, or worse.”

Bari Walsh POSTED: November 3, 2017

Parents Indicted in Cheating Scandal

The recent college admissions cheating scandal illustrates some of the problems with high-stakes testing.

“The massive college admissions scam is a harsh reminder that wealthy families can cheat their way to even greater privilege. And some say this scandal is just the tip of the iceberg. Here’s what we know so far in this developing case: Who’s involved? Federal prosecutors say 50 people took part in a scheme that involved either cheating on standardized tests or bribing college coaches and school officials to accept students as college athletes — even if the student had never played that sport.”

“Actresses Lori Loughlin and Felicity Huffman are among the dozens of parents facing federal charges. Others charged include nine coaches at elite schools; two SAT/ACT administrators; an exam proctor; a college administrator; and a CEO who admitted he wanted to help the wealthiest families get their kids into elite colleges. How did this scheme work?”

“How the wealthy and powerful allegedly gamed the system
It was all orchestrated by William Rick Singer, CEO of a college admissions prep company called The Key. Singer pleaded guilty to four charges Tuesday and admitted that everything a prosecutor accused him of “is true.” “There were essentially two kinds of fraud that Singer was selling,” US Attorney Andrew Lelling said. “One was to cheat on the SAT or ACT, and the other was to use his connections with Division I coaches and use bribes to get these parents’ kids into school with fake athletic credentials.” Here’s how the standardized test cheating apparently worked: Some parents paid between $15,000 and $75,000 per test to help their children get a better score, prosecutors said. Singer arranged for a third-party — usually Mark Riddell — to take the test secretly in the students’ place or replace their responses with his own. Prosecutors detail the two-pronged scheme. How did Riddell allegedly take the tests without being noticed by the test administrators? Singer bribed those test administrators, prosecutors said. Igor Dvorskiy, who administered SAT and ACT tests in Los Angeles, and Lisa “Niki” Williams, who administered the tests at a public high school in Houston, are both accused of accepting bribes to allow Riddell to take the tests. Both are charged with conspiracy to commit racketeering.”

—CNN, March 19, 2019

Using Tests Well—Outcomes!

So using tests well means using them for the learning outcomes that we all agree are the point of school. Learning outcomes should not be stated in a more or less summative form, but rather on a scale of quality that better reflects how student thinking and learning have changed for the better. These are harder scales to measure and harder outcomes to achieve, but with better testing and formative assessment, we can expect better outcomes. For example, formative assessment, where tests are used to tell teachers and learners what has been learned before the end of the class or the course, is a good use of testing and assessment because it promotes real change toward achieving the stated outcomes of the curriculum.


Using tests to compare students across states or different types of schools may not be the best use of tests. Worse, the misuse of testing can give unequal advantages to culturally savvy parents and students who game the system. The best use of testing is to make sure every student not only “gets” the curriculum but also thrives in developing their thinking in response to excellent curriculum; that is also the most equitable outcome for every one of our students. Every one of them!


Dr. Robert A. Southworth, Jr.
