Rising Demands for Testing Push Limits of Its Accuracy
By DIANA B. HENRIQUES
New York Times
September 2, 2003
During a tutoring session last December, Jennifer Mueller,
a high school student in Whitman, Mass., came up with a
second correct answer for a question on the state's high
school exit exam - an answer that the giant company that
designed the test had never anticipated.
When statewide scores were adjusted to reflect Ms.
Mueller's discovery, 95 dejected seniors who had failed the
test by one point suddenly found they could graduate after
all.
"I got flowers delivered to the school, and letters and
thank you notes," said Ms. Mueller, 18, who wants to be an
American Sign Language interpreter. "I was just wicked
excited."
Her find was not the only testing flaw to surface recently.
Indeed, it was the second problem reported last year in
Massachusetts. In Nevada, a scoring error caused 736
juniors and sophomores to fail that state's high school
exit exam. And in Georgia this spring, officials canceled
statewide exams for more than 600,000 fifth graders when
the third error in three years was discovered in the tests.
Testing is the buzzword of education these days, with state
legislatures and the federal government demanding more of
it than ever before. Everything from high school graduation
to eligibility for transfers, tutoring and federal aid is
tied to the results. But educators and some testing
industry experts are warning that the new demands are
pushing the limits of the testing industry's ability to
provide fair and accurate tests.
When President Bush signed the No Child Left Behind Act in
January 2002, calling for increased annual testing in
grades three through eight by the 2005-06 school year, the
testing industry - dominated by a handful of companies -
had just weathered the three most error-plagued years in
its history. Researchers at Boston College recently found
that last year was hardly better, with at least 18 problems
reported, almost matching the total reported between 1976
and 1996.
Many experts are warning that the increased testing and
tight deadlines of the education law will trigger a spike
in human errors unless greater attention is paid to quality
control issues.
"I think preventing them entirely is impossible," said
Prof. Mark L. Davison, an educational psychologist at the
University of Minnesota, saying that the amount of testing
is likely to double in the next few years. "As existing
companies expand and new companies move into the field,"
Professor Davison said, "they're going to experience
growing pains."
Executives at some of the largest testing companies say
they can meet the demands of the law while improving the
industry's recent track record. But even some of them fear
that educators and politicians have unrealistic
expectations.
"They want faster, better and cheaper - and we often tell
them, pick two out of the three, because you can't have all
three," said Stuart Kahl, the president of Measured
Progress, a fast-growing testing company in Dover, N.H.
Because errors can have such life-altering consequences for
students and schools, a few critics are even calling for
federal oversight of the industry.
Secretary of Education Rod Paige, a staunch defender of the
education law, said that was an issue for Congress to
decide. "If, in their judgment, there is a need for some
type of federal regulation, that's the role that Congress
plays," Mr. Paige said in an interview.
In fact, it is very difficult to monitor the performance of
the big testing companies, said Kathleen Rhoades, a
co-author of "Errors in Standardized Tests: A Systemic
Problem," a study released this summer by the National
Board on Educational Testing and Public Policy at Boston
College.
"They don't have to let you in, they don't have to answer
your questions," said Ms. Rhoades, who worked on the study
with Prof. George Madaus.
Indeed, Ms. Mueller's discovery was only possible because
Massachusetts - hoping to catch errors early - makes all
its test questions public after the tests are given. But
the practice adds substantially to testing costs because
each year's test must be built from scratch.
Beginning in 1999, Ms. Rhoades and Professor Madaus
conducted a systematic search for reports of testing errors
and found more than 100 in the United States, Britain and
Canada from 1976 through 2002, a period that saw
extraordinary growth in school testing. One major testing
company, for example, had its revenues rise more than
tenfold during those years.
The study confirmed the rising number of errors cited in a
series of articles in The New York Times in May 2001. And
more errors have been reported since the research for the
study was completed, Ms. Rhoades said. All told, of the 103
reported errors and disputes over testing results, more
than two-thirds occurred in the past four years. And only a
quarter of those were detected by testing companies
themselves, she said.
Several testing company executives said that the Boston
College study reflected an "antitesting agenda" and that it
did not distinguish between serious errors and trivial
ones. But they agreed with the researchers that haste was
the most common contributor to errors. Neal Kingston, the
chief operating officer at Measured Progress, said his
company had occasionally been asked to devise and deliver
new statewide tests in three months - an utterly impossible
task, he said.
Under the law, schools must show that all students -
regardless of race, for instance - are showing improvement.
But gathering accurate data to allow students to be placed
into the appropriate racial group is a major problem for
testing companies.
Many states still rely on information gathered at the
district, school or even the classroom level. And when
children fill in the demographic information themselves, it
is riddled with errors, Mr. Kahl said. Children may simply
not know which ethnic group they belong in, or even how
their names are listed in school files.
Building these student information systems is an
unappreciated part of the challenge, and expense, of
complying with the law, Mr. Kahl added.
For schools, the school year that opens in September 2005
is "the crunch year," he said. Ideally, testing companies
would already be at work on the new tests that will be
administered then. But few states are that far along, he
said.
The pressure does not ease when the tests are delivered.
States want the tests scored quickly so they can give tests
in May and have the results in time for summer school. "But
giving a test, getting it right and getting it back in two
weeks - you've just multiplied the odds for mistakes," said
Mark Musick, the president of the Southern Regional
Education Board.
Many of the largest testing companies are expanding to cope
with the added work and compressed schedules built into the
law. Pearson Education Measurement, which says it is the
nation's largest school testing company, has increased its
answer-sheet scanning equipment by two-thirds since 2000
and expanded the office space devoted to essay-scoring by
more than 300 percent.
CTB/McGraw-Hill, another testing giant, said it had also
added capacity and was upgrading its aging computer
systems.
And Harcourt Educational Measurement, the third major
full-service company in the market, said it had been adding
professional staff and revising its procedures for
detecting and preventing errors.
Mr. Paige, the education secretary, said that the
opportunities created by the law would attract more
companies to the testing business. But industry experts say
it is hard for new companies to come in because of
shortages of specialized personnel, especially the
psychometricians who devise tests and monitor their
validity. Moreover, newcomers need an expensive computer
infrastructure, and states demand a proven track record.
"You're not going to be able to go to 'Joe's Truck Stop and
Testing Service' and get a test," Mr. Musick said. "You've
got to go to a major provider that, in spite of its
problems, is still respected."
Besides time, money can be a key factor in determining how
error-prone a state's testing program is - as shown by a
judge's findings in a lawsuit against the Pearson testing
subsidiary after a large scoring error on Minnesota's high
school exit exam in 2000.
Almost 8,000 students got incorrect scores as a result of
the error, which was discovered when a parent demanded to
see his daughter's test results and found that correct
answers had been marked wrong. Initially, the trial judge
refused to allow the students' lawyers to seek punitive
damages against the subsidiary, then known as NCS Pearson.
But the judge later reversed himself in a scathing opinion
that said the company "continually short-staffed the
relatively unprofitable Minnesota project while maintaining
adequate staffing on more profitable projects like the one
it had in Texas."
The company settled the lawsuit in September 2002 on terms
that prohibit it from commenting on it. But Steve Kromer,
general manager of the Pearson testing unit, said that
Pearson had made substantial improvements in its
quality-control procedures in the past three years.
Harcourt, too, has had some widely publicized problems,
including the one that Ms. Mueller discovered in
Massachusetts. The company recently settled a class-action
lawsuit filed after its testing error in Nevada.
Charging more for improved quality control services,
however, is difficult when state finances are in such
dismal shape and when the costs of complying with the law
are so uncertain.
Concern about this rising tide of testing errors is
reviving the long-dormant issue of industry regulation. "We
regulate our pet food, and we don't regulate the tests
which are making major decisions about the lives of our
kids," said Monty Neill, executive director of FairTest, an
advocacy group in Boston.
Others have called for an independent oversight panel that
could monitor for quality in testing. Professor Madaus, the
co-author of the Boston College study, said he preferred
that approach to letting the federal government regulate
the industry because he feared that politics would taint
the professionalism of test evaluation.
Even some testing executives see merit in at least
compiling a national database to track testing errors.
"Researchers have to hunt and peck where they can to find
the mistakes and compile them," said Dr. Kingston of
Measured Progress. "A lot of mistakes, quite possibly,
don't even get caught."