Research and Studies
What Went Wrong
It has now been nearly six months since the June administration of the Regents Physics exam in New York State. Statewide, the failure rate rose from 11% to 33%. Despite an outcry from nearly every school in the field, Commissioner Richard Mills and his staff have steadfastly maintained that the exam was a good one and that the scoring process was scientifically validated.
What really happened in schools throughout the state?
Response to Schools and the Media
In a June 21 newspaper report, SED spokesperson Tom Dunn said, "We're looking at the examination very carefully….We're checking everything again. The main priority…was to make sure the test is fair to the students taking it."
Dr. Gary DeBolt, Assistant Superintendent of the Fairport Schools, has spent many hours in communication with SED on this issue. In a June 21 e-mail, SED representative Anne Schiano stated, "Jerry (DeMauro, psychometrician from SED) has been reviewing the exam with staff for the past few days. So far, it appears the exam is solid. The teachers I have spoken with are not taking issue with the exam nor are they expressing concern with any particular items." Yet there were numerous complaints about the exam's structure and content (e.g., compound questions and the removal of choice) this year.
By June 26, it was apparent that the heat on this issue was not going to abate. In a story in the Syracuse Post Standard, SED spokesperson Dunn said, "The state has heard complaints from many districts about the Physics exam and it is awaiting scores from all districts to see if it was a statewide phenomenon." This statement offered the first ray of hope that the Commissioner and SED would actually recognize a statewide problem. Dunn continued by posing two possibilities, "Either the scoring of the test was too hard or classroom teachers have not done enough to teach the material covered by the state's standards." Data have shown that the scoring was too difficult: Physics had a much higher cut score than any other Regents exam (students needed 68% of the raw points to achieve a passing grade of 65, versus 39% to pass Living Environment). Further, it is highly unlikely that the entire teaching corps failed to properly teach Physics this year. A third scenario, which Dunn did not mention, is the explanation proffered by SED in defense of the scoring process.
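The disparity in those two thresholds can be made concrete with a quick sketch. This uses only the two raw-score percentages cited above; the underlying point totals for each exam are not given here, so the figures are treated simply as percentages:

```python
# Raw-score percentage needed to earn a scaled passing grade of 65,
# as reported for the June 2002 exams
physics_threshold = 0.68       # Regents Physics
living_env_threshold = 0.39    # Living Environment

# A Physics student had to answer roughly 1.74 times as much of the
# raw material correctly to receive the same scaled passing grade
ratio = physics_threshold / living_env_threshold
print(f"Physics required {ratio:.2f}x the raw performance")
```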
The very next day, in an e-mail response to our district's queries, Diana Harding from SED stated, "The Commissioner, et al., have reviewed the data in light of teacher issues with the scoring scale. At this point I have not received any information regarding (nor expect to receive) changes in the physics or chemistry scale. This is contrary to media reports released this morning." It was obvious to the media that the striking statewide Physics data could only signify a problem with the scoring. Ms. Harding dispelled that thought immediately and made clear Commissioner Mills' unwavering position and involvement.
Two days later, SED's position on the matter solidified. In an e-mail dated June 28, Ms. Harding explained, "The test was not flawed. Students may have not been prepared appropriately for the exam. Parents who called or e-mailed me were given the following questions to ask their schools. I suggest you look at them carefully." Ms. Harding then listed the questions: "Did the staff receive appropriate staff development? Many opportunities were provided locall (sic), regionally and Statewide by mentors, teacher center, BOCES, CSDs, and SED. Was staff permitted to attend these opportunities?"
The Scaling Process - What Went Wrong
The state contracted with a private firm, Echtenact, Inc., to initiate and complete the setting of the cut score (the test grade needed to pass) for Regents Physics. On March 4, 2002, the contractor assembled a group of 28 current and/or retired Physics teachers to establish the minimum passing grade and the grade of distinction or mastery. Prior to the move to scaled scoring, the minimum passing score was 65% and distinction or mastery was 85%. This committee process of determining cut scores on standards-based tests is similar to the processes utilized by large testing organizations such as Educational Testing Service, which uses five or more groups of 30 educators each to determine fair scoring. The use of only one group of 28 to set the cut score runs counter to that practice. However, in and of itself, this is not what went wrong.
The Process Continues a Questionable Path
It is important to note that the Final Reports for all newly developed Regents exams can be found on the state's website, with the exception of Physics and Chemistry. The report for Physics was obtained through the Freedom of Information Law (FOIL). It was surprising to see that references to the cut scores recommended by the Echtenact committee were redacted in this report.
It is clear through the documents received under FOIL that one member of the original committee complained to SED that the cut scores set by the committee on March 4th were too low. Apparently, SED agreed with this one individual, as SED abandoned the group, abandoned the process, and abandoned the results of their work. After abandoning the majority opinion of the group of 28, SED set up its own new group with only 9 of the original members who were willing to reconvene.
SED has done its own internal analysis of the setting of the cut scores by the group of 9. Two reports, Follow Up Rounds to Physical Setting: Physics Regents Examination Standard Setting Study and Analyzing the Assumptions Underlying Standard Setting for the Physical Setting: Physics Regents Examination, were produced. SED has not provided an independent analysis.
The original committee set the cut score at 31. While 31 out of 77 may look low, it is not inconsistent with cut scores set by the same process for the other Regents exams. In fact, it is similar to the cut scores for Earth Science and Living Environment. SED has consistently maintained that these low cut scores represented very difficult questions; therefore, fewer questions needed to be answered correctly to pass or achieve proficiency.
The new group of 9 evaluated each test item in relation to the standards. Their recommendations were then given to the state psychometrician, Gerald DeMauro. Mr. DeMauro discarded the results of two of the committee members and modified the results of another two. He then set a new cut score for passing Regents Physics. The new cut score was set at 58, thus increasing the passing requirement by 87 percent.
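The arithmetic behind that 87 percent figure is straightforward, using the two cut scores reported above:

```python
# Cut scores reported in the text (raw points needed to pass)
original_cut = 31  # set by the full committee of 28 on March 4
revised_cut = 58   # set after the psychometrician's adjustments

# Relative increase in the passing requirement
increase = (revised_cut - original_cut) / original_cut
print(f"Passing requirement increased by {increase:.0%}")  # 87%
```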
Trapped by Past Problems
In 2001, the cut scores for the new Regents exams were heavily criticized by the field and the media as being too low. Peter Simon of the Buffalo News exposed this issue in June 2001. These exams were high-stakes tests that students had to pass in order to receive a diploma. The Physics exam, which is not needed for graduation, is not a high-stakes test. It should be noted that similar cut score and scaling problems occurred with the June 2001 administration of the 8th grade ELA test. Scores plummeted with a corresponding outcry from local schools. SED responded with a minor score change, but basically kept the practices in place.
While my focus here is not the quality or validity of the Physics exam, I must point out that there were significant problems with the test's construction that undermined its usefulness, effectiveness, and validity. At least one question (#6) came from the old Physics curriculum core and did not belong on the test. The field testing and exams (3 operational and 1 anchor) were done before the core was finished. Further, SED admitted that there was "double jeopardy" on the exam: the same content was tested at different levels of difficulty. If students answered a question on a particular concept incorrectly at the lowest level of difficulty, they were all but certain to answer the same concept incorrectly later in the exam at a greater level of difficulty. Science education experts in the state have concluded that there were too few questions to choose from to prevent this "double jeopardy" from occurring.
Results of AP Physics Exam vs. Regents Physics Exam
Advanced Placement Physics is a college-level course for advanced high school students across the country. The grades range from 1 to 5, with "5" being "highly qualified." A "3" is considered a solid AP score, and some colleges accept a "2" as a passing score which qualifies a student for up to 6 college credits. The AP exam is a national exam prepared by the College Board.
In Fairport, we compared AP and Regents Physics results for those students who sat for both exams in 2002. Nine students scored a "5" on the AP Physics exam, while scoring a combined average of only 90.6 (B+) on the Regents exam. Thirteen students scored a "4" on the AP test with a combined average of 85.5 (a low B) on the Regents. Eighteen students scored a "3" on the AP and an average of 80.7 on the Regents. Three students scored a "2" on the AP with a Regents average of 78.6.
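From the Fairport figures above, the overall Regents average for the students who sat for both exams follows directly. A minimal sketch (the per-group counts and averages are taken from the text; the overall figure is computed, not reported):

```python
# (AP score, number of students, average Regents Physics grade)
groups = [(5, 9, 90.6), (4, 13, 85.5), (3, 18, 80.7), (2, 3, 78.6)]

total_students = sum(n for _, n, _ in groups)
overall = sum(n * avg for _, n, avg in groups) / total_students
print(f"{total_students} students, overall Regents average {overall:.1f}")
```

Even pooling every AP-qualified student, the group's Regents average lands in the low-to-mid 80s.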
Our very best Physics students, as judged by national college standards and receiving college credit for their accomplishments, could not score an "A" average on the New York State Physics exam.
Attempts to Point Out the Problems Have Failed
Attempts to communicate the aforementioned cut-score setting irregularities with the Commissioner have been unsuccessful. In fact, one of the committee members (an expert in the field of physics) tried seven times to contact the Commissioner to inform him about what had happened to the scoring process. He was denied access to Commissioner Mills all seven times.
More on Tests & Processes
I have often written of the flaws associated with Item Response Theory (IRT), the process of weighting the difficulty of each question. Setting passing scores (cut scores) has been fraught with problems as well. Both processes are integral to the Physics exam, and both processes appear to have failed.
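For readers unfamiliar with IRT, the simplest variant, the one-parameter logistic (Rasch) model, illustrates how question difficulty enters the scoring. This is a generic textbook sketch; SED's actual model, parameters, and calibration are not described in this account:

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) model: probability that a student of the given
    ability answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A student whose ability exactly matches an item's difficulty
# has a 50% chance of answering it correctly
print(p_correct(0.0, 0.0))               # 0.5
# Harder items (higher difficulty) lower that probability sharply
print(round(p_correct(0.0, 2.0), 2))     # 0.12
```

Because every item's difficulty estimate feeds into the scaled score, errors in calibrating those difficulties, or in the cut scores layered on top of them, propagate to every student's grade.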
Unfortunately, similar and equally disturbing errors and failures with the process have occurred with the ELA and Mathematics assessments in grades 4 and 8. In theory, IRT and the cut-score process should be better than former testing models. Had the ETS model of four or five groups of 30 been utilized, the Physics scoring failure likely would not have occurred. When all of the work behind IRT and cut-score setting is discarded, the results are meaningless and, more important, very harmful to students.
In summary, the June 2002 Physics scoring process encompassed the following problems and irregularities.