The Impending Loss of Talent: An Exploratory Study Challenging Assumptions About Testing and Merit

by Daryl G Smith & Gwen Garrison 2005

Using five different data sets, this study explores the bivariate relationship between standardized tests and various indicators of success. Examining student success on a variety of indicators such as cumulative grade point average and graduation rates, the study demonstrates the limited usefulness of such tests, particularly when the data are disaggregated by gender or race and ethnicity. Overall, if tests are overemphasized in the admissions context, they can contribute to a significant loss of talent.

The state of California and increasingly the nation are embroiled in controversies about the use of affirmative action in admissions decisions. The debate has become polarized into two positions: one emphasizing access for students who have been denied entrance historically to our most selective public institutions and the other insisting on academic indicators of merit, particularly tests, to determine the selection procedure.

In many large and selective public universities, the focus of attention has been on the Scholastic Assessment Test (SAT) and other standardized tests, along with grades, as the best and most appropriate criteria for college admissions. At their core, many of these policy discussions rely on the assumption that the tests are valid measures of academic merit and thus are a fair and important factor in deciding admissions. Most recently, numerous lawsuits have been filed in which score differentials on standardized tests, often of just a few points, have been used to assert that students have been denied admission unfairly. Most of these suits are brought on behalf of White students, suggesting that students of color with lower scores were given discriminatory preference (Margulies, 2002; Olivas, 1999). The Supreme Court cases involving the University of Michigan rest centrally on differences in test scores.

In many arenas, the arguments presented by the critics of affirmative action appear to have not only sound logic but also humane concerns. The critics note that underrepresented students admitted under affirmative action often have lower test scores than others. They point out that these students have lower retention rates on average. Thus, ignoring scores not only penalizes those students who have worked hard and "earned" admission, it also sets underrepresented students up for failure, something that would be avoided, these critics assert, if these lower achieving students were to attend less selective and "more appropriate" institutions. These critics also point to the volumes of research that suggest that the SAT and high school grades are two of the best predictors of performance and thus that the focus of educational efforts should be on improving academic achievement and performance in high school rather than on "lowering" admission standards. This study seeks to investigate whether there is empirical support for these claims.

At a meeting concerning the use of testing in college admissions, a highly regarded civil rights attorney commented that if the Law School Admission Test (LSAT) had been weighted heavily when she and Thurgood Marshall had been applying to law school, neither of them would have been admitted. It is that comment that inspired this research. Clearly, finding ways to understand how the SAT and other standardized admissions tests, like the Graduate Record Examination (GRE) and LSAT, identify or ignore talent is an urgent necessity. The current study presents an alternative way of investigating the arguments for and against a test as an indicator of merit by looking at student talent identification retrospectively and empirically. Taking indicators of student success and looking backward, how well would the test fare in identifying talent? How many students, and which ones, would have been successful at the end? How many of these would not have been admitted if the relevant standardized test had been emphasized? An important contribution of this approach is that it retains an emphasis on academic merit and excellence while investigating the adequacy of standardized tests as indicators of excellence by using a variety of archival data sources to explore these issues. Legal and other policy debates ultimately rest on the validity of the tests. However, rarely are validity studies carried out using anything more than 1st-year grades. A key component, then, of any analysis rests on the selection of success indicators that might more closely approximate genuine outcomes. For this study, several longer term indicators of success were used. For undergraduate success, 4-year GPA and graduation from college were appropriate. For law school, passing the bar was a good proxy for access to the profession.

A large body of research suggests that the SAT and other tests are among the best single predictors of 1st-year grades. This study does not question the years of research that support this conclusion, although there are those who do (e.g., "Common Sense," 1997; Crouse & Trusheim, 1998; Duran, 1986; Hathaway, 1984). However, the meaning of "the best" is often ignored in the policy debates. First, the "best" predictor is not necessarily a good or adequate predictor for individual academic success. Most research suggests that tests such as the SAT predict at best 20% of the variance of success (Beatty, Greenwood, & Linn, 1999; Schrader, 1978; Schwan, 1988). Many studies show much lower relationships, and others continue to show little or no relationship (Carver & King, 1994; Dalton, 1976; Fleming & Manning, 1998; Morrison & Morrison, 1995; Thacker & Williams, 1974). These studies suggest that 80% of student success or failure is explained by other factors such as institutional efforts (Hurtado, Milem, Clayton-Pederson, & Allen, 2000; Lowman & Spuck, 1975), noncognitive factors (Sedlacek, 1998), or psychological constructs (Steele & Aronson, 1995). Second, numerous studies suggest that the SAT's power as a predictor varies dramatically for different groups of students by race/ethnicity, age, and gender ("Common Sense," 1997; Crouse & Trusheim, 1998; Dalton, 1976; Duran, 1986; Guinier, Fine, & Balin, 1997; Hathaway, 1984; Rosser, 1992). Third, most research has focused on the role of the SAT in predicting 1st-year grades, leaving ultimate college success, as measured, for example, by graduation, largely unexplored (e.g., Wallach, 1976; Wightman, 2000, 19).
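
To put the variance figure in more familiar terms, the worked equation below assumes a single predictor in a simple linear model, where the share of variance explained equals the square of the correlation coefficient. The symbols $R^2$ and $r$ are introduced here for illustration and do not appear in the original analysis.

```latex
R^2 = r^2 = 0.20 \;\Longrightarrow\; |r| = \sqrt{0.20} \approx 0.45,
\qquad 1 - R^2 = 0.80
```

Under that assumption, a test accounting for 20% of the variance corresponds to a correlation of roughly 0.45 and leaves 80% of the variance unaccounted for by the test.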

One of the challenges in looking at the data is the use of statistics in policy studies. The use of multivariate analysis in studying predictive validity is a perfectly valid approach. Indeed, it is the only way to look at a complex relationship among numerous factors. Nevertheless, the results of these analyses are often oversimplified when applied in legal and policy contexts. Even though testing agencies and others continue to state that the predictive power of a single test is quite limited, policy discussions and lawsuits continue to assume that merit and tests are synonymous (e.g., Mangan, 2002). This study attempts to depict the direct relationships between available indicators of test scores and academic success. Graphic illustrations are a powerful tool for demonstrating relationships relevant to policy issues. Indeed, Smith, Best, Stubbs, Archibald, & Roberson-Nay (2002) demonstrate the important role of graphics in visually presenting information relevant to important policy and scientific questions. For this study, a number of data sources were identified as ultimate indicators of academic success. These data sources were institutional and national and, while not complete in each case, permitted a variety of different analyses.


Because this study is focused on ultimate success indicators rather than interim indicators such as 1st-year grades, the literature reviewed here focuses primarily on the research that looks at academic success beyond the 1st year and existing retrospective studies. Whatever the focus, however, validity studies are almost always studies of admitted students. Some are institution specific. Some are based on system-wide or national research, while others are based on meta-analyses.

As stated earlier, many studies that use traditional regression models to examine the power of a standardized test to predict grades in the 1st year tend to show some positive relationship. Generally, the power of that prediction varies, but at most, such tests are found to account for about 20% of the variance. However, as the success measure becomes more long-term and as ethnicity, gender, and age are considered, the power of the test often declines in significance. There are examples of such research, particularly at the graduate level. In a large meta-analysis of the relationship of the GRE to grades in graduate programs in psychology, Goldberg and Alliger (1992) found that the GRE accounts for at most about 5% of the variance. In one of the most recent retrospective studies, Sternberg and Williams (1997) asked faculty to evaluate the academic performance of students completing a doctorate in psychology. The GRE was a significant predictor of success only for 1st-year grades. There was no validity after that (save a small relationship for men only between a subscore of the GRE and the quality of the dissertation). Gough and Hall (1975) found that as medical students are evaluated for clinical competency as opposed to academic performance, the role of standardized tests diminishes considerably. Guinier et al. (1997) found, in an analysis of grades in the 1st year of law school, that beyond a threshold LSAT score, there was no relationship between grades and LSAT scores. House (1989) found significant differences in predictive validity of the GRE for older and younger students for graduate programs in education with much less validity for older students. In a review of studies of later life achievement, using indicators such as salary, professional esteem, and honor societies, Klitgaard (1985) found no relationship to SAT scores.

In a retrospective study using grades in a 1st-year math course, Wainer and Steinberg (1992) found that for similar grades in the math course, women had lower SAT scores. In another study of the relationship of the SAT to grades, Moffatt (1993) found no predictive validity for African Americans or for students over 30. Similarly, Swinton (1987) found, in a study conducted at the Educational Testing Service, that women over 25 consistently outperformed predictors of 1st-year grades in graduate school.

Others have focused on the need to look at other factors and predictors: Sedlacek (1998) demonstrated the predictive validity of "noncognitive" factors for African Americans; Sobol (1984) has documented the importance of admissions judgments; and various studies have demonstrated the impact of educational experience on performance (Lowman & Spuck, 1975; Steele & Aronson, 1995; Young, 1994). Bowen and Bok (1998) underscore the high graduation rates of students who attend selective institutions, regardless of their entering SAT scores, countering quite directly the notion that admitting students with lower scores necessarily jeopardizes their success. Sternberg and Grigorenko (2002) suggest replacing static testing such as the SAT with more dynamic testing.

In general, this review indicates that relatively few studies focus on longer term success, and those that do point to a diminished capacity of tests for addressing admissions concerns. Indeed, few data outside individual institutions allow for the exploration of success looking back to entrance scores. Even fewer data sources are available that permit disaggregation by such factors as race/ethnicity, gender, age, and field of study. Finally, the existing research most often relies on the statistical terminology concerning "variance" to communicate the predictive power of tests, leaving open the very real possibility of misusing the results in policy and legal settings.


This study is tailored to look narrowly at the relationship between a variety of success indicators and the test scores for various groups of students by illustrating graphically the power or lack of power in these predictions. It leaves unexplored the very critical element of the importance of successful educational interventions that make test scores even less predictive and the many factors that influence performance outside of a student's background. It is clear that in all the data collected, tests were likely to have been one of the elements considered for admission. As with most studies, this research does not include those students who were not admitted. The critical questions here are the power of the relationship for guiding admissions decisions for individuals and the assumption that through a reliance on the test, academic talent and excellence are revealed.

Public-policy discourse, in contrast to empirical evidence, often suggests that tests are strong predictors of success. In that context, one would expect to see among various indicators of academic success virtually no representation from those at the lowest score levels and total representation from those at the highest levels. One might imagine a stepped bar graph in which for any measure of success, few from the lowest scores would be present, continuing to a greater percentage, if not total representation, from the highest scores. Indeed, such a clean relationship would reflect a correlation coefficient approaching 1.0. One might also expect to find that there is a threshold score or range that would be useful in facilitating the admissions process. Research might reveal that within a specified range at the lower end of the test scores, students are not likely to succeed. The current study is designed to look at the validity of these expectations and, in particular, to illustrate graphically the degree to which such distinctive relationships do or do not exist.


The study required identifying archival data that were longitudinal and included indicators of success and relevant test scores. A strong data set would also include information on race/ethnicity and gender. A search of existing data that would permit the required analysis revealed limited availability. Of those available, some were single-institutional data (shared under assurances of anonymity), one was system-wide data, and one was a national database. In a few cases, we were able to disaggregate by race and gender, but in most cases such data were not available.


The study used data from five sources:

1. System-wide data from a selective university system. Information was available concerning SAT scores, successful graduation and retention, and dropout due to academic failure. Over 37,000 students were included from the entering classes of 1989 and 1990. For this study, breakdowns by race and gender were not available.

2. Admissions data for a selective public university. The data provided information about SAT scores, college grades, and graduation as indicators of success. No race and gender data were available for the study.

3. Baccalaureate and Beyond data. A license to access the Baccalaureate and Beyond data set was obtained from the National Center for Education Statistics. A subset of the data set focusing on selective institutions was developed. This data set of approximately 2,000 college graduates of selective institutions allowed us to look at SAT scores, college grades, disciplinary fields of study, high school grades, and race and gender.

4. Mean GPA and retention rates for a highly selective science program. A selective private science program provided math SAT scores, mean grade point average (GPA), and graduation rate for its majority and underrepresented students from 1986 to 1995. Here the math SAT was used because of its relevance to the science curriculum and its role in admissions.

5. The bar passage rate for a selective law school. The data included LSAT scores, bar passage rates, race and gender information, and also included whether students were admitted on scores alone or whether a committee reviewed and rated the application.


Because this was an exploratory study, requiring longitudinal data from a variety of contexts, available archival data were identified. By necessity, then, not all data were complete, nor could all questions be addressed. Without complete data, more in-depth analyses were not possible. The system-wide data were not available by race and ethnicity, for example. The Baccalaureate and Beyond and liberal arts data do not permit a comparison of both "success" and "failure" since they include only college graduates. The study of college grades, however, permitted a look at levels of performance. Some data permitted the study of different racial groupings, though the numbers are sometimes extremely small and can thus be unstable.


Traditional predictive studies on validity employ some form of multiple regression to look at the ability of a single background measure or some combinations of measures to predict an educational outcome. By investigating the singular relationship of a standardized test to an outcome, this study is not intended as a substitute for that more traditional approach. The literature reviewed certainly underscores the relevance of those analyses. Indeed, the expectation was that doing such an analysis would replicate the available literature.

Rather, this study was intended to explore the strength or weakness of the relationship between a test and an outcome. For this purpose, a great deal of time was spent trying to find ways to visually depict the relationship through tables and graphs. As Smith et al. (2002) assert in their study of the use of graphs in the "hard" and "soft" sciences, "graphs represent an especially potent and persuasive type of visual device" (p. 751). The goal was to see the relationship and to see the potential impact of denying admissions based on low scores. Indeed, in some ways, the study provides a way of investigating levels of risk in admitting students whose scores range widely on the test by focusing on the bivariate relationship between success and tests.

The data could be considered in a number of ways: calculating the percentage of students in each test grouping who succeeded, calculating the percentage of successful students who came from each grouping, and comparing both to the population distribution. For this report, three primary analyses are presented using a combination of tables and figures. The first analysis looks at the percentage of success from each of the clustered test scores. That is, among the students in each SAT grouping, what percentage were successful? Here, threshold scores can be determined, as can the risk taken when students with various test scores are admitted. Even where differences among groups are noted, one can also see the degree to which students do or do not perform according to what their test scores would predict.
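
The first analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name `success_rate_by_band`, the cut points, and the records are all invented for the example.

```python
from bisect import bisect_right
from collections import defaultdict

def success_rate_by_band(records, edges):
    """Return {band_index: share of students in that band who succeeded}.

    records: iterable of (score, succeeded) pairs; succeeded is a bool.
    edges:   ascending cut points; band 0 is below edges[0], band 1 runs
             from edges[0] up to (not including) edges[1], and so on.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for score, succeeded in records:
        band = bisect_right(edges, score)
        totals[band] += 1
        successes[band] += int(succeeded)
    return {band: successes[band] / totals[band] for band in sorted(totals)}

# Invented records, for illustration only (not data from the study).
records = [
    (850, True), (880, False), (870, True),      # band 0: below 900
    (950, True), (1020, True), (1050, False),    # band 1: 900-1099
    (1150, True), (1250, True),                  # band 2: 1100-1299
    (1350, True), (1400, False),                 # band 3: 1300 and above
]
rates = success_rate_by_band(records, edges=[900, 1100, 1300])
for band, rate in rates.items():
    print(f"band {band}: {rate:.0%} successful")
```

Reading the output band by band shows whether success rises steeply with score, which would suggest a threshold, or stays roughly flat.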

The second analysis shows the distribution of test scores for the success measures used in each data set in comparison to the distribution of scores in the overall population being studied. If 10% of an entering class had SAT scores from a particular range and 10% were present in the population of those who were successful, then having that SAT score would not be an important predictor. The gap between distributions, then, shows how significant the test score becomes in the analysis. The final analysis examines cumulative grades and displays the percentage of students from each test category who achieved a particular grade level.
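
The comparison of distributions in the second analysis can be sketched as follows. The helper names `band_shares` and `representation_gap`, the band labels, and the cohort are hypothetical; the example is built to mirror the 10%-versus-10% case described above, where the gap is zero.

```python
from collections import Counter

def band_shares(bands):
    """Map each band label to its share of the given list of band labels."""
    counts = Counter(bands)
    total = sum(counts.values())
    return {band: counts[band] / total for band in counts}

def representation_gap(population_bands, successful_bands):
    """Share among the successful minus share in the population, per band.

    A gap near zero means the band is neither over- nor underrepresented
    among successful students.
    """
    pop = band_shares(population_bands)
    succ = band_shares(successful_bands)
    return {band: succ.get(band, 0.0) - pop[band] for band in pop}

# Invented cohort of 20 entrants (labels are hypothetical score bands).
population = ["low"] * 2 + ["mid"] * 10 + ["high"] * 8
successful = ["low"] * 1 + ["mid"] * 5 + ["high"] * 4   # 10 graduates

gaps = representation_gap(population, successful)
for band, gap in gaps.items():
    print(f"{band}: {gap:+.1%}")
```

A positive gap would indicate overrepresentation of that band among the successful, a negative gap underrepresentation.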


One sees in the analysis a variety of clusters for test scores. For most of the SAT data, we have used distributions of SAT scores, since these are generally understood. However, the categories varied by data source. Because LSAT scores are less well understood, these scores were grouped by their overall distribution in the population such that we could look at students whose scores were one and two standard deviations above and below the mean. For the Baccalaureate and Beyond database, because the ACT and SAT data were combined, quartiles were used.
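
One way such groupings might be computed is sketched below. The exact banding rules used in the study are not stated, so the rules here are assumptions: `sd_band` labels a score by whole standard deviations from the mean, and `quartile_of` places a score relative to three population cut points. The scores are invented.

```python
import statistics

def sd_band(score, mean, sd):
    """Label a score by whole standard deviations from the mean.

    Returns an integer clipped to [-2, 2]: -2 means at least two SDs below
    the mean, -1 between one and two SDs below, 0 within one SD of the
    mean, and so on upward.
    """
    z = (score - mean) / sd
    whole = int(z)          # truncates toward zero, so |z| < 1 maps to 0
    return max(-2, min(2, whole))

def quartile_of(score, cut_points):
    """Return 1-4 given the three quartile cut points of the population."""
    return 1 + sum(score > cut for cut in cut_points)

# Invented scores, for illustration only.
scores = [148, 155, 158, 160, 162, 163, 164, 166, 168, 171, 176]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)
cuts = statistics.quantiles(scores, n=4)   # three cut points

for s in scores:
    print(s, sd_band(s, mean, sd), quartile_of(s, cuts))
```

Standard-deviation bands describe a score relative to the cohort without requiring familiarity with the test's scale, while quartiles allow scores from different tests to be pooled by rank.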


For purposes of this summary report, the conclusions from each of the data sets will be presented with a summary analysis at the end.


Broad data from the university system on the relationship of graduation and retention demonstrate, as expected, some relationship between SAT and graduation 6 years after entry (Table 1). Table 1 demonstrates, in the aggregate, a linear relationship between the SAT and graduation. Typically, this is the kind of data that would demonstrate the usefulness of the SAT. Nonetheless, in looking at success, we see that 64% of those with extremely low SAT scores (less than 900 combined) are still successful. The remaining groups have success rates of 71%, 78%, and 82%, respectively (Figure 1).

To eliminate the lowest category would have eliminated 64% who were successful in order to avoid admitting 36% who were not successful. Even in the highest SAT group, nearly 20% were not successful. How to evaluate this level of risk would depend upon an institution's own view of the students in this lowest category and the other attributes these students might bring to the institution. For these data, scores below 900 could be considered a threshold.
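
The trade-off described here is simple arithmetic, sketched below using the graduation rates reported above. The function name `exclusion_tradeoff`, the per-100 framing, and the group labels other than "less than 900" are assumptions made for illustration.

```python
def exclusion_tradeoff(success_rate, cohort_size=100):
    """Graduates forgone vs. non-graduates avoided if a band is excluded."""
    forgone = success_rate * cohort_size
    avoided = (1 - success_rate) * cohort_size
    return forgone, avoided

# Graduation rates reported in the text for the four SAT groups, lowest to
# highest. Only the "less than 900" boundary is stated; the other labels
# are placeholders.
reported_rates = {
    "lowest (<900)": 0.64,
    "second": 0.71,
    "third": 0.78,
    "highest": 0.82,
}

for label, rate in reported_rates.items():
    forgone, avoided = exclusion_tradeoff(rate)
    print(f"{label}: per 100 admitted, {forgone:.0f} graduates lost "
          f"to avoid {avoided:.0f} non-graduates")
```

For the lowest group, every 100 students excluded would mean forgoing about 64 eventual graduates in order to avoid about 36 who did not graduate.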

Another way to look at the data is to compare the distribution of SAT scores within the population. The lowest SAT composite scores represented only 6% in the entire admissions population and 5% in the graduating class, a trivial difference in the distribution. In fact, the distribution of SAT scores in the successful group mirrors closely the distribution in the admissions population.

It is important to underscore the point that this study does not presume that admissions evaluations are not relevant or that admissions evaluations should not have considered test scores. All the students admitted to this system were evaluated, and some students would have been evaluated based on more than simple numbers. The results of this analysis suggest that the SAT alone is not pertinent to the ultimate success of these students and that successful students are as likely to come from among candidates with lower SAT scores as among those with higher scores. Moreover, basing lawsuits on discrepancies in scores, particularly from the midrange, would not be supported from these data.

The dropout rate for academic reasons follows a somewhat similar pattern, though there are more students from the lowest group who drop out than from the others (Table 2). Another view might be, however, that while 12% have dropped out for academic reasons from the lowest SAT group (low scores for a selective institution), 88% did not. Furthermore, there is virtually no distinction among the remaining three categories. Comparing the admitted student distribution to the dropout distribution, one sees overrepresentation in the lowest SAT groups and underrepresentation in the upper two categories.


Looking at ultimate graduation and retention after 6 years, the pattern for the single-university data is similar (see Figure 2 and Table 3). Here again, the graduation rate among the lowest SAT scores admitted is lower than for the other groups, but a majority of students still graduate from that group. Such a success rate might be considered highly significant for a group whose combined scores were less than 800 in a highly competitive institution. For the rest of the distribution, the success rate is relatively flat. Moreover, students with scores of 1000-1200 were as likely to succeed as or more likely to succeed than those above that range.

The population distribution here closely mirrors the graduation distribution, especially in the top three categories. Indeed, there is slightly greater success among the middle groupings than in the highest group.


The Baccalaureate and Beyond data set permitted a great deal of experimentation on how to display the data and how race, gender, academic major, and the kind of selective institution mediated success. Data from two types of selective institutions, labeled Liberal Arts I and Research I in the data set, were considered. These data were examined according to levels of achievement as represented by grades and permitted some analysis by academic major and race/ethnicity.

Table 4 presents the grade distribution for students from selective liberal arts colleges. Of the 43 students receiving a cumulative GPA of A upon completion, 26% came from each of the first three quartiles and 23% came from the highest quartile. Similar patterns emerge for the other grade levels. One can compare these distributions to the population distribution to see slight overrepresentation from three out of four categories of SAT, including the lowest and highest. About 10% of each group, including the lowest SAT group, ultimately earned an A average. The differences emerged for the middle grade levels, where more from the highest quartiles earned B grades than from the lowest quartiles (68% vs. 59%). One can see, however, considerable overlap.

A similar pattern emerges for the students graduating from the Research I institutions (Table 5), where the distribution for each grade level is close to the population distribution. Note though that the highest quartile is underrepresented in the A category. Overall, one sees little or no relationship between SAT scores and cumulative college grades. Significantly, both for the liberal arts students and those attending a Research I institution, the students getting As are as likely to come from the lowest as the highest SAT quartiles.

Figure 3 and Table 6 include the total group from the selective institutions and present the data slightly differently, but with a similar conclusion. Here the perspective is comparing the percentage of students from each SAT group in each grade level. One sees how flat the distribution is for each grade. That is, 13% of those in the bottom quartile, 12% of those in the 2nd and 3rd quartiles, and 11% of those in the top quartile earned an overall GPA of A at graduation.

Comparisons between those students in the bottom quartile of the SAT and those in the top quartile reveal that a greater percentage of students from the bottom quartile earned As than those from the top quartile. Indeed, for this population, students with low SAT scores are slightly overrepresented in the A grades and students with the highest SAT scores are slightly underrepresented in the A category. This is also the case for both men and women when they are considered separately. Since the differences between test groupings are small, the relative lack of differences is the most important point here.

While the numbers for students of color are relatively small in this group of highly selective institutions, the data are quite revealing. Data for Latino students are presented in Table 7 and Figure 4. The most important conclusion from these data is the uneven relationship between test scores and success. Indeed, among those in the A category, a higher percentage come from the bottom and 2nd quartile than from the top grouping.

For African American students who earned an average of A, 58% came from the bottom quartile (75% from the bottom two quartiles), while only 17% came from the top quartile. By comparing these percentages to the distribution of students in the population in Table 8, one sees that the percentage of students receiving As from the bottom quartile is the same as those in the population. Those receiving As from the top quartile are slightly overrepresented compared to the population. Of those who earned an average of C, 44% came from the bottom quartile, and none came from the top quartile (Table 8).

The analyses by major also reveal little relationship between grades and SAT scores. Indeed, in engineering (Table 9), where one might expect to see a strong relationship between testing and success, students from the bottom quartile are clearly overrepresented in the A category and the top group are underrepresented. Of those graduating with an A average, 46% came from the bottom quartile, even though, overall, only 26% came from that quartile. Of the 20 students in the top quartile, only 1 maintained an A average.

Thus, these data provide support for the weak relationship of the SAT to overall college grades. They also reveal the loss of talent that would occur if the SAT is weighted too heavily in admissions. While the Baccalaureate and Beyond data cannot take into account the dropout rate along the way, the system-wide data reviewed earlier suggest that the SAT scores of those students who dropped out, while playing a role, would not invalidate the current analysis.


A highly selective science program provided data that allowed us to examine the relationship between students' SAT math scores and mean GPAs and graduation rate for majority and underrepresented students. These data span a 9-year period from 1986 to 1995 and involve 2,137 students. The data clearly reveal the limited range of SAT scores for students admitted. Few have scores below 600 on the SAT math exam.

Table 10 displays the graduation rate and mean GPA for all students and then for majority (White and Asian) and underrepresented (African American, Latino, and Native American) students according to their SAT math scores. The data suggest that there is a threshold score of 600. For the 6 students who were admitted with math SAT scores at or below 600, 2 graduated. Even here, disaggregation is important. Two of the 3 White students graduated, while none of the underrepresented students graduated. Above 600, the graduation rates range from 79% to 81% for all students. For underrepresented students the graduation rate ranges from 65% to 83%. However, the highest graduation rate is for the second-lowest and highest SAT groups. The 65% graduation rate is from the middle of the range.

Figure 5 demonstrates that except for those with math SAT scores below 600, the graduation distribution for all students is relatively flat. For majority students, White and Asian, there is a slight threshold below 670 (though the graduation rate is still quite high below the threshold). One can see how variable the distribution is, in particular for underrepresented students of color, and that students with scores near 600 succeeded as well as or better than students in the other higher groups.

Figure 6 demonstrates, once again, the closeness in GPA within the spectrum of SAT scores represented. For all students and majority students, one sees a small positive relationship between SAT math scores and GPA. Nevertheless, majority students from the lowest SAT grouping have a higher GPA than those in the next group. When one disaggregates underrepresented students, the lack of predictability becomes evident. In addition, one can see that while underrepresented students appear to have lower GPAs, SAT scores are not related to GPA in a meaningful way. Clearly other variables are at play in the success of most students, but especially underrepresented students.


The law school data focused primarily on bar passage rate, in relationship to the LSAT, as the measure of success. For this analysis, the LSAT scores of all entering students were distributed according to standard deviations above and below the mean. The mean LSAT score was 163, with the range being 141-178. In a pattern similar to that found in the other analyses, the data suggest limited relationships between bar passage and the LSAT for all students (Table 11).
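The grouping by standard deviations described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the article reports the mean (163) but not the standard deviation, so `ASSUMED_SD`, the band boundaries, and the sample records are all hypothetical.

```python
import math

MEAN = 163       # mean LSAT score reported in the text
ASSUMED_SD = 6   # hypothetical; the article does not report the SD


def sd_band(score, mean, sd):
    """Return the whole-SD band for a score.

    Band 0 covers [mean, mean + sd), band -1 covers [mean - sd, mean), and so on.
    """
    return math.floor((score - mean) / sd)


def pass_rate_by_band(records, mean, sd):
    """Map each SD band to the share of its students who passed the bar."""
    totals = {}
    for score, passed in records:
        band = sd_band(score, mean, sd)
        n, k = totals.get(band, (0, 0))
        totals[band] = (n + 1, k + int(passed))
    return {band: k / n for band, (n, k) in sorted(totals.items())}


# Hypothetical (score, passed_bar) records, for illustration only.
records = [(141, False), (150, True), (158, True),
           (163, True), (170, True), (178, True)]
rates = pass_rate_by_band(records, MEAN, ASSUMED_SD)
```

Reporting a pass rate per band, rather than a single correlation, is what lets a retrospective analysis show where the relationship between score and success is flat.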

Indeed, while the bar graph (Figure 7) shows a stepwise progression for the data, the actual bar passage rate varies from 63% to 100%. Here, too, the lowest LSAT group had the lowest pass rate of 63%, suggesting a potential threshold, though one that would eliminate the 63% of that group who succeeded. Indeed, there is little differentiation except for the lowest group. Comparing the distribution of those who passed the bar with the population distribution, one sees a very similar pattern of distribution. Eliminating the 13% of the population with the lowest test scores would have eliminated 75% of students from those groups who succeeded.
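The retrospective question posed here (who would have been lost under a score cutoff?) reduces to simple arithmetic on group counts. The sketch below assumes hypothetical counts and group labels; it does not reproduce Table 11, and the function name `cutoff_impact` is introduced only for illustration.

```python
def cutoff_impact(groups, excluded):
    """Summarize what a score cutoff would have removed.

    groups:   {label: (n_students, n_passed)}
    excluded: labels of the score groups that fall below the cutoff
    """
    total = sum(n for n, _ in groups.values())
    cut_n = sum(groups[g][0] for g in excluded)
    cut_passed = sum(groups[g][1] for g in excluded)
    return {
        "share_excluded": cut_n / total,
        "excluded_who_passed": cut_passed,
        "pass_rate_among_excluded": cut_passed / cut_n,
    }


# Hypothetical counts, not the article's data.
groups = {
    "lowest": (10, 6),
    "low": (20, 16),
    "middle": (40, 34),
    "high": (30, 27),
}
impact = cutoff_impact(groups, ["lowest"])
```

With these made-up counts, a cutoff that removes the lowest group excludes a tenth of the class, and most of the excluded students are people who went on to pass.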

The data, when broken down by gender and race (thus reducing the numbers in each category considerably), suggest that there are gender differences. Men from the lowest group actually achieved 88% passage of the bar. For women the number was 55% (Table 12). However, even the women with scores only one standard deviation below the mean achieved an 82% bar passage rate, as did 79% of the men in this category. While the bar passage increased for the women with an increase in scores, this pattern did not hold for the men. Men from the lowest and highest groups achieved 100% passage, and men from the second lowest achieved 86% passage, in contrast to 75% passage among those from the second highest.

Moreover, the relationship between bar passage and LSAT is quite unreliable for students of color (Table 13). All Asian students from the lowest LSAT groups passed the bar. For underrepresented students the rates varied. While 60% of the lowest group (which included only 5 participants) passed the bar, the passage rates for the other groups were quite varied.

Data were also available for those who were admitted by their scores (usually undergraduate GPA and LSAT scores) alone and those who had to be reviewed by committee. Table 14 summarizes these findings.

There are two significant findings from this analysis. First, there were no differences in bar passage rate between these two groups: 83% of those who were admitted by their scores passed the bar, and 81% of those who had to be reviewed by committee passed. Moreover, 47 of the 52 underrepresented students of color were admitted with scores alone, not with committee rankings. Of the 5 who were reviewed, 80% passed the bar, and 75% of those who were admitted by scores passed the bar. Twenty-eight of 32 Asian American students were also admitted by their scores. Of these, 96% passed. Of the 4 who were reviewed, 50% passed. In contrast, 46% of White students were admitted only after review. Of those reviewed, 83% passed the bar, and of those admitted on their scores, 84% passed the bar. These data support the usefulness of a review and also suggest that White students benefit more from these opportunities than is often presumed. Indeed, of the three ethnic groupings, the percentage of Whites who were admitted based on their scores alone is by far the lowest (54%, compared with 90% for Asians and 93% for underrepresented students).


This study, while exploratory in nature, has attempted to look at academic success retrospectively using a variety of indicators of success: graduation rates, grades, and passage of the bar exam. Five data sets, all but one of which focused on the SAT, provided an opportunity to investigate the role of standardized tests in student success. The data obtained for this study, while having limitations, provide an important perspective on the relationship between tests and success. A key limitation is the use of existing data with different variables provided in each.

Overall, one can see that there would be significant loss of talent if tests emerged as an overriding consideration in admissions. While we might have expected some threshold point to be obtained for the data, we did not find this, except in the case of the highly selective science school. In virtually all the data, there was a demarcation between those whose test scores were at the lowest levels and those whose scores were above. The advisability of admitting students from the lowest testing group would depend on institutional views about risk, intervention, and success. Even so, for students in this group, a high percentage were still represented in the success categories. One can only wonder what improved institutional practices would do to improve success. For all the data, successful students were as likely to come from the middle groups as from the top tier. These results demonstrate the care that must be used in relying on tests to determine merit in facing the many policy and legal challenges that emerge from admissions decisions. We should note that the important findings of this study would have been masked in traditional validity studies.

Moreover, when looking at the distributions by race and ethnicity, one would want to use extreme caution when attributing significance to tests, given the lack of consistency shown in these data. Indeed, contrary to the views of many, it is clear that if there are differentials in performance, they may not be a function of test scores. While further exploration with larger samples of underrepresented students is needed, the current data underscore the need for caution in linking test scores and success. Indeed, the lack of consistency among underrepresented students is one of the most consistent findings from this analysis.

In some policy and legal studies, we are led to believe that when we look at student success we are only seeing students with higher test scores, and conversely when we look at failure, we are looking at students with lower scores. This is particularly true when students come from an underrepresented group. These data suggest quite a different picture. The analysis, though not definitive and certainly limited, underscores the drawbacks of relying too heavily on tests in admissions decisions. It debunks the myth that performance in school is directly related to test scores. Finally, the study supports the power of using retrospective approaches to evaluate both policy and institutional efforts. The view of the relationship between tests and success described in the initial conceptual hypothesis is not sustainable.

Each of the recent challenges to affirmative action in admissions, whether through state propositions or legal action, rests on the argument that students are being admitted with "less merit" and that this approach works against students who have earned "merit." The results of this study suggest that if merit is defined by those who succeed, standardized tests as preadmissions indicators of merit are quite inadequate overall and especially inadequate for underrepresented students of color. In the absence of reliable indicators, holistic admissions and human judgment are likely to be the best approach. This is especially true in the context of highly competitive admissions in elite institutions. In such contexts, there are many more people who are superbly qualified for admissions than there are places available. Reducing the complex calculus of admissions to only one or two numerical indicators is likely to be misleading. Moreover, in the case of using standardized tests as the major criterion, this calculus will work against historically underrepresented students, the groups for whom affirmative action was initially created. Finally, it is clear that educators, policymakers, and lawyers must examine critically any argument that rests primarily on tests as indicators of merit.


Beatty, A., Greenwood, M. R. C., & Linn, R. L. (Eds.). (1999). Myths and tradeoffs. The role of tests in undergraduate admissions. Washington, DC: National Research Council.

Bowen, W. G., & Bok, D. (1998). The shape of the river: Long term consequences of considering race in college and university admissions. Princeton, NJ: Princeton University Press.

Carver, M. R. Jr., & King, T. E. (1994). An empirical investigation of the MBA admission criteria for nontraditional programs. Journal of Education for Business, 70(2), 95-98.

Common sense about SAT score differentials and test validity. (1997, June). Research Notes, RN-01.

Crouse, J., & Trusheim, D. (1998). The case against the SAT. Chicago: University of Chicago Press.

Dalton, S. (1976). A decline in the predictive validity of the SAT and high school achievement. Educational and Psychological Measurement, 36, 445-448.

Duran, R. P. (1986). Prediction of Hispanics' college achievement. In M. Olivas (Ed.), Latino college students (pp. 221-245). New York: Teachers College Press.

Fleming, J., & Manning, C. (1998). Correlates of the SAT in minority engineering students: An exploratory study. Journal of Higher Education, 69(1), 91-108.

Goldberg, E. L., & Alliger, G. M. (1992). Assessing the validity of the GRE for students in psychology: A validity generalization approach. Educational and Psychological Measurement, 52, 1019-1027.

Gough, H. G., & Hall, W. B. (1975). The prediction of academic and clinical performance in medical school. Research in Higher Education, 3, 301-314.

Guinier, L., Fine, M., & Balin, J. (1997). Becoming gentlemen: Women, law and institutional change. Boston: Beacon Press.

Hathaway, J. G. (1984). The mythical meritocracy of law school admissions. Journal of Legal Education, 34, 86-96.

House, J. D. (1989). Age bias in prediction of graduate grade point average from Graduate Record Examination scores. Educational and Psychological Measurement, 49, 663-666.

Hurtado, S., Milem, J. F., Clayton-Pederson, A. R., & Allen, W. R. (2000). Enacting diverse learning environments: Improving the climate for racial/ethnic diversity in higher education. San Francisco: Jossey-Bass.

Klitgaard, R. (1985). Academic performance and later life contributions. In R. Klitgaard (Ed.), Choosing elites (pp. 116-131). New York: Basic Books.

Lowman, R. P., & Spuck, D. W. (1975). Predictors of college success for the disadvantaged Mexican-American. Journal of College Student Personnel, 16(1), 40-48.

Mangan, K. S. (2002). Law-school council considers plan to de-emphasize test scores and rankings that use them. The Chronicle of Higher Education. Retrieved from

Margulies, J. (2002). Lead plaintiff against Michigan still hopes for a spot at its law school. Chronicle of Higher Education, 49(16), A23.

Moffatt, G. K. (1993, February). The validity of the SAT as a predictor of grade point average for nontraditional college students. Paper presented at the Annual Meeting of the Eastern Educational Research Association, Clearwater Beach, FL. (ERIC Document Reproduction Service No. ED 356252)

Morrison, T., & Morrison, M. (1995). A meta-analytic assessment of the predictive validity of the quantitative and verbal components of the Graduate Record Examination with graduate grade point average representing the criterion of graduate success. Educational and Psychological Measurement, 55(2), 309-317.

Olivas, M. (1999). Higher education admissions and the search for one important thing. University of Arkansas at Little Rock Law Review, 12, 993-1024.

Rosser, P. (1992). Sex bias in college admissions tests: Why women lose out (14th ed.). Cambridge, MA: National Center for Fair and Open Testing.

Schrader, W. B. (1978). Admissions test scores as predictors of career achievement in psychology. Princeton, NJ: Educational Testing Service (ERIC Document Reproduction Service No. ED 241563).

Schwan, E. S. (1988). MBA admissions criteria: An empirical investigation and validation study. Journal of Education for Business, 63, 158-162.

Sedlacek, W. (1998). Admissions in higher education: Measuring cognitive and noncognitive variables. In D. Wilds & R. Wilson (Eds.), Minorities in higher education (pp. 47-71). Washington, DC: American Council on Education.

Sobol, M. G. (1984). GPA, GMAT, and scale: A method for quantification of admissions criteria. Research in Higher Education, 20(1), 77-88.

Smith, L. D., Best, L. A., Stubbs, D. A., Archibald, A. B., & Roberson-Nay, R. (2002). Constructing knowledge: The role of graphs and tables in hard and soft psychology. American Psychologist, 57(10), 749-761.

Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69, 797-811.

Sternberg, R. J., & Grigorenko, E. L. (2002). Dynamic testing: The nature and measurement of learning potential. New York: Cambridge University Press.

Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? American Psychologist, 52(6), 630-641.

Swinton, S. S. (1987). The predictive validity of the restructured GRE with particular attention to older students. Princeton, NJ: Educational Testing Service.

Thacker, A. J., & Williams, R. E. (1974). The relationship of the Graduate Record Examination to grade point average and success in graduate school. Educational and Psychological Measurement, 34, 939-944.

Wainer, H., & Steinberg, L. (1992). Sex differences in performance on the mathematics section of the Scholastic Aptitude Test: A bidirectional validity study. Harvard Educational Review, 62, 323-336.

Wallach, M. A. (1976). Tests tell us little about talent. American Scientist, 64, 57-63.

Wightman, L. (2000). Standardized testing and equal access: A tutorial. In M. Chang, D. Witt, J. Jones, & K. Hakuta (Eds.), Compelling interest: Examining the evidence on racial dynamics in higher education (pp. 84-125). Stanford, CA: Stanford University Press.

Young, J. W. (1994). Differential prediction of college grades by gender and by ethnicity: A replication study. Educational and Psychological Measurement, 54(4), 1022-1029.

DARYL G. SMITH is Professor of Education and Psychology at The Claremont Graduate University. Her research interests focus primarily on issues of diversity in higher education and institutional change. She has written on the educational benefits of diversity, hiring issues for diverse faculty, and strategic evaluation.

GWEN GARRISON is the Director of Student and Applicant Studies at the Association of American Medical Colleges. Her research interests include the benefits of diversity, institutional impact, and educational policy.

Cite This Article as: Teachers College Record Volume 107 Number 4, 2005, p. 629-653 ID Number: 11815, Date Accessed: 5/2/2007 6:26:52 PM
