Selected Annotated Bibliography on The SAT: Bias and Misuse


Includes entries on:


Admissions Alternatives

Coaching

Gender Bias

Test Misuse

Predictive Validity

Racial/Ethnic Bias

Speededness

Test Construction

 


Compiled by the staff of the

National Center for Fair & Open Testing (FairTest)


Updated January 2002


 


INDEX

GENERAL ANALYSIS


Bowen

Lemann

Nairn

National Research Council

Owen

Weiss

 


ADMISSIONS ALTERNATIVES


Association of Governing Boards

Bradley

Hiss

Lavin

Rooney

Sacks

Schaffner

 


BIAS REDUCTION TECHNIQUES


Angoff

Shapiro

 


COACHING


Becker 1990a

Coffin

Cole

Federal Trade Commission

Johnson 1984, 1989

Powers

Reynolds

Slack

Stockwell

Zuman

 


COURSEWORK/CLASSES TAKEN


CEEB 1997, 2001

Gross

Rosser 1989

Sharif

Wainer 1992

Wilder

 


GENDER BIAS


Becker 1990b

Bolger

Boswell

Bridgeman 1989, 1991, 1992, 1994

Burton

Carlton

Clark

CEEB 1997, 2001

Connor 1992

Donlon

Dwyer 1976

Gross

Hembree

Kanarek 1988

Kessel

Leonard

Linn 1987, 1989

Mazzeo

Miller

Rosser 1989, 1992

Schmitt 1991a

Sharif

Silverman

Stricker 1991

Ting

Wainer 1992

Wilder

 


GUESSING


Albanese

Katz 1994

Linn 1987

 


LANGUAGE BIAS


Hoover

Taylor

 


“NEW” SAT


Bridgeman 1992

Katz 1994

Schmitt 1991a

 


PREDICTIVE VALIDITY/USEFULNESS


Baron

Bowen

Bradley

Bridgeman 1989, 1991, 1992, 1994

CEEB 1999

Cole

Cone

Crouse 1988, 1991

Geiser

Hiss

Horner

Kanarek 1989

Katz 1994

Kessel

Lang

Morgan 1990

Pearson

Rooney

Rosser 1989

Schaffner

Sedlacek

Slack

Stricker 1991

Ting

Tracey

Vars

Wainer 1992, 1993

Wilson

 


RACIAL/ETHNIC BIAS


Bowen

Carlton

CEEB 2001

Cole

Connor 1992

Crouse 1988

Donlon

Dorans

Freedle

Gross

Hackett

Hembree

Hiss

Hoover

Lavin

McIntosh Commission

Pearson

Rogers

Rooney

Schmitt 1987, 1990, 1991a, 1991b

Silverman

Steele

Taylor

Ting

Tracey

Vars

Walter

 


SAT-MATH


Becker 1990b

Bridgeman 1989

Clark

CEEB 1997, 2001

Donlon

Dwyer 1976

Gross

Johnson 1989

Kessel

Linn 1989

Rosser 1989

Schmitt 1991a

Wainer 1992

Wilder

 


SAT-VERBAL


Bridgeman 1992

Burton

Clark

CEEB 2001

Donlon

Dwyer 1976

Freedle

Katz 1991, 1994

Rogers

Rosser 1989

Schmitt 1988

Wilder

 


SPEEDEDNESS


Dorans

Rosser 1989

Schmitt 1987, 1991b

 


TEST ANXIETY


Everson

Hembree

 


TEST CONSTRUCTION/CONTENT


Bolger

Bridgeman 1992, 1994

Carlton

Donlon

Dorans

Dwyer 1976

Hackett

Katz 1991, 1994

Kessel

Lemann

Linn 1989

Mazzeo

Miller

National Research Council

Rosser 1992

Schmitt 1988, 1990, 1991a, 1991b

Taylor

Wainer 1989

 


TEST MISUSE


CEEB 1988

McIntosh Commission

National Assn. for College Admission Counseling

National Research Council

Rooney

Rosser 1989, 1992

Sharif

Walter

Wilder

 


Albanese, M. A. The Projected Impact of the Correction for Guessing on Individual Scores. Journal of Educational Measurement, V.25, N.2, Summer 1988.


Demonstrates that on tests such as the SAT, which have a penalty for guessing, substantial score losses can occur as a result of omissions. “One might speculate that an examinee might not gain admission to a college of his or her choice because of too cautious behavior or that a high-ability examinee might fail to be selected as a National Merit scholar on the basis of such behavior. Both events could have serious effects on students’ lives.”
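
The scoring arithmetic behind this entry can be sketched directly. Below is a minimal Python illustration of the SAT-era formula score (plus one point per correct answer, minus a quarter point per wrong answer on five-option items, zero for omissions), showing why the cautious behavior Albanese describes forfeits expected points once a test-taker can eliminate even one option:

```python
def formula_score(right, wrong, omitted=0):
    """SAT-style formula score: +1 per right answer, -1/4 per wrong
    answer (five-option items), 0 for each omission."""
    return right - wrong / 4

# Expected value of one blind guess on a five-option item:
# 1/5 chance of +1, 4/5 chance of -1/4 -- exactly 0, the same as omitting.
ev_blind = (1 / 5) * 1 + (4 / 5) * (-1 / 4)

# If the student can eliminate even one option, guessing among the
# remaining four has positive expected value, so omission costs points.
ev_after_elimination = (1 / 4) * 1 + (3 / 4) * (-1 / 4)

print(ev_blind)              # 0.0
print(ev_after_elimination)  # 0.0625
```

Under this scoring rule, blind guessing is break-even, and any partial knowledge makes guessing strictly better than omitting; this is the mechanism behind the score losses from over-cautious omission that the article documents.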

 


Angoff, W. H. Philosophical Issues of Current Interest to Measurement Theorists. Research Report 87-33, Princeton, NJ: Educational Testing Service, August 1987.


ETS researcher analyzes Differential Item Functioning (DIF) techniques for detecting item bias:

“...when we study item bias, the measure we use to match the two groups for ability is... unsatisfactory....

The matching variable is typically a test score, most often the score on the test that comprises the items under study. Obviously, then, the approach may be regarded as a circular one.”

 


Association of Governing Boards. Multiple Choices for Standardized Tests. Priorities, Number 10, Winter 1998.


Intended to help higher education trustees and chief executives sort through the complicated problem of how to enhance equity and excellence, this special issue of the journal Priorities tackles the uses and limitations of standardized tests in college admissions. Featuring a lengthy essay by alternative assessment expert William Sedlacek, a history of the SAT and ACT, and brief pieces on coaching, affirmative action, and the SAT’s predictive ability, the 16-page report examines the value of the SAT and ACT in undergraduate admissions. It offers key questions for college leaders to pose about the role of test scores, and explains techniques to measure noncognitive characteristics of applicants.

 


Baron, J., & Norman, M. F. SATs, Achievement Tests, and High-School Class Rank as Predictors of College Performance. Educational and Psychological Measurement, V.52, 1992.


Examines SAT, Achievement Tests (ACH) and high school class rank (CLR), individually and in combination, as predictors of college GPA at a selective university. High school rank was the best predictor when used alone. Concludes “SAT is redundant when good measures of past performance are available, such as ACH and especially CLR (which is useful in predicting other outcomes aside from grades).”

 


Becker, B. J. Coaching for the Scholastic Aptitude Test: Further Synthesis and Appraisal. Review of Educational Research, V.60, N.3, Fall 1990a.


Examines primary research and literature reviews on the effects of coaching on SAT performance. Studies generally show that coaching does result in raised scores. ETS research is considered dubious: “the ETS stance is based on different evidence than is available from the full collection of coaching studies....Published studies sponsored by ETS show much smaller coaching effects even when other design features are held constant.”

 


Becker, B. J. Item Characteristics and Gender Differences on the SAT-M for Mathematically Able Youths. American Educational Research Journal, V.27, N.1, Spring 1990b.


SAT-Math results were examined for gender differences by item content, item format, and section and serial placement. Overall, the test favored boys. Multiple-choice and algebra items made up the bulk of the exam, and were easier for boys. The items with content (miscellaneous) and format (data sufficiency) favoring girls were those least represented on the exam.

 

Bolger, N., & Kellaghan, T. Method of Measurement and Gender Differences in Scholastic Achievement. Journal of Educational Measurement, V.27, N.2, Summer 1990.

Compares the performance of 15-year-old boys and girls on standardized multiple-choice versus free-response examinations (based on the same syllabus). Finds an advantage for males on multiple-choice tests. Offers explanations based on other research, such as that males tend to guess more on multiple-choice exams, an advantageous strategy relative to omission, which is practiced more by girls. Asserts that the findings “raise issues for educational policymakers regarding the choice of method of measurement in examinations.”


 

Boswell, K., Giakoumis, T., Maynard, M., Murray, P., & Kimble, T. Consequences of SAT Scores on Male and Female Self-Perceptions and College Application Decisions. Paper presented at the annual meeting of the American Educational Research Association, Boston, April 1990.


Examines the impact of student self-perceptions about SAT scores for both genders. Students disagreed strongly with the statement that SAT scores are a good predictor of ability to succeed in college. Women were more likely than men to report lower confidence because of their test scores, and were more likely to believe the test was biased. Almost one-third of the students were discouraged from applying to a particular college because of their test scores. “If the test is of minimal value to institutions, and the main effect is to discourage some otherwise qualified students from applying, then...’serious doubts may be raised about the need for widespread testing.’”


 

Bowen, William and Derek Bok. The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions. Princeton, NJ: Princeton University Press, 1998.


Analyzing the progress of three generations of alumni from 28 selective colleges and universities, Bowen and Bok argue that affirmative action has played a vital role in higher education. The data they present confirm the SAT’s weak predictive ability for African American students. The “fit” hypothesis (that SAT scores help match applicants’ skills with college selectivity) proved unfounded: Black students with lower test scores graduated at higher rates the more selective the school they attended. SAT results correlated even more poorly with college class rank for Black students than for White students. Test scores also do a poorer job of predicting graduate degree attainment for Black students than they do for Whites. In later life, SAT scores had no bearing on one’s level of civic participation.

 


Bradley, D. R., Hiss, W., Bruce, M., Datta, M., Kinsman, S., Provasnik, S., & Smedley, J. The Optional SAT Policy at Bates: A Final Report. Lewiston, ME: Bates College, Committee on Admissions and Financial Aid, February, 1990.


A report from Bates College, which made SAT score submittal optional in 1985, concludes that “the optional SAT policy has had no negative, and quite possibly a positive, impact on the quality of students admitted.” Student quality has increased since Bates adopted the policy; academic performance of SAT submitters and nonsubmitters is nearly the same; nonsubmitters’ GPAs are higher than their SATs would predict; “none of the standardized tests [currently in use at Bates] predict students’ performance very well.” The authors conclude, “there is much in the data that would call into question the policy of requiring any standardized test scores, given how poorly they predict academic performance at Bates.”


 

Bridgeman, B., Hale, G. A., Lewis, C., Pollack, J., & Wang, M. Placement Validity of a Prototype SAT with an Essay. Research Report 92-28, Princeton, NJ: Educational Testing Service, May 1992.


Examining changes in the SAT based on 1989 proposals, ETS researchers find that the use of an essay would have reduced the underprediction of women’s grades. Using only multiple-choice questions, both the new and old version were found to underpredict women’s grades by .25 grade points. The current revision of the SAT I exam does not contain an essay.

 

Bridgeman, B., & Lewis, C. The Relationship of Essay and Multiple-Choice Scores With Grades in College Courses. Journal of Educational Measurement, V.31, N.1, Spring, 1994.

Essay and multiple-choice sections of Advanced Placement (AP) exams in American History, European History, English Language and Composition, and Biology are compared to college grades to determine predictive values. Essays are found to be more reliable predictors. Multiple-choice sections substantially “underestimate the ability of women to perform in college history courses....Results suggest that more weight might be placed on the history essays to achieve more nearly gender-fair selections for granting college credit or advanced placement.”

 


Bridgeman, B., & Wendler, C. Prediction of Grades in College Mathematics Courses as a Component of the Placement Validity of SAT-Mathematics Scores. College Board Report 89-9, New York: College Entrance Examination Board, 1989.


ETS researchers examine the use of the SAT-M as a placement test. Determine that SAT-M is generally a poor predictor when compared to a college’s own placement test or high school GPA.

Also find that SAT-M substantially underpredicts female grades and overpredicts male grades, much more so than placement tests.

 


Bridgeman, B., & Wendler, C. Gender Differences in Predictors of College Mathematics Performance and in College Mathematics Course Grades. Journal of Educational Psychology, V.83, N.2, 1991.


Within college math courses, women typically had equal or better grades than their male classmates with higher SAT-M scores (high school preparation tended to be comparable). To account for the discrepancies between SAT-M and college performance, authors suggest “there may be features of the content of the test itself or of its administration that enlarge the difference between males and females.” Also, “If a single equation were used to predict the course grades of men and women from scores on the SAT-M...there would be underprediction of the grades of female students....Gender differences are much larger on SAT-M scores than they are on course grades...”

 


Burton, N.W. Trends in the Verbal Scores of Women Taking the SAT in Comparison to Trends in Other Voluntary Testing Programs. Paper presented at the annual meeting of the American Educational Research Association, April 1987.


ETS researcher finds that the lower scores of women relative to men on the verbal sections of the SAT and the PSAT contrast with their superior performances on other verbal/English tests, such as the ACT English section, the Test of Standard Written English (TSWE) and the English Composition Test (ECT). The fact that more females than males take the SAT and PSAT only explains part of the gap.

 


Carlton, S. T., & Harris, A. M. Characteristics Associated with Differential Item Functioning on the Scholastic Aptitude Test: Gender and Majority/Minority Group Comparisons. Research Report 92-64, Princeton, NJ: Educational Testing Service, 1992.


The performance of matched SAT takers is compared across three main categories and many sub-categories: Item Content (e.g., Technical/Non-Technical), Points Tested (e.g., Semantic Relationship Analogy), and Format (e.g., Math word problem). The following trends, among others, are clear:

Whites perform better than all others on Analogies -- which remain on the “new” SAT -- while Blacks outperform Whites on Antonyms -- which were eliminated from the test in 1995. Whites and males do better on questions with science content; Blacks, Hispanics and females benefit from humanities and human relations content and from references to people. Concludes that score differences between women and men and among ethnic groups are caused by factors other than the constructs purportedly being measured, including item format, number of references to people, and use of confusing English words.

 


Clark, M. J., & Grandy, J. Sex Differences in the Academic Performance of Scholastic Aptitude Test Takers. College Board Report 84-8, New York: College Entrance Examination Board, 1984.

Finds that the SAT overpredicts for males and describes underprediction for females as “consistent and pervasive.” The gender gap is shown not to result solely from boys taking more higher-level math courses. Also, the decline in girls’ verbal scores is partly explained by changes ETS made in the test in the late 1960s and early 1970s to make the test more gender-neutral. “From the two samples we studied, there appears to be a consistency between SES [socio-economic status], percent in an academic curriculum, coursework, and grades. It is the SAT scores that appear to be out of line.”


 

Coffin, G. C. Computer as a Tool in SAT Preparation. Paper presented at the Florida Instructional Computing Conference, Orlando, FL, February, 1987.


Computer software was used in a test preparation program with low-income urban students. Scores went up an average of 91 points combined (51 points Verbal and 40 points Math). “By teaching test-wiseness to those who score low on the socio-economic ladder, we attack the equity issue.”


 

Cole, B. P. College Admissions & Coaching. In A. G. Hillard III (Ed.), Testing African American Students: Special Re-Issue of The Negro Educational Review. Morristown, NJ: Aaron Press, 1991.


Reviews research on use of the SAT in admissions, its predictive validity, and its susceptibility to coaching. “If the SAT truly measured aptitude, then one could not coach so effectively.” Underscores disadvantages to Black and other minority students, asserting, “in spite of the fact that serious questions have been raised against the admissions exams regarding their validity, the tests continue to be used to the detriment of minorities and the poor.”

 


College Entrance Examination Board. Guidelines on the Uses of College Board Test Scores and Related Data. New York: Author, March 1988.


Lists test uses “to be avoided,” including “using test scores as the sole basis for important decisions affecting the lives of individuals, when other information of equal or greater relevance and the resources for using such information are available.” Furthermore, “using minimum test scores without proper validation on the basis of students’ performance within the institution, and, if appropriate, by specific programs or by student subgroups,” is to be avoided, as is “making decisions about otherwise qualified students based only on small differences in test scores.” No enforcement mechanism is provided.

 


College Entrance Examination Board. Admissions Staff Handbook for the SAT Program 1999-2000. New York: Author, 1999.


Documents that high school grades predict first-year college grades better than the SAT. Also shows that the SAT has a margin of error of 60 points, and that two test-takers must differ by at least 120 points in combined SAT scores before it can be said that their abilities differ. Nevertheless, compares the SAT to a yardstick: “Because the subject matter of high school courses and high school grading standards vary widely, the tests have been established as a common standard against which student performance can be compared.”


 

College Entrance Examination Board. 2001 College-Bound Seniors: A Profile of SAT Program Test Takers. New York: Author, 2001.


Summarizes test scores and population characteristics of SAT I and SAT II test takers. Females achieve better overall high school grade point averages, take more English classes, and receive significantly higher grades than males, but score 7 points lower on the SAT-Verbal. Females also take more math and science coursework in high school and earn virtually comparable grades, yet males score 35 points higher on the SAT-Math section.

 

College Entrance Examination Board. 1997 National Ethnic/Sex Data. New York: Author, 1997. Note that the CEEB no longer releases this publication to the public.


Reports SAT score averages by ethnicity and gender using data from the Student Descriptive Questionnaire. Significant Black/White, Latino/White, American Indian/White, and male/female score gaps exist on the Math section, among students who have taken the same level of math. Among students who have taken honors courses in English, large gaps exist between Whites and all minority groups and between male and female students.

 


Cone, A. E., & Rosenbaum, J. L. Predicting Academic Success Among Student-Athletes. The Academic Athletic Journal, Spring 1990.


Examines measures that may be useful for predicting graduation among student-athletes. The most significant variables are high school grade point average and number of college preparatory math classes taken. “It is especially important to note that the SAT score variable, which has been one of the most commonly used predictors of academic success and the most controversial aspect of Proposition 48, was washed out by other more significant variables.”

 

Connor, K., & Vargyas, E. J. The Legal Implications of Gender Bias in Standardized Testing. Berkeley Women's Law Journal, V.7, 1992.

Examines a wide range of laws, regulations, and legal precedents which may be applied to situations of gender bias on post-secondary and employment tests. Looks at the SAT as well as employment tests such as the General Aptitude Test Battery (GATB), the Armed Services Vocational Aptitude Battery (ASVAB), and the Differential Aptitude Test (DAT). Concludes that reliance on tests which discriminate on the basis of gender, or on the combination of gender and race, is “untenable as a matter of law.”

 


Crouse, J., & Trusheim, D. The Case Against the SAT. Chicago: The University of Chicago Press, 1988.


Uses a detailed statistical analysis to demonstrate that the SAT is redundant in the college admissions process and shows that it has an adverse impact on Black and low-income applicants. Concludes that the SAT neither helps colleges make better admissions decisions nor helps applicants select colleges.

 


Crouse, J., & Trusheim, D. How Colleges Can Correctly Determine Selection Benefits from the SAT. Harvard Educational Review, V.61, N.2, May 1991.


Examines the validity and usefulness of the SAT for colleges. Using information available to colleges from the College Board’s Validity Study Service, shows that the SAT is redundant when used with high school grades. Without the SAT, the vast majority of admissions decisions would be the same, as would predictions of outcomes such as average college grades and graduation rate. Questions the justification of “1.7 million students...tak[ing] the SAT each year when efficient use of the test changes only a small fraction of admissions decisions, and does not appreciably improve the decisions in the few cases where it does change them.”

 


Donlon, T. F. Content Factors in Sex Differences on Test Questions. Research Memorandum 73-28, Princeton, NJ: Educational Testing Service, 1973.


Demonstrates that “[t]he approximate 40 point difference between the sexes...[on the SAT-Math section] is, at least in part, a function of the test specifications.” Depending upon what types of items are used, the male advantage on the SAT-Math section could increase to about 60 points or could diminish to about 20 points. Items on the SAT-Verbal section drawn from the world of practical affairs or science are easier for males, while items associated with human relations, humanities or aesthetics are easier for females. Regarding race and gender, concludes that: “Long-standing and stereotyped expectations of subgroup performance may be less permanent than is believed.”

 

Dorans, N. J., Schmitt, A. P., & Curley, W. E. Differential Speededness: Some Items Have DIF Because of Where They Are, Not What They Are. Paper presented at the annual meeting of the National Council on Measurement in Education, April 1988.

ETS researchers find the test’s speededness to be a cause of differential performance by Black test-takers on SAT-Verbal analogy items. Concludes “[i]s the test too speeded? If so, test specifications may need to be reexamined.”

 


Dwyer, C. A. Test Content and Sex Differences in Reading. The Reading Teacher, May 1976.


ETS researcher reviews the strong evidence that ETS can manipulate test content to change score averages for males and females and shows how they did so on the SAT-Verbal section in the early 1970s, under the assumption that male and female scores should “balance.” Dwyer concludes, “[SAT-Math score balancing] could be done, but it has not been, and I believe that probably an unconscious form of sexism underlies this pattern: when girls show the superior performance, ‘balancing’ is required; when boys show the superior performance, no adjustments are necessary.”

 

Everson, H. T., Shapiro, L., & Millsap, R. E. The Effects of Test Anxiety, Item Difficulty and Order on Performance on a Mathematics Achievement Test. Paper presented at the annual meeting of the American Educational Research Association, March 1989.

Reviews research that shows anxious test-takers “do not perform as well as their less anxious counterparts.” Finds that anxious test-takers performed most poorly on the more difficult items.

 


Federal Trade Commission. Staff Report on the Federal Trade Commission Investigation of Coaching for Standardized Admission Tests. Boston Regional Office, April 1981.


Concludes that ETS and College Board materials for students did not accurately describe the real possibility of meaningful score gains from coaching. Finds an approximately 50 point overall increase in SAT scores for students who were coached, and evidence of a socio-economic differential between those who were coached and those who were not. Raises questions about the implications of coaching effectiveness on the SAT, which supposedly measures abilities developed over years of learning, and about the fairness of such a test when coaching is only available to some students.

 


Freedle, R., & Kostin, I. Semantic and Structural Factors Affecting the Performance of Matched Black and White Examinees on Analogy Items from the Scholastic Aptitude Test. Research Report 91-28, Princeton, NJ: Educational Testing Service, 1991.


When matched by overall verbal SAT scores, Whites are found to do better on easy items, which often occur early in the test, while Blacks do better on harder, later items. A “practice effect” hypothesis is offered to explain this finding, backed by citations of other research showing that Black students have less experience with the SAT than do Whites prior to the test.

 


Geiser, S. & Studley, R. UC and the SAT: Predictive Validity and Differential Impact of the SAT I and SAT II at the University of California. Oakland, CA: University of California, Office of the President, 2001.


The four-year study of more than 80,000 students enrolled in the University of California system looked at the relationship between high school GPA (HSGPA), SAT I scores, SAT II scores, and first-year undergraduate grades (FGPA). The weakest predictor of college performance proved to be SAT I scores, which explained just 12.8% of the difference (or variation) in freshman grades. SAT II scores and HSGPA each separately explained approximately 15% of the variance: each of these factors did a better job of forecasting college performance than the SAT I did. Combining HSGPA and SAT II scores yielded a measure that explained 21% of the difference in freshman grades.

Adding the SAT I to this equation improved the predictive ability by less than 1%, demonstrating that the SAT I adds little useful information to the assessment of a student’s application. In addition, researchers discovered that the predictive power of the SAT I was further compromised when socioeconomic status was taken into account, with SAT I scores more closely associated with family income and parents’ education than SAT II scores or high school GPA.
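
The variance-explained figures in this entry follow the standard formula for the R² of a two-predictor linear model, computed from the three pairwise correlations. The sketch below is illustrative only: the predictor intercorrelation of 3/7 is a hypothetical value back-solved so that two predictors each explaining 15% of the variance combine to explain 21%, matching the reported pattern; it is not a figure from the study.

```python
import math

def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 of a two-predictor linear regression, from the two
    criterion correlations and the predictor intercorrelation:
    R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Variance explained is the squared correlation, so r = sqrt(R^2).
r_hsgpa = math.sqrt(0.15)  # HSGPA alone: ~15% of freshman-grade variance
r_sat2 = math.sqrt(0.15)   # SAT II alone: ~15%
r_12 = 3 / 7               # hypothetical HSGPA/SAT II intercorrelation

combined = r2_two_predictors(r_hsgpa, r_sat2, r_12)
print(f"{combined:.2f}")   # 0.21
```

The same formula shows why the SAT I adds so little: a third predictor whose information largely overlaps the first two can raise the combined R² only marginally, which is the study’s incremental-validity point.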

 


Gross, S. Participation and Performance of Women and Minorities in Mathematics. Department of Educational Accountability, Montgomery County (Maryland) Public Schools, July 1988.


In a sample of SAT examinees matched by math classes taken, males outscored females by 33 to 52 points on the SAT-Math section, despite the fact that girls got higher grades in all math classes. Similarly, among Black and White students who took the same level of math classes, Whites outscored Blacks by 49 to 74 points.

 


Hackett, R. K., Holland, P., Pearlman, M., & Thayer, D. Test Construction Manipulating Score Difference Between Black and White Examinees: Properties of the Resulting Tests. Princeton, NJ: Educational Testing Service, February 1987.


Using a “Golden Rule”-like procedure (designed to make the test fairer, not easier, by eliminating items which exaggerate differences between target groups) finds that tests can be constructed which substantially reduce the score gap between Black and White test-takers and yet maintain test content specifications.

 


Hembree, R. Correlates, Causes, Effects, and Treatment of Test Anxiety. Review of Educational Research, V.58, N.1, Spring 1988.


This meta-analysis of 562 studies shows that test anxiety causes poor performance and that students with high test anxiety hold themselves in lower esteem than do those who are less test anxious. Asserts that the aptitudes of individual test-anxious students are consistently misinterpreted and undervalued. Females have higher test anxiety than males, Blacks in elementary school have higher test anxiety than Whites, and Hispanics have higher test anxiety than Whites at all ages.

 


Hiss, W. D. Optional SAT’s: Six Years Later. Bates: The Alumni Magazine, September, 1990.


Four issues that prompted Bates College to adopt an SAT Optional admission policy are described and policy effects are discussed: (1) Research showed “the value of testing was reasonably modest either alone or in tandem with other credentials.” (2) Bates officials suspected that SAT’s could be “unhelpful, misleading, and unpredictive for...minority students, rural and Maine students, first-generation college students, and foreign-born or bilingual students.” (3) “The SAT’s were producing in hundreds of thousands of teenagers a kind of mass hysteria,” including “frantic coaching;” and “If [coaching] does work, it is simply one more advantage in the process for the rich and means the tests are hardly standardized.” (4) The school wanted to publicly say, “Bates cares about intellectual integrity, hard work, and real achievement.”

 


Hoover, M. R., Politzer, R. L., & Taylor, O. Bias in Reading Tests for Black Language Speakers: A Sociolinguistic Perspective. In A. G. Hillard III (Ed.), Testing African American Students: Special Re-Issue of The Negro Educational Review. Morristown, NJ: Aaron Press, 1991.


Addresses biases in language and reading tests for speakers of Black English and members of other socioeconomic, cultural and ethnic groups. Biases include use of an elaborate, stylized “superstandard” English, rather than simple, basic language in questions and instructions, and “lexical” bias which disadvantages students unfamiliar with certain words because of class, geographical or interpretation differences from the norm expected by the test. Both types of bias cause a student’s abilities to be measured inaccurately.

 


Horner, B., & Sammons, J. Rolling Loaded Dice: Use of the Scholastic Aptitude Test (SAT) for Higher Education Admissions in New York State. New York: New York Public Interest Research Group, Inc., 1989.


Examining validity studies filed under New York’s Truth-in-Testing law, finds that the SAT is a nearly useless tool in the state’s college admissions process. For three out of four New York colleges, SAT scores failed to yield a prediction of college performance that was more than 12% better than chance. At half the schools, the SAT’s predictive power was no better than 4% above chance. High school performance (grades and/or class rank) is three times more helpful in predicting college performance than the SAT.


 

Johnson, S. T. Preparing Black Students For The SAT. Does It Make a Difference? Evaluation Report of the NAACP Test Preparation Project, June, 1984.


SAT scores of low-income minority students who received coaching improved from pre-test, on post-test and actual SAT. Across three groups, pre-to-post-test improvement averaged 33 Verbal and 24 Math points; pre-test-to-SAT improvement averaged 25 Verbal and 24 Math. One group improved 87 points (combined Verbal and Math). “The results clearly support the effectiveness of a coaching program for improving the SAT-Prep performance of those Black youth who have earned relatively low or mid-range scores on a pre-test.”

 


Johnson, S. T., & Wallace, M. B. Characteristics of SAT Quantitative Items Showing Improvement After Coaching Among Black Students From Low-Income Families: An Exploratory Study. Journal of Educational Measurement, V.26, N.2, Summer, 1989.


Found that certain math items seem “moderately responsive to test coaching efforts among a sample of minority youth with less than strong mathematics backgrounds....The findings suggest that performance on a broad range of item types of quantitative SAT items can be substantially improved, even with a rather modest coaching intervention.” Says coaching programs for low-income minority and other at-risk populations should be extended, even though “test developers maintain the view that aptitude tests measure capacities that are...not likely to be changed meaningfully by short-term coaching.”

 


Kanarek, E. A. Gender Differences in Freshman Performance and Their Relationship to Use of the SAT in Admissions. Paper presented at the Northeast Association for Institutional Research Forum, Providence, RI, October 1988.


Studies Rutgers’ 1985 freshman class of nearly 2000 students. Finds that despite lower SAT-Math scores, women have higher GPAs in math and science classes. Women receive substantially higher grades in humanities classes, despite lower SAT-Verbal scores. Females earn higher grades than males overall, and “the current equations for predicting freshman performance underpredict for women.” Rutgers’ most prestigious scholarship program has relied heavily on the SAT, resulting in many more male than female winners, despite the fact that the pool of eligible recipients is predominantly female.

 


Katz, S., Blackburn, A. B., & Lautenschlager, G. J. Answering Reading Comprehension Items Without Passages on the SAT When Items are Quasi-Randomized. Educational and Psychological Measurement, V.51, N.3, 1991.


To explore whether “cognates” (knowledge gleaned from overlapping information across questions) help test-takers answer reading comprehension (RC) items, students were given randomized RC questions without passages. Performance was not significantly lower than that of students given passageless items in correct order (i.e., those who could use cognates), but it was above chance for both groups. Suggests that this may be related, if not to cognate use, to “interpersonal or cultural knowledge shared by item writers and the testing population,” or to test-taking skills. Concludes that “successful performance on the RC task without passages remains a serious problem for the SAT.”

 


Katz, S., & Lautenschlager, G. J. Answering Reading Comprehension Items Without Passages on the SAT-I (New SAT), the ACT, and the GRE. Educational Assessment, V.2, N.4, 1994.


College students were given reading comprehension (RC) tasks from the SAT and SAT-I; controls were given passages plus questions (P), experimentals were given questions only, no passage (NP).

With five answers to choose from, the chance of correctly answering an RC item without the passage should be 20% if the passage is necessary. Thirty-six percent of the NP SAT and 43% of the NP SAT-I items were answered correctly, far exceeding chance. Concludes that SAT RC items are invalid, and questions what is distinguishing students on SAT RC sections.
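The chance-level argument here is simple binomial arithmetic and can be checked exactly. A minimal sketch (the 100-item count is an assumption for illustration, not a figure from the study):

```python
from math import comb

def p_at_least(k, n, p):
    """Exact binomial tail: P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With five answer choices, a pure guesser succeeds 20% of the time.
# The probability of answering at least 36 of 100 items correctly by
# guessing alone is far below any conventional significance level.
tail = p_at_least(36, 100, 0.20)
print(f"P(>= 36 correct out of 100 by chance) = {tail:.2e}")
```

The same calculation for the 43% SAT-I figure yields an even smaller tail probability, which is why the observed no-passage scores are described as far exceeding chance.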

 


Kessel, C. & Linn, M. Grades or Scores: Predicting Future College Mathematics Performance. Educational Measurement: Issues and Practice, Vol. 15, N. 4, Winter 1996.


Summarizing more than a dozen studies of large student groups and specific institutions such as MIT, Rutgers and Princeton, Kessel and Linn concluded that young women typically earn grades equal to or higher than those of their male counterparts in math and other college courses, despite SAT-Math scores 30-50 points lower. Among the causes documented in their literature review: the SAT's emphasis on speed; the multiple-choice format of the questions; the simplistic content; differences between males and females in self-confidence related to test taking; and stereotype vulnerability (diminished performance when gender differences were expected).

 


Lang, E. L., & Rossi, R. J. Understanding Academic Performance: 1987-1988 National Study of Intercollegiate Athletes. Paper presented at the annual meeting of the American Educational Research Association, Chicago, April 1991.


Examines eight predictor variables for collegiate academic performance, including “academic preparedness” based on SAT score and high school GPA. Though high SAT scores related to good performance, there was no correlation between SATs and poor performance. “In contrast, high school GPA is related (in the expected way) to both low and high academic performance at college. Because SAT scores currently play a powerful role in college admissions and athletic eligibility, the effectiveness of these scores for their intended purpose may need to be examined more closely.”

 


Lavin, D. & Hyllegard, D. Changing the Odds: Open Admissions and the Life Chances of the Disadvantaged. New Haven, CT: Yale University Press, 1997.


From 1970 through 1976, the seventeen-campus, 200,000-student CUNY system guaranteed admission to all high school graduates with an 80 average or above, without consideration of test scores. During that period the size of the CUNY freshman class grew by 75%, and the number of Black and Hispanic enrollees quadrupled at the system’s four-year colleges. Other indicators of the success of the policy: college graduation rates in low-income and minority communities soared; the students who participated in the program earned much more than those who did not; and many of the negative impacts of cumulative disadvantage were overcome. The authors conclude that all these benefits came without sacrificing academic standards.

 


Leonard, D. & Jiang, J. Gender Bias and the College Predictions of the SATs: A Cry of Despair. Research in Higher Education, Vol. 40, N. 3, June 1999.

Analyzing the impact of test score use on female admissions and enrollment at the elite University of California at Berkeley, researchers found reliance on gender-biased test scores in the late 1980s cost between 200 and 300 otherwise qualified females admission to the Berkeley campus each year. The paper projects that SAT under-prediction "arguably leads to the exclusion of 12,000 women from large, competitive, 'flagship' state universities" annually and shows that "significant under-prediction of women's college grades remains after one has taken out the effects of choice of program of study and that it exists across the range of scores in which highly competitive colleges and universities actually make their cut-off decisions." Leonard and Jiang devote the final sections of their paper to a stern criticism of the Educational Testing Service and College Board for downplaying the existence and implications of gender bias.


 


Lemann, Nicholas. The Big Test: The Secret History of the American Meritocracy. New York: Farrar, Straus and Giroux, 1999.


The most comprehensive history of the SAT to date, The Big Test characterizes the development of the SAT as a well-intentioned effort that produced unexpected negative consequences. Designed to open the doors of higher education to those who were not white, male, and upper class, the test instead created a new “elite.” The middle third of the book shows how a multi-year campaign by ETS and the College Board to have California require applicants to take the SAT led to nationwide expansion of its use.

The final third offers a behind-the-scenes look at the passage of Proposition 209 in California, the ballot question that banned “racial preferences,” and the assumptions about merit underlying the affirmative action and SAT debates.

 


Linn, M. C., De Benedictis, T., Delucchi, K., Harris, A., & Stage, E. Gender Differences in National Assessment of Educational Progress Science Items: What Does “Don’t Know” Really Mean? Journal of Research in Science Teaching, V.24, N.3, 1987.


Shows that girls are more reluctant to guess on multiple-choice questions than are boys, but that uncertainty is not consistently related to performance. Boys overestimate their likelihood of success and so take risks unknowingly, for which they are rewarded in the multiple-choice format.

 


Linn, M. C., & Hyde, J. S. Gender, Mathematics, and Science. Educational Researcher, V.18, N.8, November 1989.


The SAT-Math section is shown to exhibit much larger differences between men and women than any similar test: “Given the declines in gender differences on most national assessments, the large, consistent gender differences found for the voluntary SAT-Math sample are anomalous.” Finds that for tests of mathematical ability “it is possible to eliminate or exaggerate gender differences by selecting test questions with contexts favoring males or females.” Females are found to do better on math problems concerning aesthetics, interpersonal relationships and traditionally female tasks, such as sewing, while males do better at questions involving measurement, sports and science.

 


Mazzeo, J., Schmitt, A. P., & Bleistein, C. A. Sex-Related Performance Differences on Constructed-Response and Multiple-Choice Sections of Advanced Placement Examinations. College Board Report 92-7, ETS Research Report 93-5, New York: College Entrance Examination Board, 1993.


Male-female performance differences on three Advanced Placement (AP) tests were analyzed. Males consistently did better on multiple-choice sections while females did better on constructed response sections (essays and word problems), even when gender-bias potential was controlled. The findings have “important implications for high-stakes standardized testing....Currently, a large amount of standardized testing occurs in a multiple-choice format...equity concerns would dictate a mix of the two types of assessment instruments.”


 

McIntosh Commission. Report of the McIntosh Commission on Fair Play in Student-Athlete Admissions. Cambridge, MA: National Center for Fair & Open Testing, 1994.


The study determined that the eligibility rule for the National Collegiate Athletic Association (NCAA), Proposition 48, with its cut-off scores of SAT 700 (pre-recentering, now 820) or ACT 17, eliminated 45% of African American students who would have graduated if they had enrolled. That compared with 6% of otherwise qualified White students who were ruled ineligible. The report argued against implementation of Proposition 16, whose higher test score cut-offs would exacerbate the problem of African American eligibility. The Commission cautioned against using test scores to determine eligibility and scholarship receipt since no test has been validated for this purpose. It also urged the NCAA to allow individual institutions to determine their own academic standards.

 


Miller, D. L., Mitchell, C. E., & Van Ausdall, M. Evaluating Achievement in Mathematics: Exploring the Gender Biases of Timed Testing. Education, V.114, N.3, Spring, 1994.


Two SAT-type math exams were given to male and female high school students, one timed and one untimed. Females raised their scores significantly in the untimed condition, while males did not.

Authors relate results to previous research on the different problem solving strategies of males and females, such as boys’ tendency to guess and use unfamiliar strategies, and girls’ inclination to be more cautious and deliberate than boys. The time factor is said to have “a negative, prejudicial impact on females’ ability to communicate their knowledge of mathematics...”

 


Morgan, R. Predictive Validity within Categorizations of College Students: 1978, 1981, 1985. Research Report 90-14, Princeton, NJ: Educational Testing Service, 1990.


Acknowledges a consistent decline of the SAT’s predictive validity and analyzes the trend across various subgroups. Predictive validity has decreased for Blacks, Whites, Asian-Americans, males and females. The decrease is more pronounced -- and the test least predictive -- for the lower two-thirds of freshman classes than for the top third.

 


Nairn, A., & Nader, R. The Reign of ETS: The Corporation that Makes Up Minds. Washington, DC: Center for the Study of Responsive Law, 1980.


The first full-scale report to expose the Educational Testing Service and its practices. Includes organizational history, test development information, and examples of testing abuses, documented with over 1000 footnotes. Also describes and reprints New York state's landmark Truth-in-Testing law which requires disclosure of test questions and answers along with test validity studies.

 


National Association for College Admission Counseling. Statement of Principles of Good Practice. Alexandria, VA: Author, 1999.


These guidelines for members of the National Association for College Admission Counseling include a series of statements regarding standardized college admissions tests. Stating that, “test results can never be a precise measurement of human potential,” the Principles urge colleges and universities to use test scores in a valid and appropriate manner. The Principles warn against “using minimum scores as the sole criterion for admissions, thereby denying certain students because of small differences in scores,” and call for test scores to be used only in conjunction with other data such as high school grades and recommendations.

 


National Research Council. (1999). Myths and Tradeoffs: The Role of Tests in Undergraduate Admissions. Washington, D.C.: National Academy Press.


The history and validity of the SAT and ACT are detailed, along with a description of current trends in undergraduate admissions and an analysis of the benefits and detriments of considering test scores. The National Academy of Sciences panel concludes that an institution’s central goals should be the driving force behind admissions policies and practices. Institutions should look with a critical eye at how test scores are used, and ensure that they are not used to make fine distinctions between applicants. Finally, test producers have an obligation to communicate to institutions the appropriate uses of test scores and the limitations tests have in presenting students’ talents and abilities.

 


Orfield, Gary and Edward Miller, Eds. (1998). Chilling Admissions: The Affirmative Action Crisis and the Search for Alternatives. Cambridge, MA: Harvard Education Publishing Group.


In the wake of Proposition 209 in California and the Hopwood decision in Texas, both of which banned the use of affirmative action, institutions of higher education must struggle with ways to maintain campus diversity. The authors in this collection warn that minority enrollment will drop precipitously without a radical refiguring of admissions policies. One chapter centers on the experiences of the University of California, Irvine, where admissions officials devised a new system of evaluating students which de-emphasizes standardized tests and instead takes into consideration both academic and personal criteria. Another chapter on race and testing warns of the dangers of relying on test scores given longstanding gaps in achievement between White students and underrepresented minorities.

 


Owen, David with Marilyn Doerr. None of the Above: The Truth Behind the SATs. New York: Rowman & Littlefield Publishers, Inc., 1999.


The new version of this classic book dissecting the shortfalls of the SAT maintains David Owen’s engaging narrative style while updating many statistics and issues. The in-depth (and entertaining) examination of the SAT and the Educational Testing Service (ETS) questions test construction, development, and security procedures. It effectively de-mystifies technical and statistical concepts used by ETS to justify its tests, and includes a chapter on the effectiveness of test preparation and ETS’ attempts to both deny and profit from coaching.

 


Pearson, B. Predictive Validity of the Scholastic Aptitude Test (SAT) for Hispanic Bilingual Students. Hispanic Journal of Behavioral Sciences, V.15, N.3, August 1993.


Compares SAT scores and college grades of Hispanic versus non-Hispanic students at a large university. “The clearest result was a significantly lower mean SAT score (both Verbal and Math) for the Hispanic group, despite equivalent college grades...the lower SAT score for this population is not associated with poor academics.” Suggests that bilingualism, in combination with the time constraint of the SAT, may contribute to lower scores. Concludes that “relatively little information is provided by the SAT score, compared to the high school record and the autobiographical material included in most applications,” that the “new” SAT will not alleviate this problem, and that “educators must readjust their thinking on this issue now.”

 


Reynolds, A. J., Oberman, G. L., & Perlman, C. An Analysis of a PSAT Coaching Program for Urban Gifted Students. Journal of Educational Research, V.81, N.3, January/February, 1988.


Participation in a summer test preparation program resulted in mean PSAT-Math score improvement of 4.7 points (47 SAT points). This finding is in line with previous coaching literature, as “research has consistently shown that statistically significant gains resulting from coaching can be expected from typical test preparation programs.” Authors stress that test preparation programs distract students from valuable cognitive engagement: “The major purpose of coaching programs, especially on ETS tests, is to learn not substantive content training but the test’s structure, how it is designed, testwiseness strategies, and effective response patterns....SAT-type tests measure a very limited set of abilities, and, if overemphasized, downplay the development of essential competency skills needed in society.”

 


Rogers, J., Dorans, N. J., & Schmitt, A. P. Assessing Unexpected Differential Item Performance of Black Candidates on SAT form 3GSA08 and TSWE form E43. Unpublished ETS statistical report SR-86-22, January 1986.


For Black test-takers, finds the analogy item type on the SAT-Verbal section to be more difficult than other item types. Seven of 11 items with large differentials between Blacks and Whites favored Whites, including an item whose key was “physician:patient” and whose correct answer was “accountant:client.” Of the 4 items that favored Blacks, 3 were reading comprehension items on a passage describing the achievements of a Black mathematician.

 


Rooney, Charles with Bob Schaeffer. Test Scores Do Not Equal Merit. Cambridge, MA: National Center for Fair & Open Testing, 1999.


Summarizes the experiences of several hundred colleges and universities that do not use the SAT or ACT to make admissions decisions about some or all of their incoming freshman classes. Such institutions have seen an increase in student diversity, a continuation of academic quality, educational benefits for high school students, and positive feedback from alumnae/i, guidance counselors, and parents. Five in-depth case studies analyze the experiences of Bates, Muhlenberg, and Franklin & Marshall Colleges, the Texas Public University System, and the California State University System. It concludes with a general “how to” guide for institutions considering a change in test score requirements.

 


Rosser, P. The SAT Gender Gap. Washington, DC: Center for Women Policy Studies, 1989.


Analyzes data from two studies, one using College Board item data for 100,000 test-takers from a 1987 SAT administration and another of 1000 students in a Princeton Review test preparation course. Finds that the vast majority of SAT questions which exhibit large gender differences in correct answer rates are biased in favor of boys, despite girls’ superior academic work. The timed nature of the test is one factor in girls’ lower scores. The gender gap widens as high school grades increase. Both males and females estimate their abilities in English and math to be closer to their SAT scores than their GPA, so girls judge themselves less able than their grades would indicate, and less able than boys.

 


Rosser, P. Sex-Bias in College Admissions Tests: Why Women Lose Out (4th ed.). Cambridge, MA: National Center for Fair & Open Testing, 1992.


Shows how bias in the SAT underpredicts girls’ abilities and thus inhibits them from entering the schools of their choice, winning National Merit Scholarships, and entering gifted and talented programs. Demonstrates content bias in SAT items: an analysis of 24 reading comprehension passages reveals that 34 famous men and eight other men are mentioned. Only one famous woman, Margaret Mead, is mentioned and her work is criticized. Two other women are mentioned.


 

Sacks, Peter. Standardized Minds: The High Price of America’s Testing Culture and What We Can Do To Change It. Cambridge, MA: Perseus Books, 1999.


This highly readable examination of the nation’s testing fixation critiques the increasing presence of standardized exams in K-12 classrooms, university admissions, and the workplace. The two chapters devoted to university admissions debunk the myths surrounding the usefulness of the SAT, ACT, GRE, and MCAT, and point to their harmful function in blocking underrepresented minorities from higher education. Drawing from the experiences of Bates College, the public university systems in Texas and California, and Ben-Gurion University in Israel, Sacks highlights admissions alternatives that deemphasize or eliminate test scores, part of the growing trend toward “quitting the SAT.” He concludes: “In short, a new paradigm for choosing who has and has not merit for going to college is likely to enhance the overall educational quality of academic institutions and do so with no sacrifice in academic performance of students.”

 


Schaffner, P. E. Competitive Admission Practices When the SAT is Optional. Journal of Higher Education, V.56, N.1, January/February 1985.


Describes changes in Bowdoin College’s admission policy that eliminated its SAT requirement and provides statistical analyses of their effects. States that Bowdoin’s initial decision was based on four arguments: (1) standardized tests reflect socioeconomic status and therefore “stand as barriers to college for qualified minority and disadvantaged applicants;” (2) “SAT scores had not been shown to be good predictors of academic performance in the college;” (3) “the public” was over-concerned about the importance of test scores; (4) admissions staff wished to “form detailed impressions of each applicant,” a “personalization” of their process. Test score submitters and withholders often showed comparable potential and performance. High school GPA, not SAT score, was most predictive of collegiate GPA (all four years). Bowdoin’s applicant pool expanded, and “virtually all students in both groups have proven capable of meeting the college’s educational requirements,” with a mean GPA gap of only 0.2 to 0.25 in favor of submitters.

 


Schmitt, A. P., & Bleistein, C. A. Factors Affecting Differential Item Functioning for Black Examinees on Scholastic Aptitude Test Analogy Items. Research Report 87-23, Princeton, NJ: Educational Testing Service, 1987.


Categorizes analogy item answers of matched Black and White SAT-takers. Proposes that differential reasoning strategies account for differing answers. Even within matched groups, Black examinees less often reached the end of the test. “This differential speededness might be related to less well developed test-taking skills.”

 


Schmitt, A. P., & Crone, C. R. Alternative Mathematical Aptitude Item Types: DIF Issues. Research Report 91-42, Princeton, NJ: Educational Testing Service, 1991a.

 


Proposed SAT-Math question types were tested for differential item functioning among male/ female and majority/minority populations. On Student Produced Response (SPR) items, White males outperformed all others, even when matched for total test score. Nonetheless, SPR items were added to the “new” SAT I in 1995.

 


Schmitt, A. P., & Dorans, N. J. Differential Item Functioning for Minority Examinees on the SAT. Journal of Educational Measurement, V.27, N.1, Spring, 1990.


ETS researchers find that when item content is of special interest to an ethnic group, members of that group score unexpectedly well on the item. Speededness is also a factor in Blacks’ and Hispanics’ lower scores on the SAT. Items containing homographs -- words spelled like other words but having different meaning or pronunciation -- disadvantage Blacks, Hispanics and Asian-Americans.


 

Schmitt, A. P., Dorans, N. J., Crone, C. R., & Maneckshana, B. T. Differential Speededness and Item Omit Patterns on the SAT. Research Report 91-50, Princeton, NJ: Educational Testing Service, 1991b.


Black and Hispanic students matched with Whites for SAT scores are found to be negatively affected by the test’s speededness; they do not reach the end as often as do matched Whites. On the mathematical sections, females may omit more difficult items at a higher rate than comparable male candidates.

 


Sedlacek, W. E., & Adams-Gaston, J. Predicting the Academic Success of Student-Athletes Using SAT and Noncognitive Variables. Paper presented at the annual meeting of the American Educational Research Association, Chicago, April, 1991.


Noncognitive variables are better predictors of grades for athletes than are SAT scores. It is suggested that athletes be considered nontraditional students rather than student-athletes. Standardized tests have lower correlation with freshman grades for non-White and nontraditional students than for White students. The authors write, “SAT scores should not be used in selecting or predicting the early success of student-athletes. Propositions 48 and 42 cannot be implemented fairly using SAT scores if these results are at all true at other institutions. The school studied would be doing a great disservice to its student-athletes if the SAT were used to deny the right of any student-athlete to compete in the first year.”

 


Shapiro, M. M., Slutsky, M. H., & Watt, R. F. Minimizing Unnecessary Racial Differences in Occupational Testing. Valparaiso University Law Review, V.23, N.2, Winter 1989.


Criticizes ETS’ “differential item functioning” (DIF) technique used, supposedly, to reduce bias on the SAT and other tests. DIF matches groups by total test score, then flags those questions on which specific groups, such as women and men, score differently. Such procedures “do not measure item bias per se, but only item bias relative to overall bias...[they] can only detect whether a particular item is significantly more biased or significantly less biased than the aggregate of all the test items as a whole.”
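The score-matching step that DIF relies on can be sketched in a few lines. This is a simplified illustration of the idea (match examinees on total score, then compare within-band correct rates), not ETS’s actual Mantel-Haenszel statistic; the function name and threshold are invented for the example:

```python
from collections import defaultdict

def dif_flag(responses, threshold=0.10):
    """Flag one item for differential functioning.

    responses: (group, total_score, item_correct) tuples covering exactly
    two groups. Examinees are matched by total test score, then each score
    band's proportion-correct is compared between groups; the item is
    flagged if the score-weighted average gap exceeds `threshold`.
    """
    bands = defaultdict(lambda: defaultdict(list))
    for group, score, correct in responses:
        bands[score][group].append(correct)

    weighted_gaps, weights = [], []
    for by_group in bands.values():
        if len(by_group) != 2:
            continue  # skip bands where only one group is represented
        rate_a, rate_b = (sum(v) / len(v) for v in by_group.values())
        n = sum(len(v) for v in by_group.values())
        weighted_gaps.append(abs(rate_a - rate_b) * n)
        weights.append(n)

    return bool(weights) and sum(weighted_gaps) / sum(weights) > threshold
```

As the article argues, any such procedure can only detect items that are more (or less) biased than the test as a whole, because the matching variable, total score, is itself built from the items being examined.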

 


Sharif v. New York State Education Department, 88 Civ. No. 8435 (JW), U.S. District Court, Southern District of New York, February 1989.


Judge John M. Walker rules the use of the SAT alone in awarding New York state scholarships to be biased against women. Sole use of the SAT had resulted in 72% of the top scholarships going to males. Finds plaintiffs “established that the probability, absent discriminatory causes, that women would consistently score 60 points less on the SAT than men is nearly zero.” Even when the effects of ethnicity, parental education, courses taken and proposed college major are factored in, “under the most conservative studies presented in evidence...at least a 30 point combined differential remains unexplained....After a careful review of the evidence, this Court concludes that SAT scores capture a student’s academic achievement no more than a student’s yearbook photograph captures the full range of her experiences in high school.”

 


Silverman, L. Unnatural Selection: A Legal Analysis of the Impact of Standardized Test Use on Higher Education Resource Allocation. Loyola of Los Angeles Law Review, V.23, N.4, 1990.


Extensive legal analysis of issues surrounding the use of standardized tests for admission to higher education. Looks at difficulties of litigating claims of gender or racial bias and examines possible legislative options for reducing the impact of testing in admissions decisions.

 


Slack, W. V., & Porter, D. The Scholastic Aptitude Test: A Critical Appraisal. Harvard Educational Review, V.50, N.2, May 1980.


Challenges the fairness and usefulness of the SAT through a comprehensive review of literature on predictive validity and the effectiveness of coaching. Finds that test coaching is effective and that students “who believe the admonishments of the test designers and do not prepare for the test may be needlessly deprived of admission to the college of their choice.” Plus, “students who accept the SAT as a measure of aptitude may suffer a loss of self-esteem by interpreting low scores as an indication of their own deficiencies.” Concludes “the SAT adds little to the prediction of college performance over the high school record alone.”


 

Steele, C. & Aronson, J. Stereotype Threat and the Test Performance of Academically Successful African Americans. In C. Jencks & M. Phillips (Eds.), The Black-White Test Score Gap. Washington, D.C.: The Brookings Institution, 1998.


Coining the term “stereotype vulnerability,” researchers showed how students who are aware of racial and gender stereotypes about their group’s intellectual ability score lower on those standardized tests that purport to measure academic aptitude. In control groups where similar students were given no reason to suspect that the demeaning stereotypes would apply to their performance, African Americans performed as well as Whites on very challenging tests. The authors conclude that, “our research suggests that testing situations are unlikely to be neutral in relation to group identity.”


 

Stockwell, S., Schaeffer, R., & Lowenstein, J. The SAT Coaching Cover-Up: How Test Preparation Courses Can Raise Scores by 100 Points or More and Why the College Board and ETS Deny the Evidence. Cambridge, MA: National Center for Fair & Open Testing, 1991.


Summarizes studies demonstrating that SAT coaching works, and that ETS misleads the public on this matter and has suppressed studies on coaching’s effectiveness. Concludes that good coaching courses raise scores by 100 points on the average. Discusses bias, fairness, usefulness, and SAT-optional admission policies.

 


Stricker, L. J., Rock, D. A., & Burton, N. W. Sex Differences in SAT Predictions of College Grades. College Board Report 91-2, New York: College Entrance Examination Board, 1991.


ETS researchers examine the over- and underprediction of grades for men and women at several colleges within a major state university. Find SATs consistently underpredict grades for women across all levels of grades, colleges and ethnicity. Underprediction is not eliminated when adjustments are made for differences in grading standards among different courses. It is eliminated when non-SAT variables are examined (i.e., high school GPA and percentage of required reading and assignments completed). Women “could still be disadvantaged if decisions about admissions, scholarships, and similar academic matters were made solely on the basis of grade predictions from the SAT or the SAT and [High School Rank].”

 


Taylor, O., & Lee, D. L. Standardized Tests and African Americans: Communication and Language Issues. In A. G. Hilliard III (Ed.), Testing African American Students: Special Re-Issue of The Negro Educational Review. Morristown, NJ: Aaron Press, 1991.


Discusses sources of culturally-based communication and language bias in standardized tests, including cognitive style bias, linguistic bias and communicative style bias. Standardized tests tend to be based on the assumption that all test-takers “evidence ability by using a similar cognitive style.”

However, differences in cognitive style only reflect different ways of knowing and problem-solving, not differences in ability. Therefore, “standardized tests which fail to recognize differences in language style fail to accurately determine ability.” Concludes that “the data suggests...the very assumptions and paradigms upon which most standardized tests are based need to be revised.”

 

Ting, Siu-Man Raymond and Alfred Bryant. The Impact of Acculturation and Psychosocial Variables on Academic Performance of Native American and Caucasian Freshmen. The Journal of College Admission, Spring 2001.

When analyzing high school GPA, SAT scores, and NCQ (Noncognitive Questionnaire) scores for students in a freshman seminar at a southeastern public university, a general model to explain college performance was found to be less effective than one that takes ethnicity into consideration. For both White and Native American students, high school GPA proved to be the best predictor of freshmen college performance, with the noncognitive variables of successful leadership experience, community service, highest level of education expected, and self-pride also contributing to the prediction models.

SAT scores were either a weak predictor or no predictor at all for both ethnic groups.


 

Tracey, T. J., & Sedlacek, W. E. Prediction of College Graduation Using Noncognitive Variables by Race. Paper presented at the annual meeting of the American Educational Research Association, April 1986.


Compares the usefulness of SAT scores versus a “Non-Cognitive Questionnaire” (NCQ) for predicting graduation rates of minority students. “SAT scores were not found to be related to graduation in any of the [race group] samples. The NCQ dimensions were found to be fairly predictive for both races, but especially for the Black samples.”

 


Vars, F., & Bowen, W. Scholastic Aptitude Test Scores, Race, and Academic Performance in Selective Colleges and Universities. In C. Jencks & M. Phillips (Eds.), The Black-White Test Score Gap. Washington, D.C.: The Brookings Institution, 1998.


Reviewing data on more than 10,000 students at 11 selective public and private institutions of higher education, researchers found that the relationship between SAT scores and college grades is weaker for African American students than for Whites. A 100-point increase in SAT combined scores, holding race, gender, and field of study constant, led to a one-tenth of a grade point gain for college GPA. This offered about the same predictive value as looking at whether an applicant’s father had a graduate degree or her mother had completed college. Researchers concluded: “The relatively weak relationship between SAT scores and academic performance, especially for black students, underscores why admissions officers must be free to consider factors other than grades and SATs when choosing among candidates.”

 


Wainer, H. The Future of Item Analysis. Journal of Educational Measurement, V.26, N.2, Summer 1989.


ETS researcher describes how SAT test construction works: an item is performing “correctly” only if high scorers get the question right more often than low scorers. Therefore, if low scorers get what has been labeled a “difficult” item correct at too high a rate, the question is discarded because it is not “discriminating” correctly.
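The discrimination check described in this entry can be made concrete with a small sketch. This is an illustrative example only, not ETS's actual procedure: it computes a conventional discrimination index by comparing the proportion of high scorers and low scorers (the customary top and bottom 27% groups) who answered an item correctly. The function name and the 27% cutoff are assumptions for illustration.

```python
def discrimination_index(item_correct, total_scores, group_frac=0.27):
    """Proportion correct among high scorers minus proportion correct
    among low scorers, using top/bottom `group_frac` score groups."""
    n = len(total_scores)
    k = max(1, int(n * group_frac))
    # Rank test-takers by total score, ascending
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k
    p_low = sum(item_correct[i] for i in low) / k
    return p_high - p_low

# An item that low scorers answer correctly about as often as (or more
# often than) high scorers yields a low or negative index; under the
# logic the entry describes, such an item would be discarded as not
# "discriminating" correctly.
```

On the toy data below, an item answered correctly only by the top half of scorers gets an index of 1.0, while the reverse pattern gets -1.0 and would be flagged.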

 


Wainer, H., Saka, T., & Donoghue, J. The Validity of the SAT at the University of Hawaii: A Riddle Wrapped in an Enigma. Educational Evaluation and Policy Analysis, V.15, N.1, Spring 1993.


Presents data on the decreasing validity of the SAT as a first year grade (FYG) predictor for Hawaiian students at the University of Hawaii. In 1982 the SAT-FYG correlation was already lower than the national average -- which itself is lower than the high school GPA-FYG correlation -- and by 1989 it had “diminished considerably.” Authors assert that the SAT’s lack of validity in this case should be taken seriously, although they do not confirm hypotheses about causes.

 

Wainer, H., & Steinberg, L. S. Sex Differences in Performance on the Mathematics Section of the Scholastic Aptitude Test: A Bidirectional Validity Study. Harvard Educational Review, V.62, N.3, Fall 1992.

ETS researchers examine the SAT-M scores of nearly 47,000 first year college students, matched for college math grades and course level. Women scored 21 to 55 points lower than men with identical grades. Concludes “it is a capital mistake to use the SAT-M in isolation for decisions involving comparisons between the sexes” since the SAT-M alone does not accurately reflect women’s math abilities.

 


Walter, T. L., Smith, D. E. P., Miller, S. D., Hoey, G., & Wilhelm, R. Predicting the Academic Success of College Athletes. Research Quarterly for Exercise and Sport, V.58, N.2, 1987.


Re-examines 1974-1983 data on scholarship-funded college football players to determine hypothetical admission and graduation rates had the National Collegiate Athletic Association’s Proposition 48 been in place. Proposition 48 denies scholarships to student-athletes who fail to meet a cut-off score on the SAT or ACT or to earn a high school GPA of at least 2.0 in core classes. Eighty-six percent of Black students who would have been denied admission on the basis of the SAT actually succeeded in their coursework. While high school GPA correctly predicted success in 84% of the cases, test scores accurately predicted success only 30% of the time. Finds SAT scores are unrelated to college GPA for Blacks, and only weakly related for non-Blacks.

 


Weiss, J., Beckwith, B., & Schaeffer, B. Standing Up to the SAT. New York: Simon & Schuster, 1989.


Practical guide which includes information about racial, gender, and other types of bias, test misuse, and admissions alternatives. Explains how to use the SAT's flaws to beat, avoid, and change the test.

Also includes an SAT Bill of Rights and proposals for college admissions test reform.

 


Wilder, G. Z., & Powell, K. Sex Differences in Test Performance: A Survey of the Literature. College Board Report 89-3, New York: College Entrance Examination Board, 1989.


A meta-analysis of studies concerning gender differences in test performance finds there is “[a] disproportion in the number of males at the upper score levels of the SAT-mathematical sections,” as there is for other tests of quantitative ability. Part of the score gap on college admissions tests is explained by the changing nature of the test population, but “the remaining difference does not seem to be explained solely by differences in the courses taken by males and females, as some critics have suggested.” Notes that “even the small differences reflected in [college admission test scores] can affect the educational opportunities offered men and women since the tests are used by colleges and universities and other institutions as important bases for decisions about admission, scholarships, and awards.”

 


Zuman, J. P. The Effectiveness of Special Preparation for the SAT: An Evaluation of a Commercial Coaching School. Paper presented at the annual meeting of the American Educational Research Association, April 1988.


Drawing on research from a Harvard University doctoral dissertation, substantiates a 110-point average increase in SAT scores after test coaching by the Princeton Review program. Uses a careful experimental design to demonstrate that coaching was the sole cause of the large score increase.