Use of the Graduate Record Examinations (GRE) Verbal, Quantitative, and Physics tests in graduate admissions is a questionable practice if minimum acceptable scores are required. At the same time, research shows that these tests have little value as predictors of students’ future success in graduate school and beyond. An attractive alternative is the use of non-cognitive assessments such as tests of certain personality characteristics. At the Women in Astronomy IV meeting in Austin, TX, in June 2017, a panel discussion of these topics was held, and conference attendees were polled regarding which personal characteristics are most important for success in graduate school. This paper reports on these discussions.
Members of the Women in Astronomy IV Graduate School Admissions White Paper Group: Adam Burgasser, Alex Rudolph, Amy Bartholomew, Andrew Sturner, Arianna Brown, Briana Indahl, Caitlin Doughty, Günther Hasinger, Jackie Monkiewicz, Jenna Whitley, Julie Posselt, Katie Jameson, Katy Rodriguez Wimberly, Kelsie Krafton, Kirsten Blancato, Lyndele von Schill, Marcela Morillo, Niyousha Davachi, Peter Frinchaboy, Raquel Martinez, Rebekah Dawson, Rose Gibson, Sarah Moran, Sinclaire Manning, Sophia Nasr, Theresa Melo, Xinting Yu, Zach Berta-Thompson
The Graduate Record Examinations (GRE) Verbal, Quantitative, and Physics tests are widely used by departmental graduate admissions committees. They assist in evaluating candidates for whom other measures are uncertain — such as applicants from other countries or from undergraduate institutions of varying or unknown quality.
Unfortunately, this widely-used tool is also widely misused and is known to be discriminatory, in the sense that members of underrepresented groups perform more poorly, on average, than white males, for reasons having to do with issues other than ability. For these reasons, a panel session entitled “Graduate Admissions in a Post-GRE World” was held at the Women in Astronomy IV meeting in Austin, TX, in June 2017 to discuss how graduate admissions might be conducted without use of the GRE. This white paper is an outgrowth of that session.
Problems with the GRE have led to statements from the Educational Testing Service (ETS), which developed the GRE: “GRE scores help you compare applicants, but if you use an arbitrary cut score as a criterion, you could miss an applicant who would be a great asset to your program.” The use of a hard cutoff, which rules out candidates no matter how strong their other qualifications may be, effectively makes the GRE the sole criterion in those cases. According to Potvin et al., more than a third of physics Ph.D. programs in the U.S. require a minimum GRE score for admission.
A serious problem with the use of cutoffs is that, for reasons that are poorly understood, demographic groups show significant differences in test performance. The ETS claims that these differences are, in part at least, due to “educational access at an early age.” Indeed, the math on the GRE Quantitative does not exceed typical tenth-grade level. But therein lies the potential problem: the children of families of low socioeconomic status do not always attend high-quality middle schools. Students who attended weaker schools can still do well enough in advanced mathematics to be successful physics majors, but those topics and skills are not on the GRE Quantitative test. Another plausible explanation is stereotype threat, i.e., the fear on the part of anyone other than White and Asian males that poor performance on their part will reflect on the group to which they belong.
Figure 1 shows the test score distributions that characterize various demographic groups. If a vertical line were drawn at, say, a score of 700, very few Hispanic or Black applicants would be accepted, and women would be preferentially rejected. A graduate admissions process that relies on a GRE cutoff score, or even has a strong preference for high scores, will result in ethnic and gender homogeneity, relative to the applicant pool, in the entering class.
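The effect of a hard cutoff on groups with shifted score distributions can be illustrated with a minimal sketch. The group names, means, and standard deviations below are hypothetical assumptions chosen only for illustration; they are not values read from Figure 1, and real score distributions are not exactly normal.

```python
from statistics import NormalDist

# Hypothetical (mean, standard deviation) pairs on a 200-800 score scale.
# These are illustrative assumptions, NOT data from Figure 1.
groups = {
    "group_A": (700.0, 80.0),
    "group_B": (640.0, 80.0),
}

def fraction_above(cutoff, mean, sd):
    """Fraction of a normally distributed group scoring at or above the cutoff."""
    return 1.0 - NormalDist(mean, sd).cdf(cutoff)

cutoff = 700.0
pass_rates = {
    name: fraction_above(cutoff, mean, sd) for name, (mean, sd) in groups.items()
}
for name, rate in pass_rates.items():
    print(f"{name}: {rate:.1%} clear a cutoff of {cutoff:.0f}")
```

Under these assumed numbers, half of the first group but fewer than a quarter of the second clears the cutoff, even though the two distributions overlap heavily. A modest shift in the mean becomes a large difference in who survives the first cut.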
An additional problem is that the fees associated with taking the GRE and reporting official scores are a significant financial burden for many students, and this burden tends to fall hardest on the same underrepresented groups.
At the same time, there is little evidence that the GRE has redeeming value as a predictor of success in graduate school or beyond. For example, Miller et al. found only a weak correlation between Physics Ph.D. completion rate and either undergraduate grade-point average or GRE Quantitative score, and no correlation between completion rate and either GRE Physics or GRE Verbal score, among students at 27 graduate programs in the U.S. Even for the GRE Quantitative test, the completion rates of U.S. physics majors varied by less than 10% as their scores ranged from the 10th to the 90th percentile.
One measure of success in astronomy is to obtain a named postdoctoral fellowship. A group of astronomers circulated a questionnaire to 271 present or past holders of the Hubble, Einstein, NSF, Sagan, and/or Jansky Fellowships between 2010 and 2015, asking them to report their scores on the GRE Physics test (PGRE) and the number of first-author publications they had produced before entering graduate school and while in graduate school. The distribution of the 149 self-reported PGRE scores was very broad: it was approximately flat across the upper half of the distribution (15 or more fellows in every 10-percentile bin above the median), and there were some respondents in every 10-percentile bin (Figure 2). The distribution is also significantly different for males and females. In addition, the correlation between PGRE score and number of first-author papers at the two career stages, as well as the total number, was weak, with Pearson r correlation coefficients ranging from 0.11 to 0.16.
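One way to read these coefficients is through r squared, which for a simple linear relationship is the fraction of the variance in one quantity accounted for by the other. The short sketch below applies that standard identity to the two endpoint values quoted above; it assumes a linear relationship and uses no data beyond those two numbers.

```python
# Endpoints of the Pearson r range quoted in the text
# (PGRE score vs. number of first-author papers).
r_values = (0.11, 0.16)

def variance_explained(r):
    """Coefficient of determination for a simple linear fit: r squared."""
    return r * r

shares = [variance_explained(r) for r in r_values]
for r, share in zip(r_values, shares):
    print(f"r = {r:.2f} -> about {share:.1%} of the variance in paper counts")
```

On this reading, PGRE score accounts for only a few percent of the variation in first-author publication counts among the fellows who responded.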
In response to these and other problems with the GRE, the astronomy profession has taken a number of steps.
Most recently, the AAS Board of Trustees endorsed a report from the AAS Task Force on Diversity and Inclusion in Graduate Education. It recommends holistic evaluation of applicants to graduate school, in lieu of the GRE.
Earlier, in a special session at the 223rd AAS meeting in 2014 January (session 337), “The Proper Use of GRE Scores and Noncognitive Measures for Enhancing Diversity and Excellence in Astronomy Graduate Programs,” these and other issues were laid out in a panel discussion.
The GRE was also a topic of discussion at the Inclusive Astronomy 2015 conference, and it figures in the Nashville Recommendations, which grew out of that conference. There, the GRE is named as a barrier to access and figures in recommendation RBA1S, which reads, “Develop and deploy best-practice, research-based tools for evaluating graduate school applications holistically and equitably: Eliminate the General and/or Physics Graduate Record Exams (GRE) for graduate school admission . . . and integrate holistic measures of scientific talent into graduate admissions procedures (see, e.g., the Fisk-Vanderbilt Bridge Program toolkit for sample protocols and rubrics).” The Nashville Recommendations also include a list of graduate programs’ PGRE requirements, sorted according to whether the programs do not accept (6 programs), consider optional (44), recommend (31), or require (51) the PGRE [spreadsheet link].
In 2016 January, the AAS Council adopted a statement with the recommendation “… that graduate programs eliminate or make optional the GRE and PGRE as metrics of evaluation for graduate applicants. If GRE or PGRE scores are used, the AAS recommends that admissions criteria account explicitly for the known systematics in scores as a function of gender, race, and socioeconomic status, and that cutoff scores not be used to eliminate candidates from admission, scholarships/fellowships, or financial support, in accordance with ETS recommendations.”
In 2017 June, the Women in Astronomy IV (WiA IV) conference hosted a panel discussion entitled “Graduate Admissions in a Post-GRE World,” with Caitlin Casey, Katie Jameson, and Casey Miller. This white paper is an outgrowth of that panel.
The panel discussion at WiA IV identified several concerns regarding the removal of the GRE as a requirement for graduate admissions. For each concern, the panel identified possible solutions or best practices for evaluating graduate applications without the GRE.
Value: The GRE and PGRE measure knowledge, provide information not otherwise available, show a real difference in ability, and may show talent otherwise unrevealed. Will standards be lowered?
More studies along the lines of  should be done to improve understanding of the value of the GRE as a predictor of career achievement.
Comparisons: The GRE has value for comparing students from schools with different grading standards, where the GPA has different meanings, and from unknown schools.
Admissions committees should develop institutional profiles of schools from which they admit students. In addition, graduate programs should broaden and institutionalize trust networks with undergraduate institutions. “Bridge programs” are pre-eminent examples of this approach, but other implementations also exist.
Bias: Letters of recommendation and other noncognitive measures, such as personality tests, would become more important. Personality tests would need to be made robust, and readers of letters would need to be alert for writer and evaluator bias.
Evaluation of noncognitive assessments, including but not limited to letters of recommendation and personal essays, should be done objectively, with previously-defined rubrics that omit biased keywords. Structured evaluation/interview protocols may be used. If noncognitive assessments are introduced to replace the GRE (see below), they should be developed with input from psychologists (specifically industrial/organizational psychologists) and with guidance from best practices.
Nonuniformity: How to avoid penalizing students for not submitting a score?
The best way to handle this problem is not to use the GRE at all.
Reputation: Will departmental reputation be diminished by not requiring the GRE?
This concern will subside as more top departments drop the GRE while maintaining the quality of their graduates. Since 2017, when Harvard’s Department of Astronomy stopped accepting scores from the GRE General exam, the number of applications has increased, allowing the program to become more selective. In 2018, Stanford University’s Faculty Senate voted to end the university-wide requirement that graduate programs require the GRE for admission.
Institutional barriers: Some graduate schools require the GRE.
If the institution insists on requiring the GRE, the department can disregard the scores when evaluating applicants.
In a post-GRE world, there will be increased reliance on measures of personal characteristics, i.e., on noncognitive assessments. Research has shown that, when noncognitive assessments are used in addition to scores on tests of general ability for the evaluation of applicants for employment or for educational opportunities, all these measures’ predictive power for job performance is increased. Rubrics for use in evaluation of applicants to graduate school have been developed; examples are given in the 2018 AAS Task Force report.
Chosen for discussion at Women in Astronomy IV was a well-studied example of a noncognitive assessment, a set of personality tests called the Emotional Social Competence Inventory (e.g., link). It includes four clusters: Self Management, Self Awareness, Social Awareness, and Relationship Management, which in turn include the traits listed in Table 1. These measures have been found to show negligible differences among demographic groups. Therefore, their use would enhance both validity and diversity in selective evaluations.
WiA IV attendees received a survey asking which of these characteristics are most important for students’ success in research. The preponderance of opinion among the 73 respondents was that the Self Awareness and Self Management clusters are most likely to be “crucially important” or “moderately important.” We remark here that emphasizing these two clusters would help to avoid selecting against people with conditions that might affect their social skills. The principal survey results are listed in Table 2.
Self Management: Managing one’s internal states, impulses, and resources
Striving to improve or meeting a standard of excellence
Readiness to act on opportunities
Persistence in pursuing goals despite obstacles and setbacks
Keeping disruptive emotions and impulses in check
Flexibility in handling change
Taking responsibility for personal performance
Self Awareness: Knowing one’s internal states, preferences, resources, and intuitions
Recognizing one’s emotions and their effects
Knowing one’s strengths and limits
A strong sense of one’s self-worth and capabilities
Social Awareness: Handling relationships; awareness of others’ feelings, needs, concerns
Sensing others’ feelings and perspectives and taking an active interest in their concerns
Reading a group’s emotional currents and power relationships
Anticipating, recognizing, and meeting others’ needs
Respecting and relating well to people from varied backgrounds
Relationship Management: skill or adeptness at inducing desirable responses in others
Sensing others’ development needs and bolstering their abilities
Inspiring and guiding individuals and groups
Initiating or managing change
Wielding effective tactics for persuasion
Negotiating and resolving disagreements
Listening openly and sending convincing messages
Nurturing instrumental relationships
Teamwork and Collaboration
Working with others toward shared goals and creating group synergy in pursuing collective goals
Construct a rubric or shopping list of desired characteristics.
Scan personal essays (if used) and letters of recommendation for these characteristics.
Make selection criteria objective (achievements vs. qualities).
Use a coarse rating scale.
Properly designed rubrics are key; publish models, provide guidelines.
An example of a rubric for evaluation of non-cognitive constructs is given in Appendix A.
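As a rough illustration of how a predefined rubric with a coarse rating scale might be applied across several readers, consider the sketch below. The three rubric items are hypothetical labels loosely drawn from the trait descriptions in Table 1, the three-level scale is an assumption, and none of this is the Fisk-Vanderbilt rubric referred to in Appendix A.

```python
from statistics import mean

# Coarse three-level scale (an assumed choice, not taken from any published rubric).
SCALE = {"low": 1, "medium": 2, "high": 3}

# Hypothetical rubric items, fixed before any application is read.
RUBRIC = ("persistence", "readiness_to_act", "knows_strengths_and_limits")

def score_applicant(ratings_by_reader):
    """Average each rubric item across readers.

    A missing item or an unknown rating level raises KeyError, so
    incomplete or off-scale ratings cannot slip through silently.
    """
    per_item = {}
    for item in RUBRIC:
        levels = [reader[item] for reader in ratings_by_reader]
        per_item[item] = mean(SCALE[level] for level in levels)
    return per_item

# Two readers rate the same (hypothetical) applicant independently.
readers = [
    {"persistence": "high", "readiness_to_act": "medium",
     "knows_strengths_and_limits": "medium"},
    {"persistence": "high", "readiness_to_act": "low",
     "knows_strengths_and_limits": "medium"},
]
scores = score_applicant(readers)
print(scores)
```

Keeping the per-item averages separate, rather than collapsing them into one total, lets a committee see where readers disagree and discuss those items directly.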
From the evidence discussed above, it is reasonable to conclude that elimination of the GRE as a requirement for graduate admissions and its replacement with well-designed measures of noncognitive constructs will be a step toward greater inclusiveness and diversity in astronomy and may result in higher-quality graduate programs. The organizers and attendees of Women in Astronomy IV encourage the community to take this step.
This material is based upon work supported by the National Science Foundation under Grant No. 1643046, Women in Astronomy IV Conference. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
The Fisk-Vanderbilt sample noncognitive assessment rubric for possible use in graduate admission decisions may be downloaded here:
This tool was obtained from http://fisk-vanderbilt-bridge.org/tool-kit. This tool is used by permission of the Fisk-Vanderbilt Masters-to-PhD Bridge Program, a joint graduate program in science and engineering at Fisk University and Vanderbilt University, as indicated by the Fisk-Vanderbilt Bridge Program watermark. The use or repurposing of these tools does not necessarily represent the endorsement of the Fisk-Vanderbilt Masters-to-PhD Bridge Program, its officers, or its participants.