

STRAIGHT TALK ABOUT THE COMPUTERIZED GMAT

One year ago, the Graduate Management Admission Council (GMAC), the nonprofit organization that owns and administers the GMAT, which students applying to business school are required to take, made a historic move: It introduced the GMAT CAT (computer-adaptive test), an electronic version of its flagship Graduate Management Admission Test. Nearly all test-takers worldwide must now brave the CAT. The exceptions are roughly 1,200 people who, because of their remote locations or disabilities, still have access to the old paper-based version.

Part of the idea was to accommodate the rapidly rising number of business school applicants more easily: During the 1997-98 admissions cycle, applications to Business Week's top 25 schools hit an all-time high of 89,031, up 10% from 1996. Last year, nearly 230,000 MBA wannabes registered for the paper-based GMAT. And since the GMAT CAT's debut on Oct. 11, 1997, nearly 190,000 prospective B-school students have taken the computerized version.

The change in the GMAT's format has set off a scramble among students to figure out the best way to take the test. Many are confused about how an adaptive test is scored, especially since some test-preparation companies may have dispensed bad advice on how to take the CAT. On Oct. 6, Business Week Online's Nadav Enbar discussed these and other issues with GMAC Vice-President Fred McHale. Previously the executive director of Educational Testing Service (ETS), McHale was the lead developer of the GMAT CAT, focusing mostly on ensuring the fairness of test questions and on developing statistical methods to make the paper-based and electronic tests comparable. Here is an edited transcript of that conversation:

Q: Fred, let's get right down to it. Just what makes the GMAT CAT adaptive, and how does that affect the test's scoring methodology?

A: There are some ways to think about it without getting into psychometrics [the branch of psychology that deals with the design, administration, and interpretation of quantitative tests for the measurement of variables such as intelligence, aptitude, and personality traits]. Here's the basic function: The computer-adaptive test starts out by posing questions of average difficulty. As you answer those questions, the test chooses later questions according to whether you were correct or incorrect. So if you answer a question incorrectly, the next question will be easier, with a smaller point value; conversely, if you answer a question correctly, the next question will be more difficult, with a larger point value. The more difficult questions you answer correctly, the higher your score. By the end of the CAT, you reach a point where you are consistently answering questions at the same level correctly, and your score reflects that.

That's the idea of an adaptive test: The questions become more difficult when you answer them correctly, and less so if you answer them incorrectly. We also want to make sure that all of the relevant subject matter is covered, so that every student answers questions on algebra, geometry, problem solving, and so on. So the scoring is based on the difficulty of each question, which is directly tied to its point value, and on the question's content.
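The up/down rule McHale describes can be sketched as a simple staircase. This is a deliberately simplified stand-in, not GMAC's algorithm: the operational CAT's question selection and scoring are described in the research cited in the footnote below. The 1-to-5 difficulty scale, the `Item` fields, and the least-used-content tie-breaker are all hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    difficulty: int  # hypothetical scale: 1 (easiest) to 5 (hardest)
    content: str     # content area, e.g. "algebra" or "geometry"


def run_staircase(pool, answer_fn, n_items, start_level=3, seed=0):
    """Administer up to n_items with a simple up/down (staircase) rule.

    A correct answer moves the target difficulty up one level; an incorrect
    answer moves it down one level. Among the items closest to the target
    level, the least-used content area is preferred (crude content balancing).
    Returns a list of (item_id, difficulty, correct) tuples in order.
    """
    rng = random.Random(seed)
    remaining = list(pool)
    level = start_level
    content_counts = {}
    history = []
    for _ in range(n_items):
        if not remaining:
            break
        # Items whose difficulty is nearest the current target level.
        best_gap = min(abs(it.difficulty - level) for it in remaining)
        candidates = [it for it in remaining if abs(it.difficulty - level) == best_gap]
        # Prefer the content area administered least often so far.
        fewest = min(content_counts.get(it.content, 0) for it in candidates)
        candidates = [it for it in candidates if content_counts.get(it.content, 0) == fewest]
        item = rng.choice(candidates)
        remaining.remove(item)
        correct = answer_fn(item)
        history.append((item.item_id, item.difficulty, correct))
        content_counts[item.content] = content_counts.get(item.content, 0) + 1
        # Staircase step, clamped to the hypothetical 1-5 scale.
        level = min(5, level + 1) if correct else max(1, level - 1)
    return history


# A simulated test-taker who answers correctly any item of difficulty 3 or below.
areas = ["algebra", "geometry", "arithmetic"]
pool = [Item(i, 1 + i % 5, areas[i % 3]) for i in range(60)]
history = run_staircase(pool, lambda it: it.difficulty <= 3, n_items=10)
print([d for _, d, _ in history])  # → [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]
```

The simulated test-taker settles into alternating between levels 3 and 4, which mirrors McHale's point about consistently getting the same level of questions correct. A real CAT estimates ability statistically rather than simply counting steps up and down.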

All of the questions administered on the CAT have been given before to students who took the paper-based GMAT, so we have statistical data on the difficulty of every question before we use it in the computer-based test. There are also new questions that we're trying out for the first time on the computer-based test. These 10 or 11 new questions don't count toward the score; they are administered as trial questions to gauge their difficulty for future use. We use that data to see how the questions function, and then we may discard them or use them on future tests. These trial questions are embedded among the operational ones, so a test-taker might get one at any level of difficulty.
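Embedding unscored trial questions can be sketched by reserving some positions in a section for trial items and dropping those positions before scoring. The function names are hypothetical, and so is the assumption that the 41-question verbal count in the profile at the end of this article includes the embedded trial questions.

```python
import random


def choose_trial_slots(section_length, n_trial, seed=None):
    """Pick the positions in a section that will carry unscored trial items."""
    rng = random.Random(seed)
    return set(rng.sample(range(section_length), n_trial))


def scored_responses(responses, trial_slots):
    """Keep only the responses given at operational (scored) positions."""
    return [r for pos, r in enumerate(responses) if pos not in trial_slots]


# Assumption: a 41-question verbal section that includes 10 trial items.
slots = choose_trial_slots(41, 10, seed=1)
responses = [True] * 41  # placeholder right/wrong record for every position
print(len(scored_responses(responses, slots)))  # → 31
```

Because the trial positions are chosen at random here, nothing about a question's position tells the test-taker whether it counts, which is one reason to treat every question as scored.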

Q: The CAT is just about to have its first birthday. Have you received many complaints about its scoring methodology or algorithm?

A: The only complaint I am aware of in terms of scoring stems from the word out there [among prospective test-takers] that you should spend more time answering the early questions, and then guess at or skip the remainder. This false information claims that if you get the earlier questions correct, you'll reach such a high scoring level that you need only guess on the remaining questions to keep that high score. I've received complaints from students who followed that technique and earned a much lower score than their academic background would indicate.

If you leave questions unanswered at the end of the test, you're penalized. The penalty would be proportional to the number of questions you've answered over the total number of questions on the test. So if you answer 20 questions on the quantitative section and don't answer the rest, your highest possible score would be 20/37. If you answer only 1 of the 37 questions, you would be scored 1/37, which is the lowest possible score you can receive.
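The proportion McHale describes can be written out directly. This only restates the ratio from the interview for a 37-question section; GMAC's actual adjustment formula isn't given here, and `completion_fraction` is a hypothetical name.

```python
from fractions import Fraction


def completion_fraction(answered, total):
    """Share of the section answered: the answered/total ratio McHale describes."""
    if total <= 0 or not 0 <= answered <= total:
        raise ValueError("need 0 <= answered <= total and total > 0")
    return Fraction(answered, total)


for answered in (37, 20, 1):
    frac = completion_fraction(answered, 37)
    print(f"{answered:2d} of 37 answered -> {frac} ({float(frac):.0%})")
```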

The best strategy is to answer every single question. If you don't, your score is adjusted. Scoring an 800 on the test never meant that a test-taker answered all questions correctly; a person who got all but five questions correct could still earn that elusive 800. But if you leave questions unanswered, your score will be adjusted downward.

Q: Are the prep services such as Princeton Review or Kaplan disseminating some of this bad advice?

A: I think there definitely is some misunderstanding out there about how the test operates, and that word is going around on the Internet. Some prep courses are delivering the wrong advice, and that's where students are getting confused. I know students who have received quant scores in the lowest 5% of test-takers, and they're working in accounting! I attribute those low scores to following that bad advice. I can gauge that because the CAT allows us to see how much time a test-taker spends on each question. So if a particular student spends three minutes on each of the first 20 questions and then an average of 30 seconds on the rest, I can deduce that he or she is guessing randomly.
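The timing pattern McHale uses to spot guessing can be expressed as a simple comparison of early and late response times. The split point and the 25% ratio below are hypothetical thresholds chosen for illustration, not values GMAC has published.

```python
from statistics import mean


def looks_like_late_guessing(times_sec, split, ratio=0.25):
    """Flag a section where later answers are far faster than earlier ones.

    times_sec: seconds spent on each question, in the order answered.
    split: number of questions in the "early" block (hypothetical parameter).
    ratio: flag if the late mean is below ratio * the early mean (hypothetical).
    """
    early, late = times_sec[:split], times_sec[split:]
    if not early or not late:
        return False
    return mean(late) < ratio * mean(early)


# McHale's example: three minutes on each of the first 20 questions,
# then about 30 seconds apiece on the remaining 17 of a 37-question section.
rushed = [180] * 20 + [30] * 17
steady = [120] * 37
print(looks_like_late_guessing(rushed, split=20))  # → True
print(looks_like_late_guessing(steady, split=20))  # → False
```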

That strategy may have worked for some people some of the time, and that's enough for the test-prep companies to say: "See, it works for us!" They believe it works, and they're going to keep plugging it, because having strategies and imparting them is what their business is about. But imparting incorrect strategies has always been part of the test-prep business. One service, during the GMAT's paper days, used to teach people how to identify the pretest section (the section that doesn't count toward your GMAT score) so they could skip it. Their strategy was wrong, but it would work sometimes. So they aren't giving completely erroneous information, but their strategies don't consistently work, and many people end up getting hurt by advice that isn't completely accurate.

Q: Has ETS or GMAC done anything to correct the misinformation?

A: We have talked by phone with a number of people at various test-prep services to try to correct their test-taking advice. And in the near future, we plan to post a news release on our Web site (www.gmat.org) and send it to prep companies, B-schools, and the media. The release will detail the correct preparation methods test-takers should use and point out the specific differences between the CAT and the paper-based test [no date has been set for the release].

Q: Are you aware of any other issues test-takers are unhappy about with respect to the CAT?

A: Test-takers have also been unhappy that they can't go back and review their answers, or skip questions they have a hard time with [and come back to them later]. That may change in the future as we do additional research. [Skipping harder questions to answer the easier ones first] is a paper-test strategy that has no counterpart on the CAT. Similarly, in the 1940s the "new" multiple-choice test wasn't comparable to the essay test that was the standard of that time.

Bottom line: You have to know how the test is designed so you can prepare for it. You can't use the old test strategies. On the CAT, everyone has to answer all of the questions on the test (there are 37 questions in the quantitative section and 41 in the verbal section). To compensate for that, we give test-takers, on average, almost twice as much time for most of the questions.

When we built the first CAT prototype, we doubled the amount of time available for a person to answer verbal questions and found that it wasn't sufficient, so we added more time in the second prototype. Test-takers had an average of 80 seconds to spend on each question on the paper-based test. Now on the CAT they can spend about two minutes on each question. The verbal and quant sections each run a total of 75 minutes, and the two essays each have a 30-minute time limit.
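The per-question figures can be checked against the question counts and time limits in the profile at the end of this article. The paper verbal section works out to about 80 seconds per question, matching McHale's number; the CAT allows roughly 110 seconds per verbal question and 122 seconds per quantitative question. The dictionary and function names are illustrative.

```python
# (minutes, questions) per section, from the profile table in this article.
SECTIONS = {
    ("verbal", "paper"): (75, 56),
    ("verbal", "CAT"): (75, 41),
    ("quantitative", "paper"): (75, 52),
    ("quantitative", "CAT"): (75, 37),
}


def seconds_per_question(minutes, questions):
    """Average time available per question, in seconds."""
    return minutes * 60 / questions


for (section, fmt), (minutes, n) in SECTIONS.items():
    print(f"{section:<12} {fmt:<5} {seconds_per_question(minutes, n):6.1f} s/question")
```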

There are actually fewer questions that each individual student needs to answer on the test. But we have quadrupled the total number of test questions available for each test (adding to the diversity of questions that may be posed on the CAT). The paper-based test had three verbal sections, and students had 25 minutes to complete each one. There were many more questions on the paper-based test than on the CAT because the paper test had to include enough questions to cover the whole range of test-takers' abilities. The result was that many questions on the paper test were irrelevant to a given test-taker's ability. Because of the adaptive nature of the CAT, we can ask fewer questions overall, but more questions at an individual's ability level.

Q: Who can concerned test-takers call to ask questions or register complaints?

A: We have a customer service line [(609) 771-7330], as well as an E-mail address [gmat@ets.org], that constantly field questions about different situations: a test score, a situation at a test center, whatever it happens to be. The turnaround via E-mail is 48 hours, and over the telephone it is immediate [office hours are 8 a.m. to 8 p.m. ET, five days a week, and Saturdays from 9 a.m. to 4:45 p.m.].

In terms of the number of complaints fielded, all I can say is that it's no more than it was when we were using the paper-based test. If anything, the number of complaints about testing conditions has gone down because we have standardized how test centers should operate and feel. We established specifications for every single test center covering what instructors say and the procedures they follow, for example in giving test-takers things like scratch paper. That keeps the whole process much more standardized.

The standards are usually upheld, but there are times when they're not, and we'll get complaints. When that happens, we'll talk either to several people at Sylvan Learning Systems [the company that helps run the CAT test sites] or to the people at the institution itself, if the test site is at a college or university center, and try to get the problems rectified. That's another thing that makes the CAT more efficient than the paper test: When problems occurred during the paper-based test, it was the word of the student vs. the word of the administration. Now we have video cameras in every CAT test center, so everything is recorded on tape, and the tape never lies.

Q: Some students feel they're not receiving the proper counseling or advice from the various published guides that are supposed to prepare them for the CAT. What has been done to make the GMAT Guide appropriate for CAT preparation?

A: The Official GMAT Guide has review questions based on the pools of questions used in the CAT practice test. Test-takers can also buy our POWERPREP software, which has two actual computer-adaptive tests, delivered just as they would be in the field, as well as review questions. The Guide no longer has paper tests bundled in it.

Q: Can you tell me a little bit about the history surrounding the CAT's development?

A: Sure. Our first priority going into the CAT's launch in October, 1997, was to do a study measuring score differences between the paper-based and computer versions of the GMAT. In October, 1996, we sampled 4,000 people to compare their scores on the two versions. One thing we found is that the verbal section of the CAT was taking too much time. The average time it took test-takers to answer a verbal question on the paper-based test was 85 seconds; on the CAT it ballooned to two minutes. We knew that it would take longer to answer the questions on the CAT because they are now geared toward your difficulty level: They are more relevant and more difficult. On the CAT, there are no longer any easy questions to whip through. We redid the study in April, 1997, and we're now giving CAT-takers more time, reducing the number of questions they have to answer in the quant section (the first prototype had 39 questions, while the second had 37), and adding more time in the verbal section to make sure that the vast majority of people have enough time to answer all of the questions.

Q: Is there comparability across cultures and national boundaries?
A: In terms of scores, we did a study in which we looked at U.S. students, English and non-English speakers, males and females, and U.S. populations that were white, African-American, Hispanic-American, and Asian-American, and within each group compared their paper and CAT scores. In all of those cases, we found that the scores were comparable. But even though we know the scores are comparable for the group as a whole, that doesn't mean an individual person won't have a predilection toward the paper test or the CAT; there's definitely preference involved. We're concerned with a level playing field, while also providing a platform for testing new things in the future. Ultimately, we were trying to make sure that we did not have some big swing in scores from the paper test to the CAT.

Since we did the comparability studies in October of 1996 and April of 1997, about 190,000 people have taken the CAT. Every month we look at people who've taken the test more than once. What we've found is that, on average, people gain about 30 points from one administration of the test to the next. That's about on par with the score increases people experienced on their second try at the paper-based test.

The average score was much higher when we first started administering the CAT, particularly the quant scores. But as time went on, the average kept dropping. Now the average score is almost the same as it was for the paper-based GMAT last year: 515. That's almost all due to quant scores being lower. We suspect the early averages were high because test-takers who were strong in quant were more apt to take the test in the fall, when the CAT started out. Now we're starting to see the rest of the population taking the test, which has brought the average down.

Q: Where are the CAT's scoring directions placed or published? I know that the newly refurbished GMAT site has a test score area that gives users an idea of how to interpret their scores as well as how to receive them, but no methodology is listed.

A: Research papers, published in academic journals and discussed at national conferences*, provide the most in-depth, statistically oriented analysis of the CAT's scoring methodology. But needless to say, the paper-based test is much easier to understand as far as scoring is concerned. If you got 15 correct, you could go to the conversion table and find your score. With the CAT, it's different. A large number of people get the same number of questions correct and incorrect, but a more difficult question carries a higher weight than an easier question, and each question has to be tied to a score.

Q: Where is a layperson's guide to the CAT's scoring algorithm published?

A: The explanation for a layperson or test-taker is just what we publish in the GMAT bulletin and on the Web site [you can download a copy of the bulletin from the GMAC Web site at www.gmat.org and view it with Acrobat Reader; see pages 4 and 17 of the downloaded copy]. Both talk about how adaptive testing works, but the actual technical scoring is published only in the journals because of its statistical nature. It's one area where I wish we could do a better job: explaining the scoring to a person who's not a statistician.

Q: There's no point-value system devised to help students more easily understand their scores?

A: The weight changes depending on where you are in terms of the other questions you took. To put your score in a conversion table, I would have to look at all of the questions you answered, because any one question's weight is determined by the questions you answered to get there. I would have to provide a table for every possible combination of questions you answered. We're talking about hundreds of questions.
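A rough count shows why a single conversion table is impractical. Even ignoring which questions the test selects, a section of n questions has 2^n possible right/wrong sequences, and many of them share the same number correct. The sketch below only counts response patterns; it is not a model of how GMAC weights questions, and the function names are illustrative.

```python
from math import comb


def response_patterns(n_items):
    """Number of distinct right/wrong sequences over n_items questions."""
    return 2 ** n_items


def patterns_with_k_correct(n_items, k):
    """Sequences with exactly k correct answers, all sharing one raw count."""
    return comb(n_items, k)


print(response_patterns(37))  # → 137438953472
print(patterns_with_k_correct(37, 20))
```

On an adaptive test, each pattern can also route a test-taker through different questions, which is why McHale says a question's weight depends on the questions answered to get there.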

We always did a statistical adjustment on the paper-based score to make sure that the test given in June of 1997 was comparable to the test given in June of 1993. Questions on a test given earlier might have been more difficult than questions on a test given within the last year. Scoring between 200 and 800 on the paper-based GMAT was based not just on the number of questions you got correct but also on how the test's difficulty compared with that of previous tests. With the CAT, we're essentially doing that with every question.
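The paper-test adjustment McHale mentions is generally known as equating. He doesn't say which equating method ETS used, so the sketch below shows generic mean-sigma linear equating, a textbook technique that places a raw score from one test form onto another form's scale by matching the means and standard deviations of the two score distributions, under the assumption that both forms were taken by equivalent groups. The score lists are made-up illustration data.

```python
from statistics import mean, pstdev


def linear_equate(x, form_x_scores, form_y_scores):
    """Map raw score x on form X onto form Y's scale (mean-sigma linear equating).

    y = mean_Y + (sd_Y / sd_X) * (x - mean_X)
    """
    mx, sx = mean(form_x_scores), pstdev(form_x_scores)
    my, sy = mean(form_y_scores), pstdev(form_y_scores)
    return my + (sy / sx) * (x - mx)


# Hypothetical data: form X was harder, so its raw scores ran 5 points lower.
harder_form = [20, 25, 30, 35, 40]
easier_form = [25, 30, 35, 40, 45]
print(linear_equate(30, harder_form, easier_form))  # → 35.0
```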

Q: Can you give a quick overview of the GMAT CAT score report and the range in scoring?

A: The GMAT score report contains four scores: verbal, quantitative, total, and analytical writing. Verbal and quantitative scores are reported on scales ranging from 0 to 60. Scores below 10 or above 46 are rare. The total score is reported on a scale ranging from 200 to 800, but extreme scores (below 250 or above 700) are uncommon. The analytical writing score, a separate score reported on a scale from 0 to 6, is the average of four ratings of your responses to the two topics in the analytical writing assessment.
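The score report described above maps naturally onto a small record type with range checks. The class, field, and function names are hypothetical; the ranges come from McHale's answer. The writing score is computed as a plain average of four ratings with no rounding rule assumed, and the total is taken as given because its composite formula isn't described here. The sample values are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ScoreReport:
    verbal: int                # 0-60 scale
    quantitative: int          # 0-60 scale
    total: int                 # 200-800 scale (a composite, not computed here)
    analytical_writing: float  # 0-6 scale

    def __post_init__(self):
        # Reject values outside the reporting scales described in the interview.
        checks = [
            ("verbal", self.verbal, 0, 60),
            ("quantitative", self.quantitative, 0, 60),
            ("total", self.total, 200, 800),
            ("analytical_writing", self.analytical_writing, 0, 6),
        ]
        for name, value, low, high in checks:
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside {low}-{high}")


def analytical_writing_score(ratings):
    """Average of the four ratings given to the two essay responses."""
    if len(ratings) != 4 or not all(0 <= r <= 6 for r in ratings):
        raise ValueError("expected four ratings on a 0-6 scale")
    return mean(ratings)


# Invented sample values, for illustration only.
report = ScoreReport(verbal=32, quantitative=35, total=560,
                     analytical_writing=analytical_writing_score([4, 5, 4, 5]))
print(report)
```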

Q: Where do you see the GMAT CAT heading in the future?
A: We established the infrastructure for computer-based testing for several reasons: for the added flexibility it gives test-takers (the test can be taken 250 days a year, during the last three weeks of every month, as opposed to just four times a year when it was on paper), and to improve the test's measurement. Now students are getting more relevant test questions.

We're also looking at how to improve the assessment using the computer by integrating more authentic-type tasks that include other cognitive skills that just couldn't be assessed on the paper version. For example, what if we wanted to assess the student's ability to handle a real problem-solving situation? To do that, we would have to give the test-taker an authentic task and provide resources, like books, for him or her to attack the problem.

Maybe, by using a multimedia computer (one that lets you surf the Web, for example), the test-taker could generate a self-produced response to the question. That could then get at the student's problem-solving skills and better assess whether he or she is able to perform in an academic program, and even in a future career. Those are some of the things technology offers that could never even be broached with the paper-based test. And the situation I just described could become a distinct possibility by the year 2003.

* The psychometric research on ETS's computer-adaptive testing is published in Applied Psychological Measurement: Swanson and Stocking, Vol. 17, No. 2 (June 1993), pp. 151-156; Stocking, Swanson, and Pearlman, Vol. 17, No. 2 (June 1993), pp. 167-176; and Stocking and Swanson, Vol. 17, No. 3 (September 1993), pp. 277-292.


PROFILE: THE WRITTEN GMAT VS. THE COMPUTERIZED VERSION

GMAT-Verbal
Number of questions: 56 (paper), 41 (CAT)
Number of minutes: 75 (paper), 75 (CAT)

GMAT-Quantitative
Number of questions: 52 (paper), 37 (CAT)
Number of minutes: 75 (paper), 75 (CAT)

The paper test was given in six 25-minute sections with all test questions scored, plus one separate 25-minute session for trial questions. The CAT is given as two 75-minute sections with 9 to 11 unscored trial questions embedded in each section. In both cases, the total GMAT score is a composite of the Verbal and Quantitative scores.

The Analytical Writing Assessment remained unchanged at 30 minutes for each of two essays.






Updated Oct. 8, 1998 by bwwebmaster
Copyright 1998, by The McGraw-Hill Companies Inc. All rights reserved.