None of the Above
What I.Q. doesn’t tell you about race.
by Malcolm Gladwell
December 17, 2007
If what I.Q. tests measure is immutable and innate, what explains the Flynn effect—the steady rise in scores across generations?
One Saturday in November of 1984, James Flynn, a social scientist at the University of Otago, in New Zealand, received a large package in the mail. It was from a colleague in Utrecht, and it contained the results of I.Q. tests given to two generations of Dutch eighteen-year-olds. When Flynn looked through the data, he found something puzzling. The Dutch eighteen-year-olds from the nineteen-eighties scored better than those who took the same tests in the nineteen-fifties—and not just slightly better, much better.

Curious, Flynn sent out some letters. He collected intelligence-test results from Europe, from North America, from Asia, and from the developing world, until he had data for almost thirty countries. In every case, the story was pretty much the same. I.Q.s around the world appeared to be rising by 0.3 points per year, or three points per decade, for as far back as the tests had been administered. For some reason, human beings seemed to be getting smarter.
Flynn has been writing about the implications of his findings—now known as the Flynn effect—for almost twenty-five years. His books consist of a series of plainly stated statistical observations, in support of deceptively modest conclusions, and the evidence in support of his original observation is now so overwhelming that the Flynn effect has moved from theory to fact. What remains uncertain is how to make sense of the Flynn effect. If an American born in the nineteen-thirties has an I.Q. of 100, the Flynn effect says that his children will have I.Q.s of 108, and his grandchildren I.Q.s of close to 120—more than a standard deviation higher. If we work in the opposite direction, the typical teen-ager of today, with an I.Q. of 100, would have had grandparents with average I.Q.s of 82—seemingly below the threshold necessary to graduate from high school. And, if we go back even farther, the Flynn effect puts the average I.Q.s of the schoolchildren of 1900 at around 70, which is to suggest, bizarrely, that a century ago the United States was populated largely by people who today would be considered mentally retarded.
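These projections are straight-line extrapolation at the observed rate. As a rough sketch of the arithmetic, taking a generation to be roughly twenty-five to thirty years:

```latex
\mathrm{IQ}(t) \approx \mathrm{IQ}(t_0) + 0.3\,(t - t_0)
\qquad
\begin{aligned}
\text{children } (+27\ \text{yr}):\;& 100 + 0.3 \times 27 \approx 108\\
\text{grandchildren } (+60\ \text{yr}):\;& 100 + 0.3 \times 60 \approx 118\\
\text{grandparents } (-60\ \text{yr}):\;& 100 - 0.3 \times 60 = 82\\
\text{schoolchildren of 1900 } (-100\ \text{yr}):\;& 100 - 0.3 \times 100 = 70
\end{aligned}
```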
For almost as long as there have been I.Q. tests, there have been I.Q. fundamentalists. H. H. Goddard, in the early years of the past century, established the idea that intelligence could be measured along a single, linear scale. One of his particular contributions was to coin the word “moron.” “The people who are doing the drudgery are, as a rule, in their proper places,” he wrote. Goddard was followed by Lewis Terman, in the nineteen-twenties, who rounded up the California children with the highest I.Q.s, and confidently predicted that they would sit at the top of every profession. In 1969, the psychometrician Arthur Jensen argued that programs like Head Start, which tried to boost the academic performance of minority children, were doomed to failure, because I.Q. was so heavily genetic; and in 1994 Richard Herrnstein and Charles Murray, in “The Bell Curve,” notoriously proposed that Americans with the lowest I.Q.s be sequestered in a “high-tech” version of an Indian reservation, “while the rest of America tries to go about its business.” To the I.Q. fundamentalist, two things are beyond dispute: first, that I.Q. tests measure some hard and identifiable trait that predicts the quality of our thinking; and, second, that this trait is stable—that is, it is determined by our genes and largely impervious to environmental influences.
This is what James Watson, the co-discoverer of DNA, meant when he told an English newspaper recently that he was “inherently gloomy” about the prospects for Africa. From the perspective of an I.Q. fundamentalist, the fact that Africans score lower than Europeans on I.Q. tests suggests an ineradicable cognitive disability. In the controversy that followed, Watson was defended by the journalist William Saletan, in a three-part series for the online magazine Slate. Drawing heavily on the work of J. Philippe Rushton—a psychologist who specializes in comparing the circumference of what he calls the Negroid brain with the length of the Negroid penis—Saletan took the fundamentalist position to its logical conclusion. To erase the difference between blacks and whites, Saletan wrote, would probably require vigorous interbreeding between the races, or some kind of corrective genetic engineering aimed at upgrading African stock. “Economic and cultural theories have failed to explain most of the pattern,” Saletan declared, claiming to have been “soaking [his] head in each side’s computations and arguments.” One argument that Saletan never soaked his head in, however, was Flynn’s, because what Flynn discovered in his mailbox upsets the certainties upon which I.Q. fundamentalism rests. If whatever the thing is that I.Q. tests measure can jump so much in a generation, it can’t be all that immutable and it doesn’t look all that innate.
The very fact that average I.Q.s shift over time ought to create a “crisis of confidence,” Flynn writes in “What Is Intelligence?” (Cambridge; $22), his latest attempt to puzzle through the implications of his discovery. “How could such huge gains be intelligence gains? Either the children of today were far brighter than their parents or, at least in some circumstances, I.Q. tests were not good measures of intelligence.”
The best way to understand why I.Q.s rise, Flynn argues, is to look at one of the most widely used I.Q. tests, the so-called WISC (for Wechsler Intelligence Scale for Children). The WISC is composed of ten subtests, each of which measures a different aspect of I.Q. Flynn points out that scores in some of the categories—those measuring general knowledge, say, or vocabulary or the ability to do basic arithmetic—have risen only modestly over time. The big gains on the WISC are largely in the category known as “similarities,” where you get questions such as “In what way are ‘dogs’ and ‘rabbits’ alike?” Today, we tend to give what, for the purposes of I.Q. tests, is the right answer: dogs and rabbits are both mammals. A nineteenth-century American would have said that “you use dogs to hunt rabbits.”
“If the everyday world is your cognitive home, it is not natural to detach abstractions and logic and the hypothetical from their concrete referents,” Flynn writes. Our great-grandparents may have been perfectly intelligent. But they would have done poorly on I.Q. tests because they did not participate in the twentieth century’s great cognitive revolution, in which we learned to sort experience according to a new set of abstract categories. In Flynn’s phrase, we have now had to put on “scientific spectacles,” which enable us to make sense of the WISC questions about similarities. To say that Dutch I.Q. scores rose substantially between 1952 and 1982 is another way of saying that the Netherlands in 1982 was, in at least certain respects, much more cognitively demanding than the Netherlands in 1952. An I.Q., in other words, measures not so much how smart we are as how modern we are.
This is a critical distinction. When the children of Southern Italian immigrants were given I.Q. tests in the early part of the past century, for example, they recorded median scores in the high seventies and low eighties, a full standard deviation below their American and Western European counterparts. Southern Italians did as poorly on I.Q. tests as Hispanics and blacks did. As you can imagine, there was much concerned talk at the time about the genetic inferiority of Italian stock, of the inadvisability of letting so many second-class immigrants into the United States, and of the squalor that seemed endemic to Italian urban neighborhoods. Sound familiar? These days, when talk turns to the supposed genetic differences in the intelligence of certain races, Southern Italians have disappeared from the discussion. “Did their genes begin to mutate somewhere in the 1930s?” the psychologists Seymour Sarason and John Doris ask, in their account of the Italian experience. “Or is it possible that somewhere in the 1920s, if not earlier, the sociocultural history of Italo-Americans took a turn from the blacks and the Spanish Americans which permitted their assimilation into the general undifferentiated mass of Americans?”
The psychologist Michael Cole and some colleagues once gave members of the Kpelle tribe, in Liberia, a version of the WISC similarities test: they took a basket of food, tools, containers, and clothing and asked the tribesmen to sort them into appropriate categories. To the frustration of the researchers, the Kpelle chose functional pairings. They put a potato and a knife together because a knife is used to cut a potato. “A wise man could only do such-and-such,” they explained. Finally, the researchers asked, “How would a fool do it?” The tribesmen immediately re-sorted the items into the “right” categories. It can be argued that taxonomical categories are a developmental improvement—that is, that the Kpelle would be more likely to advance, technologically and scientifically, if they started to see the world that way. But to label them less intelligent than Westerners, on the basis of their performance on that test, is merely to state that they have different cognitive preferences and habits. And if I.Q. varies with habits of mind, which can be adopted or discarded in a generation, what, exactly, is all the fuss about?
When I was growing up, my family would sometimes play Twenty Questions on long car trips. My father was one of those people who insist that the standard categories of animal, vegetable, and mineral be supplemented with a fourth category: “abstract.” Abstract could mean something like “whatever it was that was going through my mind when we drove past the water tower fifty miles back.” That abstract category sounds absurdly difficult, but it wasn’t: it merely required that we ask a slightly different set of questions and grasp a slightly different set of conventions, and, after two or three rounds of practice, guessing the contents of someone’s mind fifty miles ago becomes as easy as guessing Winston Churchill. (There is one exception. That was the trip on which my old roommate Tom Connell chose, as an abstraction, “the Unknown Soldier”—which allowed him legitimately and gleefully to answer “I have no idea” to almost every question. There were four of us playing. We gave up after an hour.) Flynn would say that my father was teaching his three sons how to put on scientific spectacles, and that extra practice probably bumped up all of our I.Q.s a few notches. But let’s be clear about what this means. There’s a world of difference between an I.Q. advantage that’s genetic and one that depends on extended car time with Graham Gladwell.
Flynn is a cautious and careful writer. Unlike many others in the I.Q. debates, he resists grand philosophizing. He comes back again and again to the fact that I.Q. scores are generated by paper-and-pencil tests—and making sense of those scores, he tells us, is a messy and complicated business that requires something closer to the skills of an accountant than to those of a philosopher.
For instance, Flynn shows what happens when we recognize that I.Q. is not a freestanding number but a value attached to a specific time and a specific test. When an I.Q. test is created, he reminds us, it is calibrated or “normed” so that the test-takers in the fiftieth percentile—those exactly at the median—are assigned a score of 100. But since I.Q.s are always rising, the only way to keep that hundred-point benchmark is periodically to make the tests more difficult—to “renorm” them. The original WISC was normed in the late nineteen-forties. It was then renormed in the early nineteen-seventies, as the WISC-R; renormed a third time in the late eighties, as the WISC III; and renormed again a few years ago, as the WISC IV—with each version just a little harder than its predecessor. The notion that anyone “has” an I.Q. of a certain number, then, is meaningless unless you know which WISC he took, and when he took it, since there’s a substantial difference between getting a 130 on the WISC IV and getting a 130 on the much easier WISC.
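A minimal sketch of that bookkeeping, assuming the canonical drift of 0.3 points per year; the norming dates and the conversion function here are illustrative approximations, not something drawn from Flynn:

```python
# Sketch: re-expressing a score from one WISC version against another
# version's norms, assuming the Flynn effect runs at ~0.3 points/year.
FLYNN_RATE = 0.3  # points per year

NORM_YEAR = {
    "WISC": 1948,      # original, late nineteen-forties (approximate)
    "WISC-R": 1972,    # early nineteen-seventies (approximate)
    "WISC III": 1989,  # late eighties (approximate)
    "WISC IV": 2002,   # "a few years ago" as of 2007 (approximate)
}

def equivalent_score(score: float, test: str, reference_test: str) -> float:
    """Each year between normings inflates scores by ~0.3 points, so a
    score earned against older (easier) norms is discounted accordingly."""
    drift = FLYNN_RATE * (NORM_YEAR[reference_test] - NORM_YEAR[test])
    return score - drift

# A 130 on the original WISC is worth far less against WISC IV norms:
print(equivalent_score(130, "WISC", "WISC IV"))  # ~113.8
```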
This is not a trivial issue. I.Q. tests are used to diagnose people as mentally retarded, with a score of 70 generally taken to be the cutoff. You can imagine how the Flynn effect plays havoc with that system. In the nineteen-seventies and eighties, most states used the WISC-R to make their mental-retardation diagnoses. But since kids—even kids with disabilities—score a little higher every year, the number of children whose scores fell below 70 declined steadily through the end of the eighties. Then, in 1991, the WISC III was introduced, and suddenly the percentage of kids labelled retarded went up. The psychologists Tomoe Kanaya, Matthew Scullin, and Stephen Ceci estimated that, if every state had switched to the WISC III right away, the number of Americans labelled mentally retarded should have doubled.
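A toy version of the same effect at the diagnostic cutoff, again assuming the 0.3-point yearly drift and illustrative norming years: a child whose standing among peers never changes can sit above 70 on an aging norm and fall below it the moment the test is renormed.

```python
# Sketch: how renorming moves the diagnostic goalposts (hypothetical numbers).
FLYNN_RATE = 0.3  # points per year

def observed_score(true_score: float, test_year: int, norm_year: int) -> float:
    """Against norms that are N years old, the whole population has drifted
    up about 0.3 * N points, so an individual's observed score is inflated
    by the age of the norms."""
    return true_score + FLYNN_RATE * (test_year - norm_year)

# The same child, just below the cutoff against fresh norms:
print(observed_score(68.0, 1990, 1972))  # 73.4 on the aging WISC-R: above 70
print(observed_score(68.0, 1992, 1989))  # 68.9 on the new WISC III: below 70
```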