Basic Color Terms and Basic Color Categories

Chapter 11 of Backhaus, Kliegl, and Werner, Color Vision: Perspectives from Different Disciplines. Walter de Gruyter, Berlin and New York, 1998.

C. L. Hardin

Department of Philosophy

Syracuse University

Syracuse, NY 13244


Twenty-five years have passed since the publication of Brent Berlin and Paul Kay’s influential book, Basic Color Terms (1969). After it appeared, there was a flurry of critiques, responses, and further studies. When the work became enshrined in the textbooks, most people took the issue of the nature and implications of basic color term usage to be settled. Strangely enough, exactly what was supposed to have been settled depended on whom you consulted. On the left wing were the unreconstructed cultural relativists who maintained that the Berlin and Kay findings were an artifact of their methods, and that these were riddled with dubious assumptions. As one person recently said on the Internet: "In an earlier posting someone suggested that the 11 ‘basic’ colors were universally recognized in almost all languages. As a student of anthropology, I have often been taught that this is a myth (like the thousands of words eskimos have for snow)." On the right wing were the nativists who found the Berlin-Kay picture totally persuasive. One such person, the author of a notorious recent book on philosophy and color perception, wrote, "The Berlin-Kay basic color categories are simply the product of a set of filters at an early stage of neural processing." (Hardin 1988)

As usually turns out to be the case, the truth lies between these extreme views. Although the Berlin and Kay position was significantly modified in subsequent years, many of its basic tenets have proved to be quite robust, supported by various bits of independent evidence. Nonetheless, there are many important questions that remain unresolved. If the grand multilayered story that the Berlin-Kay work suggests can be completed in its most essential details, it will connect perception, thought, and culture. At the center of the story is the experience of color and its representation in individual consciousness. One branch of the story reaches downward into human biology and the organization of the nervous system. The other branch reaches outward into language and social interaction. Doubtless there will in time be many such stories, but at the present juncture in the history of science, this one is the only viable candidate. There are numerous gaps and holes between levels and within levels. All the same, there is enough on the table to suggest that holes can be filled and links forged.

Here is how the original story went. Berlin and Kay were struck by how easily common color terms could be translated between languages from places as diverse as Tahiti and Mesoamerica. But if, as cultural relativists had suggested, languages divide color space arbitrarily, and moreover, shape the way that their speakers perceive colored objects, how is this possible? To investigate the question, Berlin and Kay proposed criteria to separate the basic from the non-basic color terms of a language. Basic terms are to be those that are general and salient. A term is general if it applies to diverse classes of objects and its meaning is not subsumable under the meaning of another term. A term is salient if it is readily elicitable, occurs in the idiolects of most informants, and is used consistently by individuals and with a high degree of consensus among individuals. To determine the references of the basic color terms of a language, Berlin and Kay used a rectangular array of Munsell color chips of maximum available Chroma (saturation), vertically ordered in ten equal lightness steps, and horizontally ordered by hue, each column differing from its neighbors by a nominal 2.5 Hue steps. (The array was essentially a Mercator projection of the outer skin of the Munsell solid.) The test array was covered by transparent acetate, and each informant was asked, for each basic color term, to mark with a grease pencil (a) the best example, or focus of the color, and (b) the region of chips that could be called by the color term.

The field study on which Basic Color Terms was based used native-speaking informants in the San Francisco Bay Area for 20 languages, supplementing this limited sample with a literature search on 78 additional languages. The synchronic results were that languages vary in numbers of basic color terms, from a minimum of two terms (Papuan Dani) to a (probable) maximum of eleven, Russian and Hungarian being possible exceptions. But no matter how many basic color terms languages might have, their foci tend to cluster reliably in relatively narrow regions of the array, whereas boundaries are drawn unreliably, with low consistency and consensus for any language.

The diachronic conclusion was that if languages are ordered according to numbers of basic color terms, the sequences of encodings of basic color terms are tightly constrained (the conception of successive steps as encodings was subsequently changed by Berlin and Kay). For example, if a language has two basic color terms (a "Stage I" language) those terms will encode black and white. If it has three ("Stage II"), those terms will encode black, white, and red. If it has four ("Stage III"), the terms will be for black, white, red, and either yellow or green. The entire sequence comprises seven stages and eleven basic color terms. Berlin and Kay interpreted these as stages in an evolutionary sequence, and it is this interpretation that has occasioned the greatest controversy. The nature and number of the stages and the rules that govern their development are the points of the Berlin-Kay theses that have been most revised by their authors.
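The implicational character of the original sequence can be made concrete with a small sketch. Stages I through III below follow the description just given; the contents of Stages IV through VII follow the standard presentation of the 1969 scheme and are an assumption of this illustration, as are all of the names used in the code.

```python
# Illustrative encoding of the original (1969) Berlin-Kay sequence.
# Stages I-III follow the text; Stages IV-VII are the standard 1969 reading
# and are assumed here. All identifiers are hypothetical.
_I = frozenset({"black", "white"})
_II = _I | {"red"}
_III_YELLOW = _II | {"yellow"}
_III_GREEN = _II | {"green"}
_IV = _II | {"yellow", "green"}
_V = _IV | {"blue"}
_VI = _V | {"brown"}

STAGES = [
    ("I", [_I]),
    ("II", [_II]),
    ("III", [_III_YELLOW, _III_GREEN]),
    ("IV", [_IV]),
    ("V", [_V]),
    ("VI", [_VI]),
]

# Terms assumed to appear, in no fixed order, only after Stage VI.
LATE_TERMS = frozenset({"purple", "pink", "orange", "gray"})


def classify_stage(terms):
    """Return the stage label for a set of basic terms, or None if the
    inventory is not permitted by the sequence as encoded above."""
    terms = frozenset(terms)
    for label, options in STAGES:
        if terms in options:
            return label
    extras = terms - _VI
    if _VI <= terms and extras and extras <= LATE_TERMS:
        return "VII"
    return None
```

On this encoding an inventory of black, white, and blue is rejected, since blue is not permitted before red, yellow, and green are in place, whereas the eleven English terms are classified as Stage VII.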

The early emergence of black, white, red, yellow, green, and blue, along with the clustering of the focal examples of each of these terms readily suggests an interpretation in terms of Hering’s opponent process theory. This reading of the matter was strengthened when Chad McDaniel (1972), following a suggestion of Bill Wooten’s, showed that when experimental subjects were asked to match their respective focal choices of red, yellow, green, and blue Munsell color chips with monochromatic lights, they chose lights of unique hue. Evidence for the view that hue categories based upon the perception of four elementary Hering hues are conceptually more fundamental than the other hue categories comes from hue-naming experiments with human adults. That the primacy of these Hering hue categories is biologically rather than linguistically based has been argued for by the congruence between adult human color-naming data and color categorization by human infants as well as other primates. Let us consider these points in turn.

A typical hue-naming experiment asks subjects to look at spots of monochromatic light and describe them with a restricted set of hue names prescribed by the experimenter. They are instructed to estimate the percentage of the specified hue that they see in the stimulus. Subjects tend to be quite reliable in performing this task. Charles Sternheim and Robert Boynton (Sternheim and Boynton 1966), along with later investigators, used an estimation procedure to determine the minimum number of hue terms necessary to describe the spectrum completely. The requirement for a stimulus to be described completely was that the percentage estimates of component hues add up to 100. A total component estimation of less than 100 per cent over a spectral range was interpreted as meaning that another hue term was required, whereas if every portion of the spectrum could be specified in terms of 100 per cent totals of the permitted hues, the names of those hues were taken to constitute a descriptively sufficient set for that spectral portion. Thus the term ‘orange’ proves to be unnecessary, since subjects are able to describe stimuli that look orange entirely in terms of yellow and red, whereas if the terms ‘orange’ and ‘green’ are permitted but ‘yellow’ forbidden, yellow-looking stimuli will not be describable as a combination of orange and green. For English speakers, the hue names ‘red’, ‘yellow’, ‘green’ and ‘blue’ prove to be both necessary and sufficient to describe any spectral stimulus.
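The sufficiency criterion just described amounts to a simple check on the summed estimates. The following is a minimal sketch of that check; the function name, the tolerance parameter, and the percentages are invented for illustration and are not data from Sternheim and Boynton.

```python
def insufficient_wavelengths(estimates, tolerance=5.0):
    """Return the wavelengths (nm) at which the permitted hue terms fail to
    account for roughly 100 per cent of the stimulus.

    `estimates` maps wavelength -> {hue term: mean percentage estimate} for
    one experimental condition, i.e. one permitted set of hue terms.
    `tolerance` is a hypothetical allowance for noise in the estimates.
    """
    gaps = []
    for wavelength, percentages in sorted(estimates.items()):
        total = sum(percentages.values())
        if total < 100.0 - tolerance:
            gaps.append(wavelength)
    return gaps


# Hypothetical condition 1: 'orange' forbidden, 'red' and 'yellow' permitted.
orange_forbidden = {
    580: {"red": 5.0, "yellow": 95.0},
    590: {"red": 45.0, "yellow": 55.0},
    600: {"red": 70.0, "yellow": 30.0},
}

# Hypothetical condition 2: 'yellow' forbidden, 'orange' and 'green' permitted.
yellow_forbidden = {
    580: {"orange": 30.0, "green": 10.0},
    590: {"orange": 100.0, "green": 0.0},
}

print(insufficient_wavelengths(orange_forbidden))
print(insufficient_wavelengths(yellow_forbidden))
```

With these made-up numbers the first condition leaves no gap, so ‘orange’ counts as dispensable, while the second leaves the yellow-looking 580 nm stimulus under-described, so ‘yellow’ counts as necessary.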

Hue naming as we have so far described it relies upon language. What about creatures without language? Do they find some categorizations of color continua more natural than others? Four-month-old human infants know precious little English, and they cannot describe what they see. Nevertheless, by watching their eye fixations one can tell whether they see two stimuli as similar or different. Infants will lose interest in a stimulus that looks similar to its predecessor, but continue looking at a stimulus that they regard as different from what went before. By exposing infants to sequences of colored lights whose dominant wavelengths are 20 nm apart, and recording their eye movements, Bornstein, Kessen and Weiskopf (1976) were able to map out their spectral color categories. These proved to line up rather well with the spectral categories of adults that are mapped with color-naming procedures. Using a rather different subject pool, Sandell, Gross, and Bornstein (1979) trained macaques to respond differentially to colored papers that human beings would see as good representatives of their categories, and then presented them with randomized sequences of colored papers that did not match the training stimuli. Their response rates changed markedly as the stimuli crossed human red-yellow-green-blue category boundaries, and were not sensitive to minor variations from monkey to monkey in the actual values of the training stimuli.

All of this suggests that categorization is in an important respect prior to language. This is consistent with what we know about certain brain-damaged people, who can pass color-vision tests as well as produce color names. Though they are unable to apply the color names to objects correctly, they are able to sort colored yarns correctly. However, it may well be the case that the categories available to these people are a function of their past experience with language and with socially influenced color-sorting experiences. We would like to know if there is a set of default categories that members of our species and our closest cousins bring into the world. The infant data suggest that there is, and that these default categories are based upon the Hering primaries. The macaque data are consistent with this, but one must ask what would happen if there had been a wider variety of training samples. In adult human hue naming, when subjects are permitted certain category names such as ‘orange’ over and above the canonical four, the term is always used and the range of the adjacent categories, in this case red and yellow, is correspondingly narrowed.

This was strikingly illustrated in an intriguing experimental series by Tetsuro Matsuzawa (1985), who worked with a female chimpanzee, Ai. Matsuzawa was interested in whether Ai would generalize from paradigmatic Munsell colored papers in the manner of a human being. First using three achromatic focal chips for training, he got a very nice categorical division of the total Munsell gray scale into black, white, and gray. Next, Matsuzawa trained Ai to recognize a focal chip for each of the eleven Berlin and Kay categories. She was to respond to the presentation of a sample by pressing one of eleven keys, each with a special symbol. After the training period, Ai was shown, in random order, one of 40 Munsell chips drawn from the total Berlin-Kay set and forming a hue circle. Quoting Matsuzawa,

Among 440 probe trials in total, there was no trial in which the chimpanzee used the three achromatic color names or "brown". The chimpanzee pressed the key for either red, orange, yellow, green, blue, purple, and pink although all 11 keys were operative. Again, it was found that each name was used categorically. Twenty-eight out of 40 chips (70%) were consistently named, that is, given the same color name throughout 10 sessions. The other 12 chips (30%) were consistently assigned the names of the two adjacent categories. The response latencies to these border colors were longer than those to the consistently named color chips.

Finally, Ai was presented, one sample at a time and in random order, 215 of the 320 color chips that appear in the Berlin-Kay array (the omitted chips are nearly achromatic). Each chip was shown three times. A male graduate student was asked to name the same chips in the same order, with one trial per chip. The results were compared. Again quoting Matsuzawa,

Both Ai and the human observer divided the color space into eight clusters with a broad area within which a single color name was applied consistently. The chimpanzee applied a single color name to 74% of 215 chips; the human subject applied the same name to 79% of the chips. Areas of consistent color naming were separated by narrower areas in which the names applied to the two adjacent areas were used. There were slight differences between the human and the chimpanzee in the location of these border areas.

Once again, the choice of training chip was not crucial, though it is clear that a chip chosen near to or on the boundary between two categories would likely have altered the results. It is important to bear in mind that Ai was given focal specimens and asked to generalize, whereas human beings are given names, and asked to find focal examples. Nonetheless, that a human being and a chimpanzee can agree so closely on the size, shape and location of these categories in color space is a very important argument for the nativist interpretation of basic color naming.

In turn, the very success of the nativist argument puts a significant burden on visual and cognitive scientists. It is one thing to note with satisfaction that basic infant hues are basic Hering hues. It is quite another to give an account of why there are eleven Berlin-Kay categories but only six Hering categories, why some derived categories such as purple are basic but others such as chartreuse are not, and why the categories are so uneven in size and placement. When one examines the color space in three dimensions rather than just looking at the outer skin, as Berlin and Kay did, the problems are even more pronounced.

Working independently of each other and using different color-order systems and different methods, Boynton and his collaborators in the United States (Boynton and Olson 1987; Uchikawa and Boynton 1987) and Lars Sivik in Sweden (1985) asked subjects to name samples drawn from the whole color space. Their determinations of the sizes and locations of the eleven Berlin and Kay categories are quite comparable. Sivik’s methods involved giving informants a color term and asking them to point to those samples on the pages of the Natural Color System (NCS) atlas that they regarded as examples of the color in question, and to rate each one on a scale of 1 to 5, according to how well they thought it exemplified the color in question. Sivik made no attempt to establish criteria of basicness, but he was able to get information about the graded structure of the categories his informants used.

By contrast, Boynton and Conrad Olson, using separately presented samples from the Optical Society of America’s Uniform Color Space (OSA), asked their English-speaking informants to name the samples with monolexical terms, but put no further restrictions on what those terms were to be. After each informant named each of the 424 color chips, presented in random sequence on two separate occasions, Boynton and Olson assessed the consistency with which each subject named the chips as well as the extent of agreement, or consensus, among informants. During the trials, response latency between each presentation and the production of the name was covertly measured. Boynton and Olson found that each of these three salience measures–consistency, consensus, and response time–neatly separated the chips named by Berlin and Kay’s eleven basic color terms from all of the others. Keiji Uchikawa and Boynton repeated the procedure with native speakers of Japanese with essentially the same results. Later data gathered from testing English-speaking two- and four-year-old children and a four-year-old Japanese child were consistent with these findings.
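Two of the three salience measures can be stated quite compactly. The sketch below takes consistency to mean that a subject gave a chip the same name on both presentations, and consensus to mean that every subject did so with one shared name; this is a simplified reading of the procedure, and the chip labels, subject labels, and responses are invented for illustration. Response latency is left out.

```python
def consistent_chips(trials):
    """Chips that a single subject named identically on both presentations.

    `trials` maps chip -> (name on first presentation, name on second).
    """
    return {chip for chip, (first, second) in trials.items() if first == second}


def consensus_chips(data):
    """Map chip -> name for chips that every subject named with the same
    single term on both presentations.

    `data` maps subject -> trials (as accepted by `consistent_chips`).
    """
    if not data:
        return {}
    shared = set.intersection(*(set(trials) for trials in data.values()))
    result = {}
    for chip in shared:
        names = {name for trials in data.values() for name in trials[chip]}
        if len(names) == 1:
            result[chip] = names.pop()
    return result


# Hypothetical responses from two subjects to three chips.
responses = {
    "subject_1": {
        "chip_a": ("red", "red"),
        "chip_b": ("orange", "red"),
        "chip_c": ("peach", "salmon"),
    },
    "subject_2": {
        "chip_a": ("red", "red"),
        "chip_b": ("orange", "orange"),
        "chip_c": ("tan", "tan"),
    },
}

print(consistent_chips(responses["subject_1"]))
print(consensus_chips(responses))
```

In this toy data only the first chip reaches consensus. The third is named consistently by one subject but with a term nobody else shares, which is the pattern the text reports for non-basic terms.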

The Boynton work raises several intriguing issues. Here is one of them. Although all three of the salience measures separate basic from non-basic color terms, none of them distinguishes between the terms for the Hering primaries–which Boynton calls ‘landmark colors’–and the terms for the other basic colors. Greville Corbett and Ian Davies (1996) looked at a number of linguistic as well as salience measures that would not only discriminate basic from non-basic color terms, but also distinguish between the Hering primaries and the other basic terms. The only procedure that they found that reliably made this latter distinction was to ask people to list the color terms that first come to mind. When this is done, the Hering primaries almost always get mentioned first. Why do the other techniques fail to establish such a difference among the primary and derived basics? Boynton is inclined to say that none of the basic colors is more fundamental than the rest, and that whatever native neural machinery makes for color classifications, the category-generating mechanisms for orange are on a par with those for red and yellow. Here is a direct quotation: "I feel it reasonable to suppose that there may be eleven categorically-separate varieties of activity, corresponding to each of eleven kinds of color sensations that are identified by the eleven basic color terms" (Boynton 1996). And this from the man who helped bring us the Sternheim-Boynton color-naming technique that established that red and yellow are more basic than orange!

A second problem has to do with the large differences in size between the regions of color space that are called ‘red’ or ‘yellow’ and the ones that are called ‘blue’ or ‘green’. The latter, but not the former, are found at all lightness levels. To this one might appropriately respond that we call light reds ‘pink’ and dark–or, more accurately, blackened–yellows ‘brown’. This is true enough, but it raises other questions. Consider brown. People differ in their sense of just how singular brown is. Nearly everybody agrees that brown has a certain affinity for yellow and orange, and if anyone is obliged to find the region of color space where it belongs, I would think that they would shove it down toward black, tucked underneath yellow and orange. But many people will hesitate to say that a brown is just a blackened orange or yellow, for brown looks to have a very different quality from those two. People are often incredulous when they are told that, viewed in a bright light through a peephole, a chocolate bar looks to have exactly the same hue as an orange. They are surprised when they see a demonstration in which a projected orange spot is first dimmed, looking orange to the very edge of invisibility, but then blackened by surrounding the orange spot with an annulus of bright white light. When the blackening occurs, the orange spot is transformed into a rich brown. It is as if the original quality has been lost, and replaced by another. Now running a careful Sternheim-Boynton procedure, as Quinn, Rosano, and Wooten (1988) did, shows that brown is, indeed, none other than blackened orange, but the sense of major qualitative alteration persists.

It is important to bear in mind that this appearance of strong qualitative differences is not a general characteristic of blackened colors, most of which resemble their parent hues. Blackened blues continue to look blue, blackened greens, i.e., olive greens, continue to look green. Only oranges and yellows seem to lose the parental connection when blackened. I would suggest that we have here an important phenomenal fact that has not been fully accounted for. To this problem, we might add another. Many light colors of red hue are called pink, particularly if they are whitish and a bit bluish. But whitish reds that are yellowish are called by a variety of non-basic terms, such as ‘salmon’ or ‘peach’, or ‘tan’. None of these terms is used very consistently or with much consensus among users. Boynton comments that this large region is unmarked by a basic color term, in distinction from other regions where basic terms are used, albeit with low consistency and consensus. There is no comparable region elsewhere in the OSA space.

Another question has to do with the absence of basic terms for two regions of binary hue that are marked by such non-basic terms as ‘chartreuse’ and ‘lime’ on the one hand, and ‘aqua’ and ‘turquoise’ on the other. If one takes a simple-minded look at the basic opponent-colors scheme, one finds no reason to expect that the yellow-green binaries and the green-blue binaries would be less salient than the yellow-red binaries that we call ‘orange’ and the red-blue binaries that we call ‘purple’. And yet, Berlin and Kay tell us that there are many languages with basic terms for orange and purple, but not a single one with a basic term for either chartreuse or turquoise.

‘Chartreuse’ is not the most common of words. Many people think that it falls between pink and purple. Suppose, however, that one had a group of people who knew how to use the term correctly. Would their use of the term be closely analogous to their use of ‘orange’? Several years ago, Aleeza Beare and Michael Siegel (1967) had shown that in color-naming experiments when ‘yellow’ and ‘red’ were permitted but ‘orange’ forbidden, subjects would fully describe the spectral range around 590 nm in terms of ‘red’ and ‘yellow’, in complete accordance with the findings of Sternheim and Boynton. But if ‘orange’ were permitted in addition to ‘red’ and ‘yellow’, the ranges of both of these latter terms would be sharply constricted, and neither name would ever be used to describe a 590 nm stimulus. In similar fashion, in using the OSA chip set, Boynton never found a case in which a chip was called ‘yellow’ on one occasion and ‘red’ on another, by either the same or different subjects, whereas many chips were called ‘red’ on one occasion and ‘orange’ on another.

David Miller (1997) explored the comparative uses of ‘orange’ and ‘chartreuse’ by using a forced-choice color-naming procedure in which the available hue names were ‘red’, ‘yellow’, ‘green’, and ‘blue’ along with ‘orange’ and ‘chartreuse’. All of Miller’s subjects were told that chartreuse is "a greenish-yellow or yellowish-green about halfway between green and yellow". They viewed monochromatic lights ranging from 430 to 660 nm, and "were instructed that after a warning tone they would receive a hue term and then must push one of two buttons indicating whether the hue is ‘present’ or ‘absent’." The resulting identification functions for the six hue terms are good matches for the curves obtained in the more usual hue-naming experiments. However, the functions for ‘chartreuse’ and ‘orange’ relate quite differently to their neighbors. ‘Orange’ shows the expected behavior, restricting the ranges of ‘red’ and ‘yellow’, whereas ‘chartreuse’ seems essentially redundant, ‘yellow’ and ‘green’ behaving in the presence of ‘chartreuse’ much the way that ‘red’ and ‘yellow’ behave in the absence of ‘orange’. Taken with the Berlin and Kay data, this suggests a difference between the binaries that we would not have expected from the opponent scheme alone.

On the other hand, it must be acknowledged that it is nowhere written in stone that there can never be more than eleven basic color terms, and Berlin and Kay nowhere assert that nature imposes such an upper limit. Zollinger (1984), for example, has argued that ‘turquoise’ is becoming basic in German, and in a much more careful and thorough set of studies, Corbett and his associates (Corbett and Morgan 1988) maintain that in Russian there are two basic blue terms: ‘goluboi’, focused in sky blue, and ‘sinyi’, focused in navy blue. Comparable claims for a division of reds in Hungarian and of blues in Italian seem much more dubious. Even if Corbett’s group is right about Russian, it seems to be a perceptual fact that navy blue and sky blue look much more akin to each other than yellow and brown, and that fact demands a persuasive explanation.

We have seen that there are several lines of evidence to support Berlin and Kay’s distinction between basic and non-basic color terms, with six of the basic terms having as their referents the Hering primaries. The color categories that these six pick out appear to have a prototypic structure, with the foci for the most part positioned where visual science would have expected them to be, the generalization from the prototypic instances being achieved by mechanisms held in common by members of our species and our animal cousins. It is reasonable to expect that visual science will have a central role to play in explaining the details of color category organization, although it seems clear that our understanding of the color-vision system is not yet fully up to the task. But it also seems clear that although perceptual mechanisms can account for the saliences in our experience, cognitive mechanisms must be invoked to explain how and why certain of these saliences are seized on and exploited, while others play a minor role.

This becomes all the more apparent when we direct our attention to the processes by which color categories receive cultural elaboration and expression. We have every reason to suppose that human visual systems function the same way in almost everyone almost everywhere, whereas the stock of basic color terms varies from place to place and time to time. This is, as Berlin and Kay themselves came to see, not just a matter of labels coming to be attached to pre-existent categories, but of a development of the categories themselves. As Kay put it in a recent conversation,

[Eleanor Rosch’s work with the Dani] got Berlin and me thinking about the way in which the evolutionary sequence was conceived in Basic Color Terms. The way we had talked about it there was in terms of the successive encoding of color foci. The way that sort of talk went was: if a language has a word for D, then it necessarily has words for A, B, and C, where A, B, C and D are focal points of color categories. An image one might use for this conception of an ‘encoding sequence’ might contain a number of colored light bulbs, corresponding to these foci. Initially they’re all off; then the earliest languages turn on the black and white bulbs, and then at the next stage you turn on red, and at the next stage you turn on green, and so on. But, looking at Eleanor Rosch’s data gave us the idea that maybe it would be better to think of it in terms of a partition of the color space and to talk about the successive division of the cells of the original partition.

What Rosch (Heider 1972a and 1972b) had shown was that the two basic color terms of the Dani language of Papua New Guinea, ‘mili’ and ‘mola’, divided the entire color space into two regions. The distribution of the categories that she charted was not a good fit for the previous Berlin-Kay gloss of such a Stage I color system as ‘black plus most dark hues’ and ‘white plus most light hues’. It would be more accurate to describe ‘mili’ as denoting cool or dark colors and ‘mola’ as denoting warm or light colors.

Kay and McDaniel (1978) proceeded to reconceive the developmental scheme as the successive division of macro-categories into smaller categories focused on the Hering primaries, and then the partition of the Hering categories into the other basic, or "derived" categories plus narrower versions of the Hering six. They suggested thinking of the macro-categories as fuzzy-set unions of the Hering categories, and of the derived categories as rescaled fuzzy-set intersections of pairs of the Hering six. Stage I systems (of which Dani is the only attested example) are followed by systems of the Stage II type, with three categories: a white category, a (red plus yellow) category, and a (black plus green plus blue) category. According to Kay and McDaniel, there were two types of four-term systems. Stage IIIa had a white, a (red plus yellow), a (green plus blue), and a black category. Stage IIIb had a white, a red, a yellow, and a (green plus blue plus black) category. One could, they suggested, think of Stage IIIa as having resulted from the (green plus blue plus black) category of Stage II splitting into (green plus blue) and black. One could think of Stage IIIb as having resulted from the (red plus yellow) category of Stage II splitting into separate red and yellow categories.

In the transition from IIIa to IV, the (red plus yellow) category splits into separate red and yellow categories. In the transition from IIIb to IV, (black plus green plus blue) splits into black and (green plus blue), often referred to as ‘grue’. Passing from Stage IV to Stage V, grue splits into green and blue, producing a system with exactly one basic color term for each of the Hering fundamentals. After Stage V, derived terms, based on fuzzy-set intersection, make their first appearance, and this process continues until the Stage VII languages such as contemporary English have developed. Hereafter, we shall call all of this "the Kay-McDaniel sequence" and the sequence plus the fuzzy-set interpretation "the Kay-McDaniel theory". The virtues of the Kay-McDaniel theory were recently set out by Kay in personal conversation:

As far as color per se is concerned the neurology that is devoted strictly to color yields as output only the six Hering fundamentals. Then, the other categories, derived like orange, or composite like (black or green or blue), are dependent on the interaction of the strictly color-devoted circuitry with some kind of general cognitive circuitry that has to do with some kind of general category-forming operations that are not restricted to color. One way to interpret the Kay and McDaniel story is, then, that after you get the six Hering primaries, the color story is over, and you enter general psychology, a general cognitive psychology, as it operates on color stuff, but using psychological processes that may operate on other stuff besides color.

Two caveats are in order here. First, as we shall shortly see, there are reasons to wonder whether it is indeed the case that color science would have nothing else to contribute to the story. For instance, where else should we turn to find out why the warm category always divides before the cool category? Visual salience is surely needed in addition to general cognitive processes. Secondly, it is not clear that fuzzy logic alone will be able to do all the work that Kay and McDaniel ask of it. Orange is a case in point. According to fuzzy logic, since orange is the product (along with a scaling factor) of red and yellow, focal orange should be an example of both red and yellow, yet Boynton found no case in which a chip was called both red and yellow. Here, it is important to distinguish between the color category to which a colored sample is assigned and the hue components that can be distinguished in that sample. Samples called ‘red’ can have a distinct yellow hue component in them–a fact marked in English by the ‘ish’ suffix–without anyone’s being inclined to label the samples as ‘yellow’. Depending upon instructions, subjects can be driven to respond either to an overall category established by the dominant hue, or to the admixed hue. Someone who is asked to affix to a sample a monolexemic color word of her choice will likely respond quite differently from one who is presented with the same sample with the instruction, "Yellow: yes or no". To ask how the orange category is formed from the red category and the yellow category without relativising the question to the task demand or without making the category-component distinction can easily lead to confusion. Beyond that, the fuzzy-logic model would suggest that the behavior of orange and chartreuse should be the same, but Miller’s study indicates the contrary. Such considerations call the Kay-McDaniel theory into question, but leave the Kay-McDaniel sequence intact–for the moment at least.
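The fuzzy-set reading at issue can be illustrated with a toy model. The sketch below uses the standard fuzzy operations (union as maximum, intersection as minimum). The triangular membership functions, the one-dimensional hue coordinate with focal red at 0 and focal yellow at 1, and the scaling factor of 2 are all assumptions made for illustration, not values taken from Kay and McDaniel.

```python
def fuzzy_union(*memberships):
    """Standard fuzzy union: the maximum of the membership degrees."""
    return max(memberships)


def fuzzy_intersection(*memberships):
    """Standard fuzzy intersection: the minimum of the membership degrees."""
    return min(memberships)


def triangular(focus):
    """Build a hypothetical membership function peaking at `focus` and
    falling linearly to zero one unit away on an arbitrary hue axis."""
    def membership(hue):
        return max(0.0, 1.0 - abs(hue - focus))
    return membership


red = triangular(0.0)      # focal red placed at 0 (assumed)
yellow = triangular(1.0)   # focal yellow placed at 1 (assumed)


def warm(hue):
    """Macro-category (red plus yellow) as a fuzzy union."""
    return fuzzy_union(red(hue), yellow(hue))


def orange(hue, scale=2.0):
    """Derived category as a rescaled fuzzy intersection; `scale` is an
    assumed factor chosen so that the peak membership is 1."""
    return scale * fuzzy_intersection(red(hue), yellow(hue))


for hue in (0.0, 0.5, 1.0):
    print(hue, red(hue), yellow(hue), warm(hue), orange(hue))
```

In this toy model the best example of orange, at 0.5, is a member of both red and yellow to degree 0.5. That is the feature of the fuzzy-set account that sits awkwardly with the naming data, in which no chip was called both red and yellow.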

In the 1970s, while the Kay-McDaniel theory was being hatched, Berlin and Kay were planning the ambitious World Color Survey with the assistance of William Merrifield and the Summer Institute of Linguistics, which trains Protestant evangelists in linguistic techniques. The SIL students, who are dispatched to the most remote regions of the world, were instructed in color term elicitation, interviewing techniques, and the use of a standard randomized array of Munsell color chips as well as a miniaturized color sample array. The objective was to gather data in the field from a wide variety of languages, with 25 monolingual informants for each language. In addition to their efforts, and those of several other workers, Robert MacLaury conducted a wide-ranging Mesoamerican Color Survey, using the WCS methods as well as some innovations of his own. Altogether, the WCS now has analyzed data on 111 exotic languages. Berlin and Kay are now preparing the WCS results for publication, and MacLaury’s book on the Mesoamerican Survey (MacLaury 1996) and its theoretical interpretation is now in press. These investigations have raised several methodological and theoretical issues. In particular, new data from the World Color Survey have required substantial complications to the Kay-McDaniel sequence (Kay, Berlin, Merrifield, and Maffi 1996). These have been recently discussed from different theoretical perspectives by Kay, Berlin, and Merrifield (1991) on the one hand, and by MacLaury (1992) on the other.

I shall have nothing particular to say about these complications, save for two remarks. The first of these is that the results from some of the languages in which there seem to be very broad categories embracing mid-lightness yellow, green, and blue cease to be quite so puzzling when the categories are seen as marking distinctions primarily in brightness rather than in hue. MacLaury (1992) has suggested a separate brightness sequence for some languages that later merges with Kay-McDaniel hue sequences. Kay is sympathetic to modifying the sequence along these lines. This would correct a bias toward hue categories in the Berlin-Kay tradition, a bias that has been pointed out from the beginning by Hickerson (1971) and others. As Ronald Casson (1996) tells us, "The eight Old English terms that survived and evolved into basic color terms were brightness terms that had minor hue senses (except red and green, which had major hue senses)."

My second comment is that the basic color vocabularies of the great majority of languages are consistent with the Kay and McDaniel sequence. The exceptions to it are striking and interesting, but they are few. Furthermore, skepticism from some quarters notwithstanding, there are solid independent reasons for taking it to be describing an historical process. Historical linguists have performed several reconstructions of earlier states of present-day languages, proto-Mayan, proto-Polynesian, and Anglo-Saxon for example, and have generally found the roots of their basic color terms to be in accord with the Kay-McDaniel sequence. And some of the WCS interviews are best interpreted as marking languages in transition from an earlier to a later stage, with the speakers who use a new basic term tending to be younger than those speakers who do not use the new term. It is of no small interest that in virtually all cases linguistic forms that undergo development in time devolve as well as evolve. According to linguistic typologist Greville Corbett (oral communication), there are just two known counterexamples to this: numeral systems and systems of basic color terms.

There remain the questions of what drives the evolution of basic color terms, and what, beyond the resources of fuzzy-set logic, constrains the patterns of basic color term development. The answer or, more likely, answers to the first question lie at least in part within the domain of social theory. Some authors see the development of basic color terms as related to social complexity, though what one is to understand by "social complexity" is a question that is bound to be provocative. Others link the development to color technology; Ronald Casson (1996) finds a burgeoning of hue terms in early modern English that is contemporaneous with the growth of the dye industry. Berlin and Kay themselves speculate that the development of systems of basic color terms is driven by the need to communicate about color in the absence of standard exemplars and shared assumptions about the environment. MacLaury wants to appeal to individual motivation, for example in a penchant for wanting to mark finer distinctions than one’s stock of basic color terms currently permits.

More to the point of our concerns here is to identify the sources of constraint in the development of basic color terms. Why do the Dani divide color space into a warm-light (mola) region and a dark-cool (mili) region? An answer that occurs to us all is implicit in the gloss itself: the colors called ‘warm’ and the colors called ‘cool’ seem to form natural resemblance classes. The division is robust, and the same cross-modal semantic labels are reliably used in different languages: Sivik and Taft (1990, 1991) have determined this for English, Swedish, Russian, and Croatian. An interesting question here is whether red and yellow on the one hand, and blue and green on the other, are lumped together because of their association with warmth and coolness respectively, or whether there are intrinsic resemblances that are marked by the associational terms. I strongly incline to the second opinion, and I suspect that most people do. But if this view is correct, there is an important phenomenal fact to be accounted for, and not much in the way of currently available resources to do the accounting. (It is interesting to notice that Hering gave red and yellow positive values, and green and blue negative values. This assignment of sign to chromatic valence has persisted to the present, although it is now officially viewed as only conventional.) Is this a question for visual rather than cognitive science? I am inclined to think so, but I would like to see the whole matter investigated more fully. An important first step in exploring this matter was undertaken by Katra and Wooten (1996), who found that subjects’ ratings of color chips as "warm" or "cool" closely tracked their opponent-response functions.

If there were to be such an investigation, it would have to take account of some complicating facts. First of all, it is not really correct to say that the warm-cool distinction is a matter of red and yellow versus blue and green. The poles seem to be focal orange on the one side and dark blue on the other. Artists typically see yellowish greens as warm. In establishing his color-order system, Albert Munsell assigned the designation ‘5G’ to a green that was, in his words, "neither warm nor cool". This may have something to do with one of the exceptions to the Kay-McDaniel sequence that turned up in the Mesoamerican and World Color Surveys. Some Northwest American Indian languages have a composite (yellow plus green) category (MacLaury 1987). A second complication is that although hue plays an important role in subjects’ judgments of warmness and coolness, Sivik and Taft’s investigations showed that blackness and chromaticness play a role as well. The problem here is that, according to Sivik and Taft, blacker colors are judged as "warm" and whiter colors as "cool", whereas not only the Dani but all other early-stage languages form composite categories in which black and many dark colors are combined with greens and blues, while white and many light colors are linked with reds and yellows.

Another striking feature of the Kay-McDaniel sequence, to which the World Color Survey provides no exceptions, is the invariable tendency of the "warm" macrocategory to divide before the "cool" macrocategory. There are many languages, particularly in Mesoamerica, that have basic categories for red, yellow, and white, along with an intact "grue"–green plus blue–category. There are, on the other hand, no languages with separate basic terms for green and blue that retain an undifferentiated red plus yellow category. Robert MacLaury has remarked in conversation that this would be readily explained if green were more similar to blue than red is similar to yellow. This seems intuitively plausible, and it is certainly interesting to notice that people seem to argue more frequently about whether a given sample is "really" blue rather than green than whether it is "really" red rather than yellow. And we have already noted the rather striking fact, remarked upon by both Sivik and Boynton, that green and blue cover large areas of phenomenal color space whereas the "warm" region is differentiated much more finely into red, yellow, orange, pink, and brown.

Perhaps we can now begin to move beyond considerations of mere plausibility on these matters. A suggestive recent study by Fenton (1997), whose investigations were conducted quite independently of the Berlin-Kay tradition, gives some experimental support to MacLaury’s conjecture. Fenton asked 40 subjects in their twenties to judge by pairs, on a seven-point scale, the similarity of the appearance of each of the "most typical" examples of the colors red, blue, green, yellow, black, and white. Analyzing the results with a weighted multidimensional scaling method, Fenton derived a three-dimensional group color space in which colors judged more similar were closer together and colors judged less similar were further apart. Blue and green were quite close together, as were white and yellow, with red removed from the other colors by the greatest distance, though closer to yellow than to blue or green. Black was closer to blue and green than to red or yellow, which connects with the WCS finding that in early stages of color categorization, black tends to be associated with blue and green.
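To make the shape of such an analysis concrete, the following minimal sketch shows only the preparatory step that precedes any multidimensional scaling: averaging pairwise similarity ratings across subjects and converting them to dissimilarities. It is not Fenton's procedure, and it does not implement weighted multidimensional scaling. The ratings are invented placeholders, not Fenton's data, and the conversion rule (scale maximum minus mean rating) and the restriction to four colors are assumptions made for brevity.

```python
from itertools import combinations
from statistics import mean

# Four of the six colors, to keep the example short (an assumption).
COLORS = ["red", "yellow", "green", "blue"]
MAX_RATING = 7  # top of the seven-point similarity scale

# Invented placeholder ratings from two hypothetical subjects.
# These are NOT Fenton's data; they only illustrate the computation.
ratings = [
    {("red", "yellow"): 3, ("red", "green"): 1, ("red", "blue"): 1,
     ("yellow", "green"): 3, ("yellow", "blue"): 2, ("green", "blue"): 6},
    {("red", "yellow"): 4, ("red", "green"): 2, ("red", "blue"): 1,
     ("yellow", "green"): 2, ("yellow", "blue"): 1, ("green", "blue"): 5},
]


def dissimilarities(all_ratings):
    """Average each pair's similarity over subjects, then invert the scale."""
    result = {}
    for pair in combinations(COLORS, 2):
        result[pair] = MAX_RATING - mean(r[pair] for r in all_ratings)
    return result


d = dissimilarities(ratings)
closest = min(d, key=d.get)  # the pair with the smallest dissimilarity

if __name__ == "__main__":
    for pair, value in sorted(d.items(), key=lambda item: item[1]):
        print(pair, value)
```

A scaling method would then seek a low-dimensional arrangement of points whose inter-point distances approximate these dissimilarities as closely as possible.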

This may be compared with an earlier experiment by James Boster (1986) that seems never to have been repeated. Boster presented each of his subjects, 27 English-speaking adults who were naive to the purposes of the experiment, with eight color chips, which were focal Munsell examples of red, orange, yellow, green, blue, purple, black, and white. The instructions were as follows:

What I would like you to do is sort these colors into two groups on the basis of which colors you think are most similar to each other. Now, I don’t want you to do it on the basis of anything that you’ve learned in school, like the difference between primary and secondary colors, nor on the basis of any personal associations you might have with the colors, such as which colors you like and dislike, just try to make two natural groupings. Imagine you speak a language that only has two color words, how would you choose to divide up the colors and which colors would you put together in each group.

The initial division was recorded, and subjects were asked to take one of the groups and divide it, then take the other group and divide it. The procedure was repeated until all of the chips were separated. The results of the sort were compared with the outcomes of a sorting procedure undertaken by another group of 32 experimentally naive English speakers, which was similar except that the experimental materials were English color words rather than chips. (Fenton’s subjects also used words rather than chips, being asked to base their similarity judgments on imagined "best examples.") Statistical analysis of the results of the two sorts performed by Boster’s subjects showed them to be closely similar, and both were highly correlated with the Kay-McDaniel basic-term sequence. For example, in the first sort, 67% of the subjects in the non-verbal group and 62% of the verbal group replicated the Dani division, with white, yellow, orange, and red in the first pile, and green, blue, purple, and black in the second. Members of these majority groups, according to Boster,

...completed the task more rapidly than other subjects. They tended to describe the (W, Y, O, R) group as ‘light’, ‘bright’, or ‘warm’ colors, and less frequently as ‘brash’, ‘vibrant’, ‘loud’, and ‘glowing’. They most often described the (G, B, P, K) group as ‘dark’ or ‘cool’ colors, and less often ‘soothing’, ‘receding’, ‘relaxing’, ‘deep’, ‘subdued’, ‘shady’, ‘quiet’, and ‘drab’. In general, most of the descriptions of the "light" colors connote energy and irritation, while the descriptions of the ‘dark’ colors are almost uniformly tranquil.
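The first-sort tally described above amounts to a simple computation, sketched here under stated assumptions. The function names and the four sample sorts are hypothetical and invented for illustration; they are not Boster's data or his analysis code. A subject's first division counts as replicating the Dani division when its two piles are exactly (white, yellow, orange, red) and (green, blue, purple, black), in either order.

```python
# The two piles of the Dani-style division described in the text.
WARM_LIGHT = frozenset({"white", "yellow", "orange", "red"})
DARK_COOL = frozenset({"green", "blue", "purple", "black"})


def replicates_dani(first_split):
    """True if a two-pile split matches the warm-light / dark-cool division."""
    pile_a, pile_b = (frozenset(pile) for pile in first_split)
    return {pile_a, pile_b} == {WARM_LIGHT, DARK_COOL}


def percent_replicating(splits):
    """Percentage of subjects whose first split matches the Dani division."""
    return 100.0 * sum(replicates_dani(s) for s in splits) / len(splits)


# Invented first sorts from four hypothetical subjects (NOT Boster's data).
sample_splits = [
    (["white", "yellow", "orange", "red"], ["green", "blue", "purple", "black"]),
    (["black", "purple", "blue", "green"], ["red", "orange", "yellow", "white"]),
    (["white", "yellow"], ["orange", "red", "green", "blue", "purple", "black"]),
    (["white", "black"], ["red", "orange", "yellow", "green", "blue", "purple"]),
]

if __name__ == "__main__":
    print(percent_replicating(sample_splits))
```

Pile order and chip order within a pile are deliberately ignored, since the sorting task itself imposes neither.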

Boster’s experiment seems to me to be pregnant with possibilities that cry out for elaboration and extension. For example, would the results be the same if performed by speakers of a Stage III or a Stage V language? What would happen if larger numbers of chips had been used, and the division continued? How would chartreuse chips fare when compared with orange chips? Would Russians divide blues into a sinyi, dark blue group and a goluboy, light blue group? Would, say, Turks make such a division? But beyond all this, what would explain the regularities in the partitions? Taken together, the Boster and Fenton studies suggest that perceived similarities determine the development of color categorization, but neither tells us what lies behind those perceptions of similarity. Basic color language seems to be tracking visual saliences, but what governs those saliences remains obscure.

I think that this brief survey gives us good reason for thinking that the chief features of the Berlin-Kay synchronic findings remain intact, as do the chief trends of the Kay-McDaniel sequence. These features and trends are real and robust, but the grand picture calls out for completion. Psychophysicists need to establish the detailed character of the visual saliences that underlie basic color categories and macro-categories, and ask how biological mechanisms might account for those saliences. Cognitivists need to ask about the dynamics that propel the development of color categories, and whether those dynamics are operative in the formation of other kinds of basic experiential categories as well. Philosophers need to mark the proper conceptual distinctions so that the emerging picture can be kept in focus, as well as consider whether this should serve as a test case for theories of how language is fitted to the world. There is work here for everybody.




Beare, A.C. and M.H. Siegel 1967. Color name as a function of wavelength and instruction. Perception and Psychophysics 2(11): 521-527.

Berlin, B. and P. Kay 1969. Basic Color Terms: Their Universality and Evolution. Berkeley: University of California Press.

Berlin, B., P. Kay, and W.R. Merrifield 1985. Color term evolution: Recent evidence from the World Color Survey. Paper presented at the 84th Annual Meeting of the American Anthropological Association, Washington, D.C.

Bornstein, M.H., W. Kessen, and S. Weiskopf 1976. The categories of hue in infancy. Science 191(4223): 201-202.

Boynton, R.M. and C.X. Olson 1987. Locating basic colors in the OSA space. Color Research and Application 12: 94-105.

Boynton, R.M. 1997. Insights Gained from Naming the OSA Colors. In C.L. Hardin and L. Maffi (eds.) Color Categories in Thought and Language. Cambridge: Cambridge University Press.

Boster, J.S. 1986. Can individuals recapitulate the evolutionary development of color lexicons? Ethnology 25(1): 61-74.

Casson, R.W. and P.M. Gardner 1992. On brightness and color categories: Additional data. Current Anthropology 33(4): 395-399.

Casson, R.W. 1997. Color Shift: Evolution of English Color Terms from Brightness to Hue. In C.L. Hardin and L. Maffi (eds.) Color Categories in Thought and Language. Cambridge: Cambridge University Press.

Corbett, G. and I. Davies 1997. Establishing basic color terms: Measures and techniques. In C.L. Hardin and L. Maffi (eds.) Color Categories in Thought and Language. Cambridge: Cambridge University Press.

Corbett, G. and G. Morgan 1988. Colour terms in Russian: Reflections of typological constraints on a single language. Journal of Linguistics 24: 31-64.

Fenton, Mark 1997. Psychological approaches to the investigation of colour: Meaning and representation. Spectrum (The Journal of the Colour Society of Australia) 11: 4-9.

Hardin, C.L. 1988. Color for Philosophers: Unweaving the Rainbow. Indianapolis: Hackett Publishing Company.

Hardin, C.L. and L. Maffi (eds.) 1997. Color Categories in Thought and Language. Cambridge: Cambridge University Press.

Heider [Rosch], E. 1972a. Universals in color naming and memory. Journal of Experimental Psychology 93(1): 10-20.

Heider [Rosch], E. 1972b. Probabilities, sampling and the ethnographic method: The case of Dani colour names. Man 7(3): 448-466.

Hickerson, N. 1971. Review of Basic Color Terms: Their Universality and Evolution, by B. Berlin and P. Kay. International Journal of American Linguistics 37: 257-270.

Katra, E. and B. Wooten 1996. Perceived lightness/darkness and warmth/coolness in chromatic experience. Submitted to Color Research and Application.

Kay, P. 1975. Synchronic variability and diachronic change in basic color terms. Language in Society 4: 257-270.

Kay, P. and C.K. McDaniel 1978. The linguistic significance of the meanings of basic color terms. Language 54(3): 610-646.

Kay, P., B. Berlin, and W.R. Merrifield 1991. Biocultural implications of systems of color naming. Journal of Linguistic Anthropology 1(1): 12-25.

Kay, P., B. Berlin, W.R. Merrifield, and L. Maffi 1997. Color naming across languages. In C.L. Hardin and L. Maffi (eds.) Color Categories in Thought and Language. Cambridge: Cambridge University Press.

MacLaury, R.E. 1987. Color-category evolution and Shuswap yellow-with-green. American Anthropologist 89: 107-124.

MacLaury, R.E. 1992. From brightness to hue: An explanatory model of color-category evolution. Current Anthropology 33: 137-186.

MacLaury, R.E. 1997. Color and Cognition in Mesoamerican Languages: Constructing Categories as Vantages. Austin: University of Texas Press.

Maffi, L. 1988b. World Color Survey Typology. University of California, Berkeley: Unpublished ms.

Maffi, L. 1990a. Cognitive anthropology and human categorization research: The case of color. University of California, Berkeley: Unpublished ms.

Maffi, L. 1991. A Bibliography of Color Categorization Research 1970-1990. In Basic Color Terms: Their Universality and Evolution, by B. Berlin and P. Kay, 1st paperback edition. Pp. 173-189. Berkeley: University of California Press.

Matsuzawa, T. 1985. Colour naming and classification in a chimpanzee (Pan troglodytes). Journal of Human Evolution 14: 283-291.

McDaniel, C. K. 1972. Hue perception and hue naming. Unpublished B.A. Thesis, Harvard University.

Miller, D. 1997. Beyond the Elements: Investigations of Hue. In C.L. Hardin and L. Maffi (eds.) Color Categories in Thought and Language. Cambridge: Cambridge University Press.

Quinn, P.C., J.L. Rosano and B.R. Wooten 1988. Evidence that brown is not an elemental color. Perception and Psychophysics 43(2): 156-164.

Ratliff, F. 1976. On the psychophysiological bases of universal color terms. Proceedings of the American Philosophical Society 120(5): 311-330.

Sandell, J.H., C.G. Gross, and M.H. Bornstein 1979. Color categories in macaques. Journal of Comparative and Physiological Psychology 93: 626-635.

Sivik, L. 1997. Color systems for cognitive research. In C.L. Hardin and L. Maffi (eds.) Color Categories in Thought and Language. Cambridge: Cambridge University Press.

Sivik, L. 1985. Mapping of color names in NCS. Proceedings of the 5th Congress of the AIC, Monte Carlo, Monaco.

Sivik, L. and C. Taft 1990. Semantic variables for the evaluation of color combinations–An analysis of semantic dimensions. Göteborg Psychological Reports 19(5).

Sivik, L. and C. Taft 1991. Cross-cultural studies of color meaning. Proceedings of AIC-conference on Color and Light ‘91, Sydney.

Sternheim, C.E. and R.M. Boynton 1966. Uniqueness of perceived hues investigated with a continuous judgmental technique. Journal of Experimental Psychology 72: 770-776.

Uchikawa, K. and R.M. Boynton 1987. Categorical color perception of Japanese observers: Comparison with that of Americans. Vision Research 27: 1825-1833.

Yoshioka, T., B.M. Dow, and R.G. Vautin 1996. Neural mechanisms of color categorization in areas V1, V2, and V4 of macaque monkey cortex. Behavioural Brain Research 76: 51-70.

Zollinger, H. 1984. Why just turquoise? Remarks on the evolution of color terms. Psychological Research 46 (4): 403-409.