An essay using words, not pictures, by Phil Eklund, Dec 2014
This is a guest post by Phil Eklund, founder of Sierra Madre Games and designer of the board games Origins: How We Became Human, High Frontier, Pax Porfiriana, and Greenland. This essay is to be published in Neanderthal, a board game currently in development by Sierra Madre Games about the origins of art, culture, and language. Pre-orders for Neanderthal will be taken starting May 01, 2015, at www.sierra-madre-games.eu.
Read Phil’s previous guest post on Greenland, games, simulation, and reality.
Are you a visual person or a phonetic one? Do you think better in images or words? Most would insist upon the former. But most are wrong.
A MODERN HUNTER-GATHERER.
Introspect on one of your thoughts. Pick a simple one, perhaps “I should go to the market today.” A string of seven words with grammar and syntax. For a phonetic person, this is a few bytes in temporary memory, enabling a life-sustaining decision within a fraction of a second. How would a visual person express this thought and come to a decision without words? Perhaps with a video of herself walking to the market. This would take megabytes of memory and a minute of time.
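The size gap can be put in rough numbers. The sketch below is only a back-of-the-envelope illustration: the video bitrate of one megabit per second is an assumption chosen for the example, not a figure from this essay, and the helper names are hypothetical.

```python
# Assumed for illustration only: a modest compressed-video bitrate.
ASSUMED_VIDEO_BITS_PER_SECOND = 1_000_000


def sentence_bytes(sentence: str) -> int:
    """Bytes needed to store the sentence as UTF-8 text."""
    return len(sentence.encode("utf-8"))


def video_bytes(seconds: float,
                bits_per_second: int = ASSUMED_VIDEO_BITS_PER_SECOND) -> int:
    """Bytes needed for a video of the given length at the assumed bitrate."""
    return int(seconds * bits_per_second / 8)


thought = "I should go to the market today."
print(sentence_bytes(thought))  # 32 bytes of text
print(video_bytes(60))          # 7500000 bytes for one minute of video
```

Under that assumption, the one-minute video is several hundred thousand times larger than the sentence.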
THE PRONOUN PROBLEM.
But this is only the beginning of difficulties for the visual person. A simple video of a walking person in no way expresses the pronoun “I” as an analog for oneself. Animals do not recognize or identify with pictures or movies made of themselves. The very concept of a pronoun can’t be expressed in images, perhaps the reason why primitive tongues, for instance in the pre-Columbian New World, do not have the pronoun concept.
UNIT FORMATION.
Other words in our sample thought that cannot be translated into pictures include the modal verb “should,” the preposition “to,” and the adverb “today.” Yet all are vital for the proper decision-making enactment of this thought. The verb “go” can be expressed in a movie, but only in an excessively concrete-bound fashion. For instance, a sequence showing a walking person excludes the possibility of alternatives, such as a bike or car. But for a phonetic person, the verb “go” expresses the action without specifying the means. This “unit-formation” is very much like algebra, where “X” is a unit of any unspecified value. Unit economy enables human cognition to reduce a vast amount of sense information to a minimal number of units, and the unit becomes the link between mathematics and reality.
MEASUREMENT.
A standard named as a word automatically becomes a unit appropriate for the first measurements of time or distance beyond the perceptual level–for instance, “X” days, or “X” feet. Tally sticks, the first archaeological evidence of measurement, date to more than 20,000 years ago.
WORDS VERSUS CONCRETES.
This “algebraic” flexibility of a word encapsulates the essence of something while leaving unnecessary concretes out. A photo doesn’t and can’t. Further, a word offers enormous flexibility in terms of input/output. It can be spoken, thought, gestured (as in sign language), written, grammatically combined with other words, or stored with very little memory. A photo can’t. Words are altered by syntax and grammatical endings. A photo can’t be modified in this way, except through the temporal sequence in which a series of photos is viewed. As Aristotle proved in the Organon, words can be logically combined into propositions, arguments, premises, and conclusions. Finally, a word can be metaphoric, thus opening up mental “portals” and integrating elements through induction. A photo can’t.
For instance, the phonetic person uses the noun “market” as a placeholder for any market, leaving the decision of which market for a later thought. But the concrete-bound visual person must picture a specific market. Imagine what this means if she actually tried to come to a decision without using any words. She would have to run a video in her head of every possible means of transport going to each possible destination. If selecting between three modes of travel and three possible destinations, she would have to watch nine films, recall the details of each one, and then choose among them, making all the decisions at the same time. Add just a few more variables, and the human head would run out of memory space (video is a memory hog). But a phonetic person would think: “I should go to the market today. Maybe that new one I haven’t tried yet. But I’ll take the car, since the forecast is for snow.” This leads to three reasoned decisions, made one after the other, with almost unlimited syntactic variability! This is because every word is an economy of thought worth a thousand pictures.
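The arithmetic behind the nine films can be sketched in a few lines. This is a minimal illustration, assuming made-up option names and hypothetical helper functions. It contrasts picturing every full combination at once with weighing each choice on its own, one after the other.

```python
from itertools import product


def joint_scenarios(options: dict) -> list:
    """Every full combination the visual thinker would have to picture."""
    return list(product(*options.values()))


def sequential_evaluations(options: dict) -> int:
    """Options weighed when each decision is made separately, in turn."""
    return sum(len(choices) for choices in options.values())


# Hypothetical example values, chosen only for illustration.
options = {
    "transport": ["walk", "bike", "car"],
    "market": ["old market", "new market", "corner shop"],
}

print(len(joint_scenarios(options)))    # 9 films
print(sequential_evaluations(options))  # 6 options weighed in sequence

# One more three-way variable multiplies the films but only adds to the words.
options["day"] = ["today", "tomorrow", "Saturday"]
print(len(joint_scenarios(options)))    # 27 films
print(sequential_evaluations(options))  # 9 options weighed in sequence
```

The joint count grows as a product of the options while the sequential count grows as their sum, which is why a few more variables overwhelm the purely visual approach.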
Aristotle realized that words for qualities or essences rather than concretes, such as “market” and “go,” were the common ground between the objective and subjective worlds, between reality and our metaphors for reality. In this, he disagreed with his teacher Plato, who maintained that qualities were in a higher plane of reality, of which we only experience the shadows. Today, neo-Platonism is the dominant philosophy, although a few Aristotelians, such as myself, remain.
SELF-COMMUNICATION.
Notice that the primary use of words is communicating with yourself, not with others! The key is that a phonetic person is capable of unit-formation, defining “unit” as “a concrete regarded as a separate member of a group of multiple similar members.” For instance, various specific markets are units subsumed under the concept “market.” The ability to regard entities as units is distinctive to modern lingual humans. This is why animals and pre-lingual babies are unable to count beyond the number of concrete objects they can subitize (i.e. metaphysically perceive, up to about seven objects).
THE VERB PROBLEM.
Koko and Kanzi, the famed primates able to communicate using American Sign Language, are quite adept with adjectives and nouns but seem unable to formulate verbs or sentences. What is the problem with verbs? Because every action needs an actor, a verb cannot be pictorially visualized without a noun. This bonds the verb to a specific actor: “I go to the market.” But unlike a picture, the word “go” divorces the action from the actor, as well as from specific means of going. Koko and Kanzi seem able to use gestural words with others but not as a unit of thought.
VERBAL HALLUCINATIONS.
99% of your thoughts are verbal hallucinations. You imagine hearing the spoken word rather than seeing the written word. Try counting in your head from, say, 10 to 20. Introspect. You mentally heard the words “ten,” “eleven,” “twelve,” etc. in your preferred language. Now try again, appending images of the Arabic numbers “10,” “11,” “12,” etc. to the spoken words. This slows down your counting, but with practice, it can be done. Now try imagining the “10,” “11,” “12,” etc. without hearing the words. You will find it impossible. Purely notational icons such as Arabic numerals, punctuation marks, arrows, stop signs, etc. are useful only in their written form. Only icons you can “hear” can be retrieved and processed as verbal hallucinations, which is why you can speak before you can read.
Much of this “aha” cognition takes place below our verbal stream of consciousness, including judgments, reasoning, induction, pattern recognition, and learning. Oceans of information are automatically processed this way using ancient pathways, as revealed in the Marbe experiments at the Würzburg school. I do not call these cognitions “subconscious,” reserving this term for the daily routines we consciously program ourselves to perform automatically.
THE BIG USELESS BRAIN MYSTERY.
The human and Neanderthal brains reached their modern size 200 thousand years ago (kya), yet the archaeological record shows no advances in tools or behavior until 45 kya. What were they doing in all that time with their huge brains? Perhaps building up a social vocabulary used for courtship, dominance, and status games. Perhaps to verbally express who dominates whom and who belongs to whom. These vocalisms, like all animal communication such as warning cries, birdsong, and snarls, are expressions of behavior but play no internal role in how behavior is decided. But one day, the first word was uttered, defining “word” as a vocalization that can be both spoken and used as a cognitive unit.
THE FIRST WORD.
What was this first word? Perhaps someone’s name! Names associated with rank or class may have been used for millennia and imitated through the generations. But on one fertile day around 45 kya, someone used a verbal hallucination of a name as a cognitive placeholder, perhaps to untangle a social problem. This was the first word, and the first verbal thought, as well as the most momentous invention in human experience. She was not conscious of her thought, as consciousness was still thousands of years in the future. But sharing spoken and mental concepts was an effective means to mentally manipulate the named person. And the spread of words from the social domain to the brain domains associated with technical and natural history knowledge allowed explosive progress in tools and food storage.
Many young birds and mammals use an “imprinting” algorithm to recognize their mother and others of their species. This mechanism, quite distinct from associative learning, occurs in the brief period after the beginning of locomotion and before the onset of fear. It’s very speculative, but perhaps the communicative sound of the word “mama” in a prehistoric family became imprinted on a child, who was able to use this sound in her thoughts as an icon for her mother. If so, the very first word may have been the first word of most children: “mama.”
VERBAL MEMORIES.
The use of words allowed human memories to be stored in the new verbal format. The few exceptions seem to be sensations, such as tastes, smells, and touch. But what about the remembered and imagined images anyone can form in their head? Today’s brain has the remarkable ability to retrieve strings of words out of storage and use them to create a manipulable image in a special mindspace, sometimes called the tabula rasa (blank slate). By making these verbally-formed images dance in this mindspace, we can use narratization to compare alternatives and come to a decision.
If a person is asked to give details on a particularly vivid memory, it becomes quickly clear he is not examining a mental photograph accurately retrieved pixel by pixel. A simple question, “What is the shape of the frame of your memory?” turns out to have no answer. The image has no frame, no boundary, exactly as if its elements were conjured up from words rather than a certain number of rows of pixels. If the memory is a dynamic one, and the person is asked at what point does the film stop and replay, again, there is no basis for an answer. Nor can the person give a consistent response to the spatial relationship between remembered elements. Every policeman knows that eyewitness accounts are valid only for certain notable elements, as if they were reconstructing the visual crime scene from a description in a book. The gaps are filled logically, using general data the brain has stored about the category. “Why, of course the perp had five fingers on his hand, officer; every hand has five fingers.” If memory were like a surveillance tape, one could in theory run the tape backwards to a certain day when one was looking out a window, stop the film, then write down the license number of a car going by. But the verbal data compression of memory is far more parsimonious than that.
Memories stored as sentences explain why you can’t recall scenes from infancy, before you had the vocabulary to reconstruct images. Infants are able to learn, just as a flatworm in a T-maze can learn. But learning to turn left in a T-maze is distinct from remembering to go “straight-left-straight,” a process that requires mastering the verbal concepts of “straight” and “right” and “left.”
“Memory artist” savants, such as Franco Magnani and Stephen Wiltshire, have the remarkable capacity to paint detailed scenes from childhood memories, or from seeing a landscape just once. Comparing their work to photographs is stunning. But the distortions are revealing: depicting scenes from a perspective the painter could not have attained, combining elements from different points of view, shifting spatial relationships, exaggerating some features, and introducing anachronisms–exactly the distortions you would expect if the image were constructed from a rich set of verbal instructions. This is not “photographic memory.”
Are memories verbal or visual? A simple experiment anyone can perform gets to the truth of the matter. Go to your library and open any book to a random page. Look at it for a few seconds, enough to get a mental image. Now try to reconstruct the page from memory. If the page were stored as a picture, you should have a blurred photograph in your mind’s eye. Maybe a few letters would be recognizable, and you should be able to count the lines of text, even if they are too blurry to read. In this photo, you should be able to see on which line of text each of the letters you can recognize is located. Of course, a real remembered reconstruction is vastly different. A few snippets of phrases are remembered, but their position on the page is unknown. This is because the memory is stored verbally.
Incidentally, in this experiment, 999 times out of 1000, the random page you open will contain far more information in verbal format than in picture format. This is because all book writers know that it is far easier to express any significant idea with words than pictures, and each word is worth a thousand pictures. And because the printed word resonates with the hallucinated spoken words with which we all think, it’s unlikely the humble book will ever be supplanted by audiovisual gizmos.
PERCEPTS.
Cognitive scientists do not yet know how animals and pre-lingual humans process percepts, the name for concepts formed by direct perception, without the use of words or unit-formation. A baboon, for instance, might see only the tail of a lion, prompting an integration plunking the percept “lion” into its fight-or-flight algorithm. Evidence suggests that percepts are formed in the “processing” right side of the animal brain and enacted in the “executive” left half. Communication between the halves encrypts these percepts in a proto-word format but is unable to categorize them into units. The new verbal format surmounts this limitation.

Eklund’s upcoming game Neanderthal explores some of these elements of human cognitive development (image taken from the in-beta digital implementation on VASSAL).
COGNITIVE FLUIDITY.
For almost their entire history, neither Neanderthals nor modern humans showed any trace of what we would call “culture,” exemplified by art or religion. According to Leda Cosmides and John Tooby, this is because the ancient human brain was compartmentalized into social, technical, and natural history domains. Information from one domain apparently could not be used in the others. For instance, a human adept in knapping was puzzlingly unwilling to whittle antler or bone, perhaps because stone was mentally processed in the technical domain while bones were processed in the natural history domain. Cognitive fluidity in self-communication was achieved only when the new verbal format surmounted this software incompatibility.
As words from one domain got entrained in the context of another, this crosstalk sparked the “cultural revolution” of 40 kya, when the first carved bone and antler points, cave paintings, figurines, flutes, burials, and other cultural artifacts appear. By mixing domains, humans could for the first time express a myriad of complex innovations: multicomponent tools made from materials other than sticks and stones (the first technology), references to elders after they had died (the first proper names and grave goods), female statues acting as the authority for actions (the first idols), and anthropomorphic animals that could walk and talk (the first jokes).
THE CRUCIAL PRONOUN.
The discovery that the famous cave paintings had been continually retouched for thousands of years suggests they were used as blackboards, perhaps to aid “blank slate” formation by visually and tactilely reinforcing the neural pathways used to form pictures from words and to regard animals as units. This didactic purpose explains why the paintings were so eerily similar throughout the inhabited world. The handprint stencils found on so many cave walls may have been the first attempts to articulate the pronoun “I,” a crucial concept on the road to consciousness.
-Phil Eklund, Dec 2014
Phil Eklund, rocket engineer, professional game designer and producer, recently moved from Arizona to Germany. He specializes in “experience games,” simulations that cover a comprehensive sweep of all aspects of a subject in unprecedented detail. He has always been fascinated by the rules that run the world, and the first step to discovering such rules is believing that they exist.