Is It From the Birds? Stephen Sondheim Asked the Right Question About Music and Then Preferred Not to Hear the Answer
In November of 1997, Stephen Sondheim sat in his Manhattan townhouse with Mark Eden Horowitz, a senior music specialist from the Library of Congress, and said something extraordinary. Not extraordinary in the way that most Sondheim quotes are extraordinary, which is to say technically precise and laced with a craftsman’s impatience for imprecision. Extraordinary because it was none of those things. It was, instead, the sound of a man who had spent his entire adult life inside music admitting that the existence of music itself was something he could not explain.
A Concordance for Future Scholars
The moment circulates now as a sixty-second clip on social media, stripped of its original context, which was a three-day filmed interview session in which Horowitz, with Sondheim’s manuscripts spread before them, asked the composer to walk through his compositional process show by show. The interviews were intended as a concordance for future scholars. They were the opposite of a talk-show appearance. No audience. No applause. No performance. Just Sondheim, seated alone, head slightly bowed, speaking to the table as much as to Horowitz, working something out in real time.
A transcript of the interview clip follows.
Music is a magical art. I don’t know how the human mind ever got to it, because everything else is somehow representational and literal, including painting, but not music. How did that happen? Is it from the birds? What is that from? How do we make music? I can understand vaguely how man learned to speak, because he had to communicate things, but what is this? How did man learn to whistle?
I mean, you know, how do we, and where does the 12-tone scale come from? And blah, blah, blah. And I’m ill-educated this way, so you could probably answer, but it seems to me miraculous. To me, it’s as mysterious as astrology, but unlike astrology, completely believable.
That final line is perfectly constructed. The setup is slow, exploratory, uncharacteristically loose in its syntax, and the payoff lands with the timing of a man who has spent fifty years placing stress on the right syllable. He knows where the laugh is, even in a room with one other person and a camera crew. The performance of the punchline does not cancel the sincerity of the question, though. Both things are happening at once: Sondheim is bewildered, and he is shaping his bewilderment into a deliverable thought. That is what writers do. It does not make the bewilderment false.
Auditory Cheesecake
The question Sondheim is asking is real. It is also old. Darwin raised it in The Descent of Man in 1871, speculating that music might have preceded language as a mechanism for sexual selection, the way birdsong functions in mate attraction. That hypothesis has never been conclusively confirmed or refuted. In the century and a half since, the evolutionary origins of music have generated an extraordinary volume of competing theories and almost no consensus.
Steven Pinker, the cognitive psychologist, famously dismissed music in 1997 (the same year Sondheim was speaking to Horowitz) as “auditory cheesecake,” a byproduct of neural systems that evolved for language processing, spatial reasoning, and emotional regulation. Music, in Pinker’s account, is a pleasure technology that exploits pre-existing cognitive architecture without having been selected for independently. It is, in his framing, an accident of evolution that happens to feel important.
That position was immediately and rightly challenged. The ethnomusicologist John Blacking had argued decades earlier that music-making is a universal human competence, not a specialized talent, and that its presence in every known human culture suggests something more than parasitic exploitation of other cognitive systems. Aniruddh Patel, working at the intersection of neuroscience and music cognition, demonstrated that music and language share neural resources but are not identical processes, and that musical training reshapes the brain in ways that pure language exposure does not. If music were merely cheesecake, it would not leave structural traces in neural architecture.
More recent work has proposed that music is adaptive in its own right: it facilitates infant bonding (lullabies are cross-culturally universal), it coordinates group movement (work songs, military cadence, ritual drumming), it signals coalition membership, and it regulates emotion in ways that have direct survival implications. The ethnomusicologist Joseph Jordania has argued that early hominid group singing and rhythmic movement served a defensive function, producing a coordinated display that deterred predators. Whether or not one accepts that specific mechanism, the broader point stands: music does things in human social life that are not easily explained as side effects of language processing.
So when Sondheim asks “How did that happen? Is it from the birds?” he is asking a question to which the honest scientific answer, even now, is: we do not know for certain. The question is legitimate. What is less legitimate is the framework he wraps around it.
The Option of Representation
“Everything else is somehow representational and literal, including painting, but not music.”
This is wrong, and it is wrong in a way that a man of Sondheim’s cultural literacy should have caught. Painting is not inherently representational. The entire history of abstraction in visual art, stretching from Kandinsky’s first non-objective watercolors in 1910 through Mondrian’s grids, Rothko’s color fields, Agnes Martin’s trembling pencil lines, and the whole of Abstract Expressionism, demonstrates that painting can operate on precisely the same non-referential plane that Sondheim claims is unique to music. When you stand in front of a Rothko and feel something move in your chest, you are not decoding a representation. You are responding to organized color, proportion, and scale in a way that is structurally identical to responding to organized sound. Neither the painting nor the chord “means” anything in the propositional sense. Both produce experience without reference.
Sondheim, who loved puzzles and who approached problems with a logician’s temperament, is drawing a boundary here that does not hold. His category error is instructive, though, because it reveals what he actually means. He does not really mean that painting is always literal. He means that painting can be literal, that it has the option of representation, and that this option gives it an explicable origin story: early humans needed to record what they saw, so they drew on cave walls. Language has a similar origin story: early humans needed to coordinate hunting and warn each other of danger, so they developed vocalizations that referred to things in the shared environment. Music, in Sondheim’s framing, has no such origin story. It does not point at anything. It does not carry survival-critical information. It simply exists, and everyone responds to it, and nobody knows why.
This version of the argument has problems, too. Language is not purely functional. If language existed only to communicate propositional content, poetry would not exist. Lullabies would not exist. Glossolalia would not exist. The musical qualities of speech itself (prosody, rhythm, pitch contour, the rise at the end of a question, the drop at the end of a declaration) are not informational features. They are expressive features, and they sit on a continuum with music rather than on the opposite side of a clean divide. The boundary between speech and song is blurry in practice, and several researchers (including the neuroscientist Steven Brown) have proposed that music and language descended from a common proto-expressive system that only later differentiated into separate streams. If that model is correct, then Sondheim’s framing of language-as-communication versus music-as-mystery is not a real opposition. It is a retrospective illusion created by looking at two branches of the same tree and asking why one of them has leaves.
You Cannot Fact-Check a Melody
Strip away the sloppy premises, though, and something solid remains. Music’s relationship to meaning is unlike language’s relationship to meaning, and this asymmetry is a structural feature of the two systems, not a romantic invention of composers protecting their guild secrets.
A sentence can be true or false. “The cat is on the mat” is either an accurate description of a state of affairs or it is not. A chord cannot be true or false. A C minor triad is not making a claim about the world. It is not referring to anything outside itself. You cannot fact-check a melody. Music operates in a domain where the very concept of reference, which is foundational to how language generates meaning, does not apply.
Music produces meaning anyway. Not propositional meaning, not the kind that can be paraphrased or translated into another form without loss, but experiential meaning: the sense that something has been communicated, that you have understood something that was not said. When the bassoon opens Stravinsky’s Rite of Spring in that strained high register, you feel physical unease. When Sondheim’s own score for Sweeney Todd drops that Bernard Herrmann chord into the orchestration, the audience’s bodies register dread before their minds process the harmonic information. These are real effects with real neurological substrates. The amygdala responds to certain dissonant intervals. Rhythmic entrainment synchronizes motor cortex activity across listeners. The dopaminergic system fires in anticipation of harmonic resolution. The mechanisms are increasingly describable. The description does not dissolve the mystery, because knowing that dopamine is released when a suspended chord resolves does not explain why organized sound produces subjective experience in the first place. It only pushes the question back one level.
Sondheim’s question, the one underneath his stated question, was not really “where does the 12-tone scale come from?” That question has a technical answer. The equal temperament system is a mathematical compromise that divides the octave into twelve logarithmically equal intervals to permit modulation between keys, and it became standard in Western music through a series of practical and aesthetic decisions between the sixteenth and eighteenth centuries. His actual question was: why does organized sound produce emotion in the absence of reference? Why do human beings, across every culture and every period of recorded history, take vibrations in the air and arrange them into patterns that make other human beings feel things?
That question remains open. The evolutionary accounts explain why music might be useful, but they do not explain why it feels the way it feels. The neuroscientific accounts map the brain activity that corresponds to musical experience, but they do not explain why that brain activity is accompanied by subjective experience at all, which is the hard problem of consciousness wearing a musical costume. The acoustic accounts describe the physics of the overtone series and the mathematical relationships between frequencies, but they do not explain why a minor third sounds sad to Western ears, or whether it sounds sad to ears trained in other tonal systems, or what “sounding sad” even means at the level of physical vibration.
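The tuning arithmetic mentioned above is at least concrete enough to check. What follows is a minimal Python sketch, not anything drawn from Sondheim or Horowitz: the function names and the choice of just-intonation reference ratios (6/5, 5/4, 3/2) are assumptions made for illustration. It computes the size of an equal-tempered interval and how far that interval sits from the small whole-number ratio found in the overtone series.

```python
import math


def equal_tempered_ratio(semitones: int) -> float:
    """Frequency ratio of an interval spanning `semitones` steps
    when the octave (ratio 2) is split into twelve equal steps."""
    return 2.0 ** (semitones / 12)


def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


# Hypothetical reference table: (semitone count, just-intonation ratio).
# The just ratios are the conventional small whole-number ones.
JUST_RATIOS = {
    "minor third": (3, 6 / 5),
    "major third": (4, 5 / 4),
    "perfect fifth": (7, 3 / 2),
}


def tempering_error_cents(name: str) -> float:
    """How far the equal-tempered interval lies from its just counterpart.
    Negative means the tempered interval is narrower."""
    semitones, just = JUST_RATIOS[name]
    return cents(equal_tempered_ratio(semitones)) - cents(just)


if __name__ == "__main__":
    for name in JUST_RATIOS:
        print(f"{name}: {tempering_error_cents(name):+.2f} cents")
```

Under these assumptions the equal-tempered minor third comes out roughly 16 cents narrower than 6/5, while the fifth is off by about 2 cents. The numbers describe the compromise. They do not say why the interval sounds sad.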
The Puzzle Without a Solution
Sondheim was not, I think, being coy when he asked these questions. He was not performing the standard artist-as-mystic routine, in which the creator claims special access to forces that ordinary mortals cannot comprehend. He spent his entire career attacking that posture. He told interviewers that his college professor Robert Barrow had cured him of the belief that inspiration descended from above, that the revelation of understanding what a leading tone does and what a diatonic scale is had shown him that composition was “something worked out,” not something received. He called art “an attempt to bring order out of chaos” and compared songwriting to solving crossword puzzles. No one in the history of American musical theater was more committed to demystifying the process of making music.
That history is what makes this moment so unusual. Here is a man who demystified everything about how music is made, admitting that the bare fact of music’s existence remains mysterious to him. He cracked every local puzzle. He understood voice leading, harmonic substitution, the precise relationship between syllabic stress and melodic contour, the dramaturgical function of a vamp, the architecture of a twelve-bar modulation. He knew how to build the thing. He did not know why the thing existed to be built.
And he had been asking, in one form or another, for over thirty years. “How did man learn to whistle?” is not an idle example. In 1964, Sondheim built the title song of Anyone Can Whistle on the same question and gave it to a character named Fay Apple, who cannot do the thing everyone else finds natural. “Anyone can whistle, that’s what they say, easy,” the lyric begins, and then turns: “So someone tell me why can’t I?” The song is not about whistling. It is about the gap between capacities that appear universal and the lived experience of finding them impossible. Fay cannot let go, cannot be spontaneous, cannot perform the act that “anyone” supposedly can. In 1964, Sondheim wrote that question as dramatic psychology, embedded in a character’s specific anguish. In 1997, sitting with Horowitz, the character is gone, the dramatic frame is gone, and the question has become his own. He is no longer writing through someone else. He is asking it as himself, without the protective apparatus of fiction. The altitude has changed: Fay Apple’s question was why she, individually, could not access something innate; Sondheim’s 1997 question is why the innate thing exists at all. But it is the same bewilderment, carried forward three decades, stripped of costume and orchestration.
The “blah, blah, blah” is the tell. That is not Sondheim’s diction. He was a man who chose every word with a jeweler’s attention to weight and setting. Here, the precision abandons him. He is gesturing toward a set of questions he knows he cannot pursue with the rigor he would demand of himself. He is waving off his own inquiry, not out of boredom, but because he recognizes that he lacks the equipment to follow it. “I’m ill-educated this way, so you could probably answer” is simultaneously self-deprecating and self-protective: it acknowledges the gap in his knowledge while declining to fill it. He does not want the answer. He wants the question to remain a question. The inexplicability of music flatters the art form he gave his life to, and the alternative, a fully mechanistic explanation of music as an emergent property of neural computation and evolutionary pressure, would feel reductive to him even if it were true.
That preference for mystery over explanation is recognizable in many brilliant practitioners. A carpenter who builds flawless joints does not need to understand the molecular structure of wood. A poet who writes devastating lines does not need a theory of phonaesthetics. Sondheim composed at the highest level for more than half a century, and his inability to explain why music exists did not impair his ability to make it. The question was, for him, an object of wonder rather than a research problem. He held it up to the light, turned it over, admired its opacity, and set it back down.
The rest of us are allowed to pick it up again.