Washington Apple Pi

A Community of Apple iPad, iPhone and Mac Users

Lionspeak: Voice Synthesis in Mac OS X Lion 10.7

© 2011 Lawrence I. Charters

Washington Apple Pi Journal, reprint information

Speech synthesis has been with the Mac since the very beginning. When the Macintosh was introduced to the world in 1984, the machine itself spoke to the assembled audience:

Hello. I'm Macintosh. It sure is great to get out of that bag.

Unaccustomed as I am to public speaking, I'd like to share with you a maxim I thought of the first time I met an IBM mainframe. Never trust a computer you can't lift.

Obviously, I can talk, but right now I'd like to sit back and listen. So, it is with considerable pride that I introduce a man who's been like a father to me: Steve Jobs.

In 1984, this was nothing less than an epic monologue for any computer, much less a tiny computer that Jobs had just pulled out of a bag. The Macintosh operating system, System 1.0, was the first operating system in the world to incorporate speech synthesis.

More than a quarter century later, the Macintosh is still with us, and the Macintosh operating system still includes speech synthesis. But with Mac OS X Lion 10.7, the Mac's ability to talk has taken some big leaps. In addition to the many voices included in Leopard and Snow Leopard, Lion offers the ability to customize speech synthesis by downloading dozens of entirely new voices. These aren't just in English, either, as you can pick from Arabic, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai and Turkish voices.

If you are only interested in English voices, you're in luck: Lion has added English-speaking voices with Australian, Indian, Irish, Scottish, South African and British accents. All of them free, though you do need to download them.

Do note that the download itself is non-trivial: Mikko, a Finnish female voice, is over 870 megabytes, and Karen, an Australian female voice, is almost 430 megabytes. It is easy to see why they were not bundled with the Lion distribution, but do be prepared to spend some time waiting for them to download. If you collect all available voices (as of this writing), they'll occupy 20 gigabytes on your hard drive.

A good question at this point might be: OK, so why, exactly, would I need voices on my Mac? It is tempting to say that they are better than voices in your head, but there are practical uses, and entirely impractical but fun uses, too. Whichever path you take, the adventure starts at the System Preferences > Speech preference pane. Two tabs are at the top; select Text to Speech.

Text to Speech (Figure 1) allows you to select the system speaking voice, the rate of speech, and what kinds of things the system can speak with minimal attention (system alerts, application alerts, selected text when a key is pressed, and the time). The big button at the bottom, "Open Universal Access Preferences," takes you to the pane where you can turn on VoiceOver, an assistive technology designed to "speak" the Mac interface for those with vision problems.

Figure 1: The Speech System Preference Pane has the icon of a microphone. Once you select it, click the Text to Speech option. The System Voice, by default, is usually Alex. The slider below controls the speech rate (how fast the voice talks). The Play button plays back a short sample of that voice.

All of these options are useful. What about entirely useless applications? The first fun thing to do is to customize the system voice, and for that you go back to the Speech preference pane and press the System Voice button. It gives you a modest collection of choices, plus a Customize button at the bottom (Figure 2).

Figure 2: If you scroll to the bottom of the list of System Voices, a Customize... selection appears. This is the gateway to adding new voices to Lion.

The Customize button is the key to adding voices, and wasting vast amounts of time. Just check off the voices that seem interesting (Figure 3), and wait for them to download; go to a movie, or perhaps fly to Alaska while things are transferred. As with fresh chocolate cookies right out of the oven, I started with a few, but eventually...

Figure 3: To add new voices, just check the appropriate boxes. Do note, however, that many of these voices are hundreds of megabytes in size and, depending on the speed of your Internet connection, could take hours to download.

The Play button, shown on the same line as the Speaking Rate slider, will play a voice-specific sample for each voice. This is quite entertaining. But after a short while, you will want to do more.

Experiment #1

Fire up TextEdit. Write something, anything. Within TextEdit, go to the Edit menu and scroll down to Speech > Start Speaking. It shouldn't take you long to think of things you can write that sound perfectly ridiculous when spoken.
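If you are comfortable in Terminal, the same experiment can be scripted. The sketch below is not part of the TextEdit route described above; it assumes the `say` command-line tool that ships with Mac OS X, using its `-v` (voice) and `-r` (rate, in words per minute) options. The helper name `say_command` is made up for this example.

```python
import shutil
import subprocess


def say_command(text, voice=None, rate=None):
    """Build the argument list for the Mac OS X `say` tool."""
    cmd = ["say"]
    if voice is not None:
        cmd += ["-v", voice]       # pick a specific voice, e.g. "Alex"
    if rate is not None:
        cmd += ["-r", str(rate)]   # speaking rate in words per minute
    cmd.append(text)
    return cmd


if __name__ == "__main__":
    cmd = say_command("Never trust a computer you can't lift.",
                      voice="Alex", rate=180)
    print(" ".join(cmd))
    # Only try to speak if `say` is actually present (that is, on a Mac).
    if shutil.which("say"):
        subprocess.run(cmd, check=True)
```

Running `say -v '?'` in Terminal should list the voices currently installed, which is handy for checking the exact spelling of a voice name.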

Experiment #2

Fire up Apple Mail. Select a message; bureaucratic messages are the best. In Mail's Edit menu, select Speech > Start Speaking. Particularly egregious messages may inspire you to pick a suitable voice. Depending on the message, the Novelty voices (Bad News, Bahh, Bubbles, Deranged, Good News, Hysterical) are a good bet. On the other hand, having Milena, the Russian voice, read a software license agreement makes it way more interesting, and does nothing to reduce the clarity.

Experiment #3

Go to the Apple menu, select App Store, and visit the Mac App Store. Search for a utility called Text2Tape. Download it (the utility is free). Now write something in TextWrangler (also free from the Mac App Store) and save it as a text file. Fire up Text2Tape and have it "read" the text written in the file. Text2Tape will create a sound file for you with the same name as the text file, in AIFF (Apple sound file) format. Double-click the resulting file to hear what it says, or email the file to someone, or -- you'll think of something.
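For the curious, a similar text-file-to-AIFF trick can be sketched without Text2Tape. This is an alternative, not a description of how Text2Tape works: it assumes the built-in `say` tool, with `-f` to read from a file and `-o` to write audio to a file. The function name and the file name `roar.txt` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def file_to_aiff_command(text_file, voice=None):
    """Build a `say` command that reads text_file aloud into an AIFF file.

    The output keeps the text file's name, with the extension changed to .aiff.
    """
    src = Path(text_file)
    out = src.with_suffix(".aiff")
    cmd = ["say"]
    if voice is not None:
        cmd += ["-v", voice]
    cmd += ["-o", str(out), "-f", str(src)]
    return cmd


if __name__ == "__main__":
    cmd = file_to_aiff_command("roar.txt", voice="Fiona")  # hypothetical file
    print(" ".join(cmd))
    # Run only on a Mac, and only if the text file really exists.
    if shutil.which("say") and Path("roar.txt").exists():
        subprocess.run(cmd, check=True)
```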

If you'd rather hear what Lion's new voices sound like first, several sound files, created with various English voices as well as several non-English voices, are listed at the end of this article. Many of these sound files use the same text as a starting point, but vary the voice used. The text file used to test the voices was: "I am lion! Hear me roar! I am the kitty of your dreams, the feline of your fears, the cunning cat that ate the canary. I am the pride of the pussy world. I am lion! Hear me roar!"

Most of the samples use female voices (because I like listening to female voices), and a great many voices, including most of the male voices, are not represented. Note in particular what happens when the sample text is read by a voice not intended for English; you'll be fascinated.
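Reading the same text in one voice after another gets tedious by hand, so here is a minimal sketch of doing it in a batch. It assumes the built-in `say` tool (not the Text2Tape workflow used for the actual samples), and it assumes the voices named have already been downloaded; the function name, output folder and file names are invented for the example.

```python
import shutil
import subprocess
from pathlib import Path

SAMPLE_TEXT = "I am lion! Hear me roar!"
# A few of the voices mentioned in this article; each must be installed first.
VOICES = ["Karen", "Moira", "Fiona", "Tessa", "Milena"]


def batch_commands(text, voices, out_dir="."):
    """Build one `say` command per voice, each writing <voice>.aiff."""
    commands = []
    for voice in voices:
        out_file = Path(out_dir) / f"{voice.lower()}.aiff"
        commands.append(["say", "-v", voice, "-o", str(out_file), text])
    return commands


if __name__ == "__main__":
    for cmd in batch_commands(SAMPLE_TEXT, VOICES):
        print(" ".join(cmd))
        if shutil.which("say"):
            # check=False: a voice that is not installed should not stop the batch.
            subprocess.run(cmd, check=False)
```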

The Mac's first speech, mentioned at the start of this article, was also recreated in three different voices. Alas, the original MacinTalk voice of 1984 is no longer available; for that, see the YouTube video linked at the bottom.

Finally, if you've ever wanted a custom answering machine message designed to screen all your calls, a very lovely Scottish voice provides a comprehensive answering machine greeting. Alex, a male voice, provides a different, simpler message. Feel free to use either.

Or create a better one.

Note on technique: to create the sound files found on the Pi Web site, I created text files using TextWrangler, free on the Mac App Store, and saved them to disk. I then launched Text2Tape to "read" the files and save the resulting AIFF audio files to disk. I then used Audacity, an open-source audio editor, to convert the AIFF files to MP3 files for use on the Web.
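The AIFF-to-MP3 step can also be scripted if you have many files. The sketch below swaps Audacity for the separate LAME command-line encoder, which is an assumption on my part: `lame` does not come with Lion and must be installed on its own. The 64 kbps bitrate and the helper name are likewise just illustrative choices.

```python
import shutil
import subprocess
from pathlib import Path


def mp3_command(aiff_file, bitrate_kbps=64):
    """Build a `lame` command converting an AIFF file to an MP3 beside it."""
    src = Path(aiff_file)
    dst = src.with_suffix(".mp3")
    return ["lame", "-b", str(bitrate_kbps), str(src), str(dst)]


if __name__ == "__main__":
    # Convert every AIFF file in the current folder, if `lame` is installed.
    for aiff in sorted(Path(".").glob("*.aiff")):
        cmd = mp3_command(aiff)
        print(" ".join(cmd))
        if shutil.which("lame"):
            subprocess.run(cmd, check=True)
```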

Sound Files

English speaking voices

Australia: Karen (mp3, 177K)

India: Sangeeta (mp3, 164K)

Ireland: Moira (mp3, 202K)

Scotland: Fiona (mp3, 161K)

South Africa: Tessa (mp3, 196K)

UK: Emily (mp3, 187K)

UK: Serena (mp3, 178K)

US: Jill (mp3, 188K)

US: Samantha (mp3, 177K)

US: Vicki (mp3, 239K)

US novelty: Pipe organ (mp3, 472K)

US novelty: Zarvox (mp3, 250K)

Non-English voices, same text (pay very careful attention to what the voices say)

China: Ting-ting (mp3, 244K)

French Canada: Julie (mp3, 163K)

France: Virginie (mp3, 184K)

Germany: Anna (mp3, 199K)

Germany: Steffi (mp3, 181K)

Japan: Kyoko (mp3, 275K)

Mexico: Paulina (mp3, 208K)

Russia: Milena (mp3, 205K)

Spain: Monica (mp3, 176K)

Sweden: Alva (mp3, 251K)

Recreation of Macintosh 1984 introductory speech

Out of the bag: Alex (mp3, 477K)

Out of the bag: Victoria (mp3, 493K)

Out of the bag: Zarvox (mp3, 492K)

Answering machine messages

Simple (mp3, 313K)

Comprehensive (mp3, 1.1 MB)

Resources

Video of the introduction of Macintosh, including the Mac's first public speech, January 1984
http://www.youtube.com/watch?v=G0FtgZNOD44

Text2Tape
http://web.me.com/nilesmitchell/Text2Tape/Welcome.html (or Mac App Store)

Audacity
http://audacity.sourceforge.net/download/mac

TextWrangler
http://www.barebones.com/products/textwrangler/ (or Mac App Store)