Synthetic Speech in Flash: the Source Code

Remember the Flash synthetic speech demo, which turned into a talking robot app? Here’s the source code: Download it!

SUPERCALIFRAGILISTICEXPEALIDOSHUS

Please note: This code comes with no warranty, nor support, whatsoever. None. Zip. Nada. If your talking robots become self-aware and enslave humanity, then I will not be held responsible. But if you’re in the mood for tinkering, here’s how it’s strung together:

  • The sound is generated using Linear Predictive Coding (“LPC”).
  • First, some Python code: The analyze_lpc.py script analyzes phonemes.dat (which is just a headerless version of phonemes.aif). Individual phonemes are separated by moments of silence, so the script splits the sound file on those. Each phoneme is converted to LPC data, using code that I ported from the rt_lpc project. I felt like I understood the mathematics 3 years ago, but I doubt I could explain it today.
  • Now, in Flash: Launch the DictCompressor application, and watch the trace messages. Click the screen to open the file browser window, then select your cmudict___.txt pronouncing dictionary. (You can obtain the latest CMUdict here.) Flash will convert this to a (smaller) cmudict.dat file, which is what LPCsynth.swf loads.
  • LPCsynth is the application that talks. The LPCSynthHarness.sayItNow() method creates an array of LPCFrames, which are “spoken” in the sampleData() method. This was never intended for public distribution, so the code is not exactly stellar (the talking bit should be extracted into its own class).
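
If you’d like a feel for the analysis step before opening the download, here is a rough Python sketch of the idea: split a recording on silence, then fit LPC coefficients to each chunk. This is not the code from analyze_lpc.py or rt_lpc. It uses the textbook autocorrelation method with the Levinson-Durbin recursion, and the function names, the threshold, and the min_gap value are all made up for illustration.

```python
def split_on_silence(samples, threshold=0.02, min_gap=4):
    """Split samples into segments separated by >= min_gap quiet samples.

    threshold and min_gap are illustrative values, not the script's own.
    """
    segments = []
    current = []  # samples of the segment being built
    quiet = 0     # length of the current run of quiet samples
    for s in samples:
        if abs(s) < threshold:
            quiet += 1
            if current:
                current.append(s)
                if quiet >= min_gap:
                    # Drop the trailing quiet samples and close the segment.
                    segments.append(current[:-quiet])
                    current = []
        else:
            quiet = 0
            current.append(s)
    if current:
        # Trim any short quiet tail left at the end of the recording.
        segments.append(current[:len(current) - quiet])
    return segments


def autocorrelate(frame, order):
    """Return autocorrelation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for LPC coefficients a[0..order], with a[0] == 1.

    Convention assumed here: x[n] is predicted as -sum(a[k] * x[n-k]).
    Returns (a, err), where err is the remaining prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or perfectly predictable frame
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

To try it, run split_on_silence() over a list of floats, then pass each segment through autocorrelate() and levinson_durbin() with an order of around 10. A real analyzer would also cut each phoneme into short windowed frames and estimate pitch and gain, which this sketch skips.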

Is this interesting? Did your Flash Player become self-aware after hearing its own voice? Let me know!

It’s funny: originally, this was intended for the controlzinc.com website. The robot voice would sing as you clicked, crooning about your mousing habits. I still can’t decide if that idea was brilliant, or terrible.

Synthetic Speech in Flash

Recently, I learned about Linear Predictive Coding (“LPC”). This technique is used in classic arcade games (such as Gauntlet) and the Speak & Spell to synthesize speech.
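
For the curious, the synthesis side of LPC can be sketched in a few lines of Python: a set of predictor coefficients acts as a filter, and a buzzy pulse train is pushed through it. This is only a minimal illustration, not the Flash code. The synthesize() function and its parameters are hypothetical, and it assumes the convention that a[0] is 1 and each output is the excitation minus the weighted sum of previous outputs.

```python
def synthesize(a, gain, pitch_period, n_samples):
    """Drive an all-pole LPC filter with an impulse train.

    a            -- coefficients with a[0] == 1 (assumed convention:
                    y[n] = e[n] - sum(a[k] * y[n-k] for k >= 1))
    gain         -- height of each excitation pulse
    pitch_period -- samples between pulses (sets the voice pitch)
    """
    order = len(a) - 1
    history = [0.0] * order  # previous outputs, most recent first
    out = []
    for n in range(n_samples):
        excitation = gain if n % pitch_period == 0 else 0.0
        y = excitation - sum(a[k + 1] * history[k] for k in range(order))
        out.append(y)
        if order:
            history = [y] + history[:-1]
    return out
```

Calling synthesize([1.0, -0.9], 1.0, 80, 8000) produces a decaying buzz. Real speech needs a fresh set of coefficients every few milliseconds, plus noise instead of pulses for unvoiced sounds like “s”.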

Here’s my first attempt at LPC speech in Flash: (click & explore)

It’s great, except for one tiny problem: It sounds horrific. Can you feel the cold, robotic love? This voice will stalk your nightmares.

The phonemes were derived from an unrehearsed recording of my voice. I’m confident that it can be improved. Note that direct LPC encodings of my voice, such as this one, sound more acceptable.

EDIT: I made an iPhone version, “Metal Mouth”, with lots of features. Here it is on YouTube and the iTunes Store!

EDIT #2: The source code is available here.