Synthetic Speech in Flash: the Source Code

Remember the Flash synthetic speech demo, which turned into a talking robot app? Here’s the source code: Download it!


Please note: This code comes with no warranty, nor support, whatsoever. None. Zip. Nada. If your talking robots become self-aware and enslave humanity, then I will not be held responsible. But if you’re in the mood for tinkering, here’s how it’s strung together:

  • The sound is generated using Linear Predictive Coding (“LPC”).
  • First, some Python code: The script analyzes phonemes.dat (which is just a headerless version of phonemes.aif). Individual phonemes are separated by moments of silence, so the script splits the sound file on those. Each phoneme is converted to LPC data, using code that I ported from the rt_lpc project. I felt like I understood the mathematics 3 years ago, but I doubt I could explain it today.
  • Now, in Flash: Launch the DictCompressor application, and watch the trace messages. Click the screen to open the browser window, then select your cmudict___.txt pronouncing dictionary. (You can obtain the latest CMUdict here.) Flash will convert this to a (smaller) cmudict.dat file, which is what LPCsynth.swf loads.
  • LPCsynth is the application that talks. The LPCSynthHarness.sayItNow() method creates an array of LPCFrames, which are “spoken” in the sampleData() method.  This was never intended for public distribution, so the code is not exactly stellar (the talking bit should be extracted into its own class).
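The silence-splitting step in the Python bullet above can be sketched roughly like this. This is not the original script: the function name, the threshold, and the minimum gap length are all assumptions, and the samples are assumed to be floats already read from phonemes.dat.

```python
def split_on_silence(samples, threshold=0.01, min_gap=4):
    """Split samples into segments separated by runs of quiet samples.

    threshold and min_gap are illustrative values, not taken from the
    original script. A gap shorter than min_gap stays inside its segment.
    """
    segments = []
    current = []
    quiet_run = 0
    for s in samples:
        if abs(s) < threshold:
            quiet_run += 1
            if current and quiet_run >= min_gap:
                # The previous (min_gap - 1) quiet samples were appended
                # to the segment; strip them before storing it.
                segments.append(current[:len(current) - (min_gap - 1)])
                current = []
            elif current:
                current.append(s)
        else:
            quiet_run = 0
            current.append(s)
    if current:
        # Drop any short run of quiet samples at the very end.
        segments.append(current[:len(current) - quiet_run])
    return segments
```

Each returned segment would then be handed to the LPC analysis, one phoneme at a time.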

Is this interesting? Did your Flash Player become self-aware after hearing its own voice? Let me know!

It’s funny, originally this was intended for the website. The robot voice would sing as you clicked, crooning about your mousing habits. I still can’t decide if that idea was brilliant, or terrible.

Son of Strange Attractors

I rewrote my strange attractor generator, in Flash:

Try it. Click to generate new attractors.

The attractor coefficients are still chosen randomly. But now, attractors that explode/collapse are rejected. Also, attractors that create “boring” shapes (by drawing the same pixels repeatedly) are discarded. It’s a little slow, but I’m sure the speed could be improved using Pixel Bender.
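One way to detect a “boring” shape is to count how many distinct pixels the points land on. The sketch below is a guess at that idea, not the code from the download: the grid size, the 0.2 cutoff, and the assumption that points are already scaled into the unit square are all made up for illustration.

```python
def pixel_coverage(points, width=64, height=64):
    """Fraction of points that landed on a distinct pixel.

    Assumes every (x, y) is already scaled into [0, 1) x [0, 1).
    """
    if not points:
        return 0.0
    seen = set()
    for x, y in points:
        seen.add((int(x * width), int(y * height)))
    return len(seen) / len(points)


def is_boring(points, min_coverage=0.2):
    """Hypothetical cutoff: reject shapes that keep redrawing the same pixels."""
    return pixel_coverage(points) < min_coverage
```

An attractor that collapses to a fixed point or a short cycle scores close to zero, so it gets rejected along with the other dull ones.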

Also, here’s the source code. (Compile with Flash CS5.)

A Formant Sequencer in Flash

I started an open source project: fseq-flash is a formant sequence editor. Current features include:

  • Import AIFF files
  • Audition and edit formant sequences in real time
  • Vowel- and function-drawing tools
  • Export .syx files for the Yamaha FS1R

Click to launch. Press the space bar to play the sound.

You can push your formant sequence to the Yamaha FS1R, using software such as K_Take’s FS1R Editor. Click the “Save .syx” button, and follow the instructions in K_Take’s documentation. This is a lot of fun, and breathes new life into the FS1R.

This project became much deeper than anticipated! The code includes FFT analysis (thanks Gerry Beauregard), pitch detection, a formant detection algorithm, and an AIFF parser to read AIFF files. The interface was a challenge to design and implement, and there are still many unfinished features.

My energy is shifting to other work, so I’ll enhance fseq-flash when time permits.

Filtered Noise Sequencer

Here’s something fun — I made a 16-step sequencer in Flash, that plays filtered noise (or sine waves, when the filter is narrow):

Filtered Noise Sequencer

Drag and resize the blue blocks to change the filter frequency and width.

This sequencer is not using expensive bandpass filters. The oscillators are sine waves, which are frequency modulated with white noise. It may not sound inherently musical, but you can produce great hihats, bass thuds, and airy pitched noises.
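Here is a rough sketch of that idea: a sine oscillator whose frequency is jittered by white noise on every sample. The function name, the uniform noise, and the way the width is mapped to a frequency deviation are assumptions; the real sequencer may scale things differently.

```python
import math
import random


def noisy_sine(freq_hz, width_hz, n_samples, sample_rate=44100, seed=0):
    """Sine wave whose instantaneous frequency is modulated by white noise.

    width_hz = 0 gives a pure sine; larger values smear the energy
    around freq_hz, which sounds like band-limited noise.
    """
    rng = random.Random(seed)
    phase = 0.0
    out = []
    for _ in range(n_samples):
        # Assumed mapping: noise shifts the frequency by up to +/- width_hz / 2.
        f = freq_hz + width_hz * (rng.random() - 0.5)
        phase += 2.0 * math.pi * f / sample_rate
        out.append(math.sin(phase))
    return out
```

A narrow width gives a pitched tone, and a wide one drifts toward a hiss, with no filter in the signal path at all.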

Here’s the source code. (Requires Flash CS5 to compile.) Have fun!

Twang… in Flash?!

Yep, this is a “Twang Player” prototype, built in Flash. There’s only one embedded song, a groggy rendition of the ditty from the first video:

(click to launch)

The next version of Twang will record & save songs (this is done), and share them in some capacity (a bit more complicated). So that’s where the Twang Player comes in. The Flash version looks like it wants to be touched & strummed. I need to revisit the design and convey that Twang Player is just a music box, you can’t compose anything in the browser! …Yet.

There are performance issues on some of my machines, too. This version uses Vector.<Number> objects to handle data, but it looks like ByteArray or even BitmapData structures are the way to go.

Aquatic Sound Generator in Flash

Here’s something from the vaults. Aquasound was built with these requirements in mind:

  • Generate sounds that aquatic animals might make
  • Sounds can be “combined” somehow
  • Sounds can emote

This was never used in production. I wonder if I could turn this into something? Like a paid iPhone app? ;)

Double-click the envelopes to add/remove control points. Drag lines up & down to change their curvature. The best feature is the “Combine With” dropdown, which splices the current sound with your selection. Also the “Emote” menu will play sounds with different expression.

The audio algorithm is reverse-engineered from my beloved FS1R. I generated formants in two ways (toggle the “Tonal” checkbox to hear both); the “atonal” version is closer to ring modulation than actual formants. It’s more fun if you don’t understand what the controls are doing, but if you insist: Pitch controls the overall pitch of the sound. Freq controls the center frequency of the formant (like a bandpass filter). LFOFreq and LFOWeight control a low-frequency sine wave, which can be applied to other controls via their “___LFOAmt” curves. Amp is amplitude, Width is formant width (think: width of the bandpass filter), and Skirt adds distortion. Each voice has two formant generators; check “Formant Active” to enable them.

May all your bloops and crackles be happy ones!

Strange Attractors in Flash

Have you seen (or played) the demo for Polynomial, the space shooter? Quick! Watch the video:

I spent a couple hours generating strange attractors in Flash, just a simple 2D version for now. Click to play:

Here Be Strange Attractors

Click the black region to generate new polynomial coefficients and redraw. You will have to click many times to generate something interesting. That’s the nature of fractals, I’m afraid. Some coefficients are automatically thrown out if the drawing exceeds a certain size. Unfortunately, the inverse is not true: the code isn’t smart enough to trash any drawings that shrink to microscopic size.
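For the curious, the generate-and-reject loop can be sketched like this. The 12-coefficient quadratic map is a common choice for 2D polynomial attractors, but it is an assumption here, as are the starting point, the step count, and both size limits. The `extent` helper is a possible check for the missing “shrinks to microscopic size” case, not something the harness does.

```python
import random


def random_coefficients(rng, count=12, spread=1.2):
    """Pick random polynomial coefficients (the range is an assumption)."""
    return [rng.uniform(-spread, spread) for _ in range(count)]


def iterate_map(a, steps=2000, limit=1e6):
    """Iterate a 2D quadratic map; return its points, or None if it explodes."""
    x, y = 0.05, 0.05
    points = []
    for _ in range(steps):
        # Both new values are computed from the old (x, y).
        x, y = (a[0] + a[1] * x + a[2] * x * x + a[3] * x * y + a[4] * y + a[5] * y * y,
                a[6] + a[7] * x + a[8] * x * x + a[9] * x * y + a[10] * y + a[11] * y * y)
        if abs(x) > limit or abs(y) > limit:
            return None
        points.append((x, y))
    return points


def extent(points):
    """Largest side of the bounding box, usable to reject collapsed drawings."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(max(xs) - min(xs), max(ys) - min(ys))


def find_attractor(rng, min_extent=0.1, max_tries=10000):
    """Keep rolling coefficients until the orbit neither explodes nor collapses."""
    for _ in range(max_tries):
        coeffs = random_coefficients(rng)
        points = iterate_map(coeffs)
        if points is not None and extent(points[100:]) >= min_extent:
            return coeffs, points
    return None
```

Skipping the first 100 points before measuring the extent keeps the initial transient from hiding an orbit that has settled onto a single point.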

I believe that you can stabilize any coefficients by scaling the values of each coefficient, gradually nudging them larger/smaller until the drawing is stable. I’ll try this when I get more time. I’ve been gung-ho on my first proper iPhone app, trying to finish it before Christmas! Stay tuned…

Also, here’s the source code for the strange attractor harness! Enjoy.

Flash 3D: A change of heart?

Yesterday, I posted a damning critique of Flash’s native 3D.

Today I noticed that if you right-click on yesterday’s SWF and show the redraw regions, you can see that it’s redrawing the contents of the entire stage, even though I put the scene in a scrollRect. Is it seriously rendering a scene that’s thousands of pixels wide before displaying it ?!?!?!? Oh, no. No they DIDN’T.

Today I ported the scene to the Away3D rendering engine. Here’s the result:

It’s beautiful, and provides access to low-level drawing routines, light sources, normal maps, … It was speedy at first, then slowed down considerably when I added the glowing floors. (Each glow is 16+ triangles right now, for various reasons including: I can’t render objects in my own custom order.) This makes yesterday’s version look performant, I’m reluctant to admit.

Possible next steps:

  • Reduce the native 3D rendering area, see if performance improves?
  • Grow beyond Flash, embrace the future and try Unity 3D?
  • Dump this project, finish that iPhone game I started, make a million dollars in 2 weeks?

Flash 3D makes me sad

I’ve been dabbling with Flash 10’s native 3D support. Try my engine:

Click to set the focus. Use the arrow keys to move. Touch blocks to illuminate.

I’m disappointed with two things:

1). How much time is required to create a 3D engine, even a grid-based one like mine. I’ve been wrestling this project for 4+ hours every day, for a week. I feel like I must be lagging behind, but there are ten thousand things that will go wrong when developing in 3D. The paradigm is uniquely punishing, there are always edge cases where some polygons aren’t drawn correctly. This project hasn’t been a joy.


2). Flash’s native 3D is not suited for a high-performance application like this one. It would be fine if I were only spinning a few DisplayObjects in space. However, the scene above displays up to 125 Bitmaps simultaneously. (To reach that count, light all 25 bulbs (3 Bitmaps each), then stand in the corner facing them so that the 25-segment walls are also in view.) 125 Bitmaps would be child’s play in OpenGL. But after you light a few blocks, Flash Player chokes pretty hard.

Here’s another version that uses a BlendMode on the lightbulbs. It looks great, but its performance is even less acceptable.

Here’s an early version that uses my own 3D computations, and the Graphics API. Also it has a limited field of view, which I widened for the latest builds. The performance is surprisingly high. I abandoned my custom 3D when I reached this point; drawing lines around each cube face was expensive, so I switched to Bitmaps, and the native 3D.

The cube faces are set to a width & height of 100. However, the bitmaps are higher resolution: a 200×200 region is shown. They’re being downsampled to 100×100 before they’re rendered, not by my choice.

At runtime, I get periodic warnings like these:

Warning: 3D DisplayObject will not render. Its dimensions (8238, 1628) are too large to be drawn.

What?! How is this happening? I swear that any blocks behind the camera are being removed from the Stage. (Actually, this is difficult to verify. If I shrink the scene, Flash magically applies the 3D perspective with a weird projection, and distorts everything in Lovecraftian dimensions.) Please, Adobe, tell me that you’re not rendering the scene at 8000 pixels wide, then scaling it down to my 700×400 window, frame after frame?

Also note that you, the developer, are responsible for drawing the DisplayObjects in the correct depth order (farthest to nearest); Flash doesn’t handle it automatically. This is known as “2.5D”, and it’s wildly inconvenient.
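The manual depth sort boils down to something like the sketch below (a painter’s algorithm). It’s written in Python for brevity, and the dictionary layout and names are made up for illustration. In Flash, the resulting order would then be applied to the display list, for example with setChildIndex().

```python
def depth_order(objects, camera):
    """Return objects sorted farthest-first by distance to the camera.

    Each object is a dict with a "pos" tuple (x, y, z). This layout is
    hypothetical; a real engine would sort its own sprite objects.
    """
    def distance_squared(obj):
        return sum((p - c) ** 2 for p, c in zip(obj["pos"], camera))

    # Farthest objects are drawn first, so nearer ones paint over them.
    return sorted(objects, key=distance_squared, reverse=True)
```

This has to be redone whenever the camera moves. Sorting by center distance can still get overlapping or intersecting faces wrong, which is one source of those edge cases where polygons aren’t drawn correctly.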

So, I’m pretty disappointed with Flash 10’s native 3D. Even with my limited 3D experience, I dislike how it renders the scene (I’m not alone in this) and the performance is obviously sub-par. This technology will not bring 3D games to the web, it cannot.

I need to decide whether to endure its shortcomings for 4 more weeks, or if I should abandon this project altogether. There are moments when you realize you’ve outgrown something you used to love, and this may be one of mine.

Synthetic Speech in Flash

Recently, I learned about Linear Predictive Coding (“LPC”). This technique is used in classic arcade games (such as Gauntlet) and the Speak & Spell to synthesize speech.
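In rough terms, LPC synthesis runs a buzz or hiss through a small feedback filter whose coefficients describe the shape of the vocal tract. The sketch below is a minimal illustration of that, not the code behind the demo: the function names are made up, and the sign convention for the coefficients (added, not subtracted) is an assumption that varies between implementations.

```python
def lpc_synthesize(coeffs, excitation, gain=1.0):
    """Run an excitation signal through an all-pole (LPC) filter.

    y[n] = gain * e[n] + sum(coeffs[k] * y[n - 1 - k])
    """
    history = [0.0] * len(coeffs)  # previous outputs, most recent first
    out = []
    for e in excitation:
        y = gain * e + sum(a * h for a, h in zip(coeffs, history))
        out.append(y)
        history = [y] + history[:-1]
    return out


def impulse_train(n_samples, period):
    """Buzz-like excitation for voiced sounds: one pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
```

Voiced phonemes would use something like `impulse_train` as the excitation, with the period setting the pitch, while unvoiced ones would use white noise. Swapping in a new set of coefficients every few milliseconds is what turns the buzz into speech.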

Here’s my first attempt at LPC speech in Flash: (click & explore)

It’s great, except for one tiny problem: It sounds horrific. Can you feel the cold, robotic love? This voice will stalk your nightmares.

The phonemes were derived from an unrehearsed recording of my voice. I’m confident that it can be improved. Note that direct LPC encodings of my voice, such as this one, sound more acceptable.

EDIT: I made an iPhone version, “Metal Mouth”, with lots of features. Here it is on YouTube and the iTunes Store!

EDIT #2: The source code is available here.

Toaster Bro alpha: 10 days later

Did I mention my new game, “Toaster Bro”? Play Toaster Bro alpha version 1!

(You will need Flash Player 10.)

Ten days have elapsed since I shared this version with friends (who are unwittingly being used as play testers). It’s time for a “wrap-up” meeting, because I want to examine what went wrong/right, and instead of a meeting it’s a blog post:

Continue reading

8-bit NTSC artifacts using Pixel Bender

By request, here’s a quick ‘n dirty test harness, and sample code, for NTSC artifacts in the style of the 8-bit Nintendo Entertainment System (NES):

Click the animation to change scale & scroll speed.

Source code & .fla:

The .pbk code is not optimized yet. The code is fairly explicit; I tried to explain how it works in the comments. Blargg’s pages have better explanations, though.

The test harness lets you select two flavors of the effect. The numbers 8 and 12 denote the width of the lowpass window used for applying crosstalk. 12 is more processor-intensive, but will look “smoother”, which may not be what you want. The mathematics can be reduced to a few (long) lines, which should reduce processor overhead; I want to do this in the future. unic0rn left some nice comments suggesting more routes to optimization.

The filter still needs some tuning. Areas of solid (non-black) color have diagonal stripes in them. I believe that normalizing the strengths of the filters will fix this.
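To show what normalizing would mean here, this is a sketch of a lowpass window applied to one scanline. It’s in Python instead of Pixel Bender, and the box-shaped window and edge clamping are assumptions, so treat it as an illustration of the idea and not as the actual kernel.

```python
def box_lowpass(scanline, width, normalize=True):
    """Average each sample with its neighbors using a `width`-tap window.

    With normalize=True the weights are divided by their sum, so a flat
    input comes out unchanged whether the window is 8 or 12 taps wide.
    """
    weights = [1.0] * width
    total = sum(weights) if normalize else 1.0
    n = len(scanline)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            # Clamp indices at the edges of the scanline.
            j = min(max(i + k - width // 2, 0), n - 1)
            acc += w * scanline[j]
        out.append(acc / total)
    return out
```

Without the normalization, the 8-tap and 12-tap windows scale a flat color by different amounts. If the filters in the kernel are mismatched in that way, it could plausibly leave residue in solid areas.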

To be continued…