Visual SFX

VisualSFX: Image-Based Granular Synthesis

VisualSFX is an application that generates sound effects from images. It uses granular synthesis: it samples the input image pixel by pixel and produces a “grain” for each pixel based on its RGB and/or YUV data. Mixing all of those grains together gives you a sound effect, so you can make interesting sounds from very mundane images. Here are some images and the sound effects generated with them!

That last one is my favorite. It sounds similar to the well-known THX start-up sound, and the image shows that the sound effect is essentially just a slow-rising gradient of frequencies.
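To make the pixel-to-grain idea concrete, here is a minimal Python sketch. It is not the code from VisualSFX. It assumes that columns map to time and rows map to pitch (which fits the rising-gradient example above), and the channel assignments (red for volume, green for stereo panning, blue for envelope falloff) along with every name and default value are made up for illustration.

```python
import math

SAMPLE_RATE = 44100  # assumed output rate in Hz


def render(image, grain_seconds=0.05, low_hz=100.0, high_hz=4000.0):
    """Render a stereo buffer from a grid of (r, g, b) pixels in 0..255.

    Assumed mapping (hypothetical, for illustration only):
      column -> start time, row -> pitch (top row is highest),
      r -> volume, g -> stereo pan, b -> envelope falloff speed.
    """
    rows, cols = len(image), len(image[0])
    grain_len = int(grain_seconds * SAMPLE_RATE)
    left = [0.0] * (cols * grain_len)
    right = [0.0] * (cols * grain_len)
    for y, row in enumerate(image):
        # Spread the rows exponentially between low_hz and high_hz.
        t = 1.0 - y / max(rows - 1, 1)
        freq = low_hz * (high_hz / low_hz) ** t
        for x, (r, g, b) in enumerate(row):
            volume = r / 255.0
            if volume == 0.0:
                continue  # pixels with no red contribute nothing
            pan = g / 255.0  # 0.0 = hard left, 1.0 = hard right
            falloff = 1.0 + 8.0 * b / 255.0
            start = x * grain_len
            for n in range(grain_len):
                # Each grain is a basic sine wave with a decaying envelope.
                envelope = math.exp(-falloff * n / grain_len)
                sample = volume * envelope * math.sin(
                    2 * math.pi * freq * n / SAMPLE_RATE)
                left[start + n] += sample * (1.0 - pan)
                right[start + n] += sample * pan
    return left, right
```

With this mapping, a bright diagonal line running from the bottom-left to the top-right of an image becomes a tone that rises in pitch over time.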

I made this for a semester-long audio project class, and I got the idea from toying around with other granular synthesizers in various DAWs. I’ve actually used some of the sound effects made in this app for my game projects; it’s a powerful tool if you can visualize what you want your frequency spectrum to look like. There are a bunch of features I would’ve liked to add but never had the time for. Here’s a short list:

Adding the option of supplying a second image and a source sound effect. Right now, the individual grains are just basic waveforms, whereas a lot of other granular synthesizers let you take a source sound effect and chop it into little grains. A second image would act as a “time reference map” of sorts, where its X axis would be time into the produced sound effect and its Y axis would be time into the given sound source. This way, you would be able to pitch shift AND time seek into the source sound effect at the same time, for even more nuanced grain control.
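A rough Python sketch of how such a time reference map could work, since the feature was never built. Everything here is an assumption: the function name place_grains, the map format (a grid of brightness values from 0 to 1), and the triangular grain window. Pitch shifting, which would come from the first image, is left out to keep the example short.

```python
def place_grains(time_map, source, out_len, grain_len=256):
    """Scatter grains of `source` into an output buffer using a time map.

    time_map is a grid of brightness values in 0..1 (hypothetical format):
    column -> position in the output, row -> position in the source,
    brightness -> grain gain.
    """
    rows, cols = len(time_map), len(time_map[0])
    out = [0.0] * out_len
    for y, row in enumerate(time_map):
        # The row index picks where in the source the grain is read from.
        src_start = int(y / rows * max(len(source) - grain_len, 0))
        for x, gain in enumerate(row):
            if gain <= 0.0:
                continue
            # The column index picks where in the output the grain lands.
            out_start = int(x / cols * max(out_len - grain_len, 0))
            for n in range(grain_len):
                if src_start + n >= len(source) or out_start + n >= out_len:
                    break
                # A triangular window avoids clicks at the grain edges.
                window = 1.0 - abs(2.0 * n / max(grain_len - 1, 1) - 1.0)
                out[out_start + n] += gain * window * source[src_start + n]
    return out
```

In this sketch, a bright diagonal from top-left to bottom-right plays the source roughly in order, while flipping the map vertically plays its grains in reverse order.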

Implementing an STFT (short-time Fourier transform) to translate given sound effects into images. This was something I originally planned to do, but I quickly cut it once I realized how complicated it was. With the STFT, I’d be able to take snapshots of the spectrum of a given sound at regular intervals and translate that spectrum data into an image, so you could “sample” the sound effect visually in an image editor. Once you have the image representation of a sound effect, you can warp it with all kinds of cool visual effects and then translate it back into a sound effect to hear what those effects do to the audio spectrum.
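For a sense of what the sound-to-image direction involves, here is a small Python sketch that turns a list of samples into a grid of spectrum magnitudes, one column per snapshot. It is only an illustration under assumed settings: the name stft_image, the frame size, the hop size, and the Hann window are arbitrary choices, and it uses a slow naive DFT where real code would use an FFT library.

```python
import cmath
import math


def stft_image(samples, frame_size=64, hop=32):
    """Return a grid of magnitudes, grid[row][frame], top row = highest bin."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    bins = frame_size // 2 + 1
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, O(N^2) per frame: fine for a sketch, too slow for real use.
        column = []
        for k in range(bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            column.append(abs(acc))
        columns.append(column)
    # Transpose so time runs left to right and frequency runs bottom to top.
    return [[col[k] for col in columns] for k in reversed(range(bins))]
```

Scaling each magnitude into the 0 to 255 range would give a grayscale image. Going back from an edited image to audio is the harder half, because the magnitudes alone do not store phase.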

Finer-tuned control over the grain generation. Right now, the only things that the pixel data affects are the volume, stereo panning, and volume envelope falloff. With more time, I would’ve added a number of extra grain attributes for the pixel data to control in order to make more interesting grains, like:

  • different colors being associated with different waveforms (R = sin, G = saw, B = triangle, etc.)
  • changing the envelope type (R = linear, G = log, B = exponential, etc.)
  • being able to generate multiple grains from a single pixel
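The first two ideas in that list could look something like the Python sketch below. None of this exists in VisualSFX: the function names, the exact envelope curves, and the choice to blend the three waveforms by channel weight are all assumptions made for illustration.

```python
import math


def sine(phase):  # phase is measured in cycles
    return math.sin(2 * math.pi * phase)


def saw(phase):
    return 2.0 * (phase % 1.0) - 1.0


def triangle(phase):
    return 1.0 - 4.0 * abs((phase % 1.0) - 0.5)


def envelope(kind, t):
    """Fade from 1 down to 0 as t goes from 0 to 1 (curves are assumed)."""
    if kind == "linear":
        return 1.0 - t
    if kind == "log":
        # log1p(e - 1) == 1, so this reaches 0 exactly at t == 1.
        return 1.0 - math.log1p(t * (math.e - 1.0))
    if kind == "exponential":
        floor = math.exp(-4.0)
        return (math.exp(-4.0 * t) - floor) / (1.0 - floor)
    raise ValueError(f"unknown envelope: {kind}")


def grain(pixel, freq, length, sample_rate=44100, env="linear"):
    """Blend sine, saw, and triangle waves weighted by the R, G, B channels."""
    r, g, b = (c / 255.0 for c in pixel)
    total = r + g + b
    if total == 0.0:
        return [0.0] * length  # a black pixel produces a silent grain
    out = []
    for n in range(length):
        phase = freq * n / sample_rate
        mix = (r * sine(phase) + g * saw(phase) + b * triangle(phase)) / total
        out.append(mix * envelope(env, n / length))
    return out
```

Under these assumptions a pure red pixel gives a plain sine grain, while a yellow pixel gives an even blend of sine and saw. The envelope type is passed in separately here, but it could just as easily be picked from the pixel's dominant channel.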

Overall though, I’m pretty happy with it. If you have any cool ideas or questions about VisualSFX, feel free to email me at izzyabdussabur@gmail.com or contact me at @BBQSteakTips on Twitter. If you’d like to toy around with the source or build it yourself, the GitHub repository is here: https://github.com/izzy-sabur/VisualSFX

Thanks for reading!