Successors to SoundFonts (my message to E-mu)

Dear E-mu Systems,

The SoundFont format is a very versatile music sampling format. I've used it for almost a year now, and I have found it very useful for creating GM-compatible sets out of samples. SoundFont editors like Polyphone and Viena have made the experience much easier by adding features that help my SoundFont creation workflow. Unfortunately, the SoundFont format has many built-in limitations, and I really want to tell you all what they are.

Sample limitations:
Bitdepth is either 16 or 24 bits. No in-betweens. Come on, a worse bitdepth than the Roland SOUND CANVAS SC-55 MkII from 1993[1]? (18 bit[1] vs 16 bit). I have to admit that 24 bit can be a significant improvement over 16 bit, but it was only added in SoundFont version 2.04 (2005[1])!
Not only that, all samples share the same bitdepth. If you mix 16 bit and 24 bit samples and want to keep the quality of the 24 bit ones, the 16 bit samples must also be stored as 24 bit. This bloats the 16 bit samples, and the result is a bloated SoundFont, all because you wanted to use some 24 bit samples.
Stereo samples are two mono samples that are linked. If the linking messes up, then you get weird sounds.
The length of the sample name is limited to 20 characters! (Someone on the Polyphone forums wanted to make a standard naming scheme, but was stopped by the 20 character limit.)
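As background for the bitdepth complaint: in SoundFont 2.04, 24-bit audio is stored by keeping the upper 16 bits of every sample point in the `smpl` chunk and the lowest 8 bits in a separate `sm24` chunk, which holds one byte for every point in `smpl`. That is why a single 24-bit sample forces every 16-bit sample to carry an extra byte too. The Python sketch below uses hypothetical helper names and ignores the padding and headers the spec also requires; it only rebuilds 24-bit values and estimates the size cost.

```python
import struct


def combine_24bit(smpl: bytes, sm24: bytes) -> list[int]:
    """Rebuild signed 24-bit sample values from an SF2.04 smpl/sm24 pair.

    smpl holds the upper 16 bits of each point (signed, little-endian);
    sm24 holds the lowest 8 bits, one byte per point.
    """
    if len(smpl) % 2:
        raise ValueError("smpl must contain whole 16-bit words")
    count = len(smpl) // 2
    if len(sm24) != count:
        raise ValueError("sm24 must hold exactly one byte per smpl point")
    high_words = struct.unpack(f"<{count}h", smpl)
    # Shift the signed high word up by 8 bits, then OR in the unsigned low byte.
    return [(high << 8) | low for high, low in zip(high_words, sm24)]


def pool_size(num_points: int, with_sm24: bool) -> int:
    """Bytes for the sample pool: 2 per point in smpl, plus 1 per point in sm24."""
    return num_points * (3 if with_sm24 else 2)
```

With a million 16-bit sample points, adding `sm24` grows the pool from 2,000,000 to 3,000,000 bytes, a 50% increase even if only one sample actually needed 24 bits.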

Instrument limitations:
The filter type and depth can't be changed, a limitation that harks back to the EMU8000/EMU10k1/EMU10k2, which only had a basic 12dB/oct resonant low-pass filter. Think about the difference high-pass filters could make to sounds. For example, they could make synthesiser sounds much more varied, instead of the same boring low-pass filter sweeps all the time.
The modulation and vibrato LFOs have no envelopes, only a delay. You can't make realistic synthesized vibrato on instruments using only a vibrato delay. An attack is needed, so the vibrato gradually and expressively affects the already realistic sound.
Sample playback features are limited. You can only turn the loop on or off, or choose to play the rest of the sample as a release. (Another person on the Polyphone forums wanted 'Bidi Style' playback and didn't get it due to format limitations.)
No negative attenuation is available. Instrument too quiet? Too bad. You can't increase the volume any further!
As with samples, instrument names are limited to 20 characters. (This could be an issue soon.)
Instrument generators are also stuck in the 16 bit world (probably to guarantee AWE32 support)! (65536 instrument generators run out very quickly!)
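The 65536 ceiling comes from how zones point at their generators: each instrument zone record (`ibag`) and preset zone record (`pbag`) stores its generator and modulator indices as 16-bit WORDs. The sketch below, with hypothetical names, packs such a record with Python's `struct` module and shows where a 16-bit index overflows; widening those fields to 32 bits would raise the ceiling to 2^32.

```python
import struct

# One ibag/pbag record: two little-endian unsigned 16-bit WORDs,
# (generator index, modulator index). Helper names are hypothetical.
BAG_RECORD = struct.Struct("<HH")

MAX_INDICES_16 = 2 ** 16   # 65536 addressable generators
MAX_INDICES_32 = 2 ** 32   # 4294967296 with a 32-bit field


def pack_bag(gen_index: int, mod_index: int) -> bytes:
    """Pack one bag record; fails once an index no longer fits in 16 bits."""
    try:
        return BAG_RECORD.pack(gen_index, mod_index)
    except struct.error as exc:
        raise OverflowError(
            f"bag indices ({gen_index}, {mod_index}) do not fit in 16-bit WORDs"
        ) from exc
```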

Preset limitations:
Preset generators are also stuck in the 16 bit world. (65536 preset generators don't run out as easily as instrument generators, but raising that limit would open the door to many more presets per SoundFont.)
There is only one bank select (usually set to MSB). Many extended MIDI standards (like XG, GM2 and SD-90) use the LSB banks, so multiple SoundFonts would need to be used. This is not very practical, as inconsistent conventions define which SoundFont uses which LSB bank: BASSMIDI uses .sflist files, and VArranger uses one SoundFont with an xxx@ prefix in the preset name, reducing your character limit from 20 to 16. VArranger's format also has the limitation that only one LSB can be used per MSB/program change combination.
Preset names are limited to 20 characters. (Just like sample and instrument names, this could become an issue.)
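MIDI selects a bank with two 7-bit controllers, CC0 (Bank Select MSB) and CC32 (Bank Select LSB), followed by a Program Change, so a player that uses both bytes can address 128 × 128 × 128 = 2,097,152 presets. The Python sketch below is a hypothetical channel model, not any existing player's API: it remembers both bank bytes and resolves them into a preset key when a Program Change arrives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetAddress:
    """Hypothetical preset key that uses both bank select bytes."""
    msb: int      # CC0, Bank Select MSB (0-127)
    lsb: int      # CC32, Bank Select LSB (0-127)
    program: int  # Program Change number (0-127)

    def __post_init__(self) -> None:
        for name in ("msb", "lsb", "program"):
            value = getattr(self, name)
            if not 0 <= value <= 127:
                raise ValueError(f"{name} must be a 7-bit value, got {value}")

    @property
    def bank14(self) -> int:
        """Combined 14-bit bank number (MSB in the high 7 bits)."""
        return (self.msb << 7) | self.lsb


TOTAL_ADDRESSES = 128 * 128 * 128  # every MSB/LSB/program combination


class ChannelState:
    """Tracks bank select bytes until a Program Change picks a preset."""

    def __init__(self) -> None:
        self.msb = 0
        self.lsb = 0

    def control_change(self, controller: int, value: int) -> None:
        if controller == 0:
            self.msb = value
        elif controller == 32:
            self.lsb = value

    def program_change(self, program: int) -> PresetAddress:
        return PresetAddress(self.msb, self.lsb, program)
```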

Modulator limitation:
These wonderful things were introduced in SoundFont version 2.01 and allow you to control SoundFont parameters with MIDI controller changes.[1] Not only that, they can react to velocity, key number, aftertouch, pitch wheel and pitch wheel sensitivity.[2] Unfortunately, they have a limitation as well.
That limitation is that you cannot put an amplitude envelope or LFO on a modulator, so it's not possible to, for example, pan over time. (Someone on the Polyphone forums wanted amplitude envelopes for panning, but the SoundFont format's limits meant they didn't get it.) It would also make my life easier when building panning patches, like the missing patches remaining in the SC-88Pro map.
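To make the panning request concrete: SF2's pan generator uses units of 0.1%, from -500 (full left) to +500 (full right), and today it can only be a fixed value or be moved by controller-driven modulators. The sketch below shows the hypothetical behaviour being asked for, a pan LFO whose depth fades in under a simple envelope, plus a constant-power pan law to turn the value into left/right gains. The function names and the linear fade-in are assumptions, and the pan law is a common choice rather than something the SF2 spec mandates.

```python
import math


def autopan(t: float, rate_hz: float, depth: float, attack_s: float = 0.0) -> int:
    """Pan position in SF2 pan units (-500 = full left, +500 = full right).

    A sine LFO sweeps the pan; its depth (0.0-1.0) fades in linearly over
    attack_s seconds, acting as a simple envelope on the LFO.
    """
    envelope = 1.0 if attack_s <= 0 else min(t / attack_s, 1.0)
    lfo = math.sin(2 * math.pi * rate_hz * t)
    return round(500 * depth * envelope * lfo)


def equal_power_gains(pan: int) -> tuple[float, float]:
    """Left/right gains for an SF2 pan value using a constant-power pan law."""
    angle = (pan + 500) / 1000 * (math.pi / 2)
    return math.cos(angle), math.sin(angle)
```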

File format limitation:
RIFF holds SoundFonts back from going over 4GB, as this outdated 32-bit format has a built-in 4GB limit. Most apps use signed 32-bit variables to gauge the size of a SoundFont, which halves the limit to 2GB. Stgiga and I are getting close to 4GB, and I (maybe Stgiga too?) need to be able to go further. Stgiga said that apps should move to RF64 after fixing the signed number issue. However, RF64 clashes with the SoundFont standard, as the de facto container format for SoundFonts is 32-bit RIFF, so any RF64 SoundFont would be out of spec. Wonder why? All because you're being so stubborn and don't want to let go of AWE32 support.
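The 4GB and 2GB figures fall straight out of the chunk header: every RIFF chunk starts with a 4-byte ID and a 4-byte little-endian size that is meant to be unsigned (at most 4,294,967,295 bytes). A program that reads it into a signed 32-bit integer sees any size of 2GB or more as negative. RF64 (EBU Tech 3306) works around this by changing the file ID to `RF64`, setting the 32-bit size fields to 0xFFFFFFFF, and storing the real 64-bit sizes in a `ds64` chunk. The Python sketch below, with a hypothetical helper name, reads one header both ways.

```python
import struct

RIFF_MAX = 2**32 - 1            # largest size an unsigned 32-bit field holds
SIGNED_MAX = 2**31 - 1          # largest size a signed reader handles correctly
RF64_PLACEHOLDER = 0xFFFFFFFF   # RF64 puts this in 32-bit size fields (see ds64)


def read_chunk_header(header: bytes) -> tuple[str, int, int]:
    """Parse an 8-byte RIFF chunk header.

    Returns the chunk ID, the size read correctly as unsigned, and the size
    as a program that wrongly uses a signed 32-bit integer would see it.
    """
    chunk_id = header[:4].decode("ascii")
    (unsigned_size,) = struct.unpack("<I", header[4:8])
    (signed_size,) = struct.unpack("<i", header[4:8])
    return chunk_id, unsigned_size, signed_size
```

For a 3,000,000,000-byte chunk, the unsigned read is correct, while the signed read comes out as -1,294,967,296.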

Now for the things that I really want, but can't get with SoundFonts:
A concert-hall-sounding piano. An ultra glassy and chorused FM electric piano. A realistic rotary organ sound. Some amazing cathedral reverb for a church organ. A screaming overdriven or distorted guitar. Bass-boosted 303s and Moogs. A delay on my sequenced sawtooth pluck. Sound effects with ambiance. Some light room reverb on a drum kit. Booming 808 kicks. Punchy 909 kicks and snares. You know what I'm talking about. There is only one way to get sounds like these: we need to be able to add full DSP effects to each sound, so we can tailor it to exactly what we want. We want to usher in a new standard in creativity. We want to break the barriers that make SoundFonts SoundFonts. I believe that we shouldn't be limited by the file format. In addition, I would like individual amplitude envelopes for each DSP effect.
One thing that could seriously increase the realism of pianos is that subtle but audible pedal sound. You should be able to play sounds depending on the pedal control change value. It's not that important, but it could make pianos sound just that little bit more special.
Of course, there's one thing I dislike about the sound of a SoundFont: it sounds the same every time! You should be able to randomize a parameter every time you play a note, or even switch samples completely for round-robin sampling goodness. Subtle things really make the biggest difference. Once again, someone wanted this, and that someone didn't get it. Guess the reason. File format limitations.
Ever heard a Motif? They can make a whole song with just keys being pressed. That is amazing. Shouldn't we get the same feature? I would slam a built-in sequencer and/or arpeggiator into the format to bring these multitimbral setups to life! Oh my gosh, please add sequencers/arps to the format! Please! Anyway, the possibilities would be unlimited with sequencers and arps in the format.
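Round-robin sampling and per-note randomization are simple to describe in code. The Python sketch below is a hypothetical zone (the names and default ranges are made up for illustration) that cycles through a list of samples and rolls a small random detune and level offset on every note-on, similar in spirit to SFZ's `seq_length`/`seq_position` and `pitch_random`/`amp_random` opcodes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RoundRobinZone:
    """Hypothetical zone that cycles samples and randomizes parameters per note."""
    samples: list[str]
    detune_cents: float = 5.0  # max random pitch offset, +/- cents
    gain_db: float = 1.5       # max random level offset, +/- dB
    rng: random.Random = field(default_factory=random.Random)
    _next: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        if not self.samples:
            raise ValueError("a round-robin zone needs at least one sample")

    def note_on(self) -> tuple[str, float, float]:
        """Pick the next sample in turn and roll fresh random offsets."""
        sample = self.samples[self._next]
        self._next = (self._next + 1) % len(self.samples)
        detune = self.rng.uniform(-self.detune_cents, self.detune_cents)
        gain = self.rng.uniform(-self.gain_db, self.gain_db)
        return sample, detune, gain
```

Passing a seeded `random.Random` keeps renders reproducible while still varying from note to note.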

There is also one other subtle issue. Someone on the Polyphone forums has complained about the sample looping rules. E-mu, I understand that the intention is to prevent looping artifacts, but at least two people have complained about it. E-mu, there is one thing you can do: abolish section 6.3 and let us do what we want with the sample loop starts and ends. Do we really need a leeway of 4 (FOUR) samples? Newer algorithms make that requirement obsolete.

This means:

It's time for a successor to SoundFont 2.04!

E-mu, I've got a few options for you to consider for upgrading the SoundFont standard.

Option 1: Do nothing
You might ask yourself: if RF64 SoundFonts are 'out of spec', aren't SFPack/SFArk/SF3/SF4/SF2Pack all 'out of spec' too? You're right. Since that is the case, RF64 might as well serve as an interim solution to the 4GB limit. However, all of the other inherent issues with the SoundFont standard would still apply.

Option 2: Add RF64 support to the official standard
Another option is to add an appendix to the part of the spec that says RIFF is the container format, stating that 64-bit RIFF (RF64) can be used as well. This is an interim solution that lets people add more detailed samples without loops, but it does not fix any of the other flaws in the standard. Keep 32-bit RIFF for files under 4GB, so your beloved AWE32 compatibility is not lost; that sound card can't even get close to 4GB anyway.

Option 3: Overhaul the SoundFont standard
For option 3, I kindly ask that you add:
CC32 (Bank select LSB) support
32-bit instrument and preset generator counts (>4 billion vs 65536)
Upgrade RIFF to RF64 (16EB vs 4GB)
Allow custom bitdepths (18-bit, 32-bit+)
DSP effects (May not be possible with the current SoundFont infrastructure or file format structure)
Increase maximum sample/instrument/preset name lengths from 20 to 32+

Any of the options listed above is fine. However, if you choose option 1, I will eventually start discussing a new music sampler file standard, with all the features I want, on the Polyphone forums. If you choose option 2, I must emphasize that it is only a temporary solution, and a revamp of the SoundFont standard is still needed.

Features of my proposed standard include:
Completely new non-RIFF format expandable to 16EB (64-bit) or even 256 Million Geopbytes (128-bit)
All bitdepths (16, 24, 32, anything in between)
Native stereo and surround sample support (No links)
All sample, instrument and preset features of SF2 and SFZ formats (including pedal release, exclusive classes and round robin sampling)
256 characters in sample, instrument and preset names
Adjustable filter depth and types (Depth: 6, 12, 18, 24, 36, 48 dB/oct or anything in between; Types: low pass, high pass, band pass, notch)
Complete DSP set (Reverb, chorus, rotary speaker, distortion, flanger, wah, delay, EQ and more) - Also choose control change to map to (CC94 for SC-88Pro EFX?)
Full amplitude envelopes on LFO, modulators, pan and DSPs 
Parameter randomization for realism
Up to 2 million presets with program change, bank select MSB and bank select LSB
Instrument and preset generators increased to 32 bits (about 4.3 billion of each) or 64 bits (about 18 quintillion of each)
Negative attenuation on instruments and presets
Bidi Style and reverse sample playback 
Sequencers and arps - React to MIDI tempo.
Either one file implementation, which makes it easier to distribute, or multi file implementation (like SFZ) for development or for flexibility.
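The "full amplitude envelopes on LFO" item, like the vibrato complaint earlier, comes down to one missing stage. SF2 gives the vibrato LFO a delay (`delayVibLFO`), a rate (`freqVibLFO`) and a pitch depth (`vibLfoToPitch`), but no fade-in. The sketch below adds a hypothetical linear attack after the delay so the vibrato grows in gradually; the function and parameter names are illustrative only.

```python
import math


def vibrato_cents(t: float, rate_hz: float, depth_cents: float,
                  delay_s: float, attack_s: float) -> float:
    """Pitch offset in cents for a vibrato LFO with a delay and a fade-in.

    SF2 already has the delay and rate; the linear attack ramp is the
    hypothetical addition that lets the vibrato grow in gradually.
    """
    if t < delay_s:
        return 0.0
    elapsed = t - delay_s
    envelope = 1.0 if attack_s <= 0 else min(elapsed / attack_s, 1.0)
    return depth_cents * envelope * math.sin(2 * math.pi * rate_hz * elapsed)
```

A linear ramp is the simplest choice; an exponential or S-shaped curve would slot into the same `envelope` term.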

In conclusion, even though SoundFonts have stood the test of time for over 13 years, they are starting to show their age, and their appeal has declined due to newer formats like NI's proprietary Kontakt format that everyone's using now. E-mu, I believe that the weaknesses of the file format can be corrected, and that the SoundFont format would then stay relevant for years to come. These improvements start now, and everyone can help.

P.S.: SynthFont has been providing exclusive features in its SF2-based products for a long time. If you're not happy about that, please tell them.

Yours sincerely, 

Strix SoundFont Team

[1] Wikipedia information licensed under cc-by-sa 4.0.
[2] According to Polyphone.

Here are the Google Slides that explain the features that may be added to the new format.

Davy, the author of Polyphone, discusses his take on SoundFont3 here: https://github.com/davy7125/soundfont-standard-v3