December 21, 2014
January 29, 2017
Part I: here is a quote from a recent email reply:
"The complexity vs. CPU load is an issue that I struggled with while writing this book. I've seen the code for the Korg Wavestation, Trinity (aka Triton) and original OASYS. If I printed code like this, no one would buy the book as it is very librarian/bookkeeping in nature, tons of lookup tables, arrays of pointers to arrays of pointers to buffers of coefficients, etc... In fact, if you look at the struct-based Global Parameterization in Chapter 8, you will see something close to the Wavestation's code (it was written in pure C, not C++). And, it is not pleasant or easy to de-tangle. In the book I am trying to present these relatively new technologies and provide C++ code that gets you started. How far you want or need to take it is up to you. I needed to make the synth projects able to display the underlying technology (BLEP, virtual analog filtering, etc...) while still providing a robust modulation matrix to allow flexibility and user programmability.
You should not begin to approach optimizing the code if you don't have the book yet and don't understand the relationships between CVoice and its derived classes, CModulationMatrix, and CFilter, and how the synths are wired up. In addition, you need to look at the grey box on the first page of Chapter 9 which discusses departing from the book and writing your own CVoice and derived classes, and possibly doing away with the Modulation Matrix in favor of hard-coding the architecture. All of these will increase polyphony but will decrease flexibility and user programmability."
Going back several years of teaching this, I had originally designed ridiculously simple synths with fixed architectures, no modulation matrix, no CVoice base class, etc... all hard-wired with no user programmability. These did a good job of showing the underlying technologies but were not flexible. For that reason, I decided to go back and re-design them with a CVoice base class (see all the various commercial synth voice architectures in Chapter 9 for motivation there). In addition, I decided to add the modulation matrix rather than hard-wiring, since this is the way it is done in the real world.
You can immediately find ways to optimize the code by writing your own Voice class, or eliminating it and doing all your processing in the plug-in's core functions. You can also hard-wire your architecture with more inlined code and nested functions.
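As a rough illustration of the hard-wired approach, here is a minimal sketch of one voice with a fixed signal path (oscillator into a one-pole lowpass into a linear-attack amplifier) and no modulation matrix. The struct and member names are hypothetical, not the book's CVoice API, and the naive sawtooth and one-pole filter are placeholders standing in for the book's BLEP oscillators and virtual analog filters.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical hard-wired voice: osc -> one-pole lowpass -> linear-attack amp.
// Every connection is fixed in code, so there is no per-sample walk through
// modulation-matrix rows and no virtual calls.
struct HardWiredVoice {
    double sampleRate = 44100.0;
    double phase = 0.0;     // sawtooth phase in [0, 1)
    double phaseInc = 0.0;  // oscillator frequency / sampleRate
    double lpState = 0.0;   // one-pole lowpass state
    double lpCoeff = 0.0;   // one-pole coefficient in (0, 1]
    double envLevel = 0.0;  // amplifier envelope level
    double envInc = 0.0;    // per-sample attack increment

    void noteOn(double freqHz, double cutoffHz, double attackSec) {
        assert(freqHz > 0.0 && cutoffHz > 0.0 && attackSec > 0.0);
        const double pi = std::acos(-1.0);
        phaseInc = freqHz / sampleRate;
        lpCoeff = 1.0 - std::exp(-2.0 * pi * cutoffHz / sampleRate);
        envLevel = 0.0;
        envInc = 1.0 / (attackSec * sampleRate);
    }

    // One sample through the fixed path; defined in the class, so implicitly inline.
    double render() {
        double osc = 2.0 * phase - 1.0;                // naive (aliasing) sawtooth
        phase += phaseInc;
        if (phase >= 1.0) phase -= 1.0;
        lpState += lpCoeff * (osc - lpState);          // one-pole lowpass
        envLevel = std::fmin(1.0, envLevel + envInc);  // linear attack, holds at 1.0
        return lpState * envLevel;
    }
};
```

The price is the one described above: changing the routing means editing and recompiling render(), rather than toggling a row in a matrix.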
Part II is the next topic.
January 29, 2017
There are several mathematical bottlenecks that can be optimized further. One is the tanh() function which is used all over the place in the Filters. I did write a tanh() lookup table that you can find in lookuptables.h and you can use this instead of calling the function, but I did not see a remarkable improvement in efficiency with that - trig functions are implemented very efficiently in CPUs these days.
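For anyone who wants to measure the table route on their own machine, here is a minimal sketch of a tanh() lookup with linear interpolation. It is not the table in lookuptables.h; the 1024-point size and the input range of plus/minus 4 are assumptions, and inputs outside that range are clamped to the table's end values (tanh(4) is about 0.9993), which is itself an approximation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tanh() lookup table with linear interpolation over [-range, range].
class TanhTable {
public:
    explicit TanhTable(int size = 1024, double range = 4.0)
        : range_(range), table_(static_cast<std::size_t>(size) + 1) {
        assert(size > 1 && range > 0.0);
        scale_ = size / (2.0 * range);  // table steps per unit of x
        for (int i = 0; i <= size; ++i)
            table_[static_cast<std::size_t>(i)] = std::tanh(-range + i / scale_);
    }

    double operator()(double x) const {
        if (x <= -range_) return table_.front();  // clamp below the table
        if (x >= range_) return table_.back();    // clamp above the table
        double pos = (x + range_) * scale_;
        std::size_t i = static_cast<std::size_t>(pos);
        if (i + 1 >= table_.size()) return table_.back();
        double frac = pos - static_cast<double>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

private:
    double range_;
    double scale_ = 0.0;
    std::vector<double> table_;
};
```

With these settings the interpolation error stays below about 1e-5 inside the range, but as noted above, whether it beats std::tanh is something to profile rather than assume.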
The major bottleneck is the abundance of power-of-two function calls, pow(2, N), which are required for pitch calculation/modulation, filter cutoff frequency modulation, and portamento/glide. Real synths do not call pow(2,N) directly but instead use a lookup table, or two of them (one coarse, one fine), to eliminate this function. It is critical that the error produced is held to the absolute minimum since it controls pitch calculation/modulation. I chose to stick with the function call and leave optimizing it to informed readers who are advanced programmers, because you can easily wind up with a synth that is out of tune - even a few cents of tuning offset is enough to render the synth useless. I welcome any pow(2,N) replacement methods - lookup tables or otherwise - to be posted here on this Forum.
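One possible coarse/fine scheme is sketched below as an assumption, not as what any particular commercial synth does. It uses the identity 2^(a+b) = 2^a * 2^b: the integer part of the exponent becomes a binary exponent via std::ldexp, the fractional part is quantized to 16 bits, and its upper and lower 8 bits index a coarse and a fine 256-entry table. Truncating the fraction to 1/65536 of an octave bounds the pitch error at 1200/65536, about 0.018 cents, which is far below the few-cent offsets mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 2^x replacement using one coarse and one fine lookup table.
class Pow2Table {
public:
    static constexpr int kBits = 8;
    static constexpr int kSize = 1 << kBits;              // 256 entries per table
    static constexpr unsigned kMask = kSize * kSize - 1;  // 16-bit fraction mask

    Pow2Table() {
        for (int i = 0; i < kSize; ++i) {
            coarse_[i] = std::pow(2.0, static_cast<double>(i) / kSize);
            fine_[i] = std::pow(2.0, static_cast<double>(i) / (kSize * kSize));
        }
    }

    // Approximates 2^x, where x is in octaves (e.g. semitones / 12.0).
    double operator()(double x) const {
        double ip = std::floor(x);
        double frac = x - ip;                        // nominally in [0, 1)
        unsigned q = static_cast<unsigned>(frac * (kSize * kSize));
        if (q > kMask) q = kMask;                    // guard frac rounding up to 1.0
        double m = coarse_[q >> kBits] * fine_[q & (kSize - 1)];
        return std::ldexp(m, static_cast<int>(ip));  // apply 2^integer part exactly
    }

private:
    std::array<double, kSize> coarse_{};
    std::array<double, kSize> fine_{};
};
```

Because the fraction is truncated rather than rounded, the result is always at or slightly below the exact value; rounding q instead would halve the worst-case error at the cost of one more operation.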
A second bottleneck is the pre-warping and calculation of the filter coefficients. These involve repeated calculations with the tan() function. One of my students wrote a lookup table for these calculations last semester; I will see if he will post it here. The only issue is that the lookup table depends on the sample rate, so you will need one table for each supported sample rate.
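Here is a minimal sketch of such a table; it is not the student's code. It assumes the bilinear-transform prewarp term g = tan(pi * fc / fs), samples it on a linear grid of cutoff frequencies from 0 Hz up to a chosen ceiling for one sample rate, and interpolates linearly. Because fs is baked into every entry, the table has to be rebuilt (or a second one selected) when the host sample rate changes. The 4096-point size and 20 kHz ceiling are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical prewarp table g = tan(pi * fc / fs), valid for ONE sample rate.
class PrewarpTable {
public:
    PrewarpTable(double sampleRate, double maxCutoffHz, int size = 4096)
        : fs_(sampleRate), maxFc_(maxCutoffHz),
          step_(maxCutoffHz / size), table_(static_cast<std::size_t>(size) + 1) {
        assert(sampleRate > 0.0 && size > 1);
        assert(maxCutoffHz > 0.0 && maxCutoffHz < 0.5 * sampleRate);
        const double pi = std::acos(-1.0);
        for (std::size_t i = 0; i < table_.size(); ++i)
            table_[i] = std::tan(pi * (static_cast<double>(i) * step_) / fs_);
    }

    // Linear interpolation between grid points; input clamped to [0, maxCutoffHz].
    double lookup(double fc) const {
        if (fc <= 0.0) return table_.front();
        if (fc >= maxFc_) return table_.back();
        double pos = fc / step_;
        std::size_t i = static_cast<std::size_t>(pos);
        if (i + 1 >= table_.size()) return table_.back();
        double frac = pos - static_cast<double>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

    double sampleRate() const { return fs_; }  // rebuild if the host rate differs

private:
    double fs_;
    double maxFc_;
    double step_;
    std::vector<double> table_;
};
```

Typical use would be one instance built in the plug-in's prepare/reset call, e.g. PrewarpTable g(44100.0, 20000.0), with filters calling g.lookup(cutoffHz) instead of std::tan().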
The other inefficiencies are in the way the patches are wired, including CVoice and the modulation matrix, which I discussed in the previous post. Again, it is very difficult to straddle the line between providing interesting, fun, useful, programmable and extendible synths and avoiding an abundance of structure-based embedded code, so that even beginners can approach it; I split the difference here. Hopefully, people will "take the ball and run with it" and use the code as a foundation to improve upon greatly using their own C++ sensibilities.
Here's a good example: a few years ago I designed a very efficient modulation matrix that used C++ function pointers. When I presented it in class, no one had ever seen a C++ function pointer before (they are not taught in basic C++ classes, I suppose, or the students had forgotten them). This was an immediate turn-off for the students, who mostly ignored it, staying on Facebook the whole time or looking at their laps and tapping their thumbs on something. I could never have published that code without a similar reaction from most readers. And the code is now lost in an old hard-drive crash.
Chapter 8 (modulation matrix/global parameters) is a turning point in the book. Both of these things make the code more complicated. The modulation matrix does eat up some more CPU cycles versus hard-wiring your sources and destinations, but it is more flexible and simpler to program (i.e. you can enable/disable source/destination pairs on the fly). Global parameterization, on the other hand, greatly improves efficiency by eliminating redundant function calls, but it is not very much fun to look at (structures of embedded structures, all straight C) and it turns off everyone who looks at it. But that's more like what real code looks like.
Chapter 9 leads off immediately with the notion that you should decide how you want to proceed: using the supplied CVoice object, creating your own, or eliminating it altogether, and whether to use the Modulation Matrix.
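The core idea of global parameterization can be sketched in a few lines: a control change is written once into a shared structure, and every voice's objects read through a pointer into it instead of each voice receiving the value through its own setter call. The struct and field names below are hypothetical and far flatter than the book's Chapter 8 structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical global parameter structures (plain C-style structs).
struct OscGlobalParams    { double octave = 0.0; double fineTuneCents = 0.0; };
struct FilterGlobalParams { double cutoffHz = 1000.0; double q = 0.707; };
struct SynthGlobalParams  { OscGlobalParams osc; FilterGlobalParams filter; };

// A voice component reads through a pointer into the shared structure,
// so a control change is written once instead of pushed into every voice.
struct VoiceFilter {
    const FilterGlobalParams* params = nullptr;
    double cutoffHz() const {
        assert(params != nullptr);
        return params->cutoffHz;
    }
};

struct Voice {
    VoiceFilter filter;
    void attach(const SynthGlobalParams* globals) {
        assert(globals != nullptr);
        filter.params = &globals->filter;
    }
};
```

With 16 voices, a cutoff change becomes one store rather than 16 setter calls, which is the redundancy the structures are there to remove.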
January 29, 2017
One of my biggest fears with this book is that people will simply download the code and compile it without reading the book either first or concurrently, then base their opinions directly on that without the book's information.
Think about it: I spend five Chapters (Chapters 3-7) slowly building up NanoSynth, a monosynth that is the basis for the MiniSynth Analog Modeling synth. You are supposed to start with NanoSynth MIDI and trap MIDI messages, then move to NanoSynth OSC and add two oscillators, then NanoSynth FILTERS, then NanoSynth EG-DCA, at which point you will have a hard-wired monosynth that is actually pretty remarkable even for its simplicity. It makes great fat-bass sounds. The point of doing PolyNanoSynth is to give you a taste of attempting polyphony and to show how things become exponentially more complex just by adding a single extra note of polyphony. You are supposed to be following along, learning progressively here. And, even if you only have an intermediate understanding of C++, you are realizing "hey, I could optimize this or that" even if it makes the code less readable.
After that, each chapter piles on another technology/strategy/theory, and the synths become much more complicated while still remaining essentially pure C++. And, as the Chapters progress, it is always up to the reader to decide how far to take optimizations, or (hopefully) to spin off their own personal variations on the synth architectures.
So, if you are downloading the code and compiling it blindly without going through the book's progression, then most likely you are only getting part of the information and skills. Super advanced programmers might download the code and then pull out only the parts they want - which is totally fine with me - in fact, you'll see in Chapter 8 that I keep the C++ objects intact so they can all be used as standalone processing/rendering objects specifically for this purpose.
Lastly, and this is not related to optimization, I strongly urge everyone to add the suite of delay effects in Chapter 13 to every synth, not just MiniSynth. These effects massively change the nature of the patches, all for the better.
December 21, 2014
Thank you for the detailed and informative reply Will.
Am digging into the book now, spent most of yesterday afternoon reading it. I like the way it goes in depth into say, creating an oscillator. Think it will take me some time to go through it fully.
I totally understand the rationale behind keeping the code simple, as the intention is to teach. It's great that there is such a forum for the book where we can discuss possible optimizations to the code, etc. Will be hanging around.
Oh, am also curious to hear what you think are possible optimizations using the Accelerate framework (for mac).
January 29, 2017
Thanks for understanding the point of view here. One thing I have learned over the course of the last 2 books is that it is impossible to please everyone!
RE: Accelerate - some of my students use the Accelerate framework in the iOS class that I teach (called MMI505) for projects involving FFT/convolution processing, where it clearly is faster than coding from scratch. But Accelerate also has some very interesting DSP-core-like functions such as multiply/accumulate operations for filtering, which could be useful for multiband stereo EQs, band-splitting filter-banks, and even perceptual reverb algorithms, which often include comb, delayed comb, and APF filtering. On the synth side, it could also be applied to the filtering and oscillator operations (interpolation).
Three years ago, one of my grad students named Stephan did a bunch of testing with a few of these lower-level DSP sub-functions compared to the straight math operations, all in Xcode and running on an iOS device. He found no difference in speed using them - again, not FFT/convolution, just the MAC stuff. However, he never tested Accelerate in the context of an AU plug-in, so that is something to consider. I will throw this problem at my grad students in my advanced plugin class (MMI606) first thing next semester, and hopefully we can have some measurements for you all.
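For reference, the "straight math" side of such a comparison is just a multiply/accumulate loop like the FIR sketch below. On Apple platforms, vDSP routines such as vDSP_dotpr compute the same kind of dot product; whether they win inside an AU render callback is exactly the open question above, so treat this only as a baseline to time against. The function name and the newest-sample-first history layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ FIR filter built from multiply/accumulate (MAC)
// operations; processes 'buffer' in place. history[k] holds x[n - k].
void firProcess(const std::vector<float>& coeffs, std::vector<float>& history,
                float* buffer, std::size_t numSamples) {
    assert(!coeffs.empty() && history.size() == coeffs.size());
    for (std::size_t n = 0; n < numSamples; ++n) {
        // shift the delay line (simple, not a circular buffer)
        for (std::size_t k = history.size() - 1; k > 0; --k)
            history[k] = history[k - 1];
        history[0] = buffer[n];
        float acc = 0.0f;  // the multiply/accumulate loop
        for (std::size_t k = 0; k < coeffs.size(); ++k)
            acc += coeffs[k] * history[k];
        buffer[n] = acc;
    }
}
```

Feeding an impulse through it returns the coefficients themselves, which is a quick sanity check before timing it against a vectorized version.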
December 21, 2014
Another question if I may, Will. In CSound, many of the commands can run at "k-rate", i.e. once per slice of the sample timing, or at "i-rate", every time an instrument is invoked.
I noticed that most of the commands in the book source code run at frame rate. (Pls correct me if I'm wrong)
Are there any that would run at a different rate (maybe every 256/512 frames) and not alter the nature of the synth?
January 29, 2017
I have not looked at CSound since about 1989 or so. I do not understand your phrase "commands" since our plugins are just straight C++ and not a scripting or meta-language.
The synth book references 3 different APIs - RAFX, VST3 and AU. Is your question about one of these, or about the base synth code that is common to all of them, i.e. the synth objects?
Can you give me a concrete example of a "command" from the book-code that runs at the "frame rate?" Is this referring to audio rendering or GUI control or both?
December 21, 2014
I mean that most of the rendering (I'm looking only at AU) happens within this loop.
```cpp
// --- the frame processing loop
for(UInt32 frame = 0; frame < inNumberFrames; ++frame)
{
    // processing frames
}
```
In CSound, there is something called ksmps, which sets the control rate: it is the number of audio samples in each control period. Say it is set to 128.
Assuming a sample rate of 44100 Hz, a variable running at k-rate would then be updated 44100 / 128 ≈ 344.5 times per second.
So the practice is that some variables can be put at k-rate to save cpu cycles.
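To make the k-rate idea concrete inside a per-frame loop, here is a minimal sketch: an expensive modulation value is recomputed only every controlPeriod frames, and the per-frame value ramps linearly toward it so there is no step at each boundary. The class name and the computeTarget callback are hypothetical; this is one possible approach, not how the book's synths work.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical control-rate ("k-rate") parameter: the expensive target is
// recomputed every controlPeriod frames, and the per-frame value ramps
// linearly toward it so there are no steps at block boundaries.
class ControlRateParam {
public:
    ControlRateParam(int controlPeriod, std::function<double()> computeTarget)
        : period_(controlPeriod), compute_(std::move(computeTarget)) {
        assert(period_ > 0 && compute_);
        current_ = compute_();
    }

    // Call once per audio frame inside the frame processing loop.
    double tick() {
        if (counter_ == 0) {                     // control-rate boundary
            double target = compute_();          // the expensive calculation
            step_ = (target - current_) / period_;
            counter_ = period_;
        }
        --counter_;
        current_ += step_;
        return current_;
    }

private:
    int period_;
    std::function<double()> compute_;
    int counter_ = 0;
    double current_ = 0.0;
    double step_ = 0.0;
};
```

With a period of 128 the callback runs about 344 times per second at 44.1 kHz instead of 44100, but the ramp adds one control period of lag, and anything that changes faster than the period is smeared.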
In commercial synths (say a virtual analog synth), are the calculations done in real-time like CQBLimitedOscillator, or read from a table, like CWTOOscillator?
January 29, 2017
All of the synth parameter variables should be updated on a sample by sample basis for the highest quality rendering. This includes the oscillator, filter, EG and DCA modulated variables. You can add your own code to skip over sets of modulation calculations, but you would need to do this manually in the update() method in each of the synth objects, and if using the modulation matrix, you would need to apply the changes there too. Either way, the audio quality will suffer as you skip over more and more value updates, though you will definitely save CPU cycles. Both my books favor audio quality over everything else, including CPU usage.
The audio quality issue also depends on which modulation parameters you are making coarser by skipping their calculations/updates. For a very slow LFO, you might be able to get away with skipping these variable updates since the LFO output varies so slowly. But if you were using a pitched oscillator for modulation (the Rush "Tom Sawyer" "growl" is an example of this, with a pitched oscillator applied to the filter modulation on top of the filter EG), then skipping blocks of samples would not work well, or in some cases at all, because the modulation values are changing so quickly. The same applies to an EG with very short segment durations. For example, if the attack time were very short, say 64 samples (nearly instantaneous), and you skipped 128 or 256 of the sample updates, you would miss the attack event entirely and wind up somewhere in the decay or sustain segment; clearly this would be unacceptable. There is another issue: to keep the coding manageable, we have to plan for the worst-case scenario and assume the user will apply the fastest modulation changes that we allow. If you based the update-skipping interval on the user's control settings, the code could quickly become a mess. And what happens if the user changes something while rendering?
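The attack example can be checked numerically. The sketch below models a hypothetical linear envelope (64-sample attack, 1000-sample decay to a sustain of 0.5) and samples it only every 256 frames, holding the value in between, as a skipped-update scheme would. The first update lands at frame 0 (level 0) and the next at frame 256, where the level has already fallen to about 0.90, so the peak of 1.0 is never produced. The envelope shape and numbers are illustrative assumptions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical linear attack/decay/sustain envelope at an absolute frame index.
double envelopeAt(int frame, int attackFrames, int decayFrames, double sustain) {
    if (frame < attackFrames)
        return static_cast<double>(frame) / attackFrames;  // attack
    int d = frame - attackFrames;
    if (d < decayFrames)                                    // decay
        return 1.0 - (1.0 - sustain) * static_cast<double>(d) / decayFrames;
    return sustain;                                         // sustain
}

// Values a voice would actually use if the EG were updated only every
// 'controlPeriod' frames and held in between.
std::vector<double> heldEnvelope(int numFrames, int controlPeriod,
                                 int attackFrames, int decayFrames, double sustain) {
    assert(numFrames >= 0 && controlPeriod > 0);
    std::vector<double> out(static_cast<std::size_t>(numFrames));
    double held = 0.0;
    for (int n = 0; n < numFrames; ++n) {
        if (n % controlPeriod == 0)
            held = envelopeAt(n, attackFrames, decayFrames, sustain);
        out[static_cast<std::size_t>(n)] = held;
    }
    return out;
}
```

Evaluated every frame, the same envelope does reach 1.0 at frame 64; only the skipping loses it.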
In addition, if you are trying to be "virtual analog" then skipping blocks of modulation updates might not be 'legal' in that sense.
The GUI control variables in AU and VST3 are only updated on the synth buffer size boundaries. In AU, the update() function is called prior to the loop that renders the buffers of data, so these variables are only updated on a per-buffer basis rather than every sample period. This uses less CPU, but it also makes the controls blocky: any control changes that happen during the rendering loop are not picked up until the next buffer. In VST3, the updates occur on sub-block boundaries, which is a little smoother, though the Steinberg programmers give you a little hint there in the SDK when they write "TODO: maybe make this sample-accurate".
A feature of the RackAFX API is that it renders on a frame-by-frame basis, so GUI control changes are applied on every sample period, if they are changed. If they are not changed, no extra calculations or variable-fetches take place. Both AU and VST3 can be modified to do the same thing, with AU being the easiest, but to avoid unnecessary fetches you need to use a few tricks and override some functions. If you use RackAFX and Make AU, you will see how I override the methods to make GUI control updates happen as soon as they occur, with no added function calls, fetches, or calculations when changes are not occurring.
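The "no work unless something changed" idea can be sketched with a per-parameter dirty flag checked once per frame. This is not the RackAFX or Make AU code; the names are hypothetical, and std::atomic is used so a GUI thread can post a value while the audio thread renders. Only when the flag is set does the audio thread fetch the value and recalculate the dependent coefficient.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>

// Hypothetical control shared between the GUI thread (writer) and the
// audio thread (reader), with a dirty flag so unchanged controls cost
// only one atomic check per frame.
struct GuiParam {
    std::atomic<double> value{0.0};
    std::atomic<bool> dirty{false};

    void set(double v) {                                // GUI thread
        value.store(v, std::memory_order_relaxed);
        dirty.store(true, std::memory_order_release);  // publish the new value
    }
};

// Hypothetical consumer that recalculates its coefficient only on change.
struct CutoffFollower {
    double sampleRate = 44100.0;
    double coeff = 0.0;
    int recalcCount = 0;  // counts recalculations, for illustration only

    void processFrame(GuiParam& cutoff) {              // audio thread, per frame
        if (cutoff.dirty.exchange(false, std::memory_order_acquire)) {
            double fc = cutoff.value.load(std::memory_order_relaxed);
            assert(fc > 0.0 && fc < 0.5 * sampleRate);
            const double pi = std::acos(-1.0);
            coeff = std::tan(pi * fc / sampleRate);   // the costly part
            ++recalcCount;
        }
        // ... render one frame using coeff ...
    }
};
```

If the GUI writes several values between two frames, the audio thread simply picks up the latest one on the next frame.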
But, it seems that you are more concerned with the rendering operations than GUI variable updates which happen on a much slower basis.
The QBLimitedOscillator (BLEP) algorithm is part of the family of quasi-bandlimited algorithms which are designed specifically to avoid lookup tables; see the References (Vesa Valimaki and Julius Smith have published numerous papers on this). "Virtual Analog" synth plugins almost always refer to this kind of algorithmic generation that does not depend on lookup tables or additive synthesis. See the Leary/Bright Patent, which is the basis for the Korg Kronos and Krome VA oscillators - they are also calculated in real time.
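To show what "calculated in real time, no lookup table" looks like in practice, here is a minimal two-sample polyBLEP sawtooth. This is a common simplified relative of BLEP, chosen here for brevity; it is not the book's CQBLimitedOscillator or the Leary/Bright method. The naive sawtooth's discontinuity is softened by subtracting a small polynomial residual computed directly from the phase, so nothing is read from memory.

```cpp
#include <cassert>
#include <cmath>

// Two-sample polynomial BLEP residual; t = phase in [0, 1), dt = phase increment.
double polyBlep(double t, double dt) {
    if (t < dt) {              // just after the wrap
        t /= dt;
        return t + t - t * t - 1.0;
    }
    if (t > 1.0 - dt) {        // just before the wrap
        t = (t - 1.0) / dt;
        return t * t + t + t + 1.0;
    }
    return 0.0;
}

// Hypothetical quasi-bandlimited sawtooth oscillator using polyBLEP.
struct PolyBlepSaw {
    double phase = 0.0;
    double dt = 0.0;

    void setFrequency(double freqHz, double sampleRate) {
        assert(freqHz > 0.0 && freqHz < 0.5 * sampleRate);
        dt = freqHz / sampleRate;
    }

    double render() {
        double out = 2.0 * phase - 1.0 - polyBlep(phase, dt);
        phase += dt;
        if (phase >= 1.0) phase -= 1.0;
        return out;
    }
};
```

A naive sawtooth drops by almost 2.0 in one sample at every wrap; with the residual applied, the largest sample-to-sample jump stays below 1.5, which is where the aliasing reduction comes from.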
Wavetable oscillators and sample oscillators use the lookup table approach and include all synths labeled "wavetable" as well as the sample based synths like the Triton/Karma and most 1980s and 1990s digital synths.
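By contrast, a wavetable oscillator reads its waveform from memory. The sketch below fills a single-cycle table with one sine period (an assumption; a real wavetable synth would store many band-limited tables) and reads it with a fractional phase accumulator and linear interpolation. The class name and the 2048-sample table size are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-cycle wavetable oscillator with linear interpolation.
class WavetableOsc {
public:
    explicit WavetableOsc(std::size_t tableSize = 2048) : table_(tableSize) {
        assert(tableSize > 1);
        const double twoPi = 2.0 * std::acos(-1.0);
        for (std::size_t i = 0; i < tableSize; ++i)
            table_[i] = std::sin(twoPi * static_cast<double>(i) / tableSize);
    }

    void setFrequency(double freqHz, double sampleRate) {
        assert(freqHz > 0.0 && freqHz < 0.5 * sampleRate);
        inc_ = freqHz * table_.size() / sampleRate;  // table samples per output sample
    }

    double render() {
        std::size_t i0 = static_cast<std::size_t>(readIndex_);
        std::size_t i1 = (i0 + 1) % table_.size();    // wrap around the cycle
        double frac = readIndex_ - static_cast<double>(i0);
        double out = table_[i0] + frac * (table_[i1] - table_[i0]);
        readIndex_ += inc_;
        if (readIndex_ >= table_.size()) readIndex_ -= table_.size();
        return out;
    }

private:
    std::vector<double> table_;
    double readIndex_ = 0.0;
    double inc_ = 0.0;
};
```

The per-sample cost is two reads and one interpolation regardless of waveform complexity, which is why the table approach suited the 1980s and 1990s hardware mentioned above.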
December 21, 2014
You mentioned that you did an efficient modulation matrix that uses function pointers. I understand that the code is lost.
Could you perhaps describe how it was implemented? What exactly made it more efficient than what is currently in the book?
(I ran the profiler in Xcode briefly; it's true that the modulation matrix does take up quite a lot of CPU.)
January 29, 2017
That mod matrix used function pointers to call the update function on each receiving object after the modulation values had been summed. If no modulation occurred, no function was called.
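Since the original code is gone, here is a hypothetical reconstruction of the idea, not the lost code itself. Each destination has an accumulator and a plain C++ function pointer to its update routine; the rows add their scaled source values into the accumulators, and a destination's update function is called only when its summed modulation differs from the value it last received. That last condition is a slight variation on "no modulation, no call" so that a destination is also restored when its modulation returns to zero. All names and the row layout are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical receiving object.
struct Filter {
    double baseCutoffHz = 1000.0;
    double cutoffHz = 1000.0;
    int updates = 0;  // counts update calls, for illustration
};

// Plain C++ function pointer type for a destination's update routine.
using UpdateFn = void (*)(void* target, double modAmount);

void updateFilterCutoff(void* target, double octaves) {
    Filter* f = static_cast<Filter*>(target);
    f->cutoffHz = f->baseCutoffHz * std::exp2(octaves);
    ++f->updates;
}

struct Destination {
    void* target;     // object that receives the modulation
    UpdateFn update;  // called only when the summed modulation changes
    double sum;       // modulation accumulated this pass
    double lastSum;   // value passed to update() last time
};

struct Row {
    const double* source;  // e.g. an LFO or EG output
    std::size_t dest;      // index into the destination list
    double intensity;
    bool enabled;
};

void runModMatrix(const std::vector<Row>& rows, std::vector<Destination>& dests) {
    for (Destination& d : dests) d.sum = 0.0;
    for (const Row& r : rows) {
        assert(r.dest < dests.size());
        if (r.enabled) dests[r.dest].sum += r.intensity * (*r.source);
    }
    for (Destination& d : dests) {
        if (d.sum != d.lastSum) {  // no change, no function call
            d.update(d.target, d.sum);
            d.lastSum = d.sum;
        }
    }
}
```

The row loop still runs every sample, so this trims the calls into the receiving objects rather than the matrix walk itself; hard-wiring, as described next, removes both.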
Yes, the mod matrix is a CPU hog. It needs to zip through the rows in the matrix on each sample interval. This is one of the ways you can optimize, by hardwiring your voice architecture and bypassing it. Please see the mod matrix example in the MMA DLS Level II spec which is in the references. It is the basis for the mod matrix in the book.
Thanks very much for all the explanations, Will. That's what it's all about: the book can only be a trigger, a guide and mentor on the road to understanding all this stuff and coming up with your own way of doing it. Learning how to be an efficient C++ programmer does not have much to do with building a synthesizer per se. I think the book is a great success and extremely useful for people who actually do want to learn, and everybody who judges it by the source code alone (or the performance of the unoptimized example projects) is really missing the point. It's for learning, not ready-made.
As Will already explained above, Csound's k-rate is roughly what you get if you only update your variables once per process block. I am pretty sure, though, that Csound then uses ramping to smooth things out, and if you need it, the control rate can be the same as the audio rate. It's a really old language, and back in the day it made sense to have a much lower control rate than audio rate, but I doubt that this will give you an enormous performance boost for your own synth; the math will hit much harder. Csound is conceptually something completely different from a "simple" synthesizer plugin, as you usually write your scores using a programming language too. Here, everything is about the orchestra, to put it in Csound lingo.
As you are living in a C++ universe, you can really do anything you want, but you will have to deal with timing offsets and internal counters to get a specific control rate. The processing block size in VST or AU, which is your heart's pulse in plugin land, can be anything. If you really want to know, you will need to profile after you have applied all the lookup tables that Will mentioned above.