Thanks for that story, Jerry. I know exactly where you're coming from on the subject of optimization.
My story (and I'm not going to try to one-up you!) also includes assembly language programming - 8086/87. It was a machine tool control project that brought me over here to the 'States in 1982 for my employer, Gettys Mfg., who made servo motors and drives, plus a strange control system called a "tracer/duplicator." Attached to a milling machine, it would copy models into real metal: a probe traced over the model while servos moved a milling cutter over a raw metal blank. Obviously we were eventually replaced by CNC, but in our heyday our system was used extensively in the automotive industry centered on Detroit/Windsor, and throughout the UK and Europe.
The project was canceled in '83 when the company was bought out, but I stayed here anyway. An offshoot company was started by the original team to continue the work, and I rejoined that team in '89, I think it was, some years after I'd earned my Green Card in '85. That's when I got involved with the Mk II version, which is where the fun - and my story - starts.
I had to add a lot of extra features (including real CNC capability), so I nailed all our existing code into a C framework. To do that I had to remap the register usage and function call structure to match the conventions the compiler used; quite a task, but at the end of the day I could write new C functions to be called from the assembly language code under the C framework!
But yes, there was a lot of optimization to do. Our processor complement increased from 2 x 8086/87 to 3 x 286/287 on Intel Multibus I, which was not much of an improvement because of bus/shared-memory access constraints. The REAL boost came when we graduated to a pair of 486DXs, which had the math co-processor built in. That cut the math co-pro context switch from some 93 processor cycles down to just a handful. Given the amount of floating point math we needed (two- and three-dimensional trig), that made a heckuva difference. I reduced our mainline interrupt loop time from 10 ms down to 1.25 ms, if I recall, and the beast was still begging for more work to do!
Was your TI DSP the SHARC? I did some work on that while I was software manager at Tech 80, up here in Minneapolis. We got bought out and things went south, but it was fun while it lasted.--