damn, cross compiling c++ to msdos is kinda cool
it's also surprising how much some small quirks affect the speed, c++ compiler output vs. the lib's assembly:
row-by-row memcpy is more than 2x faster than a naive per-pixel loop (the FPS above is for an emulated 486; the whole frame gets redrawn each time, with bg, fonts etc.)
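
for reference, a minimal sketch of the two blit variants being compared. everything here is made up for illustration: the `Surface` struct, `blit_naive` and `blit_rows` are hypothetical names, it assumes 8-bit pixels in a plain memory buffer, and it does no clipping. it only shows the shape of the two loops, it doesn't measure anything:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 8-bit surface (assumption, not from the original code).
// `pitch` is the number of bytes per row and may be larger than `width`.
struct Surface {
    int width;
    int height;
    int pitch;
    std::vector<std::uint8_t> pixels;

    Surface(int w, int h, int p)
        : width(w), height(h), pitch(p),
          pixels(static_cast<std::size_t>(p) * static_cast<std::size_t>(h), 0) {
        assert(w <= p);
    }
};

// Naive variant: copy one pixel at a time, recomputing both offsets per pixel.
void blit_naive(const Surface& src, Surface& dst, int dx, int dy) {
    assert(dx >= 0 && dy >= 0);
    assert(dx + src.width <= dst.width && dy + src.height <= dst.height);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            dst.pixels[static_cast<std::size_t>(dy + y) * dst.pitch + (dx + x)] =
                src.pixels[static_cast<std::size_t>(y) * src.pitch + x];
        }
    }
}

// Row-by-row variant: one memcpy per scanline, so the inner loop is
// whatever the C library's memcpy does.
void blit_rows(const Surface& src, Surface& dst, int dx, int dy) {
    assert(dx >= 0 && dy >= 0);
    assert(dx + src.width <= dst.width && dy + src.height <= dst.height);
    const std::uint8_t* s = src.pixels.data();
    std::uint8_t* d = dst.pixels.data() + static_cast<std::size_t>(dy) * dst.pitch + dx;
    for (int y = 0; y < src.height; ++y) {
        std::memcpy(d, s, static_cast<std::size_t>(src.width));
        s += src.pitch;  // next source row
        d += dst.pitch;  // next destination row
    }
}
```

both functions produce the same pixels; the only difference is who runs the inner loop, the compiler's generated code or the library's memcpy. on a real dos target the destination would be video memory or a back buffer rather than a `std::vector`, which this sketch doesn't model.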