So, looks like I've got Tail Call Optimized Recursion working in my #Lisp implementation. The trick is to make sure that you're letting the invocation of the child function happen without any other work to be done.
My first implementation evaluated subexpressions and multiplied. This version passes an accumulator down, and the interpreter is smart enough to turn that into an iterative (ish?) pass through the EVAL loop, rather than an extra frame on the stack. Take that, 1000 factorial!