I've got a `parmap` function implemented, with about a 4x speedup on my 8-core, 16-thread machine.
I suspect I might be memory bandwidth bound, since the test compute is trivial, but I'm still pretty happy with these benchmark results.
Up to a 1000-element vector, OS process overhead dominates the Rust-managed benchmark. The Alan-managed benchmark (which is noisy due to a sample size of 1) gives usable measurements from around a 100-element vector and up, and it times *only* the `map`/`parmap` operation. It indicates approximately 29ns per iteration in series and an amortized 9ns per iteration in parallel.
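For readers who want a concrete picture of the shape of the thing, here is a minimal Rust sketch of a chunked parallel map, plus a helper that times only the operation itself rather than the whole process. This is not Alan's actual implementation: the `parmap` signature, the `workers` parameter, and the `ns_per_item` helper are all hypothetical names chosen for illustration, and it assumes plain scoped threads from the standard library.

```rust
use std::thread;
use std::time::Instant;

/// Hypothetical sketch: apply `f` to every element of `input`,
/// splitting the work into at most `workers` contiguous chunks,
/// one scoped thread per chunk. Output order matches input order.
pub fn parmap<T, U, F>(input: &[T], workers: usize, f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    if input.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division, so we never produce more than `workers` chunks.
    let chunk_len = (input.len() + workers - 1) / workers;
    let f = &f; // share one reference to the closure across threads
    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = input
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<U>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

/// Hypothetical helper: run `op` once and report wall-clock nanoseconds
/// per item, so process startup and teardown are excluded from the figure.
pub fn ns_per_item<R>(len: usize, op: impl FnOnce() -> R) -> (R, f64) {
    let start = Instant::now();
    let result = op();
    let ns = start.elapsed().as_nanos() as f64 / len.max(1) as f64;
    (result, ns)
}
```

Calling something like `ns_per_item(v.len(), || parmap(&v, 16, |x| x + 1))` gives a single-sample figure, which will be noisy for the same reason mentioned above. With compute this trivial, thread spawn cost and memory traffic would likely eat much of the gain on small vectors.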
Anyone who remembers Alan v0.1 will understand just how *vast* the performance difference is. This is due to Rust's and LLVM's compiler optimizations, but I'm still excited to see Alan being useful for high-performance use cases.
Now, to build a proof-of-concept GPGPU mechanism and demo! :D