Perhaps you saw the post series "Python is not a great language for data science"... well, here's

Haskell IS a Great Language for Data Science

https://jcarroll.com.au/2025/12/05/haskell-is-a-great-language-for-data-science/

#haskell  
#rstats 

I’ve been learning Haskell for a few years now and I am really liking a lot of the features, not least the strong typing and functional approach. I thought it was lacking some of the things I missed from R until I found the dataHaskell project. In this post I’ll demonstrate some of the features and explain why I think it makes for a good (great?) data science language.

Irregularly Scheduled Programming
@jonocarroll I'm surprised by the seq_len(1e9) example. I thought that even without compiler tricks, ALTREP would make it faster than the 114s it took on my computer. 🤔
@jonocarroll saw your blog post shared on the data science subreddit, pretty hilarious or sad depending on your mood
@defuneste yeah, quite a few people entirely missing the argument being made and complaining about something different. I wonder if they even read the post. r/programming was no better.
@jonocarroll not reading your blog was on point for reddit (kind of), but the comments show a total lack of curiosity, which feels weird for DS...
@defuneste @jonocarroll that comment thread was quite the ride.

@jonocarroll sorry if this has been said before, but this is only fast in haskell because you're not actually evaluating the result ^^. running length x takes around a minute before running out of memory on my machine (with 10^8 it finishes after around 20s).
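The laziness point can be seen in a minimal sketch using only `base` (the names `unusedBinding`, `spineOnly` and `streamingSum` are made up for illustration): an unused binding is never built, `length` forces the list's spine but not its elements, and a strict fold that consumes the list as it is generated doesn't need to keep it all in memory.

```haskell
import Data.List (foldl')

-- Binding a huge list only creates a thunk; `_big` is never demanded,
-- so none of its cells are ever allocated.
unusedBinding :: Int
unusedBinding =
  let _big = [1 .. 10 ^ (9 :: Int)] :: [Int]
  in  0

-- `length` walks the list's spine (the cons cells) but never inspects
-- the elements, so the undefined elements are never evaluated.
spineOnly :: Int
spineOnly = length ([undefined, undefined, undefined] :: [Int])

-- A strict left fold consumes the list as it is generated; no name
-- holds on to the head, so used cells can be garbage-collected.
streamingSum :: Int -> Int
streamingSum n = foldl' (+) 0 [1 .. n]

main :: IO ()
main = do
  print unusedBinding      -- 0
  print spineOnly          -- 3
  print (streamingSum 10)  -- 55
```

Assuming `x` was bound monomorphically (for example `x = [1 .. 10^9] :: [Int]`), the name keeps the head of the list reachable, so `length x` ends up holding every cell it has walked, which is consistent with the out-of-memory behaviour described here.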

that's to be expected though for two reasons:
1) you're running this in ghci, which is an interpreter that is optimized for fast interactive use and performs no optimizations at all (although even compiled with -O2, this runs out of memory for me)
2) you're comparing singly linked lists of boxed integers (just about the least cache and memory efficient representation you could use here) to unboxed (probably?), vectorized arrays.

using unboxed arrays from vector instead takes ~12s (compared to ~10s for R on my machine)
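The reply's timing uses unboxed arrays from the vector package. As a stand-in that needs only the `array` library bundled with GHC (a swapped-in alternative, not the code from the reply; `seqLenU`, `sizeU` and `sumU` are hypothetical names), this sketch fills an unboxed array in place so each element is a raw machine `Int` stored contiguously rather than a boxed value behind a list cell. No timings are claimed for it.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, bounds, elems, (!))
import Data.List (foldl')

-- Fill 1..n into an unboxed array: one contiguous block of raw Ints
-- (8 bytes each on a 64-bit machine) instead of a linked list of
-- cons cells pointing at separately boxed Ints.
seqLenU :: Int -> UArray Int Int
seqLenU n = runSTUArray $ do
  arr <- newArray (1, n) 0          -- mutable unboxed array, zero-filled
  forM_ [1 .. n] $ \i -> writeArray arr i i
  return arr                        -- frozen without copying by runSTUArray

-- Number of elements, from the array bounds.
sizeU :: UArray Int Int -> Int
sizeU arr = let (lo, hi) = bounds arr in hi - lo + 1

-- Strict sum over the stored elements.
sumU :: UArray Int Int -> Int
sumU = foldl' (+) 0 . elems

main :: IO ()
main = do
  let xs = seqLenU 100
  print (sizeU xs)   -- 100
  print (xs ! 100)   -- 100
  print (sumU xs)    -- 5050
```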

i don't doubt that optimizations can give haskell an advantage but they won't make a difference here since you're only running two presumably already well-optimized functions

@prophet ah, that makes sense - I was originally worried that I was shortcutting the result by assigning it to _ but now I see the same is true of an assigned but not-yet-evaluated value. Thank you! I'll update the post.

I couldn't get a clear answer elsewhere but it sounds like you may know - are the rewrite rules like rev/rev/id and map/filter fusion generally implemented or are these purely up to the user to write/activate?

@jonocarroll it's a hint to the optimizer! this means that it will generally try to apply the rule whenever possible, but it might not have a chance to do so depending on other optimizations it does.
e.g. if you have reverse (id reverse list) (with reverse defined via foldl) and the optimizer inlines the id first, you get reverse (reverse list), which is rewritten to list
but if it inlines the outer reverse first, you will get foldl (flip (:)) [] (id reverse list) and (after inlining id) foldl (flip (:)) [] (reverse list), neither of which will trigger your rewrite rule

this is a contrived example and it's not quite as bad in reality since a lot of list functions have carefully chosen INLINE/INLINABLE pragmas to make sure that rewrite rules have enough opportunities to fire, but you still cannot generally rely on rewrite rules.
i actually looked at the core (~optimizer output) of this code (the magical incantation for getting readable output is ghc -O2 -ddump-simpl -ddump-to-file -dsuppress-all -dsuppress-uniques) and it still contained two calls to reverse! (i'm not quite sure why though since it didn't seem to inline anything interesting)
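A rule like the reverse/reverse one asked about can be written by the user with a `RULES` pragma. This is a minimal sketch on a hypothetical `myReverse` (not base's `reverse`), marked `NOINLINE` so that, given the inlining-order issue described above, its calls stay visible long enough for the rule to match.

```haskell
-- A list reverse written with foldl, as in the reply's example.
-- NOINLINE keeps calls to myReverse visible so the rule can match them.
myReverse :: [a] -> [a]
myReverse = foldl (flip (:)) []
{-# NOINLINE myReverse #-}

-- User-written rewrite rule. With optimisation on (-O/-O2) GHC *may*
-- replace the left side with the right; it is a hint, not a guarantee.
-- Only sound for finite lists: for an infinite xs the left side never
-- produces a result, while the right side does.
{-# RULES
"myReverse/myReverse" forall xs. myReverse (myReverse xs) = xs
  #-}

twiceReversed :: [Int] -> [Int]
twiceReversed xs = myReverse (myReverse xs)

main :: IO ()
main = print (twiceReversed [1, 2, 3])  -- [1,2,3], whether or not the rule fired
```

Compiling with `ghc -O2 -ddump-rule-firings` should report whether `myReverse/myReverse` fired; the program's output is the same either way, so the dump is the only way to tell.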

if you really want fusion, there are libraries like massiv (https://hackage.haskell.org/package/massiv) that can guarantee fusion by encoding in the types whether an array is actually materialized in memory or is just an intermediate step of the computation
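massiv's actual types aren't reproduced here. As a hypothetical toy (plain Haskell plus the `array` package bundled with GHC; `Delayed`, `Manifest`, `dmap`, `compute` and friends are made-up names), the idea of tracking in the type whether an array is materialized looks roughly like this: mapping over a `Delayed` array only composes index functions, and element storage is allocated only when `compute` produces a `Manifest` array.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.List (foldl')

-- Hypothetical toy types, far simpler than massiv's: the *type* records
-- whether elements exist in memory or are only a recipe for computing them.
data Delayed a = Delayed Int (Int -> a)       -- length and index function
newtype Manifest = Manifest (UArray Int Int)  -- real unboxed storage

-- The indices 0 .. n-1 as a delayed array; nothing is allocated per element.
dRange :: Int -> Delayed Int
dRange n = Delayed n id

-- Mapping composes index functions, so chained maps never build an
-- intermediate array: the result type is still Delayed.
dmap :: (a -> b) -> Delayed a -> Delayed b
dmap f (Delayed n g) = Delayed n (f . g)

-- The only function here that allocates storage for the elements.
compute :: Delayed Int -> Manifest
compute (Delayed n g) = Manifest (listArray (0, n - 1) (map g [0 .. n - 1]))

-- View stored data through the delayed interface again.
delay :: Manifest -> Delayed Int
delay (Manifest arr) =
  let (lo, hi) = bounds arr
  in  Delayed (hi - lo + 1) (\i -> arr ! (lo + i))

-- Consume a delayed array directly, without materializing it.
dsum :: Delayed Int -> Int
dsum (Delayed n g) = foldl' (+) 0 (map g [0 .. n - 1])

main :: IO ()
main = do
  let pipeline = dmap (* 2) (dmap (+ 1) (dRange 5))
  print (dsum pipeline)  -- 30
  let Manifest arr = compute pipeline
  print (arr ! 4)        -- 10
```

Because `dmap` can only return `Delayed`, chaining maps can't silently allocate an intermediate array; that is the kind of guarantee the types buy, in miniature.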

massiv (Hackage): Massiv (Массив) is an Array Library.
@prophet so reverse . reverse isn't necessarily optimised for a finite list? I did look at the dump myself but figured I was using it wrong because I saw the double reverse. I know a lot of APLs have optimisations like this, such as "last" being equivalent to the first element of a reversed (linked) list
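The `last` equivalence mentioned here can be written down directly. This is a small sketch with hypothetical helper names, using `Maybe` to avoid partial functions: on finite lists both agree, which is what would license rewriting one into the other, but `head . reverse` pays for a full reversed copy of the list first.

```haskell
-- `last` walks a singly linked list once, to the final cell.
safeLast :: [a] -> Maybe a
safeLast [] = Nothing
safeLast xs = Just (last xs)

-- `head . reverse` agrees on every finite list, but first allocates a
-- complete reversed copy just to read one element.
lastViaReverse :: [a] -> Maybe a
lastViaReverse xs = case reverse xs of
  []      -> Nothing
  (y : _) -> Just y

main :: IO ()
main = do
  print (safeLast [1, 2, 3 :: Int])        -- Just 3
  print (lastViaReverse [1, 2, 3 :: Int])  -- Just 3
```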
@prophet updated - thanks again!