| Threads | https://www.threads.net/@insta.chav?xmt=AQGzcOdTYpLLpI7Mb0NiisCrqSqOFiQ4Wqgix8LY7UXJU3w |
Perhaps you saw the post series "Python is not a great language for data science"... well, here's
Haskell IS a Great Language for Data Science
https://jcarroll.com.au/2025/12/05/haskell-is-a-great-language-for-data-science/
I’ve been learning Haskell for a few years now and I am really liking a lot of the features, not least the strong typing and functional approach. I thought it was lacking some of the things I missed from R until I found the dataHaskell project. In this post I’ll demonstrate some of the features and explain why I think it makes for a good (great?) data science language.
Welcome to dataHaskell (revived)! https://www.datahaskell.org/blog/2025/11/11/welcome-to-datahaskell.html
by @mchav
@jonocarroll would appreciate if you joined as an advisor or tastemaker of sorts.
https://datahaskell.org/blog/2025/11/11/welcome-to-datahaskell.html
Debugging skill level:
🟢 Beginner: print statements
🟡 Intermediate: debugger
🔵 Expert: taking a shower
Wrote a new article where I checkpoint the work we’ve done so far enabling Kaggle style EDA-to-model workflows in Haskell.
Oh, no! My R package {safespace} is in a broken state - won't someone (new to PRs) help me fix it???
I'm renewing my offer to guide newbies through the R package building / fixing / reviewing process during Hacktoberfest - see this post
https://jcarroll.com.au/2024/10/01/a-safe-space-for-learning-how-to-make-pull-requests/
Open a pull request on https://github.com/jonocarroll/safespace to get a mentored review of your submitted changes with zero risk of breaking anything valuable if you mess it up completely.
Please boost for visibility!
As October rolls around once more, the term Hacktoberfest might pop across your feeds; an effort aiming to encourage people to contribute to open-source software, particularly if they’re new to that. In this post I’ll describe what I’m offering towards that goal.
Have some pretty cool examples of feature engineering using program synthesis on Haskell data frames.
Given a function space, we run a breadth-first search to find which functions (and compositions of them) correlate most strongly with a target variable.
https://github.com/mchav/dataframe/blob/feature_engineering/app/Main.hs#L32
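To make the idea concrete, here is a minimal sketch of a breadth-first search over feature expressions, scored by Pearson correlation against a target column. It uses plain lists and only `base`. It is not the dataframe library's API or the code in the linked `Main.hs`. The names `Expr`, `binaryOps`, `expand` and `search`, the choice of operators, and the toy data are all assumptions made for illustration.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- A candidate feature: a printable name plus the column of values it produces.
-- (Hypothetical type for this sketch, not a dataframe library type.)
data Expr = Expr { exprName :: String, exprVals :: [Double] }

-- Pearson correlation of two equal-length columns; 0 if either is constant.
pearson :: [Double] -> [Double] -> Double
pearson xs ys
  | denom == 0 = 0
  | otherwise  = cov / denom
  where
    n       = fromIntegral (length xs)
    mean v  = sum v / n
    mx      = mean xs
    my      = mean ys
    cov     = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    var v m = sum [(a - m) * (a - m) | a <- v]
    denom   = sqrt (var xs mx * var ys my)

-- The function space: a small, assumed set of binary operators.
binaryOps :: [(String, Double -> Double -> Double)]
binaryOps = [("+", (+)), ("-", (-)), ("*", (*))]

-- One breadth-first layer: combine every frontier expression with every base column.
expand :: [Expr] -> [Expr] -> [Expr]
expand base frontier =
  [ Expr ("(" ++ exprName a ++ " " ++ sym ++ " " ++ exprName b ++ ")")
         (zipWith f (exprVals a) (exprVals b))
  | a <- frontier, b <- base, (sym, f) <- binaryOps ]

-- Explore layers 0..depth and keep the expression with the largest |correlation|.
search :: Int -> [Expr] -> [Double] -> (Expr, Double)
search depth base target = maximumBy (comparing (abs . snd)) scored
  where
    layers = take (depth + 1) (iterate (expand base) base)
    scored = [ (e, pearson (exprVals e) target) | e <- concat layers ]

main :: IO ()
main = do
  let x = Expr "x" [1, 2, 3, 4, 5]
      y = Expr "y" [2, 7, 1, 8, 3]
      target = zipWith (*) (exprVals x) (exprVals y)  -- hidden relationship: x * y
      (best, r) = search 1 [x, y] target
  putStrLn (exprName best ++ " has correlation " ++ show r)
```

The search space grows quickly with depth, so a real implementation would likely prune equivalent expressions and cap the number of candidates per layer.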
Earlier this year, the second #AIMO (artificial intelligence mathematical olympiad) concluded, with the winning team solving 34/50 in the final set of math problems (that had been selected to be harder for AI than the first AIMO).
The competition was restricted to open source models and run with a limited amount of compute. The AIMO has now retested these problems with the top two teams from that competition (NemoSkills and imagination research) and with OpenAI's o3 model, both at comparable levels of compute and with high resources. Unsurprisingly, the high resource models did better, with the high resource o3 model scoring as high as 47/50, or even 50/50 if given two tries at each question. On the other hand, the gap between the open source models and the commercial models for a fixed amount of compute was relatively slight.
More details of this experiment are available at https://aimoprize.com/updates/2025-09-05-the-gap-is-shrinking
Starting a series on program synthesis
https://mchav.github.io/an-introduction-to-program-synthesis/
Introduction: This post kicks off a hands-on series about program synthesis, the art of generating small programs from examples. We’ll build a tiny, FlashFill-style synthesiser that learns to turn strings like “Joshua Nkomo” into “J. Nkomo” from input/output pairs. We’ll see how to define a tiny string-manipulation language, write an interpreter, and search the space of programs to find one that solves our toy problem.
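The three pieces named in that preview (a tiny language, an interpreter, a search) can be sketched in a few dozen lines. This is an illustrative guess at the shape, not the code from the linked series. The `Prog` constructors, the literal pool in `terminals`, the depth bound, and the unseen test name are all assumptions, and the search is a naive shallowest-first enumeration with no pruning.

```haskell
import Data.List (find)

-- A tiny string-manipulation language (hypothetical constructors for this sketch).
data Prog
  = Input              -- the whole input string
  | Lit String         -- a constant string
  | FirstChar Prog     -- first character of a sub-result
  | WordAt Int Prog    -- n-th whitespace-separated word (0-based)
  | Concat Prog Prog   -- concatenation of two sub-results
  deriving (Show, Eq)

-- Interpreter: Nothing means the program is undefined on this input.
eval :: Prog -> String -> Maybe String
eval Input _ s' = Just s' where _ = ()
eval (Lit t) _ = Just t
eval (FirstChar p) s = do
  r <- eval p s
  case r of
    (c : _) -> Just [c]
    []      -> Nothing
eval (WordAt i p) s = do
  r <- eval p s
  case drop i (words r) of
    (w : _) -> Just w
    []      -> Nothing
eval (Concat a b) s = (++) <$> eval a s <*> eval b s

-- Depth-0 programs. The literal pool is an assumption; a fuller synthesiser
-- would derive candidate constants from the example outputs.
terminals :: [Prog]
terminals = [Input, Lit ". "]

-- Add one layer of depth, keeping shallower programs at the front of the list.
grow :: [Prog] -> [Prog]
grow ps =
  ps
    ++ map FirstChar ps
    ++ [WordAt i p | i <- [0, 1], p <- ps]
    ++ [Concat a b | a <- ps, b <- ps]

-- Return the first enumerated program that agrees with every example.
synthesise :: Int -> [(String, String)] -> Maybe Prog
synthesise maxDepth examples = find consistent (iterate grow terminals !! maxDepth)
  where
    consistent p = all (\(i, o) -> eval p i == Just o) examples

main :: IO ()
main =
  case synthesise 3 [("Joshua Nkomo", "J. Nkomo")] of
    Nothing -> putStrLn "no program found within the depth bound"
    Just p  -> do
      print p
      print (eval p "Robert Sobukwe")  -- an unseen input, chosen for illustration
```

Enumerating by depth re-generates some programs and grows very quickly, which is why practical synthesisers prune candidates that behave identically on the examples.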