Wesley Burr

@wsburr
Stats prof at Trent University; member of TIES and SSC

When #teaching #Rstats / #statistics courses, I (and several colleagues of mine) have found that a lot of students genuinely struggle to cope with the file system on their computer. They have questions like: How do I know the "path" of a file? How do I control in which directory something is saved? WHY DO I NEED THIS?!?

I don't want to make fun of these students; I know this may stem from operating systems increasingly hiding file/directory systems from their users.

But if I want to teach students to use a scripting/ #programming language independently, that's a real problem!

So my questions to you are: Do you have the same impression when teaching? And if so: How do you deal with this from a teaching perspective? To be honest, I don't want to use precious course time to teach the absolute basics of computers' file systems in the first session(s).

This morning's fun has been working on a Golang CLI (it has better PDF processing options than Deno) to interface with https://ollama.com/benhaotang/Nanonets-OCR-s via local Ollama. It's a bonkers good OCR model based on a specially trained Qwen.

This is a page from Equimundo's "State of American Men", and this is the result: https://ray.so/kPoztjf

(My Chrome alt txt xtsn also did 👍🏼 extracting meaning from the 📈)

Page 6 results were bonkers cool, too.

Heat wave kicks in tomorrow, so shld have code up by then.

@foone
Not trivial with the error correction and redundancy in QR codes. This is a working QR code after all.
We're 2.5 years into this gold rush, and I still haven't seen any gold. I've seen people selling picks & shovels. I've seen "gold experts" selling maps to the gold. I've seen CEOs announce they're going "gold-first". I've seen people selling land where they claim there's gold. But no actual gold.

I've just downloaded the most "academic scientist" software ever. Some modelling software for a specific purpose, which consists of an .exe file, and an Excel Spreadsheet. The spreadsheet has 20 tabs, each with dozens of parameters, graphs, outputs, various buttons in cells to initialize the model, run the model, do things. And all that allegedly feeds the data into the .exe and gets the result, and displays it in the spreadsheet.

There is even a tab in the spreadsheet containing, across 9,000 rows... raw C code, one line per row? With a warning to modify at your own risk? What?

Of course, in 2025, no copy of MS Excel is going to allow a spreadsheet to just...run an .exe file. As soon as I went near that big red "START" button all sorts of system warnings popped up saying "Oh no you fucking don't!".

It takes such a certain mindset to think "Ah yes, the most efficient way to distribute my idea is as C code compiled to an .exe and an insane Excel file to drive it".

An LLM "creates textual claims, and then predicts the citations that might be associated with similar text. Obviously, this practice violates all norms of scholarly citation.

At best, LLMs gesticulate toward the shoulders of giants."

@emilymbender, Jevin West, and I contributed to this perspective piece in PNAS. We took a skeptical position; others are very much enthusiasts. Before you pillory me for some random quote in this article, note that we strongly disagree with some of the claims in the other perspectives.

https://www.pnas.org/doi/10.1073/pnas.2401227121

I've been looking forward to building this #Lego set!
@SCO_SOC <https://www.sco-soc.ca/> is looking to upgrade its website, but they’re a small society without much $. Need folks to be able to renew & pay for membership via the site. Any good, low cost recommendations? (🇨🇦 hosting & company ideal). @LeaGrie

With this one, the thrust is basically that there are a number of "seemed like a good idea at the time" approaches to reusing data analysis work that deliver benefit in the short term but will get you absolutely wrecked by complexity and technical debt over the long term. I have found only one scalable way to manage the complexity of building data science capability. Yes, it involves writing lots of packages 📦 📦 📦 📦 📦 😅

https://www.milesmcbain.com/posts/data-analysis-reuse/

#rstats #DataScience

Before I Sleep: Patterns and anti-patterns of data analysis reuse

A speed-run through four stages of data analysis reuse, to the end game you probably guessed was coming.

The sum total of my knowledge about running checks for CRAN submission.

https://github.com/coolbutuseless/CRAN-checks

#RStats

GitHub - coolbutuseless/CRAN-checks: Notes about extra CRAN checks