There's so much noise around the impact of #AI that I'm sometimes reluctant to share here. I know many people are just 100% no, but FWIW, it's incredibly useful for biomedical research and coding. It has made it dramatically easier to deal with the utter mess that is bioinformatics file formats. It accelerates a trend I'd noticed running regular software teams for decades: requirements and testing are more important than ever (they were always important, but now they are almost everything). Also examples, architecture, data artifacts, and yes, still occasional human supervision. I'm teaching Copilot a little #fsharp daily right now.

@dplattsf I didn't know bioinformatics file formats were even an issue.

Thanks for posting 🙃

@knowprose you're probably writing that ironically if you've ever dealt with bio data, but in case you haven't: https://xkcd.com/2054/

@dplattsf I sincerely did not know. I have gaps in knowledge like everyone else.

So I try to fill them.

Definite win with xkcd. 🙃

The specific use case I did not know.
The general one I have lived.

@knowprose cool, and no offense meant, but if you're in the field, our file formats are a horrible joke. They're a necessary evil but suffer from a little https://xkcd.com/927/ , plenty of abandonware as someone writes code for their PhD, learning to code as they go, and evolving requirements as science moves quickly. I briefly lost sanity recently dealing with one sequence for data format requirements and had #gemini write a small standup comedy skit on bioinformatics file formats.

@dplattsf hah. I have seen it between agencies in various governments.

I assumed until your post that more scientific stuff would have morphed differently.

And I assumed wrong. Yuk.

@knowprose I sometimes long for a simple, traditional data modeling exercise like running a pizza business or something, everything seems easy compared to biology data (especially when you've never done it before - I'm sure there are tricky pizza edge cases like pineapple...).

@dplattsf all the competing standards misalign with the structures they are in.

I expect it might be different frames of thinking.

Standards help.

Standards are only talked about when it's already a Charlie foxtrot.

@knowprose I can imagine LLMs would be powerful for regularizing data across org boundaries. That's very analogous to biology issues: one study uses "healthy", another uses "controls", and a third uses "healthy controls". You need a tool with language fluency to sort it out (we used to use regular expressions :( )
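
For illustration only, the regex-era approach mentioned in the post above might have looked roughly like this minimal Python sketch. The pattern, the canonical label `healthy_control`, and the `normalize_label` helper are all hypothetical and not taken from any real study or pipeline; only the three label variants come from the thread.

```python
import re

# Hypothetical pattern covering only the three variants quoted in the thread:
# "healthy", "controls", and "healthy controls" (case-insensitive).
CONTROL_PATTERN = re.compile(
    r"^\s*(healthy\s+controls?|healthy|controls?)\s*$",
    re.IGNORECASE,
)


def normalize_label(raw: str) -> str:
    """Map known spellings of the control group to one canonical label."""
    if CONTROL_PATTERN.match(raw):
        return "healthy_control"  # assumed canonical name, for illustration
    # Anything the pattern does not anticipate passes through unchanged,
    # which is exactly where this approach needs a fluent reader.
    return raw.strip()


labels = ["healthy", "Controls", "healthy controls", "normal volunteers"]
print([normalize_label(label) for label in labels])
# → ['healthy_control', 'healthy_control', 'healthy_control', 'normal volunteers']
```

The last label shows the limitation: a pattern only catches the spellings someone thought of in advance, so every new study tends to mean another edit to the regex.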

@dplattsf yeah, but then you get the outliers that might defy the semantic attractors which creates a shiny new problem.

To deal with that, I think you would want an LLM trained only on the relevant data. That could help.

But it would have to be audited, I think.

@knowprose for biomedical data, the generalist LLM is really helpful. It’s such a broad subject - diseases are so varied and occur in so many contexts that you want a big vocabulary to deal with them. Tiny linguistic cues can be the difference between dealing with pulmonary hypertension and soil contamination with hydrocarbons. Easy to separate for a widely read entity (you, the LLM or maybe the summer intern).

@dplattsf they still torture interns

Poor souls. 🤣