@Glyg Not bad at all, this #LinkML, I didn't know it and it's super effective. I think I'm going to adopt it for my projects (and since I've decided to work with #Python and #Streamlit, it fits right in).

Yeah πŸŽ‰ My "sql query generator" got merged. I'm now a contributor to @linkml

https://github.com/linkml/linkml/pull/3127
#linkml #opensource

Create new generator sqlvalidationgen by FlorianK13 Β· Pull Request #3127 Β· linkml/linkml

More information on this PR in Issue #3123 This PR adds: generator for sql validation queries Validations for required, range, pattern, enum, identifier, uniqueness (both single and combined via...


It took me ~a year to translate Neurodata Without Borders to linkml+pydantic with full abstraction over the array/storage backend. Now that that's done, it's taking me ~hours to build interfaces that put NWB in SQL dbs, web APIs for editing and serving NWB datasets (where you can download arbitrary slices of the individual datasets instead of a bigass 100GB HDF5 file), and interconversion between hdf5, dask, and zarr.

Anyway open data in neuroscience is about to get real good.

#neuroscience #linkml #OpenData #OpenScience

LinkML is the most XKCD 927 technology I have ever seen.

#linkml #xkcd #rdm

Here's an ~ official ~ release announcement for #numpydantic

repo: https://github.com/p2p-ld/numpydantic
docs: https://numpydantic.readthedocs.io

Problems: @pydantic is great for modeling data!! but at the moment it doesn't support array data out of the box. Often array shape and dtype are as important as whether something is an array at all, but there isn't a good way to specify and validate that with the Python type system. Many data formats and standards couple their implementation very tightly with their schema, making them less flexible, less interoperable, and more difficult to maintain than they could be. The existing tools for parameterized array types like nptyping and jaxtyping tie their annotations to a specific array library, rather than allowing array specifications that can be abstract across implementations.

numpydantic is a super small, few-dep, and well-tested package that provides generic array annotations for pydantic models. Specify an array along with its shape and dtype and then use that model with any array library you'd like! Extending support for new array libraries is just subclassing - no PRs or monkeypatching needed. The type has some magic under the hood that uses pydantic validators to give a uniform array interface to things that don't usually behave like arrays - pass a path to a video file, that's an array. pass a path to an HDF5 file and a nested array within it, that's an array. We take advantage of the rest of pydantic's features too, including generating rich JSON schema and smart array dumping.
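The shape+dtype idea can be sketched in plain Python. This is a hypothetical stdlib-only toy, not numpydantic's actual API (which works through pydantic annotations and interface classes); all names here are invented for illustration:

```python
def _nested_shape(x):
    """Shape of a nested-list 'array', e.g. [[1,2,3],[4,5,6]] -> (2, 3)."""
    shape = []
    while isinstance(x, list):
        shape.append(len(x))
        x = x[0] if x else None
    return tuple(shape)

def _flat(x):
    """Yield every scalar element of a nested list."""
    if isinstance(x, list):
        for item in x:
            yield from _flat(item)
    else:
        yield x

class ArraySpec:
    """Toy spec validating an array-like against a declared shape and dtype,
    independent of which library produced the array."""
    def __init__(self, shape, dtype):
        self.shape = shape   # tuple; "*" means "any size along this dim"
        self.dtype = dtype   # type every element must be an instance of

    def validate(self, arr):
        got = _nested_shape(arr)
        if len(got) != len(self.shape):
            raise ValueError(f"expected {len(self.shape)} dims, got {len(got)}")
        for want, have in zip(self.shape, got):
            if want != "*" and want != have:
                raise ValueError(f"dim mismatch: wanted {want}, got {have}")
        if not all(isinstance(v, self.dtype) for v in _flat(arr)):
            raise ValueError(f"elements must be {self.dtype.__name__}")
        return arr

# "any number of rows, exactly 3 float columns"
points = ArraySpec(shape=("*", 3), dtype=float)
points.validate([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])  # passes
```

The point of the sketch: the spec never asks *which* array library made the data, only whether it satisfies the declared constraints, which is what lets new backends plug in by subclassing rather than by PRs.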

This is a standalone part of my work with @linkml arrays and rearchitecting neurobio data formats like NWB to be dead simple to use and extend, integrating with the tools you already use and across the experimental process - specify your data in a simple yaml format, and get back high quality data modeling code that is standards-compliant out of the box and can be used with arbitrary backends. One step towards the wild exuberance of FAIR data that is just as comfortable in the scattered scripts of real experimental work as it is in carefully curated archives and high performance computing clusters. Longer term I'm trying to abstract away data store implementations to bring content-addressed p2p data stores right into the python interpreter as simply as if something was born in local memory.

plenty of todos, but hope ya like it.

#linkml #python #NewWork #pydantic #ScientificSoftware

GitHub - p2p-ld/numpydantic: Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)


@smallcircles

thanks for the pointer to #LinkML

I need someone to help me think about a #LinkML puzzle…

I have an abstract class A, which has an attribute "factors" with range Factor. I want to make a class B that inherits from A and sets "factors" to a concrete list of instances of the "Factor" class. Then I will have instances of class B that all share this list of "Factor"s in their "factors" attribute.

I can't seem to come up with how to do this mix of class and instance. Can LinkML even represent it?
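For concreteness, here's a rough LinkML-style YAML sketch of the setup described above (class and attribute names from the post; this only expresses the inheritance side, not the shared-instance-list part, which is exactly the open question):

```yaml
classes:
  A:
    abstract: true
    attributes:
      factors:
        range: Factor
        multivalued: true

  B:
    is_a: A
    # puzzle: how to pin "factors" here to one shared, concrete
    # list of Factor instances, rather than merely constraining
    # its range/cardinality at the class level

  Factor:
    attributes:
      name:
        range: string
```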

#FediHelp #LinkedData

Taking a good look at #LinkML for a work thing to document data that needs to be shared between aquaculture labs, to articulate clear, agreed schemata for CSV and NetCDF files. LinkML comes with a tool to take compact tables and explode them into RDF triples 10-50x the size for no obvious benefit. Are people really still materialising triples for ideological reasons? @jonny surely you don't do that with your stuff, right?

i'll say more about what this is in the morning, but anyway here's a #LinkML transcription of #ActivityStreams that will also get the implicit definition of an Actor in #ActivityPub later, along with all the other fun stuff that brings like generic dataclasses and pydantic models for programming with, sql, graphql, json schema... yno all the formats.

https://github.com/p2p-ld/linkml-activitypub

GitHub - p2p-ld/linkml-activitypub: LinkML Schema representation of ActivityPub


So I'm almost finished with my first independent implementation of a standard, and I want to write up the process bc it was surprisingly challenging and I learned a lot about how to write them.

I was purposefully experimenting with different methods of translation (e.g. adapter classes vs. pure functions in a build pipeline, recursive functions vs. flattening everything), so the code isn't as sleek as it could be. I had planned on this beforehand, but two major things I learned were a) not just isolating special cases, but making specific means to organize them and make them visible, and b) isolating different layers of the standard (e.g. schema language is separate from models is separate from I/O) and not backpropagating special cases between layers.
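As an illustration of the first contrast, here's a toy sketch of the two translation styles (all names invented for the example, not from the actual codebase):

```python
from dataclasses import dataclass

@dataclass
class SourceGroup:
    """Stand-in for a node in the source schema being translated."""
    name: str
    doc: str

# style 1: adapter class -- wraps the source object, carries state,
# exposes the target form through a method
class GroupAdapter:
    def __init__(self, group: SourceGroup):
        self.group = group

    def build(self) -> dict:
        return {"name": self.group.name, "description": self.group.doc}

# style 2: pure function -- no state, composes cleanly in a build pipeline
def translate_group(group: SourceGroup) -> dict:
    return {"name": group.name, "description": group.doc}

src = SourceGroup(name="TimeSeries", doc="A generic timeseries group")
assert GroupAdapter(src).build() == translate_group(src)
```

Adapters make it easy to hang special-case handling off the wrapper; pure functions make the pipeline easier to test and reorder. Either way, the trade-off only stays manageable if the special cases are kept visible rather than scattered.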

This is also my first project that's fully in the "new style" of python that's basically a typed language with validating classes, and it makes you write differently but uniformly for the better - it's almost self-testing bc if all the classes validate in an end-to-end test then you know that shit is working as intended. Forcing yourself to deal with errors immediately is the way.

Lots more 2 say but anyway we're like 2 days of work away from a fully independent translation of #NWB to #LinkML that uses @pydantic models + #Dask for arrays. Schema extensions are now no-code: just write the schema (in nwb schema lang or linkml) and poof you can use it. Hoping this makes it way easier for tools to integrate with NWB, and my next step will be to put them in a SQL database and triple store so we can yno more easily share and grab smaller pieces of them and index across lots of datasets.

Then, uh, we'll bridge our data archives + notebooks with the fedi for a new kind of scholarly communication....