whoop whoop full provenance-preserving roundtrip serialization to JSON in #numpydantic 1.6.0.

sure, you could serialize a complex, nested data model with a ton of arrays in a precious high-performance data backend as a bunch of random JSON numbers and test the parsers of fate. Or you could serialize them as relative paths, with an indication of the interface class that loads them, to a JSON/YAML file, and distribute them without losing all the nice chunking and whatnot you've gone and done. And, whenever they review my PR: if you use numpydantic with @linkml, you get all the rich metadata and modeling power of linked data with arrays in a way that makes sense, with arbitrary array framework backends rather than some godforsaken nested tree of numbers or treating arrays as if they're the same as scalar values -- and now complete with a 1:1 JSON/YAML serializable/deserializable storage format.
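
roughly what that roundtrip looks like (a sketch, not the exact dump format -- see the serialization docs linked below; filenames here are made up):

```python
from pydantic import BaseModel
from numpydantic import NDArray

class Model(BaseModel):
    array: NDArray

# the HDF5 interface takes a (file, dataset path) pair (hypothetical paths)
model = Model(array=("data.h5", "/nested/dataset"))

# round_trip=True dumps the array as its path plus interface info,
# rather than as a wall of raw JSON numbers...
dumped = model.model_dump_json(round_trip=True)

# ...so it can be revalidated straight back into the same backend
restored = Model.model_validate_json(dumped)
```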

Another day closer to linked data for real data with tools that feel good to use. Another day closer to p2p linked data.

next up: hashes, multiple sources, and support for more types of constraints (now that i'm actually getting feature requests for this thing, which is weird to me).

https://numpydantic.readthedocs.io/en/latest/serialization.html

pushing da boundaries of the python type system to make parameterized callable classes lol trust me it makes sense
https://github.com/p2p-ld/numpydantic/pull/8
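
the gist, as i read the PR (a sketch of the idea, not its exact surface): a parameterized NDArray becomes callable, so the same annotation that validates inside a pydantic model can validate bare values as a function.

```python
import numpy as np
from numpydantic import NDArray, Shape

# the parameterized class itself...
array_type = NDArray[Shape["* x, * y"], int]

# ...can be called like a function to validate a standalone value
validated = array_type(np.zeros((3, 4), dtype=int))

# and a wrong shape or dtype raises instead of passing through
# array_type(np.zeros((3,)))  # -> validation error
```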

#numpydantic

Make NDArray callable as a functional validator by sneakers-the-rat · Pull Request #8 · p2p-ld/numpydantic

Here's an ~ official ~ release announcement for #numpydantic

repo: https://github.com/p2p-ld/numpydantic
docs: https://numpydantic.readthedocs.io

Problems: @pydantic is great for modeling data!! But at the moment it doesn't support array data out of the box. Often array shape and dtype are as important as whether something is an array at all, but there isn't a good way to specify and validate them with the Python type system. Many data formats and standards couple their implementation very tightly to their schema, making them less flexible, less interoperable, and more difficult to maintain than they could be. Existing tools for parameterized array types like nptyping and jaxtyping tie their annotations to a specific array library, rather than allowing array specifications that are abstract across implementations.
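
to make the gap concrete, a toy sketch of the best you can do with stock annotations alone:

```python
from pydantic import BaseModel

class Image(BaseModel):
    # about all stock annotations can say: "nested lists of ints."
    # nothing here expresses "1280 x 720 x 3" or "uint8", and nothing
    # will accept an HDF5 dataset, dask array, or video file as-is.
    array: list[list[list[int]]]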

numpydantic is a super small, few-dependency, and well-tested package that provides generic array annotations for pydantic models. Specify an array along with its shape and dtype, then use that model with any array library you'd like! Extending support for new array libraries is just subclassing - no PRs or monkeypatching needed. The type has some magic under the hood that uses pydantic validators to give a uniform array interface to things that don't usually behave like arrays: pass a path to a video file, that's an array. Pass a path to an HDF5 file and a dataset nested within it, that's an array. We take advantage of the rest of pydantic's features too, including rich JSON schema generation and smart array dumping.
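
roughly what that looks like, condensed along the lines of the README examples (filenames here are placeholders):

```python
import numpy as np
from pydantic import BaseModel
from numpydantic import NDArray, Shape

class Video(BaseModel):
    # any number of frames, any width/height, 3 color channels, 8-bit
    array: NDArray[Shape["* t, * x, * y, 3 rgb"], np.uint8]

# a plain numpy array...
vid = Video(array=np.zeros((10, 1280, 720, 3), dtype=np.uint8))

# ...or a video file on disk, treated as an array...
vid = Video(array="recording.mp4")

# ...or an HDF5 file plus a dataset nested inside it
vid = Video(array=("data.h5", "/nested/dataset"))
```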

This is a standalone part of my work with @linkml arrays and rearchitecting neurobio data formats like NWB to be dead simple to use and extend, integrating with the tools you already use and across the experimental process - specify your data in a simple yaml format, and get back high quality data modeling code that is standards-compliant out of the box and can be used with arbitrary backends. One step towards the wild exuberance of FAIR data that is just as comfortable in the scattered scripts of real experimental work as it is in carefully curated archives and high performance computing clusters. Longer term I'm trying to abstract away data store implementations to bring content-addressed p2p data stores right into the python interpreter as simply as if something was born in local memory.

plenty of todos, but hope ya like it.

#linkml #python #NewWork #pydantic #ScientificSoftware
