@dbat the same issue exists in the research data management world with #DataLad / #gitAnnex. One thing I do on our storage servers is regularly run #duperemove on them. It requires filesystem support (xfs/btrfs), but it deduplicates on an extent basis, so below the file level. If the difference between two versions only affects a small part of a file, it should be able to help. I wonder if it could be run as a post-commit hook, or something like that.
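
A rough sketch of what such a post-commit hook might look like, assuming Python 3.9+ is available on the server and the repository lives on xfs or btrfs. The hook layout, the hash file location (`.git/duperemove.hash`) and the function name `build_duperemove_command` are made up for illustration; only the `duperemove` options `-d`, `-r` and `--hashfile` are taken from its manual. It has not been tested against a git-annex repository.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/post-commit script that runs duperemove on the repo."""
import shutil
import subprocess
from pathlib import Path


def build_duperemove_command(target: Path, hashfile: Path) -> list[str]:
    # -d: actually submit dedupe requests (without it duperemove only reports)
    # -r: recurse into directories
    # --hashfile: keep block hashes between runs so unchanged files are not rehashed
    return ["duperemove", "-d", "-r", f"--hashfile={hashfile}", str(target)]


def main() -> None:
    # Skip quietly if duperemove is not installed; a hook should not block commits.
    if shutil.which("duperemove") is None:
        return
    try:
        top = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return  # not inside a git work tree, nothing to do
    repo = Path(top)
    # Assumed location for the hash file; any writable path would do.
    hashfile = repo / ".git" / "duperemove.hash"
    # Failures are ignored on purpose: the commit has already happened.
    subprocess.run(build_duperemove_command(repo, hashfile), check=False)


if __name__ == "__main__":
    main()
```

Saved as an executable `.git/hooks/post-commit`, this would scan the whole repository after every commit, which may be slow on large datasets; a periodic job, as described above, may remain the more practical choice.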

And here is my published dissertation @umphy, about quantifying the natural CO2 exhaust at the Starzach site in Southwest Germany (my result: ~10t/d):

http://hdl.handle.net/10900/176213

I used a lot of #FOSS software and hardware for all of it and it was amazing. Honorable mentions: #gitAnnex, #dataLad, #KiCAD, #OpenSCAD, #PlatformIO, #Arduino, #TexLaTeX. I just wish I'd used #nix / #nixOS sooner.

Licensed #OpenAccess under #CreativeCommons CC-BY-4.0

#PhDLife

#forgejoAneksajo #gitAnnex #dataLad crowd:

Anyone else running into this experience-crippling #forgejo bug causing the activity page (de facto landing page for every user) to take extremely long to load (for me 10 seconds)?

https://codeberg.org/forgejo/forgejo/issues/9040

@datalad @forgejo

bug: extremely slow feed load times with many pages

Can you reproduce the bug on the Forgejo test instance? No
Description: My forgejo instance has really long load times. Loading the Homepage takes about 700ms at best and more than 3 seconds at worst. Page: 704ms, Template: 111ms
Forgejo Version: 12.0.0+gitea-1.22.0
How are yo...

Some more `git subtree push` quirks:

• `git subtree push` (obviously) does not push #gitAnnex files to the remote. Syncing annexed files there is unergonomic.
• `git subtree push` also strips commit signatures (e.g. GPG, and with them #OpenTimeStamps timestamps). The truth lies in the monorepo only. Understandable, but very uncool.

git submodules have neither problem, but without tools like #datalad you can't commit to the superproject and its submodules in one step.

#git #gitSubTree

I've overheard someone at #distribits saying that the hash value in the distribits logo is from the first commit to #DataLad and I've just checked and can confirm that this is true (except for the year in the middle, 360f isn't exactly a valid one). Relatedly, I've learned that DataLad apparently was called DataGit in the beginning. I don't actually know what to do with this information.

Do you have big data to share in a forge? Try #forgejoaneksajo! It's an active soft fork of @forgejo, adding git-annex support.
@matrss gave a nice talk about it at @distribits, which you can watch here. Thank you!

https://www.distribits.live/talks/2025/risse-forgejo-aneksajo-a-git-annex-datalad-forge/

PS: I hope #GinGNode (gin.g-node.org) will update to this at some point in the future! It goes one step further, adding DOIs to datasets/code.

#distribits2025 #forgejo #distribits #gitannex #DataLad

Forgejo-aneksajo: a git-annex/DataLad forge

Apply established software development practices to your (meta-)data projects.

A new article has just been released in the

👩‍💻 𝒲𝒾𝒩𝑜𝒟𝒶 ℒ𝒶𝒷 𝒥𝑜𝓊𝓇𝓃𝒶𝓁 🗞️

Read about:
🔸 The tool DataLad
🔸 An event review to enhance DataLad skills

As always in English & German:
https://winoda.de/2025/09/25/drei-tage-datalad-workshop-hackathon-in-aachen/

credits to @fabr

You can still read our latest articles, if you missed them:
🔸 LOM (Learning Objective Matrix)
🔸 How acronyms influence our work (CARE, FAIR)

credits to @AvSchroeder

Upcoming on 29.09:
🔸 Certification of repositories

#Blog #article #winoda #LOM #Lernzielmatrix #DataLad #repositories

Three days of DataLad-Workshop & Hackathon in Aachen – WiNoDa Knowledge Lab

https://winoda.de/en/2025/09/25/three-days-of-datalad-workshop-hackathon-in-aachen/

In everyday research, a lot of mostly heterogeneous data is generated, which is often processed and analyzed collaboratively. This involves complex workflows and ML pipelines consisting of numerous transformation and […] #WiNoDaKnowledgeLabJournalen #Data #DataCompetenceCenter #DataLad #Git #Report

Hello Leipzig, Germany! We're looking forward to the #DataLad workshop on September 29th/30th at @ufz: https://events.hifis.net/event/2531

   

Data Management with DataLad - A Two Half-day Workshop

DataLad is a free and open source distributed data management system that keeps track of your data, creates structure, ensures reproducibility, supports collaboration, and integrates with widely used data infrastructure. It is based on the version control tools Git and git-annex, with added features for reproducible science and streamlined data transport. In this workshop you will learn the basics of working with DataLad in your next project. Duration: The workshop spans two half-days:...
