I did some thinking and planning about Obnam backup storage in general and remote backup storage in particular.

If you think you might ever use Obnam, or care about any backup software, I would appreciate it if you read that post and responded here with your feedback.

https://obnam.org/blog/2026/01/store/

#Obnam
#backupImplementation


@liw How do you plan to handle multiple clients accessing the same remote storage at the same time? Some locking at the server, perhaps? Or in S3? What happens if the client crashes while holding a lock?

These things currently bother me with #restic: my laptop might be holding a lock when it loses internet access, and restic fails to recover from that automatically. As a result, I've sometimes gone for months without backups.

@HenrikLievonen My aim is to not need to control client concurrency. The backup repository contains chunks. Chunks are never modified. Each backup is a chunk that is the root of a set of chunks. Each such root chunk is owned by a client. If two clients make backups at separate times, they don't interfere with each other.

https://doc.obnam.org/obnam/arch.html
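To illustrate why this design avoids locking, here is a minimal sketch in Python (hypothetical names, not Obnam's actual code): chunks are content-addressed and immutable, so concurrent clients that only ever add chunks cannot conflict.

```python
import hashlib
import json

class ChunkStore:
    """Append-only, content-addressed chunk store.

    Chunks are never modified, so clients that only add chunks
    cannot conflict with each other; no locking is needed.
    """

    def __init__(self):
        self.chunks = {}  # chunk id -> bytes

    def put(self, data: bytes) -> str:
        chunk_id = hashlib.sha256(data).hexdigest()
        # Storing the same content twice is harmless: same id, same bytes.
        self.chunks[chunk_id] = data
        return chunk_id

    def get(self, chunk_id: str) -> bytes:
        return self.chunks[chunk_id]

def make_backup(store: ChunkStore, client_id: str, files: dict) -> str:
    """Store each file as a chunk, then a root chunk owned by the client."""
    file_chunks = {name: store.put(content) for name, content in files.items()}
    root = json.dumps({"client": client_id, "files": file_chunks}).encode()
    return store.put(root)

# Two clients backing up at different times only ever add chunks;
# identical file content is deduplicated automatically.
store = ChunkStore()
root_a = make_backup(store, "laptop", {"a.txt": b"hello"})
root_b = make_backup(store, "desktop", {"a.txt": b"hello", "b.txt": b"world"})
```

Note how `b"hello"` is stored only once even though both backups reference it: content addressing gives deduplication for free.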


@liw How about deleting chunks? I guess the idea is that a laptop has only append-only credentials by default, but maybe the server or a desktop computer could have a cron job running daily to remove backups that are too old, with suitable credentials of course. Moreover, the desktop and laptop might share a chunk that the desktop no longer needs but the laptop still does. How does the desktop know not to delete the chunk? Or what if the laptop later needs a chunk that has already been deleted?
@HenrikLievonen The server doesn't delete chunks of its own volition. It has no way of knowing which chunks belong to which backup. A client, with suitable authorization, can delete backups. It mustn't delete chunks that are used by any other backups, even if the backups belong to other clients. At the moment I have no idea how to enforce that, but I haven't finished thinking about it.
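For illustration only: one well-known approach to this problem is mark-and-sweep garbage collection over all clients' root chunks. This is a sketch of the idea, not a design Obnam has committed to, and it sidesteps the hard part (a client needs to know about every other client's roots, which is exactly what authorization makes difficult):

```python
import hashlib
import json

def put(chunks: dict, data: bytes) -> str:
    """Store a chunk under its content hash."""
    chunk_id = hashlib.sha256(data).hexdigest()
    chunks[chunk_id] = data
    return chunk_id

def live_chunks(chunks: dict, root_ids: set) -> set:
    """Mark phase: every chunk reachable from any client's root chunks."""
    live = set()
    for root_id in root_ids:
        live.add(root_id)
        manifest = json.loads(chunks[root_id])
        live.update(manifest["files"].values())
    return live

def sweep(chunks: dict, root_ids: set) -> None:
    """Sweep phase: drop chunks that no remaining backup references."""
    live = live_chunks(chunks, root_ids)
    for chunk_id in list(chunks):
        if chunk_id not in live:
            del chunks[chunk_id]

# A chunk shared by two clients' backups survives as long as
# either backup's root is still registered.
chunks = {}
shared = put(chunks, b"shared data")
root_a = put(chunks, json.dumps({"client": "desktop", "files": {"f": shared}}).encode())
root_b = put(chunks, json.dumps({"client": "laptop", "files": {"f": shared}}).encode())

sweep(chunks, {root_b})  # desktop's backup deleted; laptop still needs "shared"
```

After the sweep, `root_a` is gone but the shared chunk remains, because the laptop's backup still references it.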

@liw By server I mean that the server computer/process might be running a client with delete permissions, as for many users it would be convenient to enforce retention policies at the central server.

Deleting data in a sound way is hard, and definitely requires a lot of care to get right. Formal verification tools like TLA+ might be helpful here, or even necessary.

@HenrikLievonen Deleting backups is one of the interesting technical problems in developing backup software, certainly.

@HenrikLievonen @liw If I may give you a tip:

Also check out resticprofile, the friendly wrapper for restic configuration profiles.

https://creativeprojects.github.io/resticprofile/

I use the functionality that triggers a webhook in case of failure and sends me a Discord message.

So far I've been too lazy to set up Prometheus and an alert manager for my servers and laptops, although resticprofile also supports exporting Prometheus metrics.

For now, a notification via a chat service works wonderfully.

HTH

@liw Regarding restricting uploads, you can at least restrict an upload to a specific object hash by pre-signing the x-amz-checksum-sha256 header. The documentation also seems to imply that the If-None-Match header could be used to prevent uploads when the object already exists.
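For reference, the value S3 expects in the x-amz-checksum-sha256 header is the base64-encoded (not hex) SHA-256 digest of the object body; pinning it into a pre-signed PUT means the server rejects an upload whose content doesn't match. A stdlib-only sketch of computing that value (the actual pre-signing would go through an SDK such as boto3):

```python
import base64
import hashlib

def s3_sha256_checksum(body: bytes) -> str:
    """Value for the x-amz-checksum-sha256 header:
    base64 of the raw SHA-256 digest, not the hex form."""
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")

checksum = s3_sha256_checksum(b"chunk contents")
```

For an empty body this yields the well-known value `47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=`, which is a handy sanity check when debugging signed requests.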
@liw I care about backup software but don't have a lot of focus available at the moment, so I tried to read this and got lost. I was kind of hoping to see something oriented around use cases and user stories, which I might have some hope of providing feedback on with my current attention span, rather than internal architectural details. Have you already written something at that level? I don't see anything that looks like that in a quick glance through the obnam3 tag on your blog.

@jamey I'm afraid I don't have much in the way of high level documentation.

I have https://doc.obnam.org/obnam/arch.html as a high-level software architecture overview. If you're short on focus, that might still be too detailed?

I have https://doc.obnam.org/obnam/obnam.html for detailed acceptance criteria and how to verify them, but that's likely too detailed for your purposes.


@liw Those both look like good documents which, indeed, I'm too sleepy and distracted for at the moment.

I'll just say the main thing I would love to have out of a backup tool's storage layer is something that reliably makes backups even during times when the main storage is unavailable, such as when cloud storage is unreachable or an external backup drive is not plugged in, with the ability to sync updates to those stores later. The way I currently do that is to write my backups to the same disk that I'm backing up, then later manually trigger rsync from the backups to a remote server or to a USB disk. My guess is that backup software which is designed for this use could do better; perhaps keeping only an index of already-known chunks locally, for instance, rather than the complete backup.

@jamey That is an interesting use case. I shall record it in an issue ticket.
@jamey https://app.radicle.xyz/nodes/radicle.liw.fi/rad:zbWNQYkQ4QKgdSQcd1tjaemv6d6x/issues/c53a177c73bf2505573129dc6c82f5761843bf7d is the ticket. (To comment there you need to use Radicle, but since it links here, commenting here is enough. I'll amend the issue as needed.)