the expectation of being able to run docker at will in CI jobs is probably the single worst outcome of free GitHub Actions minutes, because reproducing it in a bring-your-own-compute environment is borderline impossible unless you make every machine single-tenant
even if you make every machine single-tenant, most configurations of Forgejo Actions runners would enable malware to escape the build container, persist itself and infect all future releases
it's possible that LXC or firecracker-containerd would solve my problem here
@whitequark about 2 weeks ago I learned that “incus” is the new LXC
@jpm @whitequark isn't it the new LXD? While under the hood both use LXC.
@IngaLovinde @whitequark don’t know the details, might be the other way around? Every lxc command is replaced with an identical incus command. The only reason I found out was because I was trying to deploy a new LXC guest and it failed due to linuxcontainers dot org refusing service to LXC tools now.

@jpm @whitequark it used to be LXC for low-level container functionality (a bunch of CLI tools with a bunch of logic on top of kernel containers interface) + LXD on top of LXC, for nice UX to manage the containers.
Then Canonical took over LXD, so incus was born as a libre community fork of LXD.
But incus still uses LXC under the hood, and e.g. on alpine LXC is still a dependency of incus: https://pkgs.alpinelinux.org/package/edge/community/x86_64/incus

I'm not sure which LXC tools you are referring to, or how or why linuxcontainers dot org would refuse service to them (seeing how LXC is still listed there: https://linuxcontainers.org/lxc/ ), or which LXC commands you mean.


@IngaLovinde @whitequark I don't know the details of it, but this explains what they did and why my existing setup broke: https://discuss.linuxcontainers.org/t/important-notice-for-lxd-users-image-server/18479 - migration to Incus was easy though. I don't deploy new images very often, and tend to stick with older stable releases until support officially ends because I just want things to work and don't care about the new shiny unless I actually need it.
@jpm @whitequark ...but that's about LXD, not LXC. Just as I was saying.
@IngaLovinde @whitequark ok, i never really paid much attention to the LXD/LXC split because it was never really explained very well when I originally set it up a few years ago, all I know is that when I ran `lxc launch images:debian/13/cloud new-instance` it told me to go away and install incus.

@whitequark so it turns out the folks who run linuxcontainers dot org have basically told canonical to play hide and go fuck yourself and blocked container image downloads from LXC tools. There's a very simple LXC-to-incus migration script available, and then everything just keeps on trucking after you change lxc to incus in command lines.
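For reference, the stock migration path looks roughly like this — `lxd-to-incus` is the real tool the incus project ships, but the example instance name here is illustrative, not from this thread:

```shell
# Run on the host with the existing LXD-era setup; migrates instances,
# storage pools and networks over to incus in place.
lxd-to-incus

# Afterwards the CLI swap is mostly mechanical, e.g.:
#   lxc launch images:debian/13/cloud new-instance
# becomes
incus launch images:debian/13/cloud new-instance
```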

Also, nested containers work, I’ve got podman running inside an incus guest without elevated privileges.
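The incus-side setup for that kind of nesting is typically a single config key — a sketch, with instance and image names made up for illustration:

```shell
# Launch a guest with nesting allowed from the start:
incus launch images:debian/12 ci-guest -c security.nesting=true

# Or enable it on an existing instance and restart:
incus config set ci-guest security.nesting=true
incus restart ci-guest

# Inside the guest, rootless podman should then run without extra privileges:
incus exec ci-guest -- su - builder -c \
  'podman run --rm docker.io/library/alpine echo ok'
```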

@whitequark i followed the very so-so documentation for non-docker build environments in gitlab and was able to get nested working on incus, formerly LXD. It did require a fair amount of tinkering on the container side to get it the way i wanted.
Having experimented with running docker images on incus, there was also middling success there.
All this to say it works well as infrastructure, but you’d likely need a not-insignificant amount of tuning before your build scripts ran satisfactorily. 🤷🏻‍♀️
@aizuchi thanks for confirmation; I'm fine with fiddling with this thing, I've sunk a week or two of FTE equivalent into it already, what's another week...
@whitequark lol i hear that. And if forgejo supposedly works with it on their end, it might go more smoothly?
Good luck!
@whitequark I've tried a bunch of options to get the Forgejo Actions runner to spawn microVMs lately: crun-krun, crun-vm, kata-containers, runcvm. Some obstacles: the runner parses container options and only passes along those it knows about, which excludes e.g. annotations. It also hardcodes an alternative entrypoint, which can interfere with the runtime's own (e.g. for crun-vm). One of them does not support exec, so it's a non-starter (I think it was crun-krun). kata does not work with podman.
@matrss ouch. have you considered patching the runner? I've resigned myself to needing that at some point & already run some random git commit
@whitequark there was an open PR for the entrypoint already when I looked into this. The options parsing is a big hunk of code that I just didn't want to touch, and it wasn't a hard blocker anyway: you can't set `--annotation=run.oci.handler=krun`, but `--runtime krun` does work and should achieve the same thing 🤷
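Concretely, the two ways of selecting the handler look like this — assuming a crun build with libkrun support installed as `krun`, with the image chosen purely for illustration:

```shell
# What the runner filters out (annotations never reach the engine):
#   podman run --annotation run.oci.handler=krun ...

# What does get through, and should achieve the same thing:
podman run --rm --runtime krun docker.io/library/alpine uname -a
```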

@whitequark to be most compatible with GitHub Actions the VM also needs to run systemd as init. Some of them bring their own init though.

In the end runcvm seemed most promising, it starts a "standard" qemu VM and can do systemd, but startup was super slow.

@matrss have you tried firecracker-containerd?
@whitequark No, for some reason your post is the first time I've heard of it. I will check it out when I find some time.
@matrss please let me know how it goes, I think it's the most promising option so far
@whitequark I tried it out but didn't get it to work. I followed both their "quickstart guide" and their "getting started guide" independently once. Building everything went fine, but for some reason the "devmapper snapshotter" setup produced an error on its first invocation (the second invocation didn't) and in the end trying to start a container with firecracker-ctr simply times out after a minute.

@whitequark I don't think I will spend more time with it, as I have no idea about containerd and firecracker and feel like I would just waste time with it.

To be fair, they explicitly say that the project is in a very early state.

@whitequark Instead I have hacked a bit on a crun wrapper that spawns an incus VM and executes inside of that (similar in concept to crun-vm and runcvm). I got it to launch a VM with podman run and run programs inside with podman exec. Next hurdle to get it to work for the Forgejo runner are mounts. I think I will explore that direction a bit more.
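The shape of such a wrapper follows the standard OCI runtime CLI surface (create/start/exec/delete, as implemented by runc and crun); everything incus-specific below is a hypothetical sketch, not @matrss's actual code, and real OCI `create` also takes flags like `--bundle` that are elided here:

```shell
#!/bin/sh
# Hypothetical shim: OCI-runtime-shaped CLI forwarding into an incus VM.
set -eu
cmd=$1; shift
case $cmd in
  create)  id=$1; incus launch images:debian/12 "vm-$id" --vm ;;
  start)   : ;;                                # VM already booted in create
  exec)    id=$1; shift; incus exec "vm-$id" -- "$@" ;;
  kill)    id=$1; incus stop --force "vm-$id" ;;
  delete)  id=$1; incus delete --force "vm-$id" ;;
  state)   id=$1; printf '{"id":"%s","status":"running"}\n' "$id" ;;
esac
```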

@whitequark Docker and friends are definitely semi-trusted environments. You have to at least know the people who built the image IMO.

Does something like Firecracker help isolate them? It’s another layer, which feels somewhat pointless, but might be necessary for compatibility and security.

@samir well how am I supposed to jam Firecracker into Forgejo Actions runner?
@whitequark I was thinking the other way around. The runner goes inside a firecracker instance.
@samir then malware that escapes the container (trivial with dockerd.sock exposed) can steal the Forgejo Actions secret
@samir there is an "ephemeral" runner function in it but it's incredibly immature, and the example in the repo is completely insecure in a way where it shouldn't be used

@whitequark Oh I see, different threat vector to what I was imagining. And yes, it has root, so you have no chance.

My mind is spinning towards single-use tokens from a secrets provider but it’s an immature thought so I’ll spare you.

I hope you figure it out.

@whitequark Also something I really dislike about this stuff is the lack of network isolation; like in distro land all the assets are fetched based on a controlled/trusted script and typically only from a single server, and then everything else is run entirely isolated.

Like I'm quite surprised that I haven't heard of CI stuff being used as a malicious botnet.
@whitequark isn’t there something like BSD's jail or bwrap for that? Or is that still not secure enough? What would the threat be there? O.o
@lixou it uses docker/podman by default but I don't trust Linux containers enough
@whitequark Ignoramus (me): Is this because you don't have anything analogous to nested virtualization at the Docker level?
@whitequark could you use a micro vm like firecracker so that the CI job thinks it has a full machine? The snapshot mechanism might also allow faster startup times.

@th current plan involves using firecracker or crosvm per tenant, which still has the problem i describe in the follow-up post

is there a way to make firecracker pretend it's docker? if not i can't really use it with Forgejo Actions

@whitequark I have read a paper about doing so, but not used it myself. https://github.com/firecracker-microvm/firecracker-containerd

@whitequark @th at work we have an environment (on kubernetes, but could be ported to other things) where technically it’s running inside a container but docker works without the awful docker-in-docker hacks, this needs some careful application of user namespaces. I think various commercial offerings like exe.dev and bunny.net (their magic containers product) do things along these lines using kata containers.

The real problem is there are almost too many ways to do this and you get to integrate them yourself…

@dgl @th yep. it's a nightmare. I've spent weeks looking at all the various solutions and theorycrafting, it's so unpleasant
@whitequark Don’t all modern CI systems run each job in an ephemeral VM? It’s about the only security boundary that I’d think you could defend against someone able to run arbitrary code these days, unless you lock down the environment so much that CI can’t do things it needs to do.
@david_chisnall Forgejo Actions runners offer you a choice of "Docker", "Podman", "LXC", and "lol rawdog it on the host"
@david_chisnall right now I'm using rootless Podman and I think it's defensible enough that I'm okay offering it to friends (who may still click Approve & Run on a sketchy source, mind) but it doesn't let cibuildwheel or other Docker-expecting applications run, which is a problem

@whitequark

Yup, if your threat model is ‘friends who aren’t actively attacking your infrastructure and will do at least a bit of checking before they hit approve on PRs from other people’ that’s probably fine. I don’t think I’d trust any of those mechanisms for a high-profile project though, given how often privilege-elevation bugs in the Linux kernel are found.

I wouldn’t have thought any of these were easier to manage than simply booting a VM with a bunch of preinstalled tools and a CoW base image, with the CI job settings exposed via something like QEMU's fw_cfg or a tiny FS on another virtual device.
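The fw_cfg route is a real QEMU mechanism; a sketch of both sides, with the blob name and file chosen for illustration (names under "opt/" are the ones reserved for external users):

```shell
# Host side: attach the CI job settings as an fw_cfg blob
# (other qemu-system-x86_64 options elided).
qemu-system-x86_64 ... -fw_cfg name=opt/ci/job.json,file=./job.json

# Guest side: load the driver and read the blob back out of sysfs.
modprobe qemu_fw_cfg
cat /sys/firmware/qemu_fw_cfg/by_name/opt/ci/job.json/raw
```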

@david_chisnall @whitequark the rabbithole of self hosted CI is a nightmare, the reality is there is no secure method, the sandboxing options are all flawed.

If I were to make a professional suggestion, it would be to spin up a new temporary VM for each job on someone else's infrastructure and hope for the best.

Good luck
@Baa @david_chisnall I am very well aware of this and if there was a reasonable way to do this with Forgejo Actions I would've already been doing it

@david_chisnall

  • I'm not using VMs because nested virtualization is awfully slow and I designed the runner system I'm using to run on top of commodity cloud compute
  • even if I were to use VMs anyway (or if I set up the bare metal I'm looking at right now, etc) then this still leaves the problem that forgejo-runner can't spawn a VM per build, only a container per build, meaning malware can persist itself
@whitequark my experience with podman so far has been "docker's default is to run code inside the container as the container's own root user, and podman's default is to have UID 1000 inside the container run everything", so 90% of the fixes for Containerfiles and containers I've pulled off the net was just to give it the flag to run as "root" on the inside. the "podman-docker" (or "docker-podman") package gives you a compatible socket at the right path, so the docker CLI and tools that speak to the docker socket directly can be happy, which may take you almost all the way to workflows requiring "docker-in-docker"? I hope at least one of these is new & helpful information to you
@timotimo this seems pretty much completely irrelevant
@whitequark I was thinking cibuildwheel should be able to run with that, but a second look led me to the "container-engine" option (or CIBW_CONTAINER_ENGINE) that you can set to "podman". is it not working in practice even though it should work in theory?
@timotimo that fails with an inability to find /dev/net/tun and /dev/fuse (the latter might be fixed by fuse-overlayfs, the former I'm not sure)
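One obvious thing to try for those two devices is passing them through to the nested container explicitly — standard podman flags, but whether this actually fixes the cibuildwheel failure is untested here, and the image is just an example:

```shell
# Hand the nested container the devices it was failing to find;
# fuse-overlayfs inside the container then uses /dev/fuse, and
# pasta/slirp4netns networking can use /dev/net/tun.
podman run --rm \
  --device /dev/fuse \
  --device /dev/net/tun \
  quay.io/podman/stable podman info
```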

@whitequark I've been able to do this just now:

podman run --rm -it --security-opt label=disable --user podman quay.io/podman/stable podman run -it --rm --user root registry.fedoraproject.org/fedora-toolbox

that's an image specifically built to use podman inside podman (or docker, I guess?) and I'm running it as a user, without --privileged; inside of it is a fedora toolbox, and inside the fedora toolbox itself I was able to curl codeberg.org

This might be a good place to start from. Not sure what exactly makes the error about tun/tap not happen with this image, however.

@timotimo try running cibuildwheel (this is the actual workload that's been failing; it's not my workflow but a friend's, so I only have limited insight into what it's doing)
@whitequark I think I'll have time to look more closely this evening!
@whitequark @david_chisnall are you using systemd? If so, are you tailoring your security options for the service to be highly restrictive? If not, I might have a good starting place for those options at the office.
@c0dec0dec0de @david_chisnall I am using systemd, but I don't see how this would help, considering the attack surface I'm concerned about is "the kernel" and maybe "Podman", not "the Forgejo Actions runner" (which is the service I'd be configuring)
@whitequark @david_chisnall minimize the blast radius for the process tree running Podman. We’re doing it with the Jenkins agent config at work, though admittedly there’s only so much you can do.