Fossil Forum

RFC: Shell environment inside the container?
(1) By Warren Young (wyoung) on 2023-03-25 12:30:24

The current stock Fossil container is built with a custom, stripped-down version of BusyBox inside on the off chance that someone will need it for debugging. This adds a fair bit of build time, binary size, and, if it were not for the Fossil chroot feature, attack surface.

I had cause to come up with a way to strip the container down to just the Fossil binary recently, using BusyBox only as a setup script runner in an intermediary build stage, subsequently discarded. Since I have yet to find use for this internal debugging environment, I'm tempted to switch the stock container over to this method in the interest of faster build times, smaller images, and the option to run the container as non-root without expanding the potential attack surface.

Would this cause problems for anyone?

(2) By John Rouillard (rouilj) on 2023-03-25 17:46:22 in reply to 1

Does building without a shell prevent using the CGI extension feature of fossil?

Also, removing a shell could impede debugging. Maybe this could be addressed by running a shell via a docker exec command?
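For instance, something along these lines (the container name "fossil" is hypothetical, and this assumes a shell is still present in the image):

```
$ docker exec -it fossil /bin/sh
```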

(3) By Warren Young (wyoung) on 2023-03-26 04:18:20 in reply to 2

Does building without a shell prevent using the CGI extension feature of fossil?

If your question presupposes that you're mistreating the container as a kernel-less VM by injecting some other interpreter into the container along with the scripts, and you're worried about /bin/sh being needed to transfer control to these scripts, then no, that doesn't happen. The bDirect flag is set by the caller in src/extcgi.c.

If instead you're treating the container like containers were meant to be treated, we first have to get around the fact that there's a limit to how much customization you can do without being morally obligated to write your own Dockerfile, at which point why do the limitations of mine matter at all?

One way to use my container images as an immutable base with minimal customizations is to bind-mount the /ext directory into the container. The thing is, if my understanding of the fundamentals is correct, the called scripts will be run on the host side, under host-side interpreters, accessed by host-side shebang paths.1
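A sketch of that bind-mount arrangement, with a hypothetical image name and host path, and assuming the image's entrypoint passes its arguments through to fossil (--extroot is the fossil server option naming the extension document root):

```
$ docker run -d --name fossil -p 8080:8080 \
    --volume /home/me/ext:/ext \
    my-fossil-image server --extroot /ext repo.fossil
```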

If you try to split the difference by copying only the scripts into the container and using only the executable resources you find within, that implies you're writing your /ext scripts in the nearly pure POSIX Almquist shell dialect provided by BusyBox's /bin/sh, not something far more powerful. Really?

In order to even do that, you have to get around the fact that /bin/sh is blocked by the container's internal chroot jail. You can override that at container build time by setting a non-root $USER, but again I ask, really?

Then, having done all this and put up with all of these limitations, your script will affect…what? There's little inside the container worth affecting. Having a concrete example of a script to talk about would at least move this discussion out of the land of pure speculation.

removing a shell could impede debugging

I did bring that up above, yes. Your point? Are you using this container today? Did you need the shell to debug something? If so, what? If not, how do you anticipate it could be necessary?

If we're going to limit ourselves to vague speculation in deciding this question, we might as well go back to a Fedora-based container because someone might need a full package manager and text editors and compilers and and and inside the container. At that point, why have a container at all? Run Fossil on the bare host and be done with it.

I need a specific case where the shell was essential to getting things running [again] if I'm going to keep it.

Containers let you do a whole lot of debugging from the outside. I see no good reason to have ping inside the container, for instance. We aren't doing any complicated networking inside the container, thus have no expectation that we'd get a different result by pinging from inside vs pinging on the host.


  1. ^ You might as well not use containers at all in that case since you've just shattered its boundary.

(4) By John Rouillard (rouilj) on 2023-03-26 15:58:09 in reply to 3

Hi Warren:

Your tone seems hostile and I am not sure what I have done to deserve that. So this will be the end of my replies on this.

If your question presupposes that you're mistreating the container as a kernel-less VM by injecting some other interpreter into the container along with the scripts, and you're worried about /bin/sh being needed to transfer control to these scripts, then no, that doesn't happen. The bDirect flag is set by the caller in src/extcgi.c.

That is exactly what I was asking. As a member of the ASPCTC (American Society for the Prevention of Cruelty to Containers) I do not consider this mistreatment. The forking/exec of the CGI might require the shell. Glad to see it does not.

If instead you're treating the container like containers were meant to be treated, we first have to get around the fact that there's a limit to how much customization you can do without being morally obligated to write your own Dockerfile, at which point why do the limitations of mine matter at all?

It looks like you want to have your fossil container used as the container for the Fossil project. If your goal is a containerized build-only environment for the project, or to have people deploy it for a specific use case, so be it. It's your creation and your choice.

Containerizing Fossil provides a quick way for others to spin up a server using, e.g., fly.io or another containers-as-a-service provider.

However, there are people who use Docker as consumers rather than developers. Modifying build flags in an existing Dockerfile or adding a binary to a mount point/volume should be doable by these folks.

Reconstructing the Dockerfile to add a shell and BusyBox to extend its capabilities is significantly more involved, as I found out trying to Dockerize an app I work on. If your choice is to have them build their own Dockerfile, that's fine.

One way to use my container images as an immutable base with minimal customizations is to bind-mount the /ext directory into the container.

Agreed.

The thing is, if my understanding of the fundamentals are correct, the called scripts will be run on the host side, under host-side interpreters, accessed by host-side shebang paths.

That's not my understanding at all. All execution started from the container is limited to inside the container. The executables have no access to the host's namespace or filesystem (except as provided by volume/mount). Similar to how any executable started from a chroot is limited to the file system under the new root (assuming you don't include mknod or other root tools that allow escaping from the chroot).

If what you say is true, that simply running a binary located on an externally bind-mounted volume will run it in the host's (not the container's) namespace, that would fundamentally (IMO) eliminate the ability of containers to provide isolation.

Can you give me a pointer to the namespace/containerization documentation that gave you that impression?

(Note that the container boundary may not be totally impenetrable, even barring bugs. I know work was done by Red Hat to improve the container security boundary. I am not up to date on the current status, as I am no longer in contact with the people doing that work.)

Then, having done all this and put up with all of these limitations, your script will affect…what? There's little inside the container worth affecting. Having a concrete example of a script to talk about would at least move this discussion out of the land of pure speculation.

How about: https://sqlite.org/checklistapp/doc/trunk/README.md. I seem to remember mention of this being run under the /ext path of fossil. Any such script can save/manipulate data stored on an external volume or bind mount.

(5.2) By Warren Young (wyoung) on 2023-03-26 22:41:00 edited from 5.1 in reply to 4

Your tone seems hostile

In a text medium, "tone" is in your mind.

The emotion behind my prior response is exasperation. I read your unspecified worry as a call for speculation on the entire universe of possible uses of shell scripting. Even if we ignore the fundamental impossibility of that open-ended request, containers are not about serving infinite possibilities from a single image generated by a single Dockerfile. They're exactly the opposite: limiting the scope of operations to some practical, specifically-delimited set that accomplishes some reasonable end. Excepting cases where the "docker create" command can customize things on the fly, each different end requires a different Dockerfile.

I do not consider [injecting external binaries into the container] mistreatment.

The misuse I refer to is to "docker cp" those binaries into the container each time you pull a fresh one to get a new version of Fossil. That's treating the container as a pet. If those customization commands were done as ad hoc one-offs, you just lost all that work.

The Docker solution to this problem is the Dockerfile: a scripted method for recreating a particular container anytime anything of consequence changes, so that it can be updated automatically in computer time, not manually in human time. If you're writing your own Dockerfile, then the limitations of mine don't matter; you can make yours do whatever you want.
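As a sketch of what such a local Dockerfile might look like (the image and file names here are hypothetical):

```dockerfile
# Layer local customizations on top of a stock image, so they are
# recreated automatically every time the base image is updated.
FROM example.com/fossil:latest
COPY my-extcgi-scripts/ /ext/
```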

This is why I keep harping on the "containers are not VMs" argument.

It looks like you want to have your fossil container used as the container for the Fossil project.

My immediate motivations for this container are:

  1. Be suitable for the one specific use case I know of, to provide the containers serving the Fossil repos interlaced with the static elements on my public web site.

    As complex as my use case is, I don't need the shell.

  2. Be usable in general, if only as an example to get started with in broader contexts.

    The current examples in the container doc don't show extcgi, and couldn't as-is since the only available script interpreter is currently jailed away for multiple good reasons.

  3. Generate static Fossil binaries in as low-fuss a manner as possible.

    The container isn't even run in this case, merely instantiated long enough to extract the binary, then thrown away. Time spent fetching the BusyBox sources and compiling a custom version suited to serving a Fossil repo with minimal expansion of the attack surface is utterly useless.

All three of those primary use cases will continue to be served by a single static binary container.

It's trivial to extend that list to include cases that do require /bin/sh and more, but to do so goes against the container ethos. If you want your container to do something materially different, out of scope from the current offering, you need a different Dockerfile.

All execution started from the container is limited to inside the container.

It appears you're right.

Start by creating a script written with a shebang line that names an interpreter that's available on the host, but not inside the test container. I used this:

#!/usr/bin/perl
print "Hello\n";

Executing that script as "./hello" on the host works, but it fails when run via a container created from this trivial Dockerfile:1

FROM rockylinux:9.1-minimal
CMD /ext/hello

We can't use Alpine or BusyBox as a base image for this because their shells give an unhelpful and misleading "not found" error in this test, implying that /ext/hello doesn't exist when it plainly does. Bash gives a much more useful error when you then say:

$ docker build -t extruntest .
$ docker run --volume `pwd`:/ext --rm -it extruntest

As you say, when the kernel tries to make use of the shebang line, it looks inside the container for the named interpreter, not outside. It says:

/bin/sh: /ext/hello: /usr/bin/perl: bad interpreter: No such file or directory

If you rewrite the test script as follows, it does work:

#!/bin/sh
echo "Hello"

Okay, so I learned something today, and I do thank you for inspiring that investigation, but I don't see that this actually advances this debate over my prior reply to you. It just gets us back to the deduction that you must be wanting to write your extcgi scripts in the Almquist shell dialect, being the only other thing inside the current container, the one thing I'm proposing to remove. You're seriously suggesting that someone would do that and thus miss it when it's gone?

https://sqlite.org/checklistapp/doc/trunk/README.md

Wapp isn't in the container today. Having established that extcgi doesn't launch scripts via /bin/sh, the current discussion is irrelevant to that case.

If your point is that someone could build a Fossil server container with Wapp inside it so it could run this script or something like it, then we're back to the need for a custom Dockerfile.

Reconstructing the Dockerfile to add a shell and busybox to extend the capabilities of the Dockerfile is significantly more involved

It's a one-line change. You select a base image that has the tools you want (let's reuse Rocky Linux) and then, in the current 2-stage build, change "FROM scratch AS os" to "FROM rockylinux:9 AS os". In the proposed 3-stage build system, you'd do that to the "AS run" stage instead.
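Concretely, that one-line change in the current 2-stage build would look something like:

```dockerfile
# was: FROM scratch AS os
FROM rockylinux:9 AS os
```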


  1. ^ EDIT: Yes, I'm aware that I'm using the /bin/sh -c form of CMD in the Dockerfile. If you change it to "CMD [ "/ext/hello" ]" to force a direct exec() call, the error merely becomes less helpful.

(6) By Warren Young (wyoung) on 2023-03-27 05:04:14 in reply to 5.2

It's a one-line change.

That realization tipped me over the edge. I not only went ahead and did this, I also documented the steps for making these minor changes to the Dockerfile to address these "but what about" cases.

Within reason, I'm willing to spend time showing how you can do similar things. When I talk about the need for local custom Dockerfiles, I'm not speaking of total reengineering. You can do a lot with small changes, either to the file itself or to one of the $D?FLAGS type variables used in the container* targets of Fossil's top-level Makefile.
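For example, assuming the docker-build-flags variable in those targets is named DBFLAGS (matching the $D?FLAGS pattern mentioned above; --no-cache is a standard docker build flag):

```
$ make container-image DBFLAGS="--no-cache"
```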