Unprivileged containers: mount namespace
The mount namespace isolates the set of filesystem mount points that a process sees, so processes in different mount namespaces can have different filesystems mounted.
Code
We are using the same base class as in part 2 of this series. All code is available at Bitbucket.
import os
from base import ContainerBase
from system import libc

cb = ContainerBase()
cb.namespace_flags = libc.CLONE_NEWUSER | libc.CLONE_NEWNS
cb.run(os.system, 'bash')
cb.wait()
Note that the flag for a mount namespace is CLONE_NEWNS and not CLONE_NEWMOUNT or similar. This is for historical reasons: the mount namespace was the first kind of namespace to be implemented.
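The ContainerBase class from part 2 is not reproduced here. As a rough, standalone sketch of the same idea, the snippet below calls unshare(2) through ctypes instead of going through the base class. The function names are made up for illustration, and the flag values are the ones defined in <linux/sched.h>.

```python
import ctypes
import os

# Flag values from <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # new mount namespace
CLONE_NEWUSER = 0x10000000  # new user namespace


def namespace_flags():
    """Combine the two flags used in this article (hypothetical helper)."""
    return CLONE_NEWUSER | CLONE_NEWNS


def id_map_line(inside_id, outside_id, count=1):
    """Format one line for /proc/self/uid_map or /proc/self/gid_map."""
    return f"{inside_id} {outside_id} {count}\n"


def enter_namespaces():
    """Move the calling process into new user and mount namespaces."""
    # Read the ids first; after unshare they show up as the overflow id.
    uid, gid = os.getuid(), os.getgid()
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(namespace_flags()) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Map the real uid and gid to root inside the new user namespace.
    with open("/proc/self/uid_map", "w") as f:
        f.write(id_map_line(0, uid))
    with open("/proc/self/setgroups", "w") as f:
        f.write("deny")  # needed before an unprivileged gid_map write
    with open("/proc/self/gid_map", "w") as f:
        f.write(id_map_line(0, gid))
```

Calling enter_namespaces() followed by os.system('bash') from a regular user account should give a similar shell, provided the kernel allows unprivileged user namespaces.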
Running this short script will put you in a shell with both a new user namespace and a new mount namespace.
You can now create a new directory and try various mount commands. It will become apparent that not all commands will work as a regular user. The following sections will discuss what will work and what will not.
Directory and file bind mounts
It is possible to use bind mounts. A directory can be bind mounted onto another directory, and it will then appear in both places in the filesystem hierarchy.
$ mkdir mnt
$ mount --bind /etc mnt
$ ls mnt
DIR_COLORS               grub.d     printcap
DIR_COLORS.256color      grub2.cfg  profile
DIR_COLORS.lightbgcolor  gshadow    profile.d
GREP_COLORS              gshadow-   protocols
ImageMagick-6            gss        pulse
....
You can also bind mount a file onto another file:
$ echo > fmnt
$ mount --bind /etc/passwd fmnt
$ head -2 fmnt
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
From another terminal in the root namespace you will see only the empty file and the empty directory.
When done detach the mount points with umount:
$ umount mnt
$ umount fmnt
Alternatively, simply exiting the namespaced shell also destroys the mount points.
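The same bind mount and unmount steps can also be done from Python without the mount command. The following is a minimal sketch that calls mount(2) and umount(2) through ctypes; the helper names are hypothetical, and the MS_BIND and MS_REC values are taken from <sys/mount.h>.

```python
import ctypes
import os

# Mount flag values from <sys/mount.h>.
MS_BIND = 4096
MS_REC = 16384


def bind_flags(recursive=False):
    """Flags for a plain bind mount, or a recursive one (like --rbind)."""
    return MS_BIND | (MS_REC if recursive else 0)


def _libc():
    """Load the C library and declare the argument types of mount(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_char_p, ctypes.c_ulong, ctypes.c_void_p)
    libc.umount.argtypes = (ctypes.c_char_p,)
    return libc


def _check(ret):
    """Turn a failing C return value into an OSError."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def bind_mount(source, target, recursive=False):
    """Bind mount source onto target; both may be files or directories."""
    _check(_libc().mount(os.fsencode(source), os.fsencode(target),
                         None, bind_flags(recursive), None))


def unmount(target):
    """Detach whatever is mounted on target."""
    _check(_libc().umount(os.fsencode(target)))
```

Inside the namespaced process, bind_mount('/etc', 'mnt') would correspond to mount --bind /etc mnt, and unmount('mnt') to umount mnt.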
The file or directory that you are covering does not need to be owned by you, nor does it need any particular permissions.
$ ls -l /etc/shadow
----------. 1 nfsnobody nfsnobody 1722 Sep 29 14:18 /etc/shadow
$ mount --bind passwd /etc/shadow
$ cat /etc/shadow
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
Of course, this change to /etc/shadow is visible only in your namespace; it has no effect on the rest of the system.
Limitations
If you are really root, rather than just root in a user namespace, you can bind mount a directory that has other filesystems mounted beneath it. A plain bind mount does not include those other filesystems, so whatever lies 'under' them on the underlying filesystem becomes visible at the new location.
For example, if /home is a separate filesystem and you bind mount / onto a directory 'mnt', you would usually see an empty directory at mnt/home.
In a user namespace, however, it is simply not possible to bind mount, by itself, a filesystem that has other filesystems mounted beneath it. You can, however, use the --rbind option to recursively bind the filesystem together with the other filesystems beneath it.
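One way to decide between --bind and --rbind ahead of time is to look for mount points beneath the source directory. The sketch below does this by parsing text in the format of /proc/self/mountinfo, where the fifth field of each line is the mount point. The helper names are hypothetical, and the octal escapes that mountinfo uses for unusual characters in paths (such as spaces) are not decoded.

```python
import os


def mount_points(mountinfo_text):
    """Return the mount point (fifth field) of every mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def has_submounts(path, mountinfo_text):
    """True if some mount point lies strictly beneath path."""
    base = os.path.normpath(path)
    prefix = base if base.endswith("/") else base + "/"
    return any(p != base and p.startswith(prefix)
               for p in mount_points(mountinfo_text))


def bind_option(path, mountinfo_text):
    """Pick the mount option that should work inside a user namespace."""
    return "--rbind" if has_submounts(path, mountinfo_text) else "--bind"
```

In practice the text would come from open('/proc/self/mountinfo').read(), and the result could be passed to the mount command.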
Why? I speculate that this is done to prevent revealing files that are beneath the mounted filesystem.
Other filesystems that can be mounted
At this stage you are also able to mount the tmpfs, ramfs and devpts filesystems. Later on in this series it will be possible to mount the proc and sysfs filesystems.
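As an illustration, a tmpfs or ramfs mount could be scripted along the following lines. This is only a sketch that shells out to the mount command; the helper names and the size=16m option in the usage note are example choices, not something the series prescribes.

```python
import subprocess


def mount_command(fstype, target, options=None):
    """Build the argv for mounting a device-less filesystem on target."""
    cmd = ["mount", "-t", fstype]
    if options:
        cmd += ["-o", ",".join(options)]
    # Device-less filesystems take an arbitrary source name;
    # by convention the filesystem type is repeated here.
    cmd += [fstype, target]
    return cmd


def mount_fs(fstype, target, options=None):
    """Run the mount command and raise CalledProcessError on failure."""
    subprocess.run(mount_command(fstype, target, options), check=True)
```

Inside the namespaced shell's process, mount_fs('tmpfs', 'mnt', ['size=16m']) would be the equivalent of running mount -t tmpfs -o size=16m tmpfs mnt.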
It is worth noting that overlayfs filesystems can be mounted on Ubuntu but not on Fedora, as Ubuntu modifies the standard kernel to allow that.
What is not possible
- You cannot mount regular filesystems even from file images that you can access.
- In a mount namespace created without a user namespace you can unmount a filesystem that is mounted in the parent namespace. You cannot do this in a user namespace.