pam_cgfs - cgroup management for unprivileged LXC containers.
LXC has supported fully unprivileged containers since LXC 1.0. Fully unprivileged containers are the safest containers and are run by normal (non-root) users. This is achieved with user namespaces, which map a range of UIDs and GIDs on the host to a different (unprivileged) range of UIDs and GIDs in the container. As a result, UID 0 (root) in the container is mapped to an unprivileged user ID (something like 1000000) outside the container and only has rights on resources that it owns itself.
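The ID mapping described above can be pictured as simple offset arithmetic. The following is a minimal sketch, not LXC code: the function name, the start value 1000000 (taken from the "something like 1000000" example) and the range size 65536 are illustrative assumptions.

```python
from typing import Optional


def container_to_host_id(container_id: int,
                         host_start: int = 1000000,
                         count: int = 65536) -> Optional[int]:
    """Translate an ID inside the container to the host ID it maps to.

    host_start and count are illustrative values (assumptions), not
    defaults of any particular tool. Returns None when the container ID
    lies outside the mapped range and therefore has no host equivalent.
    """
    if 0 <= container_id < count:
        return host_start + container_id
    return None


# Container root (UID 0) corresponds to an unprivileged host UID.
root_on_host = container_to_host_id(0)
```

With these assumed values, container root ends up as host UID 1000000, which is why it has no special rights outside the container.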
Cgroup management of fully unprivileged containers means restricting the resources these containers use, for example limiting a container's CPU usage, the number of processes it may spawn, or the memory it may consume. Because fully unprivileged containers are run by normal users, resource consumption needs to be limited and managed among the containers. Unprivileged cgroup management is not easy with most init systems, which is why pam_cgfs.so came into existence.
The pam_cgfs.so module can handle pure cgroupfs v1 (/sys/fs/cgroup/$controller) and mixed mounts, where some controllers are mounted in a standard cgroupfs v1 hierarchy (/sys/fs/cgroup/$controller) and others in a cgroupfs v2 hierarchy (/sys/fs/cgroup/unified). Writable cgroups are created either for all controllers or, if specified, only for the controllers listed as arguments on the command line. Pure cgroupfs v2 mounts are not covered by the pam_cgfs.so module.
The cgroup created for the nth session is user/$user/n under the hierarchy of each cgroup kernel controller.
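As an illustration of the layout described above, the following sketch builds the path of a session cgroup under a cgroupfs v1 controller mount. It only composes strings and does not touch the filesystem; the function name and the sample user name are hypothetical.

```python
import posixpath

# Mount point of the cgroupfs v1 hierarchies, as given in the text above.
CGROUP_ROOT = "/sys/fs/cgroup"


def session_cgroup_path(controller: str, user: str, session: int) -> str:
    """Return /sys/fs/cgroup/$controller/user/$user/n for one controller."""
    return posixpath.join(CGROUP_ROOT, controller, "user", user, str(session))


# One path per requested kernel controller for the same login session.
paths = [session_cgroup_path(c, "alice", 0) for c in ("freezer", "memory")]
```

Each requested controller gets its own copy of the user/$user/n subtree.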
Systems with a systemd init system are treated specially, with respect to both cgroupfs v1 and cgroupfs v2. In both cases, the module checks whether systemd has already placed the user in a cgroup it created (user.slice/user-$uid/session-n.scope) by checking whether $uid == login uid. If so, session-n.scope is chowned to the login uid; otherwise a cgroup is created as outlined above (user/$user/n) and chowned to the login uid. In other words, if the init system has already placed the login user inside a session-specific cgroup, the pam_cgfs.so module detects this and re-uses that cgroup.
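The decision described above can be sketched as follows. This is not the module's implementation: the function names and the returned action labels are hypothetical, and the pattern accepts an optional ".slice" suffix on the user-$uid component as an assumption about how systemd may name that cgroup. The input is assumed to be the cgroup path the login process currently sits in.

```python
import re
from typing import Optional, Tuple

# Matches user.slice/user-$uid/session-n.scope; the optional ".slice"
# suffix after the uid is an assumption, not taken from the text above.
_SESSION_SCOPE = re.compile(
    r"user\.slice/user-(\d+)(?:\.slice)?/session-[^/]+\.scope$"
)


def reusable_systemd_scope(current_cgroup: str, login_uid: int) -> Optional[str]:
    """Return the systemd session scope if it belongs to login_uid."""
    match = _SESSION_SCOPE.search(current_cgroup)
    if match and int(match.group(1)) == login_uid:
        return current_cgroup[match.start():]
    return None


def plan_session_cgroup(current_cgroup: str, login_uid: int,
                        user: str, session: int) -> Tuple[str, str]:
    """Decide between re-using systemd's scope and creating user/$user/n."""
    scope = reusable_systemd_scope(current_cgroup, login_uid)
    if scope is not None:
        # systemd already created a session cgroup for this uid: chown it.
        return ("chown", scope)
    # Otherwise a fresh cgroup is created and then chowned to the login uid.
    return ("create", f"user/{user}/{session}")
```

Keeping the decision separate from any filesystem operation makes the rule easy to check on sample paths before applying it to a real hierarchy.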
In essence, the pam_cgfs.so module places unprivileged (non-root) users into writable cgroups at login and cleans up these cgroup hierarchies on logout, so users are free to delegate the resources provided to them to containers as needed.
- -c controller-list
- Takes a string argument that sets the comma-delimited list of kernel controllers and named controllers. Named controllers must be specified in the form “name=$namedcontroller”. Use “all” to enable all cgroup resource controller hierarchies. Specifying “all” together with other controllers returns PAM_SESSION_ERR.
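The validation rule for the controller list can be sketched like this. The function and exception names are hypothetical, the exception merely stands in for the PAM_SESSION_ERR return value, and the handling of stray whitespace and empty entries is an assumption, not documented behaviour.

```python
from typing import List, Optional


class SessionError(ValueError):
    """Stand-in for the PAM_SESSION_ERR result (hypothetical name)."""


def parse_controller_list(arg: str) -> Optional[List[str]]:
    """Parse the value given to -c.

    Returns None when "all" is requested, otherwise the listed
    controllers in order. Raises SessionError when "all" is combined
    with explicitly named controllers.
    """
    # Assumption: surrounding whitespace and empty entries are ignored.
    items = [item.strip() for item in arg.split(",") if item.strip()]
    if "all" in items:
        if len(items) > 1:
            raise SessionError('"all" cannot be combined with other controllers')
        return None
    return items
```

For example, parse_controller_list("freezer,memory,name=systemd") yields the three entries unchanged, while "all,memory,freezer" is rejected.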
MODULE TYPES PROVIDED
Only the session module type is provided (and needed).
- Default configuration is added at the end of these files.
session optional pam_cgfs.so -c freezer,memory,name=systemd
# default configuration
# user writable cgroups are created under freezer, memory and named cgroup systemd hierarchies.
# /sys/fs/cgroup/$controller/user/$user/n for freezer,memory.
# /sys/fs/cgroup/systemd/user.slice/user-$uid/session-n.scope for systemd.

session optional pam_cgfs.so -c all
# user writable cgroups are created under all cgroup controllers.

session optional pam_cgfs.so -c all,memory,freezer
# invalid argument and returns PAM_SESSION_ERR
Venkata Harshavardhan Reddy Allu <email@example.com>