Understanding Linux Cgroups
Cgroups are one of the Linux Kernel features making containers a reality. In this post, we are going to learn what Linux Cgroups are and how they work through simple explanations, illustrations and examples.

What is a cgroup
A Linux cgroup, or control group, is a mechanism that hierarchically organizes processes and controls the amount of system resources (CPU, RAM, read/write speed on devices, etc.) used inside that process hierarchy.
Resource control is achieved through specific controllers that must be enabled. Here is an overview of some of the controllers and the system resources they control:
- cpu - CPU time consumed by processes
- memory - RAM (and swap) usage
- io - read/write speed on block devices
- pids - number of tasks that can be created
- cpuset - which CPUs and memory nodes processes may use
Cgroup implementation
Cgroup implementation inside the Linux Kernel can be divided into two parts:
- core code: hierarchical grouping of processes and other things not implemented in controllers
- controllers: separate subsystems for each resource type (cpu, ram, etc.), implementing resource tracking and limits along the hierarchy
Cgroup versions
There are currently two versions of cgroup: version 1 and version 2. Version 1 was initially released in Linux 2.6.24. Over time, after problems related to inconsistencies between controllers and the complex management of the cgroup hierarchy, version 2 was created and officially released in Linux 4.5 to fix that.
Cgroup version 2 is intended to replace version 1, but for now version 1 continues to exist and is unlikely to be removed, for compatibility reasons. Controllers available in version 1 are progressively ported to version 2. Controllers missing from version 2 can still be used through version 1, even while other controllers are used through version 2.
Cgroup pseudo-filesystem (cgroupfs)
Cgroup functionalities are exposed to users through the cgroupfs pseudo filesystem, which is by default mounted at '/sys/fs/cgroup' although it can be mounted elsewhere. The cgroup version used by default in recent Linux distributions is version 2.
$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
If you wonder how to mount the cgroupfs somewhere, here is the command:
mount -t cgroup2 none $MOUNT_POINT
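If you are unsure which cgroup version a mount point exposes, a quick check is the filesystem type reported for it ('cgroup2fs' means version 2):
$ stat -fc %T /sys/fs/cgroup/
cgroup2fs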
Through the cgroupfs, we can enable/disable cgroup controllers, create/remove cgroups, and add processes into cgroups to control their resource usage. We will see that in detail in the Cgroup manipulation examples section.
Cgroup hierarchy
Cgroups are organized into a parent-child hierarchy. First there is the root cgroup, the parent of any cgroup present on the system. It is natively provided by the cgroupfs, which is mounted by default at '/sys/fs/cgroup'.
$ ls /sys/fs/cgroup/
cgroup.controllers cgroup.stat cpu.stat dev-mqueue.mount io.pressure memory.pressure sys-fs-fuse-connections.mount system.slice
cgroup.max.depth cgroup.subtree_control cpuset.cpus.effective init.scope io.prio.class memory.stat sys-kernel-config.mount user.slice
cgroup.max.descendants cgroup.threads cpuset.mems.effective io.cost.model io.stat misc.capacity sys-kernel-debug.mount
cgroup.procs cpu.pressure dev-hugepages.mount io.cost.qos memory.numa_stat proc-sys-fs-binfmt_misc.mount sys-kernel-tracing.mount
Creating a directory inside the root cgroup creates a child cgroup, which in turn can have children and so on. Resource distribution across a cgroup hierarchy is top-down (from parents to children).
Only controllers enabled in a parent cgroup can be configured in a child cgroup (enabled, disabled, used to limit resources, etc.). This is how resources are distributed to children. For more about parent-to-children resource distribution schemes, have a look at cgroup-resources-distribution-models
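As a minimal sketch of this rule (run as root; 'parent' and 'child' are just example names), a controller enabled in a parent's 'cgroup.subtree_control' file becomes available to its children:
# Create a parent cgroup and enable the cpu controller for its children
$ mkdir /sys/fs/cgroup/parent
$ echo "+cpu" > /sys/fs/cgroup/parent/cgroup.subtree_control
# Create a child cgroup: the cpu controller is now available in it
$ mkdir /sys/fs/cgroup/parent/child
$ cat /sys/fs/cgroup/parent/child/cgroup.controllers
cpu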
Cgroup interface files
Files inside the cgroupfs are cgroup interface files. They are read-write and read-only files that are used to configure the cgroup or get configuration information and statistics about the cgroup and its controllers.
For instance, through a cgroup's read-only 'cgroup.stat' file, we can get the number of its descendant cgroups (children, grandchildren and so on). Through its 'cgroup.max.descendants' interface file, we can set the maximum number of descendant cgroups it may have.
The 'cgroup.*' files are cgroup core interface files and the others are controller-specific interface files. The 'cpu.*' files, for instance, are interface files for the CPU controller. Directories represent child cgroups.
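For example, here is a quick look at those two files, using the 'user.slice' cgroup from the listing above (the counter values are illustrative, and writing to 'cgroup.max.descendants' requires root):
# Read-only statistics: number of live and dying descendant cgroups
$ cat /sys/fs/cgroup/user.slice/cgroup.stat
nr_descendants 3
nr_dying_descendants 0
# Read-write setting: cap the number of descendant cgroups
$ echo 10 > /sys/fs/cgroup/user.slice/cgroup.max.descendants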
Cgroup utilities
In addition to manipulating cgroups directly inside the cgroupfs directory, there are command line tools we can use to configure cgroups and the behavior of their associated controllers. Those tools are provided by the 'cgroup-tools' package. Here is a list of a few of them. Follow the links for the full command manuals and usage examples:
- cgcreate - create one or more cgroups, each defined by one or more 'Controllers:Path' couples passed through the '-g' flag. Path is the cgroup directory path inside the cgroupfs. Controllers is the comma-separated list of controllers that should be available in the mounted hierarchies (the cgroupfs mounts available on the system) where the cgroup will be created. A wildcard can be used to indicate all the available controllers.
- cgexec - execute a program inside a specific cgroup within chosen controllers
- cgset - set parameters for specific cgroups
- cgget - show parameters of specific cgroups
- cgdelete - remove cgroups
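On Debian/Ubuntu-based distributions, these utilities can typically be installed as follows (the package name may differ on other distributions):
$ sudo apt install cgroup-tools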
Cgroup manipulation examples
How to create a cgroup
Simply create a directory inside the cgroupfs:
$ cd /sys/fs/cgroup/
$ sudo mkdir mycgroup
or use the 'cgcreate' utility as follows:
# Syntax
# cgcreate -g Controllers:Path
# Create a cgroup called mycgroup in hierarchies
# where the cpu and memory controllers are available
$ cgcreate -g cpu,memory:mycgroup
The cgroup interface files are automatically created for the new cgroup. The interface files of the active controllers of that cgroup are also visible:
- cpu.* files - can be used to track and control the CPU resources consumption of processes belonging to the cgroup and its descendants
- io.* files - can be used to track and control the read and write IO speed on specific block devices, for processes belonging to the cgroup and its descendants
- memory.* files - can be used to track and control the Memory resources consumption of processes belonging to the cgroup and its descendants
- pids.* files - can be used to track and control the number of tasks that processes belonging to the cgroup and its descendants can create
- cpuset.* files - can be used to constrain the processes belonging to the cgroup and its descendants to a specific set of CPUs and memory nodes
The list of controllers that can be used by a given cgroup is controlled by the parent cgroup's cgroup.subtree_control file. Here is the content of the newly created cgroup directory:
$ ls /sys/fs/cgroup/mycgroup/
cgroup.controllers cgroup.max.descendants cgroup.type cpu.stat cpuset.cpus io.max memory.current memory.max memory.stat pids.current
cgroup.events cgroup.procs cpu.idle cpu.uclamp.max cpuset.cpus.effective io.pressure memory.events memory.min memory.swap.current pids.events
cgroup.freeze cgroup.stat cpu.max cpu.uclamp.min cpuset.cpus.partition io.prio.class memory.events.local memory.numa_stat memory.swap.events pids.max
cgroup.kill cgroup.subtree_control cpu.max.burst cpu.weight cpuset.mems io.stat memory.high memory.oom.group memory.swap.high
cgroup.max.depth cgroup.threads cpu.pressure cpu.weight.nice cpuset.mems.effective io.weight memory.low memory.pressure memory.swap.max
If a non-root cgroup has no specific configuration of its own, its resource control settings are those of its nearest configured ancestor.
How to remove a cgroup
Ensure the cgroup doesn't have child cgroups or live processes (not zombies). Then, simply remove the cgroup directory from the cgroupfs:
$ cd /sys/fs/cgroup
$ sudo rmdir mycgroup
or use the 'cgdelete' utility as follows:
# Syntax
# cgdelete -g Controllers:Path
# Delete the cgroup called mycgroup in hierarchies
# where the cpu and memory controllers are available
$ cgdelete -g cpu,memory:mycgroup
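If the cgroup still has child cgroups, the '-r' flag of 'cgdelete' removes the whole subtree recursively:
# Recursively delete mycgroup and its descendants
$ cgdelete -r -g cpu,memory:mycgroup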
How to list available cgroup controllers
Simply have a look at the cgroup's 'cgroup.controllers' file:
$ cat /sys/fs/cgroup/mycgroup/cgroup.controllers
cpuset cpu io memory pids
That file contains the names of the controllers that are available to the cgroup, meaning the cgroup can control the system resources associated with these controllers for the processes it manages.
That list of controllers available to the cgroup is controlled by the parent cgroup's 'cgroup.subtree_control' file:
$ cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids
Let's add the 'misc' controller to that file and see the impact:
# See the available controllers for the parent cgroup
$ cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
# Add the misc controller into the parent
# cgroup's cgroup.subtree_control file
$ echo "+misc" > /sys/fs/cgroup/cgroup.subtree_control
# That misc controller is now available
# to the child cgroup called mycgroup
$ cat /sys/fs/cgroup/mycgroup/cgroup.controllers
cpuset cpu io memory pids misc
An empty 'cgroup.subtree_control' file in a parent cgroup directory simply means that no controllers are available to its children. In that case, the processes of the parent cgroup, together with those of the child and any of its siblings, share system resources according to the resource control settings of the parent cgroup.
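Controllers can also be removed from a parent's 'cgroup.subtree_control' file with a '-' prefix, making them unavailable to its children again (run as root):
# Make the misc controller unavailable to child cgroups again
# (this fails if a child still has it enabled in its own cgroup.subtree_control)
$ echo "-misc" > /sys/fs/cgroup/cgroup.subtree_control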
How to add processes into a cgroup
To add an existing process into a cgroup, simply add its PID to the cgroup's 'cgroup.procs' file:
$ echo 'PID' > /sys/fs/cgroup/mycgroup/cgroup.procs
or use the 'cgexec' utility to run a program inside a cgroup:
# Syntax
# cgexec -g Controllers:Path Command
# Run bash inside the cgroup called mycgroup (from /sys/fs/cgroup)
# where the cpu and memory controllers are available
$ cgexec -g cpu,memory:mycgroup bash
How to list the cgroups of a specific process
Simply have a look at the '/proc/PID/cgroup' file for that process:
# List the cgroups of the process with PID 16461
$ cat /proc/16461/cgroup
0::/mycgroup
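As a quick end-to-end check (run as root, assuming 'mycgroup' exists), here is a sketch that moves the current shell into 'mycgroup' and verifies the move:
# Move the current shell into mycgroup
$ echo $$ > /sys/fs/cgroup/mycgroup/cgroup.procs
# Check the shell's cgroups
$ cat /proc/$$/cgroup
0::/mycgroup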
How to view and edit cgroup controllers parameters
Directly view and edit the cgroup interface files (the files inside the cgroup directory) or use the 'cgget' and 'cgset' utilities as follows:
# Syntax
# cgget [-r Param1 -r Param2 ...] CgroupName1 [CgroupName2 ...]
# cgset [-r Param1=Value1 -r Param2=Value2 ...] CgroupName1 [CgroupName2 ...]
# Show all parameters values for the cgroup called mycgroup
$ cgget mycgroup
mycgroup:
cpuset.cpus.partition: member
cpuset.cpus.effective: 0-1
cpuset.mems:
cpuset.mems.effective: 0
cpuset.cpus:
cpu.weight: 100
cpu.stat: usage_usec 21205578248
user_usec 10674775420
system_usec 10530802828
nr_periods 0
nr_throttled 0
throttled_usec 0
(...)
# Show the io.max parameter value for the cgroup called mycgroup
$ cgget -r io.max mycgroup
mycgroup:
io.max:
# Set the value of the io.max parameter for the cgroup called mycgroup
$ cgset -r io.max="8:0 rbps=max wbps=100000000 riops=max wiops=max" mycgroup
# Verify
$ cgget -r io.max mycgroup
mycgroup:
io.max: 8:0 rbps=max wbps=100000000 riops=max wiops=max
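The '8:0' part of the io.max value identifies the target block device by its major:minor numbers, which you can look up with 'lsblk' (the device names below are illustrative):
$ lsblk -o NAME,MAJ:MIN
NAME   MAJ:MIN
sda      8:0
├─sda1   8:1
└─sda2   8:2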
Practical example: limiting processes memory usage
Now let's have a look at a practical example of limiting the memory usage of processes with a cgroup. Here is the 'memory_allocator.sh' script, which allocates 10MB of memory every second until it reaches 200MB:
#!/bin/bash
# Directory for memory allocation
ALLOC_DIR="/dev/shm/mem_alloc_$$"
mkdir -p "$ALLOC_DIR"
# Clean up on exit
trap 'echo "Cleaning up..."; rm -rf "$ALLOC_DIR"' EXIT
# Allocate 10MB every second until 200MB is reached
for i in {1..20}; do
dd if=/dev/zero of="$ALLOC_DIR/block_$i" bs=1M count=10 &>/dev/null
allocated_mb=$((i * 10))
echo "Allocated ${allocated_mb}MB"
sleep 1
done
echo "Total 200MB allocated. Holding for 60 seconds before cleanup."
sleep 60
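Assuming the script is saved as 'memory_allocator.sh' in the current directory, make it executable before running it:
$ chmod +x memory_allocator.sh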
We are going to create a cgroup that limits memory usage to 100MB and then run the 'memory_allocator.sh' script through that cgroup and see what happens.
Let's create a new cgroup called 'mycgroup' in hierarchies where the memory controller is available.
# Create the cgroup called mycgroup.
# We only have one cgroup hierarchy in this case
# which is mounted at /sys/fs/cgroup
$ cgcreate -g memory:mycgroup
Let's list the current settings for the memory controller:
# Listing memory settings of the newly
# created cgroup called mycgroup
$ cgget mycgroup | grep memory
memory.events: low 0
memory.events.local: low 0
memory.swap.current: 0
memory.swap.max: max
memory.swap.events: high 0
memory.pressure: some avg10=0.00 avg60=0.00 avg300=0.00 total=0
memory.current: 0
memory.stat: anon 0
memory.low: 0
memory.swap.high: max
memory.numa_stat: anon N0=0
memory.min: 0
memory.oom.group: 0
memory.max: max
memory.high: max
Now let's set the memory usage limit for that cgroup to 100MB:
$ cgset -r memory.max=100000000 mycgroup
# Verify
$ cgget -r memory.max mycgroup
mycgroup:
memory.max: 99999744
Very good (note that the kernel rounded the value down to a multiple of its page size). Now let's run the 'memory_allocator.sh' script through that cgroup and see what happens:
$ cgexec -g memory:mycgroup ./memory_allocator.sh
Allocated 10MB
Allocated 20MB
Allocated 30MB
Allocated 40MB
Allocated 50MB
Allocated 60MB
Allocated 70MB
Allocated 80MB
Allocated 90MB
Killed
Ah! It seems the script was killed once its memory usage reached the 100MB limit set on the cgroup.
Here is the kernel log showing that the Out-Of-Memory (OOM) killer stepped in because the memory cgroup of the process was out of memory; in this run it killed the 'dd' child process doing the allocation:
$ dmesg
(...)
[43362.458500] Tasks state (memory values in pages):
[43362.458505] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[43362.458514] [ 2484] 0 2484 1816 701 57344 0 0 dd
[43362.458541] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=mycgroup,mems_allowed=0,oom_memcg=/mycgroup,task_memcg=/mycgroup,task=dd,pid=2484,uid=0
[43362.458611] Memory cgroup out of memory: Killed process 2484 (dd) total-vm:7264kB, anon-rss:1028kB, file-rss:1776kB, shmem-rss:0kB, UID:0 pgtables:56kB oom_score_adj:0
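The OOM kill can also be confirmed from the cgroup itself through its 'memory.events' interface file; the 'oom_kill' counter increments each time a process in the cgroup is killed (the counter values below are illustrative):
$ cat /sys/fs/cgroup/mycgroup/memory.events
low 0
high 0
max 42
oom 1
oom_kill 1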
That's all. I hope you understand Linux cgroups better now.
Want to report a mistake or ask questions? Feel free to email me at gmkziz@hackerstack.org. I will be glad to answer.
If you like my articles, consider registering to my newsletter in order to receive the latest posts as soon as they are available.
Take care, keep learning and see you in the next post 🚀