vim -R /usr/include/linux/capability.h
Containers provide a certain degree of process isolation via kernel namespaces. In this module, we’ll examine the capabilities of a process running in a containerized namespace. Specifically we’ll look at how Linux capabilities can be used to grant a particular or subset of root privileges to a process running in a container.
It would be worth taking a few minutes to read this blog post before beginning this lab.
Capabilities are distinct units of privilege that can be independently enabled or disabled.
Start by examining the Linux process capabilities header file.
vim -R /usr/include/linux/capability.h
Turn on line numbering in vim.
:set number
The capability bit mask definitions begin around line 103. Each capability is defined by a bit position followed by a short description. For example, the first capability is CAP_CHOWN
(bit 0). This capability grants permission to change the ownership of files.
103 /* 104 ** POSIX-draft defined capabilities. 105 **/ 106 107 /* In a system with the [_POSIX_CHOWN_RESTRICTED] option defined, this 108 overrides the restriction of changing file ownership and group 109 ownership. */ 110 111 #define CAP_CHOWN 0
Exit vim:
:q
Now examine the capabilities bit mask of a rootless process running on the host.
grep CapEff /proc/self/status
CapEff: 0000000000000000
The bitmask returned is 0x0
meaning this process does not have any capabilities. For example, since bit 0 is not set (CAP_CHOWN
) a process with this CapEff
bitmask can not execute chown
or chgrp
on a file that it does not own.
Now examine the capabilities bit mask of a root process on the host.
sudo grep CapEff /proc/self/status
CapEff: 000000ffffffffff
Notice that all 38 capability bits are set indicating this process has a full set of capabilities. In a later exercise, you’ll discover that a container running as root has a filtered set of capabilities by default but this can be changed at run time.
The capsh
command can decode a capabilities bitmask into a human readable output. Try it out!
capsh --decode=ffffffffff
0x000000ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39
NOTE: for those interested, capability '38' allows the following:
* CAP_PERFMON relaxes the verifier checks further: * - BPF progs can use of pointer-to-integer conversions * - speculation attack hardening measures are bypassed * - bpf_probe_read to read arbitrary kernel memory is allowed * - bpf_trace_printk to print kernel memory is allowed
NOTE: for those interested, capability '39' allows the following:
* CAP_BPF allows the following BPF operations: * - Creating all types of BPF maps * - Advanced verifier features * - Indirect variable access * - Bounded loops * - BPF to BPF function calls * - Scalar precision tracking * - Larger complexity limits * - Dead code elimination * - And potentially other features * - Loading BPF Type Format (BTF) data * - Retrieve xlated and JITed code of BPF programs * - Use bpf_spin_lock() helper
In comparison, '200' indicates the immutable capability (CAP_LINUX_IMMUTABLE):
capsh --decode=200
0x0000000000000200=cap_linux_immutable
A non-null CapEff value indicates the process has some capabilities. You will discover that the capabilities of a container are often less than what a root process has running on the host.
Start by running a rootless container as --user=32767
and look at it’s capabilities.
podman run --rm -it --user=32767 registry.access.redhat.com/ubi8/ubi grep CapEff /proc/self/status
CapEff: 0000000000000000
Next run a rootless container as --user=0
and look at it’s capabilities. Note that it’s capabilities are filtered from that of a root process on the host.
podman run --user=0 --rm -it registry.access.redhat.com/ubi8/ubi grep CapEff /proc/self/status
CapEff: 00000000a80425fb
Now run a container as privileged and compare the results to the previous exercises. What conclusions can you draw?
sudo podman run --user=0 --rm -it --privileged registry.access.redhat.com/ubi8/ubi grep CapEff /proc/self/status
CapEff: 000000ffffffffff
Run a container in the background that runs for a few minutes.
podman run --user=0 --name=sleepy -it -d registry.access.redhat.com/ubi8/ubi sleep 999
Use podman top
to examine the container’s capabilities.
podman top sleepy capeff
EFFECTIVE CAPS AUDIT_WRITE,CHOWN,DAC_OVERRIDE,FOWNER,FSETID,KILL,MKNOD,NET_BIND_SERVICE,NET_RAW,SETFCAP,SETGID,SETPCAP,SETUID,SYS_CHROOT
Try out the following very useful shortcut (-l
). It tells podman
to act on the latest container it is aware of:
podman top -l capeff
Podman top
has many additional format descriptors you can check out.
podman top -h
How could you determine which capabilities podman filters from a root process running in a container?
From a previous exercise we know that a root process on the host has a capabilities mask of CapEff = 000000FFFFFFFFFF
From a previous exercise we know that a root process in a container has a capabilities mask of CapEff = 00000000A80427FB
Hint: Below is an example that uses the Linux binary calculator bc
to add hexadecimal numbers (0x9 + 0x1) = A
.
sudo yum install bc -y
echo 'obase=16;ibase=16;9+1' | bc
A
Suppose an application had a legitimate reason to change the date (ntpd, license testing, etc) How would you allow a container to change the date on the host? What capabilities are needed to allow this?
Run a container, save the date then try to change the date.
podman run --rm -ti --user 0 --name temp registry.access.redhat.com/ubi8/ubi bash
savethedate=$(date)
date -s "$savethedate"
date: cannot set date: Operation not permitted Mon Apr 8 21:45:24 UTC 2019
exit
You have been given a container image to deploy (quay.io/bkozdemb/hello
). The application needs to use the chattr
utility but must not be allowed to ping
any hosts. Use what you’ve learned about capabilities to properly deploy this application using podman
.
For example, ping
succeeds but chattr
fails. We want the opposite.
podman run -it --name=chattr_no_ping --rm quay.io/bkozdemb/utils bash
ping -c1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.035 ms --- 127.0.0.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.035/0.035/0.035/0.000 ms
cd /tmp
touch file
chattr +i file
chattr: Operation not permitted while setting flags on file
Exit from the container:
exit
Challenge #1: One approach would be to use your favorite binary calculator (bc
) to calculate the difference in CapEff
between a host root process (0xffffffffff)
and a containerized root process (0xa80427fb)
.
0xFFFFFFFFFF - 0x00A80427FB ------------ 0xFF57FBD804
echo 'obase=16;ibase=16;FFFFFFFFFF-A80427FB' | bc
FF57FBD804
To produce a human readable list, use capsh
to decode the vector.
capsh --decode=FF57FBD804
0x000000ff57fbd804=cap_dac_read_search,cap_net_broadcast,cap_net_admin,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_lease,cap_audit_control,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39
Challenge #2: To allow a container to set the system clock, the sys_time
capability must be added. Add this capability then try setting the date again.
sudo podman run --rm -ti --user 0 --name temp --cap-add=sys_time registry.access.redhat.com/ubi8/ubi bash
savethedate=$(date)
date -s "$savethedate"
Mon Apr 8 21:46:18 UTC 2019
And exit from the container:
exit
Challenge #3: Drop all capabilities then add linux_immutable
. The trick with this is the container must run as root because linux_immutable
is a filtered capability.
sudo podman run -it --name=chattr_no_ping --rm --cap-drop=all --cap-add=linux_immutable quay.io/bkozdemb/utils bash
The chattr
command should succeed in making a file read only:
cd /tmp
touch file
chattr +i file
rm -rf file
rm: cannot remove 'file': Operation not permitted
lsattr file
----i--------------- file
Remember to reset the file attributes so the container can shutdown cleanly:
chattr -i file
lsattr file
-------------------- file
On the host, check the capabilities of the container. This will require you to open another terminal:
sudo podman top chattr_no_ping capeff
EFFECTIVE CAPS LINUX_IMMUTABLE
Exit the container (in the original container):
exit
Domain | ||
Workshop | ||
Student ID |