Secure Computing Mode (seccomp) is a kernel feature that lets you filter the system calls a container makes to the kernel. The combination of restricted and allowed calls is arranged in profiles, and you can pass different profiles to different containers. Seccomp provides finer-grained control than capabilities, leaving an attacker only a limited set of syscalls to use from inside the container. This exercise examines how seccomp works and how it can be employed to provide container security.
The default seccomp profile for containers is a JSON file and can be viewed here: https://github.com/docker/docker/blob/master/profiles/seccomp/default.json. It blocks 44 system calls out of more than 300 available. Making the list stricter would be a trade-off with application compatibility. A table with a significant part of the blocked calls and the reasoning for blocking can be found here: https://docs.docker.com/engine/security/seccomp/.
Berkeley Packet Filter (BPF)
Seccomp uses the Berkeley Packet Filter (BPF) system, which is programmable on the fly, so you can write a custom filter. You can also restrict a syscall conditionally by specifying how or when it should be limited. A seccomp filter attaches a BPF program that the kernel runs whenever the process makes a syscall, and that program decides whether the call proceeds. All children of a process with this filter inherit the filter as well. The command-line flag used to work with seccomp is --security-opt.
The following is an example of how to explicitly define the default seccomp policy for a container:
sudo podman run --security-opt seccomp=/path/to/default/profile.json <container>
mkdir ~/seccomp
cd ~/seccomp
vim 1_chmod.json
Copy the 1_chmod.json text below and paste it into the terminal window. Press i for Insert, paste with Control + V, then press Esc and write the file with :wq.
This example seccomp file disallows the chmod, chown, and chown32 system calls.
{
"defaultAction": "SCMP_ACT_ALLOW",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
{
"name": "chmod",
"action": "SCMP_ACT_ERRNO",
"args": []
},
{
"name": "chown",
"action": "SCMP_ACT_ERRNO",
"args": []
},
{
"name": "chown32",
"action": "SCMP_ACT_ERRNO",
"args": []
}
]
}
Let’s take a closer look at how this file restricts these system calls.
"syscalls": [
{
"name": "chmod", (1)
"action": "SCMP_ACT_ERRNO", (2)
1 | chmod: the command and system call that changes the access permissions of file system objects (files and directories). It can also alter special mode flags. The request is filtered by the umask. The name is an abbreviation of "change mode". |
2 | SCMP_ACT_ERRNO: blocks the chmod syscall. From the man page: the thread will receive a return value of errno when it calls a syscall that matches the filter rule. |
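The matching logic described in the callouts can be modeled in a few lines. This is only a rough sketch, not how the kernel or podman evaluates a profile: the helper `action_for` is hypothetical, it matches rules by name only, and it ignores the "args" and "architectures" fields. The profile text is a trimmed copy of 1_chmod.json.

```python
import json

# Trimmed copy of 1_chmod.json ("architectures" omitted for brevity).
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {"name": "chmod",   "action": "SCMP_ACT_ERRNO", "args": []},
    {"name": "chown",   "action": "SCMP_ACT_ERRNO", "args": []},
    {"name": "chown32", "action": "SCMP_ACT_ERRNO", "args": []}
  ]
}
"""


def action_for(profile: dict, syscall: str) -> str:
    """Return the action the profile assigns to `syscall` (hypothetical helper).

    Simplification: a rule matches on its name alone; argument conditions
    and architecture handling are not modeled.
    """
    for rule in profile.get("syscalls", []):
        if rule.get("name") == syscall:
            return rule["action"]
    # No rule matched, so the profile's default applies.
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)
for name in ("chmod", "chown32", "fchmodat"):
    print(f"{name}: {action_for(profile, name)}")
```

Any syscall that is not listed by name falls through to defaultAction, which in this profile is SCMP_ACT_ALLOW.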
Let’s apply these restrictions to a container by using the --security-opt flag to point to our 1_chmod.json file. We will start a container and try to run the chmod command on a file.
sudo podman run --rm -it --security-opt seccomp=1_chmod.json redhatgov/alpine chmod 400 /etc/hosts
chmod: /etc/hosts: Operation not permitted
This FAILED because we explicitly denied this syscall to the container via the seccomp profile. This profile can be updated to enable or disable syscalls for your application.
Not all container operating systems map commands to syscalls in the same way. While one OS may map the chown binary directly to the chown syscall, that is not always the case for other operating systems.
To find out which syscalls your application or command uses under the covers, we need to trace the application and the syscalls it makes. To do this, we use a tool called strace.
Strace identifies the underlying syscalls that a program makes. Let’s walk through an example where the syscalls being made do not map directly to the syscalls in our 1_chmod.json file.
Let’s run an instance of Fedora and use the same 1_chmod.json profile to try to limit the chmod command inside the container.
sudo podman run --rm \
-it \
--cap-add SYS_PTRACE \
--security-opt seccomp=1_chmod.json \
redhatgov/fedora \
chmod 400 /etc/hosts && echo $?
The command should return zero, which means the container was able to chmod the /etc/hosts file. We want to limit this action, so we need to add the correct syscall to our seccomp profile. To identify the correct syscall, we use strace.
Let’s run the same command again, this time under strace, to trace the chmod command.
sudo podman run --rm -it --cap-add SYS_PTRACE --security-opt seccomp=1_chmod.json redhatgov/fedora strace -P /etc/hosts chmod 400 /etc/hosts
stat("/etc/hosts", {st_mode=S_IFREG|0644, st_size=174, ...}) = 0
fchmodat(AT_FDCWD, "/etc/hosts", 0400) = 0
+++ exited with 0 +++
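When a trace is longer than a couple of lines, it can help to pull out just the syscall names. The following is a minimal sketch that assumes the default strace line format shown above (the syscall name followed by an opening parenthesis at the start of the line). The helper name `traced_syscalls` is hypothetical.

```python
import re

# Output lines copied from the strace run above.
STRACE_OUTPUT = '''\
stat("/etc/hosts", {st_mode=S_IFREG|0644, st_size=174, ...}) = 0
fchmodat(AT_FDCWD, "/etc/hosts", 0400) = 0
+++ exited with 0 +++
'''

# A syscall line starts with the syscall name followed by "(".
SYSCALL_LINE = re.compile(r"^([a-z_][a-z0-9_]*)\(")


def traced_syscalls(strace_text: str) -> list[str]:
    """Return the distinct syscall names in first-seen order (hypothetical helper)."""
    names: list[str] = []
    for line in strace_text.splitlines():
        match = SYSCALL_LINE.match(line)
        # Lines such as "+++ exited with 0 +++" do not match and are skipped.
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


print(traced_syscalls(STRACE_OUTPUT))  # ['stat', 'fchmodat']
```

Here the trace shows that chmod on Fedora reaches the kernel as fchmodat, which the 1_chmod.json profile does not list.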
Create a seccomp profile that uses the new syscall mappings for chmod and chown. Check your answer below.
Create the following profile using vim, or your favorite editor.
{
"defaultAction": "SCMP_ACT_ALLOW",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
{
"name": "fchmodat",
"action": "SCMP_ACT_ERRNO",
"args": []
},
{
"name": "fchownat",
"action": "SCMP_ACT_ERRNO",
"args": []
}
]
}
We have now found the correct syscalls to add to our seccomp profile. Create a seccomp profile called 2_chmod_fedora.json using vim or your favorite editor. You can copy and paste the seccomp profile above into it.
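As an alternative to typing the JSON by hand, a deny-list profile of this shape can be generated from a list of syscall names. This is a sketch under the assumption that every rule uses SCMP_ACT_ERRNO with no argument conditions, as in the profiles in this exercise. The function name `build_denylist_profile` is hypothetical.

```python
import json


def build_denylist_profile(blocked: list[str]) -> dict:
    """Build an allow-by-default profile that denies each name in `blocked`."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
        "syscalls": [
            {"name": name, "action": "SCMP_ACT_ERRNO", "args": []}
            for name in blocked
        ],
    }


profile = build_denylist_profile(["fchmodat", "fchownat"])
print(json.dumps(profile, indent=2))
# To save it, redirect the output or write the same string to 2_chmod_fedora.json.
```

The printed JSON has the same keys and values as the profile above, so it can be saved as 2_chmod_fedora.json unchanged.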
Now that you have your new profile created, let’s run the container again and see whether the new seccomp profile blocks chmod and chown.
sudo podman run --rm -it --security-opt seccomp=2_chmod_fedora.json redhatgov/fedora chmod 400 /etc/hosts
chmod: changing permissions of '/etc/hosts': Operation not permitted
sudo podman run --rm -it --security-opt seccomp=2_chmod_fedora.json redhatgov/fedora chown root:root /etc/hosts
chown: changing ownership of '/etc/hosts': Operation not permitted
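Because Alpine and Fedora reached the kernel through different syscalls, one possible follow-up is a single profile that denies the related calls together. The sketch below merges extra names into an existing profile without duplicating rules. The syscall lists are an assumption taken from the Linux chmod(2) and chown(2) man pages, not from this workshop, and may be incomplete on newer kernels. The helper `add_blocked` is hypothetical.

```python
# Assumed syscall families (from chmod(2) and chown(2)); not part of the workshop.
CHMOD_FAMILY = ["chmod", "fchmod", "fchmodat"]
CHOWN_FAMILY = ["chown", "chown32", "fchown", "fchown32",
                "lchown", "lchown32", "fchownat"]


def add_blocked(profile: dict, names: list[str]) -> dict:
    """Return a copy of `profile` that also denies `names`, skipping duplicates."""
    seen = {rule["name"] for rule in profile["syscalls"]}
    rules = list(profile["syscalls"])
    for name in names:
        if name not in seen:
            rules.append({"name": name, "action": "SCMP_ACT_ERRNO", "args": []})
            seen.add(name)
    return {**profile, "syscalls": rules}


# Same rules as 2_chmod_fedora.json ("architectures" omitted for brevity).
fedora_profile = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {"name": "fchmodat", "action": "SCMP_ACT_ERRNO", "args": []},
        {"name": "fchownat", "action": "SCMP_ACT_ERRNO", "args": []},
    ],
}

combined = add_blocked(fedora_profile, CHMOD_FAMILY + CHOWN_FAMILY)
print([rule["name"] for rule in combined["syscalls"]])
```

The original profile dictionary is left unmodified, so the narrower and broader versions can be compared side by side before either is used.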
Docker presents the socket syscall to containers by default. This may not be a capability you want your containers to have in certain situations. Let’s look at another example where we use the powerful networking tool, Netcat. Netcat is used for just about anything under the sun involving TCP or UDP. It can open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports, do port scanning, and deal with both IPv4 and IPv6.
Let’s run a container with Netcat installed in it and listen for local traffic on port 999.
sudo podman run --rm -it redhatgov/fedora bash
[root@2b1369bfa927 /]# nc -l 999
^C (1)
[root@2b1369bfa927 /]# exit
exit (2)
1 | Netcat started listening successfully. Use Control + C to exit Netcat. |
2 | Type exit to leave the container. |
We were able to bind to the localhost and listen for traffic on port 999. In Step 2 we will disable networking in this container.
Run strace on the Netcat program inside a container to identify the network-related syscalls we need to restrict in our seccomp profile.
sudo podman run --rm -it --cap-drop SYS_PTRACE redhatgov/fedora bash
Next, from inside the container, run strace and the Netcat command.
[root@9ad9f00480a0 /]# strace nc -l 999
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
+++ exited with 1 +++
Exit the container
[root@9ad9f00480a0 /]# exit
Netcat needs the socket syscall to open a listening socket, so that is the syscall to add to our seccomp profile. Using vim or your favorite editor, copy the seccomp profile below and save it as a file named 3_network.json.
{
"defaultAction":"SCMP_ACT_ALLOW",
"syscalls":[
{
"name":"socket",
"action":"SCMP_ACT_ERRNO"
}
]
}
Now that you have your new profile created, let’s run the container again and see if our new seccomp profile blocks Netcat from working.
sudo podman run --rm -it --security-opt seccomp=3_network.json redhatgov/fedora bash
[root@de51762b4213 /]# nc -l 555
Ncat: Unable to open any listening sockets. QUITTING. (1)
1 | The seccomp profile blocks Netcat from opening a network socket. |
[root@de51762b4213 /]# exit
exit
This FAILED because we explicitly denied the socket syscall to the container via the seccomp profile. A profile like this can help stop would-be attackers from further compromising a container or its host.
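The Berkeley Packet Filter section mentions that a syscall can also be limited conditionally. As a hedged sketch of what that might look like, the code below builds a profile that denies socket only when its first argument (the address family) is IPv4 or IPv6. It assumes the profile format accepts "args" entries with index, value, valueTwo, and op fields, as Docker's default profile does, and it uses the Linux values AF_INET = 2 and AF_INET6 = 10. The helper `deny_socket_family` is hypothetical, and the resulting profile has not been verified against the workshop images.

```python
import json

AF_INET = 2    # Linux value for IPv4 sockets
AF_INET6 = 10  # Linux value for IPv6 sockets


def deny_socket_family(family: int) -> dict:
    """Rule denying socket() only when argument 0 (the domain) equals `family`."""
    return {
        "name": "socket",
        "action": "SCMP_ACT_ERRNO",
        "args": [
            # Assumed field names, following Docker's default profile.
            {"index": 0, "value": family, "valueTwo": 0, "op": "SCMP_CMP_EQ"}
        ],
    }


# One rule per address family, so each condition is checked independently.
profile = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [deny_socket_family(AF_INET), deny_socket_family(AF_INET6)],
}
print(json.dumps(profile, indent=2))
```

Compared with 3_network.json, which denies every socket call, a rule like this is narrower: other socket families, such as Unix domain sockets, fall through to the default action.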