Exercise 1.7 - Seccomp

Return to Workshop

Exercise 1.7 - Seccomp

Exercise Description

Secure Computing Mode (seccomp) is a kernel feature that enables you to filter system calls to the kernel from a container. The combination of restricted and allowed calls are arranged in profiles, and you can pass different profiles to different containers. Seccomp provides more fine-grained control than capabilities, giving an attacker a limited number of syscalls from the container. This exercise examines how seccomp works and how it can be employed to provide container security.

The default seccomp profile for containers is a JSON file and can be viewed here: https://github.com/docker/docker/blob/master/profiles/seccomp/default.json. It blocks 44 system calls out of more than 300 available. Making the list stricter would be a trade-off with application compatibility. A table with a significant part of the blocked calls and the reasoning for blocking can be found here: https://docs.docker.com/engine/security/seccomp/.

Berkeley Packet Filter (BPF)

Seccomp uses the Berkeley Packet Filter (BPF) system, which is programmable on the fly so you can make a custom filter. You can also limit a certain syscall by also customizing the conditions on how or when it should be limited. A seccomp filter replaces the syscall with a pointer to a BPF program, which executes that program instead of the syscall. All children to a process with this filter will inherit the filter as well. The command line flag used to operate with seccomp is --security-opt.

Section 1: Defining seccomp policy

The following is an example of how to explicitly define the default seccomp policy for a container:

sudo podman run --security-opt seccomp=/path/to/default/profile.json <container>

Step 1. Create the working directory

mkdir ~/seccomp
cd ~/seccomp
vim 1_chmod.json

Step 2. Copy the 1_chmod.json text below and paste it into the terminal window.

Press i for Insert, then cut and paste control + v, then escape and write the file esc, :wq.

This example seccomp file will be used to disallow the chmod, chown, or chown32 systemcalls.

{
    "defaultAction": "SCMP_ACT_ALLOW",
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
    ],
    "syscalls": [
        {
            "name": "chmod",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        },
        {
            "name": "chown",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        },
        {
            "name": "chown32",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        }
    ]
}

Lets take a closer look at how this file restricts these system calls.

"syscalls": [
        {
            "name": "chmod",                (1)
            "action": "SCMP_ACT_ERRNO",     (2)
1 chmod: is the command and system call which may change the access permissions to file system objects (files and directories). It may also alter special mode flags. The request is filtered by the umask. The name is an abbreviation of change mode.
2 SCMP_ACT_ERRNO: Basically this blocks the chmod syscall. Man page: The thread will receive a return value of errno when it calls a syscall that matches the filter rule.

Step 3. Apply the restrictions to a container

Lets apply these restrictions to a container by using the --security-opt flag to point to our 1_chmod.json file. We will start a container and try to run the chmod command on a file.

sudo podman run --rm -it --security-opt seccomp=1_chmod.json redhatgov/alpine chmod 400 /etc/hosts
chmod: /etc/hosts: Operation not permitted

This FAILED because we explicitly denied this syscall to the container via the seccomp profile. This profile can be updated to enable or disable syscalls for your application.

Section 2: Syscall Identification

Not all container operating systems use the same mappings for syscalls. While one OS may use a direct mapping of the binary chown to the syscall chown it is not always the case for other operating systems.

To find out what syscalls your application or command is using under the covers we need to profile or trace the application and the syscalls it is making. To do this we use a tool called strace.

Strace is used to identify the underlying syscall being made by the operating system. Lets walk through a example where the syscalls being made do not map directly to our syscalls in the 1_chmod.json file.

Step 1. Run Fedora

Let’s run a instance of Fedora and use the same 1_chmod.json profile to try and limit the chmod command inside the container.

sudo podman run --rm \
                -it  \
                --cap-add SYS_PTRACE \
                --security-opt seccomp=1_chmod.json \
                redhatgov/fedora \
                chmod 400 /etc/hosts && echo $?

The command should return a zero. This means that the container was able to chmod the /etc/hosts file. We want to limit this action and need to map the correct syscall to out seccomp profile. To identify the correct syscall we need to use strace.

Step 2. Add the strace program to trace chmod

Lets run the same command again and add the strace program to our command to trace the chmod command.

sudo podman run --rm -it --cap-add SYS_PTRACE --security-opt seccomp=1_chmod.json redhatgov/fedora strace -P /etc/hosts chmod 400 /etc/hosts
stat("/etc/hosts", {st_mode=S_IFREG|0644, st_size=174, ...}) = 0
fchmodat(AT_FDCWD, "/etc/hosts", 0400)  = 0
+++ exited with 0 +++

Step 3. Create a seccomp profile using new mappings

Create a seccomp profile using the new mappings, for system calls, for chmod & chown. Check your answer below.

Create the following profile using vim, or your favorite editor.

2_chmod_fedora.json
{
    "defaultAction": "SCMP_ACT_ALLOW",
    "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
    ],
    "syscalls": [
        {
            "name": "fchmodat",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        },
        {
            "name": "fchownat",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        }
    ]
}

Step 4. Create a seccomp profile with syscall mapping

We have now found the correct syscall to add to our seccomp profile. Let’s create a seccomp profile with our new syscall mapping. Now we can create a seccomp profile called 2_chmod_fedora.json using vim, or your favorite editor. You can copy and paste the seccomp profile above into this profile.

Now that you have your new profile created, let’s run the container again and see if our new seccomp profile blocks chmod & chown from working.

chmod
sudo podman run --rm -it --security-opt seccomp=2_chmod_fedora.json redhatgov/fedora chmod 400 /etc/hosts
chmod: changing permissions of '/etc/hosts': Operation not permitted
chown
sudo podman run --rm -it --security-opt seccomp=2_chmod_fedora.json redhatgov/fedora chown root:root /etc/hosts
chown: changing ownership of '/etc/hosts': Operation not permitted

Section 3: Limiting Network Syscalls

Docker presents the socket syscall to containers by default, this may not be a capability you want your containers to have in certain situations. Let’s look at another example where we use the powerful networking tool, Netcat. Netcat is used for just about anything under the sun involving TCP or UDP. It can open TCP connections, send UDP packets, listen on arbitrary TCP and UDP ports, do port scanning, and deal with both IPv4 and IPv6.

Step 1. Run a container with Netcat installed.

Let’s run a container with Netcat installed in it and listen for local traffic on port 999.

sudo podman run --rm -it redhatgov/fedora bash
In a Container
[root@2b1369bfa927 /]# nc -l 999
^C (1)

[root@2b1369bfa927 /]# exit
exit (2)
1 Netcat successfully connected. Use Control + C to exit Netcat.
2 exit to exit the container.

We were able to bind to the localhost and listen for traffic on port 999. In Step 2 we will disable networking in this container.

Step 2. Identify network-restricting syscalls.

Run strace on the Netcat program, to identify the network-restricting syscalls we need for our seccomp profile, in our container.

sudo podman run --rm -it --cap-drop SYS_PTRACE redhatgov/fedora bash

Step 3. Run strace and the Netcat command.

Next, from inside the container, run strace and the Netcat command.

[root@9ad9f00480a0 /]# strace nc -l 999
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
+++ exited with 1 +++

Step 4. Create a seccomp profile called 3_network.json

Exit the container

[root@9ad9f00480a0 /]# exit

Step 3:

We have now found the correct syscall to add to our seccomp profile. Create a seccomp profile called 3_network.json - using vim, or your favorite editor.

Copy and paste the seccomp profile below into a text editor. Then, save it as a file named 3_network.json to create the profile.

{
   "defaultAction":"SCMP_ACT_ALLOW",
   "syscalls":[
      {
         "name":"socket",
         "action":"SCMP_ACT_ERRNO"
      }
   ]
}

Step 5. Test the new seccomp profile

Now that you have your new profile created, let’s run the container again and see if our new seccomp profile blocks Netcat from working.

sudo podman run --rm -it --security-opt seccomp=3_network.json redhatgov/fedora bash
In a Container
[root@de51762b4213 /]# nc -l 555
Ncat: Unable to open any listening sockets. QUITTING. (1)
1 Netcat is blocked from connecting to a network socket, via the seccomp profile.

Step 6. Exit the container

Exit the container
[root@de51762b4213 /]# exit
exit

This FAILED because we explicitly denied this syscall to the container via the seccomp profile. This profile can help to stop would-be attackers from being able to further compromise a container or container host.


Workshop Details

Domain Red Hat Logo
Workshop
Student ID

Return to Workshop