Introduction
As more applications and workloads share the same infrastructure, security and isolation matter more than ever. Applications must be prevented from interfering with each other, and the actions a workload can perform must be limited. Various techniques for container sandboxing and isolation have emerged to address this. This article explores several of them: seccomp, AppArmor, SELinux, gVisor, Kata Containers, Firecracker, and unikernels, along with the strengths and weaknesses of each.
Introduction to Container Sandboxing
Imagine you have two workloads that you want to keep separate from each other. Containerization and virtualization both address this: they run applications in isolated environments so that it is hard for one to interfere with another. Sandboxing is a complementary approach that restricts an application's access to resources and system calls.
When you run an application in a container, the container acts as a sandbox for it. Each container typically runs a single application, so an attacker who compromises a container may try to execute code that goes beyond what that application normally does. Sandboxing mechanisms limit what that code can do, which restricts the attacker's ability to affect the rest of the system.
Seccomp: Restricting System Calls
Seccomp, short for "secure computing mode", is a Linux kernel mechanism that limits the set of system calls an application can make. When it was introduced in 2005, a process in seccomp mode could make only a handful of system calls: sigreturn (to return from a signal handler), exit (to terminate the process), and read and write, and those only on file descriptors that were already open.
In 2012, seccomp-bpf was introduced. It uses Berkeley Packet Filter (BPF) programs to give much finer-grained control over system calls. A seccomp profile determines which system calls a process is permitted to make, and each process can have its own profile. The filter examines the system call number and its arguments and decides whether to allow the call; if it does not, the outcome can be an error returned to the caller, termination of the process, or invocation of a tracer. Seccomp is well suited to containers because it can block system calls that a containerized application should never need, such as those that change the system clock or manipulate kernel modules. Docker's default seccomp profile blocks more than 40 system calls, which improves security without affecting most containerized applications.
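As a rough illustration of what a profile looks like, the Python sketch below builds a small deny-list profile in the JSON layout that Docker accepts and evaluates it with a toy matcher. The system call names are examples of the clock and kernel module calls mentioned above; `build_profile` and `decide` are hypothetical helper names, and the matcher compares names only, whereas the kernel filter works on system call numbers and can also inspect arguments.

```python
import json

# Example system calls that change the clock or load/unload kernel modules.
BLOCKED_SYSCALLS = [
    "settimeofday", "clock_settime", "adjtimex",
    "init_module", "finit_module", "delete_module",
]


def build_profile(blocked):
    """Return a deny-list profile: allow everything except the listed calls."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",  # calls not listed below are allowed
        "syscalls": [
            # Listed calls fail with an error instead of reaching the kernel.
            {"names": sorted(blocked), "action": "SCMP_ACT_ERRNO"},
        ],
    }


def decide(profile, syscall_name):
    """Toy evaluator: return the action the profile assigns to a call name."""
    for rule in profile["syscalls"]:
        if syscall_name in rule["names"]:
            return rule["action"]
    return profile["defaultAction"]


profile = build_profile(BLOCKED_SYSCALLS)

if __name__ == "__main__":
    # Write this output to a file to try it with a container runtime.
    print(json.dumps(profile, indent=2))
```

Saved as, say, `profile.json`, a profile like this can be passed to Docker with `docker run --security-opt seccomp=profile.json`. A deny-list is used here only to keep the example short; an allow-list (default action of returning an error, with the permitted calls listed explicitly) is the stricter design.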
Kubernetes, however, does not apply a seccomp profile by default. Older releases let you add one through annotations on PodSecurityPolicy objects; PodSecurityPolicy was removed in Kubernetes 1.25, and current releases set the profile through the seccompProfile field of a Pod's or container's securityContext. To build a custom seccomp profile, you can trace an application's system calls manually with a tool such as strace, or use eBPF-based utilities such as falco2seccomp or Tracee. Some commercial container security tools can also generate custom seccomp profiles automatically.
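The manual strace route can be sketched in a few lines of Python: collect the distinct system call names from a trace and turn them into an allow-list profile. This is a minimal sketch, not a replacement for the tools named above. The function names are hypothetical, and `SAMPLE_TRACE` is a made-up trace in the usual strace line format, included only so the example runs on its own.

```python
import re

# Syscall name at the start of an strace line, with the optional
# "[pid 1234] " prefix that `strace -f` prints for child processes.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(lines):
    """Collect the distinct syscall names seen in strace output lines."""
    seen = set()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:  # signal and exit-status lines do not match and are skipped
            seen.add(match.group(1))
    return sorted(seen)


def allowlist_profile(names):
    """Build a profile that returns an error for every call not in `names`."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": list(names), "action": "SCMP_ACT_ALLOW"}],
    }


# Made-up sample input for illustration only.
SAMPLE_TRACE = """\
execve("/bin/true", ["true"], 0x7ffd5c0 /* 20 vars */) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
read(3, "data"..., 832) = 832
[pid  4242] close(3) = 0
--- SIGCHLD {si_signo=SIGCHLD} ---
exit_group(0) = ?
+++ exited with 0 +++
"""

names = syscalls_from_strace(SAMPLE_TRACE.splitlines())
profile = allowlist_profile(names)

if __name__ == "__main__":
    print(names)
```

A trace only covers the code paths that actually ran, so a profile built this way may be missing calls needed on error paths or rarely used features. Test it under realistic load before enforcing it.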
AppArmor: Application-Level Access Control
AppArmor is a Linux security module (LSM) that provides application-level access control. A profile is associated with an executable file and determines what that executable is allowed to do in terms of capabilities and file access. AppArmor implements mandatory access control: the controls are set centrally by an administrator, and users cannot modify or bypass them. Standard Linux file permissions, by contrast, are discretionary access controls, because the user who owns a file can grant or deny other users access to it.
To create AppArmor profiles for containers, you can use a tool called bane. Once a profile has been created, install it under the `/etc/apparmor.d` directory and load it with `apparmor_parser`. Docker, containerd, CRI-O, and Kubernetes all support AppArmor profiles; in Kubernetes, you select a profile by adding an annotation to the Pod definition.
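For Kubernetes, the annotation approach might look like the following Python sketch, which builds a Pod manifest as a dictionary. The annotation key is the per-container AppArmor annotation Kubernetes has long supported; the pod, container, image, and profile names are placeholders, and `pod_with_apparmor` is a hypothetical helper. The profile is assumed to be already loaded on the node.

```python
import json

# Per-container annotation key; the container name is appended to it.
APPARMOR_ANNOTATION_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def pod_with_apparmor(pod_name, container_name, image, profile_name):
    """Return a Pod manifest that runs one container under a local profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "annotations": {
                # "localhost/<name>" refers to a profile loaded on the node.
                APPARMOR_ANNOTATION_PREFIX + container_name:
                    "localhost/" + profile_name,
            },
        },
        "spec": {
            "containers": [{"name": container_name, "image": image}],
        },
    }


# Placeholder names for illustration.
pod = pod_with_apparmor("web", "nginx", "nginx:stable", "my-nginx-profile")

if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(pod, indent=2))
```

Newer Kubernetes releases also provide a dedicated AppArmor field in the security context, so check the documentation for the version your cluster runs before relying on the annotation.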
SELinux: Enhanced Security Through Labelling
SELinux, or "Security-Enhanced Linux", is an LSM that originated at the US National Security Agency, with Red Hat as a major contributor. It uses labels to define how processes may interact with files and with other processes. Each process runs in an SELinux domain, which represents its security context, while files are labelled with types.
SELinux adds mandatory access controls on top of the usual discretionary access controls (DAC). Administrators write policies that dictate what actions a process from a given domain can perform on files of a particular type. Policy violations can be logged for review rather than strictly enforced, similar to AppArmor's "complain" mode. Writing an SELinux profile requires a thorough understanding of the files an application needs to access, in both normal and error scenarios. Some vendors supply prebuilt profiles, but custom profiles may still be needed for specific applications.
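The domain-and-type idea can be made concrete with a toy model. The Python sketch below is not SELinux policy syntax; it only mimics the decision described above: a rule table keyed by (domain, type), plus a switch between enforcing violations and merely logging them. The class name and the labels `webserver_t` and `web_content_t` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPolicy:
    enforcing: bool = True                        # False: log only, never block
    rules: dict = field(default_factory=dict)     # (domain, type) -> actions
    denials: list = field(default_factory=list)   # record of violations

    def allow(self, domain, file_type, *actions):
        """Permit processes in `domain` to perform actions on `file_type`."""
        self.rules.setdefault((domain, file_type), set()).update(actions)

    def check(self, domain, file_type, action):
        """Return True if the access goes ahead, False if it is blocked."""
        if action in self.rules.get((domain, file_type), set()):
            return True
        # Anything not explicitly allowed is a violation and is always logged.
        self.denials.append((domain, file_type, action))
        return not self.enforcing


# Invented labels for illustration.
policy = ToyPolicy()
policy.allow("webserver_t", "web_content_t", "read")

if __name__ == "__main__":
    print(policy.check("webserver_t", "web_content_t", "read"))
    print(policy.check("webserver_t", "web_content_t", "write"))
    print(policy.denials)
```

Running an application against a log-only policy and reading the recorded denials is one way to discover which rules a custom profile still needs.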
gVisor: Combining Sandboxing and Virtualization
gVisor, developed by Google, takes a different approach to container isolation. It acts as a user-space kernel, intercepting an application's system calls in much the same way that a hypervisor intercepts the system calls of guest virtual machines. gVisor consists of two components, the "Sentry" and the "Gofer", and aims to provide stronger security and isolation than ordinary containers. The Sentry intercepts the system calls made by the application and is itself tightly sandboxed using seccomp, so it delegates file-related system calls to the Gofer process. Within the Sentry, gVisor acts as a guest kernel running in user space, reimplementing Linux system calls. The main limitation is that not every Linux system call is supported, which rules gVisor out for some applications. Applications that make system calls at a high rate may also see a performance impact.
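The division of labour between the two components can be pictured with a toy model. The Python sketch below is not gVisor code and does not use its API; the class names simply mirror the component names, and `read_file` stands in for the whole family of file-related system calls. It shows the three outcomes described above: a call answered inside the sandbox, a call delegated to the Gofer, and an unsupported call.

```python
import errno


class Gofer:
    """Stands in for the process that performs file access on the host."""

    def __init__(self, host_files):
        self.host_files = host_files  # path -> contents

    def read_file(self, path):
        if path not in self.host_files:
            return -errno.ENOENT
        return self.host_files[path]


class Sentry:
    """Stands in for the user-space kernel that receives application calls."""

    def __init__(self, gofer):
        self.gofer = gofer
        self.handlers = {
            "getpid": lambda: 1,                  # answered inside the sandbox
            "read_file": self.gofer.read_file,    # delegated to the Gofer
        }

    def syscall(self, name, *args):
        handler = self.handlers.get(name)
        if handler is None:
            # The call has not been reimplemented, so the application
            # sees "function not implemented".
            return -errno.ENOSYS
        return handler(*args)


# Made-up file contents for illustration.
sentry = Sentry(Gofer({"/etc/hostname": "sandbox\n"}))

if __name__ == "__main__":
    print(sentry.syscall("getpid"))
    print(sentry.syscall("read_file", "/etc/hostname"))
    print(sentry.syscall("init_module"))
```

Every application call goes through an extra layer of dispatch in this design, which is the intuition behind the performance cost for syscall-heavy workloads.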
Kata Containers: Running Containers in VMs
Kata Containers takes a different approach: it runs containers inside virtual machines. This combines the convenience of container images with the stronger isolation that virtual machines provide. To achieve this, Kata Containers uses a runtime proxy that leverages QEMU to create a virtual machine for each container. The isolation is strong, but startup is slower because each virtual machine has to boot.
Firecracker: Lightweight Virtual Machines
Firecracker tackles the problem of slow virtual machine startup by providing lightweight virtual machines, often called microVMs. With startup times as low as 100 ms, Firecracker is well suited to container workloads that need the stronger isolation of virtualization. It achieves fast startup by stripping out functionality that such workloads rarely need. Firecracker runs in user space and uses KVM-based hardware virtualization.
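One way to see how little a microVM needs is to look at the handful of settings required to boot one. The Python sketch below only builds the request bodies for Firecracker's HTTP API (which the Firecracker process serves over a Unix socket) and does not send them. The endpoint and field names reflect the API as I understand it and should be checked against the specification for the version you run; the file paths, sizes, and the helper name `microvm_requests` are placeholders.

```python
import json


def microvm_requests(kernel_path, rootfs_path, vcpus=1, mem_mib=128):
    """Return (method, path, body) tuples that configure and boot a microVM."""
    return [
        # CPU and memory for the guest.
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # Uncompressed guest kernel and its command line.
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel_path,
          "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"}),
        # A single block device used as the root filesystem.
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs_path,
          "is_root_device": True, "is_read_only": False}),
        # Boot the guest once everything above has been accepted.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


# Placeholder paths for illustration.
requests_to_send = microvm_requests("/path/to/vmlinux", "/path/to/rootfs.ext4")

if __name__ == "__main__":
    for method, path, body in requests_to_send:
        print(method, path, json.dumps(body))
```

A guest described by a kernel, one disk, and a CPU and memory budget has no BIOS and no large set of emulated devices to initialise, which is consistent with the fast startup described above.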
Unikernels: Minimalistic Operating Systems
Unikernels take isolation to the extreme by building virtual machine images that contain only the components needed to run the application. This keeps the attack surface small and startup times short.
To use unikernels, each application must be compiled into a unikernel image that includes everything it requires. Unikernels provide strong isolation, but they do require applications to be built and packaged in a unikernel format.
Conclusion
Container sandboxing and isolation are vital for deploying applications securely. Seccomp, AppArmor, SELinux, gVisor, Kata Containers, Firecracker, and unikernels each offer a way to strengthen security and isolation for workloads. Each technique has its own advantages and limitations, so choosing the right tool depends on your requirements and constraints. As more workloads move into shared environments, these techniques will become increasingly important for safeguarding applications and data.