Overview

I don’t want to over-state things here - these are what I consider to be some security best practices; I can’t guarantee that following them means your deployments are free of vulnerabilities, but these tips will certainly help.

The vast majority of blogs and tutorials on the internet seem to skip over some basics when it comes to securing your workloads. This in turn can lead to blindly deploying services that aren’t properly secured, or that could be used as a base to attack other areas of a cluster.

In this article, I’m going to describe and demonstrate some simple “best practices” that you can easily apply to your .NET Core applications when deploying to Kubernetes.

Note: these changes aren’t necessarily specific to .NET Core, and could be used for other languages and frameworks too. I have samples in both Rust and Go, and may well add those to this blog in the future.

I’m using Helm in the samples below - simply because I prefer to use Helm for deployments. Whilst my deployment methods may be opinionated in this article, the general principles can be applied without Helm - you might just need to do a bit more work to translate the Helm templates into plain manifests.

The source code for this article can be found at https://github.com/michaelrosedev/netcore_kubernetes_securitycontext.


Running as Root

If a container is running as uid 0, then it is effectively running as the same root user as on the host. If someone manages to exploit a vulnerability in your code and gains access to your container, or the ability to run scripts inside it, you’re going to be in trouble: there’s the potential that they could also gain access to the host the container is running on.

Note: unless you specify otherwise, your containers - and therefore your pods - will run as root.

To avoid this, you need to add a securityContext to your pod definition - see https://kubernetes.io/docs/tasks/configure-pod-container/security-context.

Here’s an extract from the sample project’s Helm chart templates:

First, values.yaml defines some applicable values:

podSecurityContext:
  runAsUser: 2000 # the user id to run as
  fsGroup: 2000 # supplemental group id

Next, the deployment.yaml file uses the values specified in values.yaml when defining the pod specification:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "sample.fullname" . }}
  namespace: {{ .Values.namespace }}
  labels:
    {{- include "sample.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
{{- end }}
  selector:
    matchLabels:
      {{- include "sample.selectorLabels" . | nindent 6 }}
  template:
    metadata:
    {{- with .Values.podAnnotations }}
      annotations:
        {{- toYaml . | nindent 8 }}
    {{- end }}
      labels:
        {{- include "sample.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "sample.serviceAccountName" . }}
      securityContext: # <-- this section here applies the context
        {{- toYaml .Values.podSecurityContext | nindent 8 }}

When the chart is applied, this will result in the securityContext being set for the pod.
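
For reference, rendering the chart (e.g. with helm template) using the values above produces a pod spec along these lines - the resource names will vary depending on your chart, so treat them as placeholders:

spec:
  template:
    spec:
      serviceAccountName: sample
      securityContext:
        fsGroup: 2000
        runAsUser: 2000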

Now we also need to specify that the containers should not run as root:

Again in values.yaml we are defining the values to apply:

securityContext:
  runAsNonRoot: true
  runAsUser: 2000

Note: we are setting runAsUser at the container level here as well as at the pod level. This isn’t strictly necessary; if both are set and the values differ, the more specific setting wins, i.e. the container-level value overrides the pod-level value.

This securityContext is then applied in the container spec in deployment.yaml:

      containers:
        - name: {{ .Chart.Name }}
          securityContext: # <-- the security context is applied here
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}

Now when you run your pod, the process will no longer be running as root.
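
If you want to double-check, you can exec into a running pod and print the current user id. This assumes your image includes a shell and the id utility, and deploy/sample is just a placeholder for your own deployment (you can also target a pod directly):

kubectl exec deploy/sample -- id -u
# prints 2000 rather than 0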

Port Bindings

An issue you will come across the first time you attempt to run a .NET Core container as a non-privileged account (i.e. not root!) is that the process will fail to bind to port 80, because non-privileged users cannot bind to ports below 1024.

To get around this, your workload should listen on a non-privileged port (1024 or above) and let your Kubernetes Service handle the mapping from port 80.

You can achieve this in .NET Core in a number of ways, but one of the simplest methods is to supply an environment variable in your pod definition:

env:
- name: ASPNETCORE_URLS
  value: "http://+:5000"

This will make sure .NET Core binds to port 5000; your Kubernetes Service should then use 5000 as its targetPort.
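
For completeness, the Service side of that mapping might look something like this - a minimal sketch, with the name and selector labels standing in for whatever your chart generates:

apiVersion: v1
kind: Service
metadata:
  name: sample
spec:
  selector:
    app.kubernetes.io/name: sample
  ports:
    - name: http
      port: 80         # port exposed to the rest of the cluster
      targetPort: 5000 # non-privileged port the container listens on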

Unnecessary Capabilities

It’s good practice to drop any Linux capabilities that your pods don’t need; every capability they retain by default is additional attack surface.

For instance, failing to drop the NET_RAW capability could allow a compromised pod or workload to be used for “Ping of Death” attacks.

To resolve this, you should drop any capability that your workload doesn’t require.

This can be achieved using the securityContext elements we used to prevent running as root.

In values.yaml we’re extending the securityContext element to drop specific capabilities:

securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 2000

These values will be applied to the container via the deployment.yaml Helm template:

      containers:
        - name: {{ .Chart.Name }}
          securityContext: # <-- the security context is applied here
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}

Note: in this example, we’ve dropped all capabilities from the container. You can explicitly define specific capabilities to drop if you need to, for example:

securityContext:
  capabilities:
    drop:
    - NET_RAW
    - KILL

You can also first drop all capabilities, then just add in the capabilities you need:

securityContext:
  capabilities:
    drop:
    - ALL
    add:
    - SYS_TIME

In this example, all capabilities are dropped and then the SYS_TIME capability is added back.

For more in-depth details relating to security contexts and capabilities see the official Kubernetes docs.
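
If you want to verify what your container actually ended up with, one quick check is to inspect the capability sets of the main process from inside the pod - again this assumes the image provides grep, and deploy/sample is a placeholder for your own deployment:

kubectl exec deploy/sample -- grep Cap /proc/1/status
# with ALL dropped, CapEff should be all zeros (0000000000000000)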

No Resource Limits

A pod without resource limits could potentially end up consuming resources that other services need, or continually grabbing hold of memory until there’s none left to go around.

Starving other workloads of resources could lead to cascading failures in your clusters. It’s much better to set some sensible limits up front - this will not only aid with capacity planning; it will also ensure that workloads don’t consume resources that result in other services failing.

Resources come in two “flavours”: requests and limits:

  • requests - the amount of resource (CPU and/or memory) that your pod needs to run.
  • limits - the maximum amount of resource (CPU and/or memory) that your pod is allowed to consume.

Requests are used by the Kubernetes scheduler to balance workloads appropriately across the cluster, while limits are enforced at runtime so that misbehaving workloads can be throttled or killed.

Note: just because Kubernetes can automatically evict or restart workloads that exceed their limits doesn’t mean you shouldn’t care about memory leaks. Too many times I’ve seen memory leaks handled by SRE or platform teams rather than tackled at source by development teams.

To define your requests and limits, specify the values in the values.yaml file:

resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 50m
    memory: 96Mi

In this sample, the default request is 50m of CPU and 96Mi of memory.

CPU values are set in millicores, whereby each core is split into 1,000 millicores; 500m represents ~50% of a single core.

Memory is specified in Mebibytes. 1Mi represents 2^20 bytes (1,048,576).

The request is set to 50m and 96Mi, whereas the limit is set to 100m and 128Mi - each deployed pod can use up to ~10% of a core and 128Mi of memory.

What happens when limits are reached?

  • If your pod reaches its defined memory limit, it may be OOM-killed
  • If your pod reaches its defined CPU limit, it may be throttled

Either way, it’s extremely important to set limits and monitor them. See this Sysdig article, which goes into much more detail, including defining limits per namespace so that every deployment ends up with resource requests/limits applied.
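
As a rough sketch of that per-namespace approach, a LimitRange can apply default requests and limits to any container in a namespace that doesn’t declare its own - the namespace and values here are purely illustrative:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: sample
spec:
  limits:
    - type: Container
      defaultRequest: # applied when a container specifies no requests
        cpu: 50m
        memory: 96Mi
      default:        # applied when a container specifies no limits
        cpu: 100m
        memory: 128Mi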

Privilege Escalation

Sometimes, as a cluster admin, it might be beneficial to deploy a pod which has increased privilege. With normal workloads, however, you should not allow privilege escalation.

If your pod allows privilege escalation, then you may be susceptible to one or more of the following issues:

  • Processes may be able to escape the container
  • Processes may be able to read secrets from Kubernetes, Docker, or from other applications
  • Processes may be able to stop or control Kubernetes, Docker, or other applications

None of these are good things…

To prevent privilege escalation, we can again turn to the securityContext in values.yaml:

securityContext:
  capabilities:
    drop:
    - ALL
  runAsNonRoot: true
  runAsUser: 2000
  allowPrivilegeEscalation: false # <-- prevent escalation of privilege
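
If you want to confirm this has taken effect, allowPrivilegeEscalation: false sets the no_new_privs flag on the container process, which you can read from inside the pod on reasonably recent kernels - deploy/sample is again a placeholder:

kubectl exec deploy/sample -- grep NoNewPrivs /proc/1/status
# NoNewPrivs: 1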

Writeable File System

To be honest, this is the most .NET-specific part of the article. Using a read-only filesystem isn’t specific to .NET, but the addition of a writeable portion to allow .NET Core applications to run is…

By default, any volume mounted by your workloads will be writeable, and the filesystem of the container itself will also be writeable. This means that someone who has gained access to your running pod or container could write to the local filesystem.

To prevent this, we need to define the filesystem as read-only. Again we will turn to the securityContext in values.yaml:

securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true # <-- root filesystem is read-only
  runAsNonRoot: true
  runAsUser: 2000
  allowPrivilegeEscalation: false

Additionally, if you mount any volumes in your pod definition then you should specify readOnly: true on the mount, e.g.:

apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  containers:
  - name: container-test
    image: busybox
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret # any existing Secret; included so the example is complete

Now, because this article uses a .NET Core sample, we need to ensure that the /tmp path is writeable, otherwise when we attempt to run our workload we’re going to get an error:

Failed to create CoreCLR, HRESULT: 0x80004005

This somewhat cryptic message is solved by mounting a special type of volume in your deployment’s definition:

  volumeMounts:
    - mountPath: /tmp
      name: tmp
volumes:
- emptyDir: {}
  name: tmp

This will ensure that when the .NET Core application is started, it will be able to write to the /tmp path successfully.
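
A quick way to confirm the behaviour is to try writing outside and inside /tmp - this assumes the image contains a shell and the touch utility, and uses a placeholder deployment name:

kubectl exec deploy/sample -- touch /test
# fails with a "Read-only file system" error
kubectl exec deploy/sample -- touch /tmp/test
# succeeds, because /tmp is backed by the writeable emptyDir volume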

Kube Scan

To test the security surface area of your cluster from the inside, you can use a tool such as Kube Scan, which will highlight all of the issues covered in this article, as well as others.