ADR-0017: Enable container autodiscovery to scan container images from private registries
| Status: | DRAFT | 
| Date: | 2022-09-13 | 
| Author(s): | Simon Hülkenberg simon.huelkenberg@iteratec.com | 
Context
The secureCodeBox offers an autodiscovery feature which automatically creates scheduled scans when new services or containers (in pods) are created. The container autodiscovery will start a Trivy container scan for every distinct container in a given namespace. Currently, it is not possible to scan container images that require imagePullSecrets to download the image. The container autodiscovery will create a scheduled scan but the scan itself will fail. Because of that limitation it is necessary to make the container autodiscovery aware of secrets to enable Trivy to download the container images.
Problem regarding imagePullSecrets
It is possible to configure multiple imagePullSecrets in a pod's YAML file, as seen in this Stack Overflow post. Kubernetes will then try all available credentials until one succeeds. The container autodiscovery needs to do the same because Trivy does not support multiple imagePullSecrets at once. Additionally, credentials can only be passed to Trivy through environment variables whose names depend on the registry that hosts the image (see Trivy docs). For Google Cloud credentials, for example, one has to provide the path to a JSON file as an environment variable, whereas for Docker Hub a username and password passed as environment variables are sufficient. The Trivy configuration therefore differs significantly between platforms.
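To illustrate the situation described above, here is a minimal sketch of a pod that lists more than one imagePullSecret. The pod name, the secret names and the image are placeholders chosen for illustration and do not come from the secureCodeBox.

```yaml
# Sketch only: a pod with two candidate pull secrets.
# Kubernetes tries the listed credentials until one of them succeeds.
apiVersion: v1
kind: Pod
metadata:
  name: example-app                    # hypothetical name
spec:
  imagePullSecrets:
  - name: dockerhub-credentials        # hypothetical secret
  - name: gcr-credentials              # hypothetical secret
  containers:
  - name: app
    image: registry.example.com/team/app:1.0.0   # placeholder image
```

Nothing in the pod spec states which of the two secrets belongs to the image, which is why the autodiscovery would have to find the matching one itself.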
Another problem is the fact that imagePullSecrets might change over time. It must be ensured that Trivy has access to the newest imagePullSecrets when the scan starts.
Possible solutions
In general, the easiest way to provide Trivy with the required credentials is to use Kubernetes secrets that are mounted as environment variables. The following solutions all use this idea, although they create the secrets through different approaches.
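As a rough sketch of this idea, the following pod passes registry credentials to a Trivy container through environment variables read from a secret. TRIVY_USERNAME and TRIVY_PASSWORD are the variables the Trivy documentation describes for username/password authentication. The pod name, the secret name, its keys and the scanned image are assumptions made for this example.

```yaml
# Sketch only: credentials from a secret exposed to Trivy as environment variables.
apiVersion: v1
kind: Pod
metadata:
  name: trivy-scan-example                   # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: trivy
    image: aquasec/trivy                     # pin a tag in practice
    args: ["image", "registry.example.com/team/app:1.0.0"]   # placeholder image
    env:
    - name: TRIVY_USERNAME
      valueFrom:
        secretKeyRef:
          name: trivy-registry-credentials   # hypothetical secret
          key: username                      # assumed key name
    - name: TRIVY_PASSWORD
      valueFrom:
        secretKeyRef:
          name: trivy-registry-credentials
          key: password                      # assumed key name
```

For other registries the variable names differ. For Google Cloud, for instance, the credentials would be a mounted JSON file whose path is passed in GOOGLE_APPLICATION_CREDENTIALS instead.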
Using an initContainer and a sidecar
This option does not alter the behavior of the container autodiscovery at all. All additional logic is moved to an initContainer and a sidecar container.
The initContainer reads the imagePullSecrets of the pod that houses the container to be scanned and must determine which secret is the correct one for the scanned container image. It then creates a temporary secret following a predefined naming scheme (technically, the Trivy pod is created while the secret that is mounted as an environment variable doesn't exist yet, so the temporary secret name must be predefined). After the initContainer has created the secret, the Trivy container and the sidecar are started. The sidecar continuously checks whether the main scan container has exited (similar to the lurker) and then deletes the temporary secret that was used to pass the credentials to Trivy, so that the namespace is not cluttered after the scan.
Pros:
- The autodiscovery logic will not change.
- No secrets that clutter the namespace
Cons:
- By default, a pod does not have sufficient permissions to create/delete secrets, so the pod needs to be assigned a serviceAccount with those permissions.
- Logic is split up over multiple containers, which makes the solution a bit hacky.
- Edge case: Changing the host platform of the container (example: AWS -> GCP) would break the scan because the required environment variables for Trivy must change (as described above). The parameters of the scheduled scan would have to be changed, but the existing autodiscovery operator would lack the logic to do this (it has no read permissions on secrets, so it would not notice the difference).
Details
Proof of concept
This simplified example does the following: The initContainer creates a secret that is mounted by the main container (a stand-in for Trivy). The main container prints the value of the secret and then sleeps for 10 seconds. When the main container completes, the sidecar deletes the secret.
The real implementation would use neither kubectl directly nor the shared volume. This is just a simplification.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: internal-kubectl
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: modify-secrets
rules:
- apiGroups: [""]
  resources:
  - secrets
  verbs:
  - get
  - create
  - list
  - delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: modify-secrets-to-service-account
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: modify-secrets
subjects:
- kind: ServiceAccount
  name: internal-kubectl
---
apiVersion: batch/v1
kind: Job
metadata:
  name: secret-test
spec:
  template:
    metadata:
      name: secret-test-pod
    spec:
      serviceAccountName: internal-kubectl
      restartPolicy: Never
      volumes:
      - name: shared-volume
        emptyDir: {}
      initContainers:
      - command:
        - sh
        - -c
        - pacman -Sy && pacman -S --noconfirm kubectl && kubectl create secret generic
          some-image-pull-secret --from-literal=username=im_a_secret!
        image: archlinux
        name: init-secret
      containers:
      - command:
        - sh
        - -c
        - echo secret is $env_test_secret && sleep 10 && touch /shared-volume/shutdown
        env:
        - name: env_test_secret
          valueFrom:
            secretKeyRef:
              key: username
              name: some-image-pull-secret
        image: archlinux
        name: imagine-this-is-trivy  # container names must be lowercase DNS labels
        volumeMounts:
        - name: shared-volume
          mountPath: /shared-volume
      - command:
        - sh
        - -c
        - pacman -Sy && pacman -S --noconfirm kubectl && while [ ! -f /shared-volume/shutdown ]; do echo shutdownfile not found && sleep 2; done; kubectl delete secrets some-image-pull-secret
        image: archlinux
        name: secret-deletion-on-stop-sidecar
        volumeMounts:
        - name: shared-volume
          mountPath: /shared-volume
Using an initContainer and ownerReferences for the created secrets
Using the same initContainer logic described above, but setting the ownerReference of the created secrets to the job/pod of the scan, would result in similar behavior without the need for a sidecar.
Pros:
- No need for a sidecar while keeping the namespace clean of temporary secrets
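A minimal sketch of what such a temporary secret could look like is shown below. The pod name, the UID and the credential values are placeholders. In practice the initContainer would have to fill in the name and UID of the pod it is running in.

```yaml
# Sketch only: a temporary secret owned by the scan pod.
# Once the owning pod is deleted, the Kubernetes garbage collector removes the secret too.
apiVersion: v1
kind: Secret
metadata:
  name: some-image-pull-secret
  ownerReferences:
  - apiVersion: v1
    kind: Pod
    name: secret-test-pod-abc12                    # hypothetical pod name
    uid: 00000000-0000-0000-0000-000000000000      # placeholder, must be the pod's real UID
type: Opaque
stringData:
  username: example-user                           # placeholder value
  password: example-password                       # placeholder value
```

The owner has to live in the same namespace as the secret, and the cleanup is tied to the deletion of the owning object rather than to the moment its containers exit.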
Using a separate operator
Instead of creating the secrets for Trivy at scan runtime, all secrets could be created upon pod creation. The proposed new operator would not listen on secrets but on pods (not every secret in a namespace is used as an imagePullSecret, so it doesn't make sense to subscribe to secrets). When a pod gets created, the new operator would check whether it has imagePullSecrets defined and, if so, whether the specific container was already processed. If not, it would determine which imagePullSecret belongs to which container and then create the Trivy secrets. Similar to the container autodiscovery, the new operator would create only one Trivy secret for every distinct secret. When a pod gets updated, the new operator would check whether the imagePullSecrets changed. This approach will clutter the namespace quite substantially because the Trivy secrets exist until they are no longer needed (that is, until the pod that uses the imagePullSecrets gets deleted).
To be clear: the scheduled scans are still created by the existing autodiscovery. The existing autodiscovery will check whether the new operator created a secret for the specific container it wants to scan. If so, the existing autodiscovery would mount this secret. There needs to be a way to tell the existing autodiscovery which pods were processed by the new operator. One could use pod annotations to mark pods as processed (processed := secret was created by the new operator).
Pros:
- Existing autodiscovery does not need permissions to read secrets
- It would be possible to only install the existing autodiscovery in case one doesn't like that the secret operator needs read permissions on secrets (Container Autodiscovery and new secret operator could be separate helm charts).
- One could alter the roles of the service account of the new secret operator to only enable it in certain namespaces while using the existing autodiscovery in every namespace (both could use different serviceAccounts).
Cons:
- Namespace gets cluttered by secrets created by the new operator
- The new secret operator needs permissions to read, create and delete secrets
- Edge case: Changing the host platform of the container (example: AWS -> GCP) would break the scan because the required environment variables for Trivy must change (as described above). The parameters of the scheduled scan would have to be changed, but the existing autodiscovery operator would lack the logic to do this (it has no read permissions on secrets, so it would not notice the difference).
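The annotation-based marker mentioned for the separate operator could look roughly like the sketch below. The annotation key and all names are invented for this example and are not defined by the secureCodeBox.

```yaml
# Sketch only: a pod marked as processed by the (hypothetical) secret operator.
apiVersion: v1
kind: Pod
metadata:
  name: example-app                                  # hypothetical name
  annotations:
    # Hypothetical key. The value names the Trivy secret the operator created.
    auto-discovery.example.com/trivy-secret: trivy-example-app-credentials
spec:
  imagePullSecrets:
  - name: dockerhub-credentials                      # hypothetical secret
  containers:
  - name: app
    image: registry.example.com/team/app:1.0.0       # placeholder image
```

With a marker like this, the existing autodiscovery would only need to read pod metadata to learn which secret to mount, without read access to secrets.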
Modifying the container autodiscovery
The container autodiscovery itself could test which imagePullSecret belongs to which container and then create a scheduled scan that mounts the correct secret directly (without creating new intermediate secrets). With each pod update the autodiscovery would check if the imagePullSecrets changed (by checking which secret the underlying scheduled scan mounts) and if the imagePullSecret changed it would alter the scheduled scan accordingly.
Pros:
- No initContainers or additional operators
- Secrets will be mounted directly without creating intermediate secrets, so the namespace will not be cluttered.
- One could alter the roles of the service account of the container autodiscovery operator to only allow read permission on secrets in certain namespaces while using the existing autodiscovery in every namespace.
Cons:
- By default, the container autodiscovery has read permissions on secrets. This can only be changed after installation by altering the clusterRole.
Decision
It was decided to use an initContainer that reads the existing secrets through volumes and then creates a temporary secret. That secret has an owner reference to the pod the initContainer is running in (which is the pod in which the scan is executed). This way, the temporary secret is deleted automatically after the scan is finished.
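The overall shape of such a scan pod might look like the following sketch. It is not the actual secureCodeBox implementation: the initContainer image, the serviceAccount name, the secret names and the key names are assumptions. The pod name and UID are passed to the initContainer through the downward API so that it can set the owner reference on the secret it creates.

```yaml
# Sketch only: scan pod whose initContainer reads a pull secret from a volume
# and is expected to create a temporary secret owned by this pod.
apiVersion: v1
kind: Pod
metadata:
  name: trivy-scan-example                         # hypothetical name
spec:
  serviceAccountName: trivy-autodiscovery          # hypothetical, needs "create" on secrets
  restartPolicy: Never
  volumes:
  - name: pull-secret
    secret:
      secretName: dockerhub-credentials            # hypothetical imagePullSecret of the scanned pod
  initContainers:
  - name: create-temporary-secret
    image: registry.example.com/secret-extraction:latest   # placeholder image
    env:
    - name: POD_NAME                               # owner reference name
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: POD_UID                                # owner reference UID
      valueFrom:
        fieldRef:
          fieldPath: metadata.uid
    volumeMounts:
    - name: pull-secret
      mountPath: /secrets/dockerhub-credentials
      readOnly: true
  containers:
  - name: trivy
    image: aquasec/trivy                           # pin a tag in practice
    args: ["image", "registry.example.com/team/app:1.0.0"]   # placeholder image
    env:
    - name: TRIVY_USERNAME
      valueFrom:
        secretKeyRef:
          name: temporary-trivy-credentials        # predefined name of the temporary secret (assumed)
          key: username
    - name: TRIVY_PASSWORD
      valueFrom:
        secretKeyRef:
          name: temporary-trivy-credentials
          key: password
```

Because the secret reference is only resolved when the main container starts, the pod can be created before the temporary secret exists, as in the proof of concept.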
Consequences
The initContainer must be able to create secrets. This means that the pod in which the scan is executed must use a different serviceAccount than the standard lurker account. A new Trivy scan type called trivy-image-autodiscovery was introduced that uses a serviceAccount with the same rights as the standard lurker serviceAccount plus the right to create secrets.
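The additional permission could be granted roughly as in the sketch below. The resource names are hypothetical, and the rules of the standard lurker role are deliberately left out because they are not listed in this document. Only the extra rule for secrets is shown.

```yaml
# Sketch only: extra permission for the scan pod's serviceAccount.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: trivy-autodiscovery                # hypothetical name
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: trivy-autodiscovery-secrets        # hypothetical name
rules:
# The rules of the standard lurker role would be repeated here (omitted).
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: trivy-autodiscovery-secrets        # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: trivy-autodiscovery-secrets
subjects:
- kind: ServiceAccount
  name: trivy-autodiscovery
```

Since the owner reference takes care of the cleanup, the role needs no delete permission on secrets, unlike the role in the sidecar proof of concept.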