ADR-0012: Cluster Wide Custom Resources
| Status: | OPEN | 
| Date: | 2022-06-17 | 
| Author(s): | Jannik Hollenbach jannick.hollenbach@iteratec.com, Max Maass max.maass@iteratec.com | 
Context
Currently, all custom resources for the secureCodeBox are isolated to the namespace they are installed in. If you start a scan of type nmap in namespace demo-one, the ScanType nmap (and the corresponding ParseDefinition) must be installed in demo-one. This is usually not a big issue, as installing a ScanType is pretty easy (helm install nmap oci://ghcr.io/securecodebox/helm/nmap --namespace demo-one).
If you then want to start other scans for other targets you might want to create another namespace demo-two. To run scans in demo-two you'll also have to install nmap in that namespace.
Another (possibly even more annoying) scenario in which ScanTypes and ParseDefinitions must be installed in every namespace is the Kubernetes AutoDiscovery. The AutoDiscovery automatically starts scans for resources it discovers in the individual namespaces (e.g. a ZAP scan for HTTP services, Trivy scans for container images). At the moment this only works properly if the namespace in which the resource was discovered has the correct ScanType installed.
Prior Art
The cert-manager project has a similar concept and has inspired part of the following document.
cert-manager has two different Custom Resource Definitions for issuers: Issuer and ClusterIssuer. An Issuer is scoped to a single namespace, while a ClusterIssuer is cluster-wide and available to every namespace in the cluster. See the cert-manager issuer docs for more: https://cert-manager.io/docs/concepts/issuer/
Assumptions
This proposal aims to provide a solution that makes the secureCodeBox easier to use in both single-tenant and multi-tenant clusters. For multi-tenant clusters, this proposal assumes that access to the cluster-wide custom resources proposed in this ADR is locked down to cluster admins rather than open to everybody.
Proposal 1: Mixed Cluster-wide and Namespace-local Resources
In addition to the existing ScanType, ParseDefinition, ScanCompletionHook and CascadingRule, the following cluster-scoped resources should be introduced: ClusterScanType, ClusterParseDefinition, ClusterScanCompletionHook and ClusterCascadingRule. The behavior of these is detailed in the following sections.
The Scan and ScheduledScan CRDs don't require cluster-wide variants, as they are tied directly to the execution of the scan jobs, which are themselves tied to a namespace. Unlike the CRDs listed above, they don't provide any service / re-usability to other secureCodeBox components.
ClusterScanType
Other than being cluster scoped, ClusterScanTypes are identical to the existing ScanType CRD.
When a new scan is started and the operator requires the scan job template, it should first look in the Scan's namespace for a ScanType whose name matches the Scan.spec.scanType configured in the scan. If none is found, the operator should look for a ClusterScanType with the same name.
No matter which one was picked, the Job for the scan always has to be started in the namespace of the Scan.
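As a minimal sketch of this lookup order, assume a ClusterScanType mirrors the layout of today's ScanType (extractResults plus jobTemplate). The image reference, result path and scan target below are illustrative, not taken from this ADR. With no ScanType named nmap installed in demo-two, the operator would fall back to the ClusterScanType, while the Job is still created in demo-two.

```yaml
# Cluster-scoped resource proposed in this ADR, hence no metadata.namespace.
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanType
metadata:
  name: nmap
spec:
  extractResults:
    type: nmap-xml
    location: "/home/securecodebox/nmap-results.xml"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: nmap
              # Illustrative image reference
              image: "docker.io/securecodebox/scanner-nmap"
              command: ["nmap", "-oX", "/home/securecodebox/nmap-results.xml"]
---
# Regular namespaced Scan. "nmap" is resolved against ScanTypes in demo-two
# first and against ClusterScanTypes only if no namespaced match exists.
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: nmap-example
  namespace: demo-two
spec:
  scanType: nmap
  parameters:
    - scanme.nmap.org
```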
Issues with ConfigMap / Secret Mounts:
Some features of normal ScanTypes will not work as usual with ClusterScanTypes. This applies especially to mounting values from ConfigMaps / Secrets as files / environment variables into your containers, as the ConfigMaps / Secrets do not exist in every namespace and there is no sensible way to roll them out with our helm chart.
When deploying ClusterScanTypes you have to make sure that either:
- All referenced ConfigMaps / Secrets exist in all namespaces. This will likely be done by a script / operator generating these ConfigMaps and persisting them into every namespace. These scripts and operators will likely differ greatly from user to user, as the ConfigMaps and Secrets will likely have to be configured differently for every team (e.g. different access tokens used by scanners). If they should all be the same, users can use third-party tools like the Cluster Secret Operator and similar to sync their configs across namespaces.
- All dependencies on ConfigMaps / Secrets are removed from your ScanTypes and either moved into the container image of the scanner, or provided by an initContainer in the ScanType. The initContainer copies these files over to volumes shared with the scanner container. Some more references and discussions on ways to improve this in the future can be found in ADR-0009: Architecture for pre-populating the file system of scanners.
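The initContainer variant from the second option could look roughly like the following jobTemplate fragment. This is a sketch only: both image references, the volume name and the mount paths are hypothetical, and the config image is assumed to ship the files under /config.

```yaml
# Fragment of a (Cluster)ScanType spec, not a complete manifest.
jobTemplate:
  spec:
    template:
      spec:
        restartPolicy: OnFailure
        volumes:
          # Scratch volume shared between the init container and the scanner
          - name: scanner-config
            emptyDir: {}
        initContainers:
          - name: copy-config
            # Hypothetical image that carries the config files under /config
            image: "registry.example.com/scanner-config:latest"
            command: ["cp", "-r", "/config/.", "/shared/"]
            volumeMounts:
              - name: scanner-config
                mountPath: /shared
        containers:
          - name: scanner
            # Hypothetical scanner image
            image: "registry.example.com/scanner:latest"
            volumeMounts:
              - name: scanner-config
                mountPath: /home/securecodebox/config
                readOnly: true
```

Because the files travel inside an image rather than a ConfigMap, nothing has to exist in the Scan's namespace before the Job starts.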
ClusterParseDefinition
Other than being cluster scoped, ClusterParseDefinitions are identical to the existing ParseDefinition CRD.
When a scan job has completed and the results need to be parsed, the operator should first look in the Scan's namespace for a ParseDefinition whose name matches the scantype.spec.extractResults.type / clusterscantype.spec.extractResults.type configured in the ScanType.
If none is found, the operator should look for a ClusterParseDefinition with the same name.
No matter which one was picked, the Job for the parser always has to be started in the namespace of the Scan.
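A matching cluster-wide parser could be declared as sketched here, assuming the ClusterParseDefinition spec mirrors today's ParseDefinition. The image reference is illustrative.

```yaml
# Cluster-scoped resource proposed in this ADR, hence no metadata.namespace.
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterParseDefinition
metadata:
  # Matched against spec.extractResults.type of the (Cluster)ScanType
  name: nmap-xml
spec:
  # Illustrative image reference
  image: "docker.io/securecodebox/parser-nmap"
```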
ClusterCascadingRule
Other than being cluster scoped, ClusterCascadingRules are identical to the existing CascadingRule CRD.
When the cascading scans hook starts, the hook should fetch the list of both CascadingRules and ClusterCascadingRules and merge them together into one list. The list then gets evaluated against the scan just like it is now. All scans started from it have to be in the same namespace as the original Scan.
Deduplication of CascadingRules
One potential issue with CascadingRule and ClusterCascadingRule is that, in contrast to ScanType and ParseDefinition, the namespaced and the cluster-wide types are both used at the same time. This is problematic when similar or identical CascadingRules are installed at both the namespace and the cluster level, as both could produce the same scan, which would then be executed twice.
A potential workaround would be to introduce an additional field to the CascadingRule spec that allows us to define a sort of "deduplication" / replacement mechanism.
ClusterScanCompletionHook
ClusterScanCompletionHooks behave mostly like their namespaced counterparts. When the parser completes, the operator fetches the list of ScanCompletionHooks from the Scan's namespace (respecting the hookSelector of the Scan) and the list of ClusterScanCompletionHooks (also respecting the hookSelector of the Scan; this might have to be discussed). The two lists are merged and ordered according to the priorities set on the hooks.
The execution of the hooks differs from the current model: the ClusterScanCompletionHook needs an additional setting (called namespaceMode here; the examples below express it through the spec.executionNamespace field) that lets users either run the hook in the same namespace as the scan it is executed for, or provide a fixed namespace in which all executions of this ClusterScanCompletionHook run.
Example ClusterScanCompletionHook specs for both modes:
Example ClusterScanCompletionHook which gets executed inside the Scan's namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: slack-notification
spec:
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
Example ClusterScanCompletionHook which gets executed inside a fixed namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: example-cluster-hook
spec:
  # spec.executionNamespace - would let you set a fixed namespace for the hook to get executed in
  executionNamespace: example-namespace-name
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
This fixed namespace mode has been added to the ClusterScanCompletionHook but not to the ClusterScanType for the following reasons:
- Hooks are usually used to interface with 3rd party systems, which are often standardized for the entire company. Running them in a namespace separate from the individual teams / scans lets the hook access these systems without having to distribute, and thus potentially compromise, the credentials for the 3rd party systems.
- In contrast to scans, hooks usually do not need to send requests to the scanned applications. Scanning an app from outside its namespace opens up a lot of potential issues with network policies blocking traffic between scanner and app. As the hooks only need to access the S3 bucket and the 3rd party system they are interfacing with, this is not an issue for hooks.
Potential Issues With the Fixed executionNamespace Mode
This mode can lead to conflicts between teams if a scan of "team 1" fails because of an error in a ClusterScanCompletionHook managed by a "central cluster team". Members of "team 1" will likely not even have access to the namespace the failing hook runs in and won't be able to debug / fix these issues by themselves.
Proposal 2: Distinct Cluster-wide and Namespace-local Resources
This alternative proposal uses the same cluster-wide CRDs mentioned above, but adds two new types: ClusterScan and ClusterScheduledScan. These are used to enforce a stricter separation between cluster-wide and namespace-local resources. Unless stated otherwise, non-Cluster resources do not interact in any way with their Cluster equivalents (e.g., a non-Cluster scan will not use a ClusterScanType, and vice versa).
ClusterScan and ClusterScheduledScan
The ClusterScan is almost identical to the Scan type. However, it includes an additional field called executionNamespace that controls which namespace it is scheduled in. The operator will schedule it into that namespace, or throw an error if the namespace does not exist or the operator cannot schedule into it for any reason. ClusterScans will only trigger ClusterScanCompletionHooks, only respect ClusterCascadingRules, and in all other ways be kept separate from non-Cluster resources, with one major exception: access to namespace-specific ConfigMaps and Secrets. Here, this access is desirable, as it allows teams to customize the behavior of cluster-managed scans to their own situation (e.g., provide a cluster-wide ZAP scan with a namespace-specific authentication configuration for the microservice). The same considerations as in Proposal 1 apply for ensuring that the Secrets and ConfigMaps are available.
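A ClusterScan under this proposal might be written as sketched below. Only the kind and the executionNamespace field come from the text above; the scan type name, namespace and target URL are hypothetical, and the remaining fields are assumed to match today's Scan.

```yaml
# Cluster-scoped resource, hence no metadata.namespace.
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScan
metadata:
  name: zap-team-a-frontend
spec:
  # Namespace the scan Job is scheduled into; the operator errors out
  # if it does not exist or cannot be scheduled into.
  executionNamespace: team-a
  # Resolved against ClusterScanTypes only (hypothetical name)
  scanType: zap-baseline-scan
  parameters:
    - "-t"
    - "http://frontend.team-a.svc"
```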
To avoid exposing secrets and ConfigMaps to cluster users who can create cluster-wide scans, we could implement a restriction in the operator (or using a validating webhook) that checks which secrets and ConfigMaps are being mounted by such a scan, and reject the scan if any of the secrets or ConfigMaps does not have a special label or annotation that marks them as being exposed to this feature.
These CRDs will likely not be used a lot by human operators, but they are helpful for use with the autodiscovery feature, which can use them to schedule scans into namespaces without interfering with any scans, hooks or other features the owners of the namespace are using. Pods scheduled from scans like this should also receive a distinct label or annotation from regular scans in that namespace, to allow for fine-grained network policies (e.g., "Cluster scans are allowed to access this specific server that caches nuclei templates for them, but regular scans are not").
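The network policy mentioned in the example could be sketched with a standard Kubernetes NetworkPolicy. The label key and value on the scan pods, the namespace, and the cache's app label are all hypothetical; this ADR does not fix a label name.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-cluster-scans-to-template-cache
  namespace: scanner-infra            # hypothetical namespace of the cache
spec:
  # Applies to the pods of the template cache (hypothetical label)
  podSelector:
    matchLabels:
      app: nuclei-template-cache
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in one entry are combined (AND):
        # pods carrying the label, in any namespace.
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              securecodebox.io/scan-scope: cluster   # hypothetical label
```

Regular scan pods would not carry the label and therefore would not match the ingress rule.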
By default ClusterScans cannot be seen by users that only have access to their own namespace.
ClusterScanType
Other than being cluster scoped, ClusterScanTypes are identical to the existing ScanType CRD. They do not interact in any way with regular scans, and are only used by ClusterScans and ClusterScheduledScans.
ClusterParseDefinition
Other than being cluster scoped, ClusterParseDefinitions are identical to the existing ParseDefinition CRD. They do not interact in any way with regular scans, and are only used by ClusterScans and ClusterScheduledScans.
ClusterCascadingRule
Other than being cluster scoped, ClusterCascadingRules are identical to the existing CascadingRule CRD. They do not interact in any way with regular scans, and are only used by ClusterScans and ClusterScheduledScans. Any scan they trigger will automatically be a ClusterScan, and it will only be scheduled into the same namespace the triggering ClusterScan was running in (the operator / hook will prevent anything else).
ClusterScanCompletionHook
ClusterScanCompletionHooks behave mostly like their namespaced counterparts: when the parser for a ClusterScan completes, the operator fetches the list of ClusterScanCompletionHooks (respecting the hookSelector of the ClusterScan), and processes them the same way it would regular ScanCompletionHooks (prioritization etc.). The concept of a "fixed namespace mode" from Proposal 1 can be maintained, if desirable.
Proposal 3: Mixed Cluster-wide and Namespace-local Resources, Mode Controlled by CRD Field
The third proposal is a middle ground between proposals 1 and 2. It uses the same CRDs as proposal 1, and does not include a ClusterScan or ClusterScheduledScan resource. However, it internally separates scans using namespace and global resources, based on a field in the Scan CRD. This facilitates a more predictable execution, as namespace-specific scans cannot be influenced by cluster resources, and vice versa.
Changes to existing CRDs
The Scan CRD receives an extra field that describes the mode of the scan. It can be set to namespace (which is also the default if it is not specified), or cluster. A scan in namespace mode will behave exactly the way it does today, and ignore all cluster-wide resources. The behavior of scans in the cluster mode will be defined below.
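A cluster-mode Scan could then look like the sketch below. The field name mode is a placeholder chosen for illustration, since the text above only fixes its two values (namespace and cluster); the implemented CRD may name the field and its values differently.

```yaml
apiVersion: "execution.securecodebox.io/v1"
kind: Scan
metadata:
  name: nmap-cluster-mode
  namespace: demo-two
spec:
  # Hypothetical field name. "namespace" (the default) keeps today's behavior;
  # "cluster" makes the scan use cluster-wide resources only.
  mode: cluster
  scanType: nmap
  parameters:
    - scanme.nmap.org
```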
ClusterScanType
Other than being cluster scoped, ClusterScanTypes are identical to the existing ScanType CRD.
When a cluster-mode Scan is created, the operator will search for a matching ClusterScanType, ignoring ScanTypes installed in the namespace. The Job is then created normally, based on the template provided in the ClusterScanType.
Issues with ConfigMap / Secret Mounts:
Some features of normal ScanTypes will not work as usual with ClusterScanTypes. This applies especially to mounting values from ConfigMaps / Secrets as files / environment variables into your containers, as the ConfigMaps / Secrets do not exist in every namespace and there is no sensible way to roll them out with our helm chart.
When deploying ClusterScanTypes you have to make sure that either:
- All referenced ConfigMaps / Secrets exist in all namespaces. This will likely be done by a script / operator generating these ConfigMaps and persisting them into every namespace. These scripts and operators will likely differ greatly from user to user, as the ConfigMaps and Secrets will likely have to be configured differently for every team (e.g. different access tokens used by scanners). If they should all be the same, users can use third-party tools like the Cluster Secret Operator and similar to sync their configs across namespaces.
- All dependencies on ConfigMaps / Secrets are removed from your ScanTypes and either moved into the container image of the scanner, or provided by an initContainer in the ScanType. The initContainer copies these files over to volumes shared with the scanner container. Some more references and discussions on ways to improve this in the future can be found in ADR-0009: Architecture for pre-populating the file system of scanners.
ClusterParseDefinition
Other than being cluster scoped, ClusterParseDefinitions are identical to the existing ParseDefinition CRD.
When a cluster-mode scan job has completed and the results need to be parsed, the operator only considers ClusterParseDefinitions for parsing the results. The parsing job is then started as normal.
ClusterCascadingRule
Other than being cluster scoped, ClusterCascadingRules are identical to the existing CascadingRule CRD.
When the cascading scans hook is installed as a ClusterScanCompletionHook, it only uses ClusterCascadingRules to create cascading scans. The rules are evaluated against the cluster-mode scan as usual, and namespace-mode scans are ignored entirely. All scans started from it have to be in the same namespace as the original Scan.
ClusterScanCompletionHook
Other than being cluster scoped, ClusterScanCompletionHooks are identical to the existing ScanCompletionHook CRD, except for one new setting (called namespaceMode here; the examples below express it through the spec.executionNamespace field). It lets users either run the hook in the same namespace as the scan it is executed for, or provide a fixed namespace in which all executions of this ClusterScanCompletionHook run. Like all other cluster-scoped resources, ClusterScanCompletionHooks will only be executed on scans running in cluster mode, not on those in namespace mode.
Example ClusterScanCompletionHook which gets executed inside the Scan's namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: slack-notification
spec:
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
Example ClusterScanCompletionHook which gets executed inside a fixed namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: example-cluster-hook
spec:
  # spec.executionNamespace - would let you set a fixed namespace for the hook to get executed in
  executionNamespace: example-namespace-name
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
This fixed namespace mode has been added to the ClusterScanCompletionHook but not to the ClusterScanType for the following reasons:
- Hooks are usually used to interface with 3rd party systems, which are often standardized for the entire company. Running them in a namespace separate from the individual teams / scans lets the hook access these systems without having to distribute, and thus potentially compromise, the credentials for the 3rd party systems.
- In contrast to scans, hooks usually do not need to send requests to the scanned applications. Scanning an app from outside its namespace opens up a lot of potential issues with network policies blocking traffic between scanner and app. As the hooks only need to access the S3 bucket and the 3rd party system they are interfacing with, this is not an issue for hooks.
Changes to Existing Helm Charts
Existing charts for scanners and hooks should get a clusterWide boolean value which is false by default.
If set to true the scanner / hook should then be installed as cluster-wide resources.
The following (not so) edge cases should be considered:
- Scanners and hooks with namespaced resources (ConfigMaps, e.g. amass, zap-advanced...) should also expose the clusterWide parameter. When installed with it set, the installation should fail with guidance and links to a (non-scanner-specific) doc site that explains how a namespaced ScanType can be adjusted to work as a ClusterScanType, e.g. how to set up an init container to compensate for the missing ConfigMaps; see Issues with ConfigMap / Secret Mounts.
- When a hook is installed in clusterWide mode with executionNamespace set (this should be exposed as a value in all hook helm charts), the install should fail if the helm install namespace differs from executionNamespace. This ensures that all namespaced resources (ConfigMaps, Secrets) are present in the executionNamespace.
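As a sketch, the values override for a hook installed cluster-wide with a fixed execution namespace might look like this. The value names come from the text above; placing them at the top level of the values file and the namespace name are assumptions.

```yaml
# Values override for a hook chart (sketch)
# Install the hook as a ClusterScanCompletionHook instead of a ScanCompletionHook
clusterWide: true
# Must equal the namespace passed to "helm install --namespace",
# otherwise the install should fail.
executionNamespace: securecodebox-hooks
```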
RBAC Concept
- The RBAC system / guidelines for existing CRDs stay exactly the same.
- The newly added cluster-scoped CRDs should be read-only for normal cluster users by default.
- Only cluster admins should be allowed to create, edit, patch and delete the cluster-scoped CRDs.
This keeps the security model intact: only cluster-level users can modify resources shared across all namespaces, so attackers cannot use them to escalate their privileges into other namespaces.
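The read-only default could be expressed with standard Kubernetes RBAC objects, roughly as follows. The plural resource names and API groups are assumed to follow those of the namespaced counterparts, and the role name is hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: securecodebox-cluster-resources-viewer   # hypothetical name
rules:
  # Read-only verbs; create, update, patch and delete are deliberately absent.
  - apiGroups: ["execution.securecodebox.io"]
    resources:
      - clusterscantypes
      - clusterparsedefinitions
      - clusterscancompletionhooks
    verbs: ["get", "list", "watch"]
  - apiGroups: ["cascading.securecodebox.io"]     # assumed group for cascading rules
    resources:
      - clustercascadingrules
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: securecodebox-cluster-resources-viewer
subjects:
  # Built-in group containing every authenticated user
  - kind: Group
    name: system:authenticated
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: securecodebox-cluster-resources-viewer
  apiGroup: rbac.authorization.k8s.io
```

Write access is not granted by any rule here, so it stays with whoever already holds cluster-admin style permissions.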
Decision
We use Proposal 3, because:
- The strict separation in Proposal 2 makes the system hard to debug (pods spawn randomly in your namespace and you don't know why).
- RBAC becomes more difficult in Proposal 2.
- Currently, jobs have a child relationship with the underlying scan. This would have to be broken in Proposal 2 (as the child relationship only works within a namespace), leading to more work for the cleanup. Proposals 1 and 3 also have this problem, but a lot less so (only for scan completion hooks with fixed namespace mode).
- Proposal 1 can also lead to hard-to-understand failure modes if unexpected ScanTypes are used (why can the team's own ZAP installation break the global ZAP installation of autodiscovery?).
- Proposal 3 allows better visibility and debugging on failing scans.
Security Considerations
- RBAC needs to be non-broken :) (we assign write rights on global types only to admins by default). Being able to change who can modify cluster-wide resources is equivalent to full read and execute access to the entire cluster.
- Cluster* resources can be inspected by anyone, so don't put secrets in there (if you need to keep a secret for a hook safe, use the target namespace mode of the cluster scan hook).
Consequences
The only part that isn't implemented as described in this ADR is the executionNamespace for ClusterScanCompletionHooks, which is missing. We had some major problems integrating it:
- Because ownerReferences can't be set across namespaces, the operator might not get notified of changes to the hook, so updates / hook completion could be missed by the operator.
- The hook SDK fetches the Scan on startup to make it available to the hook code (e.g. to read annotations from the scan CR). This doesn't work nicely with the executionNamespace, because the service account the hook runs with doesn't have the access rights required to fetch the scan from another namespace. To fix this we'd need to create ClusterRoles for these automatically, which we didn't want to do, as this might have effects on the RBAC model that we didn't anticipate previously.