ADR-0007: Proposal How to Mark Findings With Hashes to Find Duplicates
| Status: | DRAFT | 
| Date: | 2020-11-25 | 
| Author(s): | Sven Strittmatter sven.strittmatter@iteratec.com | 
NOTE: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
Context
The secureCodeBox should support the possibility to find duplicate findings easily out of the box. One use case is that a user want to accept a existing finding and want to ignore the same finding in the future.
Assumptions
- The execution order of hooks is unspecified.
- The information if a finding’s hash is a duplicate MUST NOT be stored or maintained in the SCB S3 storage.
- The SCB MUST NOT remove findings: read-write-hooks may alter them, but never delete or filter them out.
- Maybe a read-hooks MAY decide to not store a finding into an external system.
 
Decision
- We generate a hash for each finding, so we can compare findings by the hash and identify duplicates.
- This hash MUST be mutable and MAY be altered by read-write-hooks because we don’t want to introduce an exception to what a read-write-hooks can alter.
- The parser MUST generate the initial hash of a finding from some of its attributes (e.g. name, location, category …).
- Each scanner MUST define a default set of attributes used for the hashing.
- This set of hashed attributes MAY be overwritten.
 
- Each read-write-hook MUST update the hash as last step because the hook MAY change a hashed attribute.
We implement the hashing step in the parser first with feature flag to evaluate this proposal.
Consequences
- We don’t need to introduce an ordering for the read-write-hooks.
- The duplicate detection/handling MUST be done in another service with its own data storage. This is because we have no stable hash until the read-hooks will be executed and these MUST NOT alter the data in SCB itself. But the read-hooks MAY decide to not store data into an external system.