An old colleague recently asked me about best practices for dependency management. In response, I wrote the following, which I realized might be useful to a broader audience.
So … things like GitHub’s new vulnerability notifications have been automated for a long time, via platforms like Code Climate or internally at dev shops via Jenkins, using tools such as:
- Bundler Audit (Ruby)
- Safety Check (Python)
- Dependency Check (Java)
Generally, we write these checks into a build so that the build fails if there are vulnerabilities in libraries. We have contributed to an open source project called Glue that makes it easy to run tools such as these and then push the results into JIRA or a CSV for easier integration with normal workflows.
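To make the build-gating idea concrete, here is a minimal sketch of the decision logic in Python. The JSON findings format is a made-up illustration (real tools like bundler-audit, Safety, and Dependency-Check each emit their own report schemas), and the severity threshold is an assumption you would tune per project.

```python
import json
import sys

# Hypothetical findings report -- each real scanner has its own JSON
# schema; this is only illustrative sample data.
SAMPLE_REPORT = """
[
  {"library": "examplelib", "version": "1.0.0",
   "advisory": "CVE-0000-0001", "severity": "high"},
  {"library": "otherlib", "version": "2.3.1",
   "advisory": "CVE-0000-0002", "severity": "medium"}
]
"""

def build_should_fail(report_json, fail_on=("high", "critical")):
    """Return True if any finding meets the failure threshold."""
    findings = json.loads(report_json)
    return any(f["severity"] in fail_on for f in findings)

if __name__ == "__main__":
    # In CI, a nonzero exit code is what actually fails the build.
    if build_should_fail(SAMPLE_REPORT):
        print("Vulnerable dependencies found -- failing the build.")
        sys.exit(1)
```

The important design choice is that the gate lives in the build itself, so a vulnerable library stops the pipeline the same way a failing test does, rather than landing in a report someone may or may not read.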
Note that there are also commercial tools that do this, ranging from Sonatype to Black Duck to Veracode’s Software Composition Analysis. I generally recommend starting with an open source tool to prove you can run the process around it, and then improving the tool.
At a higher level, I pretty much always recommend that companies adopt a tiered response system such as the following:
- Critical – This needs to get fixed ASAP; it interrupts current development and gets shipped as a new branch of whatever is in Prod. Typical target turnaround is < 24 hours. A typical example in this category is remote code execution in a library, especially if it has been weaponized.
- Flow – This should get fixed in the next Sprint or a near-term unit of work, within the flow of normal development. These issues typically need to be addressed within a week or two. A typical example is XSS (at most sites, anyway; major sites treat XSS as an emergency).
- Hygiene – These really aren’t severe issues, but if we don’t step back and handle them, bad things can happen. If we don’t update libraries when minor updates come out, we fall behind. The problem with being far behind (e.g. a 3-year-old jQuery) is that if an issue does arise and the best fix is to update to current, the actual remediation could involve API changes that require substantial development work. So philosophically, I think of hygiene as keeping ourselves in a position where we could realistically meet our Critical SLA (say, 24 hours) on updating to address any given framework bug.
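The tiers above can be sketched as a simple triage table. The tier names and SLAs mirror the text; the severity-to-tier rules below are an illustrative assumption, since each organization needs to define its own triage criteria.

```python
# Response tiers from the policy above. SLA figures follow the text;
# the "Flow" window is taken as roughly two weeks.
TIERS = {
    "Critical": {"sla_hours": 24,
                 "action": "interrupt current work, patch branch of Prod"},
    "Flow":     {"sla_hours": 14 * 24,
                 "action": "schedule into the next Sprint"},
    "Hygiene":  {"sla_hours": None,
                 "action": "batch into routine maintenance"},
}

def triage(severity, weaponized=False):
    """Map a finding to a response tier.

    Illustrative rules only -- real triage would also weigh
    exploitability, exposure, and the affected system.
    """
    if severity == "critical" or (severity == "high" and weaponized):
        return "Critical"
    if severity in ("high", "medium"):
        return "Flow"
    return "Hygiene"
```

Writing the policy down like this, even informally, is what lets the triage team answer "which tier?" quickly instead of debating it while an active issue looms.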
An important part of this is figuring out how a newly identified issue gets handled in the context of the tiering system. In other words, when a new issue arises, we want to know who (a team?) determines which tier it should be handled as. We definitely do not want to be defining the process and figuring this all out while an active major issue looms in the background.
Of course, with all of this and particularly the hygiene part, we need to be pragmatic and have a way to negotiate with dev teams. We can weigh the cost of major updates to a system against the cost of keeping it running. Planned retirement of applications can be the right answer for reducing overall risk.
Ultimately, we want to translate this risk into terms stakeholders can understand, so that they don’t balk at the hygiene work and they are prepared when we need to drop current work to accommodate critical updates.