"If we make an effort & automate our deployment, running tests, static checks, monitoring, ... basically automate whole delivery pipeline in a proper way (so it really works w/o manual interactions) - how do we preserve the knowledge about HOW it works? After half a year or more, if something breaks we may not be able to fix it w/o reverse-engineering ..."
This came up a few weeks ago in a conversation with some other engineers. It's an interesting issue & whoever raised it clearly had a point. But doesn't the same apply to any other code that has been running flawlessly in production for a long time? OK, delivery pipeline automation may be really complex & tricky to test, but so can business code. So my answer is - apply the same good engineering practices you'd apply to sophisticated business code:
- structure all the automation code thoughtfully, e.g. pay attention to naming, avoid "blackboxish" configuration-by-clicking tools (their "interaction API" is harder to abstract out), split automation code by function (following the Single Responsibility Principle)
- automate validation (tests) of your delivery pipeline - various "Infrastructure as Code" tools have proven that it's possible
- apply key living documentation principles, e.g. use code-level metadata to keep documentation together with your code (Doxygen/Javadoc style)
- follow the same engineering excellence practices for automation code that you normally apply to any other kind of code, like source code versioning, peer code review or pair programming
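To make the list above a bit more tangible, here is a minimal Python sketch of a pipeline built from small, single-purpose steps whose docstrings double as documentation. Everything in it is hypothetical & made up for illustration (the step names, the `ctx` dictionary, the `app-<version>.tar.gz` naming, the `describe` helper); a real pipeline would call your actual build & deployment tools instead of just passing a dictionary around.

```python
"""Hypothetical delivery pipeline, split into small documented steps."""
import inspect
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One pipeline step with a single responsibility."""
    name: str
    run: Callable[[dict], dict]


def run_static_checks(ctx: dict) -> dict:
    """Refuse to continue unless the version has the form X.Y.Z."""
    parts = ctx["version"].split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid version: {ctx['version']!r}")
    return {**ctx, "checked": True}


def build_artifact(ctx: dict) -> dict:
    """Derive the artifact name from the version (assumed naming scheme)."""
    return {**ctx, "artifact": f"app-{ctx['version']}.tar.gz"}


def deploy(ctx: dict) -> dict:
    """Record the target environment; a real step would ship the artifact."""
    return {**ctx, "deployed_to": ctx["environment"]}


PIPELINE: List[Step] = [
    Step("static-checks", run_static_checks),
    Step("build", build_artifact),
    Step("deploy", deploy),
]


def run_pipeline(steps: List[Step], ctx: dict) -> dict:
    """Run the steps in order & record which ones were executed."""
    log = []
    for step in steps:
        ctx = step.run(ctx)
        log.append(step.name)
    return {**ctx, "log": log}


def describe(steps: List[Step]) -> str:
    """Build a plain-text description of the pipeline from the docstrings."""
    return "\n".join(
        f"{step.name}: {inspect.getdoc(step.run) or ''}" for step in steps
    )
```

Because every step is a plain function, the pipeline itself can be covered by ordinary unit tests (e.g. checking that `run_pipeline` rejects a malformed version), & `describe(PIPELINE)` gives a description that cannot drift away from the code, since it is generated from it.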
It really is that simple (or that hard - for teams that can't reach this level of technical proficiency even for "standard" business code). In fact, in many cases we're dead scared of automation code because it's objectively in poor shape:
- it consists of ever-growing, linear shell scripts (always written in haste, never refactored)
- it's torn apart between several tools, standards, machines - as configuration, scripts, compiled applications & whatever else you could imagine
- its history (& the logic behind all the changes) is completely untraceable (unless you want to dig through the team's mailbox archives)
Conclusion?
Automation is just the (necessary!) beginning of the journey. Doing it the right way requires a lot of (continuous) effort & has to start with an important shift in thinking about automation code: it has to be treated as a first-class citizen among your team's work products.
Pic: © bakhtiarzein - Fotolia.com