"Doveryai, no proveryai." ("Trust, but verify.")
Ronald Reagan (to Mikhail Gorbachev)
Developing an enterprise-level software product is never easy ("thanks, Sherlock!") & clearly doesn't depend on pure programming skills alone ("now you tell me?!"). That's why we have all those "processes", "activities", "methodologies", etc. Their shared (& sometimes indirect) purpose is to check whether things are heading in the right direction, at a suitable speed, without excessive risks, etc. We all know the current trend: make them lighter, leaner & based on direct interaction rather than formal documentation hand-overs. No-brainer.
But this time I'm going to focus on a different aspect:
Timing & synchronicity (in processes)
Let's start with an example:
You're introducing code peer review in your team(s). You realize that sometimes a piece of code may fail the peer review: that's the whole point of it, isn't it? Such a faulty piece of code shouldn't get beyond the review step (it shouldn't reach test environments, get merged into trunk, etc.), so you're facing a tough choice between the following options:
- Peer Review should be a BLOCKING step in the development process - a piece of code shouldn't be shared with other programmers, reach the continuous build server, etc. until the review is done
- Peer Review should be an ASYNCHRONOUS & NON-BLOCKING step in the development process - source code is committed directly to the repo & gets sucked into the CI loop (& further - into test environments), all without waiting for the peer review outcome
The first option is more secure & gives more control over what's happening, but it will quite likely create bottlenecks & introduce delays. Some of those delays may make sense, but honestly: the majority of commits will pass the peer review & they will still wait in the queue. So the impact spreads over the whole continuous feedback & integration loop.
The second option allows faulty code to get promoted further, but assumes that anything that could seriously break things will get filtered out by the automated CI loop anyway. Obviously this approach requires some additional control mechanisms (like a periodic report that finds commits that didn't get a review at all), but as long as automated testing is solid, you'll keep development agility high without sacrificing quality.
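To make the "periodic report" idea a bit more tangible, here's a minimal sketch. Everything in it is an assumption made for illustration: the `Commit` record, the `unreviewed_commits` function & the two-day grace period are hypothetical names & values, not the API of any particular VCS or review tool. In real life the commit list & the set of reviewed commits would come from whatever tools you actually use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    # Hypothetical record; real data would come from your VCS / review tool.
    sha: str
    author: str
    committed_at: datetime


def unreviewed_commits(commits, reviewed_shas, now, grace=timedelta(days=2)):
    """Return commits older than the grace period that still have no review."""
    cutoff = now - grace
    return [
        c for c in commits
        if c.sha not in reviewed_shas and c.committed_at <= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2013, 11, 28, 12, 0)
    commits = [
        Commit("a1", "alice", now - timedelta(days=5)),   # reviewed
        Commit("b2", "bob", now - timedelta(days=3)),     # overdue
        Commit("c3", "carol", now - timedelta(hours=4)),  # still within grace
    ]
    for c in unreviewed_commits(commits, reviewed_shas={"a1"}, now=now):
        print(f"{c.sha} by {c.author}: no review since {c.committed_at:%Y-%m-%d}")
```

The grace period is there so that the report nags only about commits that had a fair chance to be reviewed; nothing gets blocked, the report just makes the gap visible.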
Code peer review is just one example of the blocking vs. non-blocking choice. Here are some other ones:
- production release approvals
- design / plan / estimation validation (for instance - in change management process)
- incident / problem distribution
Transparency & traceability are the key
It shouldn't be a surprise that I vouch for the second (asynchronous) option. That's the one that promotes:
- responsibility delegation
- focus on operational agility & speed
- fighting the bottlenecks
- simplicity
If you're not convinced yet, let's get through another example - hotfix release approval:
There's no doubt that messing with a production environment is no joke - this is about potential impact on actual products & services: quite likely the source of income & the very heart of the company's business domain. But what do companies do to deal with the burden & responsibility of approving & accepting a hotfix deployment? Usually they put it on executives' shoulders (a single person or a committee). The fact that executives are accountable in general (& that it's easy to hold them to account ...) counts for more than the fact that they can't personally assess the actual risk & impact of the particular change (as executives they are more or less "content-free").
In other words, executives participate in the approval process, but apart from delaying it, their only input is a decision based 100% on consultation with the people who MADE the actual change ... Doesn't that sound like a waste of time?
Why not invest that time in making the whole process 100% traceable & transparent instead? Like this:
- no anonymous commits
- only CI server-built components in deployment pipeline
- each commit with a mandatory link to a SIR (in incident tracking tool)
- "stop-the-line" automated regression testing loop
- devs themselves assess risks (probability, impact, consequences of denial, etc.) & costs (service unavailability, ops effort)
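Some of the rules above are cheap to check automatically. The sketch below covers three of them (no anonymous commits, a mandatory SIR link, CI-built artifacts only). It's an illustration under assumptions: the `CommitRecord` fields, the `traceability_violations` function & the `SIR-123` reference format are all made up for this example, so adjust them to whatever your incident tracking tool & pipeline really provide.

```python
import re
from dataclasses import dataclass

# Assumed ticket format ("SIR-123"); the real one depends on your tracking tool.
SIR_PATTERN = re.compile(r"\bSIR-\d+\b")


@dataclass(frozen=True)
class CommitRecord:
    # Hypothetical record assembled from VCS & pipeline metadata.
    sha: str
    author: str        # empty or blank means the commit is anonymous
    message: str
    built_by_ci: bool  # True if the deployed artifact came from the CI server


def traceability_violations(commit):
    """List the traceability rules this commit breaks (empty list = clean)."""
    violations = []
    if not commit.author.strip():
        violations.append("anonymous commit")
    if not SIR_PATTERN.search(commit.message):
        violations.append("no SIR reference in commit message")
    if not commit.built_by_ci:
        violations.append("artifact not built by CI server")
    return violations


if __name__ == "__main__":
    hotfix = CommitRecord("d4", "", "quick fix for login", built_by_ci=False)
    for problem in traceability_violations(hotfix):
        print(f"{hotfix.sha}: {problem}")
```

A check like this can run after the fact (as a report) or as part of the "stop-the-line" loop; which one you pick is exactly the blocking vs. non-blocking choice discussed earlier.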
With all this information you're able to validate & verify each decision without slowing processes down. This way you make the consequences of mischief / error inevitable (which helps keep people awake), off-load the "queues" (when things get pretty intensive) & remain flexible (you can decide later whether you validate 100% or 50% of cases, etc.).
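The "100% or 50% of cases" part can be as simple as random sampling of changes that have already gone out. A rough sketch, assuming changes are identified by plain strings (the `pick_for_audit` name & the `chg-N` ids are hypothetical):

```python
import math
import random


def pick_for_audit(change_ids, fraction, seed=None):
    """Pick a random share of already-deployed changes for later verification."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ids = list(change_ids)
    # Round up, so a non-zero fraction of a non-empty list audits at least one change.
    count = math.ceil(len(ids) * fraction)
    rng = random.Random(seed)  # pass a seed to make the selection reproducible
    return sorted(rng.sample(ids, count))


if __name__ == "__main__":
    deployed = [f"chg-{i}" for i in range(10)]
    print(pick_for_audit(deployed, 0.5, seed=2013))
```

Because the selection happens after deployment, changing the fraction (say, from 0.5 to 1.0 during a rough period) costs nothing in delivery speed.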
Async all the things! What could possibly go wrong?
— Simon Brown (@sjb3d) November 28, 2013