This post was inspired both by personal experience (some painful lessons learned included) and by a bunch of articles on the web, e.g. one of the most recent ones: https://thehftguy.wordpress.com/2016/11/01/docker-in-production-an-history-of-failure/
We all want to be beneficiaries of the rapidly changing model of the software lifecycle. Openness (open technologies, open APIs, open source), short release cycles, small increments, rapid adoption of disruptive trends - we all realise that all of these (if applied properly) can have a tremendous impact on the competitive advantage of our products and services. Well, and sometimes we just go with the hype flow as well :)
But once we get really excited about all these visions & potential gains, we tend to forget the price that comes bundled with bleeding edge technology. A price that in some cases may be very high - the threshold to overcome may turn out to be beyond our current capabilities.
The bleeding edge of tech disruption cuts both ways. And that's what this blog post is about.
You will break things if you want to move fast
If you've read the article linked at the top of this post, you know the story already:
Company X decides to embrace a game-changing bleeding edge technology (let's call it BET, for the sake of brevity), builds some PoCs, tries the first production deployment(s), slowly scales up/out & at some point tries to perform a version upgrade (of this particular BET) - either because of new capabilities, support requirements or just some bugs that are supposedly fixed in the next version.
And that's the moment when shit hits the fan. Splendidly. APIs / interfaces / contracts have changed, performance characteristics have changed, the internal format has changed (& some sort of conversion is required), new behavioural / temporal coupling has appeared (& you've learned about it by chance, because it wasn't in the changelog) - etc. The team spends a lot of time & effort to re-stabilise the situation (refactor code, convert data, re-integrate services), but ...
- official docs are not yet updated ...
- some new bugs have appeared in the new version (ones that were not present in the previous one)
- there's no smooth transition path if you want to keep a viable rollback option (because you're running an HA system & you realise that it's your own balls at stake ...)
After some time & several buckets of sweat, tears and blood ... the upgrade is performed successfully. Just in time to find out that ... your issue (the actual reason for the upgrade) is not fully fixed, so you have to perform ANOTHER upgrade & this time it's from a non-release branch (beta or something) & you'd better do it fast, as the current binaries are not really 100% stable, you know ...
My advice on that:
Sorry to say, but the above doesn't mean that the product is broken or that its creators are losers. Modern, disruptive tech has to develop with a certain (very high) velocity to remain at the top. And the price of high velocity is paid in breaking changes, large-scale refactoring ("re-architecting") decisions & increased effort on low-level maintenance: understanding these implications is crucial if you're going to be an early adopter.
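One practical way to soften these blows is to keep the BET at arm's length behind your own interface, so a breaking upgrade hits one adapter instead of the whole codebase. Below is a minimal sketch of that idea in TypeScript - BetClientV1 / BetClientV2 are hypothetical stand-ins for a BET's client library, not a real API:

```typescript
// The contract YOUR code depends on - it only changes when YOU decide.
interface EventStore {
  append(stream: string, payload: object): Promise<void>;
}

// Hypothetical v1 client: a flat API taking a single argument object.
class BetClientV1 {
  async write(args: { stream: string; data: string }): Promise<void> {
    console.log(`v1 write to ${args.stream}: ${args.data}`);
  }
}

// Hypothetical v2 client: the upgrade renamed the method & split the
// arguments - exactly the kind of breaking change described above.
class BetClientV2 {
  async publish(stream: string, body: string): Promise<void> {
    console.log(`v2 publish to ${stream}: ${body}`);
  }
}

// Only this adapter knows which client version is in use; the upgrade
// means rewriting these ~10 lines instead of touching every call site.
class BetEventStore implements EventStore {
  constructor(private client: BetClientV2) {}
  async append(stream: string, payload: object): Promise<void> {
    await this.client.publish(stream, JSON.stringify(payload));
  }
}

// Application code sees only EventStore & survives the upgrade untouched.
async function recordSignup(store: EventStore, userId: string): Promise<void> {
  await store.append("signups", { userId, at: new Date().toISOString() });
}

recordSignup(new BetEventStore(new BetClientV2()), "user-42").catch(console.error);
```

It won't save you from changed performance characteristics or data format conversions, but it does turn "grep the whole codebase" upgrades into "rewrite one module" upgrades.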
Modern products are never "done-done"
Some may try to outsmart the system by waiting for the "big releases", "stable milestones", "enterprise-ready versions", while in fact ... these do not really exist anymore, at least not for this class of products (BETs).
In fact, I have the impression that modern software is in a perpetual state of never-ending beta testing. Features get released in quite a half-baked state - rough, MVP-ish, aimed at probing their usefulness - & they mature as a side effect of being production-tested by their first users (earliest adopters). Before they can be considered rock-solid, stable & such, there's already another layer of new functional "icing", or just some more "parallel" features, in a fresh, unproven, "beta" state.
This observation applies to versions marked as RTM, GA, "final", "release", etc. - I'm not referring to nightly builds, pre-releases, release candidates or official alphas/betas.
My advice on that:
My key point here is that waiting for big round version numbers doesn't really change much. You're rather fooling yourself into thinking you'll get a version not affected by the other points in this blog post, while in fact the release strategies of modern software products are very fluid, continuous and fine-grained.
So if you're really determined to use a particular BET, don't wait indefinitely - start your experiments early & make sure you have enough time to learn your lessons before you go all-in (commit yourselves too much).
Not everything will work out - learn to assess the sunk cost
Never bet fully on a single software product you have limited prior experience with. Always have a backup plan (an alternative solution), frequently validate whether you're still following the correct targets (chasing business goals, not struggling with the complexity of the technology to fulfil your own ambitions), and set clear rules on when you give up on a particular BET.
No-one likes being wrong, no-one likes making a bad decision. But there's one thing far worse than f#$%ing up: trudging endlessly in the depths of madness, because you just can't allow yourself to give up on a cause already lost. Some solutions just don't work in particular scenarios - because of the technology itself, the people, the environment, other limitations, or some combination of all of the above.
Shit happens. And the sunk-cost fallacy is actually one of the most frequently ignored mistakes we make in software delivery.
My advice on that:
Make sure that you've calmly assessed the situation, got as close as possible to understanding (& properly locating) the root cause & adjusted the course (using the knowledge you've gained in the meantime). As Lean Startup teaches us: "Persevere" is an option, but at some point "Pivot" makes much more sense.
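Keeping the "Pivot" option cheap is largely an architecture decision. Here's a hedged sketch of one way to do it: hide both the BET-backed implementation and a boring fallback behind one interface, and make the choice a config switch. All names below are hypothetical:

```typescript
// One shared interface, two interchangeable implementations.
interface SearchIndex {
  query(term: string): Promise<string[]>;
}

// The exciting BET-backed implementation under evaluation.
class BetSearchIndex implements SearchIndex {
  async query(term: string): Promise<string[]> {
    // Imagine a call into the BET's fancy ranked full-text search here.
    return [`bet-result-for-${term}`];
  }
}

// The unglamorous fallback you already trust (e.g. plain SQL LIKE queries).
class LegacySearchIndex implements SearchIndex {
  async query(term: string): Promise<string[]> {
    return [`legacy-result-for-${term}`];
  }
}

// Pivoting away from the BET is now a one-line config change, not a rewrite.
function makeSearchIndex(useBet: boolean): SearchIndex {
  return useBet ? new BetSearchIndex() : new LegacySearchIndex();
}

const index = makeSearchIndex(false); // flip to true while the BET earns trust
index.query("docker").then((hits) => console.log(hits));
```

If giving up on the BET means flipping one flag rather than rewriting half the system, the sunk-cost fallacy loses most of its grip on you.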
DIY in OSS
Many people seem to be missing the point of OSS (Open Source Software). As a user of an OSS technology you're not really a "client" - someone who's demanding & expecting something from a product they paid for (in whatever form). You're actually becoming a member of a community of people bound by a specific social bond: collaboration on an initiative they all need, but can't build separately (or it would be a bad idea to, for whatever reasons).
This "social bond" thing is actually very important - you're supposed to contribute in this form or another. Missing functionalities, functional gaps, inconsistencies - these are not "flaws", "errors", "deficiencies", but rather open opportunities that wait for you to "cover your obligation" for using this particular BET.
At some point I had difficulties fully embracing this fact - e.g. when I was trying Jest (a test framework for React - https://facebook.github.io/jest/) I was really pissed off when I noticed that its Windows tooling was very immature & feature-stripped.
"WTF Facebook?! Can't you even build a proper product? Why are you wasting my time with that really?"
Obviously that was a reprehensible and immature way of thinking, not something to be proud of. It's clear that my anger was caused by the lack of a feature I needed (& I needed it NOW ... I mean - THEN), but reasonably speaking ...
My advice on that:
... OSS BET products are not free - their price is just expressed in a different currency. You pay for them with your own development effort: learning their internals, supporting yourself, contributing to the product for the common good of the whole community using this BET. An OSS BET is NOT a homing missile - it doesn't follow the "fire & forget" rule.
You're (quite likely) not prepared
Betting on the bleeding edge is a risky game. The prize may be very high & it's OK to go for it, but it's foolish to ignore the fact that a lot of pain will be involved - making sure everyone fully realises that (so that the whole unit / team commits to the decision & is up for the consequences) is something I find absolutely crucial.