Let's daydream a little bit:

You build stuff in a work environment with a virtually endless repertoire of ideas to be implemented. Your work is all about new features, extensions of existing features, feedback-based improvements of sub-optimal features, etc. Obviously, it's not that each week you work on something totally disconnected; however, the crux of your contributions is to create more & more new, shiny stuff your end-users will love.

High velocity, creative spree (that we all love).

So you've built a meaty chunk of a feature, including a nice suite of automated tests, all the associated deployment tooling, a suitable monitoring setup, and carefully crafted test scenarios & data (for the sake of exploratory testing). According to the lessons from more "physical" kinds of engineering, as long as you don't modify existing (already completed) behavior, your job here is done - you can switch over to the new stuff. Seemingly, such a routine can be repeated indefinitely, making you a fast & efficient feature delivery machine, expanding the system's functionality for the greater good (or just shiny "dinero").

Sorry to disappoint you, but it doesn't work that way.

The "hidden" cost

We may not explicitly split code-work into "new stuff" & "maintenance stuff" anymore (as we did in the 90s or early 00s), but ALL the existing code (regardless of whether it's being modified now or not) has its cost of running (and no - I don't refer to infrastructure costs here ...). And this cost gets incrementally higher as the general code-base grows (with several additional factors to be taken into consideration as well).

Why so? What does this cost consist of? And is it *really* unavoidable?

First of all, the conditions you're running the code under just after its initial deployment are not static - they do change over time:

  • performance limitations & bottlenecks need some volume (of data, of traffic, etc.) to pop up (& hurt ...) for real - see the sketch after this list
  • it's easy to underestimate vulnerabilities in a platform that's still completely unknown, but once it gets some market attention ...
  • complexity tends to stack up, potentially rendering previously valid business logic either vague, ambiguous or simply incoherent ...
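To make the first bullet concrete, here's a minimal sketch (names and volumes are entirely made up) of how an innocent-looking lookup behaves fine at launch-day data sizes but degrades quadratically once real traffic arrives:

```python
# Hypothetical sketch: an O(n*m) lookup that is unnoticeable at launch-day
# volumes but hurts badly once the data set grows. All names are illustrative.
import time


def find_matches(orders, customers):
    # Linear scan of customers for every order - fine for hundreds of records,
    # painful for hundreds of thousands.
    return [
        (order, customer)
        for order in orders
        for customer in customers
        if order["customer_id"] == customer["id"]
    ]


def benchmark(n):
    customers = [{"id": i} for i in range(n)]
    orders = [{"customer_id": i % n} for i in range(n)]
    start = time.perf_counter()
    find_matches(orders, customers)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n={n:>6}: {benchmark(n):.3f}s")  # roughly ~100x slower per 10x more data
```

Nothing about this code "broke" - only the conditions around it changed.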

Second of all, there are very few functionalities that have no dependencies at all - the issue may not lie in the component itself, but in a (potentially cascading) problem with the services it relies on.

Thirdly, the mature approach to running software ain't about being re-active & springing into action only once the fire is already raging all around. Experienced engineers build observability into the product, monitor the metrics, analyze the trends, extract insights out of available data & pro-actively address forthcoming issues, before anyone gets impacted.
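What does "extracting insights pro-actively" look like in practice? A minimal sketch, assuming you already collect a daily metric (the disk-usage numbers below are made up): fit a simple linear trend and raise a heads-up before the threshold is actually crossed:

```python
# Minimal sketch of pro-active trend analysis: fit a linear trend to a metric
# (hypothetical daily disk-usage samples, in %) and estimate how many days are
# left before a threshold is crossed. All numbers and names are illustrative.
from statistics import linear_regression  # Python 3.10+

daily_disk_usage = [61.0, 62.4, 63.1, 64.8, 65.9, 67.2, 68.0]  # last 7 days
THRESHOLD = 85.0

days = list(range(len(daily_disk_usage)))
slope, intercept = linear_regression(days, daily_disk_usage)

if slope > 0:
    days_left = (THRESHOLD - daily_disk_usage[-1]) / slope
    if days_left < 30:
        print(f"Heads-up: ~{days_left:.0f} days until {THRESHOLD}% disk usage")
else:
    print("No upward trend - nothing to do (for now)")
```

The point isn't the specific metric or tool - it's that the signal gets acted upon before anyone gets paged.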

Last but not least, business needs know no limits, nor do they follow any design patterns. A new, critically important idea may be conceptually "perpendicular" to the existing solution model - because some previous assumptions or domain invariants have just been annihilated. That's how life is and you can't help it - remodeling / refactoring should follow, to adjust running code to the ever-changing business reality.

You build it, you (should) run it

Many of these considerations require a high level of awareness - the kind that usually comes with explicit ownership. If you own a service and have sufficient capacity, it's in your own interest to fix potential problems before they turn into actual problems. This is part of the famous "you build it, you run it" approach, where each team is end-to-end responsible for what they own, so they have a great incentive to deal with issues before they get woken up at 3 a.m. because of a production outage in their area ...

Another thing many people tend to forget - the range of what a team can own is limited: by cognitive load, by the team's expertise, by the ratio of "new stuff" VS "maintenance" work expected by the organization, by the product's quality and the level of automation, by the tightness of coupling & resilience of individual components. If ...

  • ... the team has too much on their plate ...
  • ... the team doesn't feel the consequences of their decisions (because there's someone else on support duty) ...
  • ... problems can't easily be located & assigned to a component/area ...

... the team will naturally avoid investing efforts in any kind of mid-/long-term maintenance, as it simply doesn't pay off (fairly) - from their perspective.

Running ahead without looking back

But what exactly are the consequences of ignoring the fact that software maintenance cost/effort ain't zero?

The answer is simple:

  1. Deteriorating quality; nothing is given forever in software engineering - all the factors mentioned above demand effort aimed at adapting already existing software to the new conditions
  2. Switching to a risky, re-active operating mode; things work well until they don't (anymore) - problems take some time to accumulate, but once you cross a certain threshold, you may end up in a dreadful fire-fighting loop
  3. Accumulating debt (of all sorts); with all its consequences I've described in so many other posts: exploding complexity, dropping velocity of change, multiplying dependencies, etc.

How much is enough?

How much maintenance is needed to keep the quality high enough? There's no single answer - it depends on your baseline level, expected quality standards, code-base size, platform architecture, etc. What's more, the effort needed depends on how much effort you've put in in the past - the negative effects accelerate in times of negligence.

For several years, my approach to dealing with maintenance cost was to always look for a (preferably) monetary value comparison - to spend time on whatever would provide more value to the end-users: either a feature or some maintenance work (compared "head-to-head"). Such an approach seems to make a lot of sense & it's based on the essence of agility, but ... it misses the whole pro-activity factor I've brought up above.
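For illustration only (every figure below is made up), this is what such a "head-to-head" comparison could look like - expected value per day of effort for a feature vs. a maintenance task over the same horizon:

```python
# Illustrative head-to-head value comparison; all figures are hypothetical.
feature_expected_revenue = 40_000      # extra revenue expected over 6 months
feature_effort_days = 15

outage_cost = 25_000                   # estimated cost of one production incident
incidents_avoided = 2.0                # incidents the maintenance work should prevent in 6 months
maintenance_expected_savings = outage_cost * incidents_avoided
maintenance_effort_days = 10

print("feature value/day:    ", feature_expected_revenue / feature_effort_days)          # ~2,667
print("maintenance value/day:", maintenance_expected_savings / maintenance_effort_days)  # 5,000
```

Simple and seemingly rational - but, as noted above, it says nothing about *when* the pro-active work has to happen in order to still be cheap.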

The team that owns a particular piece of software should have enough "slack time" reserved to probe, challenge, assess, stress & tune (based on collected feedback/data) their work products. This investment fluctuates over time (e.g. "waves" of investment are separated by relatively calm periods of data collection / passive monitoring) and should be estimated empirically (as it's highly contextual).
