I have yet to see a company in full control of its technical & model debt, with long-term stable development agility and an unshakeable ratio between new development & maintenance work. Usually, it's an inevitable downslide with a varying speed & acceleration. Development gets slower, maintenance gets more cumbersome & resource-consuming, problems multiply, intertwine & spawn new generations of meta-problems ... Avoiding such a gradual downfall requires skill, patience, common understanding (I was lucky enough to witness & participate in such an effort) & ... usually works only short-term, until the leadership (official or thought leadership) changes or the ever-changing business reality screams for a bloody sacrifice.
Different kinds of companies deal with that in different ways - it depends mainly on their culture & dynamics.
The most ambitious & modern-thinking ones usually try to outgrow the problem: they invest in a massive increase of manpower to accelerate development, so their causative power remains higher than their inertia (yea, this can't run indefinitely ...). It kinda ... makes sense - the issue is postponed until the moment when it's sweetened with a decent market-share and valuation.
The more enlightened ones create software in a rewrite-friendly way (Will Larson has written about that in his book "An Elegant Puzzle"): once debt (of whatever sort) gets too high within a component or a sub-system, it gets scrapped & rewritten - just the problematic part ofc. This requires a very well-balanced approach to architecture: knowing where to be rigid & where to grant autonomy & flexibility.
Wrong way
The majority, however, is lost in a spiral of death: spur, struggle, despair, spur, struggle, despair, spur, ... What does it mean? Once in a while someone gets frustrated enough to rise above the voices of the disgruntled and appeal for the n-th time:
"Guys! We have to do something, it's not bearable anymore!"
After the initial wave of enthusiasm (& even gestures of encouragement from top mgmt), a continually shrinking group of volunteers loses its zeal - the mundane reality smashes their naive hopes ... They lack the environmental support, new layers of problems keep piling up and nobody knows where to start. And to be honest - for all that tedious & thankless work - there's no glory to be taken, just risks to be accountable for. Next time they will think twice before doing anything proactively; one should expect a sarcastic comment from them rather than a gesture of help.
Increase your odds
But does it have to be that way? How can we at least increase our odds in the fight for long-term technical excellence?
I'm not going to give you an accurate solution (because there's none that would fit all!), but at least some hints or indicators you should be paying a lot of attention to.
Rule One: Negative Delta
First of all - if you're burdened with an excessive amount of crap on your shoulders, the first thing you should ensure is that more stuff is being SCRAPPED than ADDED. Sounds freaking obvious, but how many companies actively deprecate, withdraw and decommission their products/services?
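To make the "negative delta" tangible, here's a minimal sketch of how one could check it for a given period. Everything in it is an assumption made for illustration: the `InventoryChange` record, the `net_delta` helper and the service names are hypothetical, not taken from any real tool or company.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryChange:
    """One change to the product/service inventory (hypothetical record)."""
    name: str    # made-up product/service name
    action: str  # either "added" or "scrapped"


def net_delta(changes):
    """Return (number added) - (number scrapped); negative means shrinking."""
    added = 0
    scrapped = 0
    for change in changes:
        if change.action == "added":
            added += 1
        elif change.action == "scrapped":
            scrapped += 1
        else:
            raise ValueError(f"unknown action: {change.action!r}")
    return added - scrapped


# Illustrative data only: one new service, two decommissioned ones.
quarter = [
    InventoryChange("new-reporting-api", "added"),
    InventoryChange("legacy-batch-exporter", "scrapped"),
    InventoryChange("old-admin-panel", "scrapped"),
]

delta = net_delta(quarter)
print(f"net delta this quarter: {delta:+d}")
print("rule one satisfied" if delta < 0 else "rule one violated")
```

Counting items is obviously crude (a tiny script and a huge monolith weigh the same here), but even such a rough number shows whether the pile is growing or shrinking.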
Rule Two: No Open Fire
Before you start any serious effort - make sure you're out of fire-fighting mode (e.g. daily outages, unstable environments, excessively buggy apps that affect business). By whatever means necessary: re-allocating people temporarily, investing in external help, extinguishing the fire with dollars (e.g. buying more infra, hiring more people).
Rule Three: WIP of 1
The next step is to secure (exclusive) capacity - when certain people are working on goals related to technical debt, they should work ONLY on that (otherwise "more important" business work would topple the priorities). Their goals should be precise (even if actions to reach them are not), efforts time-bound, activities transparent & feedback inevitable.
Rule Four: Talk is cheap, show me the numbers
What does it mean that goals have to be "precise"? One can't deal effectively with the debt if it's not known (measured)! Be precise when it comes to what you're struggling with, e.g. how many ...
- ... dependencies are to be upgraded (of any kind: starting with OS, ending with build-time ones)
- ... API contracts are missing metadata descriptions
- ... services are not sufficiently covered with monitoring mechanisms
Measuring has several advantages - it exposes small victories first (think morale!) & it illustrates the trend, so the long-term goals do not look so unrealistic anymore.
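As a rough illustration of "knowing the debt", the counters listed above can be kept as dated snapshots and compared over time. This is just a sketch under assumptions: the `DebtSnapshot` type, the `trend` helper and all the numbers are invented for the example, and how you actually collect the counts depends entirely on your tooling.

```python
from dataclasses import dataclass
from datetime import date

# The counters mirror the examples from the list above.
COUNTERS = (
    "outdated_dependencies",
    "contracts_missing_metadata",
    "services_without_monitoring",
)


@dataclass(frozen=True)
class DebtSnapshot:
    """Debt counters measured on a given day (hypothetical structure)."""
    taken_on: date
    outdated_dependencies: int
    contracts_missing_metadata: int
    services_without_monitoring: int


def trend(snapshots):
    """Change of each counter between the oldest and the newest snapshot.

    A negative value means that particular kind of debt is shrinking.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to see a trend")
    ordered = sorted(snapshots, key=lambda s: s.taken_on)
    first, last = ordered[0], ordered[-1]
    return {name: getattr(last, name) - getattr(first, name) for name in COUNTERS}


# Invented numbers, for illustration only.
snapshots = [
    DebtSnapshot(date(2024, 1, 1), 120, 45, 30),
    DebtSnapshot(date(2024, 4, 1), 108, 45, 22),
]

for name, change in trend(snapshots).items():
    print(f"{name}: {change:+d}")
```

Even a table this simple exposes the small victories (fewer outdated dependencies, fewer unmonitored services) and the areas where nothing has moved yet.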
Rule Five: Consider thrice before building anything
We haven't done anything yet at this point, but we need to consider yet another topic - maybe some products/tools/services should be scrapped w/o re-writing. Not because they are useless (it would be obvious then ...), but because they are NOT a part of your core domain (particular problems that are what your business is all about) and by solving them yourself you're NOT getting any competitive advantage.
This is called a "build-VS-buy" dilemma and it's all about fooling yourself - that it's gonna be cheaper, better and faster if you create & maintain some auxiliary (out of your core domain) app instead of paying someone who specialises in that. The reality is brutal - it never is (cheaper, better or even faster).
There's a very dangerous, particularly vicious sub-genre of this issue - the belief that "we're unique and something that has worked for others won't work for us" (aka NIH syndrome). People lured by that kind of thinking tend to customise literally everything - usually gaining nothing & paying a quickly accumulating price in the future.
Rule Six: Unlock capacity first
Companies suffocate not because of the technical debt itself, but usually due to how it constrains their development agility - even a small advancement in feature development suffers from so much waiting time, so many loopbacks, inefficiencies, distractions & other energy drains, that doing anything (incl. fighting the technical debt) drags mercilessly ...
That's why the smartest way to start is by addressing the quick-wins in the area of development agility: the more capacity you free, the faster you can progress further. These are the topics you should probably look at first:
- build time (incl. dependencies & redundant compilation)
- everyday dependencies between teams
- frequency of change integration between collaborating teams
- stability of test environments & the repeatability of deployment process
There are no shortcuts.
The only way to succeed in the field of long-term technical excellence is to swallow the sour bile and plow through the issues - one by one. It's far harder to do than to write about (been there! done that!), but it's also the most promising approach (in terms of the probability of a meaningful success), in spite of all that. All the other ways are some drastic form of ... revolution. As history has taught us: revolutions are hardly predictable and always take their toll.