In this post you'll find: a bit of micro-monolith neo-evangelism :), a great (video) example of premature system distribution hitting the fan ..., a debunking of the microservice independence myth and an explanation of what X-centric systems are.
I hate blog posts, books or conf talks that focus on criticising only - it's far easier to b*tch about something than to propose an actual solution to non-trivial problems. That's why, after my recent post (on why microservices may not be an optimal answer to all of your growth problems), I'd like to re-visit an architectural pattern almost as old as Computer Science itself and, in fact, a very under-appreciated one ... Sharding.
It's not a single answer to all scalability problems, but in my case I treat it as one of the 5-7 key techniques (together with e.g. CQRS, capability-based aggregates, open standard APIs, ...) I usually use to build future-proof solutions.
What is sharding?
Sharding is the idea of scaling a solution out horizontally (into separate processes) not by splitting it by functions (responsibilities), but into micro-instances that contain the same functionality, yet each processes 1/X of a homogeneous traffic volume (using 1/X of the data). Sharding doesn't cover only the split of stateless processing (basic service redundancy) but, most of all, the split of the data used by each shard exclusively.
Key principles of sharding:
- avoid having shards communicate with each other (except for shard balancing)
- make it possible to adjust the range of data covered by a shard
- the partitioning algorithm (how to distribute data across shards) is scenario-specific
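To make these principles concrete, here's a minimal routing sketch (in Python; all names & numbers are hypothetical, not a prescription): requests are hashed into a fixed pool of virtual buckets, and a mutable bucket-to-shard map is what makes "adjusting the range of data covered by a shard" a cheap operation.

```python
import hashlib

BUCKETS = 1024          # fixed number of virtual buckets, far more than actual shards
SHARDS = 4              # current number of shard instances

# mutable bucket -> shard map; rebalancing means reassigning buckets (and migrating
# their data), not re-hashing the whole keyspace
bucket_to_shard = {b: b % SHARDS for b in range(BUCKETS)}

def bucket_for(partition_key: str) -> int:
    """Stable hash of the partition key into a virtual bucket."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS

def shard_for(partition_key: str) -> int:
    """Each request lands on exactly one shard - shards never have to call each other."""
    return bucket_to_shard[bucket_for(partition_key)]

def move_bucket(bucket: int, target_shard: int) -> None:
    """Adjust the range covered by a shard: reassign a bucket (data migration not shown)."""
    bucket_to_shard[bucket] = target_shard
```

The partition key itself (customer id, contract number, region, ...) is the scenario-specific part - the routing mechanics above stay the same regardless of what you hash.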
At first glance, sharding may not look very appealing:
- where's the decoupling?!
- where's the independent functionality deployment?!
- what about aggregations (e.g. reports) over the full data volume that can't be sharded?!
These are all valid concerns - no solution is perfect & by moving from the concept of microservices to sharding you'd trade certain problems & challenges for different ones. Still, it's a good deal in this case (IMHO) - let's get into the details to prove this thesis ...
Myth of independence
Real-life microservices are independent only at the lowest level (atomic business capability) - these are just assembly pieces for normal, real-life business interactions (which frequently involve several capabilities for each operation that is 'atomic' from the end-user perspective). If your system has a complex domain and involves a lot of reads (which are synchronous by their nature), this is what can happen:
TL;DW (Too Long; Didn't Watch) - an excessive (& out-of-control) number of x-process calls (needed to serve a single request!) adds more & more latency, quickly exhausting connection pools & making the whole system unresponsive.
When sharding, communication between capabilities (services that are not separate processes, but modules of a well-structured monolith) is intra-process, which reduces the latency to a minimum.
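A hypothetical sketch (names are illustrative, not a prescribed framework) of what that looks like in practice: capabilities are plain modules wired together inside one deployable, so an end-user operation that spans several of them costs in-memory method calls instead of network hops.

```python
# Each capability is a module (here: a class) inside the same process, not a remote service.
class PricingCapability:
    def quote(self, product_id: str) -> float:
        return 42.0                      # would read from the shard-local database

class InventoryCapability:
    def reserve(self, product_id: str, qty: int) -> bool:
        return True                      # shard-local write, no network hop

class OrderingCapability:
    def __init__(self, pricing: PricingCapability, inventory: InventoryCapability):
        self.pricing = pricing
        self.inventory = inventory

    def place_order(self, product_id: str, qty: int) -> float:
        # one 'atomic' end-user operation, several capabilities involved,
        # zero cross-process calls - just in-memory invocations
        if not self.inventory.reserve(product_id, qty):
            raise RuntimeError("out of stock")
        return self.pricing.quote(product_id) * qty
```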
Do I need independent scaling for capabilities?
With microservices it's simple - if process X serves an N times higher volume than process Y, you can slap N times more instances for X than for Y - voila. And it makes sense (except for the fact that each process call has to go through a load balancer ...). What can we do in the case of sharding?
When sharding, the idea is to distribute the domain (& the data & processing, as a consequence) across the nodes in a way that the total load (regardless of request type) on all of them is (roughly) equally balanced. If you can do that (design the split algorithm, maintain & monitor the balance, add more nodes along the way - if needed, etc.), you're perfectly covered - there's no point in worrying about differences between traffic on particular functionalities (capabilities/"services").
And in fact, you get some "bonus" gains: like the ability to do A/B testing by design (you can update 1 or a few nodes while the majority is still waiting).
Is splitting into shards a big deal?
As you can easily deduce from the previous paragraphs, the effectiveness of sharding relies on the successful implementation of:
- domain (/data/traffic) split (AKA partitioning)
- routing requests to the correct shard(s)
Both of these are heavily scenario-dependent - there's no single optimal algorithm, no single perfect approach; in fact, in many cases you may be forced to apply more than 1 partitioning algorithm within 1 system (e.g. 1 for core OLTP, 1 for reporting).
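For illustration only (the actual split is always domain-specific & every name below is made up), here's how two co-existing partitioning schemes for one system might look - a hash on the root entity for core OLTP and a time-range split for reporting:

```python
import hashlib
from datetime import date

OLTP_SHARDS = 8

def oltp_shard(contract_id: str) -> int:
    """Core OLTP: hash the root entity, so a single contract's subgraph lives on one shard."""
    digest = hashlib.sha256(contract_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % OLTP_SHARDS

def reporting_partition(event_date: date) -> str:
    """Reporting: partition by month, so period aggregations scan one partition at a time."""
    return f"report_{event_date.year}_{event_date.month:02d}"
```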
To design your partitioning algorithm you need to truly understand your domain:
- what will the traffic look like (over time) / how will the key entities scale with business growth?
- what are the most common usage scenarios & what transaction boundaries will these have?
- will the platform be read- or write-heavy? will the read & write models be similar?
- for your key entities (aggregate roots for your key transactions) - what would be the best potential split criterion to distribute the traffic equally: geographical, alphabetical, round-robin across sequential identifiers, recency? (a sketch follows this list)
- if there's a need for any aggregations, can these be pre-calculated (at least partially)? who will use these aggregations & how often?
- which data has to be 100% online (in real-time), which data can be stale & for how long?
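Purely as a sketch of the split criteria mentioned above (all thresholds, codes & shard names are invented for illustration), here's the same routing question answered four different ways:

```python
from datetime import date, timedelta

def by_geography(country_code: str) -> str:
    return {"PL": "shard-eu", "DE": "shard-eu", "US": "shard-us"}.get(country_code, "shard-row")

def by_alphabet(customer_name: str) -> str:
    return "shard-a-m" if customer_name[:1].lower() <= "m" else "shard-n-z"

def by_sequential_id(entity_id: int, shard_count: int = 4) -> int:
    return entity_id % shard_count       # round-robin across sequential identifiers

def by_recency(created_on: date) -> str:
    return "shard-hot" if date.today() - created_on < timedelta(days=90) else "shard-cold"
```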
I won't cover all the potential considerations here, BUT based on my past experience 95% of domains (& models, & applications) can be classified as X-centric, where X is some sort of particular key business entity.
I've seen systems that were: contract-centric, company-centric, user-centric, product-centric, location-centric, ...
For X-centric systems all the major transactional data (obviously excluding business configuration, audit logs, etc.) is somehow (directly or indirectly) linked to X. As a consequence, each user (or narrow user group) operates across many tables (/object hierarchies), BUT all of them sit in a single subgraph with one single (or a few - but the list is strictly bounded) entity X as its root. And yes - this makes X the perfect candidate to base partitioning upon.
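A hypothetical sketch of what that means for partitioning (a contract-centric system is assumed here, all names are illustrative): every entity in the subgraph carries, or can derive, the identifier of its root X, so the shard of the root is the shard of everything beneath it.

```python
import hashlib

SHARDS = 16

def shard_of_contract(contract_id: str) -> int:
    """The root entity (here: a contract) is the only thing that gets hashed."""
    digest = hashlib.sha256(contract_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS

# Entities lower in the subgraph don't get their own partitioning scheme -
# they resolve the root's identifier and land on the same shard, so any
# transaction within one contract's subgraph stays shard-local.
def shard_of_invoice(invoice: dict) -> int:
    return shard_of_contract(invoice["contract_id"])

def shard_of_invoice_line(line: dict, invoices_by_id: dict) -> int:
    return shard_of_invoice(invoices_by_id[line["invoice_id"]])
```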
Let's illustrate that with some examples ...
The next post in the series can be found here.