In this post you'll find: how sharding can work (well) together with CQRS, what tricks can be used for re-partitioning data in live instances (w/o downtime), and why it's OK to use domain-related data for routing at the API gateway.

This is the 2nd post in the series. The first post can be found here.

Real life, hello?

Let's consider 4 sample scenarios:

SaaS HR system - used by various client-companies for their internal operations
On-line marketplace - hundreds of vendors, hundreds of thousands of buyers
Live-traffic map - geospatial data is collected from vehicles, anyone can watch (regardless of location)
Reporting system - with an ability to adjust reports, drill down, etc.

Each of these specific types of app requires a different approach (regarding partitioning):

In the SaaS HR system each company is practically a separate tenant - the system is clearly company-centric: each company can have its data on a separate node (+ HA replicas, etc.). Obviously, different companies can generate different traffic (due to their scale), but this is a minor complication.
On-line marketplace - this scenario can be a b*tch :) Splitting by vendor is controversial (users look for an item that may be offered by many vendors), and splitting by any of the user's properties doesn't make any sense. The key is to recognize how important search is: when a user is browsing a single vendor/item, the data can be retrieved from shards split by vendor, but offer search should rely on a separate set of "front-line" instances, with their own data organized like any search index - by keywords/keyphrases. Search data should be populated (as a read model) asynchronously each time an offer is updated (on a vendor-split node).
Live-traffic map - in fact quite similar to the previous case: event collection (write-only) can be split by some user-specific attribute (even round-robin on a sequential id), while the read model (updated as a consequence of event processing) should be geo-based (but ofc area sizes cannot be equal - some areas are naturally more popular than others).
Reporting system - OLTP is OLTP, OLAP is OLAP. If you're building a dedicated reporting system, you should design a dedicated data model for it: star schema, dimensions, measures, pre-calculated aggregates. Some scenarios do have a dedicated partition criterion (e.g., cost center), some can be optimized to such a level that they don't need sharding, and some do not have to be 100% real-time, so they allow partial data aggregation as a report post-processing step.
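
To make the marketplace case a bit more tangible, here is a minimal in-memory sketch of the idea: offers are written to vendor-split shards, and a keyword-based search index (the read model) is updated asynchronously from "offer updated" events. Everything here is an assumption made for illustration: the `Marketplace` class, the `vendor_shard` rule, the shard count and the `deque` standing in for a real message queue.

```python
import zlib
from collections import defaultdict, deque

NUM_VENDOR_SHARDS = 4  # assumed number of write-side shards


def vendor_shard(vendor_id: str) -> int:
    """Pick a write-side shard from the vendor id (stable hash, hypothetical rule)."""
    return zlib.crc32(vendor_id.encode("utf-8")) % NUM_VENDOR_SHARDS


class Marketplace:
    def __init__(self) -> None:
        # Write model: one dict of offers per vendor shard.
        self.shards = [dict() for _ in range(NUM_VENDOR_SHARDS)]
        # Pending "offer updated" events; stands in for a message queue.
        self.events = deque()
        # Read model: keyword -> set of (vendor_id, offer_id).
        self.search_index = defaultdict(set)
        # Keywords currently indexed per offer, so stale entries can be removed.
        self._indexed = {}

    def update_offer(self, vendor_id: str, offer_id: str, title: str) -> None:
        """Command side: write to the vendor's shard and emit an event."""
        key = (vendor_id, offer_id)
        self.shards[vendor_shard(vendor_id)][key] = title
        self.events.append((vendor_id, offer_id, title))

    def process_events(self) -> None:
        """Asynchronous step: bring the search read model up to date."""
        while self.events:
            vendor_id, offer_id, title = self.events.popleft()
            key = (vendor_id, offer_id)
            for word in self._indexed.get(key, ()):
                self.search_index[word].discard(key)
            words = set(title.lower().split())
            for word in words:
                self.search_index[word].add(key)
            self._indexed[key] = words

    def search(self, keyword: str) -> set:
        """Query side: served only from the keyword index, never from the shards."""
        return set(self.search_index.get(keyword.lower(), set()))
```

Note that a search issued between `update_offer` and `process_events` simply doesn't see the change yet - that's the eventual consistency you accept in exchange for keeping search off the vendor-split nodes.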

(Re-)balancing partitions

Designing a partitioning algorithm is crucial, but wait ... there's more (to do). Initially, it's OK to go with well-balanced X instances, but as traffic increases, you'll be forced to add more - what then? Re-distributing data in a cluster may easily clog the network if implemented too recklessly ...

Fortunately, there are several tricks one can keep up her/his sleeve ...

live replication of data first: traffic split rules (routing) are switched only once the data is duplicated; processing on the old node is not bothered by the leftover, unused data
halving nodes - affects only 1 existing & 1 new instance (yes, this trick has limited applicability for a large server farm ...)
object databases that keep aggregates as objects (or tree-like hierarchies of objects) simplify cross-shard transition a lot (because you're just moving uniform blobs, not graphs of relation-bound rows)
many eventually consistent NoSQL databases actually provide sharding at the database level, so manual data transition is not required at all (the DB engine itself covers it) - here's the example
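
The first two tricks (replicate first, switch routing later; halve a single node) can be sketched together. This is a simplified, single-threaded illustration, not a production recipe: the `Cluster` class, the node names and the crc32-based hash ranges are all assumptions, and a real system would also have to replicate writes that arrive while the copy is in progress.

```python
import zlib

HASH_SPACE = 2 ** 32  # crc32 produces values in [0, 2**32)


def key_hash(key: str) -> int:
    return zlib.crc32(key.encode("utf-8"))


class Cluster:
    def __init__(self) -> None:
        # Routing table: (start, end, node) with end exclusive; one range per node.
        self.routes = [(0, HASH_SPACE, "node-a")]
        self.data = {"node-a": {}}

    def node_for(self, key: str) -> str:
        h = key_hash(key)
        for start, end, node in self.routes:
            if start <= h < end:
                return node
        raise KeyError(key)

    def put(self, key: str, value) -> None:
        self.data[self.node_for(key)][key] = value

    def get(self, key: str):
        return self.data[self.node_for(key)][key]

    def halve(self, old_node: str, new_node: str) -> None:
        """Split old_node's hash range in two; only these two nodes are touched."""
        idx = next(i for i, r in enumerate(self.routes) if r[2] == old_node)
        start, end, _ = self.routes[idx]
        mid = (start + end) // 2
        # Step 1: replicate the upper half while old_node still serves all traffic.
        self.data[new_node] = {
            k: v for k, v in self.data[old_node].items() if key_hash(k) >= mid
        }
        # Step 2: switch the routing rule only after the data is duplicated.
        self.routes[idx:idx + 1] = [(start, mid, old_node), (mid, end, new_node)]
        # Step 3 (deferred): old_node still holds the moved keys, but nothing
        # routes to them any more, so they can be dropped whenever convenient.

    def cleanup(self, node: str) -> int:
        """Remove keys that no longer route to this node; returns how many."""
        stale = [k for k in self.data[node] if self.node_for(k) != node]
        for k in stale:
            del self.data[node][k]
        return len(stale)
```

The point of deferring `cleanup` is that the expensive part (moving data) and the risky part (switching traffic) are decoupled, and removing leftovers can wait for a quiet hour.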

Navigating among the shards

Needless to say, you need to send each request to the correct shard - this inevitably increases the complexity of the overall solution: some sort of (smart) Load Balancer/API Gateway (between consumer applications & sharded endpoint-exposing micro-monoliths) is required.

Many authorities advise against putting too much logic into gateways (e.g. Thoughtworks Tech Radar vol.18 - Platforms - "Overambitious API gateways"), but I find metadata-based routing totally acceptable for API Gateways (or "really smart" LBs).

What about the fact that routing is based on a domain property (a quantum of business information)? In this case, it's a conscious architecture design decision, not a temporary hack. "The system is X-centric" is also a domain-related fact & it sets some architectural limitations.

How would I implement it? My API Gateway:

would use a run-time configuration kept in an HA KV store like Consul
the configuration would be loaded at start-up time (of a gateway instance) ...
... and on demand, when the gateway is "pushed" by an authorized party calling a dedicated endpoint ("force refresh")
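
A rough sketch of how the routing part of such a gateway could look. To avoid pretending to know your KV store's client API, the store is hidden behind a plain `fetch_config` callable (in practice it would wrap a read from Consul or similar). The `ShardRouter` class, the `X-Tenant-Id` header, the JSON layout and the shared admin token are all hypothetical names picked for the example.

```python
import hmac
import json
from typing import Callable, Dict, Mapping


class ShardRouter:
    def __init__(self, fetch_config: Callable[[], str]) -> None:
        # fetch_config returns the routing config as a JSON string
        # (assumed shape: {"routes": {tenant: upstream}, "default": upstream}).
        self._fetch_config = fetch_config
        self._routes: Dict[str, str] = {}
        self._default = ""
        self.refresh()  # loaded at start-up of the gateway instance

    def refresh(self) -> None:
        raw = json.loads(self._fetch_config())
        # Build the new table completely before swapping it in, so a broken
        # config raises here and leaves the previous routes untouched.
        routes = dict(raw["routes"])
        default = raw["default"]
        self._routes, self._default = routes, default

    def upstream_for(self, headers: Mapping[str, str]) -> str:
        """Metadata-based routing: pick the shard from a request header."""
        tenant = headers.get("X-Tenant-Id")
        if tenant is None:
            raise ValueError("missing X-Tenant-Id header")
        return self._routes.get(tenant, self._default)

    def force_refresh(self, presented_token: str, admin_token: str) -> bool:
        """Body of the dedicated "force refresh" endpoint (authorized callers only)."""
        ok = hmac.compare_digest(presented_token.encode(), admin_token.encode())
        if ok:
            self.refresh()
        return ok
```

The gateway only maps a piece of request metadata to an upstream address; it doesn't inspect payloads or apply business rules, which (to me) keeps it on the right side of the "overambitious gateway" line.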