Serverlessness: what makes a service serverless

In the two previous posts, we covered the nature of "serverlessness" & how it's implemented in stateless, raw-compute services. Now it's time to raise the bar ...

Speaking of stateful services, let's start with something familiar to nearly everyone: a transactional, relational, ACID-compliant, SQL-based database. In our case, it will be Aurora Serverless v2 (ASv2). How "serverless" is it (apart from the name)?

First of all, such a DB typically consists of two major "building blocks" with different characteristics (in terms of scaling, failover, resources involved, etc.): compute and storage. They should be considered separately.

Storage (in ASv2) doesn't require manual pre-allocation, nor is it visibly split into separate "volumes" (physical or logical). It grows and shrinks automatically, depending on how much data you actually store. Are you paying just for the storage you use? Well, it depends on how you define "use": you are charged for all the data you keep, even if none of your queries ever read it. If you ask me, that makes perfect sense: the underlying disks are occupied (by you), so they can't be used by anyone else until you free the space. You're also paying (separately) for I/O, but this part doesn't seem controversial at all (a fixed price per million operations).
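To make the storage side concrete, here is a minimal Python sketch of the billing model just described: every stored GB is charged whether or not any query reads it, and I/O is charged at a flat rate per million operations. The function name and both prices are hypothetical placeholders, not actual AWS rates or APIs.

```python
def monthly_storage_bill(stored_gb: float, io_requests: int,
                         price_per_gb_month: float,
                         price_per_million_io: float) -> float:
    """Sketch of a storage + I/O bill (HYPOTHETICAL prices, not AWS rates).

    Every GB kept is charged, read or not; I/O is a flat rate per million ops.
    """
    storage_cost = stored_gb * price_per_gb_month
    io_cost = (io_requests / 1_000_000) * price_per_million_io
    return round(storage_cost + io_cost, 2)


# 500 GB kept all month (even if queries only ever touch a fraction of it)
# plus 200 million I/O requests, at made-up rates of 0.10 and 0.20.
print(monthly_storage_bill(500, 200_000_000, 0.10, 0.20))  # 90.0
```

Note that the storage term depends only on how much you keep, never on how much of it you read.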

What about the compute "component"? It should be more ephemeral, shouldn't it? The initial impression is similar to Fargate: there are "virtual" units (ACUs), each bound to a given amount of RAM and a corresponding amount of CPU (a linear dependency). However, there are some twists:

You can't scale down to zero (min ACU is 0.5).
The granularity of scaling up and down is not bad (0.5 ACU), but the maximum capacity is 128 ACU (each writer and reader is scaled separately).
Autoscaling (both up & down) is fully automated, based on internal Aurora metrics.
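The bounds and granularity listed above can be sketched as a small Python function. This is only an illustration under stated assumptions: Aurora's real scaling decisions are driven by internal metrics, so `demanded` here is a hypothetical input standing in for whatever capacity the workload needs, and the function name is made up.

```python
import math

ACU_STEP = 0.5       # scaling granularity
ACU_FLOOR = 0.5      # no scale-to-zero
ACU_CEILING = 128.0  # hard cap (per writer/reader)


def target_acus(demanded: float, min_acu: float = ACU_FLOOR,
                max_acu: float = ACU_CEILING) -> float:
    """Round a HYPOTHETICAL capacity demand up to the next 0.5-ACU step,
    then clamp it to the configured range (itself bounded by 0.5..128)."""
    lo = max(min_acu, ACU_FLOOR)
    hi = min(max_acu, ACU_CEILING)
    stepped = math.ceil(demanded / ACU_STEP) * ACU_STEP
    return min(max(stepped, lo), hi)


print(target_acus(0))    # 0.5  -> idle, but still allocated (and billed)
print(target_acus(3.2))  # 3.5  -> rounded up to the next 0.5 step
print(target_acus(300))  # 128.0 -> capped
```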

The consequences are not hard to grasp: you pay something even when your instance appears idle (no queries running); because scaling is fully automated, your costs are never entirely predictable (unless you set very narrow ACU limits); and the hard ACU cap on scaling up means that at some point you may need to shard the DB on your own or re-architect it in some other way.
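The cost side of these consequences is simple arithmetic: the configured ACU range gives you a monthly floor (you pay at least the minimum even when idle) and a ceiling, and the gap between them is the unpredictability. A minimal Python sketch, assuming a hypothetical price per ACU-hour and a 730-hour month (a common approximation); neither is an official AWS figure.

```python
HOURS_PER_MONTH = 730  # approximation: 8760 hours / 12


def monthly_compute_envelope(min_acu: float, max_acu: float,
                             price_per_acu_hour: float) -> tuple[float, float]:
    """Floor: the instance sits at min_acu all month (even with zero queries).
    Ceiling: it sits at max_acu all month. Price is HYPOTHETICAL."""
    floor = min_acu * price_per_acu_hour * HOURS_PER_MONTH
    ceiling = max_acu * price_per_acu_hour * HOURS_PER_MONTH
    return round(floor, 2), round(ceiling, 2)


# Wide limits: a low floor, but a large spread between best and worst case.
print(monthly_compute_envelope(0.5, 16, 0.10))  # (36.5, 1168.0)
# Narrow limits: predictable, at the cost of flexibility.
print(monthly_compute_envelope(2, 2.5, 0.10))   # (146.0, 182.5)
```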

Where do these limitations come from? From the architecture of strongly consistent relational databases. For instance, a DB runs various processes that don't serve your queries directly but are still essential: estimating the costs of operations (for the query optimizer), storage optimization (incl. index rebalancing), transaction log management, etc. These processes have to be there (available & very responsive) regardless of DB size or traffic. The max ACU cap is a different kind of challenge, related to the maximum size of the physical servers behind Aurora: to remain both strongly transactional and very fast, transactions have to be local, with every possible source of latency removed from the commit protocol.

These constraints make sense for an ACID-compliant OLTP database (like Aurora), but what about an OLAP one (like Redshift)? OLAP databases have their own specific characteristics (e.g., the number of transactions is incomparably smaller, locking/isolation levels are simplified, and response-time expectations are far more liberal, but the amount of data to be scanned is orders of magnitude greater).

There's nothing exciting about Redshift's storage pricing: you don't pay for I/O, but you're charged for the disk space you use. That's what one would reasonably expect.

The truly interesting part is (again) the compute "component". Just like in Aurora's case, a virtual unit (RPU) is used to measure the compute resources consumed by Redshift. However, there are subtle differences between how RPUs and ACUs work:

You don't pay for idle Redshift "compute", i.e., when there are no queries (it scales down to zero).
When there's activity (queries, ingestion, etc.), at least 8 RPUs are allocated (so the minimum threshold is much higher than for ASv2, especially keeping in mind that an ACU is roughly three times cheaper than an RPU ...).
Scaling is coarser: the increments are 8 RPUs, but capacity can go as high as 512 RPUs.
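The same kind of sketch for Redshift shows how the rules above differ from ASv2's: zero when idle, then a floor of 8 RPUs and 8-RPU steps up to 512. As before, `demanded` and the function name are hypothetical illustrations rather than Redshift's actual scaling logic; the price ratio is the rough "three times" figure quoted above.

```python
import math

RPU_STEP = 8
RPU_MIN_ACTIVE = 8
RPU_MAX = 512
RPU_TO_ACU_PRICE_RATIO = 3.0  # rough figure: an ACU is ~3x cheaper than an RPU


def allocated_rpus(demanded: float, active: bool) -> int:
    """HYPOTHETICAL model: 0 when idle, otherwise 8-RPU steps within [8, 512]."""
    if not active:
        return 0  # scales down to zero: nothing allocated, nothing billed
    stepped = math.ceil(demanded / RPU_STEP) * RPU_STEP
    return min(max(stepped, RPU_MIN_ACTIVE), RPU_MAX)


# Cheapest *active* hour on Redshift vs. the cheapest hour on ASv2 (0.5 ACU),
# expressed in ACU-price units: 8 * 3 / 0.5
min_active_ratio = (RPU_MIN_ACTIVE * RPU_TO_ACU_PRICE_RATIO) / 0.5
print(min_active_ratio)  # 48.0
```

In other words, once it's awake, the smallest Redshift Serverless footprint costs roughly 48 times as much per hour as the smallest ASv2 one; the trade-off is that Redshift costs nothing while idle.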

So, what do you think: is Redshift Serverless still "serverless"? Are the differences justified by engineering common sense & the use cases specific to analytical databases (aka data warehouses)?

Not charging for an idle cluster seems logical, but there's a "price to pay": increased response time while the compute resources are "re-provisioned". However, waiting for resources to warm up is typically acceptable for users who don't expect a response in milliseconds, which is the case when you're generating a report or running some other analysis. And the higher minimum RPU is justified by the characteristics of OLAP queries: aggregations run lock-free over large volumes of data & are parallelized by design.
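What "no charge for idle" means for a bursty analytical workload can be shown with a short Python sketch: only the periods with activity contribute RPU-hours. The workload (two report runs in a day) is a hypothetical example, and the sketch assumes usage is billed in proportion to time; real billing granularity and any minimum charges are not modeled.

```python
def rpu_hours(active_periods: list[tuple[int, float]]) -> float:
    """Sum RPU-hours over (rpus, hours) periods with activity.

    Idle periods are simply absent from the list, because they cost nothing.
    """
    return sum(rpus * hours for rpus, hours in active_periods)


# HYPOTHETICAL day: one 30-minute report at 8 RPUs, one 15-minute run at 32 RPUs.
bursty_day = [(8, 0.5), (32, 0.25)]
print(rpu_hours(bursty_day))  # 12.0

# For comparison: keeping even the 8-RPU minimum warm for all 24 hours.
print(8 * 24)  # 192
```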

At the same time, the essential qualities of "serverless" services seem to be in place. The only major one that is missing (IMHO) is being virtually limitless when it comes to scaling. A hard cap (128 ACU for Aurora, 512 RPU for Redshift) may, at some point, become a painful restriction. We can assume that the cap exists because of the origin of both services: neither was initially designed as serverless. But is it really possible to build a serverless, virtually unlimited ACID-compliant OLTP or analytical OLAP database from scratch?

This is a topic for an entirely separate blog post, but as I don't want you to leave without any answer, I'll share my opinion:

For ACID OLTP DBs, I believe it is NOT possible if you want to keep the performance level of existing top-tier DBs AND you're not open to scale-out-oriented enhancements/constraints (e.g., sharding built into DDL/DML).
For OLAP DBs, I'm strongly convinced it IS possible (examples: Athena, the whole Hadoop ecosystem, Ray ...), but it always comes at a price: either limited flexibility (in what you can do) or processing overhead (which may scale very poorly).
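To illustrate why sharding "built into DDL/DML" is a real constraint rather than a detail, here is a minimal Python sketch of hash partitioning, one common sharding scheme (not the mechanism of any particular database). The function names are hypothetical. The key point: a statement whose keys all land on one shard can commit locally, while one that spans shards needs a distributed commit, which brings back exactly the kind of latency that strongly consistent OLTP engines try to keep out of the commit path.

```python
import hashlib


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(statement_keys: list[str], num_shards: int) -> dict[int, list[str]]:
    """Group the shard keys touched by one statement by target shard.

    One shard -> the transaction can stay local.
    Several shards -> a distributed (multi-node) commit is required.
    """
    shards: dict[int, list[str]] = {}
    for key in statement_keys:
        shards.setdefault(shard_for(key, num_shards), []).append(key)
    return shards


touched = route(["customer-17", "customer-17"], num_shards=4)
print("local commit" if len(touched) == 1 else "distributed commit")  # local commit
```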

I think it's high time to wrap up the whole series:

My goal was to explain why we should not expect everything "serverless" to be some sort of reflection of a Lambda function. E.g., it'd be naive to imagine serverless DBs as "query-as-a-service". There ARE DBs that work like that (DynamoDB), but this model works only for very simple engines (KV, simple indexing, no joins, etc.); it would not work for a PostgreSQL equivalent.

It makes much more sense to define "serverless" by the following qualities:

no servers to manage
fully automated scaling (up & down) driven by service consumption
you're charged for what you consume (or allocate)

IMHO, it's a fair definition that nails the operational advantages of "serverless" services.
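The definition can be written down as a simple check to show what it deliberately leaves out. In this Python sketch, the traits for both services come from the discussion above; the class and function names are hypothetical. Scale-to-zero and the capacity cap are recorded but are not part of the test.

```python
from dataclasses import dataclass


@dataclass
class ServiceTraits:
    name: str
    no_servers_to_manage: bool
    auto_scales_up_and_down: bool
    billed_for_consumption_or_allocation: bool
    scales_to_zero: bool  # recorded, but NOT part of the definition
    max_units: float      # recorded, but NOT part of the definition


def is_serverless(s: ServiceTraits) -> bool:
    """The three qualities of the definition, and nothing else."""
    return (s.no_servers_to_manage
            and s.auto_scales_up_and_down
            and s.billed_for_consumption_or_allocation)


asv2 = ServiceTraits("Aurora Serverless v2", True, True, True, False, 128)
redshift = ServiceTraits("Redshift Serverless", True, True, True, True, 512)
print([is_serverless(s) for s in (asv2, redshift)])  # [True, True]
```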

And what about the most frequently raised controversies ("does not scale down to zero", "scales up & down in virtual units of step granularity")? Personally, I believe there is a good engineering rationale behind them, especially given that the serverless versions of these services weren't created from scratch but adapted from server-bound versions.