Serverlessness: what makes the service serverless

The previous post in the series (where we covered the definition of "serverlessness" and how Lambda functions correspond to it) can be found here.

A very similar idea of "serverlessness" has been applied in another compute service - AWS Fargate (or, to be more precise: ECS on Fargate). What's the difference? Instead of functions, one runs tasks based on container images. But the general idea stays the same - for each task, you specify:

which container image should be run
resource consumption per task instance
the basic rules that determine the number of task instances running in parallel (min, max, autoscaling conditions, etc.)
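To make the three items above more tangible, here is a minimal sketch in Python. It only builds plain dictionaries shaped like the request payloads that, to my understanding, boto3 accepts for `register_task_definition` (ECS) and for `register_scalable_target` / `put_scaling_policy` (Application Auto Scaling); it does not call AWS. The function name `fargate_service_spec`, the names `demo-task`, `demo-cluster`, `demo-service`, and `demo-cpu-target`, and all example values are assumptions made up for illustration. A real task definition would typically need more (an execution role, port mappings, logging).

```python
def fargate_service_spec(image, cpu_units, memory_mib,
                         min_tasks, max_tasks, target_cpu_percent):
    """Build payloads describing a Fargate task and its scaling rules.

    Nothing is sent to AWS; the dictionaries are only returned.
    All resource names below are hypothetical placeholders.
    """
    # 1. Which container image should be run, and
    # 2. resource consumption per task instance.
    task_definition = {
        "family": "demo-task",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",       # the network mode Fargate tasks use
        "cpu": str(cpu_units),         # CPU units, 1024 units = 1 vCPU
        "memory": str(memory_mib),     # MiB
        "containerDefinitions": [
            {"name": "app", "image": image, "essential": True},
        ],
    }

    # 3. Rules for the number of task instances running in parallel.
    resource_id = "service/demo-cluster/demo-service"
    scalable_target = {
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }
    scaling_policy = {
        "PolicyName": "demo-cpu-target",
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    }
    return task_definition, scalable_target, scaling_policy


if __name__ == "__main__":
    # Hypothetical image reference and sizes, chosen only for the example.
    spec = fargate_service_spec("example/demo-app:1.0", 256, 512, 1, 4, 60)
    for payload in spec:
        print(payload)
```

Note that nowhere in this description does a server, instance type, or host appear - only the image, the per-task resources, and the scaling boundaries.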

Does it mean ECS on Fargate works exactly the same way Lambda does? Nope, (obviously) the devil is in the details. A Lambda function invocation's lifespan is bound to the life of a single request. ECS containers are (generally) long-lived. One spins them up either to perform long-running jobs or to handle stable, predictable parallel request traffic over a long time. That has two significant implications:

If you calibrate the configuration of your long-running containers poorly, you may end up paying for heavily underutilized resources (especially if there's a period of zero traffic).
The spin-up time of new container resources is significantly longer (than in the case of lightweight Lambda) - that's why scaling down to zero is frequently a no-go (the first requests after an idle period would have to wait for a container to start, which would create a very long tail in the latency distribution).
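The first implication is easy to put into numbers. The sketch below is plain arithmetic, not a pricing calculator: the hourly rate is an invented placeholder (it is NOT an actual AWS price), and the function names and traffic figures are assumptions chosen only to show how idle time inflates the effective cost of the hours that do useful work.

```python
# Hypothetical price of one running task per hour. This is a made-up
# number for illustration only, not a real Fargate rate.
ASSUMED_RATE_PER_TASK_HOUR = 0.05


def always_on_cost(task_count, hours, rate_per_task_hour):
    """Cost of keeping `task_count` tasks running for `hours`,
    regardless of whether they receive any traffic."""
    return task_count * hours * rate_per_task_hour


def cost_per_busy_hour(task_count, hours, busy_hours, rate_per_task_hour):
    """Total cost divided by the hours in which the tasks did real work."""
    if busy_hours <= 0:
        raise ValueError("busy_hours must be positive")
    total = always_on_cost(task_count, hours, rate_per_task_hour)
    return total / busy_hours


if __name__ == "__main__":
    hours_in_month = 720   # 30 days
    busy_hours = 180       # assumed: traffic only a quarter of the time
    total = always_on_cost(2, hours_in_month, ASSUMED_RATE_PER_TASK_HOUR)
    effective = cost_per_busy_hour(2, hours_in_month, busy_hours,
                                   ASSUMED_RATE_PER_TASK_HOUR)
    idle_share = 1 - busy_hours / hours_in_month
    print(f"total: {total:.2f}, per busy hour: {effective:.2f}, "
          f"idle share: {idle_share:.0%}")
```

Under these assumed numbers, three quarters of the bill pays for containers that are waiting for traffic, which is exactly the kind of cost a per-request model like Lambda's does not incur.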

Does that mean that ECS on Fargate is not "serverless" then? Hmm, I wouldn't say that. The continuous fabric is still there (when allocating tasks, one doesn't have to worry about individual servers' "boundaries"), and one doesn't have to provision, operate, or patch/update any servers. Could ECS on Fargate be "more serverless"? Well, in theory, AWS could have charged for ECS on Fargate "per request" and automatically overprovisioned ECS clusters with "hot" standby resources - however:

One can do many things (on containers) besides serving HTTP requests - what would be a universal atomic unit of processing then?
In the end, someone would have to cover the cost of overprovisioning - would it really be better if AWS hid these costs and simply blended them into its rates? The reasonable alternative is to set up proper auto-scaling (which is on ECS consumers, as they are the only ones who know the specifics of their applications).
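What "proper auto-scaling" boils down to can be sketched with a simple proportional rule: scale the task count so that the observed metric moves back toward its target, then clamp the result to the configured minimum and maximum. This is a simplified illustration and not the algorithm AWS actually uses; the function name `desired_task_count` and the example numbers are assumptions.

```python
import math


def desired_task_count(current_tasks, observed_metric, target_metric,
                       min_tasks, max_tasks):
    """Simplified proportional scaling rule (an illustration only).

    If the tasks run hotter than the target, add tasks in proportion;
    if they run cooler, remove some. The result never leaves the
    [min_tasks, max_tasks] range.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    if min_tasks > max_tasks:
        raise ValueError("min_tasks must not exceed max_tasks")
    if current_tasks <= 0:
        # No running tasks means no metric to scale from,
        # so fall back to the configured floor.
        return min_tasks
    raw = math.ceil(current_tasks * observed_metric / target_metric)
    return max(min_tasks, min(max_tasks, raw))


if __name__ == "__main__":
    # 4 tasks at 90% average CPU with a 60% target: scale out.
    print(desired_task_count(4, 90, 60, min_tasks=1, max_tasks=10))
    # Almost no load: scale in, but never below the floor of 2 tasks.
    print(desired_task_count(4, 15, 60, min_tasks=2, max_tasks=10))
```

The `min_tasks` floor is where the application-specific knowledge comes in: keeping it above zero trades some idle cost for avoiding the slow container start on the first request.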

OK, that was (relatively) easy. Why so? Compute services are typically stateless, and their scale-out mechanisms are generally straightforward and flexible. But what about stateful services, which are all about their persistent state?

The final post in the series, where we take a look at stateful serverless services (including Aurora, Redshift, and DynamoDB), is available here.