Nothing happens without reason. The same applies to the origin of this post, which was triggered by two independent events.

First of all, some time ago, I visited a local meet-up dedicated to software architecture. The main topic of the evening was "serverless", but for some reason, in the participants' opinion, "serverless" meant just Lambda functions - everyone who spoke up used the two terms interchangeably. I am OK with a good discussion that dives deep into Lambda functions alone - there's certainly a lot to cover there - but I found that oversimplification a bit surprising.

Additionally, it was hard to miss the wave of community comments that followed last year's announcements of new AWS services that have "Serverless" in their names, e.g., MSK Serverless, OpenSearch Serverless, or Redshift Serverless. Commenters were initially very enthusiastic about the idea, but (after diving into the details) many questioned whether the aforementioned services are genuinely serverless. Some went as far as to call such names a misleading marketing trick.

I don't know about you, but to me, this sounds like an excellent opportunity to look closer at the term "serverless". What does it truly mean, what are the major criteria one has to meet to be "serverless", and how can we assess the amount of "serverlessness" (if it's not a binary measure ...)? Just one clarification before we jump into details: this is by no means an official statement/definition from Amazon Web Services (my current employer). The opinions and mental models used below are mine and should be treated as private.


We could have started by coining a theoretical definition, but let's try a different approach. If Lambda functions are indeed so "serverless", let's capture their properties to extract the essence of "serverlessness". Unfortunately, there's a small (but important) gap between how people think Lambda functions work and how they really do. Why so? Many people have heard/read about Lambda functions, but significantly fewer actually use them in real production scenarios (the reasons are a topic for a completely separate discussion ...).

So how do people imagine Lambda functions work?

"I deploy the function with its runtime dependencies. The function is exposed via some endpoint for the callers. Each invocation is given (automatically) the resources (memory and computing power) it needs. Many invocations can happen at the same time (in parallel) - I don't need to worry about the number (unless there's a dependency that is not "serverless", e.g., a shared connection pool). I pay only for resources my function invocations consume (during the period of consumption)."

Nice, but it needs a bit more accuracy. We'll omit several details completely (e.g., cold/warm start times, concurrency limits, etc.), but there's one we should fix on the spot: one actually has to decide how much "horsepower" is under the hood - by specifying the amount of memory available (for each instance). What about CPU? It's linearly proportional to memory (at a fixed ratio). So it's still possible to over-provision a Lambda function, but only for the duration of its execution - one may not notice that because of Lambda functions' granularity.
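To make that coupling concrete, here's a back-of-the-envelope sketch of the pay-per-use model: cost is memory (which also buys CPU) times billed duration times a rate. The rate below is an illustrative assumption of mine, not current AWS pricing - check the official pricing page for real numbers:

```python
def billed_cost(memory_mb: int, duration_ms: int, rate_per_gb_second: float) -> float:
    """Cost of a single invocation: memory (GB) x duration (s) x rate."""
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * rate_per_gb_second

# Illustrative rate only - an assumption for this example:
RATE = 0.0000167  # USD per GB-second

# Doubling the memory doubles the CPU share *and* the per-millisecond
# cost - so an over-provisioned function pays for headroom it never
# uses, just for a window so short it's easy not to notice.
cost_small = billed_cost(512, 200, RATE)
cost_large = billed_cost(1024, 200, RATE)
assert abs(cost_large - 2 * cost_small) < 1e-12
```

This is also why "right-sizing" memory matters even in a "serverless" world: the over-provisioning window is per-invocation rather than per-server, but it still exists.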

To illustrate the concept of Lambda, one can imagine an "infinite" (well, nothing is truly infinite ...), continuous computing plane ("fabric"?) where the functions are running. Each execution of a function instance "carves" out a piece of that fabric, uses it exclusively for the duration of the run and returns it (for others to use) when it's over. Those pieces are so small that their individual sizes are meaningless in the context of the whole "fabric" capacity. See? No servers involved, (almost) nothing to provision/patch/manage (except dependencies and runtime), only this "magical, continuous, infinite computing fabric".

Why "continuous"? Good question, thank you (it's a bit awkward debating with myself ...). The fabric you allocate from has no visible "seams". You don't indicate where you want to allocate stuff, and you're never told: "you know, this bale of fabric is already disposed of, try another one". It probably doesn't come up as a surprise, but the continuous compute fabric is just an abstraction. It's all running on servers (managed by AWS), but Lambda's control plane takes care of the effective allocation (e.g., to minimize "leftover" resources that can't be utilized) - it's completely transparent for you as a Lambda user.


The next post in the series, which covers other serverless computing options, is available here.
