I frequently get questions about my opinion on the future of computing. Can anyone threaten the position of the (current) hyper-scalers? Hasn't the progress of computing (from a technological standpoint) become predictable and ... boring? Aren't the times of exciting breakthroughs over?
Gosh, I don't think I have any mandate to claim to be an authority here. But my SWAG (Scientific Wild A$$ Guess) is that the exciting times are just around the corner. The forthcoming wave of new tech has the potential to reshuffle the market thoroughly. And no, I'm not referring to any kind of blockchain-powered dapps (decentralized apps).
OK, so what's coming?
The "current era's" scalable platforms are already built on distributed systems' principles, of course. But the technology that powers them has been shaped about ten years ago (in some cases even more). The solutions we use (databases, message brokers, computing engines, etc.) were optimized for the hardware constraints of the early 2010s (at best). However, much has changed since then ...
If you compare how different computer components have progressed over the last ten years, you'll notice the pace hasn't been equal everywhere:
- multicore has transitioned to manycore (128+ cores ain't such a big deal these days), but the performance of a single core hasn't changed significantly
- RAM prices have kept dropping (while module capacities have kept growing), but there has been no performance breakthrough here
- network speeds (within your own infrastructure, not on the public Internet) have increased tremendously - NICs that support 100 Gbps Ethernet are not a novelty anymore
- the biggest revolution has happened in persistent storage, though - in both speed and price ($$$/GB); but (and this is what many people miss) this revolution isn't just about SSDs replacing traditional HDDs - one should not overlook the impact of NVM Express (NVMe)
There's an adorable synergy here. We have more CPU cores than ever, and finally (thanks to NVMe) they can exploit the internal parallelism of SSDs (something that was not possible with AHCI, which was designed for traditional, mechanical disk drives). NVMe supports multiple command queues and lock-free parallelism with full-duplex data transmission - storage I/O has stopped being the unquestionable constraint in computing.
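To make that less abstract, here's a minimal sketch in Go (an illustration, not a benchmark) of what exploiting that parallelism can look like from user space: keeping many reads in flight at once lets the OS spread requests across the device's command queues, which a single synchronous reader never would. The file path, block size, and in-flight count are made-up placeholders.

```go
// Minimal sketch: many concurrent reads against an NVMe-backed file.
// The path, block size, and "queue depth" below are purely illustrative.
package main

import (
	"fmt"
	"os"
	"sync"
)

func main() {
	const (
		blockSize = 1 << 20 // 1 MiB per read
		inFlight  = 64      // rough analogue of NVMe queue depth
	)

	f, err := os.Open("/mnt/nvme/data.bin") // hypothetical file on an NVMe drive
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var wg sync.WaitGroup
	for i := 0; i < inFlight; i++ {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			buf := make([]byte, blockSize)
			// Each goroutine reads a different region; the kernel can issue
			// these requests in parallel across the device's command queues.
			if _, err := f.ReadAt(buf, int64(idx)*blockSize); err != nil {
				fmt.Println("read failed:", err)
			}
		}(i)
	}
	wg.Wait()
}
```

With a mechanical drive, piling up requests like this mostly just lengthens a queue; with an NVMe SSD, it's how you actually get close to the advertised throughput.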
A side note: yes, I've omitted GPUs here. They are relatively new and have their own (specific) applicability, which is outside the scope of this article (a different kind of computing).
Shifting up a gear
OK, great. But what does that mean IN PRACTICE?
Pardon the oversimplification here, but the CPU was rarely a bottleneck in systems designed in the 2010s. The two main constraints in those days were storage (disk) I/O and network throughput. These two were so slow that the performance of non-I/O code was pretty much irrelevant. So there was little reason for:
- sophisticated concurrency models
- machine-code-level optimizations (in lower-level languages, like C or even assembler)
No offense intended here, but that's why so many high-performing (by the standards of the time) solutions were created on the JVM. :)
Now the rules of the game are changing. With NVMe available, there's no point in wasting good CPU cycles (or even whole cores). It's an open invitation for highly concurrent async I/O, preferably on a fast, compiled tech stack (like C/C++, Golang, or Rust) - to make sure the CPU doesn't become the bottleneck (OMG, how does that even sound?!).
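As a rough illustration of that idea (a sketch only - readChunks and processChunk are hypothetical placeholders, not any real API), here's the kind of pipeline such a stack invites: dedicated I/O goroutines keep the drive busy while a pool of CPU workers chews through the results, so neither side sits idle waiting for the other.

```go
// Sketch of overlapping I/O and computation via a buffered channel.
package main

import (
	"runtime"
	"sync"
)

// processChunk stands in for CPU-heavy work (parsing, compression, aggregation, ...).
func processChunk(chunk []byte) { _ = chunk }

// readChunks stands in for the async I/O side (e.g., the parallel reads above);
// each finished read is handed off without blocking further I/O.
func readChunks(out chan<- []byte) {
	for i := 0; i < 256; i++ {
		out <- make([]byte, 1<<20)
	}
	close(out)
}

func main() {
	chunks := make(chan []byte, 64) // buffered hand-off between the I/O and CPU sides

	go readChunks(chunks)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range chunks {
				processChunk(c)
			}
		}()
	}
	wg.Wait()
}
```

The same shape translates naturally to C++ or Rust - the point is the overlap of I/O and computation, not the particular language.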
To put it bluntly - a modern computing system (database, processing engine, etc.) built from scratch with modern hardware in mind can easily beat its predecessors (performance-wise) by a factor of 3, 5, maybe even 10 (!). The critical point here is that it can't be a simple update or a gradual evolution - it has to be a new product or pretty much a complete rewrite (as the changes touch fundamental aspects of the back-end architecture).
How to make money on OSS, v.2.0
It's an unprecedented opportunity for all the contenders, a brand new opening with massive stakes. And what's even more interesting, the existing moguls have pretty much no advantage - it's a virtual reset and a chance for "pole position" for everyone with skills, talent, and A2E.
The leading software architectures of today are built on a foundation of OSS products. There's no reason for this trend to stop. But to make the competition even spicier, the OSS model isn't what it was 10 or 15 years ago.
The smart people in the community aren't that eager to commit their time and effort to developing a freely available product, only for BigTech to monetize it at scale. These folks have learned their lesson. More and more companies are being built around OSS projects - they not only drive (and control) the growth of the project but also shape a unique commercial offering on top of it. To secure their business interests, they use new, more protective licensing (you can find a good example here).
How long will we have to wait for this new wave of high-performance products? Well, the first ones are apparently already here:
- ScyllaDB - https://www.scylladb.com/open-source-nosql-database/
- Redpanda - https://github.com/vectorizedio/redpanda
- ClickHouse - https://github.com/ClickHouse/ClickHouse
What's particularly interesting, many of these new challengers reuse the public APIs of already existing, successful products (e.g., ScyllaDB -> Cassandra, Redpanda -> Kafka) to simplify adoption (by becoming pretty much a drop-in replacement for the sub-par incumbent).
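To show how literal that "drop-in" claim is (a minimal sketch, assuming a local Redpanda or Kafka broker listening on localhost:9092; the topic name is a placeholder), a completely standard Kafka client - here the popular sarama library for Go - produces to Redpanda simply by being pointed at its address, with no vendor-specific code at all:

```go
// Minimal sketch: a plain Kafka producer that happens to talk to Redpanda.
// Broker address and topic are placeholders.
package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.Return.Successes = true // required by SyncProducer

	// "localhost:9092" is whatever your Redpanda (or Kafka) broker listens on.
	producer, err := sarama.NewSyncProducer([]string{"localhost:9092"}, cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	partition, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "events",
		Value: sarama.StringEncoder("hello from a plain Kafka client"),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("stored at partition %d, offset %d", partition, offset)
}
```

The consumer side works the same way, which is exactly what turns switching the underlying technology into a configuration change rather than a migration project.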
Exciting times ahead, without any doubt.