According to data shared by GitHub (e.g., here), programming activity is booming online: not only are there more developers, but they also create more code. Unsurprisingly, this trend is mostly attributed to the rise of LLMs (in all flavors: from "autocomplete on steroids" to agentic coding).

This seems like good news for the whole community, especially for its social coding and open source part, but let’s try to speculate (yes, it will be some sort of coffee-ground fortune-telling) on the longer-term effects - what it means beyond the here & now, once the rules of the game adjust to the new dynamics. Spoiler: the news may not be as positive as I initially thought ...


Disclaimer: I’ll focus here on the impact on Open Source in the context of “traditional” software, but the final paragraph will be dedicated to so-called “Open Source LLM models” and their potential future.

Contributors

I assume that agentic coding will gradually improve over time, until there’s barely any need for humans to write code. With the constraint of manual code-crafting removed from the critical chain of building software (because LLMs will write code not just cheaply, but also very quickly), our attention will move to new bottlenecks (in line with Goldratt’s Theory of Constraints) - deliberate design, business validation, product experimentation - and to new activities aimed at securing/controlling LLMs’ output (review, creating guardrails & evals, crafting LLM-friendly specifications, etc.).


What will this mean for Open Source? I daresay the approach to contributors will suffer a tectonic shift:

  • The value of code contributions will drop to near-zero - in fact, unprocessable avalanches of PRs are already effectively paralyzing projects' core teams.
  • Without “typical” contributions, it will be much harder (especially for newbies) to get “buy-in” (stand out, credentialize oneself & build trust) in the global community. How will you be able to tell if someone 1/ has good intentions, 2/ is well-aligned with the vision, and (last but not least) 3/ has the ability to execute?
  • That will have a knock-on negative effect → a loyal pool of regular contributors has always been a backup plan in case some members of the core team dropped off. Losing that pool may make OSS projects far more fragile than they are now.

That’s not all.

LLM-cloning

Until now, people have been able to fork a successful repo and develop it their way (feature-wise, architecture-wise, etc.), but that has always come with a significant future-maintenance cost (merging changes from upstream) - the more you diverge, the higher the effort (especially if you decide to cherry-pick). That was one of the reasons people preferred contributing to the core project - thereby building a positive flywheel effect for good projects (good == worth contributing to).

But now the reality is changing - why contribute to the project if you can easily (& near-instantly) recreate it (as many times as you want) with LLMs, even in a different programming language (if you fancy)? That may dramatically increase the fragmentation of the OSS ecosystem, or, even worse, cause people to reconsider making projects Open at all, as they’ll be afraid that everyone will LLM-clone their work with zero attribution (& potentially change licenses).

Potential fragmentation (imagine thousands of Redis-alikes instead of just one Redis) may change the perception of OSS: a big part of the trust (that OSS is rock-solid, battle-proven, and safe) was based on projects’ popularity and widespread use - but this may not be the case anymore ...

Input

OSS was built on a foundation of passion and commitment of individuals who were not told what to do. They kept sacrificing their own free time because: 1/ it was fun; 2/ they were curious; 3/ they believed what they were doing was meaningful. The word “time” is essential here. They could have spent this time on various other things, like riding a bike, socializing, playing the guitar, dancing, or playing board games, but they decided OSS contribution was the most worthwhile (for them personally).

But apparently, creating software won’t be a function with just one input (time) anymore. We’re getting a second, maybe even more important input variable: tokens. Will people be so eager to spend this precious (& potentially scarce) resource to contribute to OSS? I strongly believe that:

  • When it comes to LLM development, we’ve reached the part of the curve where the law of diminishing returns kicks in strongly - each new generation of models yields smaller quality gains for the same (or higher) investment - so Gen AI-assisted development won’t improve much just by introducing newer models (unless we encounter yet another breakthrough - like Transformers in the past - but that is never guaranteed).
  • What matters now is the depth of the context and a more structured approach to planning, governance & validation - both of which require more & more tokens! The quality of the LLM’s output will strongly depend on how many tokens we’ve dedicated to the task.

That means that (at least for some time) we’ll never have “enough” tokens - we’ll continuously be making tough decisions about where & when to save (use fewer tokens or a more token/context-frugal model). In such a reality, who’ll be eager to spare some of their stashed token pool for OSS?

“Open” models

Hey, wait - but there are also open Gen AI models - aren’t they? Isn’t it the future of OSS? Or at least doesn’t it mean that the openness will be preserved in some form?

I think I’ve already written about that more than once: “Open Source models” are a lie. Something like that does not exist. What many people call (deliberately or mechanically) “Open Source models” are, at most, open-weight models with an option of self-hosting (under a given license, so not necessarily free).

What does this mean in practice?

  1. You can’t re-create the model from scratch (yes, that would be VERY expensive), as you don’t have the data (in fact, you don’t even know where the data came from ...) or the software needed for pre-training, training, and post-training.
  2. You can use open weights to fine-tune the model for your specific case or to keep it up to date. That’s a delicate operation (it can degrade quality as well!) that doesn’t modify the model itself, but adds another “layer” on top. But that’s pretty much the only way you can "contribute" to such “Open Source models”.
  3. Contrary to traditional OSS, while using or even fine-tuning the model, you don’t learn anything regarding how to build a similar model.
  4. If the model’s publisher decides they won’t publish new versions of their model anymore, you’re cooked. You can’t develop new versions on your own (fine-tuning isn’t really a new version - it’s optimizing for a purpose).
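The “layer on top” from point 2 can be sketched with a toy low-rank adapter, the mechanism behind LoRA-style fine-tuning. This is a minimal numpy sketch with made-up shapes and values (no real model or library API): the frozen matrix stands in for an open-weight layer, and only the small adapter matrices are ever “trained”.

```python
import numpy as np

# Toy illustration of low-rank adapter fine-tuning (LoRA-style).
# W_base stands in for one frozen layer of an open-weight model;
# the adapter pair (A, B) is the extra "layer" you train on top.
# All shapes and values below are illustrative assumptions.

rng = np.random.default_rng(0)

d_in, d_out, rank = 8, 8, 2               # rank << d_in: the adapter is tiny

W_base = rng.normal(size=(d_in, d_out))   # frozen: never modified
A = rng.normal(size=(d_in, rank)) * 0.01  # trainable down-projection
B = np.zeros((rank, d_out))               # trainable up-projection (starts at 0)

def forward(x):
    # Effective behavior = frozen base + low-rank delta (A then B).
    return x @ W_base + (x @ A) @ B

x = rng.normal(size=(1, d_in))
W_snapshot = W_base.copy()

# "Fine-tuning" here is just nudging the adapter; the base stays intact.
B += rng.normal(size=B.shape) * 0.1
y = forward(x)

assert np.array_equal(W_base, W_snapshot)  # base weights untouched
print(y.shape)
```

This is why fine-tuning doesn’t amount to “developing a new version” of the model: the adapter only steers the frozen base toward a purpose, and everything the base doesn’t already contain stays out of reach.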

All that has very strong consequences. Contributors & users can’t solve any meaningful problems for the models’ creators. What is the real incentive for open-weight model publishers, then?

  • When it comes to LLM providers, there’s no prize for 2nd, 3rd, or 4th place... as switching is so cheap and there’s no moat. That’s why, if you’re behind, it may make sense to publish your models for self-hosting, just to hurt the leader (& make sure the gap doesn’t widen).
  • It’s "free" marketing that may actually increase your market share (without increasing your slice of the pie, though ...) - very important in this hype-powered, early-stage era.
  • There are sociopolitical reasons as well: some want to force their version of history and their viewpoints, and LLMs seem very effective as question answering engines.

What’s ahead?

It’s a good moment for a wrap-up.

I don’t know for sure what will happen with Open Source software - whether it will vanish or thrive. But I am certain of one thing: it will have to reinvent itself. The distribution of duties and activities will change, roles will evolve, and the future incentive hierarchy will likely be very different from what we know. There’s no going back to what we knew. But I want to believe there’s too much potential in human collaboration for OSS to go completely extinct.


When it comes to “Open Source models”, I am very skeptical. They won’t go away, but as always, when you’re getting something “for free”, there’s always a catch - in this case, probably someone trying to spread their “version of truth”. And I absolutely do not believe they’ll keep delivering quality comparable to bleeding-edge “commercial” models.

Is there anything that could change the situation and make “Open Source models” viable? That IS possible, but it would require a fundamental change in the rules of the game - 1/ at least two sides (publishers and contributors) would have to contribute to the quality of the product (model); 2/ models would have to be much more open, to secure the interest of the community (to prevent it from being cut off).

And what about local inference? Doesn’t it have the potential to change those rules, especially given the astronomical costs of infrastructure investments? Yes, it could be VERY beneficial for consumers (they'd be "creating" their own, local token pools), but model publishers need the inference revenue stream to offset the massive costs of training and updating the models - that's why I believe the most probable scenario for local inference is running closed models locally in isolated containers, under the full control of the model publisher (as a cheaper alternative to cloud-based LLM-as-a-service).
