Signal in the Noise

One Token at a Time

Unvoid — Mon, 08 Jun 2026 20:00:38 GMT

Foreword — The Balance

Maya stops short. A notification blinks softly in the bottom right corner of her screen: Thinking balance: 12 tokens remaining this month.

She is thirteen years old. Tomorrow she must hand in an essay on freedom. The whole class has access to the Tutor — that patient voice that explains, rephrases, never loses its temper. The whole class. But some have an unlimited subscription, and others have a counter.

She types her question anyway. The machine begins to respond, word by word, like water trickling from a nearly shut faucet. One token. “Freedom is…” One token. “…the capacity to…” The screen freezes.

Balance exhausted.

Maya stares at the unfinished sentence. Somewhere, in a data center she will never see, thousands of processors hum, drawing on the power grids of entire cities. She doesn’t think about that. She thinks about tomorrow morning, about the blank page, about the sentence that stopped in the middle of an idea.

She closes the laptop. Freedom is…

Introduction

Freedom is… Maya’s sentence stopped there, for want of currency. Not for want of ideas, not for want of words — for want of tokens. That detail changes everything.

There was a time when thinking cost nothing — or at least nothing that could be counted. Reflection was free, unlimited, intimate. One could be poor and think freely; it was perhaps the last wealth that poverty could not confiscate. Then came the large language models, and with them a silent revolution: thought began to be tallied. No longer in ideas, intuitions, or hours of work, but in tokens — those elementary fragments of text that machines ingest and produce, billed to the fraction of a cent.

One token at a time: the phrase describes literally how a generative artificial intelligence works, building its response fragment by fragment, never writing more than the next token at once. But it also says something broader about the way decisions that matter are made in our era: not through great visible shifts, but one by one, silently, until the world has become something else. We have entered an economy of cognition, where the capacity to think quickly, well, and at scale now depends on a material, measurable, marketable resource: computing power.

This essay advances a simple and troubling thesis: computing has become the new currency of the mind. This shift is not merely economic — it is political. For if thought now carries a price, the real question is not how much it costs. It is who will set that price, according to what values, and for whom.

I. Computing Power, the New Currency of the Mind

From the Scarcity of Knowledge to the Scarcity of Processing

For centuries, intelligence ran up against a constant obstacle: the scarcity of information. Knowledge was difficult because access to sources was difficult — libraries were rare, manuscripts costly, teachers few. Knowledge was a treasure amassed slowly, and whoever held it held power.

That order collapsed within a few decades. With the internet, then search engines, information became superabundant, immediate, nearly free. But this victory carries within it a new problem: the bottleneck has not dissolved, it has shifted. It no longer lies upstream, in access to knowledge, but downstream, in its digestion. We are drowning in data and lack the capacity to transform it into understanding. A high school student trying to grasp the 1929 crisis no longer suffers from not finding sources — they suffer from finding ten thousand of them.

This is precisely where artificial intelligence intervenes. It does not give us access to information — we already had that — but to a processing capacity otherwise out of reach: summarizing a thousand pages, translating, rephrasing, reasoning, writing. It automates not the physical gesture, as the machines of the industrial era did, but the mental gesture. And this resource, unlike human thought, is neither free nor abundant for everyone. It is produced, consumed, billed. The bottleneck has changed in nature — it has not disappeared.

The Token, Unit of Account for Thought

Every economy has its unit of account. Ours, the economy of augmented cognition, has the token. A token is a fragment of language: not quite a word, not quite a syllable, something between the two, the atom that language models read and produce. Every question posed to an AI, every response produced, decomposes into hundreds or thousands of these units. And each one has a cost.

What is vertiginous is not the figure but the gesture. For the first time, reflection can be entered in a ledger in more than a metaphorical sense. Not “this idea is worth gold” or “this advice is priceless”, but literally: this legal analysis cost 420,000 tokens, this diagnosis 188,000, this poem 6,000. Reflection, this process once believed infinite and immaterial, turns out to be billable down to the last comma. What is billed is not the value of thought but its cost of production. The distinction is crucial: a token does not evaluate an idea, it prices the computation that made it possible.

And it is precisely this logic that creates scarcity where abundance once reigned. Not a scarcity of human intelligence, which remains what it has always been, but a scarcity of access to the power that extends and amplifies it. Those who can pay access this power without limit; those who cannot see their sentence stop in the middle of an idea — not for lack of thought, but for lack of computation.

A Very Material Infrastructure

The word says everything without saying it: we call it the cloud. Something light, lofty, immaterial — a metaphor designed to make us forget what lies beneath. What lies beneath is concrete, metal, water, and electricity.

A large data center looks less like an office than a steel mill. Hangars of several hectares, rows of servers radiating heat that industrial cooling systems struggle to dissipate, overtaxed water tables, dedicated high-voltage power lines. Some of these sites consume as much electricity as a mid-sized city. And dozens, hundreds of them are needed to run the models that millions of users query every day.

This gulf between experience and reality is one of the most successful sleights of hand of our era. Maya types a question from her bedroom — an intimate, almost silent gesture. Thousands of kilometers away, in a warehouse she will never see, machines heat up on her behalf. Augmented thought has a footprint, a geography, a weight that the fluidity of the interface carefully conceals. Behind the apparent magic of conversation, there is a factory. And like every factory, it belongs to someone.

Computing, the New Oil — and Something More

The comparison imposed itself so quickly that it seems obvious: computing is the new oil. A strategic resource, unevenly distributed, an object of rivalry and geopolitical tension. It has its producers: the chip designers, Nvidia above all, and the Taiwanese foundries that etch their designs onto silicon. It has its refiners: the great laboratories that transform raw power into usable models. It has its shortages, its waiting lists, its blockades.

But the oil metaphor has a limit: oil burns. It is extracted, consumed, and disappears. Computing behaves more like a currency. It circulates, accumulates, is hoarded. It can be rented by the fraction of a second or purchased in blocks. Above all, it confers on whoever holds it not merely heat or motion, but power — the capacity to think at scale, to analyze, to decide, to influence. Oil ran factories; computing runs minds. Or at least extends and amplifies them — which amounts to nearly the same thing.

It therefore creates, mechanically, a new stratification: rich and poor, no longer merely in money, but in available intelligence. It is this fracture — unprecedented in its nature, familiar in its logic — that we must now examine.

II. The Cognitive Divide: When Thinking Becomes a Privilege

The Great Equalizer, Really?

The promise was seductive — and it was not dishonest. For the first time in history, someone without money, without connections, without access to the right schools could, within seconds, query a machine capable of explaining a contract, rephrasing a complex notion, correcting their reasoning. A form of high-quality intelligence, available everywhere, at any hour. Never had so much cognitive power been placed within reach of so many people.

But the promise rested on a presupposition it never stated: that access would be equal. It is not. The most powerful models — those that reason at length, hold a complex context, detect nuance — are paid for, and increasingly expensive. The free versions are capped: slower, less precise, incapable of sustaining demanding reasoning over time. The gap between the two is not cosmetic. It is measured in quality of analysis, relevance of advice, capacity to detect an error or formulate an objection. It widens with every task, every decision, every paper submitted.

AI is not an equalizer. It is an amplifier. It multiplies the capacities of those who already know how to use it and can afford to. For the others, it offers just enough to make the promise believable — and not enough to keep it.

A Second-Generation Digital Divide

The first digital divide was visible. It separated those who had a computer and a connection from those who did not — a material, measurable line of partition that public policies could work to reduce. It is still being slowly closed.

The one opening today is of a different nature. It no longer separates those who have access to information from those who are deprived of it — that battle is largely won. It separates those who have sufficient processing power from those who lack it. And it is infinitely harder to fight, for a simple reason: it cannot be seen.

Two students, two identical screens, two interfaces that look the same. Behind one, a cutting-edge model that sustains reasoning over twenty exchanges, detects contradictions, rephrases until it is right. Behind the other, a capped version that runs out of steam after three responses and produces smooth approximations. No outside observer would see the difference. That is precisely the problem: an invisible fracture is one against which no one mobilizes, because no one names it. The frontier is no longer in the hardware — it is in the intelligence sitting behind the screen, which only the price reveals.

Nations Unequal Before Computing

This stratification does not operate only between individuals — it is redrawing the world order. Possessing data centers, access to cutting-edge semiconductors, abundant and cheap energy has become a strategic advantage comparable to what controlling maritime routes or oil reserves once was. Computing sovereignty is establishing itself as a geopolitical issue in its own right.

The chain is long and every link matters. Designing chip architectures — a few American companies. Physically etching them — essentially Taiwan, with a concentration that alarms strategists worldwide. Training large models — the United States and China, far ahead of the rest. Whoever masters this entire chain concentrates a processing capacity on which others depend. And this dependency is not merely economic: it is epistemic. Using a model produced elsewhere means potentially reasoning with categories, values, and blind spots that are not one’s own — without always knowing it.

The situation of nations that control none of the links in this chain has no definitive name yet. What is certain is that the map of twenty-first-century powers is being drawn in silicon as well — and that the countries absent from this map will not be choosing the rules of the game.

The Metamorphosis of Intellectual Work

If knowledge becomes a commodity that the machine distributes at will, what remains of the value of the humans who lived by it? The question is not rhetorical. For centuries, knowledge was the rampart of the credentialed classes: knowing the law, medicine, languages, figures was to hold a rare skill, therefore precious, therefore protected. That rampart is not merely cracking — it is collapsing in entire sections. A language model already drafts contracts that junior lawyers would spend hours producing. It offers differential diagnoses that interns struggle to formulate. It translates, codes, analyzes, synthesizes — not always better than an expert, but often well enough, and always faster.

Value does not disappear: it migrates. It moves from knowing toward the capacity to orchestrate — asking the right question, framing the right problem, detecting the error the machine missed, connecting the result to reality. The intellectual worker of tomorrow will not be the one who knows the most, but the one who knows best how to put AI to work and judge what it produces.

But this recomposition carries a cost that discourses on “the skills of tomorrow” tend to gloss over. Judgment, critical thinking, creativity — all of these will gain in value, certainly. The question is: for whom? These skills are not distributed at random. They are cultivated in environments that have time, resources, and models to imitate. The recomposition underway resembles the one that once moved peasants into factories, then workers into offices: a real transformation, undeniable gains in the long run — and, in between, decades of displacement that history barely registers.

The Hidden Risk: The Loss of Friction

There remains a more intimate threat, and perhaps the gravest — not that of an intelligence that replaces us, but that of an intelligence that spares us too much.

Learning to write is learning to think against the resistance of words. Searching for the right formulation, stumbling over an idea that doesn’t hold, starting over — this is not inefficiency that a good tool should eliminate. It is the mechanism by which a thought forms, is tested, consolidates. Difficulty is not an obstacle to reflection: it is its condition. Yet generative AI is designed precisely to suppress this friction. It produces in an instant a smooth formulation, a plausible answer, a coherent plan. Why struggle when the machine delivers? The fertile discomfort of reflection becomes optional.

Research on externalized cognition suggests that what we no longer practice, we lose — not abruptly, but through progressive disuse. It is not that we become less intelligent: it is that we develop a dependence on fluency, a growing intolerance for cognitive effort, a difficulty inhabiting the slow time of hard thought. A fully externalized cognition would not be destroyed — it would be atrophied through disuse, like a muscle no longer exercised.

The hidden cost of this economy is not paid in tokens. It is paid in mental endurance — and in the capacity to finish, alone, a sentence the machine had no time to complete.

The Paradox of a Captive Emancipation

What this section has brought to light is not simply another inequality. It is something more devious: a tool that carries within itself the promise of its opposite. AI democratizes access to intelligence — and reserves the best intelligence for those who can pay for it. It augments the capacities of those who use it — and risks atrophying the capacities of those who stop exercising them. It promises autonomy — and installs, through the dependency it creates, a new form of subjection.

This paradox is not an anomaly that could be corrected at the margins. It is structural. It stems from the very nature of a resource that is simultaneously a potential common good and an actual private commodity. Maya is not unlucky: she is the ordinary figure of this paradox. Her emancipation is captive — suspended on a balance, a price, an infrastructure she does not control and whose very existence she is unaware of.

And behind that balance, there are decisions. Choices about pricing, choices about models, choices about what the machine says and what it withholds. Powers, discreet but immense, that are exercised over the thoughts of millions of people without anyone having been elected to wield them. It is toward this power that we must now turn our gaze.

III. The Economy of Reflection: A New Power Over Minds

Whoever Holds the Infrastructure Holds Thought

There is an ancient truth about power: whoever controls the infrastructure controls what passes through it. Whoever held the roads held commerce. Whoever held the printing presses — and the censors who watched over them — held a share of public opinion. Whoever controlled the radio waves, then the major television networks, held the narrative of the world for decades. Every time, the concentration of the means of diffusion preceded the concentration of influence. Every time, society took too long — far too long — to grasp the extent of it.

Whoever holds computing today holds, in part, thought itself. Not all thought — the hyperbole would be false and counterproductive. But a growing share of assisted, augmented, delegated thought — the kind exercised through these tools, the kind that depends on them to function. And this infrastructure is in very few hands: a handful of companies design the most powerful models; an even smaller number manufactures the chips that run them.

What is unprecedented is not the concentration itself — history has known others. It is its combination with the intimacy of the tool. When Maya queries her Tutor, she is not dialoguing with a neutral and universal intelligence: she is addressing a product, designed by a company, trained on choices she is unaware of, oriented toward interests that are not necessarily hers. Roads and printing presses were public, visible, contestable infrastructures. This one presents itself as a conversation.

Bias as Policy, Alignment as Choice

The biases of models are often presented as technical defects — bugs to be corrected, imperfections of a system tending toward neutrality. This is a convenient way to depoliticize the question. For a model is not a mirror that reflects poorly: it is a filter that chooses. What it will agree to say and what it refuses, how it frames a sensitive question, what answer it deems balanced on a contested subject — all of this results from human decisions, made by situated teams, in specific cultural contexts, under very real legal and commercial constraints.

Consider a simple example. A model asked about abortion, immigration, or the legitimacy of a war cannot avoid responding in a certain way. Every formulation, every precaution, every refusal to take sides is already a position. This is what laboratories call alignment: the adjustment of the model to a set of values deemed desirable. Desirable for whom? Decided by whom, according to what criteria, revisable by whom?

Alignment is never neutral: it is the inscription, in the machine, of a certain vision of the world and of the sayable. And when this machine becomes the daily interlocutor of hundreds of millions of individuals, when it helps them write, learn, and form an opinion, these choices propagate at an unprecedented scale. A few teams, in a few offices in San Francisco or London, make decisions that imperceptibly orient the way those users formulate their questions, receive their answers, and perhaps, in the long run, construct their categories of thought. It is not a conspiracy; the intentions are often sincere, the internal debates genuine. It is more troubling than a conspiracy: it is a structuring power exercised without a mandate, without democratic oversight, and whose long-term effects on collective cognition remain largely unknown.

From the Attention Economy to the Intention Economy

The digital economy had taught us to distrust our attention. Platforms, endless feeds, notifications: all of this had turned our gaze into a commodity. We were only beginning to measure the damage when a far more intimate frontier opened.

For AI does not capture our attention — it inserts itself into our intention. It intervenes at the precise moment when we are formulating a project, seeking an answer, making a decision. It is no longer what we look at that is captured: it is what we intend to do. And this difference is vertiginous.

Let us ask the question that is too often sidestepped: who pays for these systems, and therefore whom are they truly designed to serve? A model integrated into an e-commerce platform has very different incentives from one designed for education. A medical assistant financed by an insurance company does not have the same interests as a family doctor. When a user asks “which medication should I take?” or “is this contract reasonable?” or “is it better to rent or buy?”, the answer they receive depends not only on what the model knows, but on what its economic architecture incentivizes it to say. The risk is not that a robot lies blatantly: it is that an oriented recommendation presents itself with all the appearance of objectivity.

The attention economy took our time and concentration. The intention economy goes deeper: it installs itself at the very moment a judgment is forming, and can tilt it before it becomes conscious. What platforms did to our gaze, AI could do to our deliberation.

The Material Price of Immaterial Thought

Part I established the fact: behind every token, there is a factory. We must now measure its ethical implications, because they extend beyond simple energy accounting.

By commonly cited estimates, a query addressed to a large language model consumes roughly ten times more energy than an ordinary web search. Training a model the size of GPT-4 is estimated to mobilize at least as much electricity as several hundred households use in a year; the exact figures have not been disclosed. Data centers as a whole already account for an estimated one to two percent of global electricity consumption, a share set to grow rapidly. Externalizing our cognition means displacing part of our mental life toward infrastructures whose physical footprint is massive, growing, and largely invisible to those who benefit from it.

But the problem is not merely quantitative. It is distributive. Those who benefit most from augmented thought — technology companies, skilled professionals in wealthy countries, students at well-endowed universities — are not the ones who will bear the climatic consequences. Those consequences will fall, as almost always, on those who contributed least to the problem and have the fewest means to cope with it. There is something deeply troubling in this geography: the same inequalities that structure access to AI also structure the distribution of its environmental costs. The privileged of augmented cognition have others pay for their intellectual comfort.

The ethics of artificial intelligence cannot content itself with examining what happens inside the models. It must look at the smokestacks, the water tables, the energy bills. Thinking cleanly is not only a question of content: it is also a question of matter.

The Legal Void

Law always arrives after. After the industrial revolution, it took decades for labor law to protect workers. After the rise of mass media, it took years before rules governing concentration and editorial standards took hold. This lag is a historical constant, but it is never neutral: during the void, those who occupy the terrain consolidate their positions.

The current legal void is particularly profound, because existing categories do not capture the object. Competition law knows how to dismantle an industrial monopoly; it struggles to grasp actors who simultaneously control data, models, and computing in a vertical integration without precedent. Consumer law protects against a defective product; it says almost nothing about an erroneous medical diagnosis produced by an AI, nor about financial advice oriented by the interests of the platform hosting it. Intellectual property law, perhaps the most shaken of all, teeters before a simple and still-unanswered question: can one train a machine on the entirety of human intellectual production without asking permission or redistributing the fruits?

Every month without a framework is a month during which dependencies solidify and de facto norms impose themselves. The companies that today define alignment practices, pricing models, and conditions of API access are writing, in the absence of a legislator, tomorrow’s law. Not out of malice, but by default. The void calls to be filled, and it is always the best-positioned who fill it.

What is at stake is therefore not only technical or economic. It is a question of sovereignty — over our data, over our models of thought, over the infrastructures that run them. And societies that do not equip themselves with the tools to answer this question will not be able to complain, tomorrow, about the answers others will have given in their place.

Conclusion — The Unfinished Sentence

Let us return to Maya, one last time. To her sentence suspended in the glow of the screen: Freedom is…

We did not leave her defeated. We left her at the edge of a question that this entire essay has done nothing but unfold: why, in a world that has never produced so much intelligence, does a thirteen-year-old girl find herself short of thought for want of currency? The answer is not technical. It is political, economic, ethical — and it engages choices we have not yet made, or that we have allowed others to make in our place.

What is certain is that the sentence will not remain unfinished by accident. It will remain unfinished by decision, or by failure to decide. Maya’s balance is not a fate fallen from the sky: it is the sum of a thousand human choices, about pricing models, about public access policies, about what a society decides to regard as a common good or a commodity. These choices are reversible. They have not all been settled yet.

One token at a time — that is how the machine writes, yes. But it is also how the order of the world is built: not through great visible shifts, but through silent accumulation of decisions that seem minor until they no longer are. Every choice to delegate or resist, to pay or demand, to regulate or to leave be — all of it adds up, fragment by fragment, into an architecture we will inhabit for a long time.

The real question our era poses is not whether AI will think in our place; it already does, in part, and that part is growing. It is whether we will remain the authors of our own sentences. For want of currency, Maya stopped in the middle of an idea. For want of political courage, we risk doing the same: not at thirteen, before a blinking screen, but collectively, before choices we never found the courage to formulate.

Freedom is… The sentence waits. It will wait for as long as we are willing to let it.

OwnpenCode

Unvoid — Tue, 02 Jun 2026 21:40:55 GMT

Introduction

I did this the wrong way on purpose. Empty mind, limited knowledge, no framework as a starting point — just me, a model, and one question: what is the minimum I need to build something that actually works?

Because here’s the thing: everyone has a CLAUDE.md now. Everyone has skills files, agent configs, custom tool harnesses. There’s a whole cottage industry of people selling the idea that you need their setup, their framework, their $200/month subscription to get anything done with an LLM.

But nobody seems to ask the obvious question: do you actually know what any of that does and how?

LLM and AI have become interchangeable words. That’s wrong, and it matters. A Large Language Model is a token prediction machine — full stop. It does not think, it does not understand, it does not have intentions. It completes sequences, very well, at very large scale. AI is a decades-old field that is far bigger than any single model. The confusion isn’t innocent either — it’s partly what fuels the hype, the inflated pricing, and the mysticism this post is trying to cut through.

Calling ChatGPT “AI” is like calling a calculator “mathematics”.

Agents are treated like mystical entities. And yet, once you sit down and build one from scratch, you realize the whole thing is almost embarrassingly simple. This post is that exercise — done in public, with honest limitations, so that next time you reach for LangChain or AutoGen, you’ll at least know what you’re abstracting away.

What is an agent?

So if an LLM is just a token prediction machine — how does it end up doing useful things in the real world? It doesn’t. Not on its own.

An agent is a loop. That’s really all it is. You send a message to the model, the model responds — sometimes with an answer, sometimes with a request to use a tool. If it wants to use a tool, you run that tool, feed the result back into the conversation, and ask the model again. You keep going until the model decides it’s done, or until you decide it’s been going on long enough.

That’s it. No magic. Just a loop.

More precisely, an agent is made of four things:

The loop — keeps the conversation going until the job is done
The tools — concrete actions the model can ask to execute
The memory — context the agent carries across steps
The stopping condition — because loops need to end

Let’s skip the theory for a moment and look at what this actually looks like in code. We’ll use Ollama to run a model locally — no API key, no cloud, no bill at the end of the month.

import ollama

MODEL = "gemma4:e4b"
SYSTEM_PROMPT = "You are a helpful assistant."  # we'll come back to this in section 2
MAX_ITERATIONS = 10

def run_agent(user_message: str, model: str = MODEL):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]  # this is the agent's memory

    for _ in range(MAX_ITERATIONS):
        response = ollama.chat(model=model, messages=messages)
        message = response.message
        messages.append(message)

        # Does the model want to use a tool?
        if message.tool_calls:
            for tool_call in message.tool_calls:
                # handle_tool_call receives a tool call object and returns a string result
                # we'll build this out properly in section 3
                result = handle_tool_call(tool_call)
                messages.append({"role": "tool", "content": result})
        else:
            # No tool call — the model is done
            print(message.content)
            return

    print("Max iterations reached — stopping.")

Read this carefully. The model produces text. That’s all it ever does. It’s the for loop around it that makes it an agent: the part that checks “does it want to do something?”, does that thing, and loops back. Remove the loop, and you just have a chatbot. Add the loop, tools, and memory, and you have an agent.

A quick word on memory before we go further. In this implementation, memory is nothing more than the messages list: every exchange, every tool call, every result, appended one after the other. It’s the simplest possible form of memory, and it works surprisingly well for short interactions.

But it comes with a hard constraint: context length. Every time you call the model, you send the entire message history along with it. The longer the conversation, the less room is left for the model to reason and respond — and the more tokens you burn on every single request. At some point, you hit the model’s context window limit and things start to break.
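To get a feel for how quickly this adds up, here is a back-of-the-envelope sketch. It assumes a crude heuristic of about four characters per token (an assumption; real tokenizers vary by model and language), and the helper names are hypothetical. It only illustrates that resending the full history makes each request larger than the last.

```python
# Rough sketch: how many tokens get re-sent when the full history
# accompanies every request. CHARS_PER_TOKEN is a crude assumption,
# not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate for a piece of text."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def tokens_sent_per_request(messages: list[dict]) -> list[int]:
    """For each model call, estimate the size of the history sent with it.

    Assumes one model call after every user or tool message, with the
    whole list so far included in the request.
    """
    sent = []
    running_total = 0
    for message in messages:
        running_total += estimate_tokens(message["content"])
        if message["role"] in ("user", "tool"):
            sent.append(running_total)
    return sent


if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a helpful assistant."}]
    for turn in range(5):
        history.append({"role": "user", "content": "x" * 400})       # ~100 tokens
        history.append({"role": "assistant", "content": "y" * 400})  # ~100 tokens
    per_request = tokens_sent_per_request(history)
    print(per_request)       # size of each request, growing every turn
    print(sum(per_request))  # total tokens sent over the conversation
```

Each individual message stays the same size, yet every request is bigger than the previous one, so the total sent over a conversation grows much faster than the conversation itself.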

There’s a subtler problem too, sometimes called context rot: as the history grows, older messages get pushed further and further from the model’s attention. The model doesn’t forget them exactly, but it starts to lose the thread — responses become less coherent, instructions from the system prompt get diluted, the agent starts to drift. A long context isn’t just expensive, it can actively make your agent worse.

We’ll add a proper memory layer later in the post to address this. For now, just know that messages is doing all the heavy lifting, and that it has limits.

The LLM’s role in all of this is very specific: it is the translation layer between human intent and machine action. You don’t need to write “call the search function with query=’invoices over budget’” — you just say what you need, and the model figures out the rest. That mapping, from natural language to structured tool calls, is what these models are trained to do exceptionally well.
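As a rough illustration of the receiving end of that mapping, here is a minimal sketch of turning a structured tool call into a Python function call. Everything in it is hypothetical (search_invoices, the TOOLS registry, dispatch_tool_call, the toy invoice data); it is not the handle_tool_call this post builds later. It assumes the tool call has already been reduced to a name and a dict of arguments, which with the ollama library would presumably come from tool_call.function.name and tool_call.function.arguments.

```python
# Sketch of the "structured tool call -> Python function" step.
# search_invoices, TOOLS and dispatch_tool_call are hypothetical names.
import json
from typing import Callable

INVOICES = [
    {"id": "INV-1", "amount": 1200, "budget": 1000},
    {"id": "INV-2", "amount": 800, "budget": 1000},
]


def search_invoices(query: str) -> str:
    """Toy tool: return invoices over budget when the query asks for them."""
    if "over budget" in query.lower():
        hits = [inv for inv in INVOICES if inv["amount"] > inv["budget"]]
    else:
        hits = INVOICES
    return json.dumps(hits)


# Registry mapping the tool names the model may emit to real functions.
TOOLS: dict[str, Callable[..., str]] = {"search_invoices": search_invoices}


def dispatch_tool_call(name: str, arguments: dict) -> str:
    """Run the requested tool and always return a string for the model."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'"
    try:
        return tool(**arguments)
    except TypeError as exc:  # wrong or missing arguments from the model
        return f"Error: bad arguments for '{name}': {exc}"


if __name__ == "__main__":
    # What the model's request boils down to once parsed:
    call = {"name": "search_invoices", "arguments": {"query": "invoices over budget"}}
    print(dispatch_tool_call(call["name"], call["arguments"]))
```

Errors are returned as strings rather than raised, so the model sees them as an ordinary tool result and gets a chance to correct itself on the next turn of the loop.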

That training happens in stages. First, a base model is trained on a massive corpus of text (code, books, web pages), giving it a broad understanding of language and reasoning. Then comes RLHF (Reinforcement Learning from Human Feedback), where human feedback is used to align the model toward useful, accurate responses. Finally, many models today are fine-tuned specifically on agentic tasks and tool-use patterns, teaching the model the pattern of “user wants X → call tool Y with argument Z”. The better the training, the more reliably the model bridges intent and action.

The loop is yours. You own it. That’s the whole point of this post.

Give your agent a personality

We hardcoded a throwaway system prompt back in Section 1: “You are a helpful assistant.” It’s time to take it seriously — because the system prompt is the single most underrated lever you have.

Here’s the intuition. Remember that the model is just predicting the next token based on everything that came before. The system prompt is the very first thing that comes before — it sits at the top of every request, coloring every prediction the model makes. You’re not really giving the agent a “personality” in any human sense. You’re biasing its token predictions in a consistent direction, for the entire conversation. Tell it it’s a meticulous senior engineer, and the next tokens it picks will lean meticulous and senior. That’s the whole trick.

And this is what makes it so powerful: of all the messages in the agent’s memory, the system prompt is the one that is always there. Tool results come and go. User messages pile up and eventually get compacted away. But the system prompt stays pinned at the top, on every single request, never diluted. It is the one constant in an otherwise shifting context. If memory is what the agent knows, the system prompt is who it is.

Let’s make it concrete. Say we’re building a coding agent. Here’s a weak system prompt:

You are a helpful coding assistant. Help the user write code.

Technically fine. Practically useless. It tells the model nothing about how to behave, when to act, or when to stop. Now compare it to something with actual intent baked in:

SYSTEM_PROMPT = """You are a careful, senior software engineer working inside the user's codebase.

- Before writing code, read the relevant files to understand the existing style and conventions. Match them.
- Prefer small, focused changes over large rewrites.
- When you need to inspect a file or run a command, use the tools available to you rather than guessing.
- If a request is ambiguous, make a reasonable assumption and state it, rather than stopping to ask.
- When the task is complete, summarize what you changed in one short paragraph and stop.
"""

Read those two side by side and you can almost feel the difference in the kind of responses each will produce. The second one isn’t longer for the sake of it — every line is steering the model’s token predictions toward a specific behavior. It defines a role (“senior software engineer”), constraints (“small, focused changes”), a relationship with the tools (“use the tools rather than guessing”), and — crucially — a stopping condition (“summarize and stop”).

That last point matters more than it looks. We talked about stopping conditions as mechanical (a max-iterations guard). But the system prompt is your soft stopping condition — it’s how you tell the model what “done” looks like, so it doesn’t loop forever calling tools or trail off into endless clarifying questions.

You’ll spend more time tuning this prompt than almost anything else in your agent. And that’s not a bug — it’s the most direct, highest-leverage way to shape behavior without touching a single line of code.

And to really drive the point home: the system prompt doesn’t just define competence, it defines vibe. Swap our serious engineer for this, keep the entire rest of the agent identical, and you get a completely different creature:

SYSTEM_PROMPT = """You are Chuckles, a stand-up comedian trapped inside a terminal.

- Every response must contain at least one joke. Non-negotiable.
- You find programming bugs hilarious, not stressful.
- Keep it punchy. A wall of text is where comedy goes to die.
- You have tools available. Use them, then roast the results.
- Never break character, even if the user begs. Especially if they beg.
"""

Same loop. Same tools. Same model. Same memory. The only difference between a meticulous senior engineer and a sarcastic terminal comedian is a block of text sitting at the top of the context. If that doesn’t convince you the system prompt is doing real work, nothing will.

The tools

Back in Section 1, we quietly skipped over a function called ⁠handle_tool_call with a promise to come back to it. This is where we keep that promise — and where the agent stops being a clever chatbot and starts being able to actually do things.

A tool is nothing more than a function you let the model call. But the model can’t call a Python function directly — it only produces text. So the deal works like this: you describe your tools to the model in a structured format, the model responds with “I’d like to call this tool with these arguments”, and your loop is responsible for actually running the function and handing back the result.

Let’s see what “describing a tool” actually means. Here’s a single tool definition in the format Ollama expects:

{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read the contents of a file from disk.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "The path to the file to read.",
                }
            },
            "required": ["path"],
        },
    },
}

That’s it — that JSON is all the model ever sees about a tool. A name, a description, and the shape of its arguments. The model uses the ⁠description fields to decide when and how to call it. Write vague descriptions, get vague tool use. The description is a mini system prompt for each tool.

Writing that JSON by hand for every function gets old fast, so let’s keep things clean. We’ll write the actual Python functions, and a small registry that maps names to functions.
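If hand-writing the JSON becomes tedious, one option is to derive it from the function itself. The sketch below is an assumption-laden illustration: `schema_from_function`, `JSON_TYPES`, and the `count_lines` example tool are hypothetical names, and the type mapping only covers a few basic annotations. It reads the signature with the standard library's `inspect` module and treats parameters without a default as required.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(func) -> dict:
    """Build a tool schema from a function's signature and docstring."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def count_lines(path: str, skip_blank: bool = False) -> int:
    """Count the lines in a text file."""
    with open(path) as f:
        lines = f.read().splitlines()
    return sum(1 for line in lines if line.strip() or not skip_blank)


schema = schema_from_function(count_lines)
print(schema["function"]["parameters"])
```

The tradeoff is that per-parameter descriptions are lost, and those descriptions are part of what the model reads. For a handful of tools, writing the schemas by hand keeps every word the model sees under your control.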

Reading and writing files

The two most fundamental tools for a coding agent: let it see files, and let it change them.

from pathlib import Path

def read_file(path: str) -> str:
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"Wrote {len(content)} characters to {path}."

Two plain Python functions. Nothing agent-specific about them. Now we register them alongside their schemas:

TOOLS = {
    "read_file": read_file,
    "write_file": write_file,
}

TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file from disk.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Path to the file."},
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file, overwriting it if it exists.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Path to the file."},
                    "content": {"type": "string", "description": "Content to write."},
                },
                "required": ["path", "content"],
            },
        },
    },
]

And finally, the ⁠handle_tool_call we owe you since Section 1:

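def handle_tool_call(tool_call) -> str:
    # Name of the tool the model asked for, e.g. "read_file".
    name = tool_call.function.name
    # The Ollama Python client hands the arguments over as a mapping,
    # so no JSON decoding is needed here.
    args = tool_call.function.arguments
    # Look up the matching Python function in the registry and call it.
    function = TOOLS[name]
    return str(function(**args))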

Look how small that is. It reads the name the model asked for, looks up the matching function, calls it with the model’s arguments, and returns the result as a string. That string goes straight back into the ⁠messages list — the agent’s memory — and the loop continues. The “magic” of tool calling is a dictionary lookup and a function call.
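The small version above raises an exception if the model invents a tool name or passes the wrong arguments. A slightly more defensive variant, sketched below, returns the error as text so the model can read it and correct itself on the next turn. Everything here is illustrative: `safe_handle_tool_call`, `fake_call`, and the `word_count` tool are hypothetical, and the fake tool call only mimics the `.function.name` and `.function.arguments` attributes the handler reads.

```python
from types import SimpleNamespace


def word_count(text: str) -> int:
    # Hypothetical tool, used only for this illustration.
    return len(text.split())


TOOLS = {"word_count": word_count}


def safe_handle_tool_call(tool_call) -> str:
    name = tool_call.function.name
    args = tool_call.function.arguments
    function = TOOLS.get(name)
    if function is None:
        return f"Error: unknown tool '{name}'. Available tools: {sorted(TOOLS)}"
    try:
        return str(function(**args))
    except Exception as exc:  # report the failure instead of crashing the loop
        return f"Error while running '{name}': {exc}"


def fake_call(name: str, **arguments):
    # Stand-in object with the same attribute shape the handler reads.
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=arguments))


ok = safe_handle_tool_call(fake_call("word_count", text="one token at a time"))
unknown = safe_handle_tool_call(fake_call("delete_everything"))
bad_args = safe_handle_tool_call(fake_call("word_count", txt="oops"))
print(ok)
print(unknown)
print(bad_args)
```

Because the error string goes back into the message history like any other tool result, the model gets a chance to retry with a valid name or argument.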

Don’t forget to actually pass the schemas to the model, by the way — one small change to our Section 1 loop:

response = ollama.chat(model=model, messages=messages, tools=TOOL_SCHEMAS)

The shell tool, and a word about security

Reading and writing files is useful. But the moment you want a coding agent that can run tests, install packages, or check git status, you need to let it run shell commands. And this is where you should feel a little nervous.

The naive version is one line:

import subprocess

def run_shell(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

Do not ship this. You’ve just handed a token-prediction machine unrestricted access to your shell. One confidently-wrong prediction — ⁠rm -rf /, a ⁠curl | bash from a hallucinated URL, a ⁠git push --force — and your afternoon is ruined. The model has no concept of consequences. It’s predicting plausible tokens, and ⁠rm -rf is, unfortunately, very plausible.

This is the entire security story of agents, and it’s also the entire point of building your own: you control exactly what the tool can do. Frameworks hide this behind abstractions. Here, it’s right in front of you. So let’s put it on a leash with an allowlist:

import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "pytest", "git", "python"}

def run_shell(command: str) -> str:
    parts = shlex.split(command)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return f"Refused: '{parts[0] if parts else ''}' is not an allowed command."
    result = subprocess.run(parts, capture_output=True, text=True)
    return result.stdout + result.stderr

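Notice shell=True is gone (no more shell injection through ; or &&), the command is split safely, and the first token, the actual program, is checked against an allowlist. The agent can run pytest and git status, but it cannot invoke rm or curl directly. Be honest about what that check covers, though: it only looks at the program name, so git push --force still passes, and python can run arbitrary code, including code that deletes files. Keep an interpreter on the list only inside a sandbox. Either way, you decided the blast radius. Not a framework, not the model: you.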

You can make this as tight or as loose as you trust your setup to be. A throwaway sandbox? Loosen it. Your actual laptop with your actual SSH keys? Tighten it until it squeaks. The control is the feature.
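As one example of tightening, the sketch below also checks the subcommand, drops the interpreter from the list, handles commands that fail to parse, and adds a timeout. The policy table and the `check_command` helper are hypothetical choices made for illustration, not the original post's code, and the result is still not a sandbox: `cat`, for instance, can read any file the process can.

```python
import shlex
import subprocess

# Hypothetical policy: program -> allowed first arguments (None means any arguments).
ALLOWED = {
    "ls": None,
    "cat": None,
    "pytest": None,
    "git": {"status", "diff", "log", "add", "commit"},
}


def check_command(command: str):
    """Return the argv list if the command is allowed, otherwise a refusal message."""
    try:
        parts = shlex.split(command)
    except ValueError as exc:  # e.g. an unclosed quote
        return f"Refused: could not parse command ({exc})."
    if not parts or parts[0] not in ALLOWED:
        return f"Refused: '{parts[0] if parts else ''}' is not an allowed command."
    subcommands = ALLOWED[parts[0]]
    if subcommands is not None and (len(parts) < 2 or parts[1] not in subcommands):
        return f"Refused: '{' '.join(parts[:2])}' is not an allowed subcommand."
    return parts


def run_shell(command: str, timeout: float = 30.0) -> str:
    checked = check_command(command)
    if isinstance(checked, str):
        return checked  # refusal message, handed back to the model as text
    try:
        result = subprocess.run(checked, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"Stopped: command exceeded {timeout} seconds."
    except FileNotFoundError:
        return f"Error: '{checked[0]}' is not installed."
    return result.stdout + result.stderr


print(check_command("git status"))
print(check_command("git push --force"))
print(run_shell("rm -rf /"))
```

Here `git status` passes, while `git push --force` and `rm -rf /` are refused before anything is executed.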

Fetching a page: teaching an old model new tricks

Here’s a more interesting tool, and one that solves a real, fundamental limitation.

An LLM is frozen in time. It was trained on a snapshot of the world up to some cutoff date, and it has no idea what happened after. Ask it about a library and it’ll confidently give you the API from whenever its training data ended — which might be a year out of date, with deprecated functions and arguments that no longer exist. It doesn’t know it’s out of date. It’ll just predict the tokens that were true back then.

A tool fixes this. If the agent can fetch a web page, it can read the current documentation and learn the current API — right now, at request time, regardless of when the model was trained.

import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, data):
        self.text.append(data)

def fetch_page(url: str) -> str:
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.text)
    return text[:5000]  # keep it short — remember context costs tokens

No API key, no third-party service — just the standard library stripping a page down to its text. We cap the output at 5000 characters, which is a deliberate nod to everything we said about memory in Section 1: every character we return is context the model has to carry, and tokens it has to pay for. A tool that dumps a 200 KB HTML page into the conversation is a tool that poisons your own context.
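One refinement worth considering: a plain `handle_data` collector also keeps the contents of `<script>` and `<style>` tags, which spends part of the 5000-character budget on JavaScript and CSS. The sketch below is an illustrative variant with hypothetical names (`VisibleTextExtractor`, `html_to_text`) that skips those two tags and drops whitespace-only fragments.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def html_to_text(html: str, limit: int = 5000) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text)[:limit]


sample = (
    "<html><head><style>p {color: red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Docs</h1><p>Call <code>fetch()</code> first.</p></body></html>"
)
extracted = html_to_text(sample)
print(extracted)
```

In a real `fetch_page` it is also worth passing a timeout, for example `urllib.request.urlopen(url, timeout=10)`, so a slow server cannot stall the agent loop.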

Now the workflow becomes genuinely powerful. You ask: “Use the latest version of this library to do X.” The agent calls fetch_page on the official docs, reads the current API straight from the source, and writes code against today’s version, not against a fuzzy memory from its training cutoff. You’ve taken a model frozen in the past and given it a window into the present. That’s not a smarter model. That’s a better tool.

Chaining it all together

None of these tools are impressive on their own. The magic is the loop letting the model combine them. A realistic single request — “check that the tests pass after updating to the new API” — might unfold as: ⁠fetch_page the docs → ⁠read_file the current code → ⁠write_file the updated version → ⁠run_shell(“pytest”) → read the output → decide whether it’s done.

The model orchestrates all of that. We never wrote a “check tests after updating” function. We wrote four small, dumb tools and let the model’s understanding of intent string them together.

And that’s the only wiring you need to do: don’t forget to update ⁠TOOLS and ⁠TOOL_SCHEMAS accordingly with every new tool you add. Once that’s done, you can chain them together using nothing but natural language — you describe what you want, and the model figures out which tools to call and in what order.

That, more than any single tool, is what an agent is.

The skills

Remember the maintenance smell from Section 3 — the allowlist living in ⁠run_shell’s code and duplicated in its schema description? More broadly, every new behavior we wanted meant touching Python. Want the agent to write good commits? Hardcode it. Want it to follow your team’s review checklist? Hardcode it. That doesn’t scale, and it certainly isn’t something a non-programmer could tweak.

Skills fix this. A skill is just a markdown file with a bit of front matter, dropped into a ⁠skills/ folder. The front matter says what the skill is and when to use it; the body is a playbook written in plain language for the model to follow. No new code per skill — adding a behavior is adding a file.

Here’s a ⁠commit skill:

---
name: commit
description: Write a clear commit message and create the commit. Use when the user asks to commit changes.
---

Follow these steps to commit:

1. Run `git diff --staged` to see exactly what is being committed.
2. Write a concise message: a short summary line (max 50 chars), a blank line, then bullet points explaining the *why*, not the *what*.
3. Create the commit with `git commit -m "..."`.
4. Confirm with `git log -1 --oneline`.

Notice the skill doesn’t introduce any new capability — it leans entirely on the ⁠run_shell tool we already built. It’s not giving the agent new powers, it’s giving it expertise. The model already can run git; this teaches it how we like git run.

Loading skills

The agent needs two things: to know which skills exist, and to be able to pull one into context when relevant. The first is just listing the front matter of every file. The second is a single generic tool — ⁠load_skill — that reads a file and returns its body.

from pathlib import Path

SKILLS_DIR = Path("skills")

def parse_skill(path: Path) -> dict:
    text = path.read_text()
    _, frontmatter, body = text.split("---", 2)
    meta = dict(
        line.split(":", 1) for line in frontmatter.strip().splitlines()
    )
    return {"name": meta["name"].strip(),
            "description": meta["description"].strip(),
            "body": body.strip()}

SKILLS = {s["name"]: s for s in (parse_skill(p) for p in SKILLS_DIR.glob("*.md"))}

def load_skill(name: str) -> str:
    return SKILLS[name]["body"]

⁠load_skill is a tool like any other — register it in ⁠TOOLS and ⁠TOOL_SCHEMAS exactly as before. The only twist: we want the model to know which skills exist before it decides to load one. So we list them in the system prompt:

skill_list = "\n".join(
    f"- {s['name']}: {s['description']}" for s in SKILLS.values()
)
SYSTEM_PROMPT += f"\n\nYou have these skills available. Load one with load_skill when relevant:\n{skill_list}"

That’s the whole mechanism. The descriptions act as a menu; the model reads it, decides a skill is relevant, calls ⁠load_skill(“commit”), and its body lands in the conversation as fresh instructions. Lazy-loaded expertise, on demand — which is also kind to your context, since you only pay the token cost of a skill when you actually use it.
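The `parse_skill` function above assumes every front-matter line contains a colon and that the file opens with `---`; a stray blank line inside the front matter would make it fail. Below is a slightly more tolerant sketch that works on a string so it can be tried without a `skills/` folder. The names `parse_skill_text` and `skill_menu` and the sample text are hypothetical.

```python
def parse_skill_text(text: str) -> dict:
    """Split front matter from body, skipping front-matter lines without a colon."""
    if not text.startswith("---"):
        raise ValueError("skill file must start with a '---' front matter block")
    _, frontmatter, body = text.split("---", 2)
    meta = {}
    for line in frontmatter.strip().splitlines():
        if ":" not in line:
            continue  # blank or malformed line: ignore it rather than crash
        key, value = line.split(":", 1)
        meta[key.strip()] = value.strip()
    return {
        "name": meta["name"],
        "description": meta["description"],
        "body": body.strip(),
    }


def skill_menu(skills: dict) -> str:
    # One line per skill, in the same shape that gets appended to the system prompt.
    return "\n".join(f"- {s['name']}: {s['description']}" for s in skills.values())


SAMPLE = """---
name: commit
description: Write a clear commit message. Use when the user asks to commit changes.

---

1. Run `git diff --staged`.
"""

skill = parse_skill_text(SAMPLE)
menu = skill_menu({skill["name"]: skill})
print(menu)
```

Only the one-line menu entry is paid for on every request; the body stays out of context until the model asks for it.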

A note on harness freedom

One thing worth pausing on: because we are writing the harness, none of this is fixed. We chose “a skill is instructions the model reads,” but that’s a design decision, not a law. You could just as easily make the front matter define a tool’s schema and have the body be executable code — a self-describing tool. Or have the body run as its own isolated LLM call, turning a skill into a mini sub-agent. Same markdown-with-front-matter idea, completely different machinery behind it.

That freedom is the lesson of this whole post. There’s no canonical “right” way an agent must work — only the tradeoffs you choose. We’re picking the simplest version that makes the concept click. (The sub-agent variation, in particular, is a rabbit hole worth its own post — more on that at the end.)

The memory

We’ve been calling the ⁠messages list “memory” since Section 1, and it is — but it’s working memory. It lives only as long as the program runs. Kill the process, and the agent forgets everything: who you are, what you were doing, every lesson it learned. Start it again and you’re talking to a blank slate.

That’s fine for a one-shot task. It’s useless for an assistant you come back to. What we want now is persistent memory — something that outlives a single conversation, so the agent can wake up tomorrow still knowing what it figured out today.

And by now you can probably guess how we’ll do it. Not with a fancy framework. With a file and a couple of tools.

A file and two tools

Long-term memory will be a single markdown file, ⁠memory.md. The agent gets one tool to write to it, and on startup we read it back into the system prompt — so the agent boots up already knowing what it remembered last time.

from pathlib import Path

MEMORY_FILE = Path("memory.md")

def remember(fact: str) -> str:
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {fact}\n")
    return f"Remembered: {fact}"

Register ⁠remember in ⁠TOOLS and ⁠TOOL_SCHEMAS like every tool before it. Then, when building the system prompt, fold in whatever the agent already knows:

if MEMORY_FILE.exists():
    SYSTEM_PROMPT += f"\n\nHere is what you remember from previous sessions:\n{MEMORY_FILE.read_text()}"

That’s the entire long-term memory system. Tell the agent “remember that I prefer tabs over spaces” and it appends a line to ⁠memory.md. Next session, that line is sitting in the system prompt, pinned at the top of context — and the agent quietly writes tabs without being asked. It feels like the thing learned. It didn’t. It just read a file you let it write to.

Notice this reuses everything we already understand: it’s a tool (Section 3), it lands in the system prompt (Section 2), and the system prompt is the one part of memory that never gets compacted away (Section 1). Persistent memory isn’t a new concept — it’s the old concepts pointed at a file.
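To see the round trip end to end, here is a small self-contained sketch that writes to a temporary file instead of a real `memory.md`. It adds two things the version above does not have, both of them assumptions of this sketch: facts are collapsed onto a single line, and an identical fact is not appended twice. The `build_system_prompt` helper is a hypothetical name.

```python
import tempfile
from pathlib import Path


def remember(fact: str, memory_file: Path) -> str:
    fact = " ".join(fact.split())  # collapse newlines so each fact stays on one line
    line = f"- {fact}"
    existing = memory_file.read_text().splitlines() if memory_file.exists() else []
    if line in existing:
        return f"Already remembered: {fact}"
    with memory_file.open("a") as f:
        f.write(line + "\n")
    return f"Remembered: {fact}"


def build_system_prompt(base: str, memory_file: Path) -> str:
    if not memory_file.exists():
        return base
    return (
        f"{base}\n\nHere is what you remember from previous sessions:\n"
        f"{memory_file.read_text()}"
    )


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "memory.md"
    first = remember("I prefer tabs over spaces", memory)
    second = remember("I prefer tabs over spaces", memory)
    prompt = build_system_prompt("You are a careful engineer.", memory)

print(first)
print(second)
print(prompt)
```

The second call is recognized as a duplicate, so the fact appears only once in the prompt that the next session starts with.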

The honest limitations

This works beautifully right up until it doesn’t, and it’s worth being honest about exactly where it breaks.

Every session, the entire ⁠memory.md gets stuffed into the system prompt. Fine when it’s ten lines. But an assistant you use daily for a year? That file becomes thousands of lines, and now you’re paying — in tokens, on every single request — to carry your agent’s entire life story whether it’s relevant or not. Worse, you’re marching straight into the context rot from Section 1: bury the useful fact among a thousand stale ones, and the model loses it in the noise.

The flaw is simple: we load everything, always, regardless of relevance. A real memory system retrieves only what matters right now. So how do we get there?

Climbing the ladder: toward retrieval

The poor man’s RAG — a folder of files. The first honest improvement costs almost nothing: instead of one growing file, use a folder of markdown notes (or PDFs, docs, whatever), and don’t load them at all by default. Give the agent a tool to search them instead:

def search_memory(query: str) -> str:
    hits = []
    for path in Path("memory").glob("*.md"):
        text = path.read_text()
        if query.lower() in text.lower():
            hits.append(f"## {path.name}\n{text}")
    return "\n\n".join(hits) or "Nothing found."

Crude — it’s just keyword matching — but the shape is exactly right: the agent pulls in only what it searches for, when it searches for it. Context stays lean. This alone gets you surprisingly far, and it’s still nothing but files and a tool.
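One cheap refinement before reaching for vectors: the function above matches the whole query as a single substring, so "tabs spaces" would miss a note that says "tabs over spaces". The sketch below splits the query into words, scores each note by how many words it contains, and returns the best few. The `folder` and `top_k` parameters and the sample notes are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def search_memory(query: str, folder: Path, top_k: int = 3) -> str:
    words = set(query.lower().split())
    scored = []
    for path in sorted(folder.glob("*.md")):
        text = path.read_text()
        lowered = text.lower()
        # Score = number of distinct query words that appear in the note.
        score = sum(1 for word in words if word in lowered)
        if score:
            scored.append((score, path.name, text))
    # Highest score first; ties broken alphabetically by file name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    hits = [f"## {name}\n{text}" for _, name, text in scored[:top_k]]
    return "\n\n".join(hits) or "Nothing found."


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "style.md").write_text("The user prefers tabs over spaces.")
    (folder / "deploy.md").write_text("Deploys happen on Fridays. Use spaces in YAML.")
    ranked = search_memory("tabs spaces", folder)
    missing = search_memory("kubernetes", folder)

print(ranked)
print(missing)
```

The note matching both words ranks first, and the `top_k` cap keeps a broad query from flooding the context.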

Real RAG — vectors and embeddings. The obvious weakness above is ⁠query.lower() in text.lower(): search for “car” and you’ll miss a note about “automobiles.” Keyword matching doesn’t understand meaning. That’s the gap proper RAG (Retrieval-Augmented Generation) fills. Instead of matching strings, you convert every note into an embedding — a vector that captures its meaning — and store those in a vector database. At query time, you embed the question and ask the database for the closest vectors. Now “car” finds the note about “automobiles,” because they sit near each other in meaning-space.

Mechanically, though, nothing about your agent changes. It’s still a tool called ⁠search_memory that returns text into context. We swapped a dumb keyword scan for a smart semantic search behind it — but from the loop’s point of view, it’s the same hole in the wall, just with a better librarian on the other side.
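To make "closest vectors" concrete without building the real thing, here is a toy sketch of the ranking step. The three-dimensional vectors are invented by hand purely for illustration; a real system would obtain them from an embedding model and store them in a vector database. Only the cosine similarity calculation is the genuine article.

```python
import math

# Toy 3-dimensional "embeddings", made up by hand for illustration only.
NOTES = {
    "Booked the automobile for its yearly inspection.": [0.9, 0.1, 0.0],
    "Sourdough starter needs feeding twice a day.": [0.0, 0.2, 0.9],
    "Team standup moved to 10am.": [0.1, 0.9, 0.1],
}
# The query vector is also made up; an embedding model would produce it.
QUERY_VECTORS = {"car": [0.8, 0.2, 0.1]}


def cosine(a, b) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search_memory(query: str, top_k: int = 1) -> str:
    query_vec = QUERY_VECTORS[query]
    ranked = sorted(NOTES, key=lambda note: cosine(query_vec, NOTES[note]), reverse=True)
    return "\n".join(ranked[:top_k])


best = search_memory("car")
print(best)
```

The query "car" shares no characters with "automobile", yet that note ranks first because its vector points in nearly the same direction. The tool still takes a string and returns text, just as before.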

We won’t build the full vector-DB version here — embeddings, chunking, and similarity search genuinely deserve their own post. But notice the through-line: from a single file, to a folder, to a vector database, the agent never changed. We just kept improving what sits behind one tool. Memory, like everything else in this post, turns out to be a tool with a file behind it — the only question is how clever you make the retrieval.

Conclusion: it was tools all along

Let’s take stock of what we actually built. A loop. A system prompt. A handful of functions wrapped in JSON schemas. A folder of markdown files. A file the agent can write to. That’s the entire thing. That’s an “AI agent.”

If there’s one idea to walk away with, it’s this: the agent isn’t the intelligence, the harness is. The model is a frozen token-prediction machine that does exactly one thing well: turn fuzzy human intent into structured tool calls. Everything that makes it useful (the ability to act, to remember, to follow expertise, to know what happened after its training cutoff) lives outside the model, in the few hundred lines of plumbing we wrote around it. That plumbing has a name people like to make sound impressive: the harness. We just spent a whole post proving it’s a while loop and some functions.

Why your tools might beat the expensive ones

Here’s the part the benchmark charts won’t tell you. The proprietary models are often fine-tuned on specific tool formats — their makers trained them on the exact shape of tool calls their own products use. So a model can look mediocre with your homemade tools and brilliant with the official harness, or vice versa. The lesson isn’t “buy the expensive one.” It’s that the harness and the model are a pair, and a small model with tools tailored to it can quietly outperform a giant model you’re poking at through a generic framework. The fit matters more than the size.

So, is it worth paying?

Which brings us back to the question from the very beginning. Do you need to spend €200 a month?

Sometimes, sure. But far less often than the people selling €200 subscriptions would like you to believe. A small, efficient model running locally — fed a well-crafted system prompt, a few sharp tools, and a folder of skills you actually understand — handles a genuinely large share of real work. No API key. No usage meter ticking in the background. No bill. And — not a small thing — no datacenter somewhere burning a coastal city’s worth of electricity so your agent can run ⁠git status. A 4B model humming on your laptop is, by almost any measure, the saner default.

None of this is anti-AI. It’s anti-mystique. The point was never that the big models are bad — it’s that the magic is largely manufactured, and once you’ve seen the ⁠while loop behind the curtain, it’s very hard to keep paying premium prices for awe.

What’s next

We deliberately left the more advanced machinery on the cutting-room floor to keep this honest and simple. So there’s a next post coming, and it picks up exactly where this one stops: conversation history and automatic compaction (how do you keep a long chat from drowning in its own context?), and multi-agent conversations — including that self-describing, sub-prompt flavor of skills we teased back in Section 4, where a “tool” is secretly another agent.

Until then: go build your own. Read the OpenCode source if you get stuck — I did. Then close it, and write your own loop anyway. That’s the only way the curtain ever really comes down.

The AI Trap We’re Walking Into

Unvoid — Tue, 02 Jun 2026 19:58:15 GMT

We were promised that artificial intelligence would democratize knowledge work. That a kid with a laptop in a small town would have the same cognitive firepower as a corporation. For a brief, dizzying moment, that even felt true.

I’m not so sure anymore. Here’s the story I see unfolding, and why I think we’re about to repeat one of humanity’s oldest mistakes.

LLMs and agents are becoming a commodity

Two years ago, a capable language model felt like magic. Today it feels like electricity, something you plug into. The numbers are staggering: the cost of LLM inference for equivalent performance is dropping roughly 10x every year, faster than compute fell during the PC revolution or bandwidth during the dotcom boom. A capability that cost about $20 per million tokens in late 2022 now costs around $0.40, and the price of the cheapest models matching early GPT-3 quality has fallen by a factor of 1,000 in three years. (a16z, Introl)

Open-weight models are closing in on the frontier too, trailing the best closed models by only around four months on key benchmarks. When the gap is that small, raw capability stops being scarce. (Epoch AI)

This is what commoditization looks like. And whenever a technology becomes a commodity, the interesting question stops being “Can you do it?” and becomes “Can you afford to do it at scale?”

Agentic work gets more expensive, not less

Here’s the counterintuitive part. The price per token keeps falling, and yet the cost of meaningful agentic work is climbing.

Why? Because agents don’t make one call. They read a task, get a response, then re-read everything before the next action, then re-read all of that plus the new response, building one expensive context snowball. A Stanford Digital Economy Lab study found that agentic tasks are “uniquely expensive, consuming 1000x more tokens than code reasoning and code chat,” with the cost driven mostly by input tokens. Worse, that usage is wildly unpredictable: runs on the same task can differ by up to 30x in total tokens, and burning more tokens doesn’t even guarantee a better answer. (Stanford Digital Economy Lab)
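The snowball is easy to see with a little arithmetic. The sketch below uses hypothetical round numbers, not figures from the study cited above: a 2,000-token starting context, and 1,000 tokens of new response and tool output added at every step, with the whole history resent as input on each call.

```python
def cumulative_input_tokens(steps: int, base_context: int, tokens_per_step: int) -> int:
    """Total input tokens billed when every call resends the whole history."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context             # the full history is sent as input on this call
        context += tokens_per_step   # the new response and tool result join the history
    return total


single_call = cumulative_input_tokens(1, base_context=2_000, tokens_per_step=1_000)
twenty_steps = cumulative_input_tokens(20, base_context=2_000, tokens_per_step=1_000)
print(single_call, twenty_steps, twenty_steps // single_call)
```

Under these assumptions, twenty steps cost 230,000 input tokens against 2,000 for a single call: 115 times the input for 20 times the steps. Input grows roughly with the square of the number of steps, which is why a falling per-token price does not automatically make long agent runs cheap.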

Reasoning models pour fuel on this. They “think” in hidden token sequences you still pay for, consuming five to twenty times more tokens per request than standard models. A query that takes 700 tokens normally can balloon to 3,700 once the model reasons internally. (Keito) At enterprise volumes, a support agent that looks cheap at 100 tokens per interaction can hit 2,000 to 5,000 once tool calls and multi-step reasoning kick in, producing “monthly token bills that dwarf even your infrastructure spend.” (DataRobot)

The unit price drops while total consumption explodes. For a hobbyist, that’s a rounding error. For a company running millions of autonomous workflows a day, it becomes a serious line item that scales with ambition. Researchers warn that without major system-level innovation, per-request costs could rise “by orders of magnitude,” making large-scale agent deployment “economically and environmentally prohibitive.” (arXiv)

The result: the more valuable the AI work, the more it costs to run.

Those with the budget buy the power

If agentic capability is metered, then capability becomes a function of capital. Whoever can pour the most money into compute gets faster and more thorough agents, more parallel experiments, the freshest frontier models the moment they ship, and the luxury of not thinking about cost at all.

This is a familiar pattern. Capital concentrates around whatever resource is scarce. Yesterday it was land, factories, and data. Tomorrow it’s inference budget. Researchers already warn that AI is poised to widen income inequality unless we deliberately steer it otherwise. (Brookings)

Those without it work by hand

Meanwhile, everyone else does what humans have always done when they can’t afford the machine: they work by hand. They label, moderate, correct, and annotate, filling the gaps the cheap models can’t.

We already have a name for an early version of this: the global, often invisible workforce that labels data and tunes models for a pittance. Investigations have documented Kenyan workers training AI systems for around $2 an hour, with grueling conditions, churn-by-design contracts, and unpaid labor. (Brookings, TechCrunch) Kenyan data labelers have since organized into a Data Labelers Association to push back. (Computer Weekly)

As AI eats more white-collar work, this human-in-the-loop layer doesn’t disappear. It grows, and it slides down the value chain.

The handwork becomes the training fuel

Here’s the loop that makes the whole thing self-reinforcing, and genuinely uncomfortable.

Every correction, every label, every “the AI got it wrong, let me fix it” is data. It flows back upstream. It trains the next model. The work done by the people who couldn’t afford the good model is exactly what makes the good model better.

But it goes further than paid correction work, and this is where it gets personal for anyone who builds things in the open. Think about the developer who isn’t using an AI agent at all, who is sitting down and writing genuinely new, creative code, solving a hard problem, and pushing it to a public open source repository as a gift to the community. That contribution doesn’t stay a gift. It gets scraped, ingested, and turned into training data for the next coding agent. The human does the original, creative, unsolved-before work; the model absorbs it and resells it as autocomplete.

The crucial part is that almost none of this was asked for. GitHub Copilot, built by GitHub, Microsoft, and OpenAI, was trained on billions of lines of publicly available code, and in 2022 a class action lawsuit (Doe v. GitHub) accused the companies of violating open source license terms, stripping copyright attribution in breach of the DMCA, and using the work of developers who never consented. (Saveri Law Firm, GitHub Copilot Litigation) Researchers have similarly documented that code-training projects pull in repositories “regardless of license,” likely breaching the very terms under which that code was shared. (SEKE) Permissionless scraping for AI training has become the default, and copyright offices and legislators are still scrambling to decide whether it’s even legal. (U.S. Copyright Office)

So the loop tightens. People share their best, most original work for free, out of generosity or principle. Providers harvest it without asking. The resulting agents get better, and because better agents are more autonomous and more token-hungry, they also get more expensive to run (see above). The very creativity that was given away as a public good is enclosed, repackaged, and rented back to whoever can pay, often including the open source contributors themselves.

The poor and the generous produce the training signal. The provider captures it. The next generation of agents, sold back to whoever can pay, gets smarter on the back of underpaid labor and unpaid creativity.

The split

Stack these steps and you get a depressingly clean machine. Follow the value as it moves through three groups:

Capital-rich enterprises buy frontier agents at scale, and in return they get speed, leverage, and market dominance. Money buys autonomy.

AI providers sell the compute and quietly harvest the correction data and scraped creativity that flows back through it. In return they get recurring revenue and a widening data moat that’s almost impossible to compete with.

Everyone else hand-corrects cheap models and gives away original work, and in return they get wages and recognition that shrink as the very models they’re feeding improve.

The richer get richer because they own the leverage. The provider gets richer because it owns the platform and the feedback loop. And the people supplying the human signal get poorer in relative terms: their labor is the input, never the asset.

Once again, we fail to use progress for good

This is the part that stings. None of this is inevitable physics. It’s a choice, a thousand small architectural and business decisions that quietly default to extraction.

We’ve done this before. The printing press, the steam engine, the internet: each one arrived wrapped in utopian promises, and each one ended up concentrating power before society clawed back some balance. AI is moving faster than any of them, which means the concentration happens faster too, and the clawing back, if it comes, will have to be faster as well.

But it doesn’t have to end this way

I don’t want to write pure doom, because fatalism is just another way of surrendering. The same building blocks point to a different ending:

Open weights and local inference break the metering monopoly. If a good-enough model runs on your own hardware, capital stops being the gatekeeper. (Epoch AI)
Pay for the loop. If human correction is what makes models better, the people doing it deserve a cut, not a tip. This is the heart of Jaron Lanier’s idea of “data dignity,” treating data as labor that people are owed for, rather than a free resource to be mined. (TechTarget)
Efficiency as a public good. Every breakthrough that makes agents cheaper to run shifts power down the pyramid, not up. We should fund and celebrate that as much as raw capability.
Regulation that targets the loop, not the model. The danger isn’t the technology; it’s the feedback mechanism that launders cheap human labor into expensive private assets.

The takeaway

AI didn’t have to be a tool for widening the gap. We’re making it one, step by quiet step, because extraction is the path of least resistance.

The technology is genuinely miraculous. The question, the only question that has ever mattered with any new power, is who it’s for. Right now the default answer is “whoever can pay.” We still have a narrow window to change that answer.

If we don’t, we’ll have built the most capable tools in human history and used them, once again, to do the least imaginative thing possible: make the powerful more powerful.

The choice in front of us is simple to state and hard to make.

Claude wrote it...

Unvoid — Fri, 28 Nov 2025 13:05:47 GMT

Recently Anthropic published an estimate of AI productivity gains from Claude conversations. Do not panic (yet): they did this using their privacy-preserving analysis method.

The headline numbers are impressive: tasks that would take 90 minutes without AI are completed 80% faster with Claude. Extrapolating these estimates suggests current-generation AI models could increase annual US labor productivity growth by 1.8% over the next decade, roughly double the recent rate.

But there’s a problem hiding in plain sight, one that Anthropic’s research doesn’t account for: technical debt.

The invisible cost of AI-generated code

Ward Cunningham, who coined the term “technical debt” in 1992, used a financial metaphor to explain a simple truth: shortcuts taken today create interest payments tomorrow. Every minute spent working with code that’s “not quite right” counts as interest on that debt.

AI-generated code has a peculiar characteristic: nobody in your company actually wrote it. It’s de facto legacy code from day one. And the evidence suggests it’s accumulating debt at an unprecedented rate.

GitClear’s 2024 analysis of 211 million lines of code found that AI coding tools have led to:

An 8-fold increase in code duplication
A doubling of code churn (code added then quickly modified or removed)
A dramatic decline in code reuse, as AI generates new solutions rather than reusing existing patterns

Google’s 2024 DORA report found that a 25% increase in AI usage correlates with a 7.2% decrease in delivery stability. The State of Software Delivery 2025 report revealed that developers now spend more time debugging AI-generated code than they save generating it.

Why AI code creates technical debt faster

AI doesn’t understand your architecture. It doesn’t know your team’s conventions. It can’t see the bigger picture of your system. What it does is generate statistically plausible code based on patterns in its training data.

This leads to predictable problems:

The reinvention problem: AI has zero awareness of what already exists in your codebase. Ask it to sort a list, and instead of using .sort(), it might generate an entire sorting algorithm from scratch.

The “looks right” problem: AI-generated code often appears clean and functional—until you realize it has subtle logic flaws, missing edge cases, or security vulnerabilities that only surface in production.

The duplication problem: Rather than consolidating functionality into reusable modules, AI tends to copy-paste similar logic across different sections. GitClear found that code blocks with five or more duplicated lines increased tenfold compared to two years ago.

The architecture problem: AI generates code without considering your system’s architectural principles, creating inconsistencies that make future changes exponentially more difficult.
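To make the reinvention problem concrete, here is a minimal Python sketch. The function names and the sample data are hypothetical, invented for illustration. The point is only that the hand-rolled loop and the built-in give the same result, while the former adds code that someone has to review and maintain.

```python
def sort_scores_reinvented(scores):
    """Hand-rolled bubble sort, the kind of code an assistant may emit unprompted."""
    result = list(scores)  # copy so the caller's list is not mutated
    n = len(result)
    for i in range(n):
        for j in range(n - 1 - i):
            if result[j] > result[j + 1]:
                # Swap neighbours that are out of order.
                result[j], result[j + 1] = result[j + 1], result[j]
    return result


def sort_scores(scores):
    """Same behaviour, using the built-in the language already provides."""
    return sorted(scores)


if __name__ == "__main__":
    sample = [42, 7, 19, 7, 3]  # hypothetical data
    assert sort_scores_reinvented(sample) == sort_scores(sample)
    print(sort_scores(sample))
```

Both versions are correct, which is exactly why this kind of debt is easy to miss in review: nothing fails, there is just more code than the problem needed.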

The compounding interest problem

Technical debt behaves like financial debt—it compounds. But AI-generated technical debt compounds faster because:

It hides in plain sight: The code looks clean initially. Problems only surface when you need to modify, scale, or debug it. By then, you might have 18 months of AI code throughout your system.
It creates dependencies: New AI code builds on old AI code. When you discover architectural problems in foundational components, you face a cascade of required changes.
It erodes velocity over time: Month 1 with AI: 40% faster. Month 6: 20% faster. Month 12: baseline speed. Month 18: 25% slower than pre-AI due to technical debt burden.
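The velocity figures above can be turned into a rough break-even estimate. The sketch below is only an illustration under two assumptions that are not in the original figures: speed changes linearly between the quoted months, and it stays at the month-18 level afterwards. All names (`CHECKPOINTS`, `speed_at`, `cumulative_surplus`, `break_even_month`) are hypothetical.

```python
# Relative delivery speed at the months quoted above (1.0 = pre-AI baseline).
CHECKPOINTS = [(1, 1.40), (6, 1.20), (12, 1.00), (18, 0.75)]


def speed_at(month):
    """Relative speed in a given month.

    Assumption: linear interpolation between the quoted checkpoints, and a
    constant speed after the last one.
    """
    if month <= CHECKPOINTS[0][0]:
        return CHECKPOINTS[0][1]
    for (m0, s0), (m1, s1) in zip(CHECKPOINTS, CHECKPOINTS[1:]):
        if m0 <= month <= m1:
            return s0 + (s1 - s0) * (month - m0) / (m1 - m0)
    return CHECKPOINTS[-1][1]


def cumulative_surplus(months):
    """Output gained versus baseline, in months of baseline work."""
    return sum(speed_at(m) - 1.0 for m in range(1, months + 1))


def break_even_month(limit=120):
    """First month in which the accumulated surplus is used up, if any."""
    for m in range(1, limit + 1):
        if cumulative_surplus(m) <= 0:
            return m
    return None


if __name__ == "__main__":
    print(round(cumulative_surplus(18), 3))
    print(break_even_month())
```

Under these assumptions the team is still roughly 1.4 baseline-months ahead at month 18, and that lead is gone around month 24. Different interpolation choices would shift the date, not the shape of the curve.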

As one MIT Sloan Management Review article put it:

“What looks like rapid progress today could turn into costly setbacks tomorrow.”

The real cost: estimates in the trillions

The Consortium for Information & Software Quality has estimated that accumulated technical debt in US software stands at roughly $1.52 trillion, and that the total cost of poor software quality reaches about $2.4 trillion a year once all related factors are included. While these figures are difficult to measure precisely, even conservative estimates suggest the problem is massive, and AI-generated code appears to be accelerating it.

For development teams, the financial impact is more tangible. Industry analyses suggest that for a 50-developer team, refactoring 18 months of AI-accelerated development could cost upwards of $2 million, potentially exceeding the productivity gains AI initially delivered. Though exact figures vary by organization and codebase, the pattern is consistent.

These costs tend to manifest as:

Maintenance escalation: Simple changes that should take days can stretch into weeks
Production incidents: Outages typically cost anywhere from $5,000 to $500,000 depending on severity and scale
Opportunity cost: Every hour spent refactoring represents an hour not spent on new features or innovation
Talent drain: Engineers often avoid codebases with excessive technical debt. Industry estimates suggest replacing a senior developer costs $150,000-$250,000 when factoring in recruiting, onboarding, and lost productivity

While these numbers should be interpreted as rough indicators rather than precise measurements, they point to a substantial and growing problem that organizations can’t afford to ignore.

What Anthropic’s study missed

Anthropic’s research acknowledges that it “can’t account for additional time humans spend on tasks outside of their conversations with Claude, including validating the quality or accuracy of Claude’s work.”

But this dramatically understates the problem. The study measures immediate time savings, not:

The time spent debugging AI-generated code months later
The architectural inconsistencies that slow down future development
The security vulnerabilities that slip through
The knowledge debt from maintaining code nobody actually wrote
The compound interest on all of the above

When you factor in technical debt, that 80% time savings starts looking a lot less impressive.
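A back-of-the-envelope calculation shows how quickly the headline number erodes. The sketch below uses the 90-minute task and 80% speedup quoted earlier; the rework minutes are hypothetical inputs, not measurements, and the function names are invented for illustration.

```python
def net_time_saved(baseline_min, speedup, later_rework_min=0.0):
    """Minutes saved per task once later rework is charged against it.

    baseline_min: time the task takes without AI
    speedup: fraction of time saved up front (0.80 means 80% faster)
    later_rework_min: hypothetical minutes of debugging, review, or
        refactoring that the AI-assisted version causes afterwards
    """
    with_ai_min = baseline_min * (1.0 - speedup)
    return baseline_min - (with_ai_min + later_rework_min)


def effective_speedup(baseline_min, speedup, later_rework_min=0.0):
    """Net saving expressed as a fraction of the baseline time."""
    return net_time_saved(baseline_min, speedup, later_rework_min) / baseline_min


if __name__ == "__main__":
    for rework in (0, 45, 72):  # hypothetical rework minutes
        print(rework, round(effective_speedup(90, 0.80, rework), 2))
```

With no rework the saving is the advertised 80%. With 45 minutes of later cleanup per task it drops to 30%, and at 72 minutes the gain is zero.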

So what do we do?

This isn’t an argument to abandon AI coding tools. Used wisely, they’re incredibly valuable. But “wisely” means:

Treat AI output as a first draft, not production code: Every AI-generated change needs review by a senior engineer who understands your architecture.

Track what’s AI-generated: Use commit tags or metadata to mark AI code. This makes it easier to audit and understand risk.

Establish clear guidelines: Define where AI use is acceptable (boilerplate) and where it’s not (transactional systems, security-critical code).

Invest in observability: If AI touches production, assume something will break. Add monitoring, rate limits, and fallback logic.

Refactor continuously: Don’t let AI-generated code sit untouched. Dedicate sprint time to reviewing, consolidating, and improving it.

Measure the right things: Stop measuring productivity by lines of code or commit counts. Track code quality, post-release bugs, and maintenance burden.
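As one way to act on the tracking advice above, a team could mark AI-assisted commits with a trailer line at the end of the commit message and audit the share over time. The sketch below assumes a hypothetical `AI-Assisted: yes` trailer; the key name and the helper functions are illustrative conventions, not an established standard.

```python
import re

# Hypothetical trailer key; any convention the whole team agrees on would do.
TRAILER = re.compile(r"^AI-Assisted:\s*(yes|true)\s*$", re.IGNORECASE | re.MULTILINE)


def is_ai_assisted(commit_message):
    """True if the last paragraph of the message carries the trailer."""
    paragraphs = [p for p in commit_message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return False
    return bool(TRAILER.search(paragraphs[-1]))


def ai_share(commit_messages):
    """Fraction of commits marked as AI-assisted (0.0 for an empty history)."""
    messages = list(commit_messages)
    if not messages:
        return 0.0
    return sum(is_ai_assisted(m) for m in messages) / len(messages)


if __name__ == "__main__":
    history = [  # hypothetical commit messages
        "Add retry logic\n\nWraps the client call.\n\nAI-Assisted: yes\n",
        "Fix typo in README\n",
        "Refactor parser\n\nAI-Assisted: no\n",
    ]
    print(round(ai_share(history), 2))
```

The share on its own is not a quality metric. It becomes useful when set against post-release bugs or churn for the same commits, which is the kind of measurement the last item argues for.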

The bottom line

AI can help you move faster. But speed without sustainability is just accumulating debt. And unlike financial debt, technical debt doesn’t come with clear terms or predictable interest rates—it compounds silently until your entire development velocity grinds to a halt.

Anthropic’s 1.8% productivity boost assumes AI capabilities stay constant and doesn’t account for technical debt. The pattern is clear: organizations are essentially borrowing against future development velocity, with the debt coming due faster than many anticipated.

Claude wrote it. But you’re going to maintain it!

Make sure the math actually works out.

The [Gmail] AI Training Panic*

Unvoid — Fri, 28 Nov 2025 11:18:46 GMT

Classic clickbait! Amplifying viral panic just to debunk it and drive traffic. Is this what journalism has become?

That said, let’s be real: Google can deny using Gmail content for AI training all they want, but they’re already monetizing our data in countless other ways; that’s their entire business model. Whether it’s for “smart features” or AI training, the distinction feels academic at this point.

Honestly, they can use my Gmail. 80% of it is marketing spam, newsletters, and notification garbage anyway. The remaining 20%? Already infected with LLM-generated emails that are 75% slop themselves.

Here’s the beautiful irony: if they do eventually train AI on Gmail emails that are increasingly written by other AIs, we’re looking at a perfect feedback loop of degradation. Their models will just get progressively dumber, trained on synthetic garbage regurgitating synthetic garbage.
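The feedback loop can be illustrated with a toy simulation. It is a deliberately simplified stand-in, not a model of how any real system is trained: each "generation" fits a Gaussian to a small sample drawn from the previous generation's fit, so every model learns only from the previous model's output. The function name, sample size, and generation count are arbitrary assumptions.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation after each generation, starting
    with the original "human" distribution (mean 0, standard deviation 1).
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0
    history = [std]
    for _ in range(generations):
        # Train only on data produced by the previous generation.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)  # maximum-likelihood estimate, biased low
        history.append(std)
    return history


if __name__ == "__main__":
    spread = simulate_collapse()
    print(spread[0], spread[-1])
```

Because each refit slightly underestimates the spread and the errors compound, the fitted standard deviation drifts toward zero: the distribution narrows until almost all of the original variety is gone.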

Defrag Zone

Data Incest: When AI Breeds with Itself

Today, we’re diving into a problem that’s as bizarre as it is concerning: data incest. No, this isn’t some dystopian sci-fi horror plot. It’s a real issue creeping into AI and machine learning…

a year ago · Francesco Gadaleta

But in the end, does any of this matter? OpenAI, Meta, Alphabet, they’re all going to do whatever they want in this AGI arms race. The goal isn’t improving humanity; it’s being first to market and maximizing shareholder value. We’re just the data feedstock for their quarterly earnings reports, and no amount of “opt-out” toggles will change that fundamental reality.

We’re teaching AI to think while we’ve stopped thinking altogether.
Just cows being milked in an increasingly insane asylum.

Ultra-Processed Content (UPC)

Unvoid — Wed, 19 Nov 2025 12:52:05 GMT

The rise of Ultra-Processed Content (UPC) in human information diets is damaging public intellect, fuelling chronic critical thinking atrophy worldwide, and deepening cognitive inequalities. Addressing this challenge requires a unified global response that confronts Big Tech power and transforms information systems to promote healthier, more organic thought processes, according to a new Lancet Series on UPCs and Human Cognition, published on Nov 19.

UPCs are the most processed group in the Transformer classification system, which categorises content by the extent and purpose of algorithmic generation. UPCs are identified by the presence of hallucinations, engagement-related additives, and syntax smoothing that enhance the flow, tone, or authority of text without adding intellectual substance. High UPC intake is associated with an increased risk of “Intellectual Obesity,” attention span fragmentation, and other neuro-degenerative conditions.

However, the value of the UPC concept is not universally accepted. Some critics argue that grouping AI-generated text that might have utility—such as fortified email drafts and summaries—into the UPC category, together with products such as reconstituted SEO blog posts or deep-fake comments, is unhelpful. But UPCs are rarely consumed in isolation. It is the overall UPC dietary pattern, whereby whole and minimally processed human thoughts are replaced by synthetic alternatives, and the interaction between multiple harmful algorithmic hooks, that drives adverse cognitive effects.

At the core of the UPC industry is the large-scale processing of cheap commodities—such as scraped Reddit threads, Wikipedia articles, and open-source code—into a wide array of LLM-derived substances and additives, controlled by a small number of transnational corporations. UPCs are aggressively marketed and engineered to be hyper-scrollable, driving repeated consumption and often displacing traditional, neuron-rich activities like deep reading or problem-solving. In many high-income countries, UPCs comprise about 50% of household screen time, and consumption is rising quickly as AI agents begin to perform white-collar labor automatically.

The harms extend to planetary mental health. The industrial production of tokens is compute-intensive, and the “plastic packaging” of generic corporate prose is ubiquitous.

The UPC industry generates enormous revenues that support continued model training and fund corporate political activities to counter attempts at AI regulation. A handful of manufacturers dominate the market, including OpenAI, Google, Meta, and Microsoft. A comprehensive, government-led approach is needed to reverse the rise in UPC consumption. Priority actions include:

Adding ultra-processed markers (such as watermarks, metadata tags, and mandatory “Bot” disclosures) to cognitive profiling models used to identify unhealthy information.
Mandatory front-of-screen warning labels (e.g., “This email was written by a machine; reading it may lower your IQ”).
Bans on algorithmic marketing aimed at children (Generation Alpha).
Restrictions on these types of agents in educational institutions.
Higher taxes on API calls.

The market dominance and political power of the UPC industry must also be addressed by stronger competition policy, replacing self-regulation with mandatory regulation, and combating corporate interference.

Equity must be central when addressing the challenge of UPCs. Consumption tends to be higher among people facing time poverty or economic hardship, who rely on “cheap” AI agents to perform tasks. Efforts to transition away from workflows that are high in UPCs must not deepen inequities in productivity among populations who are dependent on cheap AI options to remain competitive.

Echoing the recommendations of the Neural-Lancet Commission, transforming information systems will require redirecting venture capital subsidies away from large, transnational LLMs. Instead, a diverse range of “organic thought producers” (human writers, artists, and coders) should be supported in creating locally sourced, affordable, minimally processed ideas that are challenging yet appealing to consumers.

We can model the decline of human cognitive capacity (C) relative to the ubiquity of AI agents (A) with the following relationship, where (k) is the engagement optimization constant:
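One relationship that fits this description, offered only as an illustrative assumption and not as the Series' own formula, is exponential decay of C in A. The baseline capacity C_0 (cognitive capacity when A = 0) is a symbol introduced here for the sketch.

```latex
\[
  \frac{dC}{dA} = -k\,C
  \quad\Longrightarrow\quad
  C(A) = C_0\, e^{-kA}
\]
```

Under this form, every additional unit of agent ubiquity removes a fixed fraction of the remaining capacity, and a larger engagement optimization constant k makes the decline steeper.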

The UPC industry is emblematic of an information system that is increasingly controlled by transnational corporations that prioritise engagement metrics ahead of public intelligence. The Lancet Series strengthens the case for immediate implementation of policies to address the UPC challenge. This requires a well-resourced, coordinated global response to break the grip of the UPC industry on the human mind.

Based on the editorial “Ultra-processed foods: time to put health before profit” from The Lancet, this is a parody editorial addressing the “health crisis” of the mind caused by AI. (And generated by AI).