Column

Who Owns the Weights

Model weights are the factory floor of the 21st century, built from scraped public labour and held as private property. A column on the enclosure of intelligence.

A model weight is a number. A few hundred billion of them, arranged in matrices, is a large language model. The arrangement is the factory. The numbers are the machine tools. The question nobody in a keynote is allowed to ask out loud is: who owns the factory, and how did they come to own it?

The answer is simple. A handful of companies own it. They came to own it by scraping the unpaid labour of roughly one billion people, running it through a GPU cluster paid for by the Saudi sovereign wealth fund, and filing a trademark on the output.

This is not a metaphor for enclosure. This is enclosure.

Weights are the means of production

In the 19th century, the means of production was a loom, a forge, a rail line. In the 20th, it was an assembly line, a fab, a pipeline. The means of production was a physical object, and the political economy of it was obvious: the person who owned the loom set the terms for the person who worked the loom.

Weights are the means of production of generative output. Weights are a file. The file is the distilled pattern of every piece of human writing, code, art, and speech the company could get its scraper’s hands on. The file is valuable because the underlying corpus is valuable. The underlying corpus is valuable because a billion humans spent a billion lifetimes producing it.

None of those humans were asked. None of those humans were paid.

The weights are the loom. The corpus is the wool. Capital has taken both.

“You can download Llama” is not an answer

Meta has a long-running public relations argument that its Llama models are “open.” You can download the weights. You can fine-tune them. Therefore the factory belongs to everyone.

This is the commons-washing playbook, and it has been run at least four times already in tech history.

  • “Open source” Android is controlled by one company’s Play Services moat.
  • “Open standard” HTML is rendered almost entirely by three browser engines, two of them controlled by Google and Apple.
  • “Open” Wi-Fi runs on standard-essential patents licensed on FRAND terms that are negotiated in private.
  • “Open” Kubernetes is a value-capture funnel for three cloud providers.

Open weights are not open factories. Open weights are a factory with a padlock on the door and a sign that says “key available on request, subject to acceptable use policy.” The acceptable use policy is written by the same company that bought the land, enclosed the commons, and filed for the trademark.

The question is not whether you can look at the weights. The question is whether you can train the next generation of them. Training a frontier model requires a nine-figure outlay on GPUs and electricity, and a data corpus nobody else can assemble, because the open Web was closed off by the companies that scraped it first.

You cannot. You never could. The openness is theatre. The ownership is the story.

The Canadian weight problem

Canada spent a decade and roughly two billion dollars of public money building the AI research base that produced the deep-learning breakthroughs behind today’s models and a good chunk of the talent that staffs the American labs. Bengio’s group at Mila. Hinton’s group at U of T. The Vector Institute. The Pan-Canadian AI Strategy.

Ask yourself: which Canadian institution owns a frontier model? Which Canadian public body holds weights trained on Canadian public data? Which of the models you used this week was trained on a cluster that sits on Canadian soil, paid for by Canadian taxpayers, governed by a Canadian act of Parliament?

Cohere is the closest thing. Cohere is a private company headquartered in Toronto, funded by American and Gulf capital, and contractually aligned with the same cloud providers that own the compute. Cohere is a tenant on somebody else’s factory floor. That is not ownership. That is franchising.

The weights of the models Canadians use every day are held in Redmond, San Francisco, Menlo Park, and Mountain View. The labour that trained them was disproportionately Canadian. The rent flows south.

Weights as a public good

There is a real argument, not yet made seriously in any federal election in this country, that frontier model weights are closer in nature to a public utility than to a product.

The precedent is not software. The precedent is radio spectrum. The radio spectrum is a scarce, naturally occurring resource that capital wants to enclose for private profit. Every democracy eventually decided the spectrum is a public trust, licensed for private use under public conditions. The licence has terms. The terms are enforceable. The public gets a share.

A weights-as-utility regime would say: if your model is trained on a corpus that includes the commons — every public Canadian newspaper archive, every CBC transcript, every public-funded research paper, every Crown-copyright document — then the weights are subject to a public licence. The licence has terms. The terms include a price, a tax, and an open-audit requirement.

This is not radical. This is how Canada already treats mining rights. The Crown owns the subsurface. The company pays royalties. The public gets a share of the extraction.

The difference is that nobody has had the political courage to say the same thing about the corpus. The corpus is already extracted. The royalty is zero. The share is nil.

“But the models are expensive to train”

The standard objection is that training frontier models costs nine figures. Therefore only private capital can do it. Therefore private capital should own the output.

Two problems.

One, the cost is largely electricity and silicon, both of which are subsidized by the public. Ireland is running a national power grid for American hyperscalers. Virginia’s data centre boom is running on a grid that Virginia ratepayers are paying to expand. The nine-figure cost is partly a subsidy transfer from the public to the private, counted on the balance sheet as a capital expense.

Two, the cost will fall. Moore’s law has not been repealed. The 2030 version of a GPT-4-class model will cost a small fraction of what GPT-4 cost to train before its 2023 release. The cost argument is a temporary argument. The ownership question is permanent.

By the time the cost falls to where a public institution could afford to train a frontier model, the corpus will have been enclosed, the regulations will have been written by the incumbents, and the moat will be a legal one, not a technical one.

The argument

Model weights are the factory floor of the 21st-century economy. They were built by scraping the unpaid labour of the whole species. They are held as private property by fewer than a dozen companies. The public that produced the training data does not own the weights, does not benefit from the weights, and is now being sold back the output of the weights at a markup.

There is a name for this in Marxist political economy. It is called primitive accumulation. The name is embarrassing to say in a pitch meeting, which is why nobody in the industry will say it.

Canadians have a particular stake in asking the ownership question, because Canadian labour and Canadian public money went into the research base that made this possible, and none of the surplus is coming back.

The weights are the factory. The factory is privately owned. The ownership was never legitimate.

Argue with that at a dinner party. See who flinches.
