Topic

Data

Training sets, consent, scraping, and the political economy of the input.

Every frontier model was built on top of human writing, art, code, and conversation that nobody asked permission to take. The legal frame is still being written. The political frame is older: enclosure, the conversion of common resources into private property without compensation. If your work trained the model, you have a claim on the output. This topic covers data provenance, scraping disputes, opt-outs, training transparency, and the case for a data dividend.

Columns (2)

  1. Primitive Accumulation for the LLM Age

    The scraping of the public web was not a bug. It was the opening move of a classic enclosure. A column on how the training corpus became private property.

  2. Who Owns the Weights

    Model weights are the factory floor of the 21st century, built from scraped public labour and held as private property. A column on the enclosure of intelligence.

Contradictions (1)

Claim
AI is built to free humanity from drudgery.
Reality
It is trained on the unpaid labour of everyone who has ever written a sentence online, and its first deployment is to fire the person who writes the next one.

Demands (2)

  1. 01

    A data dividend. If a model was trained on your words, your face, or your voice, you own a share of the output.

  2. 02

    Open training data, public audits. Nothing trained in the dark deserves to decide anything in the light.

Other topics