Schema.org is the shared vocabulary that lets any website declare what it is to any machine that comes looking. This page is a complete reference — the history, the data, the formats, the AI implications, and live JSON-LD for six entity types.
Every website serves two audiences. The human who reads it. The machine that processes it. Schema is the bridge between the two — a standardized vocabulary that lets a website declare what it is in a format any machine can read, parse, and reason from.
The technical format is JSON-LD — JavaScript Object Notation for Linked Data. It lives in the <head> of an HTML document as a script block. Invisible to human visitors. Structural to every machine that processes the page.
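A minimal sketch of that placement. The business name and URL are placeholder values, not a real entity:

```html
<!-- Hypothetical page: the JSON-LD block sits in <head>, invisible to visitors -->
<head>
  <title>Acme Dental</title>
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Dentist",
    "name": "Acme Dental",
    "url": "https://www.example.com"
  }
  </script>
</head>
```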
The vocabulary is schema.org — 827 types, 1,528 properties, covering most categories of entity described on the web. Business name. Location. Hours. Services. Relationships to industry, geography, and adjacent entities. All of it typed, labeled, and machine-readable in a format that search engines and AI systems are built to read.
This page documents the standard as it exists — its origin, its data, its formats, its relationship to AI reasoning, and its implementation. The live examples at the bottom are production-ready JSON-LD for six entity types. Every source cited traces to a primary document.
In 2011, four competing companies — Google, Microsoft, Yahoo, and Yandex — agreed on a shared vocabulary for structured data on the web. They launched schema.org with 297 types and 187 properties. One standard. Every major search engine would accept it.
The three engineers who drove schema.org into existence were R.V. Guha at Google — who had previously co-created RSS and worked on the Cyc project — Dan Brickley, who had contributed to the Semantic Web effort at the W3C, and Steve Macbeth at Microsoft. Their work is documented in a 2016 paper published in Communications of the ACM. cacm.acm.org/practice/schema-org/
Within four years of launch, 31.3% of pages in the Google index carried schema.org markup. The vocabulary has grown continuously since. As of May 2026 it contains 827 types and 1,528 properties. The W3C issued JSON-LD 1.1 as a full Recommendation on 16 July 2020 — the highest level of endorsement the standards body issues.
W3Techs publishes usage statistics for structured data formats across the web, updated daily. As of March 29, 2026, 53.2% of all websites use JSON-LD — up from 18.1% in January 2018. JSON-LD is the dominant structured data format by adoption share.
The adoption statistics measure binary presence — JSON-LD exists on a page or it does not. Coverage depth against the full schema.org vocabulary for a given entity type is a separate measurement. The Global Data Registry measures both.
| Format | 2018 | 2020 | 2022 | 2024 | Jan 2026 | Mar 2026 |
|---|---|---|---|---|---|---|
| Open Graph | 36.0% | 46.1% | 60.1% | 65.5% | 69.8% | 70.3% |
| Twitter / X Cards | 19.3% | 29.8% | 46.7% | 50.1% | 55.3% | 56.0% |
| JSON-LD | 18.1% | 28.2% | 41.3% | 46.5% | 52.5% | 53.2% |
| Generic RDFa | 13.1% | 13.5% | 32.4% | 38.9% | 39.4% | 39.0% |
| Microdata | 13.1% | 15.3% | 21.5% | 24.7% | 23.1% | 22.7% |
| None | 55.1% | 44.3% | 30.3% | 25.1% | 21.7% | 21.4% |
Beyond binary presence, the Global Data Registry measures coverage depth: schema coverage as a percentage of the available vocabulary for a given entity type, scored against all 827 types and 1,528 properties.
Schema.org launched in 2011 supporting three implementation formats.
Microdata embeds structured markup directly inside HTML tags. The data layer and presentation layer share the same code.
RDFa — Resource Description Framework in Attributes — is the more expressive format from the W3C Semantic Web stack, implemented through HTML attributes.
JSON-LD separates the structured data entirely from the visible HTML. Schema lives in a script block in the document head — a self-contained JSON object that declares the entity without touching front-end code. Google began recommending JSON-LD in 2014. The W3C issued it as a full Recommendation in 2020. It is the standard format across all major search and AI systems.
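The separation is easiest to see side by side. A sketch of the same single fact declared both ways, with a placeholder business name:

```html
<!-- Microdata: structured attributes woven into the visible markup -->
<div itemscope itemtype="https://schema.org/LocalBusiness">
  <span itemprop="name">Acme Dental</span>
</div>

<!-- JSON-LD: the same fact, declared in a self-contained block,
     with no coupling to the presentation layer -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Acme Dental"
}
</script>
```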
Search engines historically used schema as an input to result presentation — structured data provided explicit signals that powered rich results and improved accuracy. Large language models introduced a different relationship with structured data.
AI systems construct entity models. When a system processes a query about a business, a location, or a concept, it draws from an entity model built during training. Schema.org markup is one of the primary structural inputs into that model. Every declared property is a confirmed fact the system can reason from. Every declared relationship is an edge in the model — a connection between entities the system uses to reason about context.
The relationship between schema coverage and AI citation confidence is structural. A declared property eliminates inference. A declared relationship establishes a confirmed edge between entities. The registry measures schema coverage, mints entities with full provenance, and injects confirmed graph edges into the JSON-LD — giving AI systems a dense, falsifiable, provenanced record to reason from.
Schema coverage measures how many of the available properties for a given entity type are declared. A Dentist entity has a defined vocabulary — every applicable property that could be declared for that type. Coverage is the ratio of declared properties to available properties.
The registry scores every indexed entity against the full schema.org vocabulary for its type. The score is a percentage — declared properties divided by the total applicable to that entity type.
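The scoring arithmetic can be sketched as follows. The vocabulary set below is a tiny illustrative subset, not the registry's actual property tables for the Dentist type:

```python
# Sketch of a coverage score: declared properties / applicable properties.
# DENTIST_VOCAB is an illustrative placeholder, not the real schema.org
# vocabulary for Dentist.
DENTIST_VOCAB = {"name", "url", "telephone", "address", "openingHours",
                 "geo", "priceRange", "medicalSpecialty"}

def coverage_score(declared: dict, vocab: set) -> float:
    """Percentage of the applicable vocabulary actually declared (non-empty)."""
    present = {k for k, v in declared.items()
               if k in vocab and v not in (None, "", [])}
    return round(100 * len(present) / len(vocab), 1)

entity = {
    "@type": "Dentist",
    "name": "Acme Dental",
    "url": "https://www.example.com",
    "telephone": "+1-555-0100",
    "address": {"@type": "PostalAddress", "streetAddress": "1 Main St"},
}
print(coverage_score(entity, DENTIST_VOCAB))  # 4 of 8 declared -> 50.0
```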
A structured intake process captures what the entity is. That information is normalized into a complete, vocabulary-aligned JSON-LD block targeting maximum coverage for the entity type.
The entity is run against the indexed entity graph. Confirmed relationships are baked into the schema — regulatory bodies, industry clusters, geographic markets — with provenance links.
The entity receives a permanent profile in the graph — machine-readable, with a full provenance chain, edge map, schema score, UUID, and ROOT-LD context pod.
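The pipeline above can be sketched in miniature. The field names, edge shape, and use of `subjectOf` as a provenance link are assumptions for illustration, not the registry's actual schema:

```python
import json
import uuid

def build_jsonld(intake: dict) -> dict:
    """Intake + normalization: produce a vocabulary-aligned JSON-LD block.
    (Hypothetical field names; a real intake would cover far more properties.)"""
    return {
        "@context": "https://schema.org",
        "@type": intake["type"],
        "@id": f"urn:uuid:{uuid.uuid4()}",  # permanent identifier for the entity
        "name": intake["name"].strip(),
        "url": intake["url"],
    }

def inject_edges(doc: dict, edges: list) -> dict:
    """Edge injection: bake confirmed graph relationships into the JSON-LD,
    each carrying a provenance URL (modeled here via subjectOf, an assumption)."""
    for edge in edges:
        doc.setdefault(edge["property"], []).append({
            "@id": edge["target"],           # the related entity in the graph
            "name": edge["label"],
            "subjectOf": edge["source_url"]  # where the edge was confirmed
        })
    return doc

intake = {"type": "Dentist", "name": "  Acme Dental ", "url": "https://www.example.com"}
doc = inject_edges(build_jsonld(intake), [
    {"property": "memberOf", "target": "urn:uuid:ada-0001",
     "label": "American Dental Association", "source_url": "https://www.ada.org/"},
])
print(json.dumps(doc, indent=2))
```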
Production-ready JSON-LD for six entity types. Each block targets maximum field coverage for the entity class. Copy directly into the <head> of your HTML inside a <script type="application/ld+json"> tag. Replace placeholder values with actual data.
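The pattern looks like this — a sketch with placeholder values to swap for real data, not one of the six production blocks themselves:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Dentist",
  "name": "YOUR_BUSINESS_NAME",
  "url": "https://YOUR_DOMAIN",
  "telephone": "YOUR_PHONE",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "YOUR_STREET",
    "addressLocality": "YOUR_CITY",
    "postalCode": "YOUR_POSTAL_CODE",
    "addressCountry": "YOUR_COUNTRY"
  },
  "openingHours": "Mo-Fr 09:00-17:00"
}
</script>
```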
Every claim on this page traces to a primary document. Source, author, URL, and date — declared for each reference.