Every entity in the registry carries a UUID, a timestamp, a source URL, and a content hash. The API surfaces that record — structured, provenanced, traversable — for AI training pipelines, research datasets, and knowledge graph construction.
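As a rough illustration of the four fields named above, here is a minimal Python sketch of an entity record. The class, the field names, the use of UUID version 4, and the choice of SHA-256 for the content hash are all assumptions made for the example; the registry's actual schema is not specified here.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity record; field names are assumptions, not the registry's schema."""
    entity_id: str     # UUID as a string
    crawled_at: str    # ISO 8601 timestamp in UTC
    source_url: str    # where the content was fetched from
    content_hash: str  # hex digest of the fetched bytes (SHA-256 assumed)


def make_entity(source_url: str, content: bytes, crawled_at: datetime) -> Entity:
    """Build a record from fetched bytes and a timezone-aware crawl time."""
    return Entity(
        entity_id=str(uuid.uuid4()),
        crawled_at=crawled_at.astimezone(timezone.utc).isoformat(),
        source_url=source_url,
        content_hash=hashlib.sha256(content).hexdigest(),
    )


def matches_content(entity: Entity, content: bytes) -> bool:
    """Check whether the given bytes hash to the value stored on the record."""
    return hashlib.sha256(content).hexdigest() == entity.content_hash
```

A consumer could call `matches_content` on a freshly fetched copy of the source URL to see whether the page still matches the recorded hash.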
The registry mints entities at scale. Each domain is measured against the full schema vocabulary, timestamped, and provenanced. Every field carries a traceable origin.
Schema scores. Content fingerprints. Semantic matches across 42 language dictionaries. Temporal attestation. Discovery timestamps. Full recrawl history. Every entity is a complete structured record.
Common edges — what entities share. Uncommon edges — what they don't. Substrate connections, dimensional relationships. Edges discovered through measurement. Nothing manually tagged.
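One way to picture common and uncommon edges is as set operations over whatever was measured for two entities. The sketch below assumes each entity is reduced to a set of measured type names and reads "uncommon" as the symmetric difference. That reading, the function names, and the sample values are assumptions for illustration, not the registry's definition.

```python
from __future__ import annotations


def common_edge(a: set[str], b: set[str]) -> set[str]:
    """Measured attributes present on both entities."""
    return a & b


def uncommon_edge(a: set[str], b: set[str]) -> set[str]:
    """Measured attributes present on exactly one of the two entities."""
    return a ^ b


# Illustrative measurements only; real values would come from the registry.
site_a = {"Organization", "Product", "FAQPage"}
site_b = {"Organization", "Article"}

shared = common_edge(site_a, site_b)
distinct = uncommon_edge(site_a, site_b)
```

Because both edges are derived from the measurements themselves, no manual tagging step appears anywhere in the sketch.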
Scores calculated against the full 916-type vocabulary. Cross-referenced with language and geography. Empirical coverage patterns across the web, by industry and region.
Every recrawl preserved. Previous passes archived, never overwritten. Temporal attestation for training on web change patterns. A record of what a domain was, not only what it is.
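The "archived, never overwritten" behavior can be sketched as an append-only list of passes. The class and field names below are hypothetical, and using a content hash to detect change is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class CrawlPass:
    """One crawl of a domain (hypothetical shape)."""
    crawled_at: str    # ISO 8601 timestamp
    content_hash: str  # fingerprint of what was seen on this pass


@dataclass
class RecrawlHistory:
    """Append-only history: new passes are added, old ones are never replaced."""
    passes: List[CrawlPass] = field(default_factory=list)

    def record(self, crawled_at: str, content_hash: str) -> None:
        # Always append; earlier passes stay in place.
        self.passes.append(CrawlPass(crawled_at, content_hash))

    def latest(self) -> CrawlPass:
        return self.passes[-1]

    def changes(self) -> List[Tuple[CrawlPass, CrawlPass]]:
        """Consecutive pairs of passes whose content hash differs."""
        return [
            (prev, nxt)
            for prev, nxt in zip(self.passes, self.passes[1:])
            if prev.content_hash != nxt.content_hash
        ]
```

A training pipeline interested in web change patterns could iterate over `changes()` to get before-and-after pairs, while `latest()` answers what a domain is now.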
Three-layer linked data structure — Anchor, Body, Recursive — with full provenance, timestamped passes, and dimensional context. Open specification at root-ld.org.
Every entity has a manifest.json. Structured data without HTML parsing. Optimized for crawler ingestion, RAG retrieval, and training pipeline integration.
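To show what reading a manifest without HTML parsing might look like, here is a small Python sketch. It assumes the manifest lives at `manifest.json` under an entity's base URL and is a single JSON object. The URL layout, the `example.org` address, and the keys in the sample payload are all hypothetical.

```python
import json
from urllib.parse import urljoin


def manifest_url(entity_base_url: str) -> str:
    """Build the manifest URL, assuming it sits directly under the entity's base URL."""
    base = entity_base_url if entity_base_url.endswith("/") else entity_base_url + "/"
    return urljoin(base, "manifest.json")


def parse_manifest(raw: str) -> dict:
    """Parse manifest text into a dict, rejecting anything that is not a JSON object."""
    manifest = json.loads(raw)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    return manifest


# Sample payload with made-up keys, standing in for a fetched manifest.
sample = '{"source_url": "https://example.org/", "content_hash": "abc123"}'
manifest = parse_manifest(sample)
url = manifest_url("https://example.org/entities/1234")
```

The fetch itself is left out so the sketch stays independent of any particular HTTP client; the parsed dict can go straight into a retrieval index or a training pipeline.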
The registry is built for operators who need provenance-declared, machine-readable data at scale.
Citation-grounded web data with full provenance. Every entity traced to its source URL and crawl timestamp. Temporal attestation for tracking how information changes. Semantic fingerprints across 42 languages. Structured, falsifiable knowledge — not noisy crawl data.
Schema adoption across languages and geographies. Linguistic bias analysis. Knowledge graph evolution over time. The only web dataset measuring knowledge organization at the structural layer — beneath language, beneath keywords.
Pre-structured entities with active graph edges. Traverse from any entry point. Common and uncommon edges surface connections invisible to keyword search. Manifests enable direct structured data fetching. Build domain-specific graphs from the registry's foundation.
API design is in progress. Response format: JSON-LD with ROOT-LD wrapper. Machine-readable, traversable, provenanced. Early access partners help shape the final specification.
The specification is in development. Early access is being coordinated with frontier AI companies, research institutions, and infrastructure builders. To request access or discuss terms, reach out directly.