API — Structured Access to the Graph

Data Types

Six data types.
All provenance-backed.

The registry mints entities at scale. Each domain is measured against the full schema vocabulary, timestamped, and provenanced. Every field carries a traceable origin.

01 · Entity Records

Schema scores. Content fingerprints. Semantic matches across 42 language dictionaries. Temporal attestation. Discovery timestamps. Full recrawl history. Every entity a complete structured record.
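As a rough illustration of what such a record could look like in client code, here is a minimal sketch. The class and field names (`EntityRecord`, `CrawlPass`, `language_matches`, and so on) are assumptions made for this example; the registry's actual schema is not published on this page.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class CrawlPass:
    """One timestamped crawl of an entity (hypothetical shape)."""
    crawled_at: datetime
    content_hash: str


@dataclass
class EntityRecord:
    """Illustrative entity record; field names are assumptions, not the registry schema."""
    domain: str
    source_url: str
    discovered_at: datetime
    schema_score: float
    content_fingerprint: str
    # Per-language semantic match scores, keyed by language code.
    language_matches: Dict[str, float] = field(default_factory=dict)
    # Every crawl pass is kept; nothing is overwritten.
    recrawl_history: List[CrawlPass] = field(default_factory=list)

    def latest_pass(self) -> Optional[CrawlPass]:
        """Return the most recent crawl pass, or None if there is no history yet."""
        return max(self.recrawl_history, key=lambda p: p.crawled_at, default=None)
```

Keeping the recrawl history as a list on the record mirrors the "full recrawl history" property described above; a real client would populate it from the API response.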

02 · Graph Edge Data

Common edges — what entities share. Uncommon edges — what they don't. Substrate connections, dimensional relationships. Edges discovered through measurement. Nothing manually tagged.
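One simple way to picture the common/uncommon distinction is with set operations over measured attributes. The sketch below is an assumption for illustration only: the page does not describe how the registry actually derives edges, and the function names and sample data are hypothetical.

```python
def common_edges(a: set, b: set) -> set:
    """Attributes both entities share."""
    return a & b


def uncommon_edges(a: set, b: set) -> set:
    """Attributes that exactly one of the two entities has."""
    return a ^ b


# Hypothetical measurements: schema.org types detected on two domains.
site_a = {"Organization", "Product", "FAQPage"}
site_b = {"Organization", "LocalBusiness"}

shared = common_edges(site_a, site_b)
differing = uncommon_edges(site_a, site_b)
```

Because both functions operate only on measured attributes, no manual tagging step is involved, which is the property the section emphasizes.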

03 · Schema Scores & Gap Measurements

Scores calculated against the full 916-type vocabulary. Cross-referenced with language and geography. Empirical coverage patterns across the web, by industry and region.

04 · Timestamped Crawl Archives

Every recrawl preserved. Previous passes archived, never overwritten. Temporal attestation for training on web change patterns. A record of what a domain was, not only what it is.
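The archive-never-overwrite behavior can be sketched as an append-only log with a point-in-time lookup. This is a minimal in-memory illustration under stated assumptions; the `CrawlArchive` class and its methods are hypothetical and say nothing about how the registry stores passes internally.

```python
from datetime import datetime
from typing import List, Optional, Tuple


class CrawlArchive:
    """Append-only archive of crawl snapshots for one entity (illustrative)."""

    def __init__(self) -> None:
        self._passes: List[Tuple[datetime, str]] = []

    def record(self, crawled_at: datetime, snapshot: str) -> None:
        """Add a new pass; earlier passes are never modified or removed."""
        if self._passes and crawled_at <= self._passes[-1][0]:
            raise ValueError("passes must be recorded in chronological order")
        self._passes.append((crawled_at, snapshot))

    def as_of(self, moment: datetime) -> Optional[str]:
        """Return the snapshot that was current at `moment`, or None if none existed yet."""
        current = None
        for crawled_at, snapshot in self._passes:
            if crawled_at > moment:
                break
            current = snapshot
        return current
```

Calling `as_of` with a past date answers "what was this domain then", while the last recorded pass answers "what is it now".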

05 · ROOT-LD & Recursive-LD

Three-layer linked data structure — Anchor, Body, Recursive — with full provenance, timestamped passes, and dimensional context. Open specification at root-ld.org.

06 · Machine-Readable Manifests

Every entity has a manifest.json. Structured data without HTML parsing. Optimized for crawler ingestion, RAG retrieval, and training pipeline integration.
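A client that consumes manifests might look roughly like the sketch below. The `/manifests/{domain}` path comes from the endpoint table on this page; the base URL `https://api.example.invalid` is a placeholder, and the helper names are hypothetical, since the specification is still in progress.

```python
import json
from urllib.parse import quote

# Placeholder base URL; the real host is not stated on this page.
API_BASE = "https://api.example.invalid"


def manifest_url(domain: str, base: str = API_BASE) -> str:
    """Build the manifest URL for a domain, percent-encoding unsafe characters."""
    return f"{base}/manifests/{quote(domain, safe='')}"


def parse_manifest(body: str) -> dict:
    """Parse a manifest.json payload and check that it is a JSON object."""
    manifest = json.loads(body)
    if not isinstance(manifest, dict):
        raise ValueError("expected manifest.json to contain a JSON object")
    return manifest
```

The point of the manifest is that `parse_manifest` is all the parsing a consumer needs: the payload is already structured, so no HTML extraction step is required.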

Access

Three use cases.
One infrastructure layer.

The registry is built for operators who need provenance-declared, machine-readable data at scale.

Frontier AI Companies

Training data that can be verified.

Citation-grounded web data with full provenance. Every entity traced to its source URL and crawl timestamp. Temporal attestation for tracking how information changes. Semantic fingerprints across 42 language dictionaries. Structured, falsifiable knowledge, not noisy crawl data.

Research Institutions

Empirical datasets for falsifiable work.

Schema adoption across languages and geographies. Linguistic bias analysis. Knowledge graph evolution over time. The only web dataset measuring knowledge organization at the structural layer — beneath language, beneath keywords.

Infrastructure Builders

The foundation layer for RAG and knowledge graphs.

Pre-structured entities with active graph edges. Traverse from any entry point. Common and uncommon edges surface connections invisible to keyword search. Manifests enable direct structured data fetching. Build domain-specific graphs from the registry's foundation.

Properties

What the data carries
that other sources don't.

Full Provenance

Discovery timestamp, mint timestamp, source URL, content hash, recrawl history. The origin of every field is structural — not a claim appended after the fact.
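A content hash lets a consumer check a stored snapshot against its provenance record. The sketch below assumes SHA-256 over the raw response bytes; the page does not say which hash function or canonicalization the registry uses, so treat both as assumptions.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw content (assumed algorithm)."""
    return hashlib.sha256(body).hexdigest()


def matches_provenance(body: bytes, recorded_hash: str) -> bool:
    """True if the content still hashes to the value stored in its provenance record."""
    return content_hash(body) == recorded_hash.lower()
```

Comparing a freshly computed hash with the recorded one is what makes the provenance checkable rather than a claim taken on trust.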

Falsifiable Measurements

Schema scores calculated against the full 916-type vocabulary. Semantic fingerprints run across 42 language dictionaries. Every number has a methodology. No black box. No opaque scoring.

Deterministic Edge Discovery

Common and uncommon edges form from accumulated measurements across the corpus. No manual tagging. No subjective classification. Relationships emerge from the data.

Temporal Attestation

Entities recrawl on schedule. Every pass generates a new timestamped record. Previous data archives in entity folders, never overwritten. A record of what a domain was across time — not only what it is today.

Multilingual by Design

42 language dictionaries. Semantic overlap patterns across language families. Linguistic bias measurements built into schema scoring from the first pass. Knowledge organization measured at the structural layer — beneath language, beneath keywords.

Technical Specifications

Endpoints.
Specification in progress.

API design is active. Response format: JSON-LD with ROOT-LD wrapper. Machine-readable, traversable, provenanced. Early access partners help shape the final specification.

Method  Endpoint                 Description
GET     /entities                Query by domain, TLD, schema score, language matches
GET     /entities/{id}           Full entity record: manifest, ROOT-LD, folder contents, recrawl history
GET     /entities/{id}/edges     All graph edges for an entity: common, uncommon, dimensional
GET     /graph/edges             Query edges by type, confidence score, entity pairs
GET     /schema/scores           Schema score distribution by industry, geography, language
GET     /manifests/{domain}      Fetch entity manifest.json directly by domain, no HTML parsing required
GET     /rootld/{id}             ROOT-LD context pod: Anchor, Body, Recursive layers with full provenance
GET     /search                  Search across all minted entities by keyword, schema type, or edge
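To make the endpoint list concrete, here is a minimal sketch of how a client might build request URLs. The paths are taken from the list above; the base URL and the query parameter names (`tld`, `min_schema_score`) are hypothetical, because the specification is still in progress and no parameter names are published here.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL; the real host is not stated on this page.
API_BASE = "https://api.example.invalid"


def entities_query_url(base: str = API_BASE, **filters) -> str:
    """Build a GET /entities URL, dropping filters that are None."""
    params = {k: v for k, v in filters.items() if v is not None}
    # Sort for a stable, cache-friendly parameter order.
    query = urlencode(sorted(params.items()))
    return f"{base}/entities" + (f"?{query}" if query else "")


def entity_edges_url(entity_id: str, base: str = API_BASE) -> str:
    """Build a GET /entities/{id}/edges URL for one entity."""
    return f"{base}/entities/{quote(str(entity_id), safe='')}/edges"
```

For example, `entities_query_url(tld="org", min_schema_score=0.8)` yields a URL ending in `/entities?min_schema_score=0.8&tld=org`, which can then be sent with any HTTP client.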

Response Format

JSON-LD

ROOT-LD wrapper. Traversable. Provenanced. Machine-readable without HTML parsing.

Authentication

API Key · OAuth

Research, commercial, and enterprise tiers. Rate limits set per engagement during early access.
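An authenticated request could be prepared as in the sketch below. It assumes the credential travels in an `Authorization: Bearer` header, which is the usual OAuth 2.0 convention; how API keys are transmitted is not specified on this page, so that part is an assumption. `application/ld+json` is the registered media type for JSON-LD.

```python
from urllib.request import Request


def build_request(url: str, token: str) -> Request:
    """Prepare (but do not send) a GET request carrying a bearer credential."""
    return Request(
        url,
        headers={
            # Assumed scheme; the final spec may use a different header for API keys.
            "Authorization": f"Bearer {token}",
            "Accept": "application/ld+json",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` once a real base URL and credential are available.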

Status

In Development

Early access open. Specification shaped with first partners. Contact us to engage.
