Now accepting co-pilot engagements — Q2 2026

Training data
frontier AI labs
can actually trust.

The post-training data platform built for the world's most demanding AI labs. Security-first architecture. Retained credentialed experts. Contractual accountability. Every dataset fully traceable — because provenance is not a feature. It is the foundation.

20–22%
Take rate vs. market standard 35%

100%
Client-isolated cloud environments

SOC 2
Type II certified before first contract

1,000+
Credentialed retained domain experts

Built for the world's most demanding AI labs

Anthropic · Google DeepMind · Microsoft · Amazon AI · Nvidia

The Problem

The AI training supply chain
has a structural problem.

Current vendors were built for growth, not for safety, compliance, or accountability. The result is mislabeled expert data, security architectures that make a single breach a multi-lab catastrophe, and labor practices that systematically produce unreliable evaluations.

The Security Architecture Failure

Shared infrastructure means one breach exposes every client simultaneously. The Mercor breach exposed training methodologies from competing labs in a single incident — an inevitable outcome of shared architecture, not bad luck.

Active Legal Exposure

The Expert Quality Failure

Non-experts hired and roles quietly changed after data delivery. Gig workers with no contracts until after work is produced. Compensation misrepresented as hourly when it is per-task at effective rates potentially below minimum wage.

Data Quality Risk

The Accountability Failure

No written SLAs. No financial penalties for quality failures or security incidents. No contractual obligation to meet the standards labs need. A $10B company with zero accountability infrastructure for the labs it serves.

Zero Contractual Remedy

The Platform

Four pillars.
Every failure, eliminated.

Provenance AI was designed by asking one question: what does a training data platform need to look like for a compliance team, a security team, and a research team to all say yes simultaneously?

🔒

Hard Client Isolation

Every lab runs in a dedicated cloud tenant with separate accounts, isolated encryption keys, and zero shared infrastructure. A breach in one environment cannot reach another, and that guarantee comes from architecture, not policy.

▸Separate AWS accounts or GCP projects per client, not just separate VPCs or subnets

▸Isolated HashiCorp Vault instances for encryption key management

▸SBOM-enforced dependency management — direct response to LiteLLM-style supply chain attacks

▸Annual third-party isolation audit delivered independently to each lab

🎓

Retained Credentialed Experts

Domain experts on 6–12 month retainer contracts with verified credentials, explicit version-controlled rubrics, and guaranteed minimum compensation. Not gig workers. Retained professionals who produce better data because the platform is built to reward quality.

▸Credentials verified before task assignment — license checks, background verification

▸Background checks valid 30 days maximum — renewal is a hard deployment gate

▸$25/hour minimum effective compensation floor — transparent, locked at offer

▸Expert credential manifest delivered with every dataset

📊

Continuous QA Engine

Every evaluation scored for inter-rater reliability before delivery. 5% gold standard injection to detect quality drift. Monthly calibration sessions per domain. Live quality dashboard per client. Problems caught before they reach you — not after.

▸Cohen's Kappa IRR thresholds enforced per task type — contractual minimum

▸Version-controlled rubrics with explicit acceptance criteria, pre-task

▸Weekly quality reports to each lab, backed by real-time metrics on the live dashboard: no surprises

▸Full data provenance: every evaluation logged, reproducible, auditable
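As a rough illustration of the inter-rater reliability gate described above, the sketch below computes Cohen's Kappa for two raters who labeled the same items and compares it to a threshold. The function names and the 0.7 threshold are hypothetical placeholders, not contractual values from this page; real thresholds would be set per task type.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labeling the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def passes_irr_gate(rater_a, rater_b, threshold=0.7):
    """True when agreement meets the threshold (0.7 is an assumed example value)."""
    return cohens_kappa(rater_a, rater_b) >= threshold


# Two raters agree on 3 of 4 items: observed 0.75, expected 0.5, kappa 0.5.
print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1]))
```

A batch that fails the gate would be held back for recalibration rather than delivered, which is the behavior the QA bullets describe.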

📋

Contractual Accountability

Written SLAs with financial penalties for quality failures, security incidents, and missed turnarounds. SOC 2 Type II before the first contract. Every promise is a contract term — not a vendor claim that evaporates when something goes wrong.

▸99.5% uptime SLA, with a 10% invoice credit for each 0.1 percentage point of shortfall

▸24-hour standard task turnaround — financial penalty for breach

▸Security incident penalty: 90-day credit + independent audit at vendor cost

▸20–22% take rate locked for 3 years: $6–22M annual savings per lab
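The uptime credit can be worked through as simple arithmetic. The sketch below assumes that each started 0.1 percentage point below the 99.5% target earns a 10% credit and that the credit is capped at the full invoice; both the rounding rule and the cap are assumptions for illustration, since the page does not state them. The function name is hypothetical.

```python
from decimal import Decimal, ROUND_CEILING

TARGET_UPTIME = Decimal("99.5")    # SLA target, in percent
STEP = Decimal("0.1")              # shortfall increment, in percentage points
CREDIT_PER_STEP = Decimal("0.10")  # 10% of the invoice per increment
MAX_CREDIT = Decimal("1.00")       # assumed cap: never credit more than the invoice


def uptime_credit(measured_uptime_pct: str, invoice_total: str) -> Decimal:
    """Invoice credit owed for a billing period, given measured uptime in percent."""
    shortfall = TARGET_UPTIME - Decimal(measured_uptime_pct)
    if shortfall <= 0:
        return Decimal("0.00")
    # Assumed rule: a partially used increment counts as a full one.
    steps = (shortfall / STEP).to_integral_value(rounding=ROUND_CEILING)
    fraction = min(steps * CREDIT_PER_STEP, MAX_CREDIT)
    return (Decimal(invoice_total) * fraction).quantize(Decimal("0.01"))


# 99.3% measured uptime is 0.2 points short: two increments, 20% credit.
print(uptime_credit("99.3", "100000"))  # 20000.00
```

Decimal inputs are used instead of floats so that a shortfall of exactly 0.2 points is not rounded up to a third increment by floating-point error.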

The Mercor Labor Dossier

Eight structural failures.
Three with legal exposure.

These are documented contractor complaints — presented not as labor grievances but as data quality signals. The conditions under which a human evaluates an AI output directly determine the reliability of that evaluation.

| Documented Failure | Data Quality & Legal Risk to Labs | Risk Type |
| --- | --- | --- |
| Deceptive compensation: $35/hr advertised, $20/task actual. Effective hourly rate $13–27/hr. Self-selects for workers who tolerate exploitation, not workers who produce reliable evaluations. | Active class action lawsuit, 40,000+ workers. FLSA wage claims, FTC deceptive practices, state labor statute violations. Joint employer doctrine may extend liability to contracting labs. | Legal |
| Non-experts hired, role updated post-data-delivery. Expert-level evaluation tags applied to non-expert work. Labs receive mislabeled data with false provenance records. | Labs making performance claims based on "expert-evaluated training data" may have legally unsupported claims if credentials were not verified at production time. EU AI Act audit risk. | Legal |
| Pre-contract deployment + 90-day-stale background checks. Workers produce data before contracts are signed. IP assignment and confidentiality terms not yet executed at time of production. | IP ownership of pre-contract data is legally uncertain. Labs may not hold clean title to datasets. HIPAA-adjacent risk for medical domain tasks. Stale checks create undisclosed compliance failure. | Legal |
| "Feels good" quality threshold, no written rubrics. Contractors chase an undefined target, revising work against a standard never specified in writing before the task begins. | Vague rubrics produce inconsistent evaluations. For Constitutional AI alignment work, rubric ambiguity is directly incompatible with principled evaluation. Adds noise, not signal, to training data. | Quality |
| Unpaid wait time during review cycles. Contractors complete work and wait days for approval with no compensation for availability or re-engagement time. | Disengaged, financially stressed evaluators produce measurably lower inter-rater reliability scores. The cognitive state of a resentful worker is reflected in evaluation quality. | Quality |
| AI resume screening excludes genuine domain experts. Non-standard expert profiles (retired clinicians, former federal judges, specialist engineers) filtered out by automated screener. | The most valuable evaluators for frontier model training are often people whose careers look unusual. Mercor's screener selects against exactly the people labs most need. | Quality |
| Feast-or-famine scheduling; top experts leave. Dry spells of weeks with no work, then sudden surges requiring rushed completion. Best experts find stable income elsewhere. | Platform systematically loses its most experienced evaluators. Dataset consistency is impossible when expert pool turns over constantly. Reproducibility, a core research requirement, is structurally unachievable. | Financial |
| No structured onboarding; new workers produce live data. Contractors navigate undocumented rules for weeks before earning reliably. The learning curve is entirely self-directed and unpaid. | Data produced during orientation period is the lowest-quality data in any batch. It cannot be retroactively identified or filtered. Labs receive it labeled identically to data from fully calibrated experts. | Quality |

Rebecca Bell

Founder & CEO · Provenance AI

JD Corporate Law · Phoenix School of Law

IRS Enrolled Agent · Active Federal Credential

Certified Mediator · State of Arizona

Computer Science: AI · Harvard University

Former Director · KPMG Advisory

$45M P&L · 200+ Person Global Teams

Founded by

Not an AI researcher.
Something rarer.

"I am not an AI researcher. I am something the AI training data market has been missing: an operator. My background is why this gets built correctly."

I spent 20 years building complex, high-stakes professional services operations — the kind where institutional trust, contractual accountability, and compliance were non-negotiable. Full P&L responsibility for a $45M organization. 200+ person global teams. Three years at KPMG advising government and enterprise clients on AI integration, technology risk, and compliance infrastructure.

The reason I built Provenance AI is not that I had a breakthrough AI idea. It is that I watched Mercor build a $10B company with brilliant growth and no operational governance — and recognized exactly what was missing. I have spent 20 years building the thing they never had.

The JD in corporate law, the Enrolled Agent credential, and the mediator certification are not decorative. They are why every contract term we offer is enforceable, every governance structure we design is legally sound, and every client relationship we build is structured to last.

JD Corporate Law, IRS Enrolled Agent, Certified Mediator (AZ), Harvard CS: AI, KPMG Advisory, $45M P&L Responsibility, 200+ Person Teams, M&A Integration

How It Works

Built around your standards.
Not ours.

The co-pilot model means your lab's security requirements, evaluation standards, and compliance needs shape the platform architecture before a single line of production code is written.

Week 1–2

Co-pilot architecture review

Your security and research teams review and approve the platform architecture before deployment. Your requirements are the design specification — not a compliance checklist added afterward. Each lab's specific pain points are written into the technical specifications as mandatory requirements.

30-minute conversation → term sheet

Week 2–4

Rubric co-design with your research team

Evaluation criteria are explicit, version-controlled, and written to your quality standards before any expert sees an assignment. For Anthropic: Constitutional AI alignment principles. For DeepMind: research-grade data cards meeting publication standards. For Microsoft: Responsible AI Standard six-principle mapping.

Lab-specific calibration sessions

Week 3–6

Expert pool credentialing to your specifications

Domain experts credentialed to your requirements — license verification, current background checks (30-day validity maximum), calibration sessions, and gold standard practice tasks. No expert produces live data until they have demonstrated rubric comprehension above the minimum IRR threshold.

Zero pre-credential live data production

Month 2–3

Paid pilot with full provenance report

An 8–10 week pilot in one domain at the 20% take rate. Every evaluation logged with expert credentials, rubric version, IRR score, and timestamp. You receive a full data provenance report alongside the dataset — the complete audit trail of every evaluation.

Cost + 20% margin · Pre-agreed success criteria
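To make the provenance report concrete, here is a minimal sketch of what one logged evaluation could look like, using the four fields named above: expert credentials, rubric version, IRR score, and timestamp. The class name, field names, and JSON-lines serialization are all hypothetical choices for illustration, not the platform's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvaluationRecord:
    """One logged evaluation; field names are illustrative assumptions."""
    evaluation_id: str
    expert_id: str
    expert_credentials: tuple[str, ...]  # verified credentials at production time
    rubric_version: str                  # version of the rubric the expert worked from
    irr_score: float                     # inter-rater reliability for the batch
    recorded_at: str                     # ISO 8601 timestamp in UTC


def to_audit_line(record: EvaluationRecord) -> str:
    """Serialize a record as one JSON line with stable key order for diffing."""
    return json.dumps(asdict(record), sort_keys=True)


record = EvaluationRecord(
    evaluation_id="ev-0001",
    expert_id="ex-0042",
    expert_credentials=("MD",),
    rubric_version="v1.2",
    irr_score=0.82,
    recorded_at="2026-01-01T00:00:00+00:00",
)
print(to_audit_line(record))
```

Freezing the dataclass keeps a record from being altered after it is written, which fits the goal of an audit trail that can be replayed alongside the dataset.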

Month 4+

Primary contract on your terms

12-month primary agreement at 20–22% take rate with written SLAs, financial penalties for breach, client isolation guarantee, and Lab Advisory Council participation. Equity stake and board observer rights for co-pilot labs. Pricing locked for 3 years regardless of market changes.

$6–22M annual savings vs. current market

For Each Lab

Tailored to your culture.
Isolated from every other.

Each co-pilot lab shapes the platform through their own dedicated engagement. No lab ever sees another lab's data, rubrics, contracts, or methodologies — by architecture, not by promise.

For

Anthropic

Safety-first architecture. Your RSP becomes the evaluation standard — not a reference document.

▸Constitutional AI principles written into rubric design
▸RSP safety tier specifications define the security architecture
▸Safety researchers participate in rubric calibration sessions
▸Full audit trail on every Constitutional AI evaluation task

For

Google DeepMind

Research-grade methodology. Your scientific standards become the platform's documentation requirements.

▸Data cards meeting Datasheets for Datasets standards with every delivery
▸Full reproducibility: every dataset exactly reconstructible from audit trail
▸Multimodal expert pools built to DeepMind's credentialing specifications
▸Research co-authorship opportunities on methodology papers

For

Microsoft

Enterprise compliance by design. SOC 2 and GDPR compliance built to your specifications from day one, with FedRAMP readiness on the Year 2 roadmap.

▸SOC 2 Type II certification before first contract: non-negotiable
▸EU data residency by design for GDPR obligations
▸FedRAMP readiness on Year 2 roadmap, which unlocks government AI contracts
▸Responsible AI Standard six principles mapped to evaluation criteria

Pricing

The market charges 35%.
We charge 20–22%.

At $50M annual training volume, that difference is $6.5–7.5M per lab, per year. Across Anthropic, Google DeepMind, and Microsoft simultaneously, the aggregate savings exceed $50M annually.
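The per-lab figure follows from the gap between the two take rates applied to annual volume. The sketch below assumes the take rate is charged as a flat percentage of annual training data spend; the function name is hypothetical.

```python
def annual_savings(volume_usd: int, market_rate_pct: int, provider_rate_pct: int) -> int:
    """Annual savings in USD from a lower take rate on the same volume.

    Assumes both rates are flat percentages of annual spend.
    """
    return volume_usd * (market_rate_pct - provider_rate_pct) // 100


volume = 50_000_000
low = annual_savings(volume, 35, 22)   # 13-point gap
high = annual_savings(volume, 35, 20)  # 15-point gap
print(low, high)  # 6500000 7500000
```

Savings scale linearly with volume under this assumption, so a lab spending more than $50M a year would see a proportionally larger figure.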

Current Market Standard

Mercor / Scale AI

35% take rate

✗ Shared infrastructure: a single breach exposes all clients

✗ Gig labor with feast-or-famine scheduling and deceptive compensation

✗ No written SLAs, so no financial consequences for failure

✗ Unverified expert credentials; roles changed post-data-delivery

✗ Active class action lawsuit affecting 40,000+ workers

✗ Scale AI now partially owned by Meta: direct conflict of interest

Provenance AI

Security-first platform

20–22% take rate

✓ Separate cloud tenant per client: physically impossible to cascade a breach

✓ Retained experts on 6–12 month contracts with guaranteed minimum hours

✓ Written SLAs with financial penalties for quality, security, and turnaround failures

✓ Credentials verified before assignment, with an expert manifest in every delivery

✓ SOC 2 Type II certified pre-contract, not promised post-contract

✓ No equity relationship with any competing lab: structurally neutral
