Now accepting co-pilot engagements — Q2 2026

Training data
frontier AI labs
can actually trust.

The post-training data platform built for the world's most demanding AI labs. Security-first architecture. Retained credentialed experts. Contractual accountability. Every dataset fully traceable — because provenance is not a feature. It is the foundation.

20–22% · Take rate vs. market standard 35%
100% · Client-isolated cloud environments
SOC 2 · Type II certified before first contract
1,000+ · Credentialed retained domain experts
Built for the world's most demanding AI labs
Anthropic · Google DeepMind · Microsoft · Amazon AI · Nvidia

The AI training supply chain
has a structural problem.

Current vendors were built for growth, not for safety, compliance, or accountability. The result is mislabeled expert data, security architectures that make a single breach a multi-lab catastrophe, and labor practices that systematically produce unreliable evaluations.

01
The Security Architecture Failure
Shared infrastructure means one breach exposes every client simultaneously. The Mercor breach exposed training methodologies from competing labs in a single incident — an inevitable outcome of shared architecture, not bad luck.
Active Legal Exposure
02
The Expert Quality Failure
Non-experts hired and roles quietly changed after data delivery. Gig workers with no contracts until after work is produced. Compensation misrepresented as hourly when it is per-task at effective rates potentially below minimum wage.
Data Quality Risk
03
The Accountability Failure
No written SLAs. No financial penalties for quality failures or security incidents. No contractual obligation to meet the standards labs need. A $10B company with zero accountability infrastructure for the labs it serves.
Zero Contractual Remedy

Four pillars.
Every failure, eliminated.

Provenance AI was designed by asking one question: what does a training data platform need to look like for a compliance team, a security team, and a research team to all say yes simultaneously?

🔒
Hard Client Isolation
Every lab runs in a dedicated cloud tenant with separate accounts, isolated encryption keys, and zero shared infrastructure. A breach in one environment cannot physically reach another — not by policy, by architecture.
Separate AWS/GCP accounts per client — not VPCs, not subnets
Isolated HashiCorp Vault instances for encryption key management
SBOM-enforced dependency management — direct response to LiteLLM-style supply chain attacks
Annual third-party isolation audit delivered independently to each lab
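The isolation rule above (one cloud account and one Vault instance per client, nothing shared) can be checked mechanically against a tenant inventory. The following is a minimal sketch, not the platform's actual tooling: the `Tenant` record, its field names, and the `shared_resources` helper are all hypothetical and exist only to illustrate the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    """Hypothetical per-client inventory record; field names are illustrative."""
    client: str
    cloud_account_id: str  # dedicated AWS/GCP account for this client
    vault_address: str     # dedicated Vault instance holding this client's keys


def shared_resources(tenants: list[Tenant]) -> dict[str, list[str]]:
    """Map each account ID or Vault address used by 2+ clients to those clients."""
    violations: dict[str, list[str]] = {}
    for field_name in ("cloud_account_id", "vault_address"):
        owners: dict[str, list[str]] = {}
        for tenant in tenants:
            owners.setdefault(getattr(tenant, field_name), []).append(tenant.client)
        for value, clients in owners.items():
            if len(clients) > 1:
                violations[value] = sorted(clients)
    return violations
```

An empty result means no two clients share an account or a key store. Any non-empty result names the shared resource and the clients attached to it, which is the kind of finding an isolation audit would need to report.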
🎓
Retained Credentialed Experts
Domain experts on 6–12 month retainer contracts with verified credentials, explicit version-controlled rubrics, and guaranteed minimum compensation. Not gig workers. Retained professionals who produce better data because the platform is built to reward quality.
Credentials verified before task assignment — license checks, background verification
Background checks valid 30 days maximum — renewal is a hard deployment gate
$25/hour minimum effective compensation floor — transparent, locked at offer
Expert credential manifest delivered with every dataset
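The three gates in the list above (verified credentials, a background check no older than 30 days, and the $25/hour floor) can be expressed as a single pre-assignment check. This is a sketch under stated assumptions: the `Expert` record and `gate_failures` function are hypothetical names, and only the 30-day and $25 values come from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_CHECK_AGE = timedelta(days=30)  # from the text: checks valid 30 days maximum
MIN_HOURLY_RATE = 25.0              # from the text: $25/hour effective floor


@dataclass(frozen=True)
class Expert:
    """Hypothetical expert record; field names are illustrative."""
    name: str
    credentials_verified: bool
    background_check_date: date
    effective_hourly_rate: float


def gate_failures(expert: Expert, today: date) -> list[str]:
    """Return the reasons this expert may not be assigned tasks (empty = cleared)."""
    reasons = []
    if not expert.credentials_verified:
        reasons.append("credentials not verified")
    if today - expert.background_check_date > MAX_CHECK_AGE:
        reasons.append("background check older than 30 days")
    if expert.effective_hourly_rate < MIN_HOURLY_RATE:
        reasons.append("compensation below $25/hour floor")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, keeps the gate auditable: a blocked assignment records why it was blocked.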
📊
Continuous QA Engine
Every evaluation scored for inter-rater reliability before delivery. 5% gold standard injection to detect quality drift. Monthly calibration sessions per domain. Live quality dashboard per client. Problems caught before they reach you — not after.
Cohen's Kappa IRR thresholds enforced per task type — contractual minimum
Version-controlled rubrics with explicit acceptance criteria, pre-task
Weekly quality reports to each lab, alongside live dashboard metrics, so there are no surprises
Full data provenance: every evaluation logged, reproducible, auditable
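For readers unfamiliar with the metric, Cohen's kappa measures agreement between two raters after discounting the agreement expected by chance. The sketch below computes it for two raters labelling the same items and compares it with a minimum. The function names are illustrative, and the threshold is left as a parameter because the text does not state a value.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Assumption: when both raters use one identical label throughout,
        # kappa is undefined (0/0); this sketch reports it as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


def meets_threshold(rater_a: list[str], rater_b: list[str], minimum: float) -> bool:
    """True when agreement reaches the per-task-type minimum (value is contract-specific)."""
    return cohens_kappa(rater_a, rater_b) >= minimum
```

With labels `["pass", "pass", "fail", "fail"]` and `["pass", "fail", "fail", "fail"]`, observed agreement is 0.75 and chance agreement is 0.5, which gives a kappa of 0.5.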
📋
Contractual Accountability
Written SLAs with financial penalties for quality failures, security incidents, and missed turnarounds. SOC 2 Type II before the first contract. Every promise is a contract term — not a vendor claim that evaporates when something goes wrong.
99.5% uptime SLA — 10% invoice credit per 0.1% breach
24-hour standard task turnaround — financial penalty for breach
Security incident penalty: 90-day credit + independent audit at vendor cost
20–22% take rate locked for 3 years: $6–22M annual savings per lab
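The uptime term above (99.5% target, 10% invoice credit per 0.1% breach) reduces to a short calculation. The sketch below makes two assumptions the text does not state: a partial 0.1% step is counted as a full step, and the credit is capped at the full invoice. Amounts are integers (cents, basis points) to avoid floating-point rounding. All names are hypothetical.

```python
TARGET_UPTIME_BP = 9950   # 99.5% in basis points (1 bp = 0.01%)
STEP_BP = 10              # each 0.1% below target is one step
CREDIT_PER_STEP_PCT = 10  # 10% invoice credit per step


def uptime_credit_cents(invoice_cents: int, uptime_bp: int) -> int:
    """Invoice credit owed for a billing period, in cents."""
    shortfall_bp = max(0, TARGET_UPTIME_BP - uptime_bp)
    # Assumption: a partial step rounds up to a full step (integer ceiling).
    steps = -(-shortfall_bp // STEP_BP)
    # Assumption: the credit never exceeds 100% of the invoice.
    credit_pct = min(100, steps * CREDIT_PER_STEP_PCT)
    return invoice_cents * credit_pct // 100
```

Under these assumptions, 99.3% uptime on a $100,000 invoice is two steps below target and yields a $20,000 credit. The rounding and cap rules are the kind of detail the written SLA itself would need to fix.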

Eight structural failures.
Three with legal exposure.

These are documented contractor complaints — presented not as labor grievances but as data quality signals. The conditions under which a human evaluates an AI output directly determine the reliability of that evaluation.

Documented Failure · Data Quality & Legal Risk to Labs · Risk Type
Deceptive compensation: $35/hr advertised, $20/task actual
Effective hourly rate $13–27/hr. Self-selects for workers who tolerate exploitation, not workers who produce reliable evaluations.
Active class action lawsuit, 40,000+ workers. FLSA wage claims, FTC deceptive practices, state labor statute violations. Joint employer doctrine may extend liability to contracting labs.
Legal
Non-experts hired, role updated post-data-delivery
Expert-level evaluation tags applied to non-expert work. Labs receive mislabeled data with false provenance records.
Labs that make performance claims based on "expert-evaluated training data" may be unable to substantiate those claims if credentials were not verified at production time. EU AI Act audit risk.
Legal
Pre-contract deployment + 90-day-stale background checks
Workers produce data before contracts are signed. IP assignment and confidentiality terms not yet executed at time of production.
IP ownership of pre-contract data is legally uncertain. Labs may not hold clean title to datasets. HIPAA-adjacent risk for medical domain tasks. Stale checks create undisclosed compliance failure.
Legal
"Feels good" quality threshold — no written rubrics
Contractors chase an undefined target, revising work against a standard never specified in writing before the task begins.
Vague rubrics produce inconsistent evaluations. For Constitutional AI alignment work, rubric ambiguity is directly incompatible with principled evaluation. Adds noise, not signal, to training data.
Quality
Unpaid wait time during review cycles
Contractors complete work and wait days for approval with no compensation for availability or re-engagement time.
Disengaged, financially stressed evaluators produce measurably lower inter-rater reliability scores. The cognitive state of a resentful worker is reflected in evaluation quality.
Quality
AI resume screening excludes genuine domain experts
Non-standard expert profiles — retired clinicians, former federal judges, specialist engineers — filtered out by automated screener.
The most valuable evaluators for frontier model training are often people whose careers look unusual. Mercor's screener selects against exactly the people labs most need.
Quality
Feast-or-famine scheduling; top experts leave
Dry spells of weeks with no work, then sudden surges requiring rushed completion. Best experts find stable income elsewhere.
Platform systematically loses its most experienced evaluators. Dataset consistency is impossible when expert pool turns over constantly. Reproducibility — a core research requirement — is structurally unachievable.
Financial
No structured onboarding; new workers produce live data
Contractors navigate undocumented rules for weeks before earning reliably. The learning curve is entirely self-directed and unpaid.
Data produced during orientation period is the lowest-quality data in any batch. It cannot be retroactively identified or filtered. Labs receive it labeled identically to data from fully calibrated experts.
Quality
Rebecca Bell
Founder & CEO · Provenance AI
JD Corporate Law · Phoenix School of Law
IRS Enrolled Agent · Active Federal Credential
Certified Mediator · State of Arizona
Computer Science: AI · Harvard University
Former Director · KPMG Advisory
$45M P&L · 200+ Person Global Teams

Not an AI researcher.
Something rarer.

"I am not an AI researcher. I am something the AI training data market has been missing: an operator. My background is why this gets built correctly."

I spent 20 years building complex, high-stakes professional services operations — the kind where institutional trust, contractual accountability, and compliance were non-negotiable. Full P&L responsibility for a $45M organization. 200+ person global teams. Three years at KPMG advising government and enterprise clients on AI integration, technology risk, and compliance infrastructure.

The reason I built Provenance AI is not that I had a breakthrough AI idea. It is that I watched Mercor build a $10B company with brilliant growth and no operational governance — and recognized exactly what was missing. I have spent 20 years building the thing they never had.

The JD in corporate law, the Enrolled Agent credential, and the mediator certification are not decorative. They are why every contract term we offer is enforceable, every governance structure we design is legally sound, and every client relationship we build is structured to last.

JD Corporate Law · IRS Enrolled Agent · Certified Mediator (AZ) · Harvard CS: AI · KPMG Advisory · $45M P&L Responsibility · 200+ Person Teams · M&A Integration

Built around your standards.
Not ours.

The co-pilot model means your lab's security requirements, evaluation standards, and compliance needs shape the platform architecture before a single line of production code is written.

1
Week 1–2
Co-pilot architecture review
Your security and research teams review and approve the platform architecture before deployment. Your requirements are the design specification — not a compliance checklist added afterward. Each lab's specific pain points are written into the technical specifications as mandatory requirements.
30-minute conversation → term sheet
2
Week 2–4
Rubric co-design with your research team
Evaluation criteria are explicit, version-controlled, and written to your quality standards before any expert sees an assignment. For Anthropic: Constitutional AI alignment principles. For DeepMind: research-grade data cards meeting publication standards. For Microsoft: Responsible AI Standard six-principle mapping.
Lab-specific calibration sessions
3
Week 3–6
Expert pool credentialing to your specifications
Domain experts credentialed to your requirements — license verification, current background checks (30-day validity maximum), calibration sessions, and gold standard practice tasks. No expert produces live data until they have demonstrated rubric comprehension above the minimum IRR threshold.
Zero pre-credential live data production
4
Month 2–3
Paid pilot with full provenance report
An 8–10 week pilot in one domain at the 20% take rate. Every evaluation logged with expert credentials, rubric version, IRR score, and timestamp. You receive a full data provenance report alongside the dataset — the complete audit trail of every evaluation.
Cost + 20% margin · Pre-agreed success criteria
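The per-evaluation log described in this step (expert credentials, rubric version, IRR score, timestamp) could be captured in a record like the one below. This is an illustrative sketch only: the `EvaluationRecord` fields and the JSON line format are assumptions, not the platform's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvaluationRecord:
    """Hypothetical provenance entry for one evaluation; field names are illustrative."""
    evaluation_id: str
    expert_id: str
    expert_credentials: tuple[str, ...]  # credentials verified before assignment
    rubric_version: str                  # version of the rubric the expert scored against
    irr_score: float                     # inter-rater reliability for this task batch
    timestamp: str                       # ISO 8601, UTC


def to_log_line(record: EvaluationRecord) -> str:
    """Serialize a record as one JSON line with stable key order for diffing."""
    return json.dumps(asdict(record), sort_keys=True)
```

A frozen dataclass keeps a record immutable once written, and sorted keys make two exports of the same record byte-identical, which helps when an audit compares a delivered report against the original log.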
5
Month 4+
Primary contract on your terms
12-month primary agreement at 20–22% take rate with written SLAs, financial penalties for breach, client isolation guarantee, and Lab Advisory Council participation. Equity stake and board observer rights for co-pilot labs. Pricing locked for 3 years regardless of market changes.
$6–22M annual savings vs. current market

Tailored to your culture.
Isolated from every other.

Each co-pilot lab shapes the platform through their own dedicated engagement. No lab ever sees another lab's data, rubrics, contracts, or methodologies — by architecture, not by promise.

For Anthropic
Safety-first architecture. Your RSP becomes the evaluation standard — not a reference document.
  • Constitutional AI principles written into rubric design
  • RSP safety tier specifications define the security architecture
  • Safety researchers participate in rubric calibration sessions
  • Full audit trail on every Constitutional AI evaluation task
For Google DeepMind
Research-grade methodology. Your scientific standards become the platform's documentation requirements.
  • Data cards meeting Datasheets for Datasets standards — every delivery
  • Full reproducibility: every dataset exactly reconstructible from audit trail
  • Multimodal expert pools built to DeepMind's credentialing specifications
  • Research co-authorship opportunities on methodology papers
For Microsoft
Enterprise compliance by design. SOC 2 and GDPR compliance built to your specifications from day one, with FedRAMP readiness on the Year 2 roadmap.
  • SOC 2 Type II certification before first contract — non-negotiable
  • EU data residency by design for GDPR obligations
  • FedRAMP readiness on Year 2 roadmap — unlocks government AI contracts
  • Responsible AI Standard six principles mapped to evaluation criteria

The market charges 35%.
We charge 20–22%.

At $50M annual training volume, that difference is $6.5–7.5M per lab, per year. Across Anthropic, Google DeepMind, and Microsoft simultaneously, the aggregate savings can exceed $50M annually at the higher volumes implied by the $6–22M per-lab range.
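The per-lab figure is the gap between the two take rates multiplied by annual volume. A minimal sketch of that arithmetic, using whole dollars and whole percentage points so the result is exact (the function name is illustrative):

```python
def annual_savings(volume_usd: int, market_rate_pct: int, our_rate_pct: int) -> int:
    """Annual savings in dollars from paying our take rate instead of the market rate."""
    return volume_usd * (market_rate_pct - our_rate_pct) // 100


# At $50M volume: 35% vs. 20% and 35% vs. 22%.
low_rate_savings = annual_savings(50_000_000, 35, 20)   # 7_500_000
high_rate_savings = annual_savings(50_000_000, 35, 22)  # 6_500_000
```

Savings scale linearly with volume, so a lab spending three times as much saves three times as much at the same rates.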

Current Market Standard
Mercor / Scale AI
35% take rate
Shared infrastructure — single breach exposes all clients
Gig labor with feast-or-famine scheduling and deceptive compensation
No written SLAs — no financial consequences for failure
Unverified expert credentials; roles changed post-data-delivery
Active class action lawsuit affecting 40,000+ workers
Scale AI now partially owned by Meta — direct conflict of interest
Provenance AI
Security-first platform
20–22% take rate
Separate cloud tenant per client — physically impossible to cascade a breach
Retained experts on 6–12 month contracts with guaranteed minimum hours
Written SLAs with financial penalties for quality, security, and turnaround failures
Credentials verified before assignment — expert manifest with every delivery
SOC 2 Type II certified pre-contract — not promised post-contract
No equity relationship with any competing lab — structurally neutral
Ready to talk
The window to build
this correctly is
open right now.

Co-pilot status costs 90 days of engagement and a pilot contract. The next Mercor-style incident costs far more. Request a briefing — 30 minutes, no procurement process, direct conversation with the founder.

Available for briefings — Anthropic · Google DeepMind · Microsoft · Amazon · Nvidia