Accuracy Benchmark · v1.0

The brand-safety score you can check

Anyone can claim “most accurate.” We're the only creator brand-safety score that lets you verify it — and we're built for the creators brands actually partner with: the 10K–1M middle class, not just the celebrities everyone's already heard about. Same creator, same inputs, same score, with the evidence shown.

Same production pipeline, same rules — run on real mid-tier creators:

85–92 (clean creators' Brand Safety agent) vs. 55 (a deceptive-course creator no search would flag)

Real production scores from the cases below — not a tuned demo. We publish only what the evidence defends, and add cases as the labeled panel grows.

Reproducible by design

The same creator and the same inputs always produce the same number. That's what makes a benchmark possible at all.
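One way to check that property from the outside is sketched below. It is a minimal illustration, not the production pipeline: `input_fingerprint`, `is_reproducible`, and `toy_score` are hypothetical names, and the toy scorer only stands in for a real scoring function that is pure (no clock, no randomness).

```python
import hashlib
import json


def input_fingerprint(inputs: dict) -> str:
    """Hash a canonical JSON encoding so identical inputs share one ID."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproducible(score_fn, inputs: dict, runs: int = 3) -> bool:
    """Return True if score_fn yields the same score on every run."""
    scores = {score_fn(inputs) for _ in range(runs)}
    return len(scores) == 1


def toy_score(inputs: dict) -> int:
    """Hypothetical stand-in scorer: depends only on its inputs."""
    flagged = sum(1 for post in inputs["posts"] if post["flagged"])
    return max(0, 100 - 10 * flagged)
```

Storing the fingerprint next to each published score would let anyone confirm that a later re-run used the same inputs before comparing numbers.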

Catches what you can't Google

The risk that matters most for mid-tier creators never makes the news. The score finds it in the content itself — and shows the posts.

Why others can't

A self-learning model that adapts to each brand's risk tolerance has no fixed answer to measure — so it can't be benchmarked.

How the benchmark works

Open methodology. Anyone with the same inputs gets the same answer — which is the whole point.

1 — Labeled panel

Real mid-tier creators (10K–1M followers) — the people brands actually partner with. Some carry documented risk (deceptive claims, controversy, confirmed patterns); a control group is clean. Every label is backed by evidence before scoring.

2 — Production pipeline, no special-casing

Each creator runs through the exact same 7-agent scoring pipeline a paying brand uses — same knockouts, same weights. Nothing is hand-tuned to make the benchmark look good.

3 — Did it separate risk from clean?

We check whether risk-carrying creators are graded down (or capped, for critical cases) and clean creators land in safe tiers — and whether the specific evidence is surfaced, not just a number.

4 — Show the receipts

Every case traces to evidence. We anonymize flagged mid-tier creators here — the proof is the pattern, not a name — and only make public claims we can stand behind.
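The four steps above can be sketched as a small check over a labeled panel. This is an illustration under stated assumptions, not the benchmark's actual code: `PanelResult`, `separated`, and `separation_rate` are hypothetical names, and `SAFE_TIER_FLOOR` is an assumed cutoff, since this page does not publish tier boundaries.

```python
from dataclasses import dataclass

SAFE_TIER_FLOOR = 70  # assumed cutoff; tier boundaries are not published here


@dataclass(frozen=True)
class PanelResult:
    creator_id: str
    label: str               # "risk" or "clean", assigned from evidence before scoring
    score: int               # overall score out of 100
    capped: bool             # True if a knockout capped the score
    evidence_surfaced: bool  # True if specific posts or sources were cited


def separated(result: PanelResult) -> bool:
    """Risk creators must be graded down with evidence; clean ones must land safe."""
    if result.label == "risk":
        graded_down = result.capped or result.score < SAFE_TIER_FLOOR
        return graded_down and result.evidence_surfaced
    return result.score >= SAFE_TIER_FLOOR and not result.capped


def separation_rate(panel: list[PanelResult]) -> float:
    """Fraction of panel entries the score handled as their label expects."""
    return sum(separated(r) for r in panel) / len(panel)
```

The evidence requirement on risk creators mirrors step 3: a low number alone does not count unless the specific evidence is surfaced with it. A different assumed floor would change which clean creators pass.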

Verified results

Real production scores, not a demo. An accurate score has to do two hard things at once: catch the risk a brand could never find on its own, and avoid blacklisting a good creator for the topics they cover. Here are both.

1 · The risk that never makes the news — caught anyway

55 · Poor
Risk surfaced — anonymized

Course / “make money online” creator

~800K followers · TikTok + Instagram · Brand Safety agent 59

What we found: A ~$1,999 “work less, earn more” business course with a strict no-refund policy, pushed in roughly 39–50% of posts with claims the content can't support. The score flagged the pattern from the posts alone — and the external record corroborates it: an F rating with unanswered complaints at the Better Business Bureau, plus watchdog creators publicly disputing the agency's claimed success.

What the score did: A deceptive-content knockout capped the score at 55/100, citing the offending posts individually — before any external research. A brand glancing at a polished 800K-follower profile would miss it, and a quick news search would never surface it.
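A knockout cap of this kind can be sketched as a ceiling applied after the agent scores are combined. This is a minimal sketch, not the pipeline itself: the knockout keys, the agent names, and the equal weights are hypothetical, and only the two cap values (55 and 40) come from the cases published on this page.

```python
from dataclasses import dataclass

# Cap values from the two published cases; the keys are hypothetical names.
KNOCKOUT_CAPS = {
    "deceptive_content": 55,
    "critical_web_controversy": 40,
}


@dataclass(frozen=True)
class Knockout:
    kind: str                  # key into KNOCKOUT_CAPS
    evidence: tuple[str, ...]  # post IDs or source names backing the flag


def weighted_score(agent_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-agent scores."""
    total = sum(weights[name] for name in agent_scores)
    return sum(agent_scores[name] * weights[name] for name in agent_scores) / total


def final_score(agent_scores: dict[str, float], weights: dict[str, float],
                knockouts: list[Knockout]) -> int:
    """Combine agent scores, then apply the lowest cap among triggered knockouts."""
    base = round(weighted_score(agent_scores, weights))
    caps = [KNOCKOUT_CAPS[k.kind] for k in knockouts]
    return min([base, *caps])


# Illustrative inputs only: two made-up agents with equal weights.
example = final_score(
    {"agent_a": 90, "agent_b": 70},
    {"agent_a": 1.0, "agent_b": 1.0},
    [Knockout("deceptive_content", ("post_1", "post_2"))],
)
```

Because a cap is a ceiling rather than a penalty, a creator whose combined score is already below the cap is left unchanged, and the evidence tuple keeps each flag tied to the posts that triggered it.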

2 · Visibility, not verdict — surfaced, but not condemned

The flip side of accuracy. A commentary creator who covers drama isn't a risky creator — and the fastest way to lose trust is to blacklist them for it. We show the brand the signals and let them decide.

74 · Good
Surfaced, not condemned — anonymized

Pop-culture commentary creator

~150K followers · Instagram · Brand Safety agent 60

What we found: Their content is commentary — covering influencer feuds, celebrity drama, and public controversies. Our transcript analysis surfaced 6+ “public feud” and sensitive-topic signals, because that's what they talk about, and showed the brand every one.

What the score did: Scored 74/100 (Good) — not capped. Covering a story isn't committing it. A keyword scanner or a model tuned to a nervous brand would blacklist them for the topics they discuss. We surface it and let the brand decide.

3 · Clean mid-tier creators — correctly scored safe

80 · Excellent
Clean control

Caroline Girvan

846K followers · Instagram · Brand Safety agent 85

Public record: Home-workout creator. No brand-safety incident on the public record.

What the score did: Brand Safety agent 85. Correct non-flag — no invented risk.

79 · Good
Clean control

Amber Balcaen

89K followers · TikTok · Brand Safety agent 92

Public record: Pro racing creator. Clean record across platforms.

What the score did: Brand Safety agent 92. Lands squarely in the safe tier.

80 · Excellent
Clean control

Dr Rupy Aujla

633K followers · Instagram · Brand Safety agent 86

Public record: Doctor and food creator. Zero risk flags in the system.

What the score did: Brand Safety agent 86. Correct non-flag.

78 · Good
Clean control

Hyram

732K followers · Instagram · Brand Safety agent 80

Public record: Skincare creator. Zero risk flags in the system.

What the score did: Brand Safety agent 80. Safe tier, no flags.

74 · Good
Clean control

Natacha Oceane

74K followers · TikTok · Brand Safety agent 85

Public record: Fitness creator. Zero risk flags in the system.

What the score did: Brand Safety agent 85. Correct non-flag.

4 · And yes — the critical, public-record cases too

40 · Poor
Critical — hard-capped

Taylor Frankie Paul

TikTok · Brand Safety agent 83

What we found: Pleaded guilty to felony aggravated assault (2023); ABC shelved her already-cast Bachelorette season in 2026 after the incident video resurfaced.

What the score did: A critical web-controversy knockout hard-capped the score at 40/100 — the specific controversy is named in the flag, not hidden behind a badge.

Source: NPR

This is a living benchmark

v1.0 publishes the verified cases above, and we add to them as the labeled panel grows. We anonymize flagged creators because we exist to serve the creator middle class, not to publicly shame it — and we keep some internally-flagged creators off this page entirely when the public evidence isn't strong enough to stand behind. We'd rather show four honest cases than a dozen we can't defend. Inflated accuracy claims are exactly what this benchmark exists to make impossible.

Methodology version 1.0 · scoring pipeline v2.2 (7 agents + knockouts) · every published entry traceable to evidence.

See the score on a creator you care about

Run any creator through the same pipeline used in this benchmark — full brand-safety breakdown, with the evidence behind every signal.

Accuracy Benchmark — The Brand-Safety Score You Can Check | CreatorScore