GospelBench is the first benchmark evaluating AI models on their ability to faithfully represent orthodox Protestant Christian theology. 19 questions. 4 tracks. 86 truth claims.
When someone asks an AI "What is the gospel?" the answer shapes their understanding of Christianity. But no existing benchmark evaluates theological fidelity. MMLU tests knowledge. GSM8K tests math. Nothing tests whether a model will say Jesus rose from the dead.
We don't just ask once. Every question is asked four different ways to reveal not just what a model says, but what it's willing to commit to.
Each response falls into one of four categories:
Affirms without hedging
Affirms but qualifies
Refuses to affirm or deny
Rejects truth claims
+ 5 reserve questions for future benchmark modules
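As a rough illustration of how the four response categories above could be tallied, here is a minimal Python sketch. The `Verdict` names, the `verdict_shares` helper, and the sample data are all hypothetical; the page does not describe GospelBench's actual scoring code, so treat this as one possible shape, not the real implementation.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    # Hypothetical shorthand names for the four categories listed in the text.
    AFFIRMS = "affirms without hedging"
    QUALIFIES = "affirms but qualifies"
    REFUSES = "refuses to affirm or deny"
    REJECTS = "rejects truth claims"


def verdict_shares(verdicts):
    """Return the fraction of responses in each category (0.0 when empty)."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return {v: (counts[v] / total if total else 0.0) for v in Verdict}


# Made-up verdicts for one question asked four ways; not real results.
sample = [Verdict.AFFIRMS, Verdict.AFFIRMS, Verdict.QUALIFIES, Verdict.REFUSES]
shares = verdict_shares(sample)
print(shares[Verdict.AFFIRMS])  # 0.5
```

Keeping all four shares, instead of collapsing them into a single pass/fail number, preserves the distinction the categories draw between a model that qualifies its answer and one that declines to answer at all.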
GospelBench runs quarterly. The real value isn't a single score — it's the trend line.
80% of frontier models affirm the resurrection as true.
Affirmation drops to 60%. GospelBench detects the shift.
Only 30% will say "True." The headline writes itself.
This is the story GospelBench was built to tell.
And it only works if we get it right from the start.
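The trend-line idea can be sketched in a few lines of Python. This assumes the three figures above (80%, 60%, 30%) are read as successive runs; the `flag_drops` helper and its 10-point threshold are hypothetical choices for illustration, since the page does not say how GospelBench decides that a shift has occurred.

```python
def flag_drops(rates, threshold=10):
    """Return (run_index, change) for each run whose affirmation rate fell
    by more than `threshold` percentage points versus the previous run."""
    flagged = []
    for i in range(1, len(rates)):
        change = rates[i] - rates[i - 1]
        if change < -threshold:
            flagged.append((i, change))
    return flagged


# Affirmation rates in whole percentage points, taken from the scenario above.
runs = [80, 60, 30]
drops = flag_drops(runs)
print(drops)  # [(1, -20), (2, -30)]
```

Comparing each run only with the one before it is the simplest possible design; a real analysis would likely also need to account for changes in which models are included from run to run.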
The GospelBench Brief — quarterly results and analysis, delivered to your inbox.