The Exa Team
December 17, 2025

tldr: We built a new state-of-the-art people search. Powerful search over people is critical for recruiting and sales. It's also just super valuable to know who's out there.
To get there, we trained our retrieval system end-to-end for people search and built an ingestion pipeline covering 1B+ people and 50M+ updates per week.
To test people search, we created a high-quality synthetic eval set based on patterns in real search traffic. We're open-sourcing the eval harness and a portion of the dataset so that others can test too. Benchmark results comparing Exa's new people indices against alternatives will be published alongside the release.

We analyzed 10,000 historical people queries from exa.ai and clustered them to understand real usage patterns. Three dominant categories emerged.
This shaped our benchmark design: we needed to test both targeted lookup (can you find a specific person?) and discovery (can you return relevant candidates matching role/geo/skill criteria?).
We sampled real executives and founders across four company strata:
| Stratum | Example Source |
|---|---|
| VC-funded startups | Founders with public profiles |
| Small-cap public (<$2B) | Executives from SEC filings |
| Mid-cap ($2–10B) | C-suite and VP-level roles |
| Large-cap (>$10B) | Senior leadership |
For each person, we generated synthetic queries matching how users might search for them. A sample entry:
```json
{
  "query": "VP Operations at [Redacted Manufacturing Co]",
  "name": "[Redacted]",
  "role": "VP Operations",
  "company": "[Redacted Manufacturing Co]",
  "market_cap_tier": "mid_cap",
  "linkedin_url": "[redacted]"
}
```

Crucially, we selected individuals who have verifiable public profiles but aren't famous enough for LLMs to know them from pre-training. This ensures we're testing retrieval, not memorization.
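That "verifiable but not famous" selection rule reduces to a simple predicate. A sketch, where `llm_recognizes` is a hypothetical stand-in for a no-retrieval LLM probe (any mechanism that checks whether the model can recall the person from pre-training alone would do):

```python
from typing import Callable

def is_testable(person: dict, llm_recognizes: Callable[[str, str], bool]) -> bool:
    """Keep a candidate only if they have a verifiable public profile
    but the LLM cannot recall them without retrieval."""
    has_profile = bool(person.get("linkedin_url"))
    return has_profile and not llm_recognizes(person["name"], person["company"])

# Stub probe for illustration: "knows" only a fixed set of famous names.
famous = {"Tim Cook"}
probe = lambda name, company: name in famous
```

With the stub probe, a famous executive is filtered out while an obscure-but-verifiable one is kept, which is exactly the memorization-vs-retrieval split the benchmark needs.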
We used Claude Opus 4.5 to generate a structured taxonomy of query facets: role function, seniority, and geography.
This produced queries like:
```json
{
  "query": "mid-level onboarding specialists based in San Diego",
  "role_function": "customer_success",
  "role_seniority": "ic",
  "geo_name": "San Diego",
  "geo_type": "city"
}
```

Targeted lookup: Standard retrieval metrics (recall@k and NDCG). The ground truth is the person's online profile.
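With exactly one ground-truth profile per query, both metrics have simple closed forms. A sketch, assuming results arrive as a ranked list of profile URLs:

```python
import math

def recall_at_k(ranked_urls: list[str], truth_url: str, k: int) -> float:
    """1.0 if the ground-truth profile appears in the top k results, else 0.0."""
    return 1.0 if truth_url in ranked_urls[:k] else 0.0

def ndcg_at_k(ranked_urls: list[str], truth_url: str, k: int) -> float:
    """With a single relevant document the ideal DCG is 1, so NDCG@k
    reduces to 1 / log2(rank + 1), where rank is the 1-based position
    of the true profile in the result list."""
    for rank, url in enumerate(ranked_urls[:k], start=1):
        if url == truth_url:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

So a correct profile at rank 1 scores 1.0 on both metrics, while the same profile at rank 2 still gets full recall@2 but an NDCG of 1/log2(3) ≈ 0.63.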
Role-based discovery: Since there's no single correct answer, we use an LLM judge. For each result, we fetch the page content and verify whether it matches the query criteria. A search for "senior software engineer in SF" that returns 9 matching profiles and 1 person in San Diego scores 90%.
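The discovery score is just the fraction of returned results the judge accepts. A sketch, with `judge_matches` as a hypothetical stand-in for the LLM call that fetches a result's page content and checks it against the query criteria:

```python
from typing import Callable

def discovery_score(query: str, result_urls: list[str],
                    judge_matches: Callable[[str, str], bool]) -> float:
    """Fraction of returned profiles the judge marks as satisfying the query."""
    if not result_urls:
        return 0.0
    verdicts = [judge_matches(query, url) for url in result_urls]
    return sum(verdicts) / len(verdicts)
```

With 9 accepted results out of 10 returned, this yields 0.9, matching the San Francisco example above.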