Introducing Exa's People Search Benchmarks


tldr: We built a new state-of-the-art people search. Powerful search over people is critical for recruiting and sales. It's also just super valuable to know who's out there.

To get there, we trained our retrieval system end-to-end for people search and built an ingestion pipeline covering 1B+ people and 50M+ updates per week.

To test people search, we created a high-quality synthetic eval set based on patterns in real search traffic. We're open-sourcing the eval harness and a portion of the dataset so that others can run the same tests. We're releasing:

  1. A role-based people search dataset (1,400 queries)
  2. An evaluation harness to reproduce results on your own system

Benchmark results comparing Exa's new people indices against alternatives will be published alongside the release.

People Search Evaluation

What Users Actually Search For

We analyzed 10,000 historical people queries from exa.ai and clustered them to understand real usage patterns. Three dominant categories emerged:

  1. Specific individual lookup — name plus optional affiliation ("Jane Smith at Acme Corp")
  2. Role-based search at companies — executive positions at named organizations ("CFO at TechStartup Inc")
  3. Skill or role-based discovery — finding candidates by expertise and location ("senior payroll specialists in San Diego")

This shaped our benchmark design: we needed to test both targeted lookup (can you find a specific person?) and discovery (can you return relevant candidates matching role/geo/skill criteria?).
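As a rough illustration of the three categories (not Exa's actual clustering, which was done over real traffic), a heuristic classifier over the example queries might look like:

```python
import re

# Hypothetical heuristic classifier for the three query categories.
# Exa's real analysis clustered 10,000 historical queries; this is only a sketch.
def categorize(query: str) -> str:
    has_at_org = " at " in query.lower()
    # A capitalized two-word name at the start suggests a specific individual.
    looks_like_name = re.match(r"^[A-Z][a-z]+ [A-Z][a-z]+\b", query) is not None
    if looks_like_name and has_at_org:
        return "specific_individual"   # "Jane Smith at Acme Corp"
    if has_at_org:
        return "role_at_company"       # "CFO at TechStartup Inc"
    return "discovery"                 # "senior payroll specialists in San Diego"

print(categorize("Jane Smith at Acme Corp"))                  # specific_individual
print(categorize("CFO at TechStartup Inc"))                   # role_at_company
print(categorize("senior payroll specialists in San Diego"))  # discovery
```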

Data Generation

Targeted lookup queries

We sampled real executives and founders across four company strata:

| Stratum | Example Source |
| --- | --- |
| VC-funded startups | Founders with public profiles |
| Small-cap public (<$2B) | Executives from SEC filings |
| Mid-cap ($2–10B) | C-suite and VP-level roles |
| Large-cap (>$10B) | Senior leadership |

For each person, we generated synthetic queries matching how users might search for them. A sample entry:

```json
{
  "query": "VP Operations at [Redacted Manufacturing Co]",
  "name": "[Redacted]",
  "role": "VP Operations",
  "company": "[Redacted Manufacturing Co]",
  "market_cap_tier": "mid_cap",
  "linkedin_url": "[redacted]"
}
```

Crucially, we selected individuals who have verifiable public profiles but aren't famous enough for LLMs to know from pre-training. This ensures we're testing retrieval, not memorization.
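A minimal loader for entries of this shape (field names taken from the sample entry above; the JSONL input format is an assumption) might be:

```python
import json

# Fields shown in the sample targeted-lookup entry above.
REQUIRED_FIELDS = {"query", "name", "role", "company", "market_cap_tier", "linkedin_url"}

def load_lookup_entries(jsonl_text: str) -> list[dict]:
    """Parse one JSON object per line and verify the expected fields are present."""
    entries = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            raise ValueError(f"entry missing fields: {sorted(missing)}")
        entries.append(entry)
    return entries
```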

Role-based discovery queries

We used Claude Opus 4.5 to generate a structured taxonomy:

  • Industries: fintech, healthcare, logistics, etc.
  • Roles per industry: mapped to seniority levels (IC, manager, director, VP, C-suite)
  • Geographies: regions (EMEA, APAC), countries, states, and major cities per industry vertical (SF/Seattle for tech, Boston for biotech)
  • Modifiers: years of experience, specific skills

This produced queries like:

```json
{
  "query": "mid-level onboarding specialists based in San Diego",
  "role_function": "customer_success",
  "role_seniority": "ic",
  "geo_name": "San Diego",
  "geo_type": "city"
}
```
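Conceptually, the taxonomy expands into queries via a cross-product of roles, seniorities, and geographies plugged into query templates. A sketch, using small illustrative slices of the taxonomy (the released set is larger):

```python
import itertools

# Illustrative taxonomy slices; the real taxonomy was generated by Claude Opus 4.5.
roles = {"customer_success": ["onboarding specialist"],
         "fintech_eng": ["payments engineer"]}
seniorities = {"ic": "mid-level", "director": "director-level"}
geos = [("San Diego", "city"), ("EMEA", "region")]

def generate_queries():
    """Yield one structured query per (role, seniority, geo) combination."""
    for (func, titles), (sen_key, sen_label), (geo, geo_type) in itertools.product(
        roles.items(), seniorities.items(), geos
    ):
        for title in titles:
            yield {
                "query": f"{sen_label} {title}s based in {geo}",
                "role_function": func,
                "role_seniority": sen_key,
                "geo_name": geo,
                "geo_type": geo_type,
            }
```

With these slices, the generator yields eight queries, including the "mid-level onboarding specialists based in San Diego" example above.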

Evaluation Methodology

Targeted lookup: standard retrieval metrics, recall@k and NDCG. The ground truth is the person's online profile.
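With a single ground-truth profile per query, recall@k and NDCG reduce to simple functions of the ground truth's rank. A sketch of that special case:

```python
import math

def recall_at_k(ranked_urls: list[str], truth_url: str, k: int) -> float:
    """1.0 if the ground-truth profile appears in the top k results, else 0.0."""
    return 1.0 if truth_url in ranked_urls[:k] else 0.0

def ndcg(ranked_urls: list[str], truth_url: str) -> float:
    """With one relevant item, NDCG is 1/log2(rank + 1), where rank is 1-based."""
    for rank, url in enumerate(ranked_urls, start=1):
        if url == truth_url:
            return 1.0 / math.log2(rank + 1)
    return 0.0
```

For example, a ground-truth profile at rank 3 gives recall@3 of 1.0, recall@2 of 0.0, and NDCG of 1/log2(4) = 0.5.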

Role-based discovery: Since there's no single correct answer, we use an LLM judge. For each result, we fetch the page content and verify whether it matches the query criteria. A search for "senior software engineer in SF" that returns 9 matching profiles and 1 person in San Diego scores 90%.
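The discovery score is the fraction of returned results the judge accepts. With the judge stubbed out (the real harness fetches page content and prompts an LLM), the aggregation is just:

```python
from typing import Callable

def discovery_score(results: list[str], judge: Callable[[str], bool]) -> float:
    """Fraction of results the judge deems to match the query criteria."""
    if not results:
        return 0.0
    return sum(judge(r) for r in results) / len(results)

# Stub judge standing in for an LLM call over fetched page content:
# 9 of 10 returned profiles match the query, so the score is 0.9.
matching = {f"profile_{i}" for i in range(9)}
results = sorted(matching) + ["san_diego_profile"]
print(discovery_score(results, lambda r: r in matching))  # 0.9
```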

Try it yourself