Perplexity's API is a black box. Ask a question, get an LLM-generated answer, hope the sources are right. Exa is a search engine you control: neural search, domain filtering, full page content, highlights, and people/company/code search. An independent customer evaluation found Exa more accurate.
Perplexity recently launched a Search API (POST /search) that returns raw, ranked web results (titles, URLs, content snippets, and dates) without LLM processing. This is a genuine product, priced at $5/1K (same as Exa), and it gives developers raw results they can feed into their own pipelines. Credit where it's due: it's a real search endpoint, not just the Sonar chat API with citations stripped out.
On a 500-query real-world evaluation by Thinking Machines, Exa scored 64.8% vs Perplexity's 60.1%. The gap shows up most on semantic queries where keyword overlap isn't enough. A query like "startups using computer vision for agriculture" retrieves by meaning on Exa, not just keyword matching.
Perplexity's Sonar models (the other API surface) add LLM synthesis on top of search, returning generated answers with citations. That's useful when you want a finished answer. But if you need to control what happens between retrieval and generation (re-ranking results, feeding specific pages into your own prompt, debugging which source led to a bad answer), you need the raw results, not a synthesis you can't decompose.
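To make that concrete, here's a minimal sketch of the kind of intermediate step a synthesized answer hides: re-ranking raw results with your own heuristic, keeping track of which URLs fed the prompt, and assembling the prompt yourself. The result shape and the recency-boost scoring are illustrative assumptions, not any provider's actual response schema.

```javascript
// Sketch: build your own RAG prompt from raw search results.
// The result objects (url, text, score, publishedYear) are a
// hypothetical shape for illustration.
function buildRagPrompt(query, results, maxSources = 3) {
  // Re-rank with your own heuristic -- here, a naive recency boost.
  const boost = (r) => r.score + (r.publishedYear >= 2024 ? 0.1 : 0);
  const ranked = [...results].sort((a, b) => boost(b) - boost(a));

  // Keep only the top sources, and record which URLs fed the prompt
  // so a bad answer can be traced back to its source.
  const chosen = ranked.slice(0, maxSources);
  const context = chosen
    .map((r, i) => `[${i + 1}] ${r.url}\n${r.text}`)
    .join("\n\n");

  return {
    prompt: `Answer using only these sources:\n\n${context}\n\nQ: ${query}`,
    sources: chosen.map((r) => r.url),
  };
}
```

With synthesis-only APIs, none of these steps (the ranking, the source selection, the prompt) is yours to inspect or change.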
Exa's highlights extract the passages from each page that are most relevant to your specific query. Search for two different questions that return the same URL, and Exa returns different passages. This scored +10% on RAG evaluations compared to raw text retrieval because the LLM gets precisely the context it needs.
Perplexity's Search API returns content snippets, up to 4,096 tokens per page. These are substantial (not the 160-character Google snippets), but they're not query-aware extractions. The same URL returns the same content regardless of what you asked. For a RAG pipeline, the difference is between feeding your LLM a tailored excerpt and feeding it a generic page extract that may or may not contain the answer to the actual question.
For cost-sensitive applications, highlights typically output a few hundred tokens per page vs Perplexity's 4,096 token default. If you're running 100K searches per day and feeding results into an LLM, that's a significant difference in downstream token costs. And Exa's highlights give your model better signal, not just less noise.
Exa (query-aware highlights):

> Query: "What are Claude's rate limits?"
> "Claude API rate limits vary by tier: Free tier allows 50 messages/day. Pro users get 5x higher limits. API rate limits are 4,000 requests per minute for Claude 3.5 Sonnet." [Only relevant passages extracted]

Perplexity (static snippet):

> Query: "What are Claude's rate limits?"
> "Claude is an AI assistant made by Anthropic... [company overview] [feature list] [pricing tiers] [rate limits buried in paragraph 12] [more content]..." [Same excerpt regardless of query]
Exa supports up to 1,200 include/exclude domains per query. Perplexity's Sonar API caps domain filtering at 20 domains. The Search API accepts a domain filter array but doesn't document a limit; it may be higher, but it's not specified. Either way, Exa's 1,200-domain limit is documented and covers most enterprise allow/block lists.
Beyond domain filtering, Exa has content keyword filtering (require or exclude specific terms in the page body, not just the URL), independent published-date and crawl-date filtering, location filtering, and category filters for companies, people, news, and papers. Perplexity's Search API offers language filtering, country filtering, academic/SEC search modes, and genuinely good date filtering: five options including recency windows, exact date ranges, and last-updated filtering.
Where Exa pulls ahead: content keyword filtering (Perplexity has none), category-based search (Perplexity has none), and the sheer scale of domain filtering for enterprise compliance use cases.
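As a sketch, the filters above map onto an Exa search request roughly like this. The parameter names follow Exa's documented search options at the time of writing; verify exact names against the current API reference before relying on them.

```javascript
// Sketch of an Exa search request combining the filters discussed
// above. Parameter names per Exa's docs -- verify before use.
const searchOptions = {
  category: "company",                     // vertical index: company, people, news, papers
  includeDomains: ["techcrunch.com", "crunchbase.com"], // up to 1,200 entries
  includeText: ["Series A"],               // term must appear in the page body
  startPublishedDate: "2024-01-01",        // published-date filter
  startCrawlDate: "2025-01-01",            // crawl-date filter, independent of publish date
};
// const results = await exa.search("agtech startups", searchOptions);
```

The equivalent Perplexity Search API request could carry the domain filter and the date range, but has no slot for the body-keyword or category constraints.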
Exa indexes 1B+ LinkedIn profiles with 50M+ updates per week. Set category: "people" and search for "machine learning engineers in Berlin with experience at autonomous driving companies." You get structured profile data, not web pages that happen to mention those keywords.
Company search works the same way. Exa maintains a dedicated company index benchmarked against ~800 queries that deliberately target Series A/B startups, sub-500-employee companies, and regional players across EU, APAC, and LATAM. The benchmark covers named lookups, attribute filtering (industry, geography, founding year, employee count), funding queries, and composite multi-constraint searches. The dataset was designed so LLMs can't answer from pre-training knowledge alone, testing actual retrieval.
Perplexity has no people search and no company search. Neither the Search API nor Sonar offers category-based vertical search. If your agent needs to find candidates, research companies, or enrich CRM data, you need a separate provider alongside Perplexity. With Exa, it's the same API and the same billing.
```javascript
// People search - 1B+ LinkedIn profiles
const people = await exa.search(
  "machine learning engineers in Berlin with autonomous driving experience",
  { category: "people", numResults: 10 }
);

// Company search - dedicated index
const companies = await exa.search(
  "Series A healthtech startups with < 100 employees",
  { category: "company", numResults: 10 }
);
```
Exa charges a flat rate per request: $5 per 1,000 searches, $1 per 1,000 content retrievals. You know what you'll pay before you make the call.
Perplexity has two pricing models. The Search API is $5/1K, flat, same as Exa, no complaints there. The Sonar models layer request fees that scale with "context depth" ($5–$14 per 1,000 requests depending on model) plus per-token charges for input ($1–$3/M tokens) and output ($1–$15/M tokens). A typical Sonar query costs $0.006–$0.013 total when you add it all up. At scale, modeling the total Sonar bill requires estimating average query length, response length, and context depth tier, none of which are fixed.
If you're using only Perplexity's Search API, pricing is straightforward and comparable to Exa. If you're using Sonar (which is what most Perplexity marketing pushes), the variable pricing makes budgeting harder. Exa's pricing is two line items regardless of which features you use: search calls and content calls.
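A rough cost model for a single Sonar call, using the price ranges quoted above. The per-request fees and per-token prices come from this comparison; the token counts are assumptions chosen to show how the pieces combine:

```javascript
// Estimate one Sonar query's cost from a request fee plus
// per-token input and output charges.
function sonarQueryCost({
  requestFeePer1K,   // $ per 1,000 requests ($5-$14 range)
  inputTokens, inputPricePerM,   // $1-$3 per 1M input tokens
  outputTokens, outputPricePerM, // $1-$15 per 1M output tokens
}) {
  return (
    requestFeePer1K / 1000 +
    (inputTokens / 1_000_000) * inputPricePerM +
    (outputTokens / 1_000_000) * outputPricePerM
  );
}

// Low end: cheapest tier, short query and answer.
const low = sonarQueryCost({
  requestFeePer1K: 5,
  inputTokens: 500, inputPricePerM: 1,
  outputTokens: 500, outputPricePerM: 1,
});

// Higher context tier, longer output.
const high = sonarQueryCost({
  requestFeePer1K: 8,
  inputTokens: 1000, inputPricePerM: 3,
  outputTokens: 1500, outputPricePerM: 1,
});
```

Plugging in these assumed token counts lands inside the $0.006–$0.013 range cited above, and shows why the total depends on three variables you can only estimate in advance.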
Exa returns full page text. The entire document, with no token cap. From there, you choose how to use it: pass the whole thing to your LLM, use highlights to extract relevant passages, or use summaries with custom direction to compress further. The choice is yours per query.
Perplexity's Search API returns content snippets capped at 4,096 tokens per page (10,000 total token budget across all results). That's a meaningful amount of content (far more than Google's 160-character snippets), but it's still a cap. If the answer is in paragraph 40 of a long page, the snippet might not include it. And you can't request the full page to check.
For most simple queries, 4,096 tokens is enough. For complex research tasks, legal document analysis, or long-form content extraction, the cap becomes a constraint. Exa's full content endpoint has no such limit, and the highlights feature means you usually don't need the full page anyway; you get the relevant passages in a few hundred tokens.
```javascript
const result = await exa.getContents([url], {
  text: true,
  highlights: true,
  maxAgeHours: 24
});
// Full page text: unlimited
// Highlights: query-relevant passages
// Livecrawl: guaranteed fresh content
```

```
POST /search
{
  "query": "...",
  "max_tokens_per_page": 4096,
  "max_tokens": 10000
}
// Per-page cap: 4,096 tokens
// Total budget: 10,000 tokens
// No livecrawl equivalent
```

Exa Code searches 1B+ webpages and extracts token-succinct code examples relevant to your query. In testing, it eliminated more code hallucinations than other popular context sources.
LLMs lack current knowledge about millions of libraries, APIs, and SDKs. They hallucinate parameter names, invent deprecated methods, and confidently produce code that doesn't compile. Exa Code returns real code examples from documentation and repositories, typically in a few hundred tokens rather than entire doc pages.
Perplexity has no code-specific search or extraction. You could ask Sonar a coding question and get an LLM-synthesized answer, but that answer has the same hallucination problem you're trying to solve. The point of code search is to ground the LLM in real, current documentation, which requires returning actual code snippets, not another LLM's interpretation of them.
```javascript
const codeResults = await exa.search(
  "how to authenticate with Stripe API in Node.js",
  { type: "code" }
);
// Returns token-succinct code from real documentation:
// "const stripe = require('stripe')('sk_test_...');
//  const session = await stripe.checkout.sessions.create({
//    payment_method_types: ['card'],
//    line_items: [{ price: 'price_...', quantity: 1 }],
//    mode: 'payment',
//    success_url: 'https://example.com/success'
//  });"
```
| Feature | Exa | Perplexity API |
|---|---|---|
| Neural/semantic search | Yes | No |
| Raw search results (URLs + content) | Yes | Yes (Search API) |
| LLM-generated answers | Yes (/answer) | Yes (Sonar) |
| Domain filtering | Up to 1,200 | Available (limit undocumented for Search API; 20 for Sonar) |
| Category search (people, company) | Yes | No |
| Max results | 100 | 10 default (max undocumented) |
| Date filtering | Published + crawl date | Recency, exact dates, last-updated |
| Content keyword filtering | Yes | No |
| Academic/SEC search modes | No | Yes |
| Multi-query batching | No | Yes (up to 5) |
| Feature | Exa | Perplexity API |
|---|---|---|
| Full page text retrieval | Yes (no cap) | Snippets (4,096 tokens/page default) |
| Query-dependent highlights | Yes (+10% RAG accuracy) | No |
| Summaries with custom direction | Yes | No |
| Structured output (JSON Schema) | Yes | Yes |
| Livecrawl control | Yes (maxAgeHours) | No |
| Feature | Exa | Perplexity API |
|---|---|---|
| Own search index | Yes (neural + BM25) | Claims own index ("continuously refreshed") |
| Pricing model | Flat per-request | Search API: flat; Sonar: per-request + per-token |
| MCP support | Yes (2M+ sessions/month) | No |
| SDKs | Python, JavaScript | Python SDK + OpenAI-compatible |
| Code search | Yes (Exa Code) | No |
Perplexity has two APIs: a Search API for raw web results ($5/1K) and Sonar models for LLM-generated answers (variable pricing). Exa returns ranked URLs, full page content, query-dependent highlights, and structured data. Exa also has neural search, people/company search, code search, and 1,200-domain filtering that Perplexity lacks.
Yes. Perplexity launched a Search API that returns raw ranked results (titles, URLs, content snippets, dates) without LLM processing, at $5/1K requests. It's comparable to Exa's search endpoint, though Exa adds neural search, full page content (not capped snippets), and query-dependent highlights.
Perplexity describes a "continuously refreshed index" for its Search API. For the Sonar models, Perplexity has historically used Serper and Brave for retrieval. The extent to which Perplexity's index is independent vs augmented by third-party providers is not fully documented.
Perplexity's Search API returns content snippets capped at 4,096 tokens per page (10,000 total). Exa returns full page text with no cap, plus query-dependent highlights and summaries with custom direction.
Exa offers zero data retention, enterprise SLAs, 1,200-domain filtering, and dedicated people/company search. Exa's flat per-request pricing means no token-cost surprises at scale.
Exa supports up to 1,200 include/exclude domains. Perplexity's Sonar caps at 20; the Search API accepts domain filters but doesn't document a limit. For enterprise use cases with content policies, Exa's documented 1,200 is the safer bet.
Exa's /answer endpoint does the same thing: AI answers with citations. The difference is Exa also gives you raw search results, so you can build your own answer generation and control the prompt and sources yourself.
Exa's server-side latency is <200ms vs Perplexity's ~300ms. If your agent chains 10+ searches per task, that gap adds up.
No. Exa's MCP server handles 2M+ sessions/month and works with Claude, Cursor, and other AI tools natively.
Yes. Exa indexes 1B+ LinkedIn profiles (50M+ updates/week) and has a dedicated company search index benchmarked against ~800 queries. Neither Perplexity's Search API nor Sonar has an equivalent.
Exa Code searches 1B+ webpages for token-succinct code examples and documentation. It's designed to reduce LLM hallucinations in coding tasks. Perplexity has no code-specific search.
Perplexity's Sonar models are strong for getting quick LLM-generated answers with citations in one call. Perplexity also has academic and SEC filing search modes, multi-query batching (up to 5 queries per request), and a Python SDK with OpenAI-compatible interface. If you want a ready-made answer and don't need to control the retrieval or generation steps, Sonar works well.