Overview
Endpoint: POST https://api.exa.ai/contents
Auth: Pass your API key via the x-api-key header. Get one at https://dashboard.exa.ai/api-keys.
The Contents API extracts clean, LLM-ready content from any URL. It handles JavaScript-rendered pages, PDFs, and complex layouts. Returns full text, highlights, summaries, or any combination.
Installation
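No install command survives in the source; assuming the Python SDK (package name `exa-py`, inferred from the snake_case note under Patterns and Gotchas), or plain `requests` if you prefer calling the REST endpoint directly:

```shell
pip install exa-py     # official Python SDK (assumed package name)
# or, to call the REST endpoint yourself:
pip install requests
```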
Minimal Working Example
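A minimal sketch using only the Python standard library against the documented endpoint and `x-api-key` header (the `build_payload` and `get_contents` helper names are illustrative, not part of any SDK):

```python
import json
import os
import urllib.request

API_URL = "https://api.exa.ai/contents"

def build_payload(urls, text=True):
    """Request body: urls is required; text=True asks for full markdown text."""
    return {"urls": urls, "text": text}

def get_contents(api_key, urls):
    """POST the payload with the x-api-key header and return the parsed JSON."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(urls)).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

if os.environ.get("EXA_API_KEY"):  # only runs when a key is configured
    data = get_contents(os.environ["EXA_API_KEY"], ["https://example.com"])
    for result in data["results"]:
        print(result["url"], result.get("text", "")[:200])
```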
Request Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | string[] | (required) | Array of URLs to extract content from. Also accepts `ids` (document IDs from search results). |
| `text` | boolean or object | — | Return full page text as markdown. Object form: `{maxCharacters, includeHtmlTags, verbosity, includeSections, excludeSections}`. |
| `highlights` | boolean or object | — | Return key excerpts relevant to a query. Object form: `{maxCharacters, query}`. |
| `summary` | boolean or object | — | Return an LLM-generated summary. Object form: `{query, schema}`. |
| `maxAgeHours` | integer | — | Max age of cached content in hours. `0` = always livecrawl; `-1` = never livecrawl. Omit for the default (livecrawl as fallback). |
| `livecrawlTimeout` | integer | 10000 | Timeout for livecrawling, in milliseconds. Recommended: 10000-15000. |
| `subpages` | integer | 0 | Number of subpages to crawl from each URL. |
| `subpageTarget` | string or string[] | — | Keywords to prioritize when selecting subpages. |
| `extras.links` | integer | 0 | Number of URLs to extract from each page. |
| `extras.imageLinks` | integer | 0 | Number of image URLs to extract from each page. |
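Assembled as a Python request body, several of these parameters combine like this (the URL and all values are illustrative):

```python
# Illustrative request body combining several parameters from the table above.
payload = {
    "urls": ["https://example.com/docs"],
    "text": {"maxCharacters": 5000, "includeHtmlTags": False},
    "highlights": {"query": "rate limits", "maxCharacters": 1000},
    "summary": {"query": "What does this page cover?"},
    "maxAgeHours": 24,          # use cache up to 24 hours old, else livecrawl
    "livecrawlTimeout": 12000,  # milliseconds; 10000-15000 recommended
    "subpages": 5,
    "subpageTarget": ["api", "docs"],
    "extras": {"links": 10, "imageLinks": 5},
}
```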
Text Object Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxCharacters` | integer | — | Character limit for the returned text. |
| `includeHtmlTags` | boolean | false | Preserve HTML tags in the output. |
| `verbosity` | string | "compact" | One of `compact`, `standard`, or `full`. Use `maxAgeHours: 0` for fresh content. |
| `includeSections` | string[] | — | Only include these page sections: `header`, `navigation`, `banner`, `body`, `sidebar`, `footer`, `metadata`. Use `maxAgeHours: 0` for fresh content. |
| `excludeSections` | string[] | — | Exclude these page sections. Same options as `includeSections`. Use `maxAgeHours: 0` for fresh content. |
Highlights Object Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `maxCharacters` | integer | — | Maximum characters for all highlights combined, per URL. |
| `query` | string | — | Custom query to direct the LLM's selection of relevant excerpts. |
Summary Object Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | — | Custom query for the summary. |
| `schema` | object | — | JSON Schema (Draft 7) for structured summary output. |
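For structured output, pass a Draft 7 JSON Schema as `schema`. A hedged sketch (the property names inside `schema` are illustrative, not part of the API):

```python
# Hedged sketch: requesting a structured summary via a Draft 7 JSON Schema.
# The fields under "properties" are made-up examples, not API requirements.
summary_options = {
    "query": "Extract the key facts about the company",
    "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "company_name": {"type": "string"},
            "founded_year": {"type": "integer"},
        },
        "required": ["company_name"],
    },
}

payload = {"urls": ["https://example.com/about"], "summary": summary_options}
```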
Content Modes
Text — Full page content as clean markdown. Best for deep analysis.
Highlights — Key excerpts relevant to a query. Best for token-efficient agent workflows.
Summary — An LLM-generated summary, optionally structured via a JSON schema.
Content Freshness
| maxAgeHours value | Behavior |
|---|---|
| Omit (default) | Livecrawl only when no cached content exists. Recommended. |
| Positive (e.g. 24) | Use the cache if it is less than N hours old, otherwise livecrawl. |
| 0 | Always livecrawl, never use the cache. Increases latency. |
| -1 | Never livecrawl, cache only. Maximum speed. |
When setting `maxAgeHours`, pair it with `livecrawlTimeout` (10000-15000 ms recommended).
Subpage Crawling
Automatically discover and extract content from linked pages within a site.
- `subpages`: Maximum subpages to crawl per URL.
- `subpageTarget`: Keywords to prioritize when selecting which subpages to crawl.
- Start small (5-10 subpages) and increase if needed.
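A subpage-crawling request might look like this (URL and keyword values are illustrative):

```python
# Illustrative subpage-crawling request: start small and widen if needed.
payload = {
    "urls": ["https://example.com"],
    "text": True,
    "subpages": 5,                     # max subpages to crawl per URL
    "subpageTarget": ["docs", "api"],  # keywords that steer subpage selection
}
```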
Response Schema
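An abridged, illustrative response shape matching the field reference in this section (all values are placeholders, not real API output):

```python
# Abridged, illustrative response shape; every value is a placeholder.
example_response = {
    "requestId": "abc123",
    "results": [
        {
            "title": "Example Domain",
            "url": "https://example.com",
            "id": "https://example.com",  # same as the URL
            "publishedDate": None,
            "author": None,
            "text": "# Example Domain\n...",
            "highlights": ["This domain is for use in illustrative examples."],
            "highlightScores": [0.87],
            "summary": "A placeholder page used in documentation.",
            "subpages": [],
            "extras": {"links": ["https://www.iana.org/domains/example"]},
        }
    ],
    "statuses": [{"id": "https://example.com", "status": "success"}],
    "costDollars": {"total": 0.001},
}
```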
Response Fields
| Field | Type | Description |
|---|---|---|
| `requestId` | string | Unique request identifier. |
| `results` | array | List of result objects with extracted content. |
| `results[].title` | string | Page title. |
| `results[].url` | string | Page URL. |
| `results[].id` | string | Document ID (same as the URL). |
| `results[].publishedDate` | string or null | Estimated publication date. |
| `results[].author` | string or null | Author, if available. |
| `results[].text` | string | Full page text (if `text` was requested). |
| `results[].highlights` | string[] | Key excerpts (if `highlights` was requested). |
| `results[].highlightScores` | float[] | Cosine similarity score for each highlight. |
| `results[].summary` | string | LLM summary (if `summary` was requested). |
| `results[].subpages` | array | Nested results from subpage crawling. Same shape as `results`. |
| `results[].extras.links` | string[] | Links extracted from the page. |
| `statuses` | array | Per-URL status information. Always check this for errors. |
| `statuses[].id` | string | The URL that was requested. |
| `statuses[].status` | string | `"success"` or `"error"`. |
| `statuses[].error.tag` | string | Error type (see Error Handling). |
| `statuses[].error.httpStatusCode` | integer or null | Corresponding HTTP status code. |
| `costDollars.total` | float | Total dollar cost for the request. |
Error Handling
The endpoint returns HTTP 200 even when individual URLs fail. Per-URL errors appear in the `statuses` array.
Per-URL Error Tags
| Tag | HTTP Code | Meaning |
|---|---|---|
| `CRAWL_NOT_FOUND` | 404 | Content not found. |
| `CRAWL_TIMEOUT` | 504 | Crawl timed out while fetching content. |
| `CRAWL_LIVECRAWL_TIMEOUT` | 504 | Livecrawl exceeded `livecrawlTimeout`. |
| `SOURCE_NOT_AVAILABLE` | 403 | Access forbidden. |
| `UNSUPPORTED_URL` | — | URL type not supported. |
| `CRAWL_UNKNOWN_ERROR` | 500+ | Other errors. |
Request-Level Errors
| HTTP Status | Meaning |
|---|---|
| 400 | Bad request — invalid parameters. |
| 401 | Invalid or missing API key. |
| 422 | Validation error. |
| 429 | Rate limit exceeded. |
Check the `statuses` array to handle per-URL failures.
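A minimal sketch of per-URL error handling: the `check_statuses` helper is hypothetical (not part of any SDK), but it reads only the documented `statuses` fields:

```python
# Hedged sketch: a small helper (not part of the API or SDK) that collects
# per-URL failures from the documented statuses array.
def check_statuses(response):
    """Return (url, error_tag, http_code) for every URL that failed."""
    failures = []
    for status in response.get("statuses", []):
        if status.get("status") == "error":
            err = status.get("error") or {}
            failures.append(
                (status["id"], err.get("tag"), err.get("httpStatusCode"))
            )
    return failures

# Example with a mixed-outcome response (placeholder data):
response = {
    "statuses": [
        {"id": "https://ok.example.com", "status": "success"},
        {"id": "https://gone.example.com", "status": "error",
         "error": {"tag": "CRAWL_NOT_FOUND", "httpStatusCode": 404}},
    ]
}
for url, tag, code in check_statuses(response):
    print(f"{url} failed: {tag} (HTTP {code})")
```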
Common Mistakes
Patterns and Gotchas
- Always check `statuses`. The endpoint returns 200 even when individual URLs fail; unchecked, you'll silently miss failed URLs.
- Use `highlights` over `text` for agent workflows. Highlights are 10x more token-efficient and return the most relevant excerpts.
- Set `livecrawlTimeout` when using `maxAgeHours`. The default is 10000 ms; for slow sites, use 12000-15000 ms.
- `subpageTarget` focuses crawling. Without it, subpage selection is best-effort. Use specific terms like `["api", "docs"]`.
- The Python SDK uses snake_case: `maxCharacters` → `max_characters`, `subpageTarget` → `subpage_target`, `maxAgeHours` → `max_age_hours`.
- `urls` and `ids` are interchangeable. Both accept URL strings; `ids` exists for backward compatibility with document IDs from search results.
- Combine modes freely. Request `text`, `highlights`, and `summary` in the same call for different views of the same content.

