Websets is an asynchronous search system that finds, verifies, and enriches web results against your criteria. It handles complex multi-step queries that would take hours to run manually, but getting the most out of it requires understanding how to write good queries, criteria, and enrichments.

Recommended: Try our Coding Agent Quickstart to get a working webset in under a minute, then come back here for the full reference.

Key Benefits

  • Automated verification: Every result is checked against your criteria before becoming an item — no manual filtering needed.
  • Structured enrichments: Extract specific data points (names, emails, URLs, numbers) from each result using web research, not just page scraping.
  • Real-time updates: Use webhooks and monitors to keep websets continuously updated without polling.

Writing Good Queries

The query drives search behavior. Be specific and descriptive — Websets uses Exa’s neural search, so natural language works well.
Approach | Example | Why It Works
Specific + contextual | “AI startups in the US that raised Series A in 2025” | Constrains geography, stage, and timing
Entity-focused | “VP of Engineering at mid-size fintech companies” | Clear entity type with role specificity
Domain-scoped | “Machine learning research papers on efficient transformers” | Targets a content type naturally
Avoid vague queries like “interesting companies” or “good articles”. They produce noisy results that waste verification tokens.

Include URLs for context: Any URL in the query will be crawled and used as additional context for the search. Use this when you want results similar to a specific page.

Writing Effective Criteria

Criteria determine whether a search result becomes an item. Each result is verified against every criterion, and only results matching all criteria are kept.

Keep criteria verifiable. Each criterion should be something that can be confirmed from publicly available web content.
{
  "criteria": [
    {"description": "Company is headquartered in the United States"},
    {"description": "Company has raised Series A or later funding"},
    {"description": "Company focuses on AI or machine learning products"}
  ]
}
Avoid subjective criteria like “Company is innovative” or “Company has a good culture”. They are hard to verify and produce inconsistent results.

Use 1-3 criteria for best results. More criteria means stricter filtering, which can reduce item count significantly. Start with fewer criteria and add more if you’re getting too many irrelevant results.

Check successRate on search responses to see what percentage of evaluated items matched each criterion. A very low success rate might indicate an overly strict or ambiguous criterion.
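The successRate check can be automated. The sketch below flags criteria whose rate falls under a cutoff. The input shape (a list of dicts with description and successRate as a 0-100 percentage), the helper name, and the 10% default are assumptions for illustration, so confirm the actual field layout against a real search response.

```python
def flag_weak_criteria(criteria, min_success_rate=10.0):
    """Return descriptions of criteria whose successRate is below the cutoff.

    Assumes each criterion is a dict like
    {"description": str, "successRate": float}, with successRate given as a
    percentage from 0 to 100. Entries without a successRate are skipped.
    """
    weak = []
    for criterion in criteria:
        rate = criterion.get("successRate")
        if rate is not None and rate < min_success_rate:
            weak.append(criterion["description"])
    return weak


# Hypothetical values, for illustration only.
sample_criteria = [
    {"description": "Company is headquartered in the United States", "successRate": 62.0},
    {"description": "Company has raised Series A or later funding", "successRate": 4.5},
]
weak = flag_weak_criteria(sample_criteria)
print(weak)
```

A flagged criterion is a candidate for rewording or removal before you spend more verification budget on it.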

Choosing Entity Types

The entity field shapes how results are found and structured. Auto-detection works well in most cases, but explicit types give you more control.
Type | When to Use | Properties You Get
company | Company websites, LinkedIn company pages | company.name, company.location, company.employees, company.industry
person | People profiles, LinkedIn individuals | person.name, person.location, person.position
article | Blog posts, news articles | article.author, article.publishedAt
research_paper | Academic papers, arXiv | researchPaper.author, researchPaper.publishedAt
custom | Anything else (job postings, events, etc.) | custom.author, custom.publishedAt
{
  "entity": {"type": "custom", "description": "Job Postings for ML Engineers"}
}

Designing Enrichments

Enrichments extract additional data from each item using web research. Think of them as questions Websets answers for every result. Be specific about what you want. Vague enrichments produce vague results.
Good | Bad
“Find the CEO’s full name” | “Who runs this company?”
“Company’s LinkedIn page URL” | “Find social media”
“Estimated annual revenue in USD” | “How big is this company?”
“Year the company was founded” | “When did they start?”
Choose the right format to get structured output:
Format | Use For | Example Description
text | Names, descriptions, freeform answers | “Find the CEO’s full name”
number | Counts, amounts, years | “Estimated employee count”
date | Specific dates | “Date of most recent funding round”
url | Links to specific pages | “Company’s LinkedIn page URL”
email | Contact emails | “General contact email address”
phone | Phone numbers | “Main office phone number”
options | Multiple choice classification | “Industry vertical” with options
Use options for classification tasks — it’s more reliable than asking for freeform text when you have a known set of categories:
{
  "description": "Primary industry vertical",
  "format": "options",
  "options": [
    {"label": "Healthcare"},
    {"label": "Finance"},
    {"label": "Education"},
    {"label": "Developer Tools"},
    {"label": "Other"}
  ]
}
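If your categories live in code or a spreadsheet, a small builder can produce the options payload shown above. This is a minimal sketch: the function name and the cleanup rules (trimming whitespace, dropping blanks and duplicates) are illustrative choices, not part of the Websets API.

```python
def options_enrichment(description, labels):
    """Build an enrichment payload with format "options" from plain labels."""
    cleaned = []
    for label in labels:
        label = label.strip()
        if label and label not in cleaned:  # drop blanks and duplicates
            cleaned.append(label)
    if not cleaned:
        raise ValueError("at least one option label is required")
    return {
        "description": description,
        "format": "options",
        "options": [{"label": label} for label in cleaned],
    }


payload = options_enrichment(
    "Primary industry vertical",
    ["Healthcare", "Finance", "Finance ", "", "Other"],
)
print(payload)
```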
Limit to 5-7 enrichments per webset. Each enrichment runs web research for every item, so more enrichments means longer processing time. Prioritize the data points you actually need.

Async Patterns

Websets are asynchronous: results arrive over time as searches complete and enrichments process.

Use wait_until_idle in SDKs for simple workflows:
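from exa_py import Exa
from exa_py.websets.types import CreateWebsetParameters
exa = Exa("YOUR_EXA_API_KEY")  # placeholder; substitute your own key
webset = exa.websets.create(params=CreateWebsetParameters(...))
webset = exa.websets.wait_until_idle(webset.id)  # blocks until done
items = exa.websets.items.list(webset_id=webset.id)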
Use webhooks for production systems — they’re more reliable than polling and give you real-time updates:
{
  "url": "https://your-server.com/webhook",
  "events": ["webset.item.created", "webset.item.enriched", "webset.idle"]
}
  • webset.item.created — fires as each item passes verification (stream results as they arrive)
  • webset.item.enriched — fires when an enrichment result is ready for an item
  • webset.idle — fires when all searches and enrichments complete
Items are available immediately. You can list items while the webset is still running — you don’t have to wait for idle.
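On the receiving side, a handler typically branches on the event type. The sketch below routes the three events listed above into a simple in-memory state dict. The payload shape ({"type": ..., "data": {...}}) and the function name are assumptions made for illustration, so check a real delivery before relying on specific fields.

```python
def handle_webhook_event(event, state):
    """Route one webhook event by type and update a simple state dict.

    Assumes a payload shaped like {"type": str, "data": {...}}; that shape is
    an assumption for this sketch. Returns False for unhandled event types.
    """
    event_type = event.get("type")
    data = event.get("data", {})
    if event_type == "webset.item.created":
        state["items"].append(data.get("id"))  # stream items as they arrive
    elif event_type == "webset.item.enriched":
        state["enriched"] += 1
    elif event_type == "webset.idle":
        state["done"] = True  # all searches and enrichments finished
    else:
        return False
    return True


# Hypothetical deliveries, for illustration only.
state = {"items": [], "enriched": 0, "done": False}
events = [
    {"type": "webset.item.created", "data": {"id": "item_1"}},
    {"type": "webset.item.created", "data": {"id": "item_2"}},
    {"type": "webset.item.enriched", "data": {"id": "item_1"}},
    {"type": "webset.idle", "data": {}},
]
handled = [handle_webhook_event(event, state) for event in events]
print(state)
```

In a real service the state dict would be a database or queue, and the handler would verify the webhook signature before trusting the payload.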

Idempotency and Deduplication

Use externalId to prevent duplicate websets. If you create a webset with an externalId that already exists, you’ll get a 409 error instead of a duplicate. You can then use externalId in place of id for all API calls.
{
  "search": {"query": "...", "count": 50},
  "externalId": "weekly-leads-2025-03"
}

Monitors for Recurring Searches

Monitors run searches on a schedule to keep websets updated with fresh results.
{
  "websetId": "ws_abc123",
  "cadence": {"cron": "0 9 * * 1", "timezone": "America/New_York"},
  "behavior": {
    "type": "search",
    "config": {
      "parameters": {
        "query": "AI startups that raised Series A in the last week",
        "count": 10,
        "criteria": [{"description": "Company raised Series A in the last week"}],
        "entity": {"type": "company"},
        "behavior": "append"
      }
    }
  }
}
A monitor’s cron schedule can trigger at most once per day; this is a system constraint.

Use "behavior": "append" in monitor searches to add new items without removing existing ones. Use "override" to re-evaluate existing items against new criteria.
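A quick local check can catch schedules that would fire more than once a day before you submit them. The heuristic below is a conservative sketch, not the API's own validation: it accepts only 5-field expressions whose minute and hour fields are single fixed numbers, and the function name is illustrative.

```python
def fires_at_most_daily(cron):
    """Heuristic: True if a 5-field cron expression fixes one minute and one hour.

    A single fixed minute and hour means the schedule can trigger at most once
    on any given day. Wildcards, lists, ranges, and steps in those two fields
    are rejected, even though a few of them would also be once-daily.
    """
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields: minute hour day month weekday")
    minute, hour = fields[0], fields[1]
    return minute.isdigit() and hour.isdigit()


weekly = fires_at_most_daily("0 9 * * 1")        # Mondays at 09:00
frequent = fires_at_most_daily("*/30 * * * *")   # every 30 minutes
twice = fires_at_most_daily("0 9,17 * * *")      # 09:00 and 17:00 daily
print(weekly, frequent, twice)
```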

Imports for Your Own Data

Imports let you bring URLs from your own sources (CRM exports, spreadsheets, etc.) and run enrichments on them:
{
  "websetId": "ws_abc123",
  "urls": [
    "https://company-a.com",
    "https://company-b.com",
    "https://company-c.com"
  ]
}
This is useful when you already have a list of targets and want to enrich them with additional data, rather than discovering new results.
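Lists exported from a CRM or spreadsheet often contain blanks, stray whitespace, and repeats. The sketch below cleans such a list and builds a payload in the shape shown above. The function name and the cleanup rules (keep only http/https URLs, treat a trailing slash as the same URL) are illustrative assumptions rather than API behavior.

```python
from urllib.parse import urlparse


def build_import_payload(webset_id, raw_urls):
    """Clean a raw URL list and return an import payload dict."""
    seen = set()
    urls = []
    for raw in raw_urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Skip blanks and anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        key = url.rstrip("/")  # "https://a.com/" and "https://a.com" match
        if key in seen:
            continue
        seen.add(key)
        urls.append(key)
    return {"websetId": webset_id, "urls": urls}


import_payload = build_import_payload(
    "ws_abc123",
    ["https://company-a.com", " https://company-a.com/ ", "company-b.com", "", "https://company-c.com"],
)
print(import_payload)
```

Entries without a scheme, such as "company-b.com", are dropped here instead of guessed at. Normalize them upstream if you want them included.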

Common Patterns

Lead Generation Pipeline

webset = exa.websets.create(
    params=CreateWebsetParameters(
        search={
            "query": "B2B SaaS companies in healthcare with 50-200 employees",
            "count": 100,
            "criteria": [
                {"description": "Company sells B2B SaaS products"},
                {"description": "Company operates in healthcare or healthtech"},
            ],
            "entity": {"type": "company"},
        },
        enrichments=[
            CreateEnrichmentParameters(description="CEO or founder full name", format="text"),
            CreateEnrichmentParameters(description="Company LinkedIn page URL", format="url"),
            CreateEnrichmentParameters(description="Estimated employee count", format="number"),
            CreateEnrichmentParameters(description="Most recent funding round and amount", format="text"),
        ],
    )
)

Competitive Intelligence

webset = exa.websets.create(
    params=CreateWebsetParameters(
        search={
            "query": "Companies building vector databases or embedding search infrastructure",
            "count": 30,
            "criteria": [
                {"description": "Company builds vector database or similarity search technology"},
                {"description": "Company has a publicly available product or API"},
            ],
            "entity": {"type": "company"},
        },
        enrichments=[
            CreateEnrichmentParameters(description="Primary product name", format="text"),
            CreateEnrichmentParameters(description="Pricing model (free tier, pay-as-you-go, enterprise)", format="text"),
            CreateEnrichmentParameters(description="Key technical differentiator", format="text"),
        ],
    )
)

Tips

  • Enrichment results are always arrays. Even for single values, result is ["value"] or null if not found.
  • Enrichment results have enrichmentId, not description. Build a map from webset.enrichments to resolve IDs to descriptions: {e.id: e.description for e in webset.enrichments}.
  • Item data is nested under properties. Access item.properties.url, item.properties.company.name — not item.url.
  • expand=items saves a call. GET /websets/{id}?expand=items returns the webset and its latest 100 items in one request.
  • Webhook secrets are shown once. Store the secret from the create response immediately for signature verification.
  • Preview before committing. Use POST /websets/preview to test a query without creating a webset.
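The first three tips combine naturally into one helper that turns an item into a flat row. The sketch below works on plain dicts and assumes each item carries an "enrichments" list of {"enrichmentId", "result"} entries alongside "properties". That layout, the helper name, and the sample values are assumptions for illustration.

```python
def flatten_item(item, enrichments):
    """Return a flat row: the item's URL plus one column per enrichment.

    `enrichments` is the webset-level list used to resolve IDs to
    descriptions. Array results are unwrapped to their first value, and a
    null or empty result becomes None.
    """
    descriptions = {e["id"]: e["description"] for e in enrichments}
    row = {"url": item["properties"]["url"]}  # item data lives under properties
    for result in item.get("enrichments") or []:
        label = descriptions.get(result["enrichmentId"], result["enrichmentId"])
        values = result.get("result")
        row[label] = values[0] if values else None
    return row


# Hypothetical data, for illustration only.
webset_enrichments = [
    {"id": "enr_1", "description": "CEO or founder full name"},
    {"id": "enr_2", "description": "Estimated employee count"},
]
sample_item = {
    "properties": {"url": "https://company-a.com"},
    "enrichments": [
        {"enrichmentId": "enr_1", "result": ["Jane Doe"]},
        {"enrichmentId": "enr_2", "result": None},
    ],
}
row = flatten_item(sample_item, webset_enrichments)
print(row)
```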