
Exa × Fireworks

Build your own Research Assistant in under 60 seconds with Exa and Fireworks

APIs Used

Search

Industry

AI Infrastructure

Outcomes

Real-time research discovery
Fast PDF processing
Semantic paper ranking

Learn how to build a real-time research assistant that can look up seminal papers in a field, pull contents and summarize cited learnings in seconds.

Exa is the search engine for AI applications and Fireworks is a fast, scalable inference platform for open source AI models. Using both allows users to access all real-time data on the web and interact with that information using frontier open source language models at blazing speed. By the end of this tutorial, you'll have code to run this yourself and free API credits on Exa to get started.

Cookbook: https://colab.research.google.com/drive/1pusqjvE7oWQ8Cb9M4NTlwTaCiSsixcAk?usp=sharing

Tutorial

Building a Real-time Research Assistant

There are 4 key steps in this cookbook to build a real-time research assistant:

Step 1: Search for Research Papers

Exa search (category="papers" or filter PDF, date>2024) with type=fast

Step 2: Extract PDF Text

Download/extract PDF text (could add post-processing)

Step 3: Create Embeddings

Fireworks embed, cluster, pick top 2 papers

Step 4: Generate Summary

Fireworks LLM summary
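The four steps above can be sketched as a single pipeline function. Everything here is illustrative scaffolding, not part of the Exa or Fireworks SDKs: the `search`, `extract`, `embed`, and `summarize` callables stand in for the API calls built out in the walkthrough.

```python
def cosine(a, b):
    # Cosine similarity: normalized dot product of two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

def research_brief(query, search, extract, embed, summarize, top_k=2):
    papers = search(query)                               # Step 1: find papers
    docs = [{"title": p["title"], "url": p["url"],
             "text": extract(p["url"])} for p in papers] # Step 2: extract text
    q_emb = embed(query)                                 # Step 3: embed + rank
    ranked = sorted(docs, reverse=True,
                    key=lambda d: cosine(q_emb, embed(d["text"])))
    return [{"title": d["title"], "url": d["url"],
             "summary": summarize(d["text"], d["title"])}
            for d in ranked[:top_k]]                     # Step 4: summarize
```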

Implementation

Step-by-Step Code Walkthrough

### Step 1 — Search for Research Papers (ExaSearch)

Set your Exa API key and call the search API to return research PDFs from a specific date range on a relevant topic.

```python
import requests
from google.colab import userdata  # the cookbook runs in Colab

EXA_API_KEY = userdata.get('EXA_API_KEY')

query = "new methods for federated learning accuracy vs latency 2025"
exa_url = "https://api.exa.ai/search"
payload = {
    "query": query,
    "category": "papers",
    "type": "fast",
    "numResults": 10,
    "filters": {"date": ">2024", "filetype": "pdf"}
}
headers = {"Authorization": f"Bearer {EXA_API_KEY}"}

response = requests.post(exa_url, json=payload, headers=headers)
papers = response.json()["results"]

for p in papers:
    print(p["title"], "—", p["url"])
```

### Step 2 — Download and Extract PDF Text

Download each paper and extract its text:

```python
import io
from PyPDF2 import PdfReader

def extract_pdf_text(url):
    pdf_data = requests.get(url).content
    pdf = PdfReader(io.BytesIO(pdf_data))
    text = " ".join(page.extract_text() for page in pdf.pages if page.extract_text())
    return text[:15000]  # limit to the first 15k chars for efficiency

docs = []
for p in papers:
    try:
        text = extract_pdf_text(p["url"])
        docs.append({"title": p["title"], "url": p["url"], "text": text})
    except Exception:
        continue  # skip papers that fail to download or parse
```
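The overview notes that post-processing could be added here. One possible sketch (the cleanup rules below, including cutting at a trailing "References" heading, are assumptions rather than part of the cookbook):

```python
import re

def clean_pdf_text(text):
    # Collapse the ragged whitespace PyPDF2 tends to leave behind.
    text = re.sub(r"\s+", " ", text).strip()
    # Drop the bibliography, which rarely helps a summary; this assumes
    # the heading literally reads "References".
    match = re.search(r"\bReferences\b", text)
    return text[:match.start()].rstrip() if match else text
```

Applying it is one extra line after extraction, e.g. `text = clean_pdf_text(text)`.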

### Step 3 — Create Embeddings using Fireworks Qwen3-Embedding-8B

Enter a Fireworks API key and create embeddings:

```python
import numpy as np

FIREWORKS_API_KEY = userdata.get('FIREWORKS_API_KEY')

fw_url = "https://api.fireworks.ai/inference/v1/embeddings"
headers = {
    "Authorization": f"Bearer {FIREWORKS_API_KEY}",
    "Content-Type": "application/json"
}

def get_embedding(text):
    payload = {"input": text[:2000], "model": "fireworks/qwen3-embedding-8b"}
    response = requests.post(fw_url, json=payload, headers=headers)
    return np.array(response.json()["data"][0]["embedding"])

for d in docs:
    d["embedding"] = get_embedding(d["text"])
```
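Note that `get_embedding` truncates each paper to its first 2,000 characters. If you want the whole document represented, one common alternative is to embed fixed-size chunks and mean-pool them into a single vector. This is a sketch, not what the cookbook does, and `embed_by_chunks` is a hypothetical helper:

```python
import numpy as np

def embed_by_chunks(text, embed_fn, chunk_size=2000):
    # Split into fixed-size chunks, embed each, and average the vectors.
    chunks = [text[i:i + chunk_size]
              for i in range(0, len(text), chunk_size)] or [""]
    return np.mean([embed_fn(c) for c in chunks], axis=0)
```

The trade-off: mean-pooling sees the whole paper but blurs section-level detail, so for short queries the truncated version often ranks just as well.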

### Step 4 — Rank Papers by Semantic Similarity

Rank the papers by cosine similarity between the query embedding and each paper's embedding:

```python
from sklearn.metrics.pairwise import cosine_similarity

query_emb = get_embedding(query)
similarities = [cosine_similarity([query_emb], [d["embedding"]])[0][0] for d in docs]
ranked = sorted(zip(similarities, docs), reverse=True, key=lambda x: x[0])
top_docs = [d for _, d in ranked[:2]]
```
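For readers without scikit-learn installed: cosine similarity is just a normalized dot product, so a NumPy equivalent is a one-liner:

```python
import numpy as np

def cosine_sim(a, b):
    # cos(theta) = (a · b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```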

### Step 5 — Summarize with Fireworks DeepSeek V3.1 (LLM)

Use an open-source model on Fireworks to summarize key learnings from each paper at lightning speed:

```python
import json

fw_llm_url = "https://api.fireworks.ai/inference/v1/chat/completions"

def summarize_paper(text, title):
    prompt = f"""
Summarize this research paper titled '{title}' in 5 key points.
Focus on: core idea, methods, results, and relevance to current AI research.
Make it concise enough to brief someone in 30 seconds.
Text:
{text[:4000]}
"""
    payload = {
        "model": "fireworks/deepseek-v3p1",
        "messages": [{"role": "user", "content": prompt}]
    }
    headers = {
        "Authorization": f"Bearer {FIREWORKS_API_KEY}",
        "Content-Type": "application/json"
    }
    response = requests.post(fw_llm_url, json=payload, headers=headers)

    # Check that the API call succeeded before reading the completion.
    try:
        data = response.json()
    except json.JSONDecodeError:
        return f"[Error decoding Fireworks response: {response.text}]"
    if response.status_code != 200:
        return f"[Fireworks error {response.status_code}: {data.get('error', data)}]"
    if "choices" not in data:
        return f"[Unexpected response: {data}]"
    return data["choices"][0]["message"]["content"]

for d in top_docs:
    d["summary"] = summarize_paper(d["text"], d["title"])
```
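Inference APIs can rate-limit bursts of requests. A simple retry wrapper with exponential backoff keeps the loop above robust; the `with_retries` helper and its backoff schedule are assumptions, not documented Fireworks behavior:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Call fn(), retrying with exponential backoff; re-raise after the
    # final attempt so real failures still surface.
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))
```

Usage: `d["summary"] = with_retries(lambda: summarize_paper(d["text"], d["title"]))`.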

### Step 6 — Generate the Real-Time Brief

```python
print("### 🔬 Real-Time Research Brief ###\n")
for d in top_docs:
    print(f"📘 {d['title']}")
    print(f"🔗 {d['url']}")
    print(f"🧠 Summary: {d['summary']}\n")
```

Sample Results

Real-Time Research Brief

FAST: A Lightweight Mechanism Unleashing Arbitrary Client Participation in Federated Learning

https://www.ijcai.org/proceedings/2025/0628.pdf

Based on the provided text, here is a 5-point summary:

  • Core Idea: Proposes FAST, a lightweight mechanism to handle Arbitrary Client Participation (ACP) in Federated Learning, where clients join/leave unpredictably, a common real-world problem.
  • Method: Clients periodically take a "snapshot" of their local model. These snapshots are used to update the global model, ensuring progress is retained even if a client later drops out.
  • Theoretical Result: The authors prove FAST achieves convergence rates matching those of ideal, controlled participation scenarios for both convex and non-convex problems.
  • Practical Result: Experiments show FAST significantly improves model performance and stability under high data heterogeneity and the challenging conditions of ACP.
  • Relevance: It's a simple, effective, and plug-and-play solution that addresses a major practical hurdle in deploying FL at scale, making it more robust and reliable.

One-shot Federated Learning Methods: A Practical Guide

https://www.ijcai.org/proceedings/2025/1174.pdf
