Learn how to build a real-time research assistant that can look up seminal papers in a field, pull their contents, and summarize key learnings in seconds.
Exa is the search engine for AI applications, and Fireworks is a fast, scalable inference platform for open source AI models. Together, they let you access real-time data from across the web and interact with that information using frontier open source language models at blazing speed. By the end of this tutorial, you'll have code to run this yourself and free API credits on Exa to get started.
Cookbook: https://colab.research.google.com/drive/1pusqjvE7oWQ8Cb9M4NTlwTaCiSsixcAk?usp=sharing
Tutorial
Building a Real-time Research Assistant
There are 4 key steps in this cookbook to build a real-time research assistant:
Step 1: Search for Research Papers
Exa search with category="papers" (or a PDF filetype filter and date > 2024) using type=fast
Step 2: Extract PDF Text
Download/extract PDF text (could add post-processing)
Step 3: Create Embeddings
Fireworks embed, cluster, pick top 2 papers
Step 4: Generate Summary
Fireworks LLM summary
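At a glance, the four steps compose into a single pipeline. Here is a minimal sketch where each step is passed in as a callable; the function signature and parameter names are placeholders of mine, and the real implementations appear in the walkthrough:

```python
def research_brief(query, search_papers, extract_text, rank_top, summarize):
    """Hypothetical pipeline wiring the four steps together.
    Each argument is a callable implementing one step."""
    papers = search_papers(query)             # Step 1: Exa search
    docs = [extract_text(p) for p in papers]  # Step 2: PDF text extraction
    top = rank_top(query, docs)               # Step 3: embed and rank
    return [summarize(d) for d in top]        # Step 4: LLM summaries
```

This shape makes each stage easy to swap out (for example, a different extractor or ranker) without touching the rest of the flow.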
Implementation
Step-by-Step Code Walkthrough
### Step 1 — Search for Research Papers (ExaSearch)
Set the Exa API key and use the search API to return research PDFs from a specific date range on a relevant topic:
```python
import requests
from google.colab import userdata  # Colab secrets; use os.environ outside Colab

EXA_API_KEY = userdata.get('EXA_API_KEY')

query = "new methods for federated learning accuracy vs latency 2025"
exa_url = "https://api.exa.ai/search"
payload = {
    "query": query,
    "category": "papers",
    "type": "fast",
    "numResults": 10,
    "filters": {"date": ">2024", "filetype": "pdf"}
}
headers = {"Authorization": f"Bearer {EXA_API_KEY}"}

response = requests.post(exa_url, json=payload, headers=headers)
papers = response.json()["results"]

for p in papers:
    print(p["title"], "—", p["url"])
```
### Step 2 — Download and Extract PDF Text
Download each paper and extract its text:
```python
import io

import requests
from PyPDF2 import PdfReader

def extract_pdf_text(url):
    pdf_data = requests.get(url).content
    pdf = PdfReader(io.BytesIO(pdf_data))
    text = " ".join([page.extract_text() for page in pdf.pages if page.extract_text()])
    return text[:15000]  # limit to first 15k chars for efficiency

docs = []
for p in papers:
    try:
        text = extract_pdf_text(p["url"])
        docs.append({"title": p["title"], "url": p["url"], "text": text})
    except Exception:
        continue  # skip papers whose PDFs fail to download or parse
```
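The overview notes that post-processing could be added at this step. One minimal option is to clean up common PDF extraction artifacts before embedding; the helper name here is mine, and the sketch assumes the usual artifacts (words hyphenated across line breaks, runs of whitespace):

```python
import re

def clean_pdf_text(text):
    """Collapse whitespace and rejoin words hyphenated across line breaks."""
    text = re.sub(r"-\s*\n\s*", "", text)  # "feder-\nated" -> "federated"
    text = re.sub(r"\s+", " ", text)       # collapse newlines and repeated spaces
    return text.strip()

print(clean_pdf_text("feder-\nated learning\n\n  improves   accuracy"))
# → federated learning improves accuracy
```

You could call this inside `extract_pdf_text` before truncating to 15k characters, so the character budget is spent on real content rather than whitespace.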
### Step 3 — Create Embeddings using Fireworks Qwen3-Embedding-8B
Set the Fireworks API key and create embeddings for each paper:
```python
import numpy as np
from google.colab import userdata

FIREWORKS_API_KEY = userdata.get('FIREWORKS_API_KEY')

fw_url = "https://api.fireworks.ai/inference/v1/embeddings"
headers = {
    "Authorization": f"Bearer {FIREWORKS_API_KEY}",
    "Content-Type": "application/json"
}

def get_embedding(text):
    payload = {"input": text[:2000], "model": "fireworks/qwen3-embedding-8b"}
    response = requests.post(fw_url, json=payload, headers=headers)
    return np.array(response.json()["data"][0]["embedding"])

for d in docs:
    d["embedding"] = get_embedding(d["text"])
```
### Step 4 — Rank Papers by Semantic Similarity
Rank the papers by cosine similarity between the query embedding and each paper's embedding, keeping the top two:
```python
from sklearn.metrics.pairwise import cosine_similarity

query_emb = get_embedding(query)
similarities = [cosine_similarity([query_emb], [d["embedding"]])[0][0] for d in docs]
ranked = sorted(zip(similarities, docs), reverse=True, key=lambda x: x[0])
top_docs = [d for _, d in ranked[:2]]
```
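The overview mentions clustering the embeddings before picking the top two papers, while the code here ranks by query similarity instead. If you want cluster-based selection (for example, one representative paper per topic), a minimal sketch using scikit-learn's `KMeans`; the helper name and cluster count are illustrative, not part of the cookbook:

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_representatives(embeddings, n_clusters=2):
    """Cluster embeddings and return the index of the vector closest to each
    cluster centroid — one representative per topic cluster."""
    X = np.array(embeddings)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    reps = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[np.argmin(dists)]))
    return reps
```

Ranking by query similarity favors relevance to the question; clustering favors diversity across the result set. Which is better depends on whether your brief should go deep on one topic or survey several.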
### Step 5 — Summarize with Fireworks DeepSeek V3.1 (LLM)
Use an open source model on Fireworks to summarize key learnings from each paper at lightning speed:
```python
import json

import requests

fw_llm_url = "https://api.fireworks.ai/inference/v1/chat/completions"

def summarize_paper(text, title):
    prompt = f"""Summarize this research paper titled '{title}' in 5 key points.
Focus on: core idea, methods, results, and relevance to current AI research.
Make it concise enough to brief someone in 30 seconds.

Text:
{text[:4000]}"""
    payload = {
        "model": "fireworks/deepseek-v3p1",
        "messages": [{"role": "user", "content": prompt}]
    }
    headers = {
        "Authorization": f"Bearer {FIREWORKS_API_KEY}",
        "Content-Type": "application/json"
    }
    response = requests.post(fw_llm_url, json=payload, headers=headers)

    # Check that the API call succeeded before using the response
    try:
        data = response.json()
    except json.JSONDecodeError:
        return f"[Error decoding Fireworks response: {response.text}]"
    if response.status_code != 200:
        return f"[Fireworks error {response.status_code}: {data.get('error', data)}]"
    if "choices" not in data:
        return f"[Unexpected response: {data}]"
    return data["choices"][0]["message"]["content"]

for d in top_docs:
    d["summary"] = summarize_paper(d["text"], d["title"])
```
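Both Fireworks endpoints are plain HTTP calls, so transient failures (timeouts, 5xx responses) are possible under load. A minimal retry wrapper you could drop in front of `requests.post`; this helper is an illustrative addition of mine, not part of the cookbook:

```python
import time

import requests

def post_with_retry(url, payload, headers, retries=3, backoff=2.0):
    """POST with simple exponential backoff on transient failures."""
    for attempt in range(retries):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=60)
            if resp.status_code < 500:  # don't retry client-side errors
                return resp
        except requests.RequestException:
            pass  # network hiccup; fall through and retry
        if attempt < retries - 1:
            time.sleep(backoff ** attempt)  # wait 1s, then 2s, then 4s, ...
    raise RuntimeError(f"Request to {url} failed after {retries} attempts")
```

Swapping this in for the direct `requests.post` calls in the embedding and summarization steps makes long runs over many papers noticeably more robust.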
### Step 6 — Generate the Real-Time Brief
```python
print("### 🔬 Real-Time Research Brief ###\n")

for d in top_docs:
    print(f"📘 {d['title']}")
    print(f"🔗 {d['url']}")
    print(f"🧠 Summary: {d['summary']}\n")
```
Sample Results
Real-Time Research Brief
FAST: A Lightweight Mechanism Unleashing Arbitrary Client Participation in Federated Learning
https://www.ijcai.org/proceedings/2025/0628.pdf
Based on the provided text, here is a 5-point summary:
- Core Idea: Proposes FAST, a lightweight mechanism to handle Arbitrary Client Participation (ACP) in Federated Learning, where clients join/leave unpredictably, a common real-world problem.
- Method: Clients periodically take a "snapshot" of their local model. These snapshots are used to update the global model, ensuring progress is retained even if a client later drops out.
- Theoretical Result: The authors prove FAST achieves convergence rates matching those of ideal, controlled participation scenarios for both convex and non-convex problems.
- Practical Result: Experiments show FAST significantly improves model performance and stability under high data heterogeneity and the challenging conditions of ACP.
- Relevance: It's a simple, effective, and plug-and-play solution that addresses a major practical hurdle in deploying FL at scale, making it more robust and reliable.
One-shot Federated Learning Methods: A Practical Guide
https://www.ijcai.org/proceedings/2025/1174.pdf
