Learn how to build a real-time research paper discovery and briefing system that finds the latest scientific papers and generates concise summaries in seconds.
Exa is the search engine for AI applications, and Fireworks is a fast, scalable inference platform for open-source AI models. Together, they let you search real-time data from across the web and interact with that information using frontier open-source language models at blazing speed. By the end of this tutorial, you'll have code to run this yourself and free API credits on Exa to get started.
Cookbook: Open in Google Colab
Tutorial
Building a Real-time Research Assistant
This notebook demonstrates how to build a real-time research paper discovery and briefing system. We leverage Exa's semantic search to discover relevant research papers with full content from arXiv and OpenReview, then use Fireworks AI for lightning-fast embeddings (Qwen3-8B) and intelligent summarization (DeepSeek V3.1) to create actionable insights.
Step 1: Search for Research Papers
Exa search with category="research paper" and includeDomains for arxiv.org and openreview.net
Step 2: Build Documents from Content
Extract text content directly from Exa results (no PDF parsing needed)
Step 3: Optional Embedding Rerank
Fireworks Qwen3-8B embeddings with cosine similarity for custom ranking
Step 4: Generate Summary
Fireworks DeepSeek V3.1 LLM summarization
Implementation
Step-by-Step Code Walkthrough
Step 1 — API Keys Setup
Set up your API keys using Colab userdata:
```python
from google.colab import userdata

EXA_API_KEY = userdata.get("EXA_API_KEY")
assert EXA_API_KEY, "Missing EXA_API_KEY in Colab userdata"

FIREWORKS_API_KEY = userdata.get("FIREWORKS_API_KEY")
assert FIREWORKS_API_KEY, "Missing FIREWORKS_API_KEY in Colab userdata"
```
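If you're running outside Colab, a minimal alternative is to read the same keys from environment variables. This sketch assumes you've already exported both keys in your shell:

```python
import os

# Fallback for non-Colab environments: read keys from environment variables
# (assumes you've run `export EXA_API_KEY=...` and `export FIREWORKS_API_KEY=...`)
EXA_API_KEY = os.environ.get("EXA_API_KEY")
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
assert EXA_API_KEY and FIREWORKS_API_KEY, "Missing API keys in environment"
```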
Step 1.5 — Settings / Toggles
Configure the reranking behavior and number of papers to summarize:
```python
DO_EMBED_RERANK = True
# RERANKING TOGGLE:
# - Exa already performs semantic search with its own high-quality reranking
# - Setting this to True is mainly for DEMO PURPOSES to showcase Fireworks' embedding capabilities
# - Also useful if you want CUSTOM RANKING logic (e.g., prioritize certain content types,
#   use domain-specific embeddings, or apply your own similarity metrics)
# - False = use Exa's native ranking order (faster, no extra API calls)

TOP_K = 5  # how many papers to summarize

RERANK_QUERY = (
    "federated fine-tuning of LLMs with LoRA; averaging LoRA adapters is suboptimal; "
    "methods that aggregate a small matrix between B and A (LoRA-SB / square matrix R) "
    "to reduce communication independent of number of clients; strong results + privacy/DP benefits"
)  # leave "" to reuse `query`, or set custom rerank query
```
Step 2 — Search for Research Papers (Exa) WITH CONTENT
Use Exa's search API to find research papers with full text content:
```python
import requests

query = "federated learning accuracy vs latency 2025"

exa_url = "https://api.exa.ai/search"
payload = {
    "query": query,
    "category": "research paper",
    "type": "fast",
    "numResults": 20,
    "includeDomains": [
        "arxiv.org",
        "openreview.net",
    ],
    "contents": {"text": True},
}
headers = {"x-api-key": EXA_API_KEY}

response = requests.post(exa_url, json=payload, headers=headers, timeout=30)
response.raise_for_status()

papers = response.json().get("results", [])
print("Results:", len(papers))
print("Example keys:", list(papers[0].keys()) if papers else "no results")
```
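Before moving on, it can help to sanity-check one result. A quick optional inspection snippet (the field names match the payload above; `text` is only present because we requested contents):

```python
# Optional: inspect the first result to confirm full text came back
if papers:
    first = papers[0]
    print(first.get("title"))
    print(first.get("url"))
    print((first.get("text") or "")[:300])  # first 300 chars of the paper text
```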
Step 3 — Build Docs from Exa Content
Extract and filter documents from the search results:
```python
docs = []
for p in papers:
    title = (p.get("title") or "").strip() or "[untitled]"
    url = p.get("url")
    text = (p.get("text") or "").strip()
    if len(text) < 200:
        continue
    docs.append({"title": title, "url": url, "text": text[:8000]})

print("Docs built from Exa content:", len(docs))
```
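The same paper can surface under more than one URL (for example, an arXiv abstract page and an OpenReview forum page). If that happens in your results, a simple deduplication pass keyed on lowercased titles can help; this is a heuristic sketch, not part of the original notebook:

```python
# Optional: drop near-duplicate papers that appear under multiple URLs.
# Keyed on lowercased title -- a simple heuristic, adjust as needed.
seen_titles = set()
deduped = []
for d in docs:
    key = d["title"].lower()
    if key in seen_titles:
        continue
    seen_titles.add(key)
    deduped.append(d)
docs = deduped
print("Docs after dedup:", len(docs))
```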
Step 4-5 — Optional Embedding + Cosine Similarity Rerank
Use Fireworks embeddings for custom ranking:
```python
if DO_EMBED_RERANK:
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    fw_embed_url = "https://api.fireworks.ai/inference/v1/embeddings"
    fw_embed_headers = {
        "Authorization": f"Bearer {FIREWORKS_API_KEY}",
        "Content-Type": "application/json",
    }

    def get_embedding(text: str) -> np.ndarray:
        payload = {"input": text[:2000], "model": "fireworks/qwen3-embedding-8b"}
        r = requests.post(fw_embed_url, json=payload, headers=fw_embed_headers, timeout=30)
        r.raise_for_status()
        return np.array(r.json()["data"][0]["embedding"], dtype=np.float32)

    # Embed docs
    for d in docs:
        d["embedding"] = get_embedding(d["text"])

    # Embed rerank query (customizable)
    rerank_text = (RERANK_QUERY or query).strip()
    query_emb = get_embedding(rerank_text)

    # Vectorized cosine similarity
    doc_embeddings = np.vstack([d["embedding"] for d in docs])
    scores = cosine_similarity([query_emb], doc_embeddings)[0]
    ranked = sorted(zip(scores, docs), reverse=True, key=lambda x: x[0])
    top_docs = [d for _, d in ranked[:TOP_K]]
else:
    # No rerank: just take Exa order
    top_docs = docs[:TOP_K]

# Bare expression: displays the selected (title, url) pairs in a notebook cell
[(d["title"], d["url"]) for d in top_docs]
```
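Embedding documents one request at a time is simple but slow. The Fireworks embeddings endpoint follows the OpenAI-compatible schema, which typically accepts a list of strings as `input`; if batch input is supported for your model, a sketch like this collapses N round trips into one (verify against the current Fireworks docs before relying on it):

```python
import numpy as np

def get_embeddings_batch(texts: list[str]) -> np.ndarray:
    # Assumes the OpenAI-compatible `input` field accepts a list of strings;
    # check batch support in the Fireworks docs before relying on this.
    payload = {
        "input": [t[:2000] for t in texts],
        "model": "fireworks/qwen3-embedding-8b",
    }
    r = requests.post(fw_embed_url, json=payload, headers=fw_embed_headers, timeout=60)
    r.raise_for_status()
    # Sort by index so rows line up with the input order
    data = sorted(r.json()["data"], key=lambda item: item["index"])
    return np.vstack([np.array(item["embedding"], dtype=np.float32) for item in data])

# Usage: one call for all docs instead of one per doc
# doc_embeddings = get_embeddings_batch([d["text"] for d in docs])
```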
Step 6 — Summarize with Fireworks LLM
Use DeepSeek V3.1 on Fireworks to generate concise summaries:
```python
fw_llm_url = "https://api.fireworks.ai/inference/v1/chat/completions"
fw_llm_headers = {
    "Authorization": f"Bearer {FIREWORKS_API_KEY}",
    "Content-Type": "application/json",
}

def summarize_paper(text: str, title: str) -> str:
    prompt = f"""Summarize this paper titled '{title}' in exactly 5 key points.
Focus on: core idea, methods, results, and relevance.
Keep it concise enough to brief someone in 30 seconds.
IMPORTANT: Output ONLY a numbered list (1-5) with NO introduction, preamble, or concluding remarks.
Start directly with "1. " and end with point 5.

Content:
{text[:4000]}"""
    payload = {
        "model": "fireworks/deepseek-v3p1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    r = requests.post(fw_llm_url, json=payload, headers=fw_llm_headers, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

for d in top_docs:
    d["summary"] = summarize_paper(d["text"], d["title"])
```
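The loop above summarizes papers sequentially. Since each call is independent, you can parallelize with a thread pool; a sketch (the `max_workers=4` value is an arbitrary choice, so tune it against your Fireworks rate limits):

```python
from concurrent.futures import ThreadPoolExecutor

# Optional: summarize papers concurrently instead of one at a time
with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = pool.map(lambda d: summarize_paper(d["text"], d["title"]), top_docs)
    for d, summary in zip(top_docs, summaries):
        d["summary"] = summary
```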
Step 7 — Generate the Real-Time Brief
print("### Real-Time Research Brief ###\n")print(f"Search query: {query}")if DO_EMBED_RERANK:print(f"Rerank query: {(RERANK_QUERY or query).strip()}\n")else:print("Rerank: (disabled)\n")for d in top_docs:print(f"{d['title']}")print(f"{d['url']}")print(f"Summary:\n{d['summary']}\n")
Sample Results
1. Core Idea: Fed-SB is a new method for federated learning that fine-tunes large language models (LLMs) with extreme communication efficiency and high performance, applicable to both private and non-private settings.
2. Method: It leverages LoRA-SB, a low-rank adaptation technique that learns a small square matrix (R) between adapters, enabling direct and exact averaging of this single matrix across clients instead of averaging all adapter parameters.
3. Communication Efficiency: This approach drastically reduces communication costs (up to 230x) because the transmitted matrix size is fixed and independent of the number of clients, unlike traditional methods where cost scales linearly.
4. Results: Fed-SB achieves state-of-the-art performance on commonsense reasoning, arithmetic reasoning, and language inference tasks while maintaining this extreme communication efficiency.
5. Relevance for Privacy: The method enhances private federated learning by reducing the number of trainable parameters (lowering the noise needed for differential privacy) and avoiding the noise amplification issues common in other federated fine-tuning approaches.
FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
1. Core Idea: Proposes FedEx-LoRA, a method for exact aggregation of Low-Rank Adaptation (LoRA) parameters in federated learning, eliminating the approximation errors inherent in standard federated averaging of LoRA adapters.
2. Methods: Introduces a residual error term added to the frozen pre-trained weight matrix, allowing for precise, mathematically exact aggregation of client updates while maintaining the low-rank structure and efficiency of LoRA.
3. Results: Demonstrates consistent performance gains over state-of-the-art methods across diverse tasks including arithmetic reasoning, commonsense reasoning, and natural language generation, showing the significance of exact updates.
4. Efficiency: Achieves these exact updates with minimal computational and communication overhead, preserving the core efficiency benefits that make LoRA suitable for federated learning.
5. Relevance: Provides a simple, effective, and broadly applicable solution for the accurate federated fine-tuning of large foundation models, a critical need for privacy-preserving machine learning.
