Learn how to build a real-time research paper discovery and briefing system that finds the latest scientific papers and generates concise summaries in seconds.
Exa is the search engine for AI applications, and Fireworks is a fast, scalable inference platform for open-source AI models. Together, they let you search real-time data from across the web and interact with that information using frontier open-source language models at blazing speed. By the end of this tutorial, you'll have code to run this yourself and free API credits on Exa to get started.
Cookbook: Open in Google Colab
Tutorial
Building a Real-time Research Assistant
This notebook demonstrates how to build a real-time research paper discovery and briefing system. We leverage Exa's semantic search to discover relevant research papers with full content from arXiv and OpenReview, then use Fireworks AI for lightning-fast embeddings (Qwen3-8B) and intelligent summarization (DeepSeek V3.1) to create actionable insights.
Step 1: Search for Research Papers
Exa search with category="research paper" and includeDomains for arxiv.org and openreview.net
Step 2: Build Documents from Content
Extract text content directly from Exa results (no PDF parsing needed)
Step 3: Optional Embedding Rerank
Fireworks Qwen3-8B embeddings with cosine similarity for custom ranking
Step 4: Generate Summary
Fireworks DeepSeek V3.1 LLM summarization
Implementation
Step-by-Step Code Walkthrough
Step 1 — API Keys Setup
Set up your API keys using Colab userdata:
```python
from google.colab import userdata

EXA_API_KEY = userdata.get("EXA_API_KEY")
assert EXA_API_KEY, "Missing EXA_API_KEY in Colab userdata"

FIREWORKS_API_KEY = userdata.get("FIREWORKS_API_KEY")
assert FIREWORKS_API_KEY, "Missing FIREWORKS_API_KEY in Colab userdata"
```
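If you're running outside Colab, a minimal alternative is to read the same keys from environment variables. This sketch assumes you've already exported both keys in your shell:

```python
import os

# Fallback for non-Colab environments: read keys from environment variables
# (assumes you've run `export EXA_API_KEY=...` and `export FIREWORKS_API_KEY=...`)
EXA_API_KEY = os.environ.get("EXA_API_KEY")
FIREWORKS_API_KEY = os.environ.get("FIREWORKS_API_KEY")
assert EXA_API_KEY and FIREWORKS_API_KEY, "Missing API keys in environment"
```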
Step 1.5 — Settings / Toggles
Configure the reranking behavior and number of papers to summarize:
```python
DO_EMBED_RERANK = True
# RERANKING TOGGLE:
# - Exa already performs semantic search with its own high-quality reranking
# - Setting this to True is mainly for DEMO PURPOSES to showcase Fireworks' embedding capabilities
# - Also useful if you want CUSTOM RANKING logic (e.g., prioritize certain content types,
#   use domain-specific embeddings, or apply your own similarity metrics)
# - False = use Exa's native ranking order (faster, no extra API calls)

TOP_K = 5  # how many papers to summarize

RERANK_QUERY = (
    "federated fine-tuning of LLMs with LoRA; averaging LoRA adapters is suboptimal; "
    "methods that aggregate a small matrix between B and A (LoRA-SB / square matrix R) "
    "to reduce communication independent of number of clients; strong results + privacy/DP benefits"
)  # leave "" to reuse `query`, or set custom rerank query
```
Step 2 — Search for Research Papers (Exa) WITH CONTENT
Use Exa's search API to find research papers with full text content:
```python
import requests

query = "federated learning accuracy vs latency 2025"

exa_url = "https://api.exa.ai/search"
payload = {
    "query": query,
    "category": "research paper",
    "type": "fast",
    "numResults": 20,
    "includeDomains": [
        "arxiv.org",
        "openreview.net",
    ],
    "contents": {"text": True},
}
headers = {"x-api-key": EXA_API_KEY}

response = requests.post(exa_url, json=payload, headers=headers, timeout=30)
response.raise_for_status()

papers = response.json().get("results", [])
print("Results:", len(papers))
print("Example keys:", list(papers[0].keys()) if papers else "no results")
```
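Before moving on, it can help to sanity-check one result. A quick optional inspection snippet (the field names match the payload above; `text` is only present because we requested contents):

```python
# Optional: inspect the first result to confirm full text came back
if papers:
    first = papers[0]
    print(first.get("title"))
    print(first.get("url"))
    print((first.get("text") or "")[:300])  # first 300 chars of the paper text
```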
Step 3 — Build Docs from Exa Content
Extract and filter documents from the search results:
```python
docs = []
for p in papers:
    title = (p.get("title") or "").strip() or "[untitled]"
    url = p.get("url")
    text = (p.get("text") or "").strip()
    if len(text) < 200:
        continue
    docs.append({"title": title, "url": url, "text": text[:8000]})

print("Docs built from Exa content:", len(docs))
```
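The same paper can surface under more than one URL (for example, an arXiv abstract page and an OpenReview forum page). If that happens in your results, a simple deduplication pass keyed on lowercased titles can help; this is a heuristic sketch, not part of the original notebook:

```python
# Optional: drop near-duplicate papers that appear under multiple URLs.
# Keyed on lowercased title -- a simple heuristic, adjust as needed.
seen_titles = set()
deduped = []
for d in docs:
    key = d["title"].lower()
    if key in seen_titles:
        continue
    seen_titles.add(key)
    deduped.append(d)
docs = deduped
print("Docs after dedup:", len(docs))
```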
Step 4-5 — Optional Embedding + Cosine Similarity Rerank
Use Fireworks embeddings for custom ranking:
```python
if DO_EMBED_RERANK:
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    fw_embed_url = "https://api.fireworks.ai/inference/v1/embeddings"
    fw_embed_headers = {
        "Authorization": f"Bearer {FIREWORKS_API_KEY}",
        "Content-Type": "application/json",
    }

    def get_embedding(text: str) -> np.ndarray:
        payload = {"input": text[:2000], "model": "fireworks/qwen3-embedding-8b"}
        r = requests.post(fw_embed_url, json=payload, headers=fw_embed_headers, timeout=30)
        r.raise_for_status()
        return np.array(r.json()["data"][0]["embedding"], dtype=np.float32)

    # Embed docs
    for d in docs:
        d["embedding"] = get_embedding(d["text"])

    # Embed rerank query (customizable)
    rerank_text = (RERANK_QUERY or query).strip()
    query_emb = get_embedding(rerank_text)

    # Vectorized cosine similarity
    doc_embeddings = np.vstack([d["embedding"] for d in docs])
    scores = cosine_similarity([query_emb], doc_embeddings)[0]
    ranked = sorted(zip(scores, docs), reverse=True, key=lambda x: x[0])
    top_docs = [d for _, d in ranked[:TOP_K]]
else:
    # No rerank: just take Exa order
    top_docs = docs[:TOP_K]

# Bare expression: displays the selected (title, url) pairs in a notebook cell
[(d["title"], d["url"]) for d in top_docs]
```
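Embedding documents one request at a time is simple but slow. The Fireworks embeddings endpoint follows the OpenAI-compatible schema, which typically accepts a list of strings as `input`; if batch input is supported for your model, a sketch like this collapses N round trips into one (verify against the current Fireworks docs before relying on it):

```python
import numpy as np

def get_embeddings_batch(texts: list[str]) -> np.ndarray:
    # Assumes the OpenAI-compatible `input` field accepts a list of strings;
    # check batch support in the Fireworks docs before relying on this.
    payload = {
        "input": [t[:2000] for t in texts],
        "model": "fireworks/qwen3-embedding-8b",
    }
    r = requests.post(fw_embed_url, json=payload, headers=fw_embed_headers, timeout=60)
    r.raise_for_status()
    # Sort by index so rows line up with the input order
    data = sorted(r.json()["data"], key=lambda item: item["index"])
    return np.vstack([np.array(item["embedding"], dtype=np.float32) for item in data])

# Usage: one call for all docs instead of one per doc
# doc_embeddings = get_embeddings_batch([d["text"] for d in docs])
```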
Step 6 — Summarize with Fireworks LLM
Use DeepSeek V3.1 on Fireworks to generate concise summaries:
```python
fw_llm_url = "https://api.fireworks.ai/inference/v1/chat/completions"
fw_llm_headers = {
    "Authorization": f"Bearer {FIREWORKS_API_KEY}",
    "Content-Type": "application/json",
}

def summarize_paper(text: str, title: str) -> str:
    prompt = f"""Summarize this paper titled '{title}' in exactly 5 key points.
Focus on: core idea, methods, results, and relevance.
Keep it concise enough to brief someone in 30 seconds.
IMPORTANT: Output ONLY a numbered list (1-5) with NO introduction, preamble, or concluding remarks.
Start directly with "1. " and end with point 5.

Content:
{text[:4000]}"""
    payload = {
        "model": "fireworks/deepseek-v3p1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    r = requests.post(fw_llm_url, json=payload, headers=fw_llm_headers, timeout=60)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

for d in top_docs:
    d["summary"] = summarize_paper(d["text"], d["title"])
```
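The loop above summarizes papers sequentially. Since each call is independent, you can parallelize with a thread pool; a sketch (the `max_workers=4` value is an arbitrary choice, so tune it against your Fireworks rate limits):

```python
from concurrent.futures import ThreadPoolExecutor

# Optional: summarize papers concurrently instead of one at a time
with ThreadPoolExecutor(max_workers=4) as pool:
    summaries = pool.map(lambda d: summarize_paper(d["text"], d["title"]), top_docs)
    for d, summary in zip(top_docs, summaries):
        d["summary"] = summary
```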
Step 7 — Generate the Real-Time Brief
print("### Real-Time Research Brief ###\n")print(f"Search query: {query}")if DO_EMBED_RERANK:print(f"Rerank query: {(RERANK_QUERY or query).strip()}\n")else:print("Rerank: (disabled)\n")for d in top_docs:print(f"{d['title']}")print(f"{d['url']}")print(f"Summary:\n{d['summary']}\n")
Sample Results
1. Core Idea: Fed-SB is a new method for federated learning that fine-tunes large language models (LLMs) with extreme communication efficiency and high performance, applicable to both private and non-private settings.
2. Method: It leverages LoRA-SB, a low-rank adaptation technique that learns a small square matrix (R) between adapters, enabling direct and exact averaging of this single matrix across clients instead of averaging all adapter parameters.
3. Communication Efficiency: This approach drastically reduces communication costs (up to 230x) because the transmitted matrix size is fixed and independent of the number of clients, unlike traditional methods where cost scales linearly.
4. Results: Fed-SB achieves state-of-the-art performance on commonsense reasoning, arithmetic reasoning, and language inference tasks while maintaining this extreme communication efficiency.
5. Relevance for Privacy: The method enhances private federated learning by reducing the number of trainable parameters (lowering the noise needed for differential privacy) and avoiding the noise amplification issues common in other federated fine-tuning approaches.
FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
1. Core Idea: Proposes FedEx-LoRA, a method for exact aggregation of Low-Rank Adaptation (LoRA) parameters in federated learning, eliminating the approximation errors inherent in standard federated averaging of LoRA adapters.
2. Methods: Introduces a residual error term added to the frozen pre-trained weight matrix, allowing for precise, mathematically exact aggregation of client updates while maintaining the low-rank structure and efficiency of LoRA.
3. Results: Demonstrates consistent performance gains over state-of-the-art methods across diverse tasks including arithmetic reasoning, commonsense reasoning, and natural language generation, showing the significance of exact updates.
4. Efficiency: Achieves these exact updates with minimal computational and communication overhead, preserving the core efficiency benefits that make LoRA suitable for federated learning.
5. Relevance: Provides a simple, effective, and broadly applicable solution for the accurate federated fine-tuning of large foundation models, a critical need for privacy-preserving machine learning.
