Our information ecosystem is broken, and the best way to fix it is to combine LLMs with high-quality content from the Internet. If you want to skip to us shilling our API, scroll to the bottom.
Our decaying information ecosystem
We started Exa because we were frustrated with the state of the Internet. Slowly but steadily, that beautiful place where you could find the best information about anything has become warped by the competitive monetization of your attention.
In our opinion, nowhere is this clearer than in the deterioration of Google search. An entire industry – search engine optimization – is dedicated to the science of ranking higher in Google results in order to monetize your attention. The effect is that even a query as simple as “eggplant parmesan recipe” results in a ferocious competition among websites not to have the best content but to rank higher in Google’s search results.
At Exa, we wanted to figure out how to make search feel magical again, and developments like GPT-3 gave us confidence that it could be done using the power of large language models. We raised a seed round, bought a GPU cluster, and set off to figure out how to improve search. Our goal was (and still is) to make Internet search feel like you’re being personally guided through the grand total of human knowledge.
After over a year of experimenting with different architectures and training datasets, we arrived at a completely different way of searching the internet. The key insight was that the way people talk about a link is a great indicator of both the link’s content and its quality. For example, someone might post about a great article they read like this:
Found an amazing article I read about the history of Rome’s architecture: [LINK]
We trained a neural network to take text like this and predict the link that comes afterward. The end result is a totally different way to search the internet – search as if you’re about to share the link you want. While a little unintuitive at first, searching this way can return extremely high quality results. Some ways you can search:
- Search with descriptors or vibes
- Search only for the type of entity that you want
- Find content that Google simply doesn’t surface well, maybe because keywords aren’t the right tool or maybe just because Google doesn’t care about returning good results for that type of content.
- Search by a link itself, finding links most similar to it.
If you want, you can try it below or at https://exa.ai/search.
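To make the idea above concrete: the real Exa model is a trained neural network, but the shape of it can be sketched as "embed the share-style text, then rank links by similarity to that embedding." Here is a toy stand-in, where a bag-of-words vector plays the role of the learned embedding. Every URL, function, and document here is invented for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in embedding: a lowercase bag-of-words vector.
    # (The real system uses a trained neural embedding instead.)
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index each link by the kind of text someone might write when sharing it.
index = {
    "https://example.com/rome-architecture":
        "amazing article about the history of Rome's architecture",
    "https://example.com/eggplant-parm":
        "great recipe for homemade eggplant parmesan",
}

def search(query):
    # Return the link whose "share text" best matches the query.
    q = embed(query)
    return max(index, key=lambda link: cosine(q, embed(index[link])))
```

With this sketch, querying with share-style phrasing like "Found an amazing article I read about the history of Rome’s architecture:" ranks the architecture link first, which is the behavior the trained model delivers at internet scale.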
And then ChatGPT happened
The craziest thing happened just a few weeks after we put Exa into the wild: ChatGPT was released. Overnight, the biggest revolution in our information ecosystem since the Internet was thrust into the world. Then, just a few months later, GPT-4 rocked the world again.
Since ChatGPT’s release, the early AI adopters among us have experienced a dramatic shift in how we consume information. Whether the topic is programming, history, or your love life, you can often get the answer you’re looking for by simply asking ChatGPT. This alien intelligence is destabilizing the internet, and when the dust settles the internet will be a very different place. The fall of Stack Overflow, the Reddit API fiasco, and the data lawsuits are just some of the early secondary effects.
But LLMs have limitations
For a while, we were pretty taken aback by these developments. Why does search matter if you can just ask an intelligent agent for a direct answer? We worried that making search feel magical wasn’t important anymore.
But as the LLM ecosystem has developed, we’ve seen that despite their high intelligence and reasoning abilities, LLMs suffer from serious deficiencies.
- Hallucination – LLMs, even GPT-4, often output incorrect or fabricated information.
- Stale knowledge – LLMs have training cutoff dates. Their weights aren’t updated continuously, so they can’t know about the latest and greatest developments in the world.
- Limited knowledge capacity – LLMs can’t memorize the entire internet. Even if an LLM knows the plot of “The Great Gatsby”, it doesn’t remember all the words. It’s not designed to be a database.
To fix these problems, LLMs need to query the external world. They need to search and consume content on the Internet. This insight led us to a new hypothesis about the future of search:
LLMs will soon perform more searches than humans
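As a sketch of what that hypothesis means in practice: an LLM that searches first and then grounds its answer in what it found. Everything below is a placeholder, not a real API; `search` and `ask_llm` stand in for whatever search engine and model you pair together:

```python
def answer_with_retrieval(question, search, ask_llm, k=3):
    # 1. The LLM (or its harness) queries the external world first.
    docs = search(question)[:k]  # top-k results as (title, text) pairs
    # 2. Retrieved content is packed into the prompt as grounding context.
    context = "\n\n".join(f"[{title}]\n{text}" for title, text in docs)
    prompt = (
        "Answer using only the sources below, and cite their titles.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    # 3. The model answers from fresh, external information rather than
    #    from stale or hallucinated parametric memory.
    return ask_llm(prompt)

# Toy stand-ins just to show the flow end to end:
fake_search = lambda q: [("Exa blog", "Exa is a search engine built for LLMs.")]
fake_llm = lambda prompt: "Exa is a search engine built for LLMs. [Exa blog]"
print(answer_with_retrieval("What is Exa?", fake_search, fake_llm))
```

This loop addresses all three deficiencies above: retrieved sources constrain hallucination, fresh documents bypass the training cutoff, and the search index, not the model weights, acts as the database.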
What a search engine for LLMs could look like
The ability for LLMs to find high quality external information will become increasingly important as LLMs become a key human-computer interface. Paired with a good search engine, LLMs can give you the answer to any type of question you have, provided the LLM is sufficiently intelligent and the search engine sufficiently powerful.
Well, what would such a search engine look like?
We’ll probably write another blog post about this topic, but a couple things:
- Quality over clickbait. LLMs prefer to ingest information-dense content that they can use to answer a question in the most informed way. For instance, a search engine designed for LLMs shouldn’t return listicles about entity X; the LLM is better off researching the underlying entities and composing its own list.
- The ability to handle complex natural language. LLMs can instantly output long, complex queries that specify exactly what their user wants, and they need a search engine that can handle that.
But there’s no good tool for LLMs to do search like this, right?...
Exa API – the only tool you need to connect an LLM to the internet
It turns out that the neural search engine we’d been building all along is well-suited for LLMs to use! Exa handles natural language extremely well, and the model promotes search results that closely match the meaning of your query.
Today, we’re officially announcing the Exa API, a one-stop shop to connect your LLM to the internet. With a few lines of code you can:
- Do an Exa (or keyword) search
- Instantly return clean, parsed HTML for any content. No need to do web scraping.
Using the API is as easy as:
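Here is a minimal sketch of a search call over plain HTTP. The endpoint path, header name, and field names are assumptions based on the Exa docs, so check them against the current API reference before relying on this; the network request only fires if an `EXA_API_KEY` environment variable is set:

```python
import json
import os
import urllib.request

def build_search_request(query, num_results=10):
    # Assumed request shape: POST /search with a JSON body and an
    # x-api-key header. Verify against the official API reference.
    url = "https://api.exa.ai/search"
    headers = {
        "x-api-key": os.environ.get("EXA_API_KEY", ""),
        "Content-Type": "application/json",
    }
    body = json.dumps({"query": query, "numResults": num_results})
    return url, headers, body

url, headers, body = build_search_request(
    "Found an amazing article about the history of Rome's architecture:"
)
print(body)

if os.environ.get("EXA_API_KEY"):  # only hit the network with a real key
    req = urllib.request.Request(url, data=body.encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

Note how the query is phrased as if you were about to share the link: that share-style phrasing is what the Exa model is trained on.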
It's a great tool for connecting an LLM to high-quality information.
You can learn more here. The API is completely free for individual developers, up to 1,000 requests/month. And if you’re a student or working on a non-monetizable project, please reach out and we can probably increase that.
Thanks for reading. And stay tuned for more posts about what we’re up to and how we’re thinking about the information ecosystem.