Half the Web, None of the Signal
Agents are text-literate but audio-blind
If you've spent time building an AI agent — a research assistant, a market intelligence tool, a news monitor — you've bumped into the same ceiling. Your agent can read an article. It can parse a PDF. It can search the web and pull back a dozen URLs to scrape and summarize.
But the moment the relevant information exists only as audio, your agent goes quiet. Not because the information isn't out there. The Fed chair gave a speech last Tuesday. The CEO said something interesting on a panel in Davos. A senator made an explicit commitment during a committee hearing. All of it happened. All of it is technically "on the internet." None of it is retrievable by your agent.
Audio is arguably where the most unguarded, substantive information lives — and it's been invisible to AI agents until now.
Executives say things on earnings calls that never make it into press releases. Scientists describe their findings at conferences weeks before the paper drops. Politicians commit to positions in hearings that don't surface in any searchable text. The gap isn't in what was said. It's in what was indexed.
Why the obvious workarounds fall short
What Sonar is
One tool call. Ranked clips. Citable sources.
Sonar is a search API designed specifically for the AI agent loop. You pass a natural language question — a topic, event, speaker, or claim — and you get back ranked audio clips with accurate transcripts, precise timestamps, and speaker attribution. Your agent can quote them, summarize them, or chain further searches from them.
The index spans news broadcasts, podcasts, radio, earnings calls, social audio, and archives. It updates continuously. You describe what you're looking for; Sonar finds where it was said.
Agent calls Sonar
A natural language question (a topic, event, speaker, or claim), passed as a tool call in your agent loop.
Sonar searches audio
Sonar searches news, podcasts, radio, earnings calls, social audio, and archives — and returns what matches.
Results your agent uses
Transcript, timestamp, speaker, and source, ready to quote, summarize, or follow up on.
Real scenarios
What this actually unlocks
The clearest way to understand what Sonar enables is to think about specific agents and the questions that would previously stop them.
An analyst agent tracking a public company can now catch what the CFO said on a CNBC appearance, not just the quarterly filing. Executives hedge and forecast in interviews in ways that never reach investor relations documents.
"What has Jensen Huang said about TSMC dependence?"
Committee hearings produce hours of audio that can take days to become searchable text, if it ever does. A policy agent can now track what senators and agency heads are actually saying in real time, before the summary gets written.
"What are lawmakers saying about grid storage?"
News radio and podcast discussions often run ahead of the written record. A media monitoring agent can track narratives as they emerge in audio, surfacing a talking point in syndicated radio six hours before it hits a headline.
"How is the tariff reversal being framed on business radio?"
Conference talks and expert panels contain direct primary-source statements. A research agent can now cite a scientist's conference remarks with a timestamp, the way you'd cite a paper, weeks before the preprint exists.
"What did Demis Hassabis say about protein folding at NeurIPS?"
The API
Designed to drop into your existing agent
Sonar exposes two modes: Retrieve for fast semantic search returning ranked clips, and Research for a full synthesis — a summary with key themes and cited audio evidence. Both are reachable through the same endpoint.
Mode: retrieve (default) for ranked clips · research for a synthesized summary with citations
Source types: news, podcast, radio, earnings, social, archive
Date range: { from, to } ISO 8601 dates. Omit for all-time.
Result count: 5, max 20.
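As a rough sketch, the options above could be validated and assembled into a request body before the call is made. Only query and mode appear in this post's own code. The field names sources, date_range, and limit, and the reading of 5 as the default result count, are assumptions made for illustration and should be checked against the actual API reference.

```javascript
// Sketch only. `sources`, `date_range`, and `limit` are HYPOTHETICAL field names;
// the post confirms only `query` and `mode`.
const SOURCE_TYPES = ["news", "podcast", "radio", "earnings", "social", "archive"];
const MODES = ["retrieve", "research"];

function buildSearchRequest({ query, mode = "retrieve", sources, from, to, limit = 5 }) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  if (!MODES.includes(mode)) {
    throw new Error(`unknown mode: ${mode}`);
  }
  // Assumed reading of "5, max 20": default of 5, hard cap of 20.
  if (!Number.isInteger(limit) || limit < 1 || limit > 20) {
    throw new Error("limit must be an integer between 1 and 20");
  }

  const body = { query: query.trim(), mode, limit };

  if (sources !== undefined) {
    const unknown = sources.filter((s) => !SOURCE_TYPES.includes(s));
    if (unknown.length > 0) {
      throw new Error(`unknown source type(s): ${unknown.join(", ")}`);
    }
    body.sources = sources;
  }

  // Leave the date range out entirely to search all time.
  if (from !== undefined || to !== undefined) {
    for (const d of [from, to]) {
      if (d !== undefined && Number.isNaN(Date.parse(d))) {
        throw new Error(`not a parseable date: ${d}`);
      }
    }
    body.date_range = { from, to };
  }

  return body;
}
```

Validating on the client keeps a malformed tool call from the model from turning into a wasted round trip: the agent gets a clear error it can correct and retry.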
Each result in the response looks like this (clip offsets are in seconds into the recording):
{
  "results": [
    {
      "source": "Bloomberg Surveillance",
      "source_type": "news",
      "speaker": "Mary Daly, SF Fed President",
      "timestamp": "2025-03-18T09:14:32Z",
      "clip_start": 412,
      "clip_end": 447,
      "transcript": "We're not seeing the kind of labor market softening that would give us confidence to move faster. I think two cuts this year is the right framing.",
      "relevance": 0.94,
      "source_url": "https://..."
    }
  ]
}
The response is citation-ready. Your agent gets the source, the speaker, the exact timestamp, and a clean transcript — everything needed to ground a claim in a primary source rather than paraphrasing something that paraphrased something else.
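To make "citation-ready" concrete, here is a minimal sketch of turning one result into a quotable citation string. It relies only on the fields shown in the sample response. The helper names formatOffset and formatCitation and the citation layout are illustrative choices, not part of the Sonar API.

```javascript
// Convert an offset in seconds to m:ss, or h:mm:ss for recordings over an hour.
function formatOffset(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = String(s % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${sec}` : `${m}:${sec}`;
}

// Build a one-line citation from a single entry of `results`.
// Hypothetical helper: the layout is one reasonable choice among many.
function formatCitation(result) {
  const date = result.timestamp.slice(0, 10); // date part of the ISO 8601 timestamp
  const span = `${formatOffset(result.clip_start)}-${formatOffset(result.clip_end)}`;
  return `"${result.transcript}" (${result.speaker}, ${result.source}, ${date}, ${span})`;
}
```

For the sample result above, the clip offsets 412 and 447 render as 6:52-7:27, so a reader can jump straight to the quoted passage in the recording.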
Using it as an LLM tool
Sonar is built to live inside a tool loop. Here's the full integration: a tool definition in an OpenAI-compatible function schema, plus the handler that calls the API. Paste it in, set SONAR_API_KEY, and it works:
// Tool definition for your agent
const sonarTool = {
  type: "function",
  function: {
    name: "search_audio",
    description: "Search public audio: news, podcasts, radio, earnings calls, social audio, and archives. Returns ranked clips with transcripts, timestamps, and speaker attribution.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string" },
        mode: { type: "string", enum: ["retrieve", "research"] }
      },
      required: ["query"]
    }
  }
};

// Handler
// Reads the API key from the environment (assumes a Node.js runtime).
const SONAR_API_KEY = process.env.SONAR_API_KEY;

async function search_audio({ query, mode = "retrieve" }) {
  const res = await fetch("https://api.sonarapi.dev/v1/audio.search", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${SONAR_API_KEY}`
    },
    body: JSON.stringify({ query, mode })
  });
  // Surface HTTP errors instead of handing an error payload to the model as if it were results.
  if (!res.ok) {
    throw new Error(`Sonar request failed with status ${res.status}`);
  }
  return res.json();
}
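The other half of the loop is routing the model's tool call to the handler and handing the result back. The sketch below assumes the OpenAI-style chat format, where a tool call carries its arguments as a JSON string and the reply is a message with role "tool". The function name runToolCall and the handlers map are illustrative, not part of Sonar.

```javascript
// Dispatch one tool call from the model to a registered handler and
// wrap the outcome as a tool message for the next model turn.
// `handlers` maps tool names to async functions, e.g. { search_audio }.
async function runToolCall(toolCall, handlers) {
  const { name, arguments: rawArgs } = toolCall.function;
  let content;
  try {
    if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
      throw new Error(`no handler registered for tool "${name}"`);
    }
    const args = JSON.parse(rawArgs); // arguments arrive as a JSON string
    content = JSON.stringify(await handlers[name](args));
  } catch (err) {
    // Report the failure to the model rather than crashing the loop,
    // so it can rephrase the query or move on.
    content = JSON.stringify({ error: String(err.message || err) });
  }
  return { role: "tool", tool_call_id: toolCall.id, content };
}
```

In an agent loop this would be called once per entry in the assistant message's tool_calls, for example runToolCall(call, { search_audio }), with each returned message appended to the conversation before the next model request.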
What a result looks like in practice
Say your agent is building a brief on Federal Reserve rate expectations. It calls Sonar with "Fed rate cut outlook 2025". Here's what comes back:
Three primary sources, three speakers, three timestamps. Your agent didn't scrape anything, didn't pre-download any files, and didn't rely on a journalist's summary of what was said. It has the actual words, with provenance.
This is the difference between your agent knowing what people are saying and your agent knowing what people wrote about what was said. For anything that requires grounding a claim in a primary source, the gap is significant.
Under the hood
How the index works
Sonar continuously indexes audio from public sources — news networks, podcast directories, radio station feeds, public earnings calls, congressional archives, and social audio platforms. The corpus grows every day.
When your agent asks a question, Sonar searches that index and returns the moments that matter — ranked clips with transcript, timestamp, speaker, and source. Not pages to scrape. Not articles to parse. The actual spoken record, made usable.
The two modes reflect how agents actually use audio search. Retrieve is for when your agent needs raw evidence — the actual clip — to cite or quote. Research is for when your agent needs to synthesize across multiple sources: it finds related clips and returns a structured summary with every claim pinned to its audio source.
What's next
We're building this with early users
Sonar is in public beta. If you're building agents that need to listen to the web — not just read it — we'd love to hear from you at sonarapi.com.
Early users get a direct line to the founding team. We're onboarding a limited number of developers and teams right now, and the people who build on Sonar in the next few months will directly shape what the API becomes. If you have a use case we're not covering yet — a source type, a workflow, a constraint we should know about — we want to hear from you.
Audio is half the web. It's time agents could use it.
Questions? Reach us at contact@sonarapi.com.
— The Sonar team