I Indexed 4,000+ Hours of Sales Calls. Here's What I Found.
I was working with a SaaS company that had two call recording platforms. Sales used one, customer success used the other. Neither could search the other's calls. I unified both into one data warehouse with semantic search, speaker resolution, and automated insight extraction. 4,000+ hours. 26 million words. Here's how I built it and what I found.
The Numbers
4,349 calls across two platforms. 4,000+ hours of audio. 26 million words of transcript text. 10,041 embedded chunks. 894 event mentions. 455 customer quotes. All searchable from one warehouse.
The Problem: Two Platforms, Zero Cross-Visibility
This is a pattern I see everywhere. Sales records calls on one platform. Customer success records on another. Both platforms have transcripts, both have search, and neither talks to the other.
The result: when an account manager prepares for a business review, they can't see what Sales promised during the original pitch. When Sales preps for an expansion call, they can't see the support conversations CS has been having. When leadership asks "how often are customers mentioning competitor X?", nobody can answer without manually searching both platforms.
This isn't a recording problem. It's a data silo problem. The calls exist. The transcripts exist. They're just locked inside two separate platforms with incompatible schemas and no shared search.
The Architecture
I built a unified call intelligence layer on Supabase Postgres. Here's how it works:
Two ingest pipelines, one warehouse
Each platform gets its own Edge Function for ingestion, but they write to a shared schema:
| Platform | Calls | Words | Transcript Format | Refresh |
|---|---|---|---|---|
| Platform A (Sales) | 2,160 | 12.3M | Speaker IDs mapped to parties | Hourly incremental + weekly full |
| Platform B (CS) | 2,189 | 14M | Array of {speaker, text} | Hourly incremental + weekly full |
The speaker resolution problem
This was harder than expected. Each platform has its own speaker identification system:
- Platform A: Speaker IDs are different from user IDs. Speaker names come from a `parties` field on each call, not from the users table. You have to join across three different entities to figure out who said what.
- Platform B: Transcripts arrive as arrays of `{speaker, text}` objects. Speaker names are sometimes first names only, sometimes full names, sometimes phone numbers.
I solved this by building a speaker mapping layer that normalizes names across both platforms. It's not perfect — occasional unknown speakers still appear — but it's good enough that downstream queries like "find all calls where Sarah from the team discussed pricing" actually work.
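The core of a mapping layer like this is a normalization function that collapses the messy variants (first names, phone numbers, casing) into canonical identities. A minimal sketch, assuming a hypothetical roster loaded from the users table — the names, regex, and fallback rules here are illustrative, not the production mapping logic:

```typescript
// Canonical roster, hypothetically built from the warehouse's users table.
const roster: Record<string, string> = {
  "sarah": "Sarah Smith",
  "sarah smith": "Sarah Smith",
  "j. doe": "John Doe",
};

function normalizeSpeaker(raw: string): string {
  const key = raw.trim().toLowerCase();
  // Phone numbers and empty strings can't be resolved to a person.
  if (!key || /^\+?[\d\s().-]{7,}$/.test(key)) return "Unknown";
  // Exact match first, then a first-name-only fallback.
  if (roster[key]) return roster[key];
  const first = key.split(/\s+/)[0];
  return roster[first] ?? "Unknown";
}
```

The first-name fallback is what makes queries like "find all calls where Sarah discussed pricing" work even when one platform only logged "Sarah"; it also explains why the occasional unknown speaker still slips through.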
Dual refresh strategy
New calls need to appear quickly (for dashboards), but full data integrity matters too (for analytics). I run both:
- Hourly incremental sync: Queries the latest call timestamp from the warehouse, fetches only newer calls, UPSERTs (no delete). This keeps dashboards near-real-time.
- Weekly full refresh: DELETE + INSERT with batch processing (5-20 transcripts per batch, one-at-a-time retry on failures). This catches any gaps the incremental missed — retroactive edits, backfilled transcripts, deleted duplicates.
The batch processing matters because transcripts are large. A 30-minute call generates a transcript that's 5,000-10,000 words. Trying to INSERT 2,000 of those at once will time out any serverless function. Batching with retry logic handles this gracefully.
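The batch-then-retry pattern is simple to express. A sketch under assumptions — `insertBatch` stands in for the real Supabase upsert call, and the batch size is illustrative:

```typescript
// Load rows in batches; if a batch fails, retry its rows one at a time so a
// single oversized transcript doesn't sink the whole batch.
async function loadTranscripts<T>(
  rows: T[],
  insertBatch: (batch: T[]) => Promise<void>,
  batchSize = 10,
): Promise<{ loaded: number; failed: T[] }> {
  let loaded = 0;
  const failed: T[] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize);
    try {
      await insertBatch(batch);        // fast path: whole batch at once
      loaded += batch.length;
    } catch {
      for (const row of batch) {       // slow path: isolate the bad row
        try {
          await insertBatch([row]);
          loaded += 1;
        } catch {
          failed.push(row);            // record and move on
        }
      }
    }
  }
  return { loaded, failed };
}
```

Returning the failed rows instead of throwing lets the weekly full refresh log gaps and keep going, rather than dying halfway through 2,000 transcripts.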
Semantic Search with pgvector
Keyword search on transcripts is barely useful. People don't say the same words every time they discuss the same topic. A customer describing churn risk might say "we're evaluating alternatives", "the contract is up for renewal and we're not sure", or "our team is frustrated with the onboarding experience." All mean the same thing. None share keywords.
I embedded every transcript chunk using Gemini Embedding 2 and stored 10,041 vectors in pgvector. Now you can search by meaning:
```sql
-- "Which calls discuss churn risk?"
SELECT call_id,
       content,
       1 - (embedding <=> query_embedding) AS similarity
FROM call_embeddings
ORDER BY embedding <=> query_embedding
LIMIT 20;
```
This surfaces relevant calls regardless of the exact words used. It's the difference between finding 3 results and finding 30.
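For intuition about what that query is ranking on: pgvector's `<=>` operator is cosine distance, i.e. 1 minus cosine similarity, so ordering ascending returns the most semantically similar chunks first. The same math, sketched locally:

```typescript
// Cosine distance between two embedding vectors: 0 = identical direction,
// 1 = orthogonal (unrelated), 2 = opposite. Matches pgvector's <=> operator.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```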
Automated Insight Extraction
Raw transcripts are useful for search. But the real value comes from extracting structured insights automatically:
Event mention extraction
I built a keyword + context extraction pipeline that scans all transcripts for mentions of specific topics: conferences, product features, competitors, pricing objections, integration requests. 894 event mentions extracted across 4,349 calls. This powers downstream dashboards that show which topics are trending in customer conversations without anyone reading a single transcript.
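The keyword + context approach is lightweight: match topic keywords, then capture a window of surrounding words as the mention's context for sentiment tagging and dashboards. A minimal sketch — the keyword list and window size are illustrative, not the production pipeline:

```typescript
interface Mention { keyword: string; context: string }

// Scan a transcript for topic keywords and keep a few words of context
// around each hit.
function extractMentions(
  transcript: string,
  keywords: string[],
  window = 5, // words of context on each side
): Mention[] {
  const words = transcript.split(/\s+/);
  const mentions: Mention[] = [];
  words.forEach((word, i) => {
    const clean = word.toLowerCase().replace(/[^\w]/g, "");
    if (keywords.includes(clean)) {
      const start = Math.max(0, i - window);
      const context = words.slice(start, i + window + 1).join(" ");
      mentions.push({ keyword: clean, context });
    }
  });
  return mentions;
}
```

The context window is what makes downstream sentiment classification possible: "love the conference" and "skipping the conference" contain the same keyword but read very differently.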
Sentiment classification
Each extracted mention gets a sentiment tag: positive, neutral, or negative. This lets us answer questions like "are customers talking about competitor X positively or negatively?" and "is sentiment around feature Y improving after the last release?"
Voice of Customer quotes
455 customer quotes extracted and categorized by topic (onboarding, support, product feedback, enrollment). These feed directly into certification impact analysis, marketing case studies, and product feedback reports — all automated, no manual listening required.
What This Unlocks: Use Cases by Team
The warehouse doesn't just exist for analytics. It feeds 6 different systems:
Sales
Pre-call prep with full account history. AI-generated pitch decks pull relevant call insights. Competitive mentions tracked across all conversations.
Customer Success
Account review prep with call sentiment trends. Churn risk signals from transcript analysis. Support teams discuss product health 2x more than sales (I measured).
Partner / Affiliate Management
Track how partners talk about your product on calls. Surface optimization opportunities from partner feedback patterns. Identify top-performing partner relationships by conversation quality.
Product
Feature request mining across all calls. Pain point frequency analysis. "Customers mentioned X 47 times this quarter" is a stronger signal than a feature request ticket.
Events & Marketing
894 event mentions extracted. Which conferences are reps pitching? Which events do customers ask about? Which competitor stories come up most?
Knowledge Base
4,585 docs indexed for an enterprise AI agent. Call transcripts are one of the richest sources — they capture the questions customers actually ask, not the ones you assume they ask.
Industry Applications
This architecture works for any business that records customer conversations. Here's what it looks like across industries:
SaaS / Tech
The original use case. Sales and support calls unified for churn prediction, account review prep, and competitive intelligence. If you have more than 50 customer calls per month across two or more platforms, you're sitting on searchable intelligence that nobody can access.
Real Estate / Property Management
Agent calls with buyers, tenant complaints, vendor negotiations. Extract: pricing trends, common objections, maintenance request patterns, neighborhood preferences. Feed into lead scoring and property matching.
Healthcare / Telehealth
Patient intake calls, follow-up consultations, insurance coordination. Extract: symptom frequency, medication questions, appointment friction, satisfaction signals. (Compliance note: you'll need HIPAA-compliant storage and consent tracking.)
Financial Services / Insurance
Claims calls, policy discussions, advisor consultations. Extract: common claim patterns, policy confusion points, competitive switching signals, compliance language usage.
E-commerce / D2C
Support calls, returns conversations, VIP customer calls. Extract: product quality signals, sizing/fit issues, competitor mentions, loyalty indicators. Feed into product development and inventory planning.
Agencies / Consulting
Client calls, prospect discovery sessions, project check-ins. Extract: scope creep signals, satisfaction trends, upsell opportunities, common pain points by vertical. The agency that can say "here's what your industry peers are asking about" wins the pitch.
The Technical Gotchas
1. Speaker IDs are not user IDs
Every call recording platform has a different model for who's speaking. Don't assume you can just join speaker_id to user_id. Budget time for building a speaker resolution layer.
2. Transcripts are bigger than you think
A 30-minute call is 5,000-10,000 words. A 60-minute call is 10,000-20,000 words. Multiply by thousands of calls and you're dealing with hundreds of megabytes of text. Batch your INSERTs, implement retry logic, and use cursor-based pagination on the API side.
3. Incremental sync is necessary but insufficient
Hourly incremental sync catches new calls quickly but misses retroactive changes (edited transcripts, backfilled metadata, platform corrections). You need a weekly full refresh as a safety net. It's more expensive but it's how you maintain data integrity.
4. Rate limits are the real bottleneck
Both platforms I integrated enforced ~60 requests per minute. When you're doing a full refresh of 2,000+ calls, that's 30+ minutes of API time. Design for this: run full refreshes during off-hours, implement backoff, and split large syncs across multiple pg_cron windows.
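In practice that means two pieces of pacing logic: a minimum gap between requests to stay under the per-minute cap, and capped exponential backoff when the API pushes back anyway. A sketch with illustrative numbers:

```typescript
// Stay under ~60 requests/minute by spacing calls at least this far apart.
const REQUESTS_PER_MINUTE = 60;
const MIN_GAP_MS = Math.ceil(60_000 / REQUESTS_PER_MINUTE); // 1000ms per call

// Capped exponential backoff for 429 responses: 1s, 2s, 4s, ... up to 60s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

At 1 request per second, a 2,000-call full refresh is a floor of ~33 minutes of API time before any backoff, which is why splitting the sync across multiple pg_cron windows matters.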
5. Embedding everything is expensive (but worth it)
10,041 embeddings isn't free. I used Gemini Embedding 2, which is relatively cheap, but embedding 26M words still takes time and API credits. Chunk transcripts by speaker turn (not by arbitrary character count) for better semantic coherence. And only re-embed changed transcripts on incremental runs.
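Chunking by speaker turn can be as simple as merging consecutive turns until a size budget is hit, never splitting mid-turn. A sketch — the 1,500-character budget is an assumption, not the value I used:

```typescript
interface Turn { speaker: string; text: string }

// Merge consecutive speaker turns into chunks under a character budget,
// keeping each turn intact for better semantic coherence.
function chunkBySpeakerTurn(turns: Turn[], maxChars = 1500): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const turn of turns) {
    const line = `${turn.speaker}: ${turn.text}`;
    if (current && current.length + line.length > maxChars) {
      chunks.push(current);          // flush before the budget overflows
      current = "";
    }
    current = current ? `${current}\n${line}` : line;
  }
  if (current) chunks.push(current); // flush the tail
  return chunks;
}
```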
6. The AI features on recording platforms are limited
Both platforms offered some built-in AI features (summaries, action items). Both had significant gaps — one returned 405 errors on the AI content endpoint (not available on our plan), the other's search only indexed metadata, not full transcripts. Don't rely on the platform's built-in AI. Build your own extraction layer.
The Stack
Database: Supabase Postgres + pgvector (HNSW indexes)
Ingest: 4 Edge Functions (incremental + full refresh per platform)
Scheduling: pg_cron (hourly incremental, weekly full refresh)
Embeddings: Gemini Embedding 2 (1536 dimensions)
Extraction: Keyword + context extraction, sentiment classification
Downstream: AI presentation generator, enterprise knowledge agent, churn alerts, event dashboards, content pipeline
Cost: Supabase free tier for storage, Gemini API for embeddings. No servers.
What I Actually Found
The architecture is useful. But the insights are what made people care. Here are three that changed how teams operate:
Support talks about product health 2x more than sales
I measured topic frequency by team. Customer Success reps spend twice as much call time discussing product performance, traffic quality, and technical issues compared to Sales. Sales focuses on pricing, onboarding timelines, and competitive positioning. This seems obvious in hindsight, but nobody had measured it before. It changed how the company staffed product feedback channels.
The most effective competitive story comes from customer pain
I found 26+ mentions of a specific competitor's domain failure in call transcripts. Reps who told that story closed more deals. But most reps didn't know it existed. Surfacing the winning competitive narratives from transcript data gave the whole team the best reps' playbook.
Zero reps cited ROI proof on calls
Despite having case studies and impact data, not a single rep referenced quantitative ROI proof during product pitch calls. The data existed. Nobody used it on calls. This led directly to building the context-injected presentation generator — because the problem wasn't that reps didn't have the data, it was that the data wasn't in front of them during the call.
"I've been on this team for two years and I had no idea what CS was hearing from customers. Now I can search it in 10 seconds."
— a sales rep, after the warehouse went live
Should You Build This?
If you record customer calls on any platform, you're already paying for the data. It's sitting in a vendor's database, searchable only through their UI, siloed from everything else you know about your customers.
(A project like this generates dozens of architectural decisions, API gotchas, and speaker mapping rules that you need to remember across sessions. I used Brain Kit — a persistent memory server for AI tools — to capture all of it. Semantic search across past decisions saved me from re-learning the same lessons.)
Building a call intelligence warehouse isn't a massive infrastructure project. It's 4 Edge Functions, a Postgres database with pgvector, and some pg_cron jobs. The hard parts are speaker resolution (budget a day for mapping logic) and transcript batching (don't try to INSERT 2,000 transcripts at once).
The payoff: every team in your company gets access to what customers actually say — not what the CRM says they said, not what the meeting notes summarize, but the actual words. Searchable by meaning, not just keywords. That's a competitive advantage most companies are leaving on the table.
Brain Kit ($29)
Brain Kit uses the same pgvector + semantic search architecture from this post. Give every AI tool you use persistent, searchable memory.
Get Brain Kit ($29). Like what I build? Check out the shop — deploy-ready kits starting at $14. The Client Intelligence Kit ($19) is the productized version of this kind of customer data analysis.
More from the build log
I write about building AI-powered data systems, automation pipelines, and developer tools. No hype, just what works.
Read more posts