I Indexed 4,000+ Hours of Sales Calls. Here's What I Found.
I was working with a SaaS company that had two call recording platforms. Sales used one, customer success used the other. Neither could search the other's calls. I unified both into one data warehouse with semantic search, speaker resolution, and automated insight extraction. 4,000+ hours. 26 million words. Here's how I built it and what I found.
The Numbers
4,349 calls across two platforms. 4,000+ hours of audio. 26 million words of transcript text. 10,041 embedded chunks. 894 event mentions. 455 customer quotes. All searchable from one warehouse.
The Problem: Two Platforms, Zero Cross-Visibility
This is a pattern I see everywhere. Sales records calls on one platform. Customer success records on another. Both platforms have transcripts, both have search, and neither talks to the other.
The result: when an account manager prepares for a business review, they can't see what Sales promised during the original pitch. When Sales preps for an expansion call, they can't see the support conversations CS has been having. When leadership asks "how often are customers mentioning competitor X?", nobody can answer without manually searching both platforms.
This isn't a recording problem. It's a data silo problem. The calls exist. The transcripts exist. They're just locked inside two separate platforms with incompatible schemas and no shared search.
The Architecture
I built a unified call intelligence layer on Supabase Postgres. Here's how it works:
Two ingest pipelines, one warehouse
Each platform gets its own Edge Function for ingestion, but they write to a shared schema:
| Platform | Calls | Words | Transcript Format | Refresh |
|---|---|---|---|---|
| Platform A (Sales) | 2,160 | 12.3M | Speaker IDs mapped to parties | Hourly incremental + weekly full |
| Platform B (CS) | 2,189 | 14M | Array of {speaker, text} | Hourly incremental + weekly full |
The speaker resolution problem
This was harder than expected. Each platform has its own speaker identification system:
- Platform A: Speaker IDs are different from user IDs. Speaker names come from a `parties` field on each call, not from the users table. You have to join across three different entities to figure out who said what.
- Platform B: Transcripts arrive as arrays of `{speaker, text}` objects. Speaker names are sometimes first names only, sometimes full names, sometimes phone numbers.
I solved this by building a speaker mapping layer that normalizes names across both platforms. It's not perfect — occasional unknown speakers still appear — but it's good enough that downstream queries like "find all calls where Sarah from the team discussed pricing" actually work.
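The core of a mapping layer like this is a normalization function that collapses the messy variants (first names, phone numbers, casing) into canonical identities. A minimal sketch, assuming a hypothetical roster loaded from the users table — the names, regex, and fallback rules here are illustrative, not the production mapping logic:

```typescript
// Canonical roster, hypothetically built from the warehouse's users table.
const roster: Record<string, string> = {
  "sarah": "Sarah Smith",
  "sarah smith": "Sarah Smith",
  "j. doe": "John Doe",
};

function normalizeSpeaker(raw: string): string {
  const key = raw.trim().toLowerCase();
  // Phone numbers and empty strings can't be resolved to a person.
  if (!key || /^\+?[\d\s().-]{7,}$/.test(key)) return "Unknown";
  // Exact match first, then a first-name-only fallback.
  if (roster[key]) return roster[key];
  const first = key.split(/\s+/)[0];
  return roster[first] ?? "Unknown";
}
```

The first-name fallback is what makes queries like "find all calls where Sarah discussed pricing" work even when one platform only logged "Sarah"; it also explains why the occasional unknown speaker still slips through.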
Dual refresh strategy
New calls need to appear quickly (for dashboards), but full data integrity matters too (for analytics). I run both:
- Hourly incremental sync: Queries the latest call timestamp from the warehouse, fetches only newer calls, UPSERTs (no delete). This keeps dashboards near-real-time.
- Weekly full refresh: DELETE + INSERT with batch processing (5-20 transcripts per batch, one-at-a-time retry on failures). This catches any gaps the incremental missed — retroactive edits, backfilled transcripts, deleted duplicates.
The batch processing matters because transcripts are large. A 30-minute call generates a transcript that's 5,000-10,000 words. Trying to INSERT 2,000 of those at once will time out any serverless function. Batching with retry logic handles this gracefully.
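The batch-then-retry pattern is simple to express. A sketch under assumptions — `insertBatch` stands in for the real Supabase upsert call, and the batch size is illustrative:

```typescript
// Load rows in batches; if a batch fails, retry its rows one at a time so a
// single oversized transcript doesn't sink the whole batch.
async function loadTranscripts<T>(
  rows: T[],
  insertBatch: (batch: T[]) => Promise<void>,
  batchSize = 10,
): Promise<{ loaded: number; failed: T[] }> {
  let loaded = 0;
  const failed: T[] = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize);
    try {
      await insertBatch(batch);        // fast path: whole batch at once
      loaded += batch.length;
    } catch {
      for (const row of batch) {       // slow path: isolate the bad row
        try {
          await insertBatch([row]);
          loaded += 1;
        } catch {
          failed.push(row);            // record and move on
        }
      }
    }
  }
  return { loaded, failed };
}
```

Returning the failed rows instead of throwing lets the weekly full refresh log gaps and keep going, rather than dying halfway through 2,000 transcripts.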
Semantic Search with pgvector
Keyword search on transcripts is barely useful. People don't say the same words every time they discuss the same topic. A customer describing churn risk might say "we're evaluating alternatives", "the contract is up for renewal and we're not sure", or "our team is frustrated with the onboarding experience." All mean the same thing. None share keywords.
I embedded every transcript chunk using Gemini Embedding 2 and stored 10,041 vectors in pgvector. Now you can search by meaning:
```sql
-- "Which calls discuss churn risk?"
SELECT call_id,
       content,
       1 - (embedding <=> query_embedding) AS similarity
FROM call_embeddings
ORDER BY embedding <=> query_embedding
LIMIT 20;
```
This surfaces relevant calls regardless of the exact words used. It's the difference between finding 3 results and finding 30.
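For intuition about what that query is ranking on: pgvector's `<=>` operator is cosine distance, i.e. 1 minus cosine similarity, so ordering ascending returns the most semantically similar chunks first. The same math, sketched locally:

```typescript
// Cosine distance between two embedding vectors: 0 = identical direction,
// 1 = orthogonal (unrelated), 2 = opposite. Matches pgvector's <=> operator.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```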
Automated Insight Extraction
Raw transcripts are useful for search. But the real value comes from extracting structured insights automatically:
Event mention extraction
I built a keyword + context extraction pipeline that scans all transcripts for mentions of specific topics: conferences, product features, competitors, pricing objections, integration requests. 894 event mentions extracted across 4,349 calls. This powers downstream dashboards that show which topics are trending in customer conversations without anyone reading a single transcript.
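The keyword + context approach is lightweight: match topic keywords, then capture a window of surrounding words as the mention's context for sentiment tagging and dashboards. A minimal sketch — the keyword list and window size are illustrative, not the production pipeline:

```typescript
interface Mention { keyword: string; context: string }

// Scan a transcript for topic keywords and keep a few words of context
// around each hit.
function extractMentions(
  transcript: string,
  keywords: string[],
  window = 5, // words of context on each side
): Mention[] {
  const words = transcript.split(/\s+/);
  const mentions: Mention[] = [];
  words.forEach((word, i) => {
    const clean = word.toLowerCase().replace(/[^\w]/g, "");
    if (keywords.includes(clean)) {
      const start = Math.max(0, i - window);
      const context = words.slice(start, i + window + 1).join(" ");
      mentions.push({ keyword: clean, context });
    }
  });
  return mentions;
}
```

The context window is what makes downstream sentiment classification possible: "love the conference" and "skipping the conference" contain the same keyword but read very differently.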
Sentiment classification
Each extracted mention gets a sentiment tag: positive, neutral, or negative. This lets us answer questions like "are customers talking about competitor X positively or negatively?" and "is sentiment around feature Y improving after the last release?"
Voice of Customer quotes
455 customer quotes extracted and categorized by topic (onboarding, support, product feedback, enrollment). These feed directly into certification impact analysis, marketing case studies, and product feedback reports — all automated, no manual listening required.
What This Unlocks: Use Cases by Team
The warehouse doesn't just exist for analytics. It feeds 6 different systems:
Sales
Pre-call prep with full account history. AI-generated pitch decks pull relevant call insights. Competitive mentions tracked across all conversations.
Customer Success
Account review prep with call sentiment trends. Churn risk signals from transcript analysis. Support teams discuss product health 2x more than sales (I measured).
Partner / Affiliate Management
Track how partners talk about your product on calls. Surface optimization opportunities from partner feedback patterns. Identify top-performing partner relationships by conversation quality.
Product
Feature request mining across all calls. Pain point frequency analysis. "Customers mentioned X 47 times this quarter" is a stronger signal than a feature request ticket.
Events & Marketing
894 event mentions extracted. Which conferences are reps pitching? Which events do customers ask about? Which competitor stories come up most?
Knowledge Base
4,585 docs indexed for an enterprise AI agent. Call transcripts are one of the richest sources — they capture the questions customers actually ask, not the ones you assume they ask.
Industry Applications
This architecture works for any business that records customer conversations. Here's what it looks like across industries:
SaaS / Tech
The original use case. Sales and support calls unified for churn prediction, account review prep, and competitive intelligence. If you have more than 50 customer calls per month across two or more platforms, you're sitting on searchable intelligence that nobody can access.
Real Estate / Property Management
Agent calls with buyers, tenant complaints, vendor negotiations. Extract: pricing trends, common objections, maintenance request patterns, neighborhood preferences. Feed into lead scoring and property matching.
Healthcare / Telehealth
Patient intake calls, follow-up consultations, insurance coordination. Extract: symptom frequency, medication questions, appointment friction, satisfaction signals. (Compliance note: you'll need HIPAA-compliant storage and consent tracking.)
Financial Services / Insurance
Claims calls, policy discussions, advisor consultations. Extract: common claim patterns, policy confusion points, competitive switching signals, compliance language usage.
E-commerce / D2C
Support calls, returns conversations, VIP customer calls. Extract: product quality signals, sizing/fit issues, competitor mentions, loyalty indicators. Feed into product development and inventory planning.
Agencies / Consulting
Client calls, prospect discovery sessions, project check-ins. Extract: scope creep signals, satisfaction trends, upsell opportunities, common pain points by vertical. The agency that can say "here's what your industry peers are asking about" wins the pitch.
The Technical Gotchas
1. Speaker IDs are not user IDs
Every call recording platform has a different model for who's speaking. Don't assume you can just join speaker_id to user_id. Budget time for building a speaker resolution layer.
2. Transcripts are bigger than you think
A 30-minute call is 5,000-10,000 words. A 60-minute call is 10,000-20,000 words. Multiply by thousands of calls and you're dealing with hundreds of megabytes of text. Batch your INSERTs, implement retry logic, and use cursor-based pagination on the API side.
3. Incremental sync is necessary but insufficient
Hourly incremental sync catches new calls quickly but misses retroactive changes (edited transcripts, backfilled metadata, platform corrections). You need a weekly full refresh as a safety net. It's more expensive but it's how you maintain data integrity.
4. Rate limits are the real bottleneck
Both platforms I integrated enforced ~60 requests per minute. When you're doing a full refresh of 2,000+ calls, that's 30+ minutes of API time. Design for this: run full refreshes during off-hours, implement backoff, and split large syncs across multiple pg_cron windows.
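In practice that means two pieces of pacing logic: a minimum gap between requests to stay under the per-minute cap, and capped exponential backoff when the API pushes back anyway. A sketch with illustrative numbers:

```typescript
// Stay under ~60 requests/minute by spacing calls at least this far apart.
const REQUESTS_PER_MINUTE = 60;
const MIN_GAP_MS = Math.ceil(60_000 / REQUESTS_PER_MINUTE); // 1000ms per call

// Capped exponential backoff for 429 responses: 1s, 2s, 4s, ... up to 60s.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

At 1 request per second, a 2,000-call full refresh is a floor of ~33 minutes of API time before any backoff, which is why splitting the sync across multiple pg_cron windows matters.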
5. Embedding everything is expensive (but worth it)
10,041 embeddings isn't free. I used Gemini Embedding 2, which is relatively cheap, but embedding 26M words still takes time and API credits. Chunk transcripts by speaker turn (not by arbitrary character count) for better semantic coherence. And only re-embed changed transcripts on incremental runs.
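Chunking by speaker turn can be as simple as merging consecutive turns until a size budget is hit, never splitting mid-turn. A sketch — the 1,500-character budget is an assumption, not the value I used:

```typescript
interface Turn { speaker: string; text: string }

// Merge consecutive speaker turns into chunks under a character budget,
// keeping each turn intact for better semantic coherence.
function chunkBySpeakerTurn(turns: Turn[], maxChars = 1500): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const turn of turns) {
    const line = `${turn.speaker}: ${turn.text}`;
    if (current && current.length + line.length > maxChars) {
      chunks.push(current);          // flush before the budget overflows
      current = "";
    }
    current = current ? `${current}\n${line}` : line;
  }
  if (current) chunks.push(current); // flush the tail
  return chunks;
}
```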
6. The AI features on recording platforms are limited
Both platforms offered some built-in AI features (summaries, action items). Both had significant gaps — one returned 405 errors on the AI content endpoint (not available on our plan), the other's search only indexed metadata, not full transcripts. Don't rely on the platform's built-in AI. Build your own extraction layer.
The Stack
Database: Supabase Postgres + pgvector (HNSW indexes)
Ingest: 4 Edge Functions (incremental + full refresh per platform)
Scheduling: pg_cron (hourly incremental, weekly full refresh)
Embeddings: Gemini Embedding 2 (1536 dimensions)
Extraction: Keyword + context extraction, sentiment classification
Downstream: AI presentation generator, enterprise knowledge agent, churn alerts, event dashboards, content pipeline
Cost: Supabase free tier for storage, Gemini API for embeddings. No servers.
What I Actually Found
The architecture is useful. But the insights are what made people care. Here are three that changed how teams operate:
Support talks about product health 2x more than sales
I measured topic frequency by team. Customer Success reps spend twice as much call time discussing product performance, traffic quality, and technical issues compared to Sales. Sales focuses on pricing, onboarding timelines, and competitive positioning. This seems obvious in hindsight, but nobody had measured it before. It changed how the company staffed product feedback channels.
The most effective competitive story comes from customer pain
I found 26+ mentions of a specific competitor's domain failure in call transcripts. Reps who told that story closed more deals. But most reps didn't know it existed. Surfacing the winning competitive narratives from transcript data gave the whole team the best reps' playbook.
Zero reps cited ROI proof on calls
Despite having case studies and impact data, not a single rep referenced quantitative ROI proof during product pitch calls. The data existed. Nobody used it on calls. This led directly to building the context-injected presentation generator — because the problem wasn't that reps didn't have the data, it was that the data wasn't in front of them during the call.
"I've been on this team for two years and I had no idea what CS was hearing from customers. Now I can search it in 10 seconds."
— a sales rep, after the warehouse went live
Should You Build This?
If you record customer calls on any platform, you're already paying for the data. It's sitting in a vendor's database, searchable only through their UI, siloed from everything else you know about your customers.
(A project like this generates dozens of architectural decisions, API gotchas, and speaker mapping rules that you need to remember across sessions. I used Brain Kit — a persistent memory server for AI tools — to capture all of it. Semantic search across past decisions saved me from re-learning the same lessons.)
Building a call intelligence warehouse isn't a massive infrastructure project. It's 4 Edge Functions, a Postgres database with pgvector, and some pg_cron jobs. The hard parts are speaker resolution (budget a day for mapping logic) and transcript batching (don't try to INSERT 2,000 transcripts at once).
The payoff: every team in your company gets access to what customers actually say — not what the CRM says they said, not what the meeting notes summarize, but the actual words. Searchable by meaning, not just keywords. That's a competitive advantage most companies are leaving on the table.
Brain Kit ($29)
Brain Kit uses the same pgvector + semantic search architecture from this post. Give every AI tool you use persistent, searchable memory.
Get Brain Kit ($29). Like what I build? Check out the shop — deploy-ready kits starting at $14. The Client Intelligence Kit ($19) is the productized version of this kind of customer data analysis.
More from the build log
I write about building AI-powered data systems, automation pipelines, and developer tools. No hype, just what works.
Read more posts