Two Claudes, One Spec, and A Weekend: I Build My Own Local AI Notetaker

Refusing Zoom's new AI notetaker subscription, I built a local alternative in 48 hours. The two-Claude workflow that produced it turned out to be the more useful takeaway.

Published: 2026-05-14

I don't want a software vendor to have my meeting audio. And I don't want to pay a forever-subscription for transcription when my laptop can run it for free.

Those two refusals are the entire reason this weekend project exists.

On March 24, 2026, Zoom shipped an AI notetaker that transcribes any voice or video call happening on my device, including calls in apps that aren't Zoom. It's part of Zoom's AI Companion product and carries a recurring fee. My first reaction was annoyance as my private conversations now came with a price tag, set by a company I was only using because work required it. The fee was a rent on someone else holding the data and the inference.

My MacBook could actually do the same job locally. A 3B-parameter LLM and faster-whisper were already installed. So I built the alternative, a locally-hosted meeting transcription and summary tool.

Why The Refusal Isn't Paranoia

I live in California. That matters more for AI notetakers than most people realize.

The California Invasion of Privacy Act is a two-party consent statute with statutory damages of $5,000 per violation. Written in 1967 for telephone wiretapping, it now applies to internet communications and third-party software vendors that intercept calls. Otter.ai is facing a class-action lawsuit in California federal court alleging exactly this: that its notetaker secretly records across video platforms without all-party consent, trains models on the recordings, and offloads compliance onto its users. Zoom's new feature works the same way: a pop-up asks me to grant permission when a meeting starts, and the consent click moves the legal exposure onto me. The other party is never notified, and the transcribed content still flows to Zoom's cloud regardless.

A few other concerns stack on top of CIPA:

NDA exposure. Work-related meetings are often under NDA. An AI vendor that retains conversation data may not qualify as a permitted recipient under standard NDA language, which means routing a confidential conversation through Zoom's notetaker could itself be a breach of a non-disclosure agreement.
GDPR exposure. Any meeting with a participant in the EU brings GDPR into scope. The conversation data is personal data under EU law; processing it requires a lawful basis (usually explicit consent), and transferring it to US servers requires additional cross-border safeguards.
Attorney-client privilege. Conversations with a lawyer can lose privilege the moment a third party (here, Zoom) hears them, which makes the transcript discoverable in litigation.
Data leak. The moment Zoom's servers hold a transcript, the data has been disclosed to a third party the user never authorized. One breach at Zoom and the contents are outside Zoom's control.

Not generating the cloud-side transcript removes most of these asymmetries. The CIPA consent question stays on me regardless.

What I Built And Why

I called it LocalMeetingKit. It's a command-line tool on my MacBook with this shape:

Capture. Two audio streams kept separate: my microphone and whatever's playing in a browser tab. System audio is routed through BlackHole, a virtual audio driver.
Transcribe. faster-whisper, running locally, processing each stream independently.
Summarize. Ollama running qwen2.5:3b. Produces a structured Markdown summary with decisions, action items, open questions, and an overview.
Output. A session folder per recording with transcript.jsonl, notes.md, and the raw WAVs.

Nothing leaves the machine. For now, the raw WAVs stay on disk after each session. I plan to auto-delete them in a future version, once I've tested transcription quality enough to trust the transcripts without the audio backup.

Two use cases:

Browser meetings. Any Zoom, Meet, or Teams call I'm on in a tab. No bot joins the call. I ask the other party for verbal consent before starting.
Solo podcasts. My microphone-only stream is enough to record dictated material for blog drafts, talks, or essays-in-progress.

On-device inference has gotten quietly real, which makes this architecture credible now. A 3B model and faster-whisper running on a recent MacBook produce summaries and transcription usable without manual cleanup. I don't have to pay for a service that puts my data at risk.

The interesting part of the build was the compression: getting from a much bigger design space down to something I could finish in 48 hours.

How I Built LocalMeetingKit

From Brainstorm To Spec

Seeing those risks, the PM in me wanted to design the alternative. That same week, I opened a Claude chat to brainstorm, and the conversation ran about 3,000 words. My brainstorming momentum runs toward venture-scale products, and this one was no exception. The shorthand was "a local Granola," but the conversation kept growing: architecture options, a competitor scan, a commercialization analysis, five proposed product layers, a section on relationship memory as a long-term moat, and a four-week phased rollout.

The four-week plan was real, but it was the brainstorm's venture-scale momentum talking. So I opened a new Claude session with the brainstorm transcript as context, and applied first principles. As my own first user, I needed one loop: start a recording, capture audio locally, end with a transcript and structured notes on disk. Everything else got cut because the brainstorm had been secretly designing a product. Three examples:

Vector search and cross-meeting RAG. Useful for someone with months of meeting history to search through. The brainstorm was designing for a user I haven't become.
Consent UX as a first-class feature. For a personal tool, the consent layer is me deciding to run it. The brainstorm was thinking like a compliance product.
Relationship memory. The brainstorm called this out as the potential long-term moat. Moats are for products. This is a weekend tool.

I gave myself the weekend, and the session produced a spec that described every decision I'd made and why. Each got a letter and number, A1 through G4, with one paragraph naming the choice and the trade-off. The spec was about 1,800 words. An hour later, the weekend had a plan.

That session stayed open through the build. A second Claude joined me in VS Code and started typing.

The Two-Claude Setup

Through the weekend, two Claudes were running side by side. Each had a different job.

Chat-Claude lived in the Claude desktop app (the same session that produced the initial spec). During the build, it stayed open as my architect at the system/strategy level, but never wrote production code. It also reviewed execution notes, SPEC.md updates, terminal output from smoke tests that I pasted in. We'd discuss design options and model selection.
Code-Claude lived in VS Code. It joined when the spec was ready and broke it into smaller execution steps. For each step, it surfaced trade-offs requiring my decision, folded my choices into SPEC.md, then produced the code and a smoke test plan for me to run in the terminal.

SPEC.md was the contract between them. Code-Claude wrote the file from chat-Claude's design, then maintained it through the build. Chat-Claude reviewed each diff for blind spots.

One benefit was catching bugs. The pipeline transcribed audio in 30-second chunks with a small overlap, so a word at the edge of one chunk could also appear in the next. Decision G4 in the spec called for dedupe at the boundary. Code-Claude's first version mostly worked. But words that split across the boundary, broken mid-syllable, sometimes survived the dedupe and showed up twice in the transcript. Chat-Claude caught the edge case reviewing the diff and proposed pulling the dedupe logic into a shared helper, with unit tests specifically for boundary cases. Code-Claude wrote both. On replay, the log showed +0 segment(s): no duplicates introduced the second time through. Chat-Claude was good at this kind of catch because it could review the diff fresh against the spec, while code-Claude was inside the implementation and focused on whether the code compiled and the tests passed.

Another benefit was prompt translation. With chat-Claude already holding the design context, I stopped writing long-form prompts for code-Claude by hand. I'd describe a goal to chat-Claude, and it would translate into the precise instruction code-Claude needed. The prompts came out more succinct and more accurate than mine would have been.

Anthropic recently shipped /btw in Claude Code. It's for tagging side notes so they don't crowd the main context. But btws are sparse: scattered tags across a project, with nothing connecting them logically. Running two Claudes is the same instinct at a larger scale. Chat-Claude is the connecting thread.

What Broke In Unexpected Places

Three things broke during the build, and I found them by running the thing rather than just from code review.

A bug three modules away. After Phase 4 (crash recovery), I added a sanity check: finalize a session that was already complete. The operation should be a no-op, no new segments added, zero diff. I expected ten seconds. It took forty, and the diff had one new word: "there.", at 32.86 seconds. That word was spoken in the live session, but the live worker dropped it during clean shutdown. The worker was racing to exit before the last audio chunk finished processing. The test I wrote for crash recovery caught a bug three modules away from anything Phase 4 touched. The fix was easy once I knew where to look.

An empty file that wasn't a bug. For the Phase 3 smoke test I needed a transcript with real content to feed the summarizer. I grabbed the file closest to hand and read SPEC.md aloud for sixty seconds. The transcript came out clean. The summarizer ran. notes.md had three section headers and nothing under them: no decisions, no action items, no open questions. At that moment I thought qwen2.5:3b was broken. Then I reread what I'd dictated. It was pure technical exposition; nothing in it fit any of the three categories the model was looking for. The empty file was the right answer. A 3B model that refuses to invent items where none exist is doing exactly what I asked it to. The spec had quietly tested a property of the summarizer I hadn't thought to verify.

Two stacked layers beyond coding. I was testing the dual-stream capture: mic on one channel, BlackHole on another, two separate WAVs. I played a YouTube video to fake the "other side of the call" for the system stream. After two minutes of recording, I opened the files. audio_system.wav was silent. audio_mic.wav had my voice with YouTube audio underneath it.

OS routing. I'd assumed my Multi-Output Device was sending system audio to BlackHole. It wasn't. My system output was set directly to the soundbar, not to a Multi-Output Device, so BlackHole was receiving zero signal. The fix was in Audio MIDI Setup: create a Multi-Output Device with both BlackHole and the soundbar checked, then set it as system output. audio_system.wav came back to life on the next test.
Physics. The soundbar was still playing audio out into the room, and the mic was correctly picking it up. That wasn't a bug; the mic was working, the room was just acoustically transparent. The fix for clean recording later was headphones.

All three of these were hard to diagnose from inside the implementation. Story A's bug was in Phase 1 code, three modules from the Phase 4 change. Story B was correct behavior I'd misread as a bug. Story C's failure was in macOS audio configuration, a layer cloud-recorded products never have to navigate. Chat-Claude's wider view (spec, architecture, environment) located the cause each time. Code-Claude wrote the fixes.

What's Changed About How I Work

Typing speed used to be the bottleneck. Then it was decision-making capacity. Now it's structuring decisions for two different kinds of attention:

Spec-writing needs slow, skeptical attention: refusing premature abstraction, saying "no, defer that, you don't need it yet" when the agent suggests another feature.
Implementation needs fast, tactical attention: running tests, surfacing failures, iterating without revisiting design.

If you try to handle both in one Claude session, the two attentions don't share a head well. Switching between them costs time, and they leak into each other: implementation attention bleeds into spec attention as premature optimization, and spec attention bleeds into implementation attention as over-design.

There are several ways to engineer this separation:

Running two Claudes (cross-context). This way is the most complete resolution. Chat-Claude lives in slow, skeptical mode (architecture, design review, blind-spot catching). Code-Claude lives in fast, tactical mode (implementation, decision-surfacing, smoke tests). The separation runs continuously across the build.
/btw (sub-context tagging). Tag a side note so it doesn't crowd the main focus. Same head, different routing. Sparse tags don't connect logically across a project.
Cross-model debugging (cross-model). Codex reviewing code Claude wrote, or vice versa. The separation lives between model architectures, not session contexts. Useful for one-shot reviews, less suited to a continuous design thread.

All three are implementations of the same instinct: separate the author from the reviewer. In AI-augmented work, the reviewer's value comes from not having written the code under review.

The highest-leverage skill in this setup is PM-shaped work: writing crisp specs, refusing scope creep mid-build, designing smoke tests that catch boring failures, and pushing back on agent-suggested optimizations until they earn their place. I still need enough coding background to know when a decision actually matters.

Where This Lands

I built this tool for myself, not as a product. The setup worked for this greenfield build, but I don't know how well it generalizes to ambiguous research or unbounded refactoring. And CIPA consent stays my problem regardless; what LocalMeetingKit solves is data custody.

Sunday night, I tagged v1.0.0. When a brainstorm gave me the maximalist version of the product, the spec gave me the minimalist version I actually needed. LocalMeetingKit is what I'll use for now. The two-Claude workflow that built it is what I'll keep using elsewhere.