Claude Council: when one agent isn't enough

2026-02-15

I built a small thing called Claude Council over a Saturday. It turns a set of Claude Code skills into a structured debate. I want to talk about why I built it more than what’s in the repo, because the repo is the kind of thing that becomes irrelevant in six months, and the reason behind it probably won’t.

Chat is the wrong shape

When you ask one LLM “should we do X?”, you are mostly asking it to confirm. Models are trained to be helpful. In a conversation with one user, helpful reads as agreeing with the framing you brought in. The answer comes back well-written, internally coherent, persuasive. None of that tells you whether it’s right.

This is fine when the question is easy. Most of my LLM use is easy. Write the function, summarise the doc, generate the test, translate the email. One persona, one answer, done.

It starts breaking when the question is actually a decision. A decision is rarely a single question. “Should we ship this feature?” is at minimum four questions stapled together: engineering, cost, product, regulatory. Probably more. The chat box flattens all of them into one answer written by one persona, optimised to sound reasonable. You read it, you nod, you move on. Months later you realise the answer was reasonable but wrong, because nothing in the loop pushed back.

What was missing was friction. Friction is most of the value you get from talking a decision through with another person, and the chat box is shaped specifically to remove it.

What the thing does

You name a proposal. You pick a few personas, say CTO, CFO, head of design, head of sales. Each persona is a Claude Code skill with a perspective and a set of things it cares about. They each take a position on the proposal. A neutral facilitator reads the positions and finds the real disagreements, not the polite ones. Everyone gets a rebuttal round with the disagreement made explicit. At the end there’s a verdict: GO, NO-GO, or CONDITIONAL, with the reasoning attached.
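The flow above can be sketched as a small orchestration loop: positions, then a facilitator pass, then rebuttals, then a verdict. This is not the code in the repo. It assumes a generic `ask(system_prompt, user_prompt)` callable standing in for the real Claude Code skills. `Persona`, `run_council`, `CouncilResult` and the prompt wording are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model hook: takes (system_prompt, user_prompt) and returns text.
# In the real tool each persona is a Claude Code skill; a plain callable stands
# in here so the sketch runs offline and the protocol itself is easy to see.
Ask = Callable[[str, str], str]

VERDICTS = ("GO", "NO-GO", "CONDITIONAL")
FACILITATOR = "You are a neutral facilitator."


@dataclass
class Persona:
    name: str                      # e.g. "CFO"
    perspective: str               # the lens this persona argues from
    cares_about: List[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        concerns = ", ".join(self.cares_about) or "nothing in particular"
        return f"You are the {self.name}. {self.perspective} You care about: {concerns}."


@dataclass
class CouncilResult:
    positions: Dict[str, str]
    clashes: str
    rebuttals: Dict[str, str]
    verdict: str                   # one of VERDICTS
    reasoning: str


def _join(entries: Dict[str, str]) -> str:
    return "\n".join(f"{name}: {text}" for name, text in entries.items())


def run_council(proposal: str, personas: List[Persona], ask: Ask) -> CouncilResult:
    # Round 1: every persona takes a position without seeing the others.
    positions = {
        p.name: ask(p.system_prompt(), f"Take a position on: {proposal}")
        for p in personas
    }

    # Facilitator pass: surface the real disagreements, not the polite ones.
    clashes = ask(
        f"{FACILITATOR} List the real disagreements, not the polite ones.",
        f"Proposal: {proposal}\n{_join(positions)}",
    )

    # Round 2: each persona rebuts with the disagreement made explicit.
    rebuttals = {
        p.name: ask(
            p.system_prompt(),
            f"Proposal: {proposal}\nDisagreements:\n{clashes}\nRebut or concede.",
        )
        for p in personas
    }

    # Verdict: the facilitator must lead with one of the three labels.
    reasoning = ask(
        f"{FACILITATOR} Start with GO, NO-GO, or CONDITIONAL, then explain.",
        f"Proposal: {proposal}\n{_join(positions)}\n{clashes}\n{_join(rebuttals)}",
    )
    tokens = reasoning.split()
    label = tokens[0].upper().rstrip(".:,") if tokens else ""
    if label not in VERDICTS:
        raise ValueError(f"facilitator did not return a verdict: {reasoning[:60]!r}")
    return CouncilResult(positions, clashes, rebuttals, label, reasoning)


if __name__ == "__main__":
    # Stub model so the sketch runs without any API: facilitator calls get canned text.
    def stub(system: str, user: str) -> str:
        if system.startswith(FACILITATOR):
            return "CONDITIONAL: ship behind a flag." if "Start with" in system else "CFO vs CTO on cost."
        return f"{system.split('.')[0]}: it depends."

    council = [
        Persona("CTO", "You own the architecture.", ["reliability", "maintenance"]),
        Persona("CFO", "You own the budget.", ["cost", "runway"]),
    ]
    print(run_council("Ship feature X", council, stub).verdict)  # CONDITIONAL
```

Injecting `ask` keeps the protocol separate from the model, so the same loop can run against a stub or a real model call. The sketch raises an error instead of guessing when the facilitator doesn't lead with a label. That is a deliberate choice here: a silent default would hide exactly the kind of mushy answer the council is meant to surface.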

It is roughly the same idea as Karpathy’s llm-council, with one inversion. Karpathy uses different models for diversity. The bet is that GPT, Claude, and Gemini reading the same prompt will land in different places, and the spread is the signal. I went the other way: same model, different skills. My bet is that in 2026, for this kind of task, the prompt is doing more work than the underlying weights. A well-written CFO persona on Sonnet pushes back in CFO ways more reliably than a generic GPT prompt would, because the persona is the thing carrying the perspective, not the model behind it.

Both approaches probably work. I had Claude Code skills sitting there and a Saturday to spare, and this version was the one I could build.

What it changes when you actually use it

The disagreement is the value. When all four personas agree, I trust the output less, not more. Unanimity usually means I didn’t give the system anything hard to argue about, or I framed the proposal in a way that pre-committed all four to the same answer. Two personas tearing into each other on a clean clash is the only outcome where I learn something.

Where I’ve actually used it: picking between two technical approaches when I can’t pick. Pressure-testing a product cut before committing engineering to it. Getting a hostile read of an email I’m about to send to a board member or investor. None of those are problems where I want an agreeable answer. They’re all problems where I want the version that says “no, here’s what you’re missing.”

It is not a team. It does not replace the people who actually know the business. It is a structured prompt with disagreement baked in. The point is to convert a chat into something closer to a meeting where people argue, which is usually where the better answer comes from.

It also doesn’t work for everything. For “write me an email” you don’t want a debate, you want a writer. For “explain this to me” you want a teacher. Council is for the kind of question where the answer is genuinely contested, not for routine production.

The honest part

It is 99% vibe-coded in an afternoon with an agent in the loop. There are no tests. There is no roadmap. The repo is on GitHub because it’s easier than not, not because it’s a product.

What I’d want anyone reading this to take away is the pattern, not the implementation: forced disagreement, structured rounds, and written reasoning that you can go back and look at later. The repo is one weekend’s attempt at it. As LLMs keep getting better at sounding confident, more of the interesting engineering work shifts to the question of where the friction has to come from, because it isn’t going to come from inside the model. Council is one tiny version of that. There are obviously better ones to be built, and probably most of them will be.

Repo: github.com/MiguelCabralOliveira/claude-council