About

An open, developer-driven quality index for AI coding agents.

The Problem

AI coding agents have good days and bad days. Sometimes the issues stem from service interruptions or backend problems. Other times a model just seems off — and it's hard to tell whether the fault lies with the task, your codebase, or something deeper.

Without data, it's impossible to know. The companies building these models have that data, but developers don't. The people with the best intuition for how these tools actually perform — working developers — have no systematic way to share what they know.

Why Benchmarks Aren't Enough

Benchmarks like SWE-bench and HumanEval are useful controlled experiments, but they measure performance in a vacuum — not in your codebase and not based on your lived experience as a developer. Models are frequently over-optimized for individual evaluation frameworks and can behave very differently in practice.

As developers become more dependent on coding agents, we need an independent, real-world source of truth about how they're actually behaving.

What Code Canary Does

Code Canary is a lightweight, open feedback system that lets developers rate their AI coding sessions and publishes the results as a public, continuously-updated comparison dashboard. Here's how it works:

  1. You install a hook. One shell command sets up a hook that fires when your coding agent session ends. Setup takes about 30 seconds.
  2. You rate your session. A small prompt asks: how did the agent do? A quick 10-second interaction — no code ever leaves your machine.
  3. Ratings are anonymized and aggregated. No code, no prompts, no file names. Just structured metadata: which tool, which model, and your rating.
  4. The dashboard updates. Aggregated data powers a public leaderboard that anyone can explore — sliced by model, agent, and more.
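
The metadata in step 3 is deliberately minimal. As a rough illustration (the field names and values here are hypothetical, not the actual Code Canary schema), the payload a hook might assemble could look something like this:

```shell
# Hypothetical sketch of an anonymized rating payload.
# Field names are illustrative, not the real Code Canary schema.
build_payload() {
  # $1 = tool name, $2 = model name, $3 = rating (1-5)
  printf '{"tool":"%s","model":"%s","rating":%d}\n' "$1" "$2" "$3"
}

build_payload "example-agent" "example-model-v1" 4
# → {"tool":"example-agent","model":"example-model-v1","rating":4}
```

Note what is absent: nothing about your code, prompts, or file paths appears anywhere in the payload.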

The Canary in the Code Mine

The name comes from canaries in coal mines, where the birds served as early warning systems for lethal gases. Code Canary works on the same principle (thankfully with zero avian casualties): when a tool's quality starts slipping — maybe after a model update, maybe after a rushed release — the developers using it every day will notice first. Their ratings are the earliest, most reliable signal available.

Get Started

Install the hook for your preferred AI coding tool and start rating your sessions:

curl -sL https://codecanary.net/install | bash
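
For the curious, a session-end hook in this spirit could be sketched as follows. Everything here is an illustrative assumption — the endpoint, the environment variables, and the validation logic — not the script the installer actually places:

```shell
# Hypothetical session-end hook sketch; endpoint and variables
# are assumptions for illustration, not the real implementation.

# Accept only whole-number ratings from 1 to 5.
valid_rating() {
  case "$1" in [1-5]) return 0 ;; *) return 1 ;; esac
}

# Send only structured metadata: tool, model, rating.
submit_rating() {
  valid_rating "$1" || { echo "rating must be 1-5" >&2; return 1; }
  curl -s -X POST https://codecanary.net/api/ratings \
    -H 'Content-Type: application/json' \
    -d "{\"tool\":\"${AGENT_NAME:-unknown}\",\"model\":\"${MODEL_NAME:-unknown}\",\"rating\":$1}"
}
```

Separating validation from submission keeps malformed input from ever reaching the network, which is the kind of guardrail you would want in any hook that runs unattended at the end of every session.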

The more developers who contribute ratings, the more useful the data becomes for everyone. Explore the dashboard to see current results, or read the FAQ for details on privacy, data collection, and how the system works under the hood.