An open, developer-driven quality index for AI coding agents.
AI coding agents have good days and bad days. Sometimes the issues stem from service interruptions or backend problems. Other times a model just seems off — and it's hard to tell whether the fault lies with the task, your codebase, or something deeper.
Without data, it's impossible to know. The companies building these models have that data, but developers don't. The people with the best intuition for how these tools actually perform — working developers — have no systematic way to share what they know.
Benchmarks like SWE-bench and HumanEval are useful controlled experiments, but they measure performance in a vacuum: not in your codebase, and not against your lived experience as a developer. Models are frequently overfit to individual benchmarks and can behave very differently in practice.
As developers become more dependent on coding agents, we need an independent, real-world source of truth about how they're actually behaving.
Code Canary is a lightweight, open feedback system: developers rate their AI coding sessions, and the results are published as a public, continuously updated comparison dashboard. Here's how it works:
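At its core, the dashboard is just an aggregation of many individual session ratings. As a rough illustration (the file format and tool names here are invented, not Code Canary's actual data schema), averaging per-tool ratings could look like this:

```shell
# Hypothetical ratings log: one "tool,rating" pair per rated session.
cat > ratings.csv <<'EOF'
example-agent,4
example-agent,5
other-agent,3
EOF

# Average each tool's ratings -- the kind of summary a dashboard would show.
awk -F, '{ sum[$1] += $2; count[$1]++ }
         END { for (t in sum) printf "%s %.2f\n", t, sum[t]/count[t] }' ratings.csv
```

A real deployment would of course aggregate server-side over many contributors, broken down by model version and time window, but the principle is the same.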
The name comes from canaries in coal mines, where the birds served as early warning systems for lethal gases. Code Canary works on the same principle (thankfully with zero avian casualties): when a tool's quality starts slipping, whether after a model update or a rushed release, the developers using it every day will know first. Their ratings are the earliest, most reliable signal available.
Install the hook for your preferred AI coding tool and start rating your sessions:
curl -sL https://codecanary.net/install | bash
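Conceptually, a post-session hook just records a quick score and ships it off. The sketch below shows the general shape of such a hook; the variable names, JSON fields, and local file are illustrative assumptions, not the actual format the install script sets up:

```shell
# Hypothetical sketch of a post-session rating hook.
# All identifiers below are invented for illustration.
RATING=4                  # 1-5 score the developer gives the session
TOOL="example-agent"      # hypothetical identifier for the coding tool
SESSION_ID=$(date +%s)    # crude session id, for illustration only

# Record the rating locally as JSON; a real hook would POST an
# anonymized payload like this to the Code Canary service.
printf '{"tool":"%s","session":"%s","rating":%d}\n' \
  "$TOOL" "$SESSION_ID" "$RATING" > rating.json

cat rating.json
```

The key point is how little the hook needs to capture: a tool name, a session, and a score, with no code or prompts leaving your machine.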
The more developers who contribute ratings, the more useful the data becomes for everyone. Explore the dashboard to see current results, or read the FAQ for details on privacy, data collection, and how the system works under the hood.