Architecture
CommitBee is ~18K lines of Rust compiled to a single static binary with LTO.
Crate Structure
src/
├── main.rs # Entry point, tracing setup
├── lib.rs # Library exports (for integration tests)
├── app.rs # Application orchestrator (all the glue)
├── cli.rs # CLI argument parsing (clap derive)
├── config.rs # Configuration loading (figment layered)
├── error.rs # Error types (thiserror + miette diagnostics)
├── domain/
│   ├── change.rs # FileChange, StagedChanges, ChangeStatus
│   ├── symbol.rs # CodeSymbol, SymbolKind, SpanChangeKind
│   ├── diff.rs # SymbolDiff, ChangeDetail (structural AST diffs)
│   ├── context.rs # PromptContext: assembles the LLM prompt
│   └── commit.rs # CommitType enum (single source of truth)
└── services/
    ├── git.rs # GitService: gix for discovery, git CLI for diffs
    ├── analyzer.rs # AnalyzerService: tree-sitter parsing via rayon
    ├── context.rs # ContextBuilder: evidence flags, token budget
    ├── differ.rs # AstDiffer: structural comparison of old/new symbols
    ├── safety.rs # Secret scanning (24 patterns), conflict detection
    ├── sanitizer.rs # CommitSanitizer + CommitValidator
    ├── splitter.rs # CommitSplitter: diff-shape + Jaccard clustering
    ├── progress.rs # Progress indicators (indicatif spinners, TTY-aware)
    └── llm/
        ├── mod.rs # LlmBackend enum dispatch, SYSTEM_PROMPT
        ├── ollama.rs # OllamaProvider: streaming NDJSON
        ├── openai.rs # OpenAiProvider: SSE streaming
        └── anthropic.rs # AnthropicProvider: SSE streaming
Key Design Decisions
Hybrid Git: gix (pure Rust) handles fast repo discovery, while the git CLI handles diffs and staging operations.
This avoids the complexity of reimplementing diff parsing in pure Rust while keeping startup fast.
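To make the "git CLI for diffs" half concrete, here is a minimal sketch of shelling out to `git diff --cached --name-status` and parsing the result. The names `StagedEntry`, `parse_name_status`, and `staged_entries` are hypothetical and not taken from CommitBee's source; the real `GitService` may use different git flags and richer types.

```rust
use std::io;
use std::process::Command;

/// Hypothetical, simplified view of one staged file.
#[derive(Debug, PartialEq)]
struct StagedEntry {
    status: char, // first letter of git's status field: 'M', 'A', 'D', 'R', ...
    path: String, // for renames, the new path
}

/// Parse the tab-separated output of `git diff --cached --name-status`.
/// Lines look like "M\tsrc/main.rs" or "R100\told.rs\tnew.rs".
fn parse_name_status(output: &str) -> Vec<StagedEntry> {
    output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let status = fields.next()?.chars().next()?;
            // The last field is the (new) path; lines without a path are skipped.
            let path = fields.last()?.to_string();
            Some(StagedEntry { status, path })
        })
        .collect()
}

/// Run the git CLI in the current directory and parse its output.
fn staged_entries() -> io::Result<Vec<StagedEntry>> {
    let out = Command::new("git")
        .args(["diff", "--cached", "--name-status"])
        .output()?;
    if !out.status.success() {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    Ok(parse_name_status(&String::from_utf8_lossy(&out.stdout)))
}
```

Keeping the parser separate from the process call means the parsing logic can be unit-tested without a git repository.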
Full File Parsing: Tree-sitter parses the complete staged and HEAD versions of files, not just the diff hunks.
Diff hunks are then mapped to symbol spans. This means CommitBee knows the full context of what changed, not just the changed lines.
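Mapping hunks to symbol spans comes down to a line-range overlap check. The sketch below assumes 1-based inclusive line ranges; `SymbolSpan`, `Hunk`, and `symbols_touched` are illustrative names, not CommitBee's actual types (the real ones live in `domain/symbol.rs` and may differ).

```rust
/// Hypothetical span of a parsed symbol, in 1-based inclusive lines.
#[derive(Debug, Clone, PartialEq)]
struct SymbolSpan {
    name: String,
    start_line: usize,
    end_line: usize,
}

/// Hypothetical diff hunk on the new side of the file: "+start,count".
struct Hunk {
    start_line: usize,
    line_count: usize,
}

/// Return the names of all symbols whose span overlaps at least one hunk.
/// Zero-length hunks (pure deletions) are ignored in this sketch.
fn symbols_touched<'a>(symbols: &'a [SymbolSpan], hunks: &[Hunk]) -> Vec<&'a str> {
    symbols
        .iter()
        .filter(|s| {
            hunks.iter().any(|h| {
                let hunk_end = h.start_line + h.line_count.saturating_sub(1);
                h.line_count > 0 && h.start_line <= s.end_line && hunk_end >= s.start_line
            })
        })
        .map(|s| s.name.as_str())
        .collect()
}
```

A hunk covering lines 8 to 13 would touch both a function spanning lines 1 to 10 and one spanning lines 12 to 30, which is the kind of context a hunk-only view cannot provide.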
Enum Dispatch: The LLM provider uses an enum (LlmBackend) rather than a trait object.
This avoids async-trait overhead and the complexity of dyn dispatch for async methods.
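The pattern looks roughly like the sketch below. The provider structs are stubs and the `generate` signature is an assumption; CommitBee's real providers stream over HTTP. The tiny `block_on` helper is only there to make the example runnable without a runtime and is not how the application drives its futures (it uses tokio).

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Stub providers; the method signature is hypothetical.
struct OllamaProvider { model: String }
struct OpenAiProvider { model: String }

impl OllamaProvider {
    async fn generate(&self, prompt: &str) -> String {
        format!("[ollama:{}] {}", self.model, prompt)
    }
}

impl OpenAiProvider {
    async fn generate(&self, prompt: &str) -> String {
        format!("[openai:{}] {}", self.model, prompt)
    }
}

/// One variant per provider: dispatch is a plain `match`, so there is
/// no `dyn` trait object and no boxed future.
enum LlmBackend {
    Ollama(OllamaProvider),
    OpenAi(OpenAiProvider),
}

impl LlmBackend {
    async fn generate(&self, prompt: &str) -> String {
        match self {
            LlmBackend::Ollama(p) => p.generate(prompt).await,
            LlmBackend::OpenAi(p) => p.generate(prompt).await,
        }
    }
}

// Toy executor for the demo: polls in a loop with a waker that does nothing.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

The trade-off is that adding a provider means touching the enum and every `match`, which is acceptable when the set of backends is small and known at compile time.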
Streaming with Cancellation: All providers support Ctrl+C cancellation via tokio_util::CancellationToken.
The streaming display runs in a separate tokio task with tokio::select! for responsive cancellation.
Token Budget: The context builder tracks character usage (~4 chars per token) and truncates the diff if it exceeds the budget, prioritizing the most important files.
The budget adapts to the available information: when structural AST diffs are present, the symbol allocation shrinks to 20%, since the diffs already carry precise detail; when only signatures are available, symbols get 30%.
The default 24K char budget (~6K tokens) is safe for 8K context models.
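The arithmetic can be sketched as follows. The 20%/30% split, the 4 chars-per-token estimate, and the 24K default come from the text above; everything else is an assumption for illustration, including the names and the idea that the remainder of the budget goes to the diff. File prioritization is not modeled here, and the sketch counts bytes rather than Unicode characters.

```rust
const CHARS_PER_TOKEN: usize = 4;
const DEFAULT_BUDGET_CHARS: usize = 24_000;

/// Hypothetical split of the character budget between sections.
#[derive(Debug, PartialEq)]
struct Budget {
    symbols: usize,
    diff: usize, // assumption: whatever symbols do not use
}

/// Symbols get 20% when structural AST diffs are present, 30% otherwise.
fn split_budget(total_chars: usize, has_structural_diffs: bool) -> Budget {
    let symbol_pct = if has_structural_diffs { 20 } else { 30 };
    let symbols = total_chars * symbol_pct / 100;
    Budget { symbols, diff: total_chars - symbols }
}

/// Rough token estimate from a character count.
fn estimate_tokens(chars: usize) -> usize {
    chars / CHARS_PER_TOKEN
}

/// Cut `diff` down to at most `budget` bytes without splitting a UTF-8 character.
fn truncate_to_budget(diff: &str, budget: usize) -> &str {
    if diff.len() <= budget {
        return diff;
    }
    let mut end = budget;
    while !diff.is_char_boundary(end) {
        end -= 1;
    }
    &diff[..end]
}
```

With the default budget, `split_budget(DEFAULT_BUDGET_CHARS, true)` reserves 4,800 chars for symbols and `estimate_tokens(DEFAULT_BUDGET_CHARS)` gives 6,000 tokens, which leaves headroom for the system prompt and the response in an 8K window.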
Single Source of Truth for Types: CommitType::ALL is a const array that defines all valid commit types.
A unit test (#[test]) verifies that the system prompt's type list matches this array exactly, so the two cannot drift apart unnoticed.
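A minimal sketch of the idea, assuming a made-up set of variants and a made-up prompt format. Only the name `CommitType::ALL` and the existence of a `SYSTEM_PROMPT` constant come from this document; the variant list, the prompt wording, and `prompt_types` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CommitType {
    Feat,
    Fix,
    Docs,
    Refactor,
}

impl CommitType {
    /// The one place where the set of valid types is defined.
    const ALL: [CommitType; 4] = [
        CommitType::Feat,
        CommitType::Fix,
        CommitType::Docs,
        CommitType::Refactor,
    ];

    fn as_str(self) -> &'static str {
        match self {
            CommitType::Feat => "feat",
            CommitType::Fix => "fix",
            CommitType::Docs => "docs",
            CommitType::Refactor => "refactor",
        }
    }
}

// Hypothetical prompt fragment; the real SYSTEM_PROMPT is much longer.
const SYSTEM_PROMPT: &str = "Valid types: feat, fix, docs, refactor";

/// Extract the comma-separated type list from the prompt fragment.
fn prompt_types(prompt: &str) -> Vec<&str> {
    prompt
        .strip_prefix("Valid types:")
        .unwrap_or(prompt)
        .split(',')
        .map(str::trim)
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn system_prompt_lists_every_commit_type() {
        let expected: Vec<&str> = CommitType::ALL.iter().map(|t| t.as_str()).collect();
        assert_eq!(prompt_types(SYSTEM_PROMPT), expected);
    }
}
```

Adding a variant without updating the prompt makes the test fail, which is what keeps the prompt and the enum in sync.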
Error Philosophy
Every error in CommitBee is:
- Actionable: Tells you what went wrong and how to fix it (via miette help messages)
- Typed: Uses thiserror for structured error variants, not string errors
- Diagnostic: Error codes like commitbee::git::no_staged for programmatic handling
No panics in user-facing code paths. The sanitizer and validator are tested with proptest to ensure they never panic on arbitrary input.
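The three properties listed above can be illustrated without any dependencies. The sketch below hand-writes what CommitBee gets from the thiserror and miette derive macros, so it is a stand-in rather than the project's actual `error.rs`. The code `commitbee::git::no_staged` is from this document; the second variant, its code, and all help texts are hypothetical.

```rust
use std::error::Error;
use std::fmt;

/// Typed: one variant per failure mode instead of a string.
#[derive(Debug)]
enum CommitBeeError {
    NoStagedChanges,
    ProviderUnreachable { url: String }, // hypothetical variant
}

impl fmt::Display for CommitBeeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoStagedChanges => write!(f, "no staged changes"),
            Self::ProviderUnreachable { url } => write!(f, "cannot reach LLM provider at {url}"),
        }
    }
}

impl Error for CommitBeeError {}

impl CommitBeeError {
    /// Diagnostic: a stable code that scripts can match on.
    fn code(&self) -> &'static str {
        match self {
            Self::NoStagedChanges => "commitbee::git::no_staged",
            Self::ProviderUnreachable { .. } => "commitbee::llm::unreachable", // hypothetical code
        }
    }

    /// Actionable: a hint telling the user how to fix the problem.
    fn help(&self) -> String {
        match self {
            Self::NoStagedChanges => "stage files with `git add` and try again".to_string(),
            Self::ProviderUnreachable { url } => format!("check that a server is listening at {url}"),
        }
    }
}
```

With the derive macros, the `Display` text, code, and help would typically be declared as attributes on each variant instead of written out by hand.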
Testing Strategy
CommitBee has 424 tests across multiple strategies:
| Strategy | What It Covers |
|---|---|
| Unit tests | Individual functions (sanitizer rules, type parsing, config defaults) |
| Snapshot tests (insta) | Output format stability |
| Property tests (proptest) | Never-panic guarantees for parsers |
| Integration tests (wiremock) | Full provider round-trips with mocked HTTP |
| Git fixture tests | Real git operations in temp directories |
Run them:
cargo test # All 424 tests
cargo test --test sanitizer # Just sanitizer tests
cargo test --test integration # LLM provider mocks
COMMITBEE_LOG=debug cargo test -- --nocapture # With logging