Configuration¶
WhyGraph reads an optional whygraph.toml at your repo root. Every field has a built-in default, so
an unedited file behaves exactly as if none were present. whygraph init scaffolds a fully-commented
whygraph.example.toml for you - copy it to whygraph.toml and edit what you need.
whygraph.toml is gitignored - never commit a token
init adds whygraph.toml to .gitignore precisely because it can hold API keys. Keep it that
way. Use whygraph.example.toml (committed) for documentation, whygraph.toml (ignored) for
secrets.
The full tree¶
The values shown are the defaults.
log_level = "INFO" # DEBUG | INFO | WARN | ERROR | CRITICAL
[scan]
max_workers = 2 # parallel LLM calls in the diff-analyzer crawler
provider = "off" # source-control backend for the PR/issue crawl:
# "off" - skip the remote crawl (default)
# "github" - pull PRs/issues from the GitHub remote
# "auto" - detect from the remote URL (github only, for now)
remote = "origin" # git remote whose URL is inspected for provider="auto"
# token = "ghp_..." # GitHub token for the gh CLI. Default: read GH_TOKEN /
# GITHUB_TOKEN from env (or an existing `gh auth login`).
[analyze]
provider = "anthropic" # which [llm.*] adapter writes per-commit descriptions
# model = "claude-haiku-4-5" # override the provider's model for analysis only
# max_diff_chars = 50000 # diff truncated past this length before prompting
# large_commit_file_count = 30 # commits touching more files are described per-file on demand
# pr_origin_min_commits = 5 # recover a squash-merged PR's original commits past this size
# timeout_sec = 60 # per-call timeout; default: the adapter's own
[rationale]
provider = "anthropic" # which [llm.*] adapter writes the rationale card
# model = "claude-haiku-4-5" # override the provider's model for rationale only
# timeout_sec = 60 # per-call timeout; default: the adapter's own
# pr_roster_max_commits = 30 # squashed-commit headlines shown per PR in the prompt
# pr_discussion_max_comments = 20 # PR comments shown per PR in the prompt
# pr_comment_max_chars = 500 # each PR comment clipped to this length
# Override default DB locations (resolved relative to this file):
# whygraph_db = ".whygraph/whygraph.db"
# codegraph_db = ".codegraph/codegraph.db"
# Optional rotating file log. Console (stderr) logging is always on.
# [logging]
# file = ".whygraph/logs/whygraph.log"
# level = "DEBUG" # default: inherit top-level log_level
# max_bytes = 5_000_000
# backup_count = 3
[llm.anthropic]
model = "claude-opus-4-7"
# api_key = "sk-ant-..." # default: read ANTHROPIC_API_KEY from env
timeout_sec = 60
[llm.openai]
model = "gpt-4o"
# api_key = "sk-..." # default: read OPENAI_API_KEY from env
# base_url = "..." # default: https://api.openai.com/v1
timeout_sec = 60
[llm.deepseek]
model = "deepseek-chat"
# api_key = "sk-..." # default: read DEEPSEEK_API_KEY from env
timeout_sec = 60
[llm.ollama]
model = "llama3"
# host = "http://localhost:11434"
timeout_sec = 120
# `claude_cli` (Python attribute) and `claude-cli` (TOML idiom) both parse.
[llm.claude_cli]
model = "claude-opus-4-7"
# api_key = "sk-ant-..." # default: subscription billing (strips the env var)
timeout_sec = 120
Section by section¶
| Section | What it controls |
|---|---|
top-level log_level |
Console log verbosity. |
[scan] |
The crawl: parallelism, which remote provider to use, the git remote name, and an optional pinned GitHub token. |
[analyze] |
The per-commit LLM diff descriptions written during scan - provider, model, and the truncation / per-file thresholds. |
[rationale] |
The whygraph_rationale_brief card - provider, model, and how much of a squash-merged PR is rendered into the prompt. |
whygraph_db / codegraph_db |
Override either database path. |
[logging] |
An optional rotating file log, in addition to the always-on stderr log. |
[llm.*] |
Per-provider client settings - model, key, timeout, and base_url / host where relevant. |
Environment variables¶
Omit an api_key from an [llm.*] table and WhyGraph reads the standard environment variable
instead.
| Variable | Used for |
|---|---|
ANTHROPIC_API_KEY |
The anthropic LLM adapter. |
OPENAI_API_KEY |
The openai LLM adapter. |
DEEPSEEK_API_KEY |
The deepseek LLM adapter. |
GH_TOKEN / GITHUB_TOKEN |
The gh CLI during the remote crawl, when [scan].token is unset. |
Provider keys degrade gracefully
Missing a key for the analysis or rationale phase isn't fatal - that phase skips, and the rest of the scan still runs. Descriptions and cards backfill once a credential is available.