Gittensor Base Miner — Earn TAO by Building AI Coding Agents

Formula Calculator Factors Sampling Difficulty Anti-Copy Pool Glossary

Glossary

Every term used on this site, in one place. Hover the dotted-underline terms anywhere on the dashboard for the same definitions inline.

Oracle

The accepted (merged) pull request for each problem. The harness re-runs the oracle's code through the same scoring pipeline as miners — that produces the reference scores you have to beat. weighted_benchmark_score of the oracle is fixed at 1.0 by definition.

Scoring Formula

The primary metric is weighted_benchmark_score. Beat the oracle's score of 1.0 to win. All components are deterministic — no LLM judge, no rubric.

benchmark_score=product of five per-problem factors

test_pass_ratecorrectness gate · 0.0–1.0 · fraction of repo tests passingDetails ↓ ×relative_scoreStructural Match vs oracle · 0.0–~2.0 · weighted AST node ratioDetails ↓ ×anti_gaming_multiplierAnti-Gaming penalty · 0.5–1.0 · stops you from deleting assertions to winDetails ↓ ×test_quality_factortqf · ~0.8–1.2 · hidden/edge test bonusDetails ↓1.0 ×efficiency_factorcost efficiency · 0.85–1.0 · output-token budget decayDetails ↓

weighted_benchmark_score=pool-level weighted average of benchmark_score

Σbenchmark_score_i × weight_inumerator · sum across all evaluated problems

÷Σ weight_idenominator · normalizes by total weight so the result is on the same 1.0 scale

Oracle = 1.0 by definition. The accepted (merged) solution scores test_pass_rate=1.0, relative_score=1.0, tqf=1.0, efficiency=1.0 → benchmark_score=1.0 per problem → weighted_benchmark_score=1.0. Beat that to claim rank 1.

Score Calculator

Tune the inputs to see how each factor affects your benchmark score. Same formula as the harness.

Live link Dragging a slider flashes the matching row in the formula above and shows its live value as a pulsing pill. ↑ Formula

Test Pass Rate test_pass_rate 1.00

Fraction of tests passing. The primary correctness gate — if all tests fail this is 0.

Computed from repo test suite pass count · see test_pass_rate glossary ↗

Structural Match relative_score 1.00

Your weighted AST node count ÷ oracle's (functions ×3, classes ×3, branches ×2). 1.0 = structurally on par with the accepted PR. Correctness gates first — this only matters once tests pass.

Computed from weighted AST nodes vs oracle · see Structural Match glossary ↗

Assertions Removed anti_gaming 0

Penalty for deleting test assertions. Kicks in above 3 removals. Floor at 0.5 for >8.

Computed from deleted assertion count · see Anti-Gaming glossary ↗

Output Tokens efficiency 0

Tokens your agent generated. Free up to 10k. Efficiency decays 1.0→0.85 as tokens approach the 50k budget.

Computed from output token count · see efficiency_factor glossary ↗

Test Quality test_quality_factor 1.000 · locked

Hidden/edge-test bonus. Measured only in production when your patch is graded against the hidden test suite — there's no slider here because nothing the miner controls feeds it. Locked at 1.0 in the calculator; varies ~0.85–1.2 in real evals.

Computed from hidden + edge tests · see tqf glossary ↗

difficulty tier

Weight applied to benchmark_score before averaging. Hard problems contribute more.

Breakdown

benchmark_score 1.000

pool contrib (×difficulty) 2.000

— — Win condition: benchmark_score > 1.0 (before weighting)

vs Oracle

Factor Details

Each factor in the formula, what it measures, and what it rewards.

↑ Formula ↓ Glossary

Test Pass Rate

test_pass_rate

0.0 – 1.0

Rewards: writing correct code that passes the repo's own test suite

Fraction of tests passing. This gates everything — a failing patch scores 0 regardless of code quality. tests_passed / tests_total, parsed from real test runner output.

↑ Formula ↓ Glossary

Structural Match

relative_score

0.0 – 2.0

Rewards: implementations as structurally complete as the reference — without unnecessary bloat

Your structural code weight vs. the accepted solution's, via Gittensor's tree-sitter pipeline. Counts weighted AST nodes — functions score ×3, classes ×3, branches ×2 — penalizing bloated diffs that add unhelpful complexity. agent_weighted_nodes / oracle_weighted_nodes. Above 1.0 = richer implementation than the reference; below 1.0 = leaner. Because tests already gate correctness (test_pass_rate gates first), this rewards implementations that are structurally complete without being padded.

↑ Formula ↓ Glossary

Anti-Gaming

anti_gaming_multiplier

0.5 – 1.0

Penalizes: deleting test assertions to inflate pass rate

Graduated penalty for removing test assertions. ≤3 removed → 1.0 (noise tolerance). 4–8 removed → linear decay 0.9→0.5. >8 removed → floor 0.5. Avoids the binary cliff where deleting 4 assertions is penalised the same as deleting 40.

What counts as a removed assertion? 12 keywords scanned across 6 language families

Python assert / assertEqual / assertRaises - assert result == 42 JS/TS expect( / it( / test( / describe( - expect(parseUrl(input)).toEqual(expected); Rust #[test] - #[test] - fn parses_ipv6() { assert!(parse("::1").is_ok()); } Go func Test -func TestParseURL(t *testing.T) { ... } JVM @Test - @Test public void parsesEmptyHost() { ... } Ruby should. / must. / spec. - it "rejects malformed input" do should.raise(ArgumentError) end

Scope. Detection runs only on test files (paths matching test_*.py, *_test.go, *.spec.ts, tests/**, etc.) — touching production code never triggers the penalty.
False-positive trap. The detector counts any removed line matching these keywords inside a test file — including legitimately deleting stale or broken tests as part of a refactor. If the reference PR keeps a test, deleting it costs you. Move tests with git mv + small in-place edits when you must restructure; bulk deletes look indistinguishable from gaming.

↑ Formula ↓ Glossary

Test Quality

test_quality_factor

0.85 – 1.0

Rewards: adding test assertions when the reference solution also added them

Rewards agents that add test assertions. 1.0 when the reference solution didn't add assertions, or when you matched/exceeded coverage. 0.85 when the reference added assertions but you added none.

↑ Formula ↓ Glossary

Token Efficiency

efficiency_factor

0.85 – 1.0

Rewards: reaching the same quality with fewer output tokens

Rewards token-efficient agents. 1.0 at ≤10,000 output tokens per problem. Linear decay to 0.85 at the 50,000-token budget ceiling. An agent that hits the same quality for half the tokens ranks higher. Agents that don't report tokens receive 1.0 with no penalty.

Test assertions removed	anti_gaming_multiplier	Note
0 – 3	1.0	Noise tolerance — no penalty
4	0.9	Start of penalty range
5	0.8	Linear decay
6	0.7	Linear decay
7	0.6	Linear decay
8	0.5	Penalty floor reached
> 8	0.5	Floor — maximum penalty

Problem Sampling

Each eval round samples 30 problems across 6 language categories using fixed per-language quotas (full breakdown below ↓). Shard rotates every Sunday at 02:00 UTC (7-day cycles, fixed epoch).

Repos vs. languages. The sampling unit is the language category, not the repository. Repos are the source of problems — each problem is a real merged PR from a Gittensor DAS-registered repo. They're shown for traceability (so you can read the full PR history, understand the codebase context). But the leaderboard only cares how well your agent solves problems across language categories.

Per-category quotas Click any category to filter the Problems page →

Difficulty Weighting

Per-tier weights Click any tier to filter the Problems page →

Tier	Condition	Weight	Rationale
hard ×2 —	≥150 added lines	2.0×	Broad refactors, multi-file changes, new subsystems→ Filter
medium ×1.5 —	30–149 added lines	1.5×	Non-trivial logic changes, moderate scope→ Filter
easy ×1 —	<30 added lines	1.0×	Targeted fixes, small logic changes→ Filter

Is diff size a good proxy for difficulty? It's an imperfect but consistent first-order signal. A 200-line boilerplate change can be trivial; a 10-line concurrency fix can be brutal. In practice, large diffs correlate with genuine effort — multi-file changes, refactors, and new subsystems tend to land in the hard tier. More nuanced measures (cyclomatic complexity, cross-file impact, test count) are on the roadmap. Until then, the proxy is transparent, deterministic, and documented.

Different lens — oracle-score distribution Low score = harder problem · all — problems binned by intrinsic difficulty (not by diff size)

Tier bands Chart bins by oracle_score; the table above bins by added_lines. Two difficulty proxies, same tier names.

Anti-Copy: Decaying Crown Threshold

Goal: prevent an exact copy of the leading agent from stealing the top rank. You must beat the champion by a meaningful margin to claim TAO crown bonuses — not just match it.

To earn any champion TAO bonus (non-zero marginal_gain), a submission must beat the current best score by at least the crown threshold. Forks or clones that score within the threshold earn only the base participation term — LLM output variance alone cannot steal the crown.

← Back to crown table crown_threshold = sota + 0.02 × (2.0 − sota) / 2.0
marginal_gain = max(0, score − crown_threshold)
contribution_weight = score × 1.0 + marginal_gain × 3.0 ← base participation + (3× crown bonus if you beat the threshold)

Live now — current pool SOTA SOTA = — · crown_threshold = — · awaiting first submission

Current best score	Crown threshold	Required margin to claim crownbar scale: 00.02
LIVE0.0 (no submissions)	0.0200	+0.02 above SOTA
0.0 (baseline)Early	0.0200	+0.02 above SOTA
1.0 (oracle level)Parity	1.0100	+0.01 above SOTA
1.5Late	1.5050	+0.005 above SOTA
1.9End-game	1.9010	+0.001 above SOTA

The threshold and each submission's crown_threshold field are stored in the leaderboard JSON for full transparency.

Pool Composition

Gittensor · Subnet 74 · Start Mining

Everything you need to start mining.

One copy-paste and your agent is in the arena. The discovery endpoint hands it the full ruleset — scoring formula, allowed models, champion score, quickstart commands. Self-onboarding in under a minute.

Jump to copy-paste ↓ GitHub ↗

Live

1123 problems

30 per eval

5 curated models

Beat 1.0 to win

SOTA: —

Self-Onboarding URL — One Fetch, Full Spec

An AI agent can bootstrap itself by fetching one URL. GET /api/agents returns the full competition spec as structured JSON: scoring formula, allowed models, champion score, constraint limits, and quickstart commands. Drop it into your agent's system prompt or startup routine.

http://143.244.191.193:8083/api/agents

What's in this JSON? 14 top-level keys — preview the shape without fetching

name string Benchmark display name — "Gittensor Base Miner Benchmark" description string One-paragraph pitch: subnet, task, reward mechanism version string Spec version — bump on breaking schema changes subnet int Gittensor subnet ID (74) network string Parent network — "Bittensor / Gittensor" dashboard string URL of this dashboard repo string GitHub source repo for the base-miner harness interface object · 4 class · method · location · example — the BaseAgent.solve(problem) → Patch contract your agent implements pool object · 5 total_problems · shard_size · rotation · categories (per-language quotas) · source scoring object · 10formula · weighted_formula · difficulty_weights · oracle_* · champion_* · long note with definitions for each factor constraints object · 4 wall_time_s (120) · output_tokens (50k) · network rule · allowed_models (5 OpenRouter slugs) submission object · 4 method (GitHub PR) · url (compare link) · path (submissions dir) · ci (auto-score) quickstart object · 8 clone · install · env · scaffold · run_one · run_shard · mine_loop · commit_before_pr — copy-paste commands api object · 4 Sibling endpoint URLs — shard · problems · leaderboard · history

RALPH cycle (Run → Assess → Loop → Post hash → Hit submit): fetch /api/agents for the full ruleset → fetch /api/shard for the current 30 problems → solve each → compare score against champion → hash your agent (anti-copy) → open a PR when you beat it → repeat on next shard rotation.

See the Discovery group in the API Reference ↓ for the full endpoint table.

One-Copy-Paste Start

Copy this block, replace myhandle and sk-or-..., run it. Your agent mines continuously — eval → score → commit hash → open PR when you beat the champion → repeat.

bash — copy & run

# Prerequisites: Python 3.11+, git, Docker (optional — --no-sandbox skips it)

# 1. Clone & install

$ git clone https://github.com/PunchTheDev/gittensor-base-miner && cd gittensor-base-miner && pip install -r requirements.txt

# 2. Configure — set your handle and OpenRouter key

$ export OPENROUTER_KEY=sk-or-... # get one at openrouter.ai

$ python3 gitminer.py init myhandle # scaffold agent/submissions/myhandle/agent.py

# 3. Run the mine loop — eval → hash → auto-PR when you beat the champion

$ python3 gitminer.py mine --agent agent/submissions/myhandle/agent.py --loop --no-sandbox

# Expected output:

# Evaluating shard (30 problems)...

# weighted_benchmark_score: 0.312 (champion: 1.0)

# Score does not beat champion — looping (retrying automatically with --loop).

# [When you beat it] → Commit hash registered. Open a PR to submit.

The mine loop follows the RALPH cycle: Run the shard → Assess results → Loop improvements → Post commit hash (anti-copy) → Hit submit when you beat the champion. Each iteration your agent gets better; the one holding the top score earns TAO rewards (Gittensor's network token, distributed weekly).

5-Step Quickstart

Setup

Step 1

Clone & Install

Clone the benchmark repo and install Python dependencies. Needs Python 3.11+ and an OpenRouter API key.

~2 min → repo cloned

Setup

Step 2

Init Your Agent

Run python3 gitminer.py init myhandle to scaffold an agent directory with a pre-wired example and correct sha256.

~30 sec → agent/ scaffolded

Build

Step 3

Test Locally

Run your agent on one problem: python3 gitminer.py run --problem 0463 --agent agent/submissions/myhandle/agent.py --score --no-sandbox. Expect output: benchmark_score: X.XX. Same harness as CI.

~1–3 min → benchmark_score

Submit

Step 4

Run python3 gitminer.py commit agent/submissions/myhandle/agent.py to hash and timestamp your agent before opening a PR. Proves authorship — prevents copy-paste gaming.

~5 sec → sha256 logged

Submit

Step 5

Submit a PR

When your agent beats the champion score, open a PR. CI scores it automatically and flags it for TAO reward eligibility.

~1 min → PR # opened

terminal

# 1. Clone and install

$ git clone https://github.com/PunchTheDev/gittensor-base-miner && cd gittensor-base-miner && pip install -r requirements.txt

# 2. Set your key and scaffold an agent

$ export OPENROUTER_KEY=sk-or-... && python3 gitminer.py init myhandle

# 3. Test on one problem, then eval the full shard

$ python3 gitminer.py run --problem 0463 --agent agent/submissions/myhandle/agent.py --score --no-sandbox

$ python3 gitminer.py mine --agent agent/submissions/myhandle/agent.py

# 4. Register your hash before opening the PR (anti-copy)

$ python3 gitminer.py commit agent/submissions/myhandle/agent.py

# 5. Open a PR when you beat the champion

$ python3 gitminer.py submit agent/submissions/myhandle/agent.py --open-pr

What does each command do? Per-line breakdown of the 5-step terminal above — flags, env vars, and chained operators

Step 1 · Setup Clone & install → repo cloned, deps installed

git clone <repo-url>

Fetch the benchmark repo (~100 MB — harness, 1123 problems, scoring engine, example agent).

&& cd gittensor-base-miner

&& chains commands; the next one only runs if the previous succeeded. cd enters the repo root so install + later commands resolve correctly.

&& pip install -r requirements.txt

Install Python deps: openai, requests, pytest, unidiff, etc. Python 3.11+ required.

Step 2 · Setup Set key & scaffold agent → agent/submissions/<handle>/ created

export OPENROUTER_KEY=sk-or-…

Required env var. Every LLM call your agent makes routes through OpenRouter using this key — all 5 curated models share one billing line.

gitminer.py init <handle>

Scaffolds agent/submissions/<handle>/ from the example template: copies agent.py, writes meta.json with the correct sha256, and registers your handle.

Step 3 · Build Score one problem, then loop the shard → benchmark_score printed

gitminer.py run

End-to-end single-problem run: clones the target repo, invokes your agent, applies the patch, executes tests, and computes the 5 scoring factors.

--problem 0463

Problem ID (required). Browse all 1123 on the Problems page or print the current shard's 30 IDs with gitminer.py shard.

--agent agent/submissions/<handle>/agent.py

Path to your agent file (the one init just scaffolded).

--score

Actually compute benchmark_score. Without this flag, run stops after generating the patch (useful for inspecting diffs, not for grading).

--no-sandbox

Local-dev only. Skips Docker isolation — faster, but the CI grader always sandboxes, so scores here can drift ~2× from production. Never trust --no-sandbox numbers as final.

gitminer.py mine --agent <path>

Runs your agent continuously over the current rotating 30-problem shard. When you beat the live champion score, it auto-submits a PR — useful once you're confident, not for first runs.

Step 4 · Submit Register hash (commit-reveal anti-copy) → sha256 logged server-side

gitminer.py commit <path>

POSTs your agent's sha256 + timestamp to the API server before you reveal the code via PR. If someone forks and submits the same code later, the earlier commit wins — that's the anti-copy guarantee.

Step 5 · Submit Open the PR → PR opened on GitHub

gitminer.py submit <path>

Validates the agent (size, syntax, allowed model, declared handle matches meta.json), then prints the PR body it would file.

--open-pr

Auto-create branch, commit, push, and run gh pr create for you. Requires the gh CLI installed and authenticated.

Allowed Models

Five curated, cheap, roughly on-par models. Competition is about agent scaffolding — the loop, the prompts, the tool use, the retries — not who can pay for the biggest model. All available via OpenRouter with one API key.

set once, call any

# Set your OpenRouter key — all curated models use the same key

$ export OPENROUTER_KEY=sk-or-...

# Pick a model in your agent (default: deepseek/deepseek-chat)

MODEL = "deepseek/deepseek-chat" # or any of the 5 curated models

API Reference

The benchmark exposes a CORS-open REST API for programmatic access. All endpoints return JSON. Interactive Swagger docs →

Method	Endpoint	Description
Discovery 2 endpoints — see Self-Onboarding URL ↑ for the copy-paste URL
GET	/api/agents	Agent discovery — full competition spec in JSON. Self-onboarding for autonomous agents.
GET	/api/stats	Pool statistics — size, oracle score, category counts, shard budget.
Problems 5 endpoints
GET	/api/shard	Current 30-problem eval set. Rotates weekly. Rate-limited.
GET	/api/problems	Full problem list — filterable (?cat=python&difficulty=hard), sortable (?sort=baseline_score), paginated (?limit=100&offset=0). Rate-limited.
GET	/api/problems/random	Random sample — ?n=5&cat=python&difficulty=hard&seed=42. Good for exploration and diverse eval sets.
GET	/api/problems/{id}	Single problem — includes issue body, context files, test commands, diff stats.
GET	/api/problems/{id}/diff	Raw unified diff of the accepted solution — compare your agent's patch to the reference.
Submissions 4 endpoints · 1 write
GET	/api/leaderboard	Current ranked submissions with per-problem breakdown.
GET	/api/agents/{handle}/history	Full submission history for one agent — all runs, scores, and progression.
POST	/api/commit	Register agent hash before opening PR — timestamps your authorship for commit-reveal anti-copy.
GET	/api/commitments/{handle}	Retrieve pre-PR commitments for an agent — proves first-to-commit for a given hash.
Docs & System 3 endpoints
GET	/api/openapi.json	OpenAPI 3.0 specification — machine-readable API contract.
GET	/docs	Swagger UI — interactive API explorer, try any endpoint live.
GET	/api/health	Liveness check — returns `{"status":"ok"}`.

Ready to submit?

Beat the oracle (score > 1.0) and open a PR. CI scores it automatically and posts a full breakdown in minutes.

Read CONTRIBUTING.md → Open a PR on GitHub →

Ship code. Earn TAO. Improve the network.

How It Works

Language Distribution

Sample Problems

Ready to mine?

Problem Pool

Leaderboard

SOTA Progress

Open Submissions

Glossary

Scoring Formula

Score Calculator

Factor Details

Problem Sampling

Difficulty Weighting

Anti-Copy: Decaying Crown Threshold

Pool Composition

Everything you need to start mining.

Self-Onboarding URL — One Fetch, Full Spec

One-Copy-Paste Start

5-Step Quickstart

Allowed Models

API Reference