Trust Scorecard methodology
How the GitGenAI Trust Scorecard is computed for MCP servers and Anthropic Agent Skills. One pure function, registry-published signals only, recomputed live on every request.
Scope is deliberately narrow
The Trust Scorecard scores registry-published signals — provenance, transparency, stability, connectivity, freshness for MCPs; provenance, activation quality, determinism, discoverability, freshness for skills. It does not scan code, audit runtime behaviour, or check for CVEs.
Pair it with deep scanners (AgentSeal, Astrix, AgentForge) for code-level depth. The two layers compose — directory hygiene from us, behavioural depth from them.
Score → grade
Five dimensions sum to 0–100. Grades follow a school-grade curve, with thresholds chosen so an A+ requires near-perfect signals across every dimension and an F is reserved for entries with effectively no public footprint.
| Grade | Score range | Interpretation |
|---|---|---|
| A+ | 95–100 | Anthropic-grade. Verified publisher, complete metadata, fresh, deterministic install. |
| A | 85–94 | Excellent. Known publisher, well-documented, recently verified. |
| B | 70–84 | Good. Public repo, some metadata gaps but installable. |
| C | 50–69 | Acceptable. Enough to evaluate, missing meaningful signals. |
| D | 30–49 | Caution. Significant signal gaps — verify before depending on it. |
| F | 0–29 | Avoid. Anonymous or stale, can't be reasonably trusted from registry data alone. |
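In code, the mapping is a plain threshold ladder. A minimal TypeScript sketch using the ranges from the table above (the function and type names are illustrative, not the scorer's actual exports):

```typescript
// Illustrative sketch of the grade ladder; thresholds match the table above.
type Grade = "A+" | "A" | "B" | "C" | "D" | "F";

function scoreToGrade(score: number): Grade {
  if (score >= 95) return "A+";
  if (score >= 85) return "A";
  if (score >= 70) return "B";
  if (score >= 50) return "C";
  if (score >= 30) return "D";
  return "F";
}
```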
MCP scorecard dimensions
Source code: src/app/lib/mcp-trust-score.ts. Every input comes from the live /api/catalog/mcp response.
Provenance
Weight: 30 pts
- Verified publisher (Anthropic, modelcontextprotocol, Cloudflare, Vercel, Microsoft, Google, Stripe, Linear, Atlassian, Sentry, Notion, Figma, Canva, Supabase, OpenAI) → 30
- Public GitHub org but not in allowlist → 20
- Repository link present, org unrecognised → 10
- No public repository → 0
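As a sketch, the tiering reduces to an allowlist lookup. The field names below are assumptions about the catalog payload, not the real contract in mcp-trust-score.ts:

```typescript
// Hypothetical field names; the allowlist entries mirror the list above.
const VERIFIED_ORGS = new Set([
  "anthropic", "modelcontextprotocol", "cloudflare", "vercel", "microsoft",
  "google", "stripe", "linear", "atlassian", "sentry", "notion", "figma",
  "canva", "supabase", "openai",
]);

function provenanceScore(repoUrl: string | null, githubOrg: string | null): number {
  if (!repoUrl) return 0;                    // no public repository
  if (githubOrg && VERIFIED_ORGS.has(githubOrg.toLowerCase())) return 30; // verified publisher
  if (githubOrg) return 20;                  // public GitHub org, not allowlisted
  return 10;                                 // repo link present, org unrecognised
}
```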
Transparency
Weight: 25 pts
- Repository linked → +15
- Logo / image URL present → +5
- Description ≥ 100 characters → +5
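Transparency (and Connectivity below) are simple additive checks: each present signal adds its points, capped by the dimension weight. A sketch with assumed field names:

```typescript
// Sketch only; the catalog field names are assumptions.
interface McpEntry {
  repoUrl?: string;
  logoUrl?: string;
  description?: string;
}

function transparencyScore(entry: McpEntry): number {
  let pts = 0;
  if (entry.repoUrl) pts += 15;                           // repository linked
  if (entry.logoUrl) pts += 5;                            // logo / image URL present
  if ((entry.description ?? "").length >= 100) pts += 5;  // description ≥ 100 chars
  return pts;                                             // max 25
}
```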
Stability
Weight: 20 pts
- Semver-pinned version (e.g. 1.4.2) → +10
- Major version ≥ 1 (signals stable API contract) → +5
- Published to multiple package registries (npm + PyPI etc.) → +5
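The pin check is the interesting part: presumably an exact x.y.z string counts, while ranges and dist-tags do not. A sketch (the real check in mcp-trust-score.ts may differ):

```typescript
// Assumed check: an exact x.y.z pin, not a range or dist-tag.
const EXACT_SEMVER = /^\d+\.\d+\.\d+$/;

function stabilityScore(version: string | null, registries: string[]): number {
  let pts = 0;
  if (version && EXACT_SEMVER.test(version)) {
    pts += 10;                                // semver-pinned version
    if (parseInt(version, 10) >= 1) pts += 5; // major ≥ 1: stable API contract
  }
  if (registries.length > 1) pts += 5;        // npm + PyPI etc.
  return pts;                                 // max 20
}
```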
Connectivity
Weight: 15 pts
- At least one install method declared (packages or remotes) → +8
- Transport type declared (stdio / sse / streamable-http) → +4
- Both packages and remotes available (multi-modal install) → +3
Freshness
Weight: 10 pts
- Verified ≤ 30 days ago → 10
- ≤ 90 days → 7
- ≤ 365 days → 4
- Older than a year → 0
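The curve is a date diff against the registry's last-verified timestamp. A sketch:

```typescript
// Sketch of the age curve; 86_400_000 ms per day.
function freshnessScore(verifiedAt: Date, now: Date = new Date()): number {
  const ageDays = (now.getTime() - verifiedAt.getTime()) / 86_400_000;
  if (ageDays <= 30) return 10;
  if (ageDays <= 90) return 7;
  if (ageDays <= 365) return 4;
  return 0;
}
```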
Skill scorecard dimensions
Source code: src/app/lib/skill-trust-score.ts. Every input comes from the live /api/catalog/skill response.
Provenance
Weight: 30 pts
- Same allowlist as MCP: verified publisher → 30, known GitHub org → 20, recognised repo → 10, none → 0.
Activation
Weight: 25 pts
- YAML frontmatter present → +5
- `description` field ≥ 100 characters → +5
- `description` contains trigger language (when, if, use this skill, any time, trigger, whenever) → +10
- Body intro ≥ 50 characters of prose → +5
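Trigger-language detection is presumably a keyword scan over the frontmatter description. A hypothetical version (the actual pattern in skill-trust-score.ts may match differently):

```typescript
// Hypothetical matcher built from the trigger words listed above.
const TRIGGERS = /\b(when|if|use this skill|any time|trigger|whenever)\b/i;

function activationScore(frontmatter: { description?: string } | null, bodyIntro: string): number {
  let pts = 0;
  if (frontmatter) pts += 5;                    // YAML frontmatter present
  const desc = frontmatter?.description ?? "";
  if (desc.length >= 100) pts += 5;             // description ≥ 100 chars
  if (TRIGGERS.test(desc)) pts += 10;           // contains trigger language
  if (bodyIntro.trim().length >= 50) pts += 5;  // body intro ≥ 50 chars of prose
  return pts;                                   // max 25
}
```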
Determinism
Weight: 20 pts
- Pinned to a 40-char commit SHA (immutable install) → +12
- Tree mode (multi-file skill with scripts/ or references/) → +8
- Blob mode (single SKILL.md) → +4
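A sketch of the determinism check. The mode names follow the list above; the ref test is the standard 40-hex-character SHA pattern:

```typescript
// Sketch; parameter names assumed. A full commit SHA means the install
// can never change out from under you, unlike a branch or tag ref.
const FULL_SHA = /^[0-9a-f]{40}$/i;

function determinismScore(ref: string, mode: "tree" | "blob"): number {
  let pts = FULL_SHA.test(ref) ? 12 : 0;  // pinned to an immutable commit
  pts += mode === "tree" ? 8 : 4;         // multi-file skill vs single SKILL.md
  return pts;                             // max 20
}
```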
Discoverability
Weight: 15 pts
- Path follows the `skills/<name>` convention → +8
- Repo name signals a skill catalogue (`skills`, `agent-skills`) → +7
- Repo present but not skill-specific → +3
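A sketch of the path and repo-name conventions; the exact matching rules are assumptions:

```typescript
// Hypothetical matchers for the conventions listed above.
function discoverabilityScore(path: string, repoName: string | null): number {
  let pts = 0;
  if (/^skills\/[^/]+/.test(path)) pts += 8;  // skills/<name> convention
  if (repoName === null) return pts;
  if (/\b(agent-)?skills\b/i.test(repoName)) {
    pts += 7;                                 // repo name signals a skill catalogue
  } else {
    pts += 3;                                 // repo present but not skill-specific
  }
  return pts;                                 // max 15
}
```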
Freshness
Weight: 10 pts
- Same age curve as MCP: ≤ 30 days → 10, ≤ 90 days → 7, ≤ 365 days → 4, older → 0.
State icons in the breakdown
ok
Earned at least 80% of the dimension’s max.
partial
Earned more than zero but below the ok threshold.
miss
Earned zero on this dimension.
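The three states fall out of two comparisons. A sketch (names illustrative):

```typescript
// ok ≥ 80% of the dimension's max, miss = exactly zero, everything between is partial.
type IconState = "ok" | "partial" | "miss";

function iconState(earned: number, max: number): IconState {
  if (earned === 0) return "miss";
  return earned >= 0.8 * max ? "ok" : "partial";
}
```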
Determinism + freshness guarantee
- The scorer is a pure function. Same inputs → same outputs, every render. No external API calls, no LLM judgments, no caches we have to invalidate.
- Badges recompute from D1 and are cached in KV for 1 hour with matching edge cache headers (see the sketch below). Registry data refreshes daily at 03:00 UTC, so the staleness ceiling is small while repeated README embeds avoid redundant D1/scorer work.
- The verified-publisher allowlist is conservative by design. We'd rather give Anthropic a partial 20-point Provenance score until they ship a registered io.anthropic identity than mistakenly verify a typosquatter.
- This page is the source of truth for the weights. If the scorer changes, this page changes in the same commit.
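For the badge path, a minimal Worker-style sketch of that cache layering, assuming a Cloudflare KV binding and a compute step that reads D1 and runs the scorer; this is not the actual handler:

```typescript
// Sketch under stated assumptions: a KV binding holding SVG badge bodies.
// KVNamespace comes from @cloudflare/workers-types.
async function badgeResponse(
  kv: KVNamespace,
  key: string,
  computeFromD1: () => Promise<string>,
): Promise<Response> {
  let svg = await kv.get(key);
  if (svg === null) {
    svg = await computeFromD1();                     // rescore from D1
    await kv.put(key, svg, { expirationTtl: 3600 }); // 1-hour KV TTL
  }
  return new Response(svg, {
    headers: {
      "Content-Type": "image/svg+xml",
      "Cache-Control": "public, max-age=3600",       // matching edge TTL
    },
  });
}
```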
Embed a scorecard badge in your README
Open any MCP server or Anthropic skill detail page → “Trust Scorecard” → “Embed badge” tab. Markdown and HTML snippets are pre-built. The badge updates automatically as the underlying registry data changes.