Trust Scorecard methodology

How the GitGenAI Trust Scorecard is computed for MCP servers and Anthropic Agent Skills. One pure function, registry-published signals only, recomputed live on every request.

Scope is deliberately narrow

The Trust Scorecard scores registry-published signals — provenance, transparency, stability, connectivity, freshness for MCPs; provenance, activation quality, determinism, discoverability, freshness for skills. It does not scan code, audit runtime behaviour, or check for CVEs.

Pair it with deep scanners (AgentSeal, Astrix, AgentForge) for code-level depth. The two layers compose — directory hygiene from us, behavioural depth from them.

Score → grade

Five dimensions sum to 0–100. Grades follow a school-grade curve, with thresholds chosen so an A+ requires near-perfect signals across every dimension and an F is reserved for entries with effectively no public footprint.

| Grade | Score range | Interpretation |
| --- | --- | --- |
| A+ | 95–100 | Anthropic-grade. Verified publisher, complete metadata, fresh, deterministic install. |
| A | 85–94 | Excellent. Known publisher, well-documented, recently verified. |
| B | 70–84 | Good. Public repo, some metadata gaps but installable. |
| C | 50–69 | Acceptable. Enough to evaluate, missing meaningful signals. |
| D | 30–49 | Caution. Significant signal gaps — verify before depending on it. |
| F | 0–29 | Avoid. Anonymous or stale, can't be reasonably trusted from registry data alone. |
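
The thresholds above can be sketched as a single mapping function. This is an illustration of the curve, not the production code, which lives alongside the scorers in src/app/lib/mcp-trust-score.ts:

```typescript
type Grade = "A+" | "A" | "B" | "C" | "D" | "F";

// Map a 0–100 total to a letter grade using the thresholds in the table above.
function gradeFor(score: number): Grade {
  if (score >= 95) return "A+";
  if (score >= 85) return "A";
  if (score >= 70) return "B";
  if (score >= 50) return "C";
  if (score >= 30) return "D";
  return "F";
}
```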

MCP scorecard dimensions

Source code: src/app/lib/mcp-trust-score.ts. Every input comes from the live /api/catalog/mcp response.

Provenance

Weight: 30 pts

  • Verified publisher (Anthropic, modelcontextprotocol, Cloudflare, Vercel, Microsoft, Google, Stripe, Linear, Atlassian, Sentry, Notion, Figma, Canva, Supabase, OpenAI) → 30
  • Public GitHub org but not in allowlist → 20
  • Repository link present, org unrecognised → 10
  • No public repository → 0
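
The tiers reduce to a small lookup against the repository URL. A minimal sketch, with a truncated allowlist (the full list is in the bullet above) and a simplified URL parse:

```typescript
// Subset of the verified-publisher allowlist shown above.
const VERIFIED_ORGS = new Set([
  "anthropic", "modelcontextprotocol", "cloudflare", "vercel", "stripe",
]);

function provenanceScore(repoUrl: string | null): number {
  if (!repoUrl) return 0;                                  // no public repository
  const m = repoUrl.match(/github\.com\/([^/]+)/i);
  if (!m) return 10;                                       // repo linked, org unrecognised
  return VERIFIED_ORGS.has(m[1].toLowerCase()) ? 30 : 20;  // allowlisted vs known GitHub org
}
```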

Transparency

Weight: 25 pts

  • Repository linked → +15
  • Logo / image URL present → +5
  • Description ≥ 100 characters → +5
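
The three checks are purely additive. A sketch under assumed field names (the live /api/catalog/mcp payload may name these differently):

```typescript
// Illustrative shape only — not the actual catalog response type.
interface McpEntry { repository?: string; logoUrl?: string; description?: string; }

function transparencyScore(entry: McpEntry): number {
  let pts = 0;
  if (entry.repository) pts += 15;                        // repository linked
  if (entry.logoUrl) pts += 5;                            // logo / image URL present
  if ((entry.description ?? "").length >= 100) pts += 5;  // substantive description
  return pts; // max 25
}
```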

Stability

Weight: 20 pts

  • Semver-pinned version (e.g. 1.4.2) → +10
  • Major version ≥ 1 (signals stable API contract) → +5
  • Published to multiple package registries (npm + PyPI etc.) → +5
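
A sketch of the version checks, assuming a plain `major.minor.patch` match is enough to count as semver-pinned (pre-release and build suffixes are ignored here for brevity):

```typescript
const SEMVER = /^(\d+)\.(\d+)\.(\d+)$/;

function stabilityScore(version: string | undefined, registries: string[]): number {
  let pts = 0;
  const m = version?.match(SEMVER);
  if (m) pts += 10;                       // semver-pinned, e.g. 1.4.2
  if (m && Number(m[1]) >= 1) pts += 5;   // major ≥ 1 signals a stable API contract
  if (registries.length > 1) pts += 5;    // published to npm + PyPI etc.
  return pts; // max 20
}
```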

Connectivity

Weight: 15 pts

  • At least one install method declared (packages or remotes) → +8
  • Transport type declared (stdio / sse / streamable-http) → +4
  • Both packages and remotes available (multi-modal install) → +3
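
Sketched with package/remote counts and an optional transport string; the real scorer reads these off the catalog entry, and the parameter shapes here are assumptions:

```typescript
function connectivityScore(pkgCount: number, remoteCount: number, transport?: string): number {
  let pts = 0;
  if (pkgCount + remoteCount > 0) pts += 8;   // at least one install method
  if (transport) pts += 4;                    // stdio / sse / streamable-http declared
  if (pkgCount > 0 && remoteCount > 0) pts += 3; // multi-modal install
  return pts; // max 15
}
```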

Freshness

Weight: 10 pts

  • Verified ≤ 30 days ago → 10
  • ≤ 90 days → 7
  • ≤ 365 days → 4
  • Older than a year → 0
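
The age curve is a simple step function over days since verification. A minimal sketch:

```typescript
function freshnessScore(verifiedAt: Date, now: Date = new Date()): number {
  const days = (now.getTime() - verifiedAt.getTime()) / 86_400_000; // ms per day
  if (days <= 30) return 10;
  if (days <= 90) return 7;
  if (days <= 365) return 4;
  return 0; // older than a year
}
```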

Skill scorecard dimensions

Source code: src/app/lib/skill-trust-score.ts. Every input comes from the live /api/catalog/skill response.

Provenance

Weight: 30 pts

  • Same allowlist as MCP — verified publisher → 30, known GH org → 20, recognised repo → 10, none → 0.

Activation

Weight: 25 pts

  • YAML frontmatter present → +5
  • `description` field ≥ 100 characters → +5
  • `description` contains trigger language (when, if, use this skill, any time, trigger, whenever) → +10
  • Body intro ≥ 50 characters of prose → +5
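
These checks can be sketched as one additive function. The trigger patterns mirror the phrase list above; the boolean/string inputs are an assumed simplification of parsing the actual SKILL.md:

```typescript
// Trigger phrases per the bullet above; \bwhen(ever)?\b covers both "when" and "whenever".
const TRIGGERS = [/\bwhen(ever)?\b/i, /\bif\b/i, /use this skill/i, /any time/i, /\btrigger\b/i];

function activationScore(hasFrontmatter: boolean, description: string, intro: string): number {
  let pts = 0;
  if (hasFrontmatter) pts += 5;                           // YAML frontmatter present
  if (description.length >= 100) pts += 5;                // substantive description field
  if (TRIGGERS.some((re) => re.test(description))) pts += 10; // trigger language
  if (intro.length >= 50) pts += 5;                       // body opens with real prose
  return pts; // max 25
}
```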

Determinism

Weight: 20 pts

  • Pinned to a 40-char commit SHA (immutable install) → +12
  • Tree mode — multi-file skill with scripts/ or references/ → +8
  • Blob mode — single SKILL.md → +4
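
A sketch of the two checks, assuming the SHA test is a 40-character lowercase-hex match and tree/blob are mutually exclusive modes:

```typescript
const COMMIT_SHA = /^[0-9a-f]{40}$/;

function determinismScore(ref: string, mode: "tree" | "blob"): number {
  let pts = 0;
  if (COMMIT_SHA.test(ref)) pts += 12;  // pinned to an immutable commit SHA
  pts += mode === "tree" ? 8 : 4;       // multi-file skill vs single SKILL.md
  return pts; // max 20
}
```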

Discoverability

Weight: 15 pts

  • Path follows the `skills/<name>` convention → +8
  • Repo name signals a skill catalogue (`skills`, `agent-skills`) → +7
  • Repo present but not skill-specific → +3

Freshness

Weight: 10 pts

  • Same age curve as MCP — ≤30d → 10, ≤90d → 7, ≤365d → 4, older → 0.

State icons in the breakdown

ok

Earned at least 80% of the dimension’s max.

partial

Earned more than zero but below the ok threshold.

miss

Earned zero on this dimension.
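
The three states follow directly from earned points versus a dimension's maximum, e.g.:

```typescript
type State = "ok" | "partial" | "miss";

function stateFor(earned: number, max: number): State {
  if (earned === 0) return "miss";           // no signal at all
  return earned >= 0.8 * max ? "ok" : "partial"; // 80% of max earns "ok"
}
```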

Determinism + freshness guarantee

  • The scorer is a pure function. Same inputs → same outputs, every render. No external API calls, no LLM judgments, no caches we have to invalidate.
  • Badges recompute from D1 and are cached in KV for 1 hour with matching edge cache headers. Registry data refreshes daily at 03:00 UTC, so the staleness ceiling is small while repeated README embeds avoid redundant D1/scorer work.
  • The verified-publisher allowlist is conservative by design. We’d rather give Anthropic a partial 20-point Provenance score until they ship a registered io.anthropic identity than mistakenly verify a typosquatter.
  • This page is the source of truth for the weights. If the scorer changes, this page changes in the same commit.

Embed a scorecard badge in your README

Open any MCP server or Anthropic skill detail page → “Trust Scorecard” → “Embed badge” tab. Markdown and HTML snippets are pre-built. The badge updates automatically as the underlying registry data changes.