Trust Scorecard methodology

How the GitGenAI Trust Scorecard is computed for MCP servers and Anthropic Agent Skills. One pure function, registry-published signals only, recomputed live on every request.

Scope is deliberately narrow

The Trust Scorecard scores registry-published signals — provenance, transparency, stability, connectivity, freshness for MCPs; provenance, activation quality, determinism, discoverability, freshness for skills. It does not scan code, audit runtime behaviour, or check for CVEs.

Pair it with deep scanners (AgentSeal, Astrix, AgentForge) for code-level depth. The two layers compose — directory hygiene from us, behavioural depth from them.

Score → grade

Five dimensions sum to 0–100. Grades follow a school-grade curve, with thresholds chosen so an A+ requires near-perfect signals across every dimension and an F is reserved for entries with effectively no public footprint.

| Grade | Score range | Interpretation |
| --- | --- | --- |
| A+ | 95–100 | Anthropic-grade. Verified publisher, complete metadata, fresh, deterministic install. |
| A | 85–94 | Excellent. Known publisher, well-documented, recently verified. |
| B | 70–84 | Good. Public repo, some metadata gaps but installable. |
| C | 50–69 | Acceptable. Enough to evaluate, missing meaningful signals. |
| D | 30–49 | Caution. Significant signal gaps — verify before depending on it. |
| F | 0–29 | Avoid. Anonymous or stale, can't be reasonably trusted from registry data alone. |
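
The thresholds above can be sketched as a single mapping function. This is an illustration of the curve, not the production code, which lives alongside the scorers in src/app/lib/mcp-trust-score.ts:

```typescript
type Grade = "A+" | "A" | "B" | "C" | "D" | "F";

// Map a 0–100 total to a letter grade using the thresholds in the table above.
function gradeFor(score: number): Grade {
  if (score >= 95) return "A+";
  if (score >= 85) return "A";
  if (score >= 70) return "B";
  if (score >= 50) return "C";
  if (score >= 30) return "D";
  return "F";
}
```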

MCP scorecard dimensions

Source code: src/app/lib/mcp-trust-score.ts. Every input comes from the live /api/catalog/mcp response.

Provenance

Weight: 30 pts

  • Verified publisher (Anthropic, modelcontextprotocol, Cloudflare, Vercel, Microsoft, Google, Stripe, Linear, Atlassian, Sentry, Notion, Figma, Canva, Supabase, OpenAI) → 30
  • Public GitHub org but not in allowlist → 20
  • Repository link present, org unrecognised → 10
  • No public repository → 0
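
The tiers reduce to a small lookup against the repository URL. A minimal sketch, with a truncated allowlist (the full list is in the bullet above) and a simplified URL parse:

```typescript
// Subset of the verified-publisher allowlist shown above.
const VERIFIED_ORGS = new Set([
  "anthropic", "modelcontextprotocol", "cloudflare", "vercel", "stripe",
]);

function provenanceScore(repoUrl: string | null): number {
  if (!repoUrl) return 0;                                  // no public repository
  const m = repoUrl.match(/github\.com\/([^/]+)/i);
  if (!m) return 10;                                       // repo linked, org unrecognised
  return VERIFIED_ORGS.has(m[1].toLowerCase()) ? 30 : 20;  // allowlisted vs known GitHub org
}
```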

Transparency

Weight: 25 pts

  • Repository linked → +15
  • Logo / image URL present → +5
  • Description ≥ 100 characters → +5
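
The three checks are purely additive. A sketch under assumed field names (the live /api/catalog/mcp payload may name these differently):

```typescript
// Illustrative shape only — not the actual catalog response type.
interface McpEntry { repository?: string; logoUrl?: string; description?: string; }

function transparencyScore(entry: McpEntry): number {
  let pts = 0;
  if (entry.repository) pts += 15;                        // repository linked
  if (entry.logoUrl) pts += 5;                            // logo / image URL present
  if ((entry.description ?? "").length >= 100) pts += 5;  // substantive description
  return pts; // max 25
}
```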

Stability

Weight: 20 pts

  • Semver-pinned version (e.g. 1.4.2) → +10
  • Major version ≥ 1 (signals stable API contract) → +5
  • Published to multiple package registries (npm + PyPI etc.) → +5
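
A sketch of the version checks, assuming a plain `major.minor.patch` match is enough to count as semver-pinned (pre-release and build suffixes are ignored here for brevity):

```typescript
const SEMVER = /^(\d+)\.(\d+)\.(\d+)$/;

function stabilityScore(version: string | undefined, registries: string[]): number {
  let pts = 0;
  const m = version?.match(SEMVER);
  if (m) pts += 10;                       // semver-pinned, e.g. 1.4.2
  if (m && Number(m[1]) >= 1) pts += 5;   // major ≥ 1 signals a stable API contract
  if (registries.length > 1) pts += 5;    // published to npm + PyPI etc.
  return pts; // max 20
}
```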

Connectivity

Weight: 15 pts

  • At least one install method declared (packages or remotes) → +8
  • Transport type declared (stdio / sse / streamable-http) → +4
  • Both packages and remotes available (multi-modal install) → +3
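
Sketched with package/remote counts and an optional transport string; the real scorer reads these off the catalog entry, and the parameter shapes here are assumptions:

```typescript
function connectivityScore(pkgCount: number, remoteCount: number, transport?: string): number {
  let pts = 0;
  if (pkgCount + remoteCount > 0) pts += 8;   // at least one install method
  if (transport) pts += 4;                    // stdio / sse / streamable-http declared
  if (pkgCount > 0 && remoteCount > 0) pts += 3; // multi-modal install
  return pts; // max 15
}
```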

Freshness

Weight: 10 pts

  • Verified ≤ 30 days ago → 10
  • ≤ 90 days → 7
  • ≤ 365 days → 4
  • Older than a year → 0
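
The age curve is a simple step function over days since verification. A minimal sketch:

```typescript
function freshnessScore(verifiedAt: Date, now: Date = new Date()): number {
  const days = (now.getTime() - verifiedAt.getTime()) / 86_400_000; // ms per day
  if (days <= 30) return 10;
  if (days <= 90) return 7;
  if (days <= 365) return 4;
  return 0; // older than a year
}
```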

Skill scorecard dimensions

Source code: src/app/lib/skill-trust-score.ts. Every input comes from the live /api/catalog/skill response.

Provenance

Weight: 30 pts

  • Same allowlist as MCP — verified publisher → 30, known GH org → 20, recognised repo → 10, none → 0.

Activation

Weight: 25 pts

  • YAML frontmatter present → +5
  • `description` field ≥ 100 characters → +5
  • `description` contains trigger language (when, if, use this skill, any time, trigger, whenever) → +10
  • Body intro ≥ 50 characters of prose → +5
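
These checks can be sketched as one additive function. The trigger patterns mirror the phrase list above; the boolean/string inputs are an assumed simplification of parsing the actual SKILL.md:

```typescript
// Trigger phrases per the bullet above; \bwhen(ever)?\b covers both "when" and "whenever".
const TRIGGERS = [/\bwhen(ever)?\b/i, /\bif\b/i, /use this skill/i, /any time/i, /\btrigger\b/i];

function activationScore(hasFrontmatter: boolean, description: string, intro: string): number {
  let pts = 0;
  if (hasFrontmatter) pts += 5;                           // YAML frontmatter present
  if (description.length >= 100) pts += 5;                // substantive description field
  if (TRIGGERS.some((re) => re.test(description))) pts += 10; // trigger language
  if (intro.length >= 50) pts += 5;                       // body opens with real prose
  return pts; // max 25
}
```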

Determinism

Weight: 20 pts

  • Pinned to a 40-char commit SHA (immutable install) → +12
  • Tree mode — multi-file skill with scripts/ or references/ → +8
  • Blob mode — single SKILL.md → +4
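
A sketch of the two checks, assuming the SHA test is a 40-character lowercase-hex match and tree/blob are mutually exclusive modes:

```typescript
const COMMIT_SHA = /^[0-9a-f]{40}$/;

function determinismScore(ref: string, mode: "tree" | "blob"): number {
  let pts = 0;
  if (COMMIT_SHA.test(ref)) pts += 12;  // pinned to an immutable commit SHA
  pts += mode === "tree" ? 8 : 4;       // multi-file skill vs single SKILL.md
  return pts; // max 20
}
```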

Discoverability

Weight: 15 pts

  • Path follows the `skills/<name>` convention → +8
  • Repo name signals a skill catalogue (`skills`, `agent-skills`) → +7
  • Repo present but not skill-specific → +3

Freshness

Weight: 10 pts

  • Same age curve as MCP — ≤30d → 10, ≤90d → 7, ≤365d → 4, older → 0.

State icons in the breakdown

ok

Earned at least 80% of the dimension’s max.

partial

Earned more than zero but below the ok threshold.

miss

Earned zero on this dimension.
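
The three states follow directly from earned points versus a dimension's maximum, e.g.:

```typescript
type State = "ok" | "partial" | "miss";

function stateFor(earned: number, max: number): State {
  if (earned === 0) return "miss";           // no signal at all
  return earned >= 0.8 * max ? "ok" : "partial"; // 80% of max earns "ok"
}
```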

Determinism + freshness guarantee

  • The scorer is a pure function. Same inputs → same outputs, every render. No external API calls, no LLM judgments, no caches we have to invalidate.
  • Badges recompute from D1 and are cached in KV for 1 hour with matching edge cache headers. Registry data refreshes daily at 03:00 UTC, so the staleness ceiling is small while repeated README embeds avoid redundant D1/scorer work.
  • The verified-publisher allowlist is conservative by design. We’d rather give Anthropic a partial 20-point Provenance score until they ship a registered io.anthropic identity than mistakenly verify a typosquatter.
  • This page is the source of truth for the weights. If the scorer changes, this page changes in the same commit.

Embed a scorecard badge in your README

Open any MCP server or Anthropic skill detail page → “Trust Scorecard” → “Embed badge” tab. Markdown and HTML snippets are pre-built. The badge updates automatically as the underlying registry data changes.