TL;DR

smolagents is Hugging Face’s deliberately small, deliberately opinionated agent library — and it is having a moment. It crossed 20,000 GitHub stars in early 2025, hit 4,100+ new stars in the first two weeks of April 2026 (per Fazm’s monthly roundup), and was just included in Hugging Face’s headline “agentic stack” alongside transformers and datasets. It’s now the default answer when someone on r/LLMDevs asks “what’s the simplest way to build an agent in 2026?”

The pitch is unusually clear:

  • The agent loop fits in ~1,000 lines of Python (see agents.py)
  • First-class CodeAgent — the LLM writes Python code as its action, not a JSON tool call. Hugging Face’s benchmarks report ~30% fewer steps and higher scores on hard tasks
  • Model-agnostic — local transformers, Ollama, Hugging Face Inference Providers, OpenAI, Anthropic, Bedrock, Azure, or anything via LiteLLM
  • Tool-agnostic — pull tools from any MCP server, LangChain, or even a Hugging Face Space
  • Modality-agnostic — text, vision, video, audio
  • Sandboxes built in — E2B, Modal, Blaxel, Docker, or Pyodide+Deno WebAssembly
  • Hub integration — agent.push_to_hub("you/my_agent") and agent.from_hub(...) for sharing
  • Apache 2.0 licensed

If you’ve been holding off on agent frameworks because LangGraph feels like XML for ML or because CrewAI hides too much behind decorators, smolagents is the version of “agent library” that respects your time. This review covers what it does, where it shines, the sharp edges, and how it stacks up against browser-use and the other agent libraries we’ve reviewed.

The Core Idea: Agents That Think in Code

Most agent frameworks make the LLM emit a JSON object describing which tool to call:

{ "tool": "web_search", "args": { "query": "leopard top speed" } }

The runtime parses that, calls the function, feeds the result back, and asks for the next JSON. This is fine for one tool call. It gets ugly the moment you want to combine tools — search three queries, average the results, then format. Now the model has to do five JSON round-trips, each one a separate LLM call, each one another opportunity to forget the format.

smolagents flips this. The LLM writes a Python code snippet as its action, the runtime executes it, and the snippet can call multiple tools as ordinary functions, loop, store intermediate results, and do real computation:

requests_to_search = ["gulf of mexico america", "greenland denmark", "tariffs"]
for request in requests_to_search:
    print(f"Here are the search results for {request}:", web_search(request))

That single action does what a JSON-tool-call agent would need three turns to do. The Hugging Face team’s benchmark paper shows this pattern uses about 30% fewer LLM calls on multi-step tasks, and open models like DeepSeek-R1 actually beat closed models when wrapped in a CodeAgent. The catch — and they’re explicit about it — is that you must run the code in a sandbox. Arbitrary code execution is the entire point, so you take security seriously or you don’t ship.

Install and First Agent in 30 Seconds

The “with toolkit” install gets you the default tools (web search, Python interpreter, file ops):

pip install "smolagents[toolkit]"

Then:

from smolagents import CodeAgent, WebSearchTool, InferenceClientModel

model = InferenceClientModel()  # uses HF Inference Providers
agent = CodeAgent(
    tools=[WebSearchTool()],
    model=model,
    stream_outputs=True,
)

agent.run(
    "How many seconds would it take for a leopard at full speed "
    "to run through the Pont des Arts?"
)

The agent will: search for the leopard’s top speed, search for the bridge’s length, compute the division in code, and return the answer. You can watch the steps stream as Python blocks. The whole thing is barely a dozen lines of user code.

Swapping Models — This Is Where It Pays Off

This is the bit you actually care about as a builder, because nobody wants to be locked into one provider. smolagents supports every realistic option:

Hugging Face Inference Providers (Together, Fireworks, etc. via HF Hub):

from smolagents import InferenceClientModel
model = InferenceClientModel(
    model_id="deepseek-ai/DeepSeek-R1",
    provider="together",
)

LiteLLM as the universal gateway (100+ providers):

import os
from smolagents import LiteLLMModel
model = LiteLLMModel(
    model_id="anthropic/claude-4-sonnet-latest",
    temperature=0.2,
    api_key=os.environ["ANTHROPIC_API_KEY"],
)

OpenAI-compatible endpoints (OpenRouter, Together, Groq, vLLM, llama.cpp server):

import os
from smolagents import OpenAIModel
model = OpenAIModel(
    model_id="openai/gpt-4o",
    api_base="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

Local transformers (everything stays on your GPU):

from smolagents import TransformersModel
model = TransformersModel(
    model_id="Qwen/Qwen3-Next-80B-A3B-Thinking",
    max_new_tokens=4096,
    device_map="auto",
)

There’s also AzureOpenAIModel and AmazonBedrockModel for enterprise. Critically, none of this changes the agent code — you swap one line and the same CodeAgent runs against a different brain. This is the contrast with frameworks where the model integration is a leaky abstraction; here it’s a single object you pass in.

Sandboxes: The Part You Cannot Skip

CodeAgent executes Python that an LLM wrote. By default smolagents ships a LocalPythonExecutor that strips imports, blocks dangerous builtins, and limits attribute access. The README is unambiguous about it:

The built-in LocalPythonExecutor is not a security sandbox. It applies some restrictions but can be bypassed and must not be used as a security boundary.

For anything beyond a personal hobby script, you pick a real sandbox:

import os
from smolagents import CodeAgent, WebSearchTool, InferenceClientModel

agent = CodeAgent(
    tools=[WebSearchTool()],
    model=InferenceClientModel(),
    executor_type="e2b",       # or "modal", "docker", "wasm"
    executor_kwargs={"api_key": os.environ["E2B_API_KEY"]},
)

  • E2B — managed Firecracker microVMs, simplest to wire up, generous free tier
  • Modal — same idea, often cheaper at scale
  • Blaxel — newer, agent-focused
  • Docker — self-hosted, full control, requires Docker daemon
  • Pyodide + Deno WebAssembly — runs the agent’s code in a browser-grade WASM sandbox; great for edge or untrusted environments where you can’t have a Docker socket

For production work, the realistic choice is E2B or Modal for cloud, or Docker for self-host. The WASM path is genuinely interesting for edge deployments — most agent frameworks have nothing equivalent.
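
Switching executors is the same one-line change as the E2B example above. A minimal sketch of the WASM path, assuming Deno and the Pyodide toolchain are installed on the host:

from smolagents import CodeAgent, InferenceClientModel

# Same agent, different executor: the model's Python runs in a Pyodide+Deno
# WebAssembly sandbox instead of a microVM or container.
agent = CodeAgent(
    tools=[],
    model=InferenceClientModel(),
    executor_type="wasm",
)
agent.run("Compute the sum of the first 100 prime numbers.")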

Tools: Pull From Anywhere

smolagents treats tool plumbing as a library problem, not a framework problem. You can write your own:

from smolagents import tool

@tool
def get_user_balance(user_id: str) -> float:
    """Returns the current account balance for a user.

    Args:
        user_id: The user's UUID.
    """
    # `billing` stands in for your own application code
    return billing.lookup(user_id)
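
The decorator turns the function into a Tool, with the docstring and type hints becoming the schema the model sees. Usage is just passing it into the tools list:

from smolagents import CodeAgent, InferenceClientModel

# The decorated function goes straight into the agent's toolbox
agent = CodeAgent(tools=[get_user_balance], model=InferenceClientModel())
agent.run("Does user 1234 have a balance above $100?")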

But you usually won’t, because you can pull tools from:

  • MCP servers — ToolCollection.from_mcp(...) connects to any MCP server (filesystem, browser, GitHub, your custom MCP)
  • LangChain — Tool.from_langchain(some_lc_tool) wraps existing LangChain tools
  • HF Spaces — Tool.from_space("user/space-id") turns any Gradio Space into a callable tool (see the sketch after this list)
  • The Hub — agent.push_to_hub(...) / agent.from_hub(...) for sharing whole agents
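
The Spaces path in particular is a one-liner. A sketch wrapping a public image-generation Space (swap in any Gradio Space id):

from smolagents import Tool

# Any Gradio Space becomes a callable tool with a name and description
image_generator = Tool.from_space(
    "black-forest-labs/FLUX.1-schnell",
    name="image_generator",
    description="Generates an image from a text prompt.",
)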

The Hub integration matters more than it sounds. It means an agent built by someone else can be downloaded and run with one line, with all its tools and prompts intact — like Docker Hub for agents. As of April 2026, the Hub already has hundreds of community-pushed smolagent configurations.
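
The round trip looks like this, reusing the agent object built earlier; a sketch assuming you are logged in via huggingface-cli login:

from smolagents import CodeAgent

# Share the agent, tools and prompts included
agent.push_to_hub("you/my_agent")

# ...and pull it back down anywhere else
agent = CodeAgent.from_hub("you/my_agent", trust_remote_code=True)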

CLI: smolagent and webagent

There’s a no-Python-required path too. After install, two commands appear on your $PATH:

# Generalist code agent from the terminal
smolagent "Plan a trip to Tokyo, Kyoto and Osaka between Mar 28 and Apr 7." \
  --model-type "InferenceClientModel" \
  --model-id "Qwen/Qwen3-Next-80B-A3B-Thinking" \
  --imports pandas numpy \
  --tools web_search

# Web-browsing agent (uses helium under the hood)
webagent "go to xyz.com/men, get to the sale section, click the first \
  clothing item, return the product details and price" \
  --model-type "LiteLLMModel" --model-id "gpt-5"

Running smolagent with no arguments launches an interactive setup wizard for picking the agent type, tools, model, and prompt. It’s the closest thing the agent space has to curl | sh-easy onboarding for non-Python users.

How CodeAgent Actually Works

The loop is the classic ReAct pattern with one twist:

  1. User task is appended to agent.memory as a chat message.
  2. The model is called with the full memory.
  3. The model’s response is parsed for a Python code block.
  4. The code is executed (in your sandbox of choice) — tool calls happen as Python function calls.
  5. If the code calls final_answer(...), that argument is returned and the loop ends.
  6. Otherwise, execution logs (stdout, return values, errors) are appended to memory and the loop continues.

This is significant because the model’s output is executed, not interpreted by a custom DSL. If the model writes print(web_search("foo")), you get the web_search result printed and added to memory exactly as Python would print it. There is no abstraction layer between “what the model wrote” and “what happened.” When debugging, you read code and stdout, not a tool-call trace.
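
In pseudocode, with streaming and error handling stripped out (extract_python_block and execute_in_sandbox are illustrative stand-ins, not smolagents APIs):

# Illustrative pseudocode of the loop above, not the library's actual source
def agent_loop(task, model, max_steps=10):
    memory = [{"role": "user", "content": task}]          # 1. task into memory
    for _ in range(max_steps):
        response = model(memory)                          # 2. call with full memory
        code = extract_python_block(response)             # 3. parse the code block
        logs, final = execute_in_sandbox(code)            # 4. run it; tools are plain functions
        if final is not None:                             # 5. final_answer(...) was called
            return final
        memory.append({"role": "assistant", "content": response})
        memory.append({"role": "user", "content": f"Execution logs:\n{logs}"})  # 6. feed logs back
    raise RuntimeError("Hit max_steps without a final answer")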

There’s also a more conventional ToolCallingAgent for cases where you specifically want JSON-style tool calls (some smaller models handle that better than free-form code). Same agent loop, different action format.
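
Switching is just a different class with the same constructor shape:

from smolagents import ToolCallingAgent, WebSearchTool, InferenceClientModel

# Same loop, but actions are JSON tool calls instead of Python snippets
agent = ToolCallingAgent(tools=[WebSearchTool()], model=InferenceClientModel())
agent.run("What is the top speed of a leopard?")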

Multi-Agent Hierarchies

A CodeAgent can have other agents as tools. That’s the entire multi-agent story — no special “manager” abstractions, no graph DSL, just composition:

from smolagents import CodeAgent, ToolCallingAgent, WebSearchTool, InferenceClientModel

researcher = ToolCallingAgent(
    tools=[WebSearchTool()],
    model=InferenceClientModel(),
    name="researcher",
    description="Searches the web and returns concise findings.",
)

writer = CodeAgent(
    tools=[],
    model=InferenceClientModel(),
    managed_agents=[researcher],
)

writer.run("Write a 200-word brief on Llama 4 Scout based on web sources.")

The writer can now call researcher(query=...) from inside its Python action like any other tool. This is the same pattern CrewAI gives you with more ceremony, and it’s the same conceptual model used by multica — just expressed as ordinary Python composition.

Community Reaction

The reception across r/LLMDevs, r/AI_Agents, and r/huggingface has been unusually positive for an agent framework. Common themes from the threads:

  • “The elegance of smolagents and PydanticAI is that you do that through idiomatic Python instead of a DSL, like conditional edges in LangGraph.”
  • “It’s definitely better than [chat HuggingFace] as a means of inference of chatmodels on Huggingface.”
  • The ~1,000 LOC core gets praised constantly. People actually read the source, which is rare for this category.

The honest gripes:

  • Documentation has been described as “brittle” — the API moves fast and examples sometimes lag releases.
  • The default LocalPythonExecutor is a footgun if you skim the docs and miss the warning. Hugging Face has tightened the wording over time but it still trips up newcomers.
  • Streaming with some local backends (especially older transformers versions) can be flaky.

Honest Limitations

  1. Code execution is not optional — even with WASM, you’re committing to running model-written code somewhere. If your security review can’t tolerate that under any circumstances, use ToolCallingAgent (JSON-only) and accept the step-count hit.
  2. Small models struggle with code actions — anything below ~7B parameters tends to write malformed Python. Code-agent mode pays off most clearly on Llama 4 / DeepSeek-R1 / Claude 4 / GPT-5-class models.
  3. No first-class persistence — agent.memory is in-process. For long-running agents you build the persistence yourself (see the sketch after this list) or pair smolagents with something like claude-mem.
  4. No built-in observability — there’s no Langfuse-style trace view bundled. You can hook stdout or use OpenTelemetry, but it’s BYO.
  5. Not a workflow engine — if you need YAML-defined deterministic pipelines with verification gates, that’s Archon’s job, not smolagents’.
  6. Vision / video tooling is thinner — supported, but the documentation skews heavily toward text. The web browser tutorial is the main vision example, and you’ll be reading the source for anything more advanced.
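
On point 3, “build it yourself” can start as simple as journaling runs at the application layer. A minimal sketch, with the file name and schema entirely up to you:

import json
import pathlib

RUNS = pathlib.Path("agent_runs.jsonl")

def run_and_log(agent, task: str):
    """Persist each task/answer pair, since agent.memory dies with the process."""
    answer = agent.run(task)
    with RUNS.open("a") as f:
        f.write(json.dumps({"task": task, "answer": str(answer)}) + "\n")
    return answer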

FAQ

smolagents vs LangGraph — which should I pick? Different tools for different problems. LangGraph is a graph DSL with explicit state machines, conditional edges, and persistence; pick it when your workflow has well-known branching that you want to declare upfront. smolagents is a pure Python library with no DSL; pick it when the workflow is the model’s reasoning and you want minimum framework between your code and the LLM. For a single-team agent that’s mostly “search, compute, answer,” smolagents wins on simplicity. For a multi-step business workflow with retries, branches, and human approval gates, LangGraph (or Archon) is closer to the right shape.

Can I run it fully locally with Ollama? Yes. Use LiteLLMModel(model_id="ollama/qwen3-coder:32b", api_base="http://localhost:11434") or TransformersModel. The catch: agent quality drops fast below 7B params, and code-agent mode specifically needs a model that’s trained on code. Qwen3-Coder-32B and Codestral-2-22B are the current local sweet spot.
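
Expanded into a runnable sketch, assuming ollama serve is running and you’ve pulled the model (num_ctx is forwarded through LiteLLM to raise Ollama’s small default context window):

from smolagents import CodeAgent, LiteLLMModel

model = LiteLLMModel(
    model_id="ollama/qwen3-coder:32b",
    api_base="http://localhost:11434",
    num_ctx=8192,  # Ollama defaults to a small context; agents need more headroom
)
agent = CodeAgent(tools=[], model=model)
agent.run("Write and run a function that returns the 10th Fibonacci number.")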

Is the LocalPythonExecutor safe for trusted prompts only? For your own machine running your own prompts, yes. The moment any input is user-controlled — chat over a webhook, an email-driven agent, anything multi-tenant — switch to E2B, Modal, Docker, or WASM. Hugging Face says this explicitly and it’s worth taking literally.

How does it compare to Hugging Face’s older transformers.agents? smolagents is the rewrite. The old transformers.agents is deprecated — same team, same ideas, but smolagents has the Code/ToolCalling split, sandbox integrations, Hub support, and MCP. If you find a tutorial that imports from transformers.agents, it’s stale.

Does it support MCP servers? Yes, via ToolCollection.from_mcp(StdioServerParameters(...)) or HTTP MCP transports. Any tool exposed by an MCP server becomes a regular Python function inside your agent’s action. This is one of the cleanest MCP integrations in any agent library.
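
A sketch of the stdio path; the server package name here is hypothetical, but any MCP server wires up the same way:

import os
from mcp import StdioServerParameters
from smolagents import CodeAgent, InferenceClientModel, ToolCollection

server = StdioServerParameters(
    command="uvx",
    args=["your-mcp-server"],  # hypothetical MCP server package
    env={**os.environ},
)

# Every tool the server exposes becomes a plain Python function for the agent
with ToolCollection.from_mcp(server, trust_remote_code=True) as tools:
    agent = CodeAgent(tools=[*tools.tools], model=InferenceClientModel())
    agent.run("List the tools you have available and summarize what each does.")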

Is it production-ready? For internal tooling, yes — it’s been live in Hugging Face’s own products since early 2025 and the agent loop is small enough to read in an afternoon. For customer-facing, multi-tenant production, you’re responsible for the sandbox choice, observability, and persistence. The core is solid; the surrounding ops story you build yourself.

Who Should Use smolagents Today

  • Builders prototyping agents who want to avoid framework lock-in
  • Anyone tired of LangChain’s surface area — the smolagents source is small enough to fork
  • Teams building on HF Inference Providers — the integration is first-class
  • Researchers running agent benchmarks across many models — swapping models is one line
  • Anyone with an MCP toolbox they want to wire up to a code-writing agent without bespoke glue

Skip it if: you need a visual workflow builder (Archon), pre-built coordinator/team patterns (multica), or a turnkey browser-only agent (browser-use). And if you’re committed to never executing model-written code, use ToolCallingAgent mode — but at that point most of the value prop evaporates.

The Bigger Picture

The agent library landscape in 2026 has bifurcated. On one side: heavy frameworks (LangGraph, CrewAI, increasingly Archon) that encode workflow structure as a first-class object. On the other: minimalist libraries that get out of the way — smolagents, PydanticAI, and block/goose.

smolagents’ bet is that as frontier models get better at code generation, the right level of abstraction collapses upward into the model itself. You don’t need a DAG of tool calls if the model can write the DAG inline as Python. The framework’s job shrinks to handing the model a sandbox and a list of tools, then staying out of the way.

That bet is looking increasingly correct. Code-action agents now match or beat tool-calling agents on most public benchmarks. The next frontier is making the sandbox cheap enough (WASM) and the toolchain rich enough (MCP) that “write Python in a sandbox” becomes the default agent runtime — not a clever trick.

If you’ve been waiting to pick an agent library because the field looked too messy, this is the cleanest entry point left. Star it, read agents.py, and ship something this week.