Deploy a privacy-respecting AI coding assistant with Mistral Small 4 and SearXNG using Docker on ARM64 hardware.

OpenHands Setup with Mistral-via-SGLang: The Multi-Arch Container Recipe

New to self-hosting AI? The Self-Hosted AI: Start Here hub walks the hardware-decision tree, inference-engine choice, and the operational gotchas that bite hardest in the first three months. Read it before or after this one, whichever fits your stage.


⚠ Update 2026-05-13: this stack retired OpenHands. The recipe below still works for anyone running OpenHands today (the fixes hold, the workarounds hold), but the agent layer on sovgrid moved to opencode as of 2026-05-13. Reason: the structural Microagent-injection bug in OpenHands #14287 (synthetic USER turn after every user message → Mistral strict-alternation 400) kept generating new shapes after each fix. Eight published fix-articles deep, the cost-benefit no longer favored keeping OpenHands in the chair. opencode is provider-agnostic, has no synthetic-USER-injection pattern, and ships as CLI + Electron desktop + opencode serve. Three frontends from one config. Plus the LLM-stack migration to Qwen3.6 (no alternation strictness in template) closes the bug class structurally. Migration recipe: opencode Setup: Self-Hosted AI Coding Assistant on ARM64.


Quick Take

  • Replace cloud AI coding assistants with a self-hosted alternative that respects privacy
  • Run Mistral Small 4 locally with Docker on ARM64 hardware
  • Integrate SearXNG for web search without exposing your queries to third parties

Setup notes (2026-05-03 polish pass): image tags and the local inference endpoint differ between hosts, the values shown below match a Sovereign-AI-Grid setup (Mistral Small 4 served by SGLang on DGX Spark, OpenHands as a client container). Always cross-check the current image tag at the official OpenHands docs and your own SGLang port before running.

Honest disclosure up front: OpenHands + Mistral is the agent setup I keep around to validate that the local stack still works end-to-end. It is not the tool I reach for daily. For day-to-day work I use OpenClaw (persona orchestration, Matrix bot) and Claude Code (cloud, the polish-pass driver for this very blog). The honest comparison:

ToolHostedModelBest forMy daily use
OpenHands + Mistralself-hosted (SGLang + Docker)Mistral Small 4 (119B MoE)sandboxed multi-step file edits, full agent loop on local hardwarerare, validation-only
OpenClawself-hosted (Side-Car-Proxy + SGLang)Mistral Small 4persona orchestration (cipherfox, hexabella), Matrix-bot, agent identity workactive for persona-driven work
Claude Code (CLI)cloud (Anthropic API)Claude Opus / Sonnet 4.xlarge-context reasoning across this codebase, editorial polish on Mistral output, planningdaily driver

Why this matters: a lot of “self-hosted everything” content under-reports that their authors still reach for cloud Claude when the work demands it. This blog does not pretend otherwise. Privacy-by-design is the floor, not a vow of Mistral-only purity.

You’ve outgrown cloud-hosted AI coding tools. The latency, the privacy concerns, the subscription costs, it all adds up. OpenHands gives you a local AI agent that writes code, fixes bugs, and searches the web without ever leaving your network. Here’s how to set it up right.

Two pieces matter and are worth separating up front: OpenHands is the agent client (a small Docker container), and the LLM that powers it runs somewhere else, an OpenAI-compatible inference server like SGLang or vLLM. The two communicate over HTTP. In our setup the LLM is Mistral Small 4 (a 119B-parameter MoE) served by SGLang on a DGX Spark, and OpenHands runs in its own container alongside SearXNG.

Deploy OpenHands with Docker on ARM64

sudo bash /scripts/recreate-openhands.sh

This script creates a Docker container named openhands on ARM64 hardware. The container itself is a thin coordination runtime, the heavy lifting happens on the inference server it talks to. ARM64 support matters when the agent client lives on a different machine than the inference server, e.g. an ARM-based Floki-style VPS calling back to a GPU-host SGLang endpoint. The container image is at docker.all-hands.dev/all-hands-ai/openhands (multi-arch manifest, ARM64 and AMD64 both resolved automatically).

The agent talks to your local OpenAI-compatible endpoint, in this setup http://sglang:30000/v1 (Docker-network name) or http://127.0.0.1:30000/v1 (host network). The api_key field is set to "not-needed-local" because SGLang doesn’t require authentication when running on a private network.

Gotcha: if you ever see standard_init_linux.go:228: exec user process caused: exec format error, your daemon is pulling the wrong architecture, double-check docker info | grep Architecture and the image manifest with docker manifest inspect docker.all-hands.dev/all-hands-ai/openhands:<TAG>.

Configure the Agent with config.toml

[llm]
model = "openai/Mistral-Small-Instruct-2501"
base_url = "http://sglang:30000/v1"
api_key = "not-needed-local"
native_tool_calling = true
drop_params = true
modify_params = true

[agent]
enable_prompt_extensions = false
system_prompt_filename = "custom_system_prompt.md"

[sandbox]
additional_networks = ["config_default"]
volumes = "/data/secrets/git-credentials:/root/.gitcredentials:ro,/data/openhands-state/.gitconfig:/root/.gitconfig:ro"

The [llm] section tells OpenHands to use Mistral Small 4 via the local SGLang endpoint. native_tool_calling enables function calling so the agent can execute shell commands and modify files directly. drop_params and modify_params strip OpenAI-specific request fields that SGLang and other local servers don’t accept.

The [agent] section disables prompt extensions, this is the load-bearing line for Mistral. Without enable_prompt_extensions = false, OpenHands inserts auxiliary system messages that put Mistral into an alternating-roles loop and the inference call fails with BadRequestError. The system_prompt_filename points to a custom prompt file you mount into the container at /etc/openhands/custom_system_prompt.md. Newer OpenHands releases use system_prompt_filename instead of the older system_prompt_addition, check the upstream changelog if your version differs.

The [sandbox] section configures the agent’s execution environment. additional_networks attaches the container to the same Docker network as SGLang and SearXNG, so the agent can call inference and web search without exposing traffic to external services. The volumes mount your Git credentials and config file read-only so the agent can clone private repos. /data/secrets/git-credentials holds your HTTPS credentials, /data/openhands-state/.gitconfig sets the commit name and email.

Mount Secrets and Config Files

docker run -v /data/secrets/git-credentials:/root/.gitcredentials:ro \
           -v /data/openhands-state/.gitconfig:/root/.gitconfig:ro \
           -v /etc/openhands/custom_system_prompt.md:/etc/openhands/custom_system_prompt.md:ro \
           docker.all-hands.dev/all-hands-ai/openhands:<TAG>

These mounts give the agent access to your Git credentials and config without letting it modify them. The .gitcredentials file contains HTTPS credentials for private repos, typically stored in /home/username/.git-credentials on your host machine. The .gitconfig file sets your name and email for commits, usually located at /home/username/.gitconfig.

Gotcha: If the agent can’t clone a private repo, double-check the permissions on /data/secrets/git-credentials. The container runs as root, so the file must be readable by root. You might see errors like fatal: could not read Username for 'https://github.com': No such device or address if permissions are incorrect.

OpenHands runs in the same Docker network as SearXNG. The agent uses SearXNG’s /search endpoint to perform web searches without leaking queries to Google or Bing. SearXNG is a privacy-respecting metasearch engine that aggregates results from multiple sources while keeping your queries local.

docker network create config_default
docker run -d --network config_default --name searxng -p 8080:8080 searxng/searxng:latest

Attach the OpenHands container to this network:

docker run --network config_default -p 30000:30000 docker.all-hands.dev/all-hands-ai/openhands:<TAG>

Now when the agent needs to search for documentation or examples, it queries SearXNG instead of an external API. The results are cached locally, so repeated searches are faster. You can verify the setup by visiting http://localhost:8080 in your browser to see SearXNG’s interface.

Gotcha: If SearXNG isn’t reachable, verify the container names and network attachment. Docker’s default bridge network isolates containers unless you explicitly attach them to a custom network. You might see errors like Failed to fetch search results: Connection refused if the network isn’t properly configured.

Memory Limits, Container vs Model

It is worth being explicit about what needs memory in this setup. The OpenHands container is small (a few hundred MB resident, 1, 2 GB working set is plenty for the agent runtime). The LLM is the part that needs serious memory, and it lives on the inference server, not in the OpenHands container.

# OpenHands client container, modest limits are fine
docker run --memory=2g --memory-swap=2g docker.all-hands.dev/all-hands-ai/openhands:<TAG>

For the inference side, Mistral Small 4 is a 119B-parameter MoE. Even at INT4 quantization it wants well over 100 GB of memory available to the inference server (DGX Spark’s unified 128 GB makes this workable). If you do not have GPU-class hardware, point OpenHands at a smaller model on a smaller server, the agent client does not care which model sits behind the OpenAI-compatible URL.

Gotcha: OOM killer terminated this process from the OpenHands container points to the agent runtime, raise the container limit to 3, 4 GB and check docker stats openhands. The same error from the SGLang container is a different problem entirely (insufficient host memory for the model weights or KV cache), and the fix is on the model-server side, not here.

What I Actually Use

  • For self-hosted agent work, daily: OpenClaw, not OpenHands. OpenClaw’s persona orchestration and Matrix integration fit my workflow better, OpenHands is around for the agent-loop validation case.
  • For polish-pass and large-context reasoning, daily: Claude Code (cloud). The blog itself is edited mostly through Claude Code sessions.
  • Inference layer: Mistral Small 4 (119B MoE) served by SGLang on DGX Spark, the model that pays back the hardware spend.
  • Search layer: SearXNG, self-hosted metasearch, no fixed pin, follow upstream searxng/searxng:latest.
  • Networking: a single Docker network for OpenHands + SGLang + SearXNG so all three resolve each other by container name.
Stack

Self-hosted AI Coding Agent

OpenHands architecture with local LLM and privacy

8
Git Volumes Mounted credentials
7
SearXNG Privacy search
6
OpenAI API Local endpoint
5
Mistral Small 4 Local LLM
4
OpenHands Coding agent container
3
Docker Container runtime
2
OS Linux host
1
Hardware ARM64 device

Was this worth it? Zap the article.

Value for value, no signup. Sats go straight to the writer.