A deep dive into a self-hosted AI operations dashboard that replaces cloud dashboards with a privacy-first, hardware-aware control plane. Learn how the Service tab was rebuilt to handle long commands without layout breaks, how service control works at the system level, and why a single source of truth for your AI stack matters.

Sovereign Grid Dashboard: Architecture, Service Tab Overhaul, and Service Control Pattern

New here? The Self-Hosted AI: Start Here hub article covers the broader stack this service runs inside: the hardware tree, the inference engine choice, the minimum-viable deploy. Read that for context, then come back here for the service-specific details.

The Sovereign Grid Dashboard is a self-hosted operations center for a Sovereign AI stack, built to replace cloud dashboards with a system that understands your hardware and respects your privacy.

Quick Take

  • Replaces cloud dashboards with a local, hardware-aware control plane
  • Service tab now handles long commands without layout breaks
  • Service control runs via systemd with sudoers whitelisting
  • Single source of truth for 12+ services across 6 categories

What the Sovereign Grid Dashboard Actually Does

The Sovereign Grid Dashboard is a single-page web interface that exposes the state of a self-hosted AI grid. It runs on a loopback FastAPI backend (Python) and a reactive frontend (React 18 without JSX) served from a single dashboard.html file. The dashboard shows real-time resource usage, service health, Tor hidden services, filesystem integrity, backup status, and control endpoints for GPU services and pipelines.

The backend (grid_api.py) is about 41 KB of Python and exposes endpoints under /api/, while the frontend weighs in at 72 KB with all CSS and JavaScript inlined. The server listens on port 8443 for local access and 9443 for Tailscale via Caddy. Authentication uses a Bearer token stored at /data/secrets/dashboard/api_token.
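For illustration, a minimal client sketch for calling the API from a script. Only the Bearer scheme, token path, and port come from the description above; the plain-HTTP scheme on the loopback listener is an assumption, and the example path is just a placeholder.

```python
from pathlib import Path
from urllib.request import Request, urlopen

# Token location from the setup above.
TOKEN_PATH = Path("/data/secrets/dashboard/api_token")


def auth_headers(token: str) -> dict:
    """Build the Authorization header the dashboard API expects."""
    return {"Authorization": f"Bearer {token}"}


def get(path: str) -> bytes:
    """Fetch an /api/ path over the local listener (port 8443).

    Scheme assumed to be plain HTTP on loopback; adjust if Caddy
    terminates TLS locally.
    """
    token = TOKEN_PATH.read_text().strip()
    req = Request(f"http://127.0.0.1:8443{path}", headers=auth_headers(token))
    with urlopen(req) as resp:
        return resp.read()
```

A call like `get("/api/service/job")` would then poll the current control job.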

In my case, I use this dashboard to monitor a DGX Spark ARM64 server running Mistral Small 4 119B models. The dashboard replaces a cloud-based monitoring tool that required exposing metrics publicly, which I no longer want to do.

The Service Tab Before and After the Overhaul

The Service tab previously used a CSS Grid with minmax(175px, 1fr) for card layout. When a user clicked a card to expand it, the detail panel rendered inside the grid cell, limited to 175px width. Commands with long paths or URLs wrapped into five or more lines, often mid-word. The expanded cell also pushed adjacent cards out of alignment because it increased the grid row height.

The fix involved three changes. First, I lifted the expanded state out of the card component entirely. The card became stateless, receiving expanded and onToggle as props, while the state lived in the parent app component. The expanded state resets when the user switches tabs.

Second, I moved the detail panel out of the grid and rendered it as a full-width block below the grid container. It’s now a sibling of the grid, not a child, so it takes the full container width without constraints. The code structure looks like this:

e('div', {key: category},
  e('div', {style: {display: 'grid', gridTemplateColumns: 'repeat(auto-fill, minmax(220px, 1fr))', gap: 8}},
    ...cards.map(card => e(Card, {expanded: expanded === card.id, onToggle: ...}))
  ),
  expanded && e('div', {style: {width: '100%'}}, /* tips */)
);

Third, I increased the minimum card width from 175px to 220px. Even collapsed, this gives service names and short descriptions enough horizontal space. For example, the card for Mistral Small 4 119B now displays “Mistral Small 4 119B” on one line instead of wrapping.

This means the Service tab no longer breaks layout when showing long commands, and users can copy-paste commands without manual line breaks.

How Service Control Works at the System Level

The /api/service/control endpoint executes predefined command chains as asyncio subprocesses. Each supported service and action maps to an entry in _SVC_CMDS, a dictionary of command lists. For example, starting the sglang service runs two systemd commands:

("sglang", "start"): [
  {"cmd": ["/usr/bin/sudo", "/usr/bin/systemctl", "start", "sglang-healthcheck.timer"]},
  {"cmd": ["/usr/bin/sudo", "/usr/bin/systemctl", "start", "sglang-mistral4.service"]},
],

The endpoint validates the service and action before lookup. If the combination isn’t in _SVC_CMDS, it rejects the request with a 400 error. This blocks undefined actions: a restart request for sglang, for example, is rejected because only start and stop are defined for that service.
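The validate-before-lookup step can be sketched like this. The start chain is taken from the entry shown above; the stop chain and the resolve_commands name are hypothetical illustrations (the real endpoint turns a missing key into an HTTP 400 before anything executes).

```python
# Hypothetical sketch of the lookup-and-validate pattern.
_SVC_CMDS = {
    ("sglang", "start"): [
        {"cmd": ["/usr/bin/sudo", "/usr/bin/systemctl", "start", "sglang-healthcheck.timer"]},
        {"cmd": ["/usr/bin/sudo", "/usr/bin/systemctl", "start", "sglang-mistral4.service"]},
    ],
    # Illustrative stop chain, not copied from grid_api.py.
    ("sglang", "stop"): [
        {"cmd": ["/usr/bin/sudo", "/usr/bin/systemctl", "stop", "sglang-mistral4.service"]},
    ],
}


def resolve_commands(service: str, action: str):
    """Return the command chain for (service, action), or None if undefined.

    The endpoint maps None to a 400 response; no subprocess is ever
    spawned for a combination that is not in the table.
    """
    return _SVC_CMDS.get((service, action))
```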

Each job tracks status, service, action, and logs. The frontend polls /api/service/job every two seconds. If a new control request arrives while a job is still running, the control endpoint returns a 409 Conflict to prevent overlapping commands. After completion, the job status becomes done or error, and the log remains available for display.
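The one-job-at-a-time behavior can be sketched as a small guard. The class and method names here are hypothetical; the job fields (status, service, action, logs) and the done/error terminal states come from the description above.

```python
class ServiceJobGuard:
    """Tracks at most one service-control job at a time (sketch)."""

    def __init__(self):
        # Job shape per the article: status, service, action, logs.
        self.job = None

    def try_start(self, service: str, action: str):
        """Return a new job dict, or None if one is still running.

        The endpoint maps None to an HTTP 409 Conflict.
        """
        if self.job and self.job["status"] == "running":
            return None
        self.job = {"status": "running", "service": service,
                    "action": action, "logs": []}
        return self.job

    def finish(self, ok: bool):
        """Move the current job to its terminal state; logs stay available."""
        self.job["status"] = "done" if ok else "error"
```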

To allow the dashboard to control services without a password, I added NOPASSWD entries to sudoers. For sglang, I used:

cipherfox ALL=(ALL) NOPASSWD: /usr/bin/systemctl start sglang-mistral4
cipherfox ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop sglang-mistral4
cipherfox ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart sglang-mistral4

For sovereign-mcp, I added a dedicated entry:

cipherfox ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart sovereign-mcp.service

This pattern ensures each service has a minimal, explicit sudoers entry. Wildcards are avoided because they would widen the attack surface unnecessarily when only a handful of services need control.

The Service Catalog as a Single Source of Truth

The service catalog is defined in grid_api.py as a list of dictionaries. Each entry includes fields like id, name, category, tor_dir, local_port, access, container, ssh, and systemctl_service. For example:

{"id": "sovereign-mcp", "name": "Sovereign MCP", "category": "Development",
 "local_port": 8002, "systemctl_service": "sovereign-mcp.service", ...}

The active status check uses this catalog. If systemctl_service is set, it runs systemctl is-active <service>. Otherwise, it falls back to checking the Tor hidden service directory or container health.
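The fallback order can be sketched like this; check_method and systemd_active are illustrative names, not the actual functions in grid_api.py.

```python
import subprocess


def check_method(entry: dict) -> str:
    """Pick the liveness check for a catalog entry, in the fallback
    order described above: systemd unit, then Tor dir, then container."""
    if entry.get("systemctl_service"):
        return "systemctl"
    if entry.get("tor_dir"):
        return "tor"
    return "container"


def systemd_active(unit: str) -> bool:
    """`systemctl is-active <unit>` prints 'active' when the unit is up."""
    r = subprocess.run(["systemctl", "is-active", unit],
                       capture_output=True, text=True)
    return r.stdout.strip() == "active"
```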

This single source of truth eliminates duplication. When I added support for sovereign-mcp restart via the Service tab, I only had to update the catalog and the _SVC_CMDS dictionary. The frontend automatically picked up the new entry without additional code paths.

In practice, when I added the sovereign-mcp service, the catalog entry and the sudoers file were the only changes needed. The Service tab rendered the new card immediately, and the control endpoint worked on the first try.

Why the Service Tab Layout Matters for Sovereign AI

CSS Grid is powerful but brittle when content varies. Auto-fill with variable-width content creates layout instability when panels expand inline. Moving the detail panel out of the grid and into a full-width block eliminates this instability.

This matters because Sovereign AI stacks often include services with long identifiers or paths. A card for “Mistral Small 4 119B Instruct” needs space for the full name, and a tip like curl -X POST http://localhost:8000/generate -H 'Content-Type: application/json' -d '{"prompt":"…"}' needs to render without line breaks.

The layout fix ensures the dashboard remains usable when you’re copying commands from the UI. In my case, I frequently copy model endpoints and systemd commands directly from the Service tab to a terminal. Before the overhaul, I had to manually reformat each command.

This is why the Service tab overhaul wasn’t just a frontend tweak. It’s part of building a dashboard that respects the realities of a self-hosted AI stack.

What I Actually Use

  • DGX Spark ARM64 server: Runs Mistral Small 4 119B models and hosts the dashboard.
  • systemd: Manages all services, including sglang and sovereign-mcp.
  • Caddy with Tailscale: Exposes the dashboard securely to my local network.

Operational lessons from the first month of use

Three things became obvious only after the dashboard had been running daily for a month.

The Service tab is the wrong default landing view. Most checks I do are on the inference side (SGLang health, MCP tool-call rate, recent errors), not on starting or stopping services. Switching the default landing view to a unified status panel cut the daily click overhead in half. The service controls are still important; they just aren't the most-used path.

The single source of truth for service definitions is non-negotiable but expensive. Every time a new service ships (Voxtral, podcast-pipeline, future MCPs) the catalog needs an update, and forgetting it means the dashboard silently lists stale state. The fix is a CI check: if a systemctl unit exists on the host that the catalog does not know about, fail the build. Not yet implemented, tracked as a follow-up.
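The planned check could look something like this, a minimal sketch that assumes the host's unit list is gathered separately (for example via systemctl list-unit-files); the function name is hypothetical.

```python
def catalog_drift(host_units: set, catalog: list) -> set:
    """Units installed on the host that the service catalog does not
    know about. CI would fail the build when this set is non-empty."""
    known = {e["systemctl_service"] for e in catalog
             if e.get("systemctl_service")}
    return host_units - known
```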

Service control without a password is convenient and has not yet caused an incident, but the sudoers rule is the kind of thing that becomes the post-mortem detail later. The mitigation is narrow scope (specific service, specific verb, no wildcards) and the fact that the dashboard itself is behind authentication. Worth re-checking quarterly that the scope has not drifted wider.

Stack

Sovereign Grid Dashboard: self-hosted AI operations stack, from hardware up.

  1. Hardware: DGX Spark ARM64 server
  2. OS: Linux with systemd
  3. Backend: FastAPI (41 KB Python)
  4. Frontend: React 18 (72 KB inline)
  5. Services: 12+ services via systemd
  6. Access: ports 8443/9443 via Caddy