The agent stack

Five layers under every robot we ship.

We document the architecture because the buyers who matter ask. If your CTO wants a deeper read or a custom variant, write us. We will send the whitepaper and a sample repo.

L5

Telemetry & Ops

Health, latency, intent logs, alerts, fleet view, training feedback loop. The dashboard a real on-call engineer watches at 2am when something is off.

  • Prometheus
  • Grafana
  • Sentry
  • Custom audit log
  • PagerDuty / Opsgenie hooks

L4

Agent Layer

Task graph, brand voice, persona, safety envelope, escalation paths, handoff to humans. This is the layer that makes a robot feel like a coworker instead of a parrot.

  • Custom Python orchestrator
  • LangGraph patterns where they fit
  • Pydantic schemas
  • YAML personas
  • Eval harness with graded test set
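
The escalation path and human handoff described above can be sketched in a few lines. This is a minimal illustration, not our production orchestrator: the names `AgentAnswer`, `Handoff`, `respond_or_escalate`, and the 0.75 threshold are all hypothetical, and it uses standard-library dataclasses in place of Pydantic schemas so it runs without dependencies.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical threshold; a real deployment would tune this per task.
CONFIDENCE_THRESHOLD = 0.75


@dataclass
class AgentAnswer:
    text: str
    confidence: float  # assumed to be a score in [0.0, 1.0]


@dataclass
class Handoff:
    """Structured context passed to a human operator on escalation."""
    reason: str
    user_utterance: str
    draft_answer: str
    confidence: float
    transcript: List[str] = field(default_factory=list)


def respond_or_escalate(
    utterance: str,
    answer: AgentAnswer,
    transcript: List[str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Union[AgentAnswer, Handoff]:
    """Return the answer if it clears the threshold, else a structured handoff."""
    if answer.confidence >= threshold:
        return answer
    # Below threshold: never fail silently, always hand off with context.
    return Handoff(
        reason="low_confidence",
        user_utterance=utterance,
        draft_answer=answer.text,
        confidence=answer.confidence,
        transcript=list(transcript),
    )
```

The point of the sketch is that the human receives the utterance, the draft answer, and the conversation so far, so nobody has to ask the customer to start over.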

L3

Models

Whatever model wins for the task at hand. Long-context reasoning here, fast tool-calling there, on-device speech for latency, on-device LLM when the network is unreliable.

  • Claude 4 family for reasoning
  • GPT-4o for tool calling
  • Llama 3 8B Q4 on Orin (fallback)
  • Whisper-small.en (ASR)
  • Piper / ElevenLabs (TTS)
  • Moondream 2 (VLM)
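
One way to picture "whatever model wins for the task" with an on-device fallback is a small router over a shared model interface. The sketch below is an assumption-laden illustration: `ModelBackend`, `StubBackend`, `ROUTING`, `route`, and the backend names are hypothetical, and the stub returns canned strings instead of calling any real model API.

```python
from typing import Dict, List, Protocol


class ModelBackend(Protocol):
    """The only surface the agent layer is assumed to see."""
    name: str

    def complete(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a real model client; returns a canned string."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available

    def complete(self, prompt: str) -> str:
        if not self.available:
            # Simulates an unreachable cloud endpoint.
            raise ConnectionError(f"{self.name} unreachable")
        return f"[{self.name}] {prompt}"


# Hypothetical routing config: task type -> backends in preference order.
ROUTING: Dict[str, List[str]] = {
    "reasoning": ["cloud-reasoner", "on-device-llm"],
    "tool_calling": ["cloud-tool-caller", "on-device-llm"],
}


def route(
    task: str,
    prompt: str,
    backends: Dict[str, ModelBackend],
    routing: Dict[str, List[str]] = ROUTING,
) -> str:
    """Try each configured backend in order, falling back on network errors."""
    last_error = None
    for name in routing[task]:
        try:
            return backends[name].complete(prompt)
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"no backend available for task {task!r}") from last_error
```

Under these assumptions, changing which model handles a task means editing `ROUTING`, not the code that calls `route`.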

L2

Orchestration

The wiring between the SDK and the agent. Message bus, task scheduler, sensor fusion, the bits that ROS 2 does well and the bits we replace because ROS 2 does not.

  • ROS 2 Humble
  • micro-ROS bridge
  • DDS
  • Custom Python services
  • Redis for ephemeral state
  • PostgreSQL for episodic memory

L1

Vendor SDK

The hardware floor. Manufacturer SDK, low-level motor control, sensor I/O, on-robot networking, kill switch, charging dock protocol.

  • Unitree SDK 2 (low + high level)
  • Boston Dynamics SDK
  • Figure / 1X partner APIs
  • CAN bus + GPIO for retrofits

Operating principles

Five rules we will not break.

  1. Models change. Interfaces should not. The agent layer talks to a model interface, not a model vendor's SDK. Swapping Claude for a local Llama is a config change, not a refactor.
  2. Every agent has an escalation path. If the agent cannot answer above its confidence threshold, it hands off to a human with structured context. No silent failures.
  3. Evals are not optional. Every deployment ships with a graded eval set. Regressions break the build, not the customer experience.
  4. Telemetry is a first-class citizen. If we cannot see what happened in production, we did not deploy a system, we deployed a prayer.
  5. The kill switch is hardware, not software. Anything that moves around humans gets a physical, visible, single-action stop. No exceptions.
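
Rule 3 above, "regressions break the build", can be sketched as a small eval gate. This is a toy under stated assumptions: `EvalCase`, `grade`, `gate`, and substring grading are hypothetical stand-ins, and a real graded eval set would use richer scoring than a substring match.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str  # hypothetical grading rule: answer must contain this


def grade(agent: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(
        1
        for case in cases
        if case.expected_substring.lower() in agent(case.prompt).lower()
    )
    return passed / len(cases)


def gate(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """True if the build may ship: score has not regressed below the baseline."""
    return score >= baseline - tolerance
```

In CI, a script would call `grade` on the candidate agent and exit non-zero when `gate` returns False, so a regression stops at the build rather than reaching a customer.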

Want the deeper read?

Send your CTO. We will send the whitepaper.