# OpenRath: Session-Centered Runtime State for Agent Systems

> OpenRath introduces Session as a first-class runtime value for multi-agent systems, making state branchable, inspectable, and replayable.

- **Source:** [arXiv](https://arxiv.org/abs/2606.19409)
- **Published:** 2026-06-24
- **Permalink:** https://picx.dev/p/m23mUu
- **Whiteboard:** https://picx.dev/p/m23mUu/image

## Summary (Overview)

- **Key contribution**: OpenRath introduces **Session** as a first-class runtime value for multi-agent systems, analogous to a tensor in PyTorch but for agent runtime state. The Session is branchable, inspectable, replayable, backend-aware, and composable, addressing the hidden-runtime-state problem where conversation chunks, tool effects, memory events, workspace placement, branch provenance, and replay evidence are fragmented across side channels.
- **Programming model**: A compact vocabulary of objects (Session, Sandbox, Tool, Agent, Memory, Workflow, and Selector) is built around one `Session → Session` contract, so composition, forking, merging, handoff, and replay are ordinary program operations rather than state reconstructed after the fact.
- **Backend-aware boundaries**: OpenRath separates runtime state from execution backends (local, OpenSandbox, MCP) and memory backends, so tool evidence, sandbox placement, and memory interactions become explicit session events.
- **Audit-first release**: The report maps all claims to evidence packets (lineage export, local sandbox, workflow transcript, focused tests, visual QA, claim ledger) and explicitly scopes broader benchmark, memory-quality, and live-provider claims to follow-on evaluation.

## Introduction and Theoretical Foundation

### Problem Statement
Modern agent systems suffer from fragmented runtime state. A long agent run (planning, forking branches, calling tools, editing files in sandboxes, recalling memory, compressing context) can produce a correct final answer and still leave auditors unable to answer simple questions: *Which branch produced the result? Which tool modified which file? Which memory item was recalled? What evidence was removed during compression?* The state is scattered across controller code, tool logs, memory stores, workspace state, and provider traces.

### Central Claim
The paper’s central thesis is that **agent systems benefit from a first-class runtime state**, and OpenRath proposes **Session** as that state. The design is inspired by PyTorch’s architectural pattern (not its tensor mathematics): a central flowing value, reusable transformations with a uniform `forward` interface, explicit placement (`tensor.to(device)` → `session.to(backend)`), and persistent state (parameters → Memory). The analogy is architectural: the claim is that agent runtimes need a stable flowing value, not that agent systems are neural networks.
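
The architectural analogy can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the class names, fields, and method bodies are hypothetical stand-ins, not OpenRath's actual API. Only the shape (`session.to(backend)` and a uniform `forward`) comes from the summary above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Session:
    """Hypothetical stand-in for a Session value (not OpenRath's real class)."""
    chunks: Tuple[str, ...] = ()
    backend: str = "unbound"

    def to(self, backend: str) -> "Session":
        # Placement is explicit and returns a new value, like tensor.to(device).
        return replace(self, backend=backend)


class Agent:
    """Plays the role of a module: one uniform forward(session) entry point."""

    def __init__(self, name: str) -> None:
        self.name = name

    def forward(self, session: Session) -> Session:
        note = f"{self.name} ran on {session.backend}"
        return replace(session, chunks=session.chunks + (note,))

    def __call__(self, session: Session) -> Session:
        return self.forward(session)


session = Session(chunks=("user: summarize the repo",)).to("local")
result = Agent("planner")(session)
print(result.chunks[-1])  # planner ran on local
```

Because the sketch returns a new value from each step, the input session is left unchanged, which is what makes later forking and replay straightforward to reason about.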

### Theoretical Motivation
Three types of runtime records exist (Table 1):

| Record | Written for | What it primarily holds |
|--------|-------------|------------------------|
| Graph checkpoint | The scheduler | Where execution is in the control flow (resume/time-travel) |
| Trace span | The observer | What was observed (model calls, tool calls, guardrails) |
| **Session (OpenRath)** | **The agent program** | **The live value agents fork, merge, hand off, replay; lineage, tools, placement, memory, usage travel with it** |

A graph checkpoint or trace span is written for schedulers or observers; a Session is written for the agent program itself. This is why fork, merge, and replay become first-class runtime operations in OpenRath rather than reconstructions from side channels.

## Methodology

### Object Vocabulary (Table 3)
| Object | Runtime boundary |
|--------|------------------|
| **Session** | Flowing runtime value for chunks, placement, lineage, usage, pending work, tool evidence, and memory evidence when enabled. |
| **Agent** | Reusable `Session → Session` transformation with local prompt, provider, tools, and memory policy. |
| **Tool** | Model-visible callable operation backed by schema validation, session context, sandbox dispatch, and returned evidence. |
| **Sandbox** | Placement boundary for file, command, code, and external tool execution. |
| **Memory** | Intended persistent-state plane for recall and commit across runs, separate from prompt text. |
| **Workflow** | Composition surface for agents, tools, branches, compression, memory, and child workflows. |
| **Selector** | Runtime router over self-describing workflows: reads the current session and picks the next workflow, so dynamic control flow stays explicit. |

Key design principle: each object is narrowly scoped. An Agent does not own the conversation graph (lineage belongs to the Session); a Tool does not own placement (it executes through the active sandbox); a Workflow does not create separate orchestration state (it composes over sessions); Memory does not become hidden prompt text (recall and commit are visible runtime events).
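
One way to picture the scoping rule for tools is the sketch below, where a tool validates its arguments, runs through whatever sandbox the session holds, and returns its result or error as a chunk. All names (`Sandbox.write_file`, `Tool.required`, the chunk text format) are hypothetical and chosen only for illustration; the summary does not specify OpenRath's real signatures.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


class Sandbox:
    """Hypothetical placement boundary: the only object that touches files."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, str] = {}

    def write_file(self, path: str, text: str) -> str:
        self.files[path] = text
        return f"wrote {len(text)} chars to {path} in {self.name}"


@dataclass(frozen=True)
class Session:
    """Hypothetical Session: carries its sandbox and an ordered chunk log."""
    sandbox: Sandbox
    chunks: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Tool:
    """Hypothetical Tool: validates arguments, then runs via the session's sandbox."""
    name: str
    required: Tuple[str, ...]
    run: Callable[[Sandbox, dict], str]

    def __call__(self, session: Session, args: dict) -> Session:
        missing = [key for key in self.required if key not in args]
        if missing:
            result = f"error: missing arguments {missing}"
        else:
            # The tool never picks a backend; it uses whatever the session holds.
            result = self.run(session.sandbox, args)
        chunk = f"tool_result[{self.name}]: {result}"
        return replace(session, chunks=session.chunks + (chunk,))


write_file = Tool(
    name="write_file",
    required=("path", "text"),
    run=lambda sandbox, args: sandbox.write_file(args["path"], args["text"]),
)

session = Session(sandbox=Sandbox("local"))
ok = write_file(session, {"path": "notes.txt", "text": "hello"})
bad = write_file(session, {"path": "notes.txt"})
print(ok.chunks[-1])
print(bad.chunks[-1])
```

In this sketch both the success and the validation failure end up in the session's chunk log, so an auditor reading the session sees which tool ran, where, and with what outcome.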

### Runtime Architecture
Session lifecycle (Figure 4): Create → Place → Transform → Branch → Persist → Release.

- **Branching**: `fork` duplicates state while preserving the parent relation; `detach` starts a new lineage root; `merge` joins compatible sessions, which must share a live sandbox handle or target the same unbound backend. Because compatibility depends on placement, placement becomes part of the runtime graph.
- **Tool execution** (Figure 5): The model sees `FlowToolCall` schemas; the session loop resolves calls by name, validates arguments, and dispatches backend payloads through the session’s sandbox. Results and errors return as tool-result chunks.
- **Backend boundary** (Table 5): Placement intent, resource lifetime, capability claim, concrete execution, and evidence return are all scoped to the session’s sandbox handle.
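
The branching rules can be sketched as follows. This is an illustrative model under stated assumptions, not OpenRath's implementation: the field names (`sandbox_handle`, `parents`), the integer ids, and the chunk-merging strategy (keep the shared prefix once, then append each branch's new chunks) are all hypothetical. Only the compatibility rule itself is taken from the description above.

```python
import itertools
from dataclasses import dataclass, field, replace
from typing import Optional, Tuple

_ids = itertools.count(1)  # hypothetical id source for lineage bookkeeping


@dataclass(frozen=True)
class Session:
    backend: str
    sandbox_handle: Optional[str] = None  # None models an unbound backend
    chunks: Tuple[str, ...] = ()
    parents: Tuple[int, ...] = ()
    id: int = field(default_factory=lambda: next(_ids))

    def fork(self) -> "Session":
        # Same state, new identity, parent relation preserved.
        return replace(self, id=next(_ids), parents=(self.id,))

    def detach(self) -> "Session":
        # Same state, new identity, no recorded parent: a new lineage root.
        return replace(self, id=next(_ids), parents=())

    def append(self, chunk: str) -> "Session":
        return replace(self, chunks=self.chunks + (chunk,))


def compatible(a: Session, b: Session) -> bool:
    if a.sandbox_handle is not None and b.sandbox_handle is not None:
        return a.sandbox_handle == b.sandbox_handle  # share a live handle
    if a.sandbox_handle is None and b.sandbox_handle is None:
        return a.backend == b.backend  # target the same unbound backend
    return False


def merge(a: Session, b: Session) -> Session:
    if not compatible(a, b):
        raise ValueError("sessions are placed incompatibly and cannot be merged")
    shared = 0
    for left_chunk, right_chunk in zip(a.chunks, b.chunks):
        if left_chunk != right_chunk:
            break
        shared += 1
    return Session(
        backend=a.backend,
        sandbox_handle=a.sandbox_handle,
        chunks=a.chunks + b.chunks[shared:],
        parents=(a.id, b.id),
    )


root = Session(backend="local", sandbox_handle="sb-1", chunks=("task: fix bug",))
left = root.fork().append("plan A")
right = root.fork().append("plan B")
merged = merge(left, right)
fresh = left.detach()
elsewhere = Session(backend="local", sandbox_handle="sb-2")

print(merged.chunks)  # ('task: fix bug', 'plan A', 'plan B')
try:
    merge(left, elsewhere)
except ValueError as err:
    print(err)
```

The failed merge is the point of the design: two sessions bound to different sandboxes may have diverging workspaces, so refusing to join them keeps placement visible in the lineage instead of silently mixing file states.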

### Multi-Agent Design
Multi-agent composition uses the same `Session → Session` contract (Table 6). Patterns include:
- One agent applied to many sessions (fresh, forked, resumed)
- Many agents sharing one state (specialist agents each consume/return Session)
- Nested workflows hiding internal structure behind `forward(session)`

No second runtime object (hidden message bus, controller-only trace) is introduced.
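
The three composition patterns can be illustrated with a small sketch in which agents and workflows are both plain `Session → Session` callables. The `agent` factory, the `Workflow` class, and the chunk text are hypothetical names used for illustration, not OpenRath's API.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Session:
    """Hypothetical Session holding only an ordered chunk log."""
    chunks: Tuple[str, ...] = ()


Step = Callable[[Session], Session]


def agent(role: str) -> Step:
    """Build a toy agent that appends one chunk describing what it saw."""
    def forward(session: Session) -> Session:
        note = f"{role}: handled {len(session.chunks)} prior chunk(s)"
        return replace(session, chunks=session.chunks + (note,))
    return forward


class Workflow:
    """Composes steps in order; a Workflow is itself a valid step."""

    def __init__(self, *steps: Step) -> None:
        self.steps = steps

    def forward(self, session: Session) -> Session:
        for step in self.steps:
            session = step(session)
        return session

    def __call__(self, session: Session) -> Session:
        return self.forward(session)


# Many agents sharing one state, with a nested workflow hidden behind forward().
review = Workflow(agent("critic"), agent("editor"))
pipeline = Workflow(agent("planner"), review)
out = pipeline(Session(("user: draft a release note",)))

# One agent applied to many sessions.
planner = agent("planner")
batch = [planner(Session((f"user: task {i}",))) for i in range(3)]

print(out.chunks[-1])  # editor: handled 3 prior chunk(s)
```

Nothing outside the session records who ran: the only trace of the nested `review` workflow is the chunks it appended, which mirrors the claim that no second runtime object is needed.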

## Empirical Validation / Results

### Implementation Milestones (Table 7)
| Surface | Status |
|---------|--------|
| Session core | Implemented: ordered chunks, fork/detach/merge, usage accounting, JSONL lineage export. Exercised by focused tests. |
| Backend placement | Local execution verified; OpenSandbox optional (unconfigured in this environment). |
| Tool layer | Implemented: model-visible schemas with backend-dispatched side effects. Custom-tool and MCP examples. |
| Agent and workflow | Implemented composition over `Session → Session` contract, including scripted multi-stage workflow. |
| Provider layer | Prerequisites in place; model quality out of scope. |
| Memory plane | Intended runtime plane; not yet substantiated by a local module with tests. |
| Examples | Worked examples: lineage, backends, tools, streaming, usage, multi-agent workflows. |
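
To show what a JSONL lineage export makes possible, here is a hedged sketch that writes one JSON object per session and recovers a session's ancestors from the text alone. The record fields (`id`, `parents`, `op`) are assumptions for illustration; the summary does not describe the actual export schema.

```python
import json
from typing import Dict, List

# Hypothetical lineage records: one per session, parents listed by id.
records = [
    {"id": "s1", "parents": [], "op": "create"},
    {"id": "s2", "parents": ["s1"], "op": "fork"},
    {"id": "s3", "parents": ["s1"], "op": "fork"},
    {"id": "s4", "parents": ["s2", "s3"], "op": "merge"},
]


def export_jsonl(rows: List[dict]) -> str:
    # sort_keys keeps the output byte-for-byte stable across runs.
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows) + "\n"


def ancestors(jsonl_text: str, session_id: str) -> List[str]:
    """Walk parent links in an exported file and return all ancestor ids."""
    by_id: Dict[str, dict] = {}
    for line in jsonl_text.splitlines():
        row = json.loads(line)
        by_id[row["id"]] = row
    seen: List[str] = []
    stack = list(by_id[session_id]["parents"])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(by_id[current]["parents"])
    return sorted(seen)


text = export_jsonl(records)
print(ancestors(text, "s4"))  # ['s1', 's2', 's3']
```

A question such as "which branches fed the merged result?" is then answered from the export itself, without consulting controller code or tool logs.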

### Release Evidence Protocol (Table 8)
| Runtime claim | Current packet | Scope boundary |
|---------------|----------------|----------------|
| Session lineage is inspectable | `lineage_export`: pass, deterministic | Proves exported branch metadata, not branching quality |
| Tool placement is auditable | `local_sandbox`: pass; `opensandbox_optional`: skip | Proves local placement evidence, not OpenSandbox parity |
| Workflows compose session state | `workflow_transcript`: pass, deterministic | Proves composition shape, not live agent quality |
| Implementation contracts hold | `pytest_report`: pass | Does not cover every live integration |
| Provider prerequisites can be disclosed | `live_provider_manifest`: pass, redacted | Does not execute live inference |
| Memory is a session-visible plane | `memory_local`: skip | Evidence-gated until source anchors exist |
| Claim scope is tracked explicitly | `claim_ledger`: pass, ten claims | One evidence-gated claim: memory_runtime_plane |
| Report layout is reviewable | `visual_qa` and `layout_audit`: pass | Visual smoke, not final design approval |

Current evidence supports five claims with operational packets, one partially supported, one supported only for prerequisites, one bibliography-backed positioning, one layout smoke, and one evidence-gated (memory). All deterministic claims are rebuildable.
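
The tally above can be checked mechanically. The sketch below copies the packet statuses from Table 8 and the support categories from the preceding sentence; the dictionary layout and variable names are illustrative assumptions, not the format of the actual claim ledger.

```python
# Packet statuses as listed in Table 8.
packets = {
    "lineage_export": "pass",
    "local_sandbox": "pass",
    "opensandbox_optional": "skip",
    "workflow_transcript": "pass",
    "pytest_report": "pass",
    "live_provider_manifest": "pass",
    "memory_local": "skip",
    "claim_ledger": "pass",
    "visual_qa": "pass",
    "layout_audit": "pass",
}

# Support categories as stated in the text, with their claim counts.
support_counts = {
    "operational packet": 5,
    "partially supported": 1,
    "prerequisites only": 1,
    "bibliography-backed positioning": 1,
    "layout smoke": 1,
    "evidence-gated": 1,
}

skipped = sorted(name for name, status in packets.items() if status == "skip")
total_claims = sum(support_counts.values())

print(skipped)       # ['memory_local', 'opensandbox_optional']
print(total_claims)  # 10
```

The total matches the ten claims reported for `claim_ledger`, and the two skipped packets correspond to the two scope boundaries the table flags: OpenSandbox parity and the memory plane.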

## Theoretical and Practical Implications

### For Runtime Design
OpenRath provides a principled answer to the **crossing-object problem** in the agent runtime stack (Table 2). Multi-agent APIs, durable graph runtimes, tracing SDKs, tool/data protocols, and real-environment benchmarks each own one layer, but leave open the question: *What state moves between these layers?* OpenRath’s Session is designed to be that crossing object, carrying chunks, lineage, sandbox, tools, and memory in one inspectable flow.

The key implication is that **branchability, inspectability, and replayability become properties of the runtime value itself** rather than of controller-side conventions or post-hoc traces. This makes multi-agent systems easier to compose, debug, review, and evaluate.

### For Audit and Evaluation
Before broad benchmarks (coding suites, general-assistant evals), OpenRath argues the first question is: *can the system preserve and expose the state needed to make those later evaluations meaningful?* Its packet-first evaluation protocol separates runtime semantics from model choice, prompt design, and task distribution. This makes evidence **rebuildable and reviewer-friendly**.

### Limitations and Scoped Claims (Table 9)
| Boundary | Current posture | Required for stronger claim |
|----------|----------------|------------------------------|
| Benchmarking | Deterministic smoke runner & evidence packets, not broad baseline/metric benchmark | Pinned workloads, baseline adapters, live-provider runs, reviewer-scored artifacts |
| Memory | Intended runtime plane; evidence-gated | Restored local-memory APIs, examples, tests; recall/commit quality evaluation |
| Multi-agent control | Session exposes branch/merge/tool/lineage, but no policy layer | Role permissions, tool authority, memory-commit gates, merge policy, human-review requirements |
| Safety | No safety property claimed; tool use enlarges attack surface (indirect prompt injection) | Evaluation against agent/web/embodied safety benchmarks + tool-authority limits |
| Reproducibility | Deterministic claims support inspection; live outputs provider-dependent | Pinned source snapshots, provider manifests, sandbox images, cached payloads |

## Conclusion

OpenRath’s contribution is deliberately narrow: it makes the state that agents operate on **explicit**. A multi-agent system is not only a prompt graph, tool registry, trace stream, or benchmark harness — it is a runtime in which conversation chunks, branch lineage, sandbox placement, tool effects, memory interactions, usage, artifacts, and replay evidence must remain connected.

**Session** is the proposed boundary for that runtime state. Because evidence lives in the value the program already passes around (rather than in a side channel reconstructed afterward), it stays available exactly when a reviewer needs it. OpenRath is **complementary** to graph runtimes, tracing SDKs, tool protocols, sandbox providers, and real-environment benchmarks.

The durable thesis: **reliable agent systems need a first-class runtime value, and OpenRath makes Session that value**. New capabilities should preserve the same boundary — transform a Session, attach evidence to a Session, or expose a backend effect through a Session — to keep the system from becoming a collection of hidden side channels. As deep learning made the tensor the value a network is built around, the next generation of agent systems needs the same move: a single runtime value that everything reads, transforms, and explains.
