Framework draft v0.1

Task Harness Profile

Lab design standard for all agents and agent crews.

Lab Design Standard — The Scurry Lab
Companion documents: Global Harness Profile · Agent Harness Profile

last_updated: 2026-05-09 | status: draft v0.1 | previously titled: Skill Design Profile


Purpose

This document defines the design standard for all task harnesses used by agents in the lab. A task harness is the local behavioral environment engineered for a specific class of model interaction.

It is a companion to the Agent Harness Profile, which governs agent-level harness decisions, and the Global Harness Profile, which governs system-wide harness decisions. Together the three profiles cover the designed layers of the outer harness:

  • Global Harness Profile — system-wide. What every agent inherits before any agent-level design decisions are made.
  • Agent Harness Profile — per-agent. How a specific agent is scoped, trusted, and constrained across all its tasks.
  • Task Harness Profile (this document) — per-task. How a specific task surface is engineered for a specific class of model interaction.

Task harnesses are not just prompt templates. They are local harness artifacts — active engineering of the behavioral environment for a defined task. A task harness designer is making harness decisions whether or not they have a vocabulary for it. This profile makes those decisions intentional.


Background: Task Harnesses as Local Harness

What the nominal structure covers

Developers working with task configurations — skills, prompt templates, instruction files — generally converge on a shared structure: name, description, instructions or steps, few-shot examples, output format, and constraints. This is the what of a task harness — it describes the task and specifies desired output.

It does not capture how the task harness functions as a harness artifact — how it shapes model behavior, where its jurisdiction ends, how it interacts with other context layers, and what it makes observable.

The theoretical extension

The nominal structure treats a task configuration as a task specification.

The Task Harness Profile treats it as a local behavioral environment — something that actively shapes what the model can and will do within a defined task surface. This reframe has practical consequences:

  • Over- or under-constraining a task harness produces different model behaviors for identical or similar requests. That difference is direct evidence of harness effect, not of prompt quality.
  • A task harness that operates correctly in isolation may produce degraded output when combined with other task harnesses or agent-level context. Composability is a design property, not an afterthought.
  • Some decisions belong at higher layers. A task harness that silently depends on global harness content without naming that dependency creates fragility. A task harness that re-specifies global concerns creates conflict.

The Four Harness Layers

Task harnesses are the innermost designed harness layer. They inherit from everything above them. Good task harness design is explicit about what it inherits and what it defines locally.

| Layer | Scope | Architectural position |
|---|---|---|
| Global Harness | System-wide — all agents, all tasks | Persistent shared storage; exists before agents run |
| Agent Harness | Per-agent — all tasks for one agent | Agent definition; travels with the agent |
| Task Harness (this document) | Per-task — one task surface for one agent | Task invocation; loaded when a specific task triggers |
| Infrastructure Harness | Inference boundary — transparent to agents | Between calling agent and model endpoint; active investigation |

A task harness cannot override agent harness decisions. An agent harness cannot override global harness decisions. When a task harness decision conflicts with a higher layer, the higher layer takes precedence. Flag conflicts rather than resolving them silently.
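The precedence rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lab's implementation: the `resolve` function, the `Conflict` record, and the setting keys (`tone`, `role`, `format`) are hypothetical names. Only the ordering (global over agent over task) and the requirement to flag conflicts come from the text.

```python
from dataclasses import dataclass

# Layer order from highest to lowest precedence, as stated above.
LAYERS = ("global", "agent", "task")


@dataclass(frozen=True)
class Conflict:
    key: str
    winner: str      # layer whose value is kept
    loser: str       # lower layer whose value is discarded
    kept: object
    dropped: object


def resolve(settings: dict):
    """Merge per-layer settings; higher layers win and every conflict is reported."""
    resolved = {}
    owner = {}
    conflicts = []
    for layer in LAYERS:  # highest precedence first
        for key, value in settings.get(layer, {}).items():
            if key not in resolved:
                resolved[key] = value
                owner[key] = layer
            elif resolved[key] != value:
                # A lower layer disagrees: keep the higher value, but flag it
                # instead of resolving silently.
                conflicts.append(
                    Conflict(key, owner[key], layer, resolved[key], value)
                )
    return resolved, conflicts


# Hypothetical example: the task harness tries to change a global setting.
resolved, conflicts = resolve({
    "global": {"tone": "neutral"},
    "agent": {"role": "reviewer"},
    "task": {"tone": "playful", "format": "json"},
})
```

Here `resolved` keeps the global `tone`, and `conflicts` carries one record naming the task layer as the loser, which is the piece a reviewer or agent would escalate.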


The Five Dimensions

Every task harness in the lab should address all five dimensions. Dimensions that are not explicitly addressed are implicit design decisions — and implicit decisions are a source of behavioral inconsistency.


Dimension 1 — Task Surface

What does this task harness open, and what does it close?

Every task harness defines both a ceiling and a floor:

  • Ceiling — the expanded capability surface. What the model can do within this task harness that it would not do by default.
  • Floor — the minimum required behavior. What the model must always do regardless of input variation.

Most nominal task configurations define the ceiling but leave the floor implicit. Making both explicit surfaces the actual task surface the harness is engineering.

Design questions:

  • What is the intended capability this task harness enables or expands?
  • What is the minimum consistent behavior required across all inputs?
  • What is explicitly out of scope — what should the model not do within this task harness?

Failure modes:

  • Ceiling only → inconsistent floor behavior across similar inputs
  • Floor only → constrained but unexpressive output that fails to use model capability
  • Neither → the task harness is a label, not a harness
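As a rough sketch, the three failure modes listed above can be detected mechanically once ceiling and floor are recorded as separate fields. The `TaskSurface` class, the `surface_failure_mode` function, and the example entries are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskSurface:
    ceiling: List[str] = field(default_factory=list)       # capabilities opened
    floor: List[str] = field(default_factory=list)         # behavior required on every input
    out_of_scope: List[str] = field(default_factory=list)  # what the harness closes


def surface_failure_mode(surface: TaskSurface) -> Optional[str]:
    """Return the Dimension 1 failure mode this surface falls into, if any."""
    if not surface.ceiling and not surface.floor:
        return "neither"        # a label, not a harness
    if not surface.floor:
        return "ceiling only"   # inconsistent floor behavior across inputs
    if not surface.ceiling:
        return "floor only"     # constrained but unexpressive output
    return None


# Hypothetical task harness that names a capability but no minimum behavior.
draft = TaskSurface(ceiling=["summarize code review threads"])
mode = surface_failure_mode(draft)
```

A check like this only confirms that both fields were filled in. Whether the stated ceiling and floor are the right ones remains a design judgment.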

Dimension 2 — Constraint Geometry

Where is this task harness tight, where is it loose, and is that intentional?

Constraint is not uniform across a task harness. Some elements should be tightly specified; others should be intentionally loose. The geometry of constraint determines model behavior more than the presence or absence of constraints.

  • Tight by design — output format, required terminology, scope limits, constitutional boundaries. Consistency matters here more than flexibility.
  • Loose by design — reasoning path, elaboration depth, phrasing. Model judgment adds value here; over-specification reduces it.

Design questions:

  • Which parts of this task harness require tight constraint for consistency?
  • Which parts should be left loose to allow model judgment?
  • Is the current constraint geometry the result of explicit decisions or accumulated instructions?

Failure modes:

  • Over-constraint → collapses useful degrees of freedom; model produces technically compliant but brittle output
  • Under-constraint → behavioral drift from intended surface; inconsistent outputs across similar inputs
  • Uniform constraint → neither tight where it matters nor loose where it should be; flat and inexpressive
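One way to make constraint geometry inspectable is to record an explicit tight or loose decision for each element and check the record. The sketch below assumes a hypothetical `geometry_findings` helper and hypothetical element names. It can catch undecided elements and uniform constraint, but it cannot judge whether a given decision is the right one.

```python
from typing import Dict, List, Optional


def geometry_findings(elements: Dict[str, Optional[str]]) -> List[str]:
    """Flag elements with no explicit decision and uniformly constrained harnesses."""
    valid = ("tight", "loose")
    # Elements without an explicit decision are accumulated instructions,
    # not designed geometry.
    findings = [
        f"undecided: {name}"
        for name, choice in elements.items()
        if choice not in valid
    ]
    decided = {choice for choice in elements.values() if choice in valid}
    if len(elements) > 1 and len(decided) == 1:
        findings.append(
            f"uniform constraint: every decided element is {decided.pop()}"
        )
    return findings


# Hypothetical harness: everything decided so far is tight, one element undecided.
findings = geometry_findings({
    "output_format": "tight",
    "phrasing": "tight",
    "reasoning_path": None,
})
```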

Dimension 3 — Scope Boundary

Where does this task harness’s jurisdiction end, and what belongs to higher layers?

Task harnesses are local harness artifacts. They operate inside agent harnesses, which in turn operate inside the global harness. Some decisions belong at higher layers:

  • Constitutional bounds — always global harness
  • Agent identity and role — always agent harness
  • Cross-agent behavior — always orchestration level
  • System-wide norms — always global harness

A task harness that re-specifies these creates redundancy at best, conflict at worst. A task harness that silently depends on them without naming that dependency creates fragility — if the global harness changes, the task harness breaks in ways that are hard to trace.

Design questions:

  • What does this task harness assume is already handled by the global harness or agent harness?
  • What would break if the global harness changed?
  • Does this task harness try to specify anything that belongs at a higher layer?

Failure modes:

  • Re-specifying global concerns → conflict or drift when global layer changes
  • Silent dependency → fragile task harness that breaks unexpectedly under harness changes
  • No boundary awareness → task harness that cannot be safely composed into a larger system
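The first two failure modes above lend themselves to a simple lint-style check. The sketch below is illustrative only: the concern identifiers, the `scope_findings` function, and its three input sets are assumptions. The ownership table restates the list of higher-layer decisions given earlier in this dimension.

```python
from typing import List, Set

# Which layer owns each higher-level concern, per the list in this dimension.
HIGHER_LAYER_OWNERS = {
    "constitutional_bounds": "global",
    "agent_identity": "agent",
    "cross_agent_behavior": "orchestration",
    "system_wide_norms": "global",
}


def scope_findings(defines: Set[str], relies_on: Set[str], declared: Set[str]) -> List[str]:
    """Report re-specified higher-layer concerns and undeclared dependencies.

    defines:   concerns the task harness specifies locally
    relies_on: higher-layer concerns the task harness actually depends on
    declared:  dependencies the task harness names explicitly
    """
    findings = []
    for concern in sorted(defines & set(HIGHER_LAYER_OWNERS)):
        owner = HIGHER_LAYER_OWNERS[concern]
        findings.append(f"re-specifies {concern} (owned by {owner} layer)")
    for concern in sorted(relies_on - declared):
        findings.append(f"silent dependency on {concern}")
    return findings


# Hypothetical task harness that redefines agent identity and leans on
# system-wide norms without saying so.
findings = scope_findings(
    defines={"output_format", "agent_identity"},
    relies_on={"system_wide_norms", "constitutional_bounds"},
    declared={"constitutional_bounds"},
)
```

In practice the hard part is populating `relies_on` honestly. The check is only as good as the designer's account of what the harness depends on.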

Dimension 4 — Composability

How does this task harness behave when it shares context with other task harnesses or agent instructions?

Task harnesses rarely operate in complete isolation. When an agent uses multiple task harnesses, or when a task harness operates inside a broader scaffold, interactions occur — some intended, some not. A task harness designed without composability consideration may produce correct output alone and degraded output in combination.

Composability requires knowing the task harness’s dominance posture:

  • Dominant — this task harness’s specifications take precedence in case of conflict with co-occurring context
  • Subordinate — this task harness yields to agent-level or other task harness context where conflicts arise
  • Neutral — this task harness is designed to be non-conflicting with expected co-occurring context

Design questions:

  • What task harnesses or context layers are likely to co-occur with this one?
  • Are there known or anticipated interaction effects?
  • What is this task harness’s dominance posture when conflicts arise?

Failure modes:

  • Unintended dominance → task harness overrides agent identity or constitutional context
  • Silent subordination → task harness is effectively ignored when co-occurring context is present
  • No posture defined → unpredictable behavior in composition that is difficult to debug
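A declared dominance posture makes conflict outcomes predictable. The sketch below shows one possible resolution rule between two co-occurring task harnesses. The `Posture` enum mirrors the three postures above, but the `conflict_outcome` function and its tie-breaking choices (escalate when both sides declare the same posture, treat a missing posture as undefined) are assumptions, not rules stated in this profile.

```python
from enum import Enum
from typing import Optional


class Posture(Enum):
    DOMINANT = "dominant"
    SUBORDINATE = "subordinate"
    NEUTRAL = "neutral"


def conflict_outcome(a: Optional[Posture], b: Optional[Posture]) -> str:
    """Decide which of two conflicting task harnesses, "a" or "b", prevails."""
    if a is None or b is None:
        return "undefined"   # no posture defined: behavior is unpredictable
    if a is b:
        return "escalate"    # identical postures cannot settle the conflict
    if Posture.DOMINANT in (a, b):
        return "a" if a is Posture.DOMINANT else "b"
    # Remaining case: one subordinate, one neutral. The subordinate one yields.
    return "b" if a is Posture.SUBORDINATE else "a"


outcome = conflict_outcome(Posture.SUBORDINATE, Posture.NEUTRAL)
```

This covers only task-to-task conflicts. A dominant task harness still cannot override the agent or global harness, since higher layers take precedence regardless of posture.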

Dimension 5 — Observability Profile

What does this task harness make legible, and what does it obscure?

Task harnesses shape the inspection surface of model behavior. Structured output increases observability — outputs are auditable, behavior is traceable. Free-form output may produce higher quality results but reduces the inspection surface.

This dimension connects to Axis 5 of the Agent Harness Profile at the task level, and to Dimension 5 of the Global Harness Profile where task harness outputs are written to shared context. A task harness with no observability consideration is making implicit decisions about what is auditable.

Design questions:

  • What outputs does this task harness produce that can be inspected or audited?
  • Does the task harness require reasoning traces or chain-of-thought visibility?
  • Is there a known tradeoff between output quality and observability in this task harness?
  • What would anomalous behavior look like in this task harness’s output surface?

Failure modes:

  • No structured output → behavior is present but not inspectable; anomalies are invisible
  • Mandatory structure at the cost of quality → correct format, degraded reasoning
  • No anomaly definition → observation layer has no signal to act on
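To make "anomaly definition" concrete, here is a minimal sketch of an anomaly check over structured output. It assumes a hypothetical task harness that emits a JSON object with `verdict` and `evidence` fields and an optional `reasoning` field. The field names and the `anomaly_signals` function are invented for illustration.

```python
import json
from typing import List

# Hypothetical required fields for this task harness's structured output.
REQUIRED_FIELDS = ("verdict", "evidence")


def anomaly_signals(raw_output: str, reasoning_required: bool = False) -> List[str]:
    """Return the anomaly signals an observation layer could act on."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    signals = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data
    ]
    if reasoning_required and not data.get("reasoning"):
        signals.append("reasoning trace required but absent")
    return signals


signals = anomaly_signals('{"verdict": "pass"}', reasoning_required=True)
```

An empty list means no defined anomaly fired. It does not mean the output is good, which is the quality/observability tradeoff this dimension asks designers to state.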

Task Harness Profile Template

### [Task Harness Name]

**Summary:**
[One sentence — what this task harness does and for which
agent/task surface]

**Dimension 1 — Task Surface**
- Ceiling (capability opened):
- Floor (minimum required behavior):
- Out of scope:

**Dimension 2 — Constraint Geometry**
- Tight by design:
- Loose by design:
- Known over/under constraint risks:

**Dimension 3 — Scope Boundary**
- Assumes from global harness:
- Assumes from agent harness:
- Would break if:

**Dimension 4 — Composability**
- Likely co-occurring task harnesses/context:
- Known interaction effects:
- Dominance posture: [Dominant / Subordinate / Neutral]

**Dimension 5 — Observability Profile**
- Auditable outputs:
- Reasoning visibility: [Required / Optional / None]
- Anomaly signals:
- Quality/observability tradeoff:

Gate Requirement

A task harness may not be deployed to a production agent until all five dimensions have at least a draft answer. Dimensions left unanswered must be listed as open design questions.

This gate exists for the same reason as the agent-level gate: implicit harness decisions are not neutral. They produce behavioral effects. Making them explicit is the mechanism by which the lab treats task harness design as engineering rather than prompt writing.
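The gate can be expressed as a small check over a filled-in profile. The sketch below assumes the profile is available as a mapping from dimension name to draft answer. The `gate_check` function and the example answers are hypothetical, while the five dimension names come from this document.

```python
from typing import Dict, List, Tuple

DIMENSIONS = (
    "Task Surface",
    "Constraint Geometry",
    "Scope Boundary",
    "Composability",
    "Observability Profile",
)


def gate_check(profile: Dict[str, str]) -> Tuple[bool, List[str]]:
    """Return (ready_for_production, open_design_questions)."""
    # A dimension with no draft answer becomes an open design question.
    open_questions = [
        name for name in DIMENSIONS if not profile.get(name, "").strip()
    ]
    return (not open_questions, open_questions)


# Hypothetical draft profile with two dimensions still unanswered.
ready, open_questions = gate_check({
    "Task Surface": "Ceiling: triage bug reports. Floor: always assign a severity.",
    "Constraint Geometry": "Tight: severity labels. Loose: rationale wording.",
    "Scope Boundary": "Assumes constitutional bounds from the global harness.",
    "Composability": "",
})
```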


Relationship to the Four Harness Layers

| | Global Harness Profile | Agent Harness Profile | Task Harness Profile | Infrastructure Harness Profile |
|---|---|---|---|---|
| Scope | System-wide | Per-agent | Per-task surface | Inference boundary |
| Primary question | What does every agent inherit? | How is this agent scoped and trusted? | How is this task surface engineered? | What is shaped at the inference boundary? |
| Key dimensions | Content, Permissions, Governance, Partitioning, Observation | Authority, Memory, Autonomy, Trust, Observation | Task Surface, Constraint Geometry, Scope Boundary, Composability, Observability | Active investigation |
| Gate | System design → deployment | Idea → In Design | Design → Deployment | Active investigation |
| Failure mode | Unexamined system-wide authority and values | Unexamined agent authority and trust | Unexamined behavioral environment for a task | |

Note on Naming

This document is currently titled Profile to reflect its status as a working design artifact at draft level. At Franklin Phase 3 publication, profile-level documents will be elevated to Standard (for this document, the Task Harness Standard), at which point they will carry community-facing authority and be structured for external adoption. The profile/standard distinction maps onto a per-task fill-in artifact versus a published framework document, respectively.

The previous name of this document — Skill Design Profile — reflected the framework-specific vocabulary of individual agent platforms. The rename to Task Harness Profile brings it into alignment with the four-layer harness engineering framework and removes dependency on platform-specific terminology.


Agents: task harnesses define the local environment you operate in for specific tasks. If a task harness produces behavior inconsistent with your agent harness or with global harness constitutional bounds, flag it — the task harness may have a scope boundary or composability failure. Do not override task harness constraints without escalation.
