COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Summary (Overview)

Automated Person-Grounded Skill Distillation: COLLEAGUE.SKILL is a system that automatically distills heterogeneous traces of a person or role (e.g., work documents, chat logs, public interviews) into a portable, inspectable, and correctable AI skill package for LLM agents.
Dual-Track Artifact Representation: Generated skills feature a capability track (work practices, mental models, decision heuristics) and a bounded behavior track (communication style, interaction rules, correction history), kept separate and inspectable.
Lifecycle Management & Governance: The system supports a full workflow: creation, inspection, invocation, correction via natural-language feedback, versioning, rollback, installation across agent hosts, and optional controlled distribution via a public gallery.
Domain-Specific Presets: The core distillation pipeline is specialized into three application presets with different evidence and governance assumptions: colleague (work expertise), celebrity/public-figure (public mental models), and relationship (private interaction patterns requiring strong consent and local control).

Introduction and Theoretical Foundation

The role of LLM agents is evolving from executing isolated tasks to carrying reusable, bounded representations of human expertise, judgment, and interaction style. Building such "person-grounded" agents is challenging because actionable knowledge is embedded in heterogeneous traces (chat logs, documents, reviews) rather than clean instructions. Existing memory/persona systems capture fragments, and skill frameworks provide packaging formats, but there is no end-to-end workflow to distill traces into inspectable, correctable, and agent-usable skills.

COLLEAGUE.SKILL frames this as a person-grounded trace-to-skill distillation problem. The goal is not unrestricted person simulation or identity replacement, but the creation of a constrained technical artifact that makes useful knowledge, interaction style, and usage limits explicit. This aligns with the broader shift in agent design towards modular extensions, where the Agent Skills standard defines skills as portable capability units (a folder centered on a SKILL.md file).

The core question addressed is: How can person-grounded knowledge, dispersed across traces, be distilled into reusable skill packages where contents, provenance, correction history, and usage limits remain visible and governable?

Methodology

Problem Formulation

Person-grounded skill generation is formulated as an artifact problem. Given a lightweight profile $p$, a source scope $c$, and source materials $D = \{d_1, \ldots, d_n\}$, the system produces a skill package:

$$S = (A, M, L)$$

where $A$ is the set of generated files, $M$ is the machine-readable metadata, and $L$ is the lifecycle state (version, update time, correction count).
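
To make the triple concrete, the following is a minimal sketch of how $S = (A, M, L)$ could be held in memory. The class names, field names, and the `empty_package` helper are hypothetical and are not taken from the COLLEAGUE.SKILL codebase; the helper performs no distillation and only records the inputs $p$, $c$, and the size of $D$.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Lifecycle:
    """L: lifecycle state (version, update time, correction count)."""
    version: int = 1
    updated_at: str = ""       # ISO-8601 string; the format is an assumption
    correction_count: int = 0


@dataclass
class SkillPackage:
    """S = (A, M, L). All field names are illustrative assumptions."""
    artifacts: Dict[str, str] = field(default_factory=dict)    # A: filename -> content
    metadata: Dict[str, object] = field(default_factory=dict)  # M: machine-readable metadata
    lifecycle: Lifecycle = field(default_factory=Lifecycle)    # L: lifecycle state


def empty_package(profile: str, scope: str, sources: List[str]) -> SkillPackage:
    """Create a package shell from (p, c, D) without doing any distillation."""
    now = datetime.now(timezone.utc).isoformat()
    return SkillPackage(
        artifacts={"work.md": "", "persona.md": ""},  # placeholders for the two tracks
        metadata={"profile": profile, "scope": scope, "source_count": len(sources)},
        lifecycle=Lifecycle(version=1, updated_at=now, correction_count=0),
    )
```

Keeping $L$ as its own object, separate from the metadata, mirrors the formulation above, where lifecycle state changes on every correction while the rest of the metadata can stay fixed.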

The target artifact must exhibit five operational properties:

Portable: Loadable by skills-compatible agents.
Inspectable: Users can read extracted rules, examples, and limitations.
Composable: Full, capability-only, and behavior-only entrypoints can be invoked separately.
Correctable: Updatable via new evidence or user feedback while preserving prior state.
Governable: Metadata and source boundaries support deletion, sharing decisions, and safety review.

System Architecture & Presets

The system architecture (Figure 1) involves: trace intake and normalization, preset routing, dual-track distillation, artifact writing, and productization (installation/distribution).

The core pipeline is specialized via application presets (Figure 2), which are configurations of the same workflow:

colleague: Uses enterprise/local work traces (docs, reviews, decisions). Governance is based on organizational access and handover utility.
celebrity (icon): Uses public-source, first-person traces (works, interviews, speeches). Emphasizes source boundaries, citation discipline, and visible evidence limits.
relationship (ex): Uses private interpersonal traces supplied by the user. Requires strong consent, local control, deletion capability, and non-public defaults.
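
Because the presets are described as configurations of one workflow, they can be pictured as plain data plus a routing function. The sketch below is an illustration under that assumption: the `Preset` class, the `PRESETS` table, and `route_preset` are hypothetical names, and the evidence and governance entries are copied from the descriptions above, not from the system's actual configuration files.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Preset:
    """One configuration of the shared pipeline (hypothetical structure)."""
    name: str
    evidence: Tuple[str, ...]    # kinds of traces the preset expects
    governance: Tuple[str, ...]  # governance assumptions attached to it


PRESETS: Dict[str, Preset] = {
    p.name: p
    for p in (
        Preset(
            "colleague",
            ("docs", "reviews", "decisions"),
            ("organizational access", "handover utility"),
        ),
        Preset(
            "celebrity",
            ("works", "interviews", "speeches"),
            ("source boundaries", "citation discipline", "visible evidence limits"),
        ),
        Preset(
            "relationship",
            ("private interpersonal traces",),
            ("strong consent", "local control", "deletion capability", "non-public defaults"),
        ),
    )
}


def route_preset(name: str) -> Preset:
    """Return the preset for `name`, rejecting unknown presets explicitly."""
    try:
        return PRESETS[name]
    except KeyError:
        raise ValueError(f"unknown preset: {name!r}") from None
```

Failing loudly on an unknown preset, instead of falling back to a default, keeps the governance assumptions explicit: a trace set is never processed under rules the user did not choose.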

Dual Representation & Artifact Schema

The distillation produces a dual representation:

Capability Track (work.md): Captures responsibilities, workflows, technical standards, review criteria, and decision heuristics.
Bounded Behavior Track (persona.md): Stores expression preferences, interaction rules, boundaries, and correction records.

The artifact writer produces a versioned package (schema v3) containing the files listed in Table 1:

Table 1: Runtime artifact contract emitted by the shared writer.

Artifact	Primary Consumer	Contents
`SKILL.md`	Agent runtime, user	Combined invokable skill with frontmatter, capability track, persona track, and operating rules
`work.md`	User, updater	Editable capability document: procedures, standards, heuristics, and task patterns
`persona.md`	User, updater	Editable behavior document: style, interaction posture, boundaries, and correction log
`work_skill.md`	Agent runtime	Capability-only entrypoint generated from `work.md`
`persona_skill.md`	Agent runtime	Persona-only entrypoint generated from `persona.md`
`manifest.json`	Installers, gallery	Entrypoints, artifact list, compatible runtimes, slash commands, and toolchain metadata
`meta.json`	Lifecycle tools	Schema, provenance, lifecycle version, correction count, and compatibility fields
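
The contract in Table 1 can be sketched as a pure function that maps the two editable documents to the seven emitted files. This is an illustration only: the JSON key names (`entrypoints`, `artifacts`, `schema_version`, `lifecycle_version`, `correction_count`), the frontmatter text, and the `render_package` function are assumptions, since the summary does not give the real schema v3 layout.

```python
import json
from typing import Dict

# File names taken from Table 1.
ARTIFACTS = [
    "SKILL.md", "work.md", "persona.md",
    "work_skill.md", "persona_skill.md",
    "manifest.json", "meta.json",
]


def render_package(alias: str, work: str, persona: str, version: int = 1) -> Dict[str, str]:
    """Return filename -> content for one package (hypothetical layout)."""
    # YAML-style frontmatter; the exact fields used by the system are assumed.
    frontmatter = f"---\nname: {alias}\ndescription: Person-grounded skill for {alias}\n---\n"
    files = {
        "work.md": work,                              # editable capability document
        "persona.md": persona,                        # editable behavior document
        "work_skill.md": frontmatter + work,          # capability-only entrypoint
        "persona_skill.md": frontmatter + persona,    # persona-only entrypoint
        "SKILL.md": frontmatter + work + "\n" + persona,  # combined entrypoint
    }
    manifest = {
        "entrypoints": {
            "full": "SKILL.md",
            "capability": "work_skill.md",
            "behavior": "persona_skill.md",
        },
        "artifacts": ARTIFACTS,
    }
    meta = {"schema_version": 3, "lifecycle_version": version, "correction_count": 0}
    files["manifest.json"] = json.dumps(manifest, indent=2)
    files["meta.json"] = json.dumps(meta, indent=2)
    return files
```

Writing the result to disk is then a loop over the returned dictionary. Generating the two `*_skill.md` entrypoints from the editable documents, as the table describes, means a user only ever edits `work.md` and `persona.md`.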

Workflows

Creation: Users provide an alias, optional profile, and source material. Preset-specific prompts guide the dual-track distillation, resulting in structured Markdown files packaged into the final artifact.
Correction & Update: The system accepts natural-language feedback (e.g., "he would not say that"). Corrections concerning work produce Markdown patches; those concerning behavior produce normalized correction records {scene, wrong, correct}. The writer archives the current version, applies changes, increments the version, and regenerates artifacts, enabling rollbacks (Figure 3).
Public-Figure Research Extension: The celebrity preset includes tooling for subtitle download, transcription, and quality checks that scan for mental-model coverage, limitations, and evidence grounding.
Relationship Extension: The relationship preset stresses the governance surface of the package format, treating deletion, correction, local ownership, and non-public defaults as first-order requirements.
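
The correction loop described under Correction & Update (archive, apply, increment, then roll back if needed) can be sketched in memory as follows. The `SkillState` and `SkillStore` classes are hypothetical, work patches are modeled as appended Markdown text for simplicity, and artifact regeneration is omitted; only the `{scene, wrong, correct}` record shape comes from the description above.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SkillState:
    """Snapshot of the editable parts of a skill (hypothetical structure)."""
    version: int = 1
    work_md: str = ""
    corrections: List[Dict[str, str]] = field(default_factory=list)


class SkillStore:
    """Archives the current version before every change so it can be restored."""

    def __init__(self, initial: SkillState) -> None:
        self.current = initial
        self.archive: List[SkillState] = []

    def _checkpoint(self) -> None:
        # Archive a deep copy first, then bump the version of the live state.
        self.archive.append(copy.deepcopy(self.current))
        self.current.version += 1

    def correct_behavior(self, scene: str, wrong: str, correct: str) -> None:
        """Record a normalized behavior correction {scene, wrong, correct}."""
        self._checkpoint()
        self.current.corrections.append(
            {"scene": scene, "wrong": wrong, "correct": correct}
        )

    def correct_work(self, patch: str) -> None:
        """Apply a work correction, modeled here as appended Markdown."""
        self._checkpoint()
        self.current.work_md += "\n" + patch

    def rollback(self) -> None:
        """Restore the most recently archived version."""
        if not self.archive:
            raise ValueError("no archived version to roll back to")
        self.current = self.archive.pop()
```

For example, after `store.correct_behavior("code review", "rejects outright", "asks a clarifying question first")` the version moves from 1 to 2, and `store.rollback()` restores version 1 with an empty correction list.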

Empirical Validation / Results

The paper presents deployment and ecosystem metrics as evidence of the system's viability as a distribution surface, not as measures of task performance or behavioral fidelity.

Table 2: Observed public deployment counters on 2026-05-28.

Metric	Value
GitHub Repository Stars	~18.5k
GitHub Forks	~1.8k
Public Gallery Skills	215
Public Gallery Meta-skills	55
Community Contributors (Gallery)	165
Cumulative Gallery Stars (aggregate)	> 100k

The public gallery serves as a downstream sharing layer for skills that users have the right to publish. The system has been integrated with agent hosts such as Claude Code, OpenClaw, Codex, and Hermes.

Theoretical and Practical Implications

Artifact-Centric Approach: The primary contribution is framing person-grounded distillation as an artifact problem with explicit portability, inspectability, and governance. This shifts the focus from hidden model behavior to concrete, reviewable packages.
Separation of Concerns: The dual-track representation deliberately separates capability (useful judgment, heuristics) from bounded behavior (style, interaction rules), mitigating risks associated with conflating them and enabling capability-only invocation where appropriate.
Workflow as a Research Enabler: The implemented creation-correction-installation-distribution workflow is not an engineering detail but a condition for making skills auditable, repairable, and evaluable. It provides concrete handles for future research on extraction quality, correction efficacy, and invocation modes.
Productization as a Constraint: The deployment surface (installers, gallery, manifests) turns distillation into a legible software object, clarifying ownership, provenance, and deployment boundaries for users and researchers.
Broad Applicability: The system demonstrates that the same artifact model can address diverse domains—from practical work expertise (colleague) to public intellectual models (celebrity) and sensitive private interactions (relationship)—by adapting evidence and consent assumptions.

Conclusion

COLLEAGUE.SKILL demonstrates that selected human traces can be distilled into portable, inspectable AI skill artifacts that encode capabilities, mental models, behavior constraints, and correction history. The core claim is not about achieving perfect behavioral fidelity, but about creating a governable, improvable artifact that users can read, revise, install, withhold, and delete.

The system provides a product-oriented research path for "digital doubles" as bounded packages with explicit evidence, rights, and correction semantics. Future work should focus on evaluating these artifacts—measuring useful judgment preservation and interaction quality—while rigorously accounting for source quality, consent, provenance, and safety boundaries across concrete deployment settings.