COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation
Summary (Overview)
- Automated Person-Grounded Skill Distillation: COLLEAGUE.SKILL is a system that automatically distills heterogeneous traces of a person or role (e.g., work documents, chat logs, public interviews) into a portable, inspectable, and correctable AI skill package for LLM agents.
- Dual-Track Artifact Representation: Generated skills feature a capability track (work practices, mental models, decision heuristics) and a bounded behavior track (communication style, interaction rules, correction history), kept separate and inspectable.
- Lifecycle Management & Governance: The system supports a full workflow: creation, inspection, invocation, correction via natural-language feedback, versioning, rollback, installation across agent hosts, and optional controlled distribution via a public gallery.
- Domain-Specific Presets: The core distillation pipeline is specialized into three application presets with different evidence and governance assumptions: colleague (work expertise), celebrity/public-figure (public mental models), and relationship (private interaction patterns requiring strong consent and local control).
Introduction and Theoretical Foundation
The role of LLM agents is evolving from executing isolated tasks to carrying reusable, bounded representations of human expertise, judgment, and interaction style. Building such "person-grounded" agents is challenging because actionable knowledge is embedded in heterogeneous traces (chat logs, documents, reviews) rather than clean instructions. Existing memory/persona systems capture fragments, and skill frameworks provide packaging formats, but there is no end-to-end workflow to distill traces into inspectable, correctable, and agent-usable skills.
COLLEAGUE.SKILL frames this as a person-grounded trace-to-skill distillation problem. The goal is not unrestricted person simulation or identity replacement, but the creation of a constrained technical artifact that makes useful knowledge, interaction style, and usage limits explicit. This aligns with the broader shift in agent design towards modular extensions, where the Agent Skills standard defines skills as portable capability units (a folder centered on a SKILL.md file).
The core question addressed is: How can person-grounded knowledge, dispersed across traces, be distilled into reusable skill packages where contents, provenance, correction history, and usage limits remain visible and governable?
Methodology
Problem Formulation
Person-grounded skill generation is formulated as an artifact problem. Given a lightweight profile, a source scope, and source materials, the system produces a skill package with three parts: generated files, machine-readable metadata, and lifecycle state (version, update time, correction count).
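As a rough illustration of this formulation, the three parts of a skill package can be modeled as a small data structure. This is a minimal sketch, not the system's actual schema: the class names `SkillPackage` and `LifecycleState` and every field name are assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LifecycleState:
    """Lifecycle state as described in the text (field names are hypothetical)."""
    version: int = 1
    updated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    correction_count: int = 0


@dataclass
class SkillPackage:
    """Hypothetical container for the three parts of a skill package."""
    files: dict[str, str]        # generated files: filename -> text content
    metadata: dict[str, object]  # machine-readable metadata
    lifecycle: LifecycleState = field(default_factory=LifecycleState)


# Illustrative values only; a real package would come out of distillation.
package = SkillPackage(
    files={"SKILL.md": "# example skill\n"},
    metadata={"preset": "colleague"},
)
```

Keeping lifecycle state separate from the generated files is one way to let version and correction counters change without touching the content itself.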
The target artifact must exhibit five operational properties:
- Portable: Loadable by skills-compatible agents.
- Inspectable: Users can read extracted rules, examples, and limitations.
- Composable: Full, capability-only, and behavior-only entrypoints can be invoked separately.
- Correctable: Updatable via new evidence or user feedback while preserving prior state.
- Governable: Metadata and source boundaries support deletion, sharing decisions, and safety review.
System Architecture & Presets
The system architecture (Figure 1) involves: trace intake and normalization, preset routing, dual-track distillation, artifact writing, and productization (installation/distribution).
The core pipeline is specialized via application presets (Figure 2), which are configurations of the same workflow:
- colleague: Uses enterprise/local work traces (docs, reviews, decisions). Governance is based on organizational access and handover utility.
- celebrity (icon): Uses public-source, first-person traces (works, interviews, speeches). Emphasizes source boundaries, citation discipline, and visible evidence limits.
- relationship (ex): Uses private interpersonal traces supplied by the user. Requires strong consent, local control, deletion capability, and non-public defaults.
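Because the presets are configurations of one workflow, preset routing can be pictured as a lookup into a small table. The sketch below is hypothetical: the `Preset` class, the `route_preset` function, and the treatment of `icon` and `ex` as aliases are assumptions for illustration, while the evidence and governance strings are taken from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    name: str
    evidence: str                 # kind of traces the preset expects
    governance: tuple[str, ...]   # governance emphases named in the text


PRESETS = {
    "colleague": Preset(
        "colleague",
        "enterprise/local work traces",
        ("organizational access", "handover utility"),
    ),
    "celebrity": Preset(
        "celebrity",
        "public-source, first-person traces",
        ("source boundaries", "citation discipline", "visible evidence limits"),
    ),
    "relationship": Preset(
        "relationship",
        "private interpersonal traces supplied by the user",
        ("strong consent", "local control", "deletion capability",
         "non-public defaults"),
    ),
}

# Assumption: the parenthesized names in the text act as alternate names.
ALIASES = {"icon": "celebrity", "ex": "relationship"}


def route_preset(name: str) -> Preset:
    """Resolve a preset name or alias to its configuration."""
    key = name.strip().lower()
    key = ALIASES.get(key, key)
    if key not in PRESETS:
        raise ValueError(f"unknown preset: {name!r}")
    return PRESETS[key]
```

Treating presets as data rather than separate code paths matches the claim that all three share the same pipeline and differ only in evidence and governance assumptions.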
Dual Representation & Artifact Schema
The distillation produces a dual representation:
- Capability Track (work.md): Captures responsibilities, workflows, technical standards, review criteria, and decision heuristics.
- Bounded Behavior Track (persona.md): Stores expression preferences, interaction rules, boundaries, and correction records.
The artifact writer produces a versioned package (schema v3) containing the files listed in Table 1:
Table 1: Runtime artifact contract emitted by the shared writer.
| Artifact | Primary Consumer | Contents |
|---|---|---|
| SKILL.md | Agent runtime, user | Combined invokable skill with frontmatter, capability track, persona track, and operating rules |
| work.md | User, updater | Editable capability document: procedures, standards, heuristics, and task patterns |
| persona.md | User, updater | Editable behavior document: style, interaction posture, boundaries, and correction log |
| work_skill.md | Agent runtime | Capability-only entrypoint generated from work.md |
| persona_skill.md | Agent runtime | Persona-only entrypoint generated from persona.md |
| manifest.json | Installers, gallery | Entrypoints, artifact list, compatible runtimes, slash commands, and toolchain metadata |
| meta.json | Lifecycle tools | Schema, provenance, lifecycle version, correction count, and compatibility fields |
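To make the artifact contract concrete, the following sketch writes the seven files from Table 1 into a directory. It is an assumption-laden illustration rather than the shared writer itself: the function `write_package`, the JSON key names, and the way the entrypoint files are derived (plain copies and concatenation, with no frontmatter) are all hypothetical.

```python
import json
from pathlib import Path

SCHEMA_VERSION = 3  # "schema v3" in the text


def write_package(root: Path, work_md: str, persona_md: str,
                  version: int = 1, correction_count: int = 0) -> list[str]:
    """Write a package directory and return the sorted artifact names."""
    root.mkdir(parents=True, exist_ok=True)
    files = {
        "work.md": work_md,
        "persona.md": persona_md,
        # Placeholder derivations: a real writer would add frontmatter and rules.
        "work_skill.md": work_md,
        "persona_skill.md": persona_md,
        "SKILL.md": f"{work_md}\n\n{persona_md}\n",
    }
    manifest = {
        # Hypothetical keys for the full, capability-only, and
        # behavior-only entrypoints.
        "entrypoints": {
            "full": "SKILL.md",
            "capability": "work_skill.md",
            "behavior": "persona_skill.md",
        },
        "artifacts": sorted([*files, "manifest.json", "meta.json"]),
    }
    meta = {
        "schema": SCHEMA_VERSION,
        "version": version,
        "correction_count": correction_count,
    }
    files["manifest.json"] = json.dumps(manifest, indent=2)
    files["meta.json"] = json.dumps(meta, indent=2)
    for name, text in files.items():
        (root / name).write_text(text, encoding="utf-8")
    return sorted(files)
```

Calling `write_package(Path("my-skill"), work_text, persona_text)` would produce a folder whose editable sources (work.md, persona.md) sit next to the generated entrypoints, so a user can inspect what the agent runtime actually loads.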
Workflows
- Creation: Users provide an alias, optional profile, and source material. Preset-specific prompts guide the dual-track distillation, resulting in structured Markdown files packaged into the final artifact.
- Correction & Update: The system accepts natural-language feedback (e.g., "he would not say that"). Corrections concerning work produce Markdown patches; those concerning behavior produce normalized correction records {scene, wrong, correct}. The writer archives the current version, applies changes, increments the version, and regenerates artifacts, enabling rollbacks (Figure 3).
- Public-Figure Research Extension: The celebrity preset includes tooling for subtitle download, transcription, and quality checks that scan for mental-model coverage, limitations, and evidence grounding.
- Relationship Extension: Stresses the governance surface of the package format, making deletion, correction, local ownership, and non-public defaults first-order requirements.
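The correction and update workflow above (archive, apply, increment, roll back) can be sketched with an in-memory version store. This is a simplified illustration under stated assumptions: `Correction` mirrors the {scene, wrong, correct} record from the text, but `SkillState`, `SkillHistory`, and their methods are hypothetical, and Markdown patches for work corrections and artifact regeneration are not modeled.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Correction:
    scene: str    # situation the feedback refers to
    wrong: str    # behavior the user rejected
    correct: str  # behavior the user expects instead


@dataclass
class SkillState:
    version: int = 1
    work_md: str = ""
    corrections: list[Correction] = field(default_factory=list)


class SkillHistory:
    """Hypothetical in-memory store with archive-then-apply updates."""

    def __init__(self, initial: SkillState) -> None:
        self.current = initial
        self.archive: dict[int, SkillState] = {}

    def apply_correction(self, correction: Correction) -> SkillState:
        # Archive a deep copy of the current version before changing it.
        self.archive[self.current.version] = copy.deepcopy(self.current)
        self.current.corrections.append(correction)
        self.current.version += 1
        return self.current

    def rollback(self, version: int) -> SkillState:
        # Look up the target first so an unknown version raises KeyError
        # before any state is modified.
        target = copy.deepcopy(self.archive[version])
        self.archive[self.current.version] = copy.deepcopy(self.current)
        self.current = target
        return self.current


# Illustrative usage with made-up feedback.
history = SkillHistory(SkillState(work_md="# Review criteria\n"))
history.apply_correction(
    Correction(scene="code review",
               wrong="approves without asking about tests",
               correct="asks for tests before approving")
)
```

Archiving before every change is what makes rollback cheap: any earlier version can be restored by copying it back, and the version being left is archived too so the rollback itself can be undone.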
Empirical Validation / Results
The paper presents deployment and ecosystem metrics as evidence of the system's viability as a distribution surface, not as measures of task performance or behavioral fidelity.
Table 2: Observed public deployment counters on 2026-05-28.
| Metric | Value |
|---|---|
| GitHub Repository Stars | ~18.5k |
| GitHub Forks | ~1.8k |
| Public Gallery Skills | 215 |
| Public Gallery Meta-skills | 55 |
| Community Contributors (Gallery) | 165 |
| Cumulative Gallery Stars (aggregate) | > 100k |
The public gallery serves as a downstream sharing layer for skills where users have rights to publish. The system has been integrated with agent hosts like Claude Code, OpenClaw, Codex, and Hermes.
Theoretical and Practical Implications
- Artifact-Centric Approach: The primary contribution is framing person-grounded distillation as an artifact problem with explicit portability, inspectability, and governance. This shifts the focus from hidden model behavior to concrete, reviewable packages.
- Separation of Concerns: The dual-track representation deliberately separates capability (useful judgment, heuristics) from bounded behavior (style, interaction rules), mitigating risks associated with conflating them and enabling capability-only invocation where appropriate.
- Workflow as a Research Enabler: The implemented creation-correction-installation-distribution workflow is not an engineering detail but a condition for making skills auditable, repairable, and evaluable. It provides concrete handles for future research on extraction quality, correction efficacy, and invocation modes.
- Productization as a Constraint: The deployment surface (installers, gallery, manifests) turns distillation into a legible software object, clarifying ownership, provenance, and deployment boundaries for users and researchers.
- Broad Applicability: The system demonstrates that the same artifact model can address diverse domains, from practical work expertise (colleague) to public intellectual models (celebrity) and sensitive private interactions (relationship), by adapting evidence and consent assumptions.
Conclusion
COLLEAGUE.SKILL demonstrates that selected human traces can be distilled into portable, inspectable AI skill artifacts that encode capabilities, mental models, behavior constraints, and correction history. The core claim is not about achieving perfect behavioral fidelity, but about creating a governable, improvable artifact that users can read, revise, install, withhold, and delete.
The system provides a product-oriented research path for "digital doubles" as bounded packages with explicit evidence, rights, and correction semantics. Future work should focus on evaluating these artifacts—measuring useful judgment preservation and interaction quality—while rigorously accounting for source quality, consent, provenance, and safety boundaries across concrete deployment settings.