Summary (Overview)

  • Code2LoRA introduces a hypernetwork framework that generates repository-specific LoRA adapters for frozen code language models, eliminating inference-time token overhead by injecting repository knowledge directly into model parameters.
  • Two usage scenarios are instantiated: Code2LoRA-Static (maps a single repository snapshot to an adapter) and Code2LoRA-Evo (maintains an adapter via a GRU hidden state updated per code diff to track software evolution).
  • RepoPeftBench is a new benchmark of 604 Python repositories with static and evolution tracks for evaluating repository-level parameter-efficient fine-tuning. It includes a temporal out-of-distribution holdout (92 repositories created after the training cutoff).
  • Code2LoRA-Static achieves 63.8% exact match (EM) on cross-repo (CR) evaluation, outperforming the strongest baseline (FFT+RAG) by +9.9 pp. Code2LoRA-Evo reaches 60.3% CR exact match on the evolution track, +5.2 pp over a single shared LoRA.
  • On an out-of-distribution temporal holdout, Code2LoRA-Evo achieves the highest EM (74.1%), demonstrating strong generalization to unseen, post-cutoff repositories.

Introduction and Theoretical Foundation

Code language models require repository-level context (imports, APIs, conventions) to perform complex tasks like assertion completion. Existing approaches inject this knowledge either through long inputs (RAG, dependency analysis), which incur high token and retrieval costs, or through per-repository fine-tuning/LoRA, which is costly and brittle as codebases evolve.

Hypernetwork-generated LoRA adapters (e.g., Text2LoRA, Doc2LoRA) provide a promising alternative: a forward pass over a conditioning input produces task-specific weights for a frozen LLM. However, these methods are designed for short natural-language inputs or single documents, not the long, repository-scale context of code, and lack mechanisms for tracking software evolution.

Code2LoRA fills this gap by framing repository-level adaptation along two orthogonal axes:

  • How knowledge enters parameters (via a hypernetwork conditioned on a repository embedding)
  • When it is updated (static snapshot vs. sequential commit diffs)

The theoretical foundation rests on low-rank adaptation (LoRA) (Hu et al., 2022) and hypernetworks (Ha et al., 2017). For a frozen base model with weight $W$, a LoRA adapter injects an update $W' = W + \frac{\alpha}{r} BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$. Code2LoRA generates $A$ and $B$ from a sampled repository embedding using a trained hypernetwork, so no per-repository fine-tuning is needed.
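The LoRA update above can be sketched in a few lines of NumPy. This is only an illustration under assumed toy dimensions: the sizes, the random values, and the helper name `apply_lora` are hypothetical and not taken from the paper. It also shows why a merged adapter adds no input tokens at inference: the merged weight produces the same output as running the low-rank branch next to the frozen weight.

```python
import numpy as np

def apply_lora(W, A, B, alpha, r):
    """Return W' = W + (alpha / r) * B @ A for a frozen weight W."""
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d, k, r, alpha = 8, 6, 2, 4.0      # toy sizes, chosen only for illustration
W = rng.normal(size=(d, k))        # frozen base weight
B = rng.normal(size=(d, r))        # B in R^{d x r}
A = rng.normal(size=(r, k))        # A in R^{r x k}
x = rng.normal(size=(k,))          # an arbitrary input vector

W_prime = apply_lora(W, A, B, alpha, r)

# Merged weight vs. adapter evaluated as a separate low-rank branch.
y_merged = W_prime @ x
y_branch = W @ x + (alpha / r) * (B @ (A @ x))
```

The difference `W_prime - W` has rank at most `r`, which is what keeps the adapter small relative to the full weight.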

Methodology

3.1 Repository Encoder

Repository context is compressed into a fixed-size vector in two steps using a frozen Qwen3-Embedding-0.6B model:

  1. File-level embedding: Each file (or diff) is chunked into 4096-token segments with 512-token overlap, embedded, and mean-pooled to produce a file vector $\mathbf{f}_i \in \mathbb{R}^d$ ($d = 1024$).
  2. Repository-level aggregation: Each file vector receives an importance weight $w_i$. The repository embedding is the concatenation of a weighted mean and a max pool:
$$\mathbf{e} = \left[ \sum_i w_i \mathbf{f}_i \; ; \; \max_i \mathbf{f}_i \right] \in \mathbb{R}^{2d}$$
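The two encoder steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, random vectors stand in for the frozen embedding model's outputs, and the importance weights are normalized to sum to 1 here because the summary does not say how they are computed.

```python
import numpy as np

def chunk_spans(n_tokens, size=4096, overlap=512):
    """Return (start, end) token spans of length <= size with the given overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    stride = size - overlap
    spans, start = [], 0
    while True:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            return spans
        start += stride

def file_vector(chunk_embeddings):
    """Mean-pool chunk embeddings of shape (n_chunks, d) into a file vector (d,)."""
    return np.mean(chunk_embeddings, axis=0)

def repo_embedding(file_vectors, weights):
    """Concatenate the weighted mean and the element-wise max over files."""
    F = np.stack(file_vectors)                 # (n_files, d)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # assumption: weights sum to 1
    weighted_mean = w @ F                      # (d,)
    max_pool = F.max(axis=0)                   # (d,)
    return np.concatenate([weighted_mean, max_pool])   # (2d,)

# Toy usage: random vectors replace real chunk embeddings.
rng = np.random.default_rng(0)
d = 1024
spans = chunk_spans(10_000)
files = [file_vector(rng.normal(size=(3, d))) for _ in range(5)]
e = repo_embedding(files, weights=[1, 2, 1, 1, 3])
```

With a 10,000-token file, the stride is 4096 - 512 = 3584 tokens, so the file is covered by three overlapping spans, and the resulting repository embedding has 2d = 2048 dimensions.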

3.2 Code2LoRA-Static

A shared 2-layer MLP with GELU activation projects the embedding $\mathbf{e}$ to a hidden state, which is then fed to dedicated output heads for each LoRA module type $m \in \{q, k, v, o, \text{gate}, \text{up}, \text{down}\}$:

$$\mathbf{h} = \sqrt{d_h} \cdot \text{L2Norm}(\text{MLP}(\mathbf{e}))$$
$$\mathbf{A}_m = \tanh(\text{Head}_m^A(\mathbf{h})) \cdot \exp(s_m^A)$$
$$\mathbf{B}_m = \tanh(\text{Head}_m^B(\mathbf{h})) \cdot \exp(s_m^B)$$

Learnable log-scales $s_m^{A/B}$ control adapter magnitudes (initialized to $-3.5$). LoRA matrices are shared across all layers and injected via $W' = W + \frac{\alpha}{r} \mathbf{B}_m \mathbf{A}_m$ (rank $r = 16$, $\alpha = 32$). The hypernetwork has $\sim 720$M trainable parameters.
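A toy forward pass for a single module type may help make the three equations concrete. The sketch below makes several assumptions that the summary does not confirm: the heads are plain linear maps, biases are omitted, weights use an arbitrary small random initialization, and the class name `StaticHead` and all dimensions are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def l2_normalize(x, eps=1e-12):
    return x / (np.linalg.norm(x) + eps)

class StaticHead:
    """Toy hypernetwork: embedding e -> (A_m, B_m) for one module type m."""

    def __init__(self, d_in, d_h, d_out, k_in, r, rng, log_scale_init=-3.5):
        s = 0.02  # assumed init scale, not from the paper
        self.W1 = rng.normal(scale=s, size=(d_h, d_in))
        self.W2 = rng.normal(scale=s, size=(d_h, d_h))
        self.head_A = rng.normal(scale=s, size=(r * k_in, d_h))   # assumed linear head
        self.head_B = rng.normal(scale=s, size=(d_out * r, d_h))  # assumed linear head
        self.s_A = log_scale_init   # learnable log-scale for A
        self.s_B = log_scale_init   # learnable log-scale for B
        self.d_h, self.d_out, self.k_in, self.r = d_h, d_out, k_in, r

    def __call__(self, e):
        mlp_out = self.W2 @ gelu(self.W1 @ e)             # 2-layer MLP with GELU
        h = np.sqrt(self.d_h) * l2_normalize(mlp_out)     # h = sqrt(d_h) * L2Norm(MLP(e))
        A = np.tanh(self.head_A @ h) * np.exp(self.s_A)   # bounded by tanh, then scaled
        B = np.tanh(self.head_B @ h) * np.exp(self.s_B)
        return A.reshape(self.r, self.k_in), B.reshape(self.d_out, self.r)

rng = np.random.default_rng(0)
head = StaticHead(d_in=16, d_h=8, d_out=6, k_in=5, r=2, rng=rng)
A, B = head(rng.normal(size=16))
```

Because `tanh` is bounded by 1 in magnitude, every generated entry is at most `exp(s)` in absolute value. At the stated initialization of -3.5 that bound is about 0.03, so the generated adapter starts out as a small perturbation of the frozen weights.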

3.3 Code2LoRA-Evo

A GRU recurrent neural network aggregates a chronological stream of diff embeddings $\{\mathbf{e}_t\}$:

$$\mathbf{z}_t = \text{GRU}(\text{LayerNorm}(\text{Linear}(\mathbf{e}_t)), \mathbf{z}_{t-1})$$

The initial state $\mathbf{z}_0$ is computed from the initial repository embedding via a small linear projector. At each step $t$, the shared LoRA-generation head uses $\mathbf{z}_t$ in place of $\mathbf{e}$ to produce the adapter. The GRU and projector add $\sim 25$M parameters, for a total of $\sim 745$M.
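The recurrent update can be sketched with a standard GRU cell. The gate equations below follow the common formulation (reset gate, update gate, candidate state); the summary only says "GRU", so the exact variant, the omission of biases, the LayerNorm without learned affine parameters, and the class name `EvoState` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # LayerNorm without learned scale/shift (assumption)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

class EvoState:
    """Toy recurrent state tracker: z_t = GRU(LayerNorm(Linear(e_t)), z_{t-1})."""

    def __init__(self, d_e, d_z, rng):
        s = 0.1  # assumed init scale
        self.W_in = rng.normal(scale=s, size=(d_z, d_e))     # Linear(e_t)
        self.W_init = rng.normal(scale=s, size=(d_z, d_e))   # projector producing z_0
        # GRU weights for the reset (0), update (1), and candidate (2) paths.
        self.Wx = rng.normal(scale=s, size=(3, d_z, d_z))
        self.Wz = rng.normal(scale=s, size=(3, d_z, d_z))

    def init_state(self, e_repo):
        return self.W_init @ e_repo

    def step(self, e_t, z_prev):
        x = layer_norm(self.W_in @ e_t)
        r = sigmoid(self.Wx[0] @ x + self.Wz[0] @ z_prev)        # reset gate
        u = sigmoid(self.Wx[1] @ x + self.Wz[1] @ z_prev)        # update gate
        n = np.tanh(self.Wx[2] @ x + r * (self.Wz[2] @ z_prev))  # candidate state
        return (1.0 - u) * n + u * z_prev                        # blend old and new

rng = np.random.default_rng(0)
model = EvoState(d_e=12, d_z=8, rng=rng)
z = model.init_state(rng.normal(size=12))   # z_0 from the initial snapshot
for e_t in rng.normal(size=(5, 12)):        # five diff embeddings in commit order
    z_prev, z = z, model.step(e_t, z)
```

After each step, `z` would be passed to the LoRA-generation head in place of the static embedding, so the adapter follows the commit history without re-encoding the whole repository.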

3.4 Training

The hypernetwork is trained end-to-end by minimizing the frozen base LLM's cross-entropy on assertion-completion pairs:

$$\mathcal{L}(\theta) = -\sum_{(x,y) \in \mathcal{D}} \log p(y \mid x; \text{Hypernetwork}_\theta(u))$$

where $u = \mathbf{e}$ for Code2LoRA-Static and $u = \mathbf{z}_t$ for Code2LoRA-Evo. For Code2LoRA-Evo, truncated backpropagation through time (BPTT) is used, detaching the recurrent state every $K = 16$ steps. Batches sample a repository first, then an input-output pair from it.
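The bookkeeping around this objective can be sketched in plain Python. The helper names are hypothetical, uniform sampling over repositories is an assumption (the summary only says a repository is sampled first), and no real model is involved: `nll` takes the per-token probabilities a model would assign to the target.

```python
import math
import random

def sample_batch(tasks_by_repo, batch_size, rng):
    """Sample a repository (uniformly, by assumption), then one (x, y) pair from it."""
    repos = list(tasks_by_repo)
    batch = []
    for _ in range(batch_size):
        repo = rng.choice(repos)
        batch.append((repo, rng.choice(tasks_by_repo[repo])))
    return batch

def nll(token_probs):
    """Negative log-likelihood of a target, given per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)

def detach_steps(n_steps, K=16):
    """Steps after which the recurrent state would be detached (truncated BPTT)."""
    return [t for t in range(1, n_steps + 1) if t % K == 0]

rng = random.Random(0)
tasks = {
    "repo_a": [("prefix_a", "target_a")] * 100,   # a repository with many tasks
    "repo_b": [("prefix_b", "target_b")],         # a repository with a single task
}
batch = sample_batch(tasks, batch_size=4, rng=rng)
loss = nll([0.5, 0.5])       # two target tokens, each with probability 0.5
cuts = detach_steps(40)      # [16, 32]
```

Under the uniform assumption, sampling the repository first gives each repository the same chance per draw regardless of how many tasks it has. In an autograd framework, each entry of `cuts` would correspond to cutting the gradient path through the hidden state, for example with PyTorch's `Tensor.detach()`.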

Empirical Validation / Results

Benchmark: RepoPeftBench

  • 604 Python repositories (512 in-distribution, 92 temporal OOD holdout after 2025-04-01)
  • Task: assertion completion – given a test-file prefix, predict the expected value of an assertion
  • Two tracks:
    • Static: single snapshot per repository (39,612 train, 11,636 test tasks)
    • Evolution: commit-derived tasks (215,129 train, 86,793 test tasks from commit history)
  • Splits: Cross-Repo (CR) and In-Repo (IR)
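Exact match, the headline metric in the results that follow, can be sketched as a string comparison. The benchmark's precise normalization is not stated in this summary, so trimming surrounding whitespace is an assumption, and the function names are hypothetical.

```python
def exact_match(prediction: str, reference: str) -> bool:
    """True when prediction equals reference after trimming outer whitespace."""
    return prediction.strip() == reference.strip()

def em_score(predictions, references) -> float:
    """Exact-match rate in percent over paired predictions and references."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

# Toy usage: the third prediction differs only by an inner space and still fails.
preds = ["42", " 'ok'", "[1, 2]"]
refs = ["42", "'ok'", "[1,2]"]
score = em_score(preds, refs)
```

Here two of the three predictions match, so `score` is two thirds expressed in percent. The strictness of the comparison is what makes EM a conservative metric for assertion values.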

Table 1: Dataset statistics

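| Split | Repos | Commits | Tasks | Tasks / repo |
|---|---|---|---|---|
| Static track | | | | |
| Train | 409 | 409 | 39,612 | 96.9 |
| CR Test | 52 | 52 | 6,414 | 123.3 |
| IR Test | 409 | 409 | 5,222 | 12.8 |
| Evolution track | | | | |
| Train (Evo) | 400 | 45,516 | 215,129 | 537.8 |
| CR Test | 51 | 6,618 | 44,732 | 877 |
| IR Test | 389 | 6,179 | 42,061 | 108.1 |
| OOD holdout | 92 | 1,950 | 14,813 | 161.0 |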
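As a quick consistency check on Table 1, the "Tasks / repo" column should equal tasks divided by repositories, and the CR and IR test counts should add up to the per-track test totals quoted in the track descriptions. The sketch below recomputes both from the table's own numbers; the row keys are just labels chosen for this example.

```python
# (repos, tasks, reported tasks-per-repo) for each row of Table 1
rows = {
    "static/train": (409, 39_612, 96.9),
    "static/cr_test": (52, 6_414, 123.3),
    "static/ir_test": (409, 5_222, 12.8),
    "evo/train": (400, 215_129, 537.8),
    "evo/cr_test": (51, 44_732, 877.0),
    "evo/ir_test": (389, 42_061, 108.1),
    "ood_holdout": (92, 14_813, 161.0),
}

def tasks_per_repo(repos, tasks):
    return tasks / repos

# Rows whose recomputed ratio differs from the reported one by more than 0.1.
mismatches = {
    name: round(tasks_per_repo(repos, tasks), 1)
    for name, (repos, tasks, reported) in rows.items()
    if abs(tasks_per_repo(repos, tasks) - reported) > 0.1
}

static_test_total = rows["static/cr_test"][1] + rows["static/ir_test"][1]
evo_test_total = rows["evo/cr_test"][1] + rows["evo/ir_test"][1]
```

All rows agree with the reported ratios to within rounding, and the test totals come out to 11,636 for the static track and 86,793 for the evolution track.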

Static Track Results (Table 2)

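| Method | CR EM (%) | IR EM (%) |
|---|---|---|
| Pretrained | 45.7 | 46.8 |
| RAG (k=3) | 39.7 | 42.1 |
| Dep.-Resolved Context | 48.2 | 49.5 |
| FFT | 51.4 | 55.9 |
| Single LoRA | 47.4 | 50.4 |
| Per-repo LoRA | | 64.0 |
| Text2LoRA (strengthened) | 45.8 | 46.7 |
| Code2LoRA-Static | 63.8 | 66.2 |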

Code2LoRA-Static outperforms all baselines by large margins (+9.9 pp over FFT+RAG on CR) and, without per-repository training, reaches 66.2% on IR against 64.0% for the Per-repo LoRA upper bound.

Evolution Track Results (Table 3)

| Method | CR EM (%) | IR EM (%) |
|---|---|---|
| Pretrained | 31.5 | 29.3 |
| Single LoRA | 55.1 | 61.3 |
| Per-repo LoRA | | 64.2 |
| Text2LoRA | 41.7 | 43.5 |
| Code2LoRA-Static | 55.7 | 60.6 |
| Code2LoRA-Evo | 60.3 | 64.5 |

Commit-derived tasks are significantly harder (Pretrained drops to 31.5% CR). Code2LoRA-Evo gains +5.2 pp over Single LoRA on CR and exceeds the Per-repo LoRA bound on IR without per-repo training.

Out-of-Distribution Generalization (Table 4)

| Method | EM (%) |
|---|---|
| Pretrained | 44.6 |
| Single LoRA | 72.3 |
| Text2LoRA | 60.4 |
| Code2LoRA-Static | 72.2 |
| Code2LoRA-Evo | 74.1 |

Code2LoRA-Evo leads on OOD by ~1.8 pp over the next-best fine-tuned adapter, with consistent gains across EditSim and CodeBLEU. (Note: OOD targets are systematically shorter, inflating absolute scores, but within-table comparisons remain valid.)

Theoretical and Practical Implications

  • Parametric injection of repository knowledge (via generated adapters) consistently outperforms context-injection methods (RAG, dependency resolution) across both static and evolution settings, suggesting that code models benefit from distilling repository context into parameters rather than extending input length.
  • Recurrent aggregation over commit diffs is shown to be more effective than static snapshot adaptation when codebases evolve, providing a principled way to keep model knowledge current without full retraining.
  • The hypernetwork approach enables zero-inference-time token overhead and generalization to unseen repositories without per-repository fine-tuning, making it practical for large-scale deployment across many codebases.
  • RepoPeftBench provides a standardized evaluation framework for repository-level PEFT, including a temporal OOD split that challenges models to generalize to future codebases.

Conclusion

Code2LoRA is a hypernetwork framework that generates repository-specific LoRA adapters for code language models. Two instances address different usage scenarios:

  • Code2LoRA-Static maps a single repository snapshot to an adapter, achieving 63.8% CR / 66.2% IR exact match on the static track.
  • Code2LoRA-Evo uses a GRU to aggregate commit diffs, reaching 60.3% CR / 64.5% IR exact match on the evolution track, outperforming static and shared adapters.

The results demonstrate that repository knowledge is best injected parametrically and updated to track software evolution, rather than through long input context. Code2LoRA provides a building block for more context-aware, customizable, and cost-efficient AI code assistants.

Limitations include evaluation limited to Python and a single backbone (Qwen2.5-Coder-1.5B), potential inflationary effects on OOD metrics due to shorter target lengths (though within-table comparisons are valid), and the large size of the hypernetwork itself (~720M–745M parameters). Future work should extend to more languages, larger backbones, and additional downstream tasks.
