Visual Summary | Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Summary (Overview)

Code2LoRA introduces a hypernetwork framework that generates repository-specific LoRA adapters for frozen code language models, eliminating inference-time token overhead by injecting repository knowledge directly into model parameters.
Two usage scenarios are instantiated: Code2LoRA-Static (maps a single repository snapshot to an adapter) and Code2LoRA-Evo (maintains an adapter via a GRU hidden state updated per code diff to track software evolution).
RepoPeftBench is a new benchmark of 604 Python repositories with static and evolution tracks for evaluating repository-level parameter-efficient fine-tuning. It includes a temporal out-of-distribution holdout (92 repositories created after the training cutoff).
Code2LoRA-Static achieves 63.8% exact match (EM) on cross-repo (CR) evaluation, outperforming the strongest baseline (FFT+RAG) by +9.9 pp. Code2LoRA-Evo reaches 60.3% CR exact match on the evolution track, +5.2 pp over a single shared LoRA.
On the temporal out-of-distribution holdout, Code2LoRA-Evo achieves the highest EM (74.1%, vs. 72.3% for a single shared LoRA), indicating that the approach generalizes to unseen, post-cutoff repositories.

Introduction and Theoretical Foundation

Code language models require repository-level context (imports, APIs, conventions) to perform complex tasks such as assertion completion. Existing approaches inject this knowledge either through long inputs (RAG, dependency analysis), which incur high token and retrieval costs, or through per-repository fine-tuning or LoRA, which is costly and brittle as codebases evolve.

Hypernetwork-generated LoRA adapters (e.g., Text2LoRA, Doc2LoRA) provide a promising alternative: a forward pass over a conditioning input produces task-specific weights for a frozen LLM. However, these methods are designed for short natural-language inputs or single documents, not the long, repository-scale context of code, and lack mechanisms for tracking software evolution.

Code2LoRA fills this gap by framing repository-level adaptation along two orthogonal axes:

How knowledge enters parameters (via a hypernetwork conditioned on a repository embedding)
When it is updated (static snapshot vs. sequential commit diffs)

The theoretical foundation rests on low-rank adaptation (LoRA) (Hu et al., 2022) and hypernetworks (Ha et al., 2017). For a frozen base model with weight $W \in \mathbb{R}^{d \times k}$, a LoRA adapter of rank $r$ injects an update $W' = W + \frac{\alpha}{r} BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$. Code2LoRA generates $A$ and $B$ from a repository embedding using a trained hypernetwork, so no per-repository fine-tuning is needed.
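To make the update rule concrete, the following is a minimal pure-Python sketch of $W' = W + \frac{\alpha}{r} BA$ on toy matrices. The helper names (`matmul`, `lora_update`, `lora_param_count`) and the toy sizes are illustrative assumptions, not part of Code2LoRA.

```python
# Minimal LoRA update on toy matrices (illustrative only, not the paper's code).

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_update(W, A, B, alpha, r):
    """Return W' = W + (alpha / r) * B @ A, leaving the frozen W untouched."""
    scale = alpha / r
    BA = matmul(B, A)  # d x k low-rank update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

def lora_param_count(d, k, r):
    """Number of adapter parameters: B is d x r and A is r x k."""
    return r * (d + k)

# Toy shapes: d = 3, k = 2, r = 1 (the summary reports r = 16, alpha = 32).
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # frozen weight, d x k
B = [[1.0], [2.0], [3.0]]                  # d x r
A = [[0.5, -0.5]]                          # r x k
W_prime = lora_update(W, A, B, alpha=2.0, r=1)
```

For square weights with $d = k$, the adapter holds $2rd$ parameters instead of $d^2$, which is why a hypernetwork can plausibly emit $A$ and $B$ directly.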

Methodology

3.1 Repository Encoder

Repository context is compressed into a fixed-size vector in two steps using a frozen Qwen3-Embedding-0.6B model:

File-level embedding: Each file (or diff) is chunked into 4096-token segments with 512-token overlap, embedded, and mean-pooled to produce a file vector $\mathbf{f}_i \in \mathbb{R}^d$ ($d=1024$).
Repository-level aggregation: Each file vector receives an importance weight $w_i$. The repository embedding is the concatenation of a weighted mean and a max pool:

$$\mathbf{e} = \left[ \sum_i w_i \mathbf{f}_i \; ; \; \max_i \mathbf{f}_i \right] \in \mathbb{R}^{2d}$$
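The two encoder steps can be sketched as follows, with the embedding model itself left out. This is a sketch under stated assumptions: the function names are hypothetical, the importance weights are normalized to sum to 1 (the summary only says "weighted mean"), and the max pool is taken elementwise.

```python
# Sketch of chunking and repository-level aggregation (embedding model omitted).

def chunk_spans(n_tokens, size=4096, overlap=512):
    """Return (start, end) token spans of overlapping chunks covering n_tokens."""
    stride = size - overlap
    spans, start = [], 0
    while True:
        end = min(start + size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans

def mean_pool(vectors):
    """Elementwise mean of equally sized vectors (chunk vectors -> file vector)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def aggregate(file_vecs, weights):
    """Concatenate the weighted mean and the elementwise max of the file vectors."""
    total = sum(weights)  # assumption: weights are normalized to sum to 1
    w = [x / total for x in weights]
    weighted = [sum(wi * f[j] for wi, f in zip(w, file_vecs))
                for j in range(len(file_vecs[0]))]
    pooled_max = [max(col) for col in zip(*file_vecs)]
    return weighted + pooled_max  # length 2d

# Toy example with d = 2 and two files.
e = aggregate([[1.0, 4.0], [3.0, 0.0]], weights=[1.0, 3.0])
```

The weighted mean captures the typical content of the repository, while the max pool preserves strong features that appear in only a few files.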

3.2 Code2LoRA-Static

A shared 2-layer MLP with GELU activation projects the embedding $\mathbf{e}$ to a hidden state, which is then fed to dedicated output heads for each LoRA module type $m \in \{q, k, v, o, \text{gate}, \text{up}, \text{down}\}$:

$$\mathbf{h} = \sqrt{d_h} \cdot \text{L2Norm}(\text{MLP}(\mathbf{e}))$$

$$\mathbf{A}_m = \tanh(\text{Head}_m^A(\mathbf{h})) \cdot \exp(s_m^A)$$

$$\mathbf{B}_m = \tanh(\text{Head}_m^B(\mathbf{h})) \cdot \exp(s_m^B)$$

Learnable log-scales $s_m^{A/B}$ (initialized to $-3.5$) control adapter magnitudes: because $\tanh$ is bounded by 1, every generated entry starts with magnitude at most $e^{-3.5} \approx 0.03$. The generated LoRA matrices are shared across all layers and injected via $W' = W + \frac{\alpha}{r} \mathbf{B}_m \mathbf{A}_m$ (rank $r=16$, $\alpha=32$). The hypernetwork has $\sim$720M trainable parameters.
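The forward pass described above can be illustrated with a small pure-Python sketch for a single module type. Everything here is a toy under stated assumptions: the sizes, the random weights, the Linear-GELU-Linear reading of "2-layer MLP", and the helper names are illustrative rather than the authors' implementation.

```python
import math
import random

def linear(x, W, b):
    """Affine map: one output per row of W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def gelu(v):
    """Exact GELU using the Gaussian error function."""
    return [0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in v]

def scaled_l2norm(v):
    """Return sqrt(d_h) * v / ||v||_2, so the result has norm sqrt(d_h)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    scale = math.sqrt(len(v)) / norm
    return [scale * x for x in v]

def head(h, W, b, log_scale, rows, cols):
    """tanh(Head(h)) * exp(s), reshaped into a rows x cols matrix."""
    gain = math.exp(log_scale)
    vals = [math.tanh(x) * gain for x in linear(h, W, b)]
    return [vals[i * cols:(i + 1) * cols] for i in range(rows)]

def rand_layer(rng, n_out, n_in):
    """Random weights and zero bias (stand-ins for trained parameters)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
d_e, d_h, r, k, d_out = 8, 6, 2, 4, 5  # toy sizes, far smaller than the real model
W1, b1 = rand_layer(rng, d_h, d_e)
W2, b2 = rand_layer(rng, d_h, d_h)
WA, bA = rand_layer(rng, r * k, d_h)
WB, bB = rand_layer(rng, d_out * r, d_h)

e = [rng.uniform(-1.0, 1.0) for _ in range(d_e)]   # repository embedding
h = scaled_l2norm(linear(gelu(linear(e, W1, b1)), W2, b2))
A_q = head(h, WA, bA, -3.5, r, k)        # r x k
B_q = head(h, WB, bB, -3.5, d_out, r)    # d_out x r
```

The `tanh` bound combined with the learnable log-scale keeps the initial adapter close to zero, so the adapted model starts near the frozen base model.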

3.3 Code2LoRA-Evo

A GRU recurrent neural network aggregates a chronological stream of diff embeddings $\{\mathbf{e}_t\}$ :

$$\mathbf{z}_t = \text{GRU}(\text{LayerNorm}(\text{Linear}(\mathbf{e}_t)), \mathbf{z}_{t-1})$$

The initial state $\mathbf{z}_0$ is computed from the initial repository embedding via a small linear projector. At each step $t$, the shared LoRA-generation head uses $\mathbf{z}_t$ in place of $\mathbf{e}$ to produce the adapter. The GRU and projector add $\sim$25M parameters, for a total of $\sim$745M.
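The recurrence can be sketched with a standard GRU cell written in pure Python. This is a minimal sketch, assuming the common GRU formulation with biases omitted and a LayerNorm without learnable affine parameters; the parameter names and toy sizes are hypothetical. Note that `state` plays the role of the summary's $\mathbf{z}_t$ (the hidden state), which is distinct from the GRU's internal update gate.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def layer_norm(v, eps=1e-5):
    """Normalize to zero mean and unit variance (no learnable affine here)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, state, P):
    """One GRU update: returns the new hidden state given input x."""
    reset = [sigmoid(a + b) for a, b in zip(matvec(P["W_r"], x), matvec(P["U_r"], state))]
    update = [sigmoid(a + b) for a, b in zip(matvec(P["W_u"], x), matvec(P["U_u"], state))]
    hidden_part = matvec(P["U_n"], state)
    cand = [math.tanh(a + rg * b)
            for a, rg, b in zip(matvec(P["W_n"], x), reset, hidden_part)]
    # Interpolate between the candidate and the previous state.
    return [(1.0 - u) * c + u * s for u, c, s in zip(update, cand, state)]

rng = random.Random(0)
d_e, d_z = 6, 4  # toy sizes

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

proj = rand_matrix(d_z, d_e)        # Linear applied to each diff embedding
init_proj = rand_matrix(d_z, d_e)   # projector producing z_0
P = {name: rand_matrix(d_z, d_z)
     for name in ("W_r", "U_r", "W_u", "U_u", "W_n", "U_n")}

repo_embedding = [rng.uniform(-1.0, 1.0) for _ in range(d_e)]
diff_embeddings = [[rng.uniform(-1.0, 1.0) for _ in range(d_e)] for _ in range(3)]

state = matvec(init_proj, repo_embedding)  # z_0
states = []
for e_t in diff_embeddings:                # chronological order
    state = gru_step(layer_norm(matvec(proj, e_t)), state, P)
    states.append(state)                   # z_t would feed the LoRA head
```

Each commit costs one GRU step plus one pass through the generation head, so the adapter can be refreshed without re-encoding the whole repository.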

3.4 Training

The hypernetwork is trained end-to-end by minimizing the cross-entropy of the frozen base LLM, equipped with the generated adapter, on assertion-completion pairs:

$$\mathcal{L}(\theta) = -\sum_{(x,y) \in \mathcal{D}} \log p(y \mid x; \text{Hypernetwork}_\theta(u))$$

where $u = \mathbf{e}$ for Code2LoRA-Static and $u = \mathbf{z}_t$ for Code2LoRA-Evo. For Code2LoRA-Evo, truncated backpropagation through time (BPTT) is used, detaching the hidden state every $K=16$ steps. Each batch element is drawn by sampling a repository first, then an input-output pair from that repository.
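Two pieces of this training recipe, repository-first sampling and the truncated-BPTT windows, can be sketched without any deep-learning library. The function names are hypothetical, and uniform sampling over repositories and over pairs is an assumption; the summary only states the order of sampling.

```python
import random

def sample_batch(tasks_by_repo, batch_size, rng):
    """Sample a repository first, then one (input, output) pair from it."""
    repos = sorted(tasks_by_repo)
    batch = []
    for _ in range(batch_size):
        repo = rng.choice(repos)                 # assumption: uniform over repos
        x, y = rng.choice(tasks_by_repo[repo])   # assumption: uniform over pairs
        batch.append((repo, x, y))
    return batch

def bptt_windows(n_steps, K=16):
    """Split a commit sequence into windows; gradients stop at window borders."""
    return [(s, min(s + K, n_steps)) for s in range(0, n_steps, K)]

rng = random.Random(0)
toy_tasks = {
    "repo_a": [("assert f(1) == ", "2")],
    "repo_b": [("assert g() is ", "None"), ("assert len(h()) == ", "3")],
}
batch = sample_batch(toy_tasks, batch_size=4, rng=rng)
windows = bptt_windows(40)  # a repository with 40 commits
```

Sampling the repository first gives small repositories the same chance of appearing in a batch as large ones, rather than weighting each repository by its task count.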

Empirical Validation / Results

Benchmark: RepoPeftBench

604 Python repositories (512 in-distribution, 92 temporal OOD holdout after 2025-04-01)
Task: assertion completion – given a test-file prefix, predict the expected value of an assertion
Two tracks:
- Static: single snapshot per repository (39,612 train, 11,636 test tasks)
- Evolution: commit-derived tasks (215,129 train, 86,793 test tasks from commit history)
Splits: Cross-Repo (CR) and In-Repo (IR)
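All results below are reported as exact match (EM). A minimal sketch of such a metric is given here for orientation; the whitespace stripping and the function names are assumptions, since the summary does not specify how predictions are normalized before comparison.

```python
def exact_match(prediction, target):
    """True when prediction equals target after trimming outer whitespace."""
    return prediction.strip() == target.strip()

def em_score(predictions, targets):
    """Exact-match rate in percent over paired predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(exact_match(p, t) for p, t in zip(predictions, targets))
    return 100.0 * hits / len(targets)

# Toy example: only the first prediction matches its target.
score = em_score(["42 ", "None", "[1, 2]"], ["42", "True", "[1,2]"])
```

Differences between methods are then quoted in percentage points (pp), that is, as plain differences between two EM scores.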

Table 1: Dataset statistics

Split	Repos	Commits	Tasks	Tasks / repo
Static track
Train	409	409	39,612	96.9
CR Test	52	52	6,414	123.3
IR Test	409	409	5,222	12.8
Evolution track
Train (Evo)	400	45,516	215,129	537.8
CR Test	51	6,618	44,732	877.1
IR Test	389	6,179	42,061	108.1
OOD holdout	92	1,950	14,813	161.0

Static Track Results (Table 2)

Method	CR EM (%)	IR EM (%)
Pretrained	45.7	46.8
RAG (k=3)	39.7	42.1
Dep.-Resolved Context	48.2	49.5
FFT	51.4	55.9
Single LoRA	47.4	50.4
Per-repo LoRA	—	64.0
Text2LoRA (strengthened)	45.8	46.7
Code2LoRA-Static	63.8	66.2

Code2LoRA-Static outperforms every baseline on CR (+9.9 pp over the strongest, FFT+RAG) and exceeds the Per-repo LoRA reference on IR (66.2% vs. 64.0%) without any per-repository training.

Evolution Track Results (Table 3)

Method	CR EM (%)	IR EM (%)
Pretrained	31.5	29.3
Single LoRA	55.1	61.3
Per-repo LoRA	—	64.2
Text2LoRA	41.7	43.5
Code2LoRA-Static	55.7	60.6
Code2LoRA-Evo	60.3	64.5

Commit-derived tasks are substantially harder for the base model (Pretrained drops to 31.5% CR, from 45.7% on the static track). Code2LoRA-Evo gains +5.2 pp over Single LoRA on CR and narrowly exceeds the Per-repo LoRA reference on IR (64.5% vs. 64.2%) without per-repo training.

Out-of-Distribution Generalization (Table 4)

Method	EM (%)
Pretrained	44.6
Single LoRA	72.3
Text2LoRA	60.4
Code2LoRA-Static	72.2
Code2LoRA-Evo	74.1

Code2LoRA-Evo leads on OOD by 1.8 pp over the next-best adapter (Single LoRA, 72.3%), with consistent gains across EditSim and CodeBLEU. (Note: OOD targets are systematically shorter, inflating absolute scores, but within-table comparisons remain valid.)

Theoretical and Practical Implications

Parametric injection of repository knowledge (via generated adapters) outperforms the context-injection baselines (RAG, dependency resolution) on the static track and improves on shared adapters on the evolution track, suggesting that code models benefit from distilling repository context into parameters rather than extending input length.
Recurrent aggregation over commit diffs is more effective than static snapshot adaptation when codebases evolve (60.3% vs. 55.7% CR EM on the evolution track), providing a principled way to keep model knowledge current without full retraining.
The hypernetwork approach adds zero inference-time token overhead and generalizes to unseen repositories without per-repository fine-tuning, making it practical for large-scale deployment across many codebases.
RepoPeftBench provides a standardized evaluation framework for repository-level PEFT, including a temporal OOD split that challenges models to generalize to future codebases.

Conclusion

Code2LoRA is a hypernetwork framework that generates repository-specific LoRA adapters for code language models. Two instances address different usage scenarios:

Code2LoRA-Static maps a single repository snapshot to an adapter, achieving 63.8% CR / 66.2% IR exact match on the static track.
Code2LoRA-Evo uses a GRU to aggregate commit diffs, reaching 60.3% CR / 64.5% IR exact match on the evolution track, outperforming static and shared adapters.

The results suggest that repository knowledge is more effectively injected parametrically, and updated to track software evolution, than supplied through long input context. Code2LoRA provides a building block for more context-aware, customizable, and cost-efficient AI code assistants.

Limitations include an evaluation restricted to Python and a single backbone (Qwen2.5-Coder-1.5B), possible inflation of OOD metrics due to shorter target lengths (though within-table comparisons remain valid), and the large size of the hypernetwork itself (~720M to 745M parameters). Future work should extend to more languages, larger backbones, and additional downstream tasks.