Visual Summary | From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

Summary (Overview)

BrainCause: A novel automated framework that combines generative models (text-to-image, image editing) and brain encoding models to causally discover and validate visual concept representations in the human brain from fMRI data.
Key methodological advance: Moves beyond traditional activation-based localization by constructing targeted stimulus sets containing positive images, counterfactual edits (concept removed while preserving context), and semantic negatives (correlated but distinct concepts), then testing whether brain regions respond specifically to the target concept.
Main finding: Over 70% of regions identified by activation-based methods are false positives under causal evaluation—meaning their strong responses are driven by correlated cues (e.g., background, color, pose) rather than the concept itself.
Causal ranking reduces the false positive rate from 73.4% to 23.0% while improving the true positive rate from 26.6% to 38.7%, and recovers known functional regions (faces, bodies, places, words) as well as fine-grained concepts (handwritten text, logos, animal faces, body parts).
The framework also proposes follow-up fMRI experiments when existing measured data is insufficient to validate a discovery, closing the loop between computational analysis and experimental design.

Introduction and Theoretical Foundation

The human brain organizes visual experience into representations of objects, scenes, actions, and abstract concepts. For decades, fMRI studies have relied on activation-based localization—measuring whether a brain region responds more strongly to a target category than to others (e.g., faces vs. non-faces). This has led to classical findings of category-selective regions: FFA (faces), PPA (places), EBA (bodies), VWFA (words). However, a fundamental limitation remains: strong activation does not establish that a region represents the concept itself, because responses may be driven by correlated visual or semantic cues (e.g., color, background, pose, co-occurring objects).

The paper argues that causality—testing whether removing or replacing the target concept reduces the response while preserving everything else—is essential for robust discovery. While causality is a central concern in other fields and in some areas of neuroscience (lesion studies, brain stimulation), it has received limited attention in fMRI studies of visual concept representations. Recent advances in generative models (text-to-image, image editing), large language models (for proposing correlated alternatives), and image-to-fMRI encoding models now make it possible to construct controlled stimulus sets that disentangle concept-driven activation from correlated factors, enabling large-scale causal testing.

Methodology

BrainCause operates in three stages:

Stage 1: Concept-Targeted Causal Dataset Generation

Given a target concept (e.g., "human face"), BrainCause constructs three types of images:

Positive Images: 200 training + 100 validation images generated by a text-to-image model (FLUX.2) from diverse prompts produced by a language model (Gemma-3-27B-IT).
Semantic Negative Images: ~80–100 images of correlated but distinct concepts (e.g., for "human face": "animal face", "human body", "robot face") proposed by the language model, with the target concept explicitly excluded. The images are generated and then verified by a vision-language model (Qwen3-VL-8B).
Counterfactual Negative Images: ~400–500 images created by editing each positive image so that the target concept is minimally removed or replaced (e.g., a human face replaced with an animal face) while background, color, and layout are preserved. These are generated with the FLUX.2 editing model.

Additionally, positive and semantic-negative images are retrieved from the measured fMRI dataset (NSD) using CLIP-based retrieval, filtered with the vision-language model.
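The retrieval step can be pictured as a nearest-neighbour search in a shared embedding space. The sketch below is only an illustration: it assumes embeddings from some CLIP-style model are already available (random arrays stand in for them here), it omits the later vision-language filtering, and `retrieve_top_k` is a hypothetical helper rather than the paper's code.

```python
import numpy as np

def retrieve_top_k(query_emb, image_embs, k):
    """Return indices and cosine similarities of the k images closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ q                    # cosine similarity of every image to the query
    order = np.argsort(-sims)[:k]      # indices sorted by descending similarity
    return order, sims[order]

# Stand-in data (assumption): 1000 dataset images with 64-dimensional embeddings.
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(1000, 64))
# A query embedding that lies very close to image 42.
query_emb = image_embs[42] + 0.01 * rng.normal(size=64)

top_idx, top_sims = retrieve_top_k(query_emb, image_embs, k=5)
```

In a real pipeline the query would be the embedding of a concept prompt, and the retrieved images would then be checked by the vision-language model before being accepted as positives or semantic negatives.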

All generated images are passed through an image-to-fMRI encoder (trained on NSD) to obtain predicted brain responses across ~40K voxels per subject.

Stage 2: Concept-Selective Representation Search

For each voxel, three scores are computed:

Activation Score: $A(v) = \frac{1}{|P|}\sum_{p \in P} r_v(p)$, the average response of voxel $v$ over the positive images $P$, where $r_v(p)$ is the response of voxel $v$ to image $p$.
Semantic Negative Score: $S(v) = A(v) - \max_{n \in N_{hard}} r_v(n)$, the gap between the activation score and the strongest response among the hard semantic negatives $N_{hard}$ (the 10 highest-activating negatives).
Counterfactual Score: $C(v) = \frac{1}{|P|}\sum_{p \in P} \left( r_v(p) - \max_{e \in E_p} r_v(e) \right)$, the average difference between each positive image and its hardest edited counterpart, where $E_p$ is the set of counterfactual edits of positive image $p$.
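The three scores translate directly into array operations. The following is a minimal sketch, assuming responses are stored as NumPy arrays with the shapes noted in the comments; the function names, array layout, and toy values are illustrative assumptions, and `r_neg` is taken to hold only the hard negatives $N_{hard}$.

```python
import numpy as np

def activation_score(r_pos):
    """A(v): mean response to positives. r_pos has shape (n_pos, n_vox)."""
    return r_pos.mean(axis=0)

def semantic_negative_score(r_pos, r_neg):
    """S(v): A(v) minus the strongest hard-negative response. r_neg: (n_neg, n_vox)."""
    return activation_score(r_pos) - r_neg.max(axis=0)

def counterfactual_score(r_pos, r_edit):
    """C(v): mean gap between each positive and its hardest edit.
    r_edit has shape (n_pos, n_edits, n_vox)."""
    hardest_edit = r_edit.max(axis=1)          # (n_pos, n_vox)
    return (r_pos - hardest_edit).mean(axis=0)

# Toy data (assumption): 2 positives, 2 hard negatives, 2 edits each, 2 voxels.
r_pos = np.array([[2.0, 2.0],
                  [3.0, 2.0]])
r_neg = np.array([[0.5, 1.5],
                  [1.0, 2.5]])
r_edit = np.array([[[0.5, 2.0], [1.0, 1.5]],
                   [[1.0, 2.5], [0.0, 2.0]]])

A = activation_score(r_pos)                # [2.5, 2.0]
S = semantic_negative_score(r_pos, r_neg)  # [1.5, -0.5]
C = counterfactual_score(r_pos, r_edit)    # [1.5, -0.25]
```

In this toy case both voxels respond strongly to the positives, but only voxel 0 keeps a positive margin over the negatives and the edits; voxel 1 responds just as much once the concept is removed.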

The Causal Score for each voxel is the average of semantic-negative and counterfactual scores. The candidate representation is the set of voxels with positive causal scores (or top $K$ ranked by causal score).
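As a small illustration of this selection rule, the sketch below averages the two scores and keeps either all voxels with a positive causal score or the top $K$ among them. The helper names and toy values are hypothetical, and restricting the top-$K$ variant to positive scores is an assumption made here, not something the summary specifies.

```python
import numpy as np

def causal_score(s, c):
    """Average of the semantic-negative score s and the counterfactual score c."""
    return 0.5 * (np.asarray(s) + np.asarray(c))

def select_candidates(causal, top_k=None):
    """Indices of voxels with positive causal score, optionally only the top_k of them."""
    positive = np.flatnonzero(causal > 0)
    if top_k is None:
        return positive
    ranked = positive[np.argsort(-causal[positive])]  # best voxel first
    return ranked[:top_k]

# Toy per-voxel scores (assumption): four voxels.
s = np.array([1.5, -0.5, 0.2, 0.4])
c = np.array([1.5, -0.25, -0.6, 0.8])

causal = causal_score(s, c)                  # [1.5, -0.375, -0.2, 0.6]
all_positive = select_candidates(causal)     # voxels 0 and 3
best_one = select_candidates(causal, top_k=1)  # voxel 0
```

Voxel 2 shows why both terms matter: it beats the semantic negatives slightly but fails the counterfactual test, so its averaged score is negative and it is excluded.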

Stage 3: Final Verdict and Follow-Up Experiment Design

Two quantities determine the verdict:

Causal Evidence: Statistical significance (p-values) of activation and causal scores on both generated evaluation data and measured data.
Concept Coverage in Measured Data: Fraction of positive and semantic-negative images successfully retrieved from the real fMRI dataset.

If coverage is high and causal evidence is strong → high-confidence discovery. If coverage is low but generated evidence is strong → promising, but follow-up experiments are needed. Weak evidence leads to a rejection or an inconclusive verdict. The framework then proposes specific stimuli to acquire in future fMRI scans.
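The decision rule above can be written as a short function. This is a hedged sketch only: the significance level `alpha`, the coverage cutoff `min_coverage`, and the verdict strings are hypothetical placeholders, since the summary does not state the actual thresholds.

```python
def verdict(p_generated, p_measured, coverage, alpha=0.05, min_coverage=0.5):
    """Map causal evidence and concept coverage to a verdict.

    p_generated: p-value of the causal score on generated evaluation data.
    p_measured:  p-value on measured data, or None if too few images were retrieved.
    coverage:    fraction of required images found in the measured dataset.
    """
    strong_generated = p_generated < alpha
    strong_measured = p_measured is not None and p_measured < alpha

    if coverage >= min_coverage and strong_generated and strong_measured:
        return "high-confidence discovery"
    if coverage < min_coverage and strong_generated:
        return "promising: follow-up experiment recommended"
    return "rejected or inconclusive"

well_covered = verdict(0.001, 0.01, coverage=0.9)
poorly_covered = verdict(0.001, None, coverage=0.1)
weak = verdict(0.4, 0.3, coverage=0.9)
```

Separating the coverage check from the significance check keeps a concept that is rare in the measured data from being rejected merely because it could not be tested there.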

Empirical Validation / Results

4.1 BrainCause Causal Discovery

Activation-Based Regions Have High False Positive Rate (Fig. 4a): Across 260 concepts, activation-based discovery yields a false positive rate of 73.4% (regions with high activation but negative causality score). Only 26.6% are true positives (both high activation and positive causality).

Causal Ranking Discovers More Faithful Regions (Fig. 4b): Ranking voxels by the causality score computed on the training images reduces the false positive rate to 23.0% and increases the true positive rate to 38.7%. Grey points (negative training causality) are withheld from discovery, which avoids spurious localizations.
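One way to read these rates is as fractions of all candidate regions, with withheld regions counted separately. That reading would explain why 23.0% and 38.7% do not sum to 100%, but it is an interpretation rather than something the summary states. The sketch below follows that assumption with made-up scores and a hypothetical helper.

```python
import numpy as np

def discovery_rates(discovered, eval_causal):
    """Fractions of all candidates that are true positives, false positives, or withheld."""
    discovered = np.asarray(discovered, dtype=bool)
    supported = np.asarray(eval_causal) > 0   # positive causality on held-out data
    n = discovered.size
    return {
        "true_positive": float(np.sum(discovered & supported)) / n,
        "false_positive": float(np.sum(discovered & ~supported)) / n,
        "withheld": float(np.sum(~discovered)) / n,
    }

# Toy causality scores for five high-activation candidates (assumption).
train_causal = np.array([0.7, -0.2, 0.1, 0.4, -0.5])
eval_causal = np.array([0.8, -0.3, -0.1, 0.5, -0.6])

# Activation-based discovery reports every high-activation candidate.
activation_based = discovery_rates(np.ones(5, dtype=bool), eval_causal)
# Causal ranking reports only candidates with positive training causality.
causal_ranked = discovery_rates(train_causal > 0, eval_causal)
```

In this toy example, withholding the two candidates with negative training causality cuts the false positive fraction from 0.6 to 0.2 without losing any true positive.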

Quantitative Comparison (Table 1): BrainCause achieves activation scores comparable to those of activation-based methods while substantially improving causality scores; for example, its counterfactual score is 0.98, compared with 0.44 for Max Activation.

Method	Activation (generated)	Activation (measured)	Causal semantic (generated)	Causal semantic (measured)	Causal counterfactual
Max Activation	2.76	0.70	0.08	0.18	0.44
MindSimulator	1.89	1.02	-0.44	0.27	0.23
MindSimulator+VLM	2.13	1.12	-0.26	0.41	0.38
BrainCause	2.05	1.08	0.62	0.71	0.98

4.2 Fine-Grained Visual Concept Localization

Alignment with Known Functional Regions (Table 2): For Bodies, Faces, Places, and Words, the top-100 voxels ranked by causality score overlap strongly with the established NSD functional ROIs (99% for Bodies, 90% for Faces, 74% for Places, and 99% for Words), and the overlap stays high for the top 200 and top 500 voxels.

Region	Top 100	Top 200	Top 500
Bodies	99%	99%	97%
Faces	90%	87%	84%
Places	74%	75%	74%
Words	99%	98%	97%
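An overlap figure of this kind can be computed as the fraction of the top-$K$ voxels that fall inside a functional ROI mask. The sketch below assumes that definition, which the summary does not spell out, and uses a hypothetical helper with toy values.

```python
import numpy as np

def top_k_overlap(causal, roi_mask, k):
    """Fraction of the k highest-scoring voxels that lie inside the ROI mask."""
    top = np.argsort(-causal)[:k]          # indices of the k largest causal scores
    return float(roi_mask[top].mean())

# Toy data (assumption): six voxels, three of which belong to the ROI.
causal = np.array([0.9, 0.1, 0.8, 0.7, -0.2, 0.6])
roi_mask = np.array([True, False, True, False, False, True])

overlap_top3 = top_k_overlap(causal, roi_mask, k=3)  # voxels 0, 2, 3 -> 2 of 3 in ROI
overlap_top4 = top_k_overlap(causal, roi_mask, k=4)  # voxels 0, 2, 3, 5 -> 3 of 4 in ROI
```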

Discovery of Fine-Grained Concepts (Figs. 5 & 6): BrainCause discovers localized causally supported representations for concepts such as tools (near EBA), animal faces (within FFA/OFA), human hands, human legs (within EBA/FBA), and text-related concepts like handwritten text, symbolic signs, logos (with distinct patterns across VWFA/OWFA regions). These show fine-grained organization within high-level visual cortex.

Consistency Across Subjects: Discovered region locations show clear correspondence across the four NSD subjects, despite individual variability in functional organization.

4.3 Ablation and Analysis

Causal ranking consistently outperforms activation-based ranking across region sizes and subjects.
Concept coverage in measured data varies widely—some concepts are well-represented (e.g., "human face"), others poorly (e.g., "sky diving"). Low coverage triggers follow-up recommendations.
Failure modes: Some false positives remain, often due to broad image properties (sky, reflection, lighting contrast) or limitations in generative models (semantic-negative generation failures).

Theoretical and Practical Implications

Theoretical: Establishes that causality is not just a philosophical concern but an empirically significant issue in fMRI-based discovery of visual representations. Without causal validation, activation-based localizations are unreliable—most are driven by correlated factors.
Methodological: Provides a systematic, automated pipeline for causal testing that can scale to hundreds of concepts, moving beyond the small, pre-specified category sets of traditional fMRI studies.
Practical: Can be used to validate and refine existing findings, discover new fine-grained representations, and propose informative follow-up experiments—closing the loop between computational analysis and experimental neuroscience.
Limitations: Depends on current generative and language models, which may fail to propose all relevant correlated alternatives or generate high-quality counterfactuals. Future work should make the pipeline iterative, using current results to guide new counterfactual proposals.

Conclusion

BrainCause demonstrates that activation alone is insufficient evidence of representation—causal testing is essential to distinguish concept-specific neural responses from responses driven by correlated visual or semantic cues. By combining generative models, language models, and brain encoding models, the framework enables large-scale causal discovery of visual concept representations, recovering known functional regions and uncovering fine-grained distinctions. It also provides a roadmap for designing follow-up fMRI experiments when existing data is insufficient. The work highlights the importance of moving from correlation to causation in cognitive neuroscience and offers a practical tool for achieving this goal.