Electroencephalography (EEG) is widely used in neuroscience and clinical applications but is constrained by data scarcity, high inter-subject variability, and noise. Deep generative models offer a solution by synthesizing realistic EEG data to augment datasets and improve the generalization of downstream models. However, existing approaches, such as GANs, diffusion models, and masked autoencoders, generate entire sequences at once rather than respecting EEG's inherently causal temporal structure. This research introduces causal generative modeling as a biologically consistent alternative, leveraging recursive transformers and iterative diffusion-based refinement for sample-efficient and scalable EEG synthesis.
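The distinction between whole-sequence and causal generation can be illustrated with a toy autoregressive sampler in which each sample depends only on past samples. The sketch below is a minimal numpy illustration, not the proposed model; the AR(2) coefficients are assumed values chosen only to produce a stable oscillation loosely reminiscent of an EEG rhythm.

```python
import numpy as np

def causal_ar_sample(coeffs, n_steps, noise_scale=0.1, seed=0):
    """Generate a signal one step at a time: x[t] depends only on x[<t]."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    x = np.zeros(n_steps)
    for t in range(n_steps):
        past = x[max(0, t - p):t][::-1]  # most recent samples first
        x[t] = coeffs[:len(past)] @ past + noise_scale * rng.standard_normal()
    return x

# Illustrative AR(2) coefficients giving a damped oscillation (stable poles)
signal = causal_ar_sample(np.array([1.8, -0.95]), n_steps=512)
```

A GAN or diffusion model would instead emit all 512 samples jointly; the loop above makes the step-by-step causal dependency explicit.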
This study addresses three key challenges in EEG generation: (1) modeling EEG variability within a causal framework, (2) improving data efficiency while maintaining generative quality, and (3) integrating diffusion-based iterative refinement into autoregressive modeling. The proposed approach reframes the transformer’s residual pathway as the core computational structure, incorporating progressive weight-sharing across recursive steps to reduce model size and improve sample efficiency.
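The progressive weight-sharing idea can be sketched as a residual stream updated by one shared block applied recursively. This is a minimal numpy illustration under assumed dimensions and recursion depth, not the proposed architecture; it only shows how reuse of one parameter matrix shrinks the parameter count relative to independent layers.

```python
import numpy as np

def recursive_residual_forward(x, W, n_recursions):
    """Apply one shared block repeatedly along the residual pathway.

    A depth-K stack reuses a single weight matrix W instead of K
    independent matrices, cutting parameters roughly by a factor of K.
    """
    h = x
    for _ in range(n_recursions):
        h = h + np.tanh(h @ W)  # residual update with the shared block
    return h

rng = np.random.default_rng(0)
d = 16                                    # assumed hidden width
W = 0.1 * rng.standard_normal((d, d))     # one shared parameter matrix
x = rng.standard_normal((4, d))           # batch of 4 hidden states
out = recursive_residual_forward(x, W, n_recursions=6)

# Parameter comparison: one shared block vs. 6 independent layers
shared_params = W.size
unshared_params = 6 * W.size
```

The residual pathway is the persistent state here; the shared block only writes updates into it, which is the sense in which the residual stream becomes the core computational structure.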
The model will be evaluated using likelihood-based metrics, statistical comparisons with real EEG distributions, and downstream classification tasks to ensure alignment with real-world EEG characteristics. Ablation studies will assess trade-offs between recursive weight-sharing, model complexity, and synthesis quality. By strictly enforcing temporal causality, this approach enables realistic EEG synthesis for applications such as privacy-preserving clinical data augmentation, synthetic patient modeling for clinical trials, counterfactual EEG generation, and adaptive neurofeedback. More importantly, it enables training on synthetic data for downstream tasks such as brain-computer interfacing, event-related potential analysis, and sleep-stage prediction.
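One form such a statistical comparison could take is matching spectral band power between real and synthetic signals. The numpy sketch below is illustrative only: the sampling rate and band limits are assumptions, and a noisy 10 Hz sinusoid stands in for recorded EEG, which the actual evaluation would use instead.

```python
import numpy as np

def band_power(x, fs, lo, hi):
    """Mean power of x in the [lo, hi) Hz band via the FFT periodogram."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    mask = (freqs >= lo) & (freqs < hi)
    return psd[mask].mean()

fs = 128  # assumed sampling rate in Hz
rng = np.random.default_rng(1)
t = np.arange(4 * fs) / fs
# Surrogate "real" and "synthetic" signals: 10 Hz rhythm plus noise
real = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal(t.size)
synth = np.sin(2 * np.pi * 10 * t + 0.3) + 0.5 * rng.standard_normal(t.size)

# Alpha-band (8-13 Hz) power should be comparable if synthesis is faithful
ratio = band_power(synth, fs, 8, 13) / band_power(real, fs, 8, 13)
```

A full evaluation would aggregate such ratios across channels, subjects, and frequency bands, alongside likelihood metrics and downstream classifier accuracy.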
Beyond neuroscience, this research advances AI-driven time-series modeling by integrating recursive transformers with diffusion-based refinement, improving data efficiency in sequential learning. This approach strengthens both foundational AI methodologies and biomedical applications while offering broader implications for speech processing, NLP, and time-series forecasting, where preserving stepwise dependencies is essential for realistic synthesis and prediction.