I received my joint PhD in Machine Learning & Neural Computation from Carnegie Mellon University (CMU) in 2024, where I worked with Michael
Tarr and Leila Wehbe. Before that, I earned my undergraduate degree in Computer Science from the Massachusetts Institute of Technology (MIT) in 2019. I also have a Master of Science in Machine Learning Research from CMU.
My work focuses on understanding the computational principles underlying visual perception and how these principles can inform the development of improved generative models and intelligent machines. Ultimately, I aim to bridge the gap between human and machine reasoning, leading to both a deeper understanding of human cognition and advancements in artificial intelligence.
I have completed PhD student hiring for the Fall 2025 cycle. I welcome research assistants (remote or in-person) and remote collaborations with PhD, master's, and undergraduate students.
We show that artifacts can be removed from pre-trained ViTs without any labeled data by introducing registers in post-training. Our method uses the model itself, combined with test-time augmentation, as a distillation target, leading to significant improvements on open-vocabulary segmentation and dense prediction tasks.
We show how to construct higher visual cortex encoders that generalize across subjects, scanners, voxel sizes, protocols, and images without any additional fine-tuning, by using meta-learning across subjects and in-context learning across stimuli.
We propose image-conditioned decoding of perceived motion from fMRI data, and show that the decoded motion can be used to animate images with a video diffusion model.
We propose an efficient gradient-free distillation module capable of extracting high-quality dense CLIP embeddings, and use these embeddings to study semantic selectivity in the visual cortex.
We propose a way to leverage contrastive image-language models (CLIP) and fine-tuned language models to generate natural language descriptions of voxel-wise selectivity in higher-order visual areas.
We propose a learnable and compact implicit encoding for acoustic impulse responses. Our neural acoustic fields (NAFs) achieve state-of-the-art performance with a tiny memory footprint.