Computer Vision & ML Researcher

I am a Postdoctoral Researcher in the Explainable Machine Learning group at TU Munich and Helmholtz Munich, led by Prof. Zeynep Akata.

I received my PhD (Magna Cum Laude) from the Computer Vision Group at the University of Freiburg, under the supervision of Prof. Thomas Brox. Previous website.

Research Interests

My research focuses on building and evaluating multimodal AI systems that bridge vision and language. I am particularly interested in how vision-language models understand and represent visual content — from open-vocabulary recognition and attribute detection, to human-aligned evaluation of generative models, to inference-time analysis of multimodal large language models.

Current topics include:

Inference-time analysis of MLLMs: How cross-image attention shapes multi-image reasoning, and whether it can be routed at inference time to improve performance.
Training-free spatial control for text-to-image generation: Test-time methods for position control in Multimodal Diffusion Transformers.
Open-vocabulary recognition: Recognition of objects and attributes beyond predefined categories using vision-language models (see OVAD, LocOV).
Human-aligned evaluation: Building datasets and metrics grounded in human preferences for evaluating generative and discriminative vision-language models (see TIAlign, OVQA).

Quick Facts

🎓 PhD in Computer Science (Magna Cum Laude), University of Freiburg (May 2025)
📍 Currently based in Munich, Germany
🏆 Best Poster Award, Tübingen WiML Workshop 2024, presented at NeurIPS WiML 2024
📝 Outstanding Reviewer: NeurIPS D&B Track 2022 & 2023, CVPR 2026
🎙️ Presented at ICLR, CVPR, NeurIPS, and ECCV
📋 Reviewed for NeurIPS (2025–2026), CVPR (2024–2026), ECCV (2022, 2026), ICCV (2025), TPAMI (2022–2023)
🤝 Collaborated with researchers from Amazon, MIT, KAUST, and the University of Freiburg
🇨🇴 Colombian, with a background in mathematics and biomedical engineering
🎾💃 Enjoys playing tennis and dancing lindy hop and salsa
🗣️ Speaks Spanish, English, German and some French

Selected Publications

See my Google Scholar profile for a complete list of publications.

VideoQA Reliability: From Accuracy to Visual Dependence: Auditing and Filtering Modality Collapse in Traffic VideoQA — Korkut, Bravo, Kim, Akata. Best Paper Award, ICML 2026 Workshop on Combining Theory and Benchmarks: Towards A Virtuous Cycle to Understand and Guarantee Foundation Model Performance.
Stitch: Training-Free Position Control in Multimodal Diffusion Transformers — Bader, Pach, Bravo, Belongie, Akata. Preprint, 2025.
TIAlign: Text-Image Concept Human Alignment — Bravo. Best Poster Award, NeurIPS WiML Workshop 2024.
OVQA: Open-ended VQA benchmarking of Vision-Language Models — Ging*, Bravo*, Brox. Spotlight, ICLR 2024.
OVAD: Open-vocabulary Attribute Detection — Bravo, Mittal, Ging, Brox. CVPR 2023.
LocOV: Localized Vision-Language Matching for Open-vocabulary Object Detection — Bravo, Mittal, Brox. GCPR 2022.
MAIN: Multi-Attention Instance Network for Video Segmentation — Alcazar*, Bravo*, et al. CVIU Journal, 2021.

Also check out my PhD thesis: Advancing vision-language models for open-vocabulary recognition and generative evaluation (May 2025).

Talks

Introductory lecture, Seminar in Advanced Topics in Vision-Language Models, Technical University of Munich, April 1, 2026. Slides
PhD defense, University of Freiburg, May 30, 2025. Slides

Biography

I am a Colombian researcher with a background in computer vision, machine learning, and biomedical engineering.

I am currently a Postdoctoral Researcher at TU Munich and Helmholtz Munich, working in the Explainable Machine Learning group led by Prof. Zeynep Akata. My current work focuses on inference-time analysis of multimodal large language models and spatial control in diffusion-based generative models.

I received my PhD (Magna Cum Laude) from the Computer Vision Group at the University of Freiburg, supervised by Prof. Thomas Brox, with a thesis committee including Prof. Phillip Isola (MIT) and Prof. Abhinav Valada (University of Freiburg).

During my PhD I interned at Amazon in Tübingen, where I worked on vision-language alignment and generative AI evaluation with Betty Mohler and Ali Jahanian. This work produced TIAlign, a large-scale human preference dataset for image-text alignment.

In 2019 I visited the Image and Video Understanding Laboratory (IVUL) at KAUST, working on video object segmentation with Prof. Bernard Ghanem’s team.

Before my PhD, I obtained an MSc in Biomedical Engineering at the Biomedical Computer Vision Group led by Prof. Pablo Arbeláez at the Universidad de los Andes in Bogotá, Colombia, and dual BSc degrees in Mathematics and Biomedical Engineering from the same institution.

Achievements & Grants

Best Poster Award, Tübingen WiML Workshop 2024
Outstanding Reviewer Award, NeurIPS Datasets & Benchmarks Track, 2022 and 2023
DAAD Research Grant, Doctoral Programmes in Germany, 2019/20 (grant 57440921)
DFG German-Colombian Research Collaboration Grant (BR 3815/9-1), 2017/18
BECA YERLY Scholarship for academic excellence, Mathematical Program, 2012

Teaching & Mentoring

MSc Thesis Supervision

Sena Korkut — Modality collapse in traffic accident video QA (TU Munich, 2026), co-supervised with Sanghwan Kim. Best Paper Award, ICML 2026 Workshop on Combining Theory and Benchmarks
Ayushi Sharma — Improving Visual Grouping and Visual-Text Alignment for Open-Vocabulary Segmentation (Freiburg, 2023), co-supervised with Silvio Galesso
Anna Stroganova — Multimodal attribute learning (Freiburg, 2022)
Felix Jablonski — Improving CLIP-Sentence Retrieval with COOT using large-scale noisy-aligned Training Data (Freiburg, 2022)
Simon Ging — Applying Hierarchical Representations from Video Retrieval to Video Captioning (Freiburg, 2021) — later co-author on OVQA, Spotlight ICLR 2024

Teaching Assistant

Co-Organizer, Deep Learning Lab (2021–2022, Freiburg)
Assistant & Supervisor, Deep Learning and Computer Vision Seminars (2019–2023, Freiburg)
Teaching Assistant, Computer Vision (2018, Universidad de los Andes)
Teaching Assistant, Image Analysis and Processing (2017, Universidad de los Andes)
Lecturer, Linear Algebra; Integral Calculus and Differential Equations (Universidad de los Andes)

María Alejandra Bravo Sarmiento