Computer Vision & ML Researcher

I am a Postdoctoral Researcher in the Explainable Machine Learning group at TU Munich and Helmholtz Munich, led by Prof. Zeynep Akata.

I received my PhD (Magna Cum Laude) from the Computer Vision Group at the University of Freiburg, under the supervision of Prof. Thomas Brox. Previous website.

Research Interests

My research focuses on building and evaluating multimodal AI systems that bridge vision and language. I am particularly interested in how vision-language models understand and represent visual content — from open-vocabulary recognition and attribute detection, to human-aligned evaluation of generative models, to inference-time analysis of multimodal large language models.

Current topics include:

  • Inference-time analysis of MLLMs: How cross-image attention shapes multi-image reasoning, and whether it can be routed at inference time to improve performance.
  • Training-free spatial control for text-to-image generation: Test-time methods for position control in Multimodal Diffusion Transformers.
  • Open-vocabulary recognition: Recognition of objects and attributes beyond predefined categories using vision-language models (see OVAD, LocOV).
  • Human-aligned evaluation: Building datasets and metrics grounded in human preferences for evaluating generative and discriminative vision-language models (see TIAlign, OVQA).

Quick Facts

  • 🎓 PhD in Computer Science (Magna Cum Laude), University of Freiburg (May 2025)
  • 📍 Currently based in Munich, Germany
  • 🏆 Best Poster Award, Tuebingen WiML Workshop 2024, presented at NeurIPS WiML 2024
  • 📝 Outstanding Reviewer, NeurIPS D&B Track 2022 & 2023
  • 🎙️ Presented at ICLR, CVPR, NeurIPS, and ECCV
  • 📋 Reviewed for NeurIPS (2025), CVPR (2024–2026), ECCV (2022, 2026), ICCV (2025), TPAMI (2022–2023)
  • 🤝 Collaborated with researchers from Amazon, MIT, KAUST, and the University of Freiburg
  • 🇨🇴 Colombian, with a background in mathematics and biomedical engineering
  • 🎾💃 Enjoys playing tennis and dancing Lindy Hop and salsa
  • 🗣️ Speaks Spanish, English, German, and some French

Selected Publications

See my Google Scholar profile for a complete list of publications.

Also check out my PhD thesis: Advancing vision-language models for open-vocabulary recognition and generative evaluation (May 2025).

Biography

I am a Colombian researcher with a background in computer vision, machine learning, and biomedical engineering.

I am currently a Postdoctoral Researcher at TU Munich and Helmholtz Munich, working in the Explainable Machine Learning group led by Prof. Zeynep Akata. My current work focuses on inference-time analysis of multimodal large language models and spatial control in diffusion-based generative models.

I received my PhD (Magna Cum Laude) from the Computer Vision Group at the University of Freiburg, supervised by Prof. Thomas Brox, with a thesis committee including Prof. Phillip Isola (MIT) and Prof. Abhinav Valada (University of Freiburg).

During my PhD I interned at Amazon in Tübingen, where I worked on vision-language alignment and generative AI evaluation with Betty Mohler and Ali Jahanian. This work produced TIAlign, a large-scale human preference dataset for image-text alignment.

In 2019 I visited the Image and Video Understanding Laboratory (IVUL) at KAUST, working on video object segmentation with Prof. Bernard Ghanem’s team.

Before my PhD, I obtained an MSc in Biomedical Engineering from the Universidad de los Andes in Bogotá, Colombia, where I worked in the Biomedical Computer Vision Group led by Prof. Pablo Arbeláez. I also hold dual BSc degrees in Mathematics and Biomedical Engineering from the same institution.

Achievements & Grants

  • Best Poster Award, Tuebingen WiML Workshop 2024
  • Outstanding Reviewer Award, NeurIPS Datasets & Benchmarks Track, 2022 and 2023
  • DAAD Research Grant, Doctoral Programmes in Germany, 2019/20 (grant 57440921)
  • DFG German-Colombian Research Collaboration Grant (BR 3815/9-1), 2017/18
  • BECA YERLY Scholarship for academic excellence, Mathematical Program, 2012

Teaching & Mentoring

MSc Thesis Supervision

  • Sena Korkut — Modality collapse in traffic accident video QA (TU Munich, 2026), co-supervised with Sanghwan Kim
  • Ayushi Sharma — Improving Visual Grouping and Visual-Text Alignment for Open-Vocabulary Segmentation (Freiburg, 2023), co-supervised with Silvio Galesso
  • Anna Stroganova — Multimodal attribute learning (Freiburg, 2022)
  • Felix Jablonski — Improving CLIP-Sentence Retrieval with COOT using large-scale noisy-aligned Training Data (Freiburg, 2022)
  • Simon Ging — Applying Hierarchical Representations from Video Retrieval to Video Captioning (Freiburg, 2021); later a co-author on OVQA (ICLR 2024 Spotlight)

Teaching Assistant

  • Co-Organizer, Deep Learning Lab (2021–2022, Freiburg)
  • Assistant & Supervisor, Deep Learning and Computer Vision Seminars (2019–2023, Freiburg)
  • Teaching Assistant, Computer Vision (2018, Universidad de los Andes)
  • Teaching Assistant, Image Analysis and Processing (2017, Universidad de los Andes)
  • Lecturer, Linear Algebra; Integral Calculus and Differential Equations (Universidad de los Andes)