I am a Ph.D. candidate in the BLENDER Lab at the University of Illinois Urbana-Champaign (UIUC), advised by Professor Heng Ji. I am also a recipient of the Capital One Ph.D. Fellowship (2026-2027).
I work on multimodal foundation models, with a focus on building systems that can perceive, ground, and reason over fine-grained visual information in ways that align with human knowledge expressed in language. My research studies how foundation models can move beyond coarse image-text matching toward representations that capture objects, attributes, relations, and dynamics at the level of detail needed for reliable decision-making.
A central challenge in this area is that granular visual signals are difficult to align with language-encoded knowledge: visual evidence is often dense, ambiguous, and spatially localized, while world knowledge translated into language is abstract, compositional, and incomplete. I develop methods that help multimodal models bridge this gap by improving visual grounding, cross-modal alignment, and knowledge integration.
More broadly, I am interested in turning these advances into visually grounded action policies for embodied systems, including robotics, where an agent must connect perception, reasoning, and action under real-world uncertainty.
Specifically, I design models that:
Meta Redmond, WA, USA
Research Scientist Intern May 2026 - Present
Part-time Student Researcher May 2025 - Oct. 2025
Amazon Bellevue, WA, USA
Applied Scientist Intern May 2024 - Aug. 2024
University of Illinois Urbana-Champaign (UIUC) Champaign, IL, USA
Graduate Research Assistant (Ph.D.) Aug. 2023 - Present
Advisor: Heng Ji
KAIST IR&NLP Lab Daejeon, Republic of Korea
Research Associate Mar. 2022 - July 2023
Graduate Research Assistant (M.S.) Feb. 2020 - Feb. 2022
Advisor: Sung-Hyon Myaeng
For more information, check out my Google Scholar.
* indicates equal contribution.
Pixel-Grounded Retrieval for Knowledgeable Large Multimodal Models
Jeonghwan Kim, Renjie Tao, Sanat Sharma, Jiaqi Wang, Kai Sun, Zhaojiang Lin, Seungwhan Moon, Lambert Mathias, Anuj Kumar, Heng Ji, Xin Luna Dong
Preprint, 2026
Alignment-Aware Training for Generalizable VLAs
Dwip Dalal, Shivansh Patel, Jeonghwan Kim, Utkarsh Mishra, Alex Baratian, Hyeonjeong Ha, Heng Ji, Svetlana Lazebnik, Unnat Jain
Preprint, 2026
Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
Dwip Dalal, Gautam Vashishtha, Utkarsh Mishra, Jeonghwan Kim, Madhav Kanda, Hyeonjeong Ha, Svetlana Lazebnik, Heng Ji, Unnat Jain
ICLR 2026
PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding
Ansel Blume*, Jeonghwan Kim*, Hyeonjeong Ha, Elen Chatikyan, Xiaomeng Jin, Khanh Duy Nguyen, Nanyun Peng, Kai-Wei Chang, Derek Hoiem, Heng Ji
NeurIPS 2025 (Spotlight)
Infogent: An Agent-Based Framework for Web Information Aggregation
Revanth Gangi Reddy*, Sagnik Mukherjee*, Jeonghwan Kim*, Zhenhailong Wang*, Dilek Hakkani-Tur, Heng Ji
NAACL 2025, Findings
Finer: Investigating and Enhancing Fine-Grained Visual Concept Recognition in Large Vision Language Models
Jeonghwan Kim, Heng Ji
EMNLP 2024
Why So Gullible? Enhancing the Robustness of Retrieval-Augmented Models against Counterfactual Noise
Giwon Hong*, Jeonghwan Kim*, Junmo Kang*, Sung-Hyon Myaeng, Joyce Jiyoung Whang
NAACL 2024, Findings
Exploiting Numerical-Contextual Knowledge to Improve Numerical Reasoning in Question Answering
Jeonghwan Kim, Junmo Kang, Giwon Hong, Kyung-min Kim, Sung-Hyon Myaeng
NAACL 2022, Findings
FinePrompt: Unveiling the Role of Finetuned Inductive Bias on Compositional Reasoning in GPT-4
Jeonghwan Kim*, Giwon Hong*, Sung-Hyon Myaeng, Joyce Jiyoung Whang
EMNLP 2023, Findings
Graph-Induced Transformers for Efficient Multi-Hop Question Answering
Giwon Hong, Jeonghwan Kim, Junmo Kang, Sung-Hyon Myaeng
EMNLP 2022
Have You Seen That Number? Investigating Extrapolation in Question Answering Models
Jeonghwan Kim, Giwon Hong, Kyung-min Kim, Junmo Kang, Sung-Hyon Myaeng
EMNLP 2021
Full CV in PDF.