PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding

A benchmark for explanatory part segmentation and a new segmenting LMM, PLUM.

1University of Illinois Urbana-Champaign, 2University of California, Los Angeles
*Equal Contribution

Abstract

We introduce PARTONOMY, a benchmark and task suite for explanatory part segmentation, where a model must (1) identify visible object parts, (2) compare/contrast parts across objects, and (3) perform part–whole reasoning—while grounding its textual answer with pixel-level segmentations. PARTONOMY integrates prior datasets and contributes an evaluation-only PARTONOMY-Core split with 534 object and 862 part labels, focusing on specialized, object-centric images (e.g., agricultural airplanes, combat drones).

We further propose PLUM, a segmenting LMM that fixes two limitations in existing approaches: reliance on new “[SEG]” tokens that cause distribution shift, and discarding past masks during decoding. PLUM uses BIO span tagging to select segmentation-relevant text spans (no new tokens) and a mask feedback loop to condition future masks on previous predictions. Pretrained PLUM outperforms prior segmenting LMMs on reasoning segmentation, VQA, and hallucination; when finetuned on PARTONOMY, it is competitive with models trained on far more mask data.

Benchmark & Task

Explanatory Part Segmentation Overview
Figure 1: PARTONOMY tasks: Part Identification, Part Comparison (Intersection/Difference), and Part–Whole Reasoning. Models must select the correct textual response and ground the parts it mentions with pixel masks.

Explanatory Part Segmentation

PARTONOMY-Core (Eval)

534 object labels
862 distinct part labels
1,068 images, object-centric
4,968 pixel masks

Answer choices are produced by mutating the ground-truth part list (adding, removing, or replacing parts), yielding challenging yet plausible distractors.
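This mutation scheme can be sketched as below; the function name, part vocabulary, and sampling details are illustrative, not taken from the benchmark's actual generation code:

```python
import random

def mutate_parts(parts, vocabulary, rng=random):
    """Make a plausible distractor by adding, removing, or replacing one part."""
    op = rng.choice(["add", "remove", "replace"])
    distractor = list(parts)
    unused = [p for p in vocabulary if p not in distractor]
    if op == "add" and unused:
        distractor.append(rng.choice(unused))
    elif op == "remove" and len(distractor) > 1:
        distractor.pop(rng.randrange(len(distractor)))
    elif unused:  # replace one part with an unused one
        distractor[rng.randrange(len(distractor))] = rng.choice(unused)
    return distractor

parts = ["wing", "fuselage", "propeller"]
vocab = parts + ["rotor", "landing gear", "tail fin"]
print(mutate_parts(parts, vocab, random.Random(0)))
```

Because distractors reuse real part names from the same vocabulary, they stay superficially plausible while differing from the ground truth by exactly one edit.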

PLUM: Part-Level Understanding LMM

PLUM Overview
Figure 2: PLUM avoids special tokens via BIO span tagging and conditions on prior masks via a feedback loop.

Key Ideas

  • BIO span tagging: segmentation-relevant spans are selected directly from the generated text with begin/inside/outside tags, so no new "[SEG]" tokens (and no resulting distribution shift) are needed.
  • Mask feedback loop: previously predicted masks are fed back into the decoder, so each new mask is conditioned on the ones before it.
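Recovering the tagged spans from per-token B/I/O labels is the standard BIO-decoding step; a minimal sketch (tokenization and example sentence are illustrative, not from the paper's code):

```python
def extract_spans(tokens, tags):
    """Recover segmentation-relevant spans from per-token BIO tags.

    tags[i] in {"B", "I", "O"}: a span starts at "B" and extends over "I".
    """
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["The", "agricultural", "airplane", "has", "a", "spray", "boom", "."]
tags   = ["O",   "B",            "I",        "O",   "O", "B",     "I",    "O"]
print(extract_spans(tokens, tags))  # → ['agricultural airplane', 'spray boom']
```

Each recovered span is then passed to the mask decoder, so the model grounds exactly the phrases it tagged rather than emitting a special token per mask.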

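The mask feedback loop can be sketched as sequential decoding in which each step sees an accumulated memory of earlier predictions. Everything here (`decode_with_feedback`, the decoder callback, the union-as-memory choice) is a hypothetical illustration of the idea, not the paper's implementation:

```python
import numpy as np

def decode_with_feedback(image_feats, n_parts, decode_mask):
    """Decode part masks one at a time, feeding earlier masks back in.

    `decode_mask(image_feats, mask_memory)` stands in for the model's mask
    decoder; `mask_memory` holds the union of the masks predicted so far.
    """
    H, W = image_feats.shape[:2]
    mask_memory = np.zeros((H, W), dtype=np.float32)
    masks = []
    for _ in range(n_parts):
        m = decode_mask(image_feats, mask_memory)  # conditioned on past masks
        masks.append(m)
        mask_memory = np.maximum(mask_memory, m)   # accumulate the prediction
    return masks
```

Conditioning on prior masks lets the decoder avoid re-segmenting regions already claimed by earlier parts, which is exactly the information a decoder that discards past masks throws away.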
Results

PARTONOMY-Core (Segmentation gIoU)

| Method            | Identification (micro / macro) | Intersection (micro / macro) | Difference (micro / macro) |
|-------------------|--------------------------------|------------------------------|----------------------------|
| LISA-13B (0-shot) | 5.9 / 7.0                      | 7.1 / 7.5                    | 6.1 / 7.1                  |
| GLaMM (0-shot)    | 5.3 / 5.9                      | 5.9 / 6.2                    | 5.2 / 6.0                  |
| PLUM (0-shot)     | 14.5 / 27.4                    | 23.7 / 29.9                  | 14.9 / 24.8                |
| LISA-13B (ft)     | 33.6 / 35.4                    | 37.0 / 38.4                  | 30.4 / 31.6                |
| GLaMM (ft)        | 36.6 / 38.8                    | 40.3 / 42.1                  | 33.6 / 34.8                |
| PLUM (ft)         | 36.2 / 41.6                    | 42.1 / 45.9                  | 33.0 / 39.4                |

Numbers adapted from the paper’s Table 2.
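For reference, the micro vs. macro distinction can be sketched as follows, assuming gIoU here denotes the mean of per-instance IoUs (as in LISA) and that macro averaging groups by part label first; this is an illustration, not the benchmark's evaluation code:

```python
import numpy as np

def iou(pred, gt):
    """IoU of two boolean masks; empty-vs-empty counts as a perfect match."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def micro_macro_giou(samples):
    """samples: iterable of (part_label, pred_mask, gt_mask) boolean arrays.

    micro: mean IoU over all predictions (common parts dominate);
    macro: mean of per-label mean IoUs (long-tail parts count equally).
    """
    per_label, all_ious = {}, []
    for label, pred, gt in samples:
        v = iou(pred, gt)
        all_ious.append(v)
        per_label.setdefault(label, []).append(v)
    micro = float(np.mean(all_ious))
    macro = float(np.mean([np.mean(vs) for vs in per_label.values()]))
    return micro, macro
```

The gap between PLUM's micro and macro numbers above is consistent with macro averaging rewarding accuracy on rare part labels rather than on frequent ones.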

Part–Whole Reasoning (Core, gIoU on parts)

| Method            | Part→Whole (micro / macro) | Whole→Part (micro / macro) |
|-------------------|----------------------------|----------------------------|
| LISA-13B (0-shot) | 5.7 / 6.6                  | 6.0 / 6.8                  |
| GLaMM (0-shot)    | 4.8 / 5.6                  | 4.9 / 5.8                  |
| PLUM (0-shot)     | 14.3 / 26.8                | 15.4 / 27.5                |
| GLaMM (ft)        | 36.1 / 38.5                | 35.7 / 38.0                |
| PLUM (ft)         | 36.7 / 40.8                | 36.2 / 39.8                |

Predicting the object first (Whole→Part) tends to improve subsequent part masks.

Generalization to Other Tasks

  • ReasonSeg: PLUM-13B (ft) 57.3 gIoU vs. LISA-13B (ft) 56.2.
  • VQA / Hallucination: avoiding special tokens prevents the performance collapse they cause; PLUM beats LLaVA-13B on TextVQA (+31.8% relative) and POPE (+8.9% relative).
  • Zero-shot on public part datasets: large macro-gIoU gains on PACO-LVIS, PartImageNet, and PASCAL-Part.

Ablations (What matters?)

  • Mask Feedback Loop: removing it costs 9.6% micro / 8% macro gIoU.
  • BIO Tagging: eliminates the distribution shift introduced by new tokens; largest macro gains on long-tail parts.
  • KL Weight: trades segmentation vs. reasoning; λKL=0.1 is a good balance.
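A sketch of how such a KL-weighted objective might combine its terms; the exact loss terms and the direction of the KL are assumptions for illustration, not taken from the paper:

```python
import math

def kl_divergence(p_logprobs, q_logprobs):
    """KL(P || Q) for discrete distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq)
               for lp, lq in zip(p_logprobs, q_logprobs))

def combined_objective(text_loss, seg_loss, tuned_logprobs, base_logprobs,
                       lambda_kl=0.1):
    """Sketch: LM loss + mask loss + lambda_KL * KL(tuned || base).

    A larger lambda_kl keeps the tuned model closer to the base LM
    (preserving reasoning/VQA ability) at the cost of segmentation
    specialization; lambda_kl = 0.1 is the reported sweet spot.
    """
    return text_loss + seg_loss + lambda_kl * kl_divergence(
        tuned_logprobs, base_logprobs)
```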

Resources

Code

Training/eval code for PLUM and the PARTONOMY data pipeline.

GitHub

Paper

Preprint under review. Link coming soon.

Coming soon

Dataset

Instructions and scripts to build PARTONOMY and PARTONOMY-Core.

Instructions

Citation

@misc{blume2025partonomy,
  title        = {PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding},
  author       = {Ansel Blume* and Jeonghwan Kim* and Hyeonjeong Ha and Elen Chatikyan and
                  Xiaomeng Jin and Khanh Duy Nguyen and Nanyun Peng and Kai-Wei Chang and
                  Derek Hoiem and Heng Ji},
  year         = {2025},
  note         = {Preprint. Under review. Code: https://github.com/AnselBlume/partonomy}
}