Point Arena

Point-Bench

Standardized evaluation of precise spatial alignment between language and vision

Rank	Model	Affordance	Spatial	Reasoning	Steerability	Counting	Average
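The leaderboard columns can be read as five per-category scores plus a summary. The sketch below shows one way the Average and Rank columns could be derived. It assumes Average is the unweighted mean of the five category scores and that models are ranked by Average in descending order; the page does not state either rule. The model names and scores are placeholders, not real results.

```python
from typing import Dict, List

# Category columns as they appear in the leaderboard header.
CATEGORIES = ("Affordance", "Spatial", "Reasoning", "Steerability", "Counting")


def add_average(row: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of the row with an 'Average' field.

    Assumption: Average is the unweighted mean of the five category scores.
    """
    scores = [row[c] for c in CATEGORIES]
    return {**row, "Average": sum(scores) / len(scores)}


def rank_models(rows: List[dict]) -> List[dict]:
    """Sort rows by Average (descending) and attach a 1-based Rank.

    Assumption: ranking by Average is illustrative, not confirmed by the page.
    """
    ranked = sorted((add_average(r) for r in rows),
                    key=lambda r: r["Average"], reverse=True)
    return [{"Rank": i, **r} for i, r in enumerate(ranked, start=1)]


# Placeholder rows: hypothetical models with made-up scores.
example_rows = [
    {"Model": "model-a", "Affordance": 50, "Spatial": 60, "Reasoning": 70,
     "Steerability": 80, "Counting": 90},
    {"Model": "model-b", "Affordance": 80, "Spatial": 80, "Reasoning": 80,
     "Steerability": 80, "Counting": 80},
]
leaderboard = rank_models(example_rows)
```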

Point-Bench Gallery

Each gallery example pairs an image and a target mask with a query:

Query: "Point to the object to the right of the television."

Query: "Point to the object used to stir or mix things."

Query: "Point to the structure that cars use to travel over the river."

Query: "Point to the window to the right of the red door."

Query: "Point to the screen indicating the speed."

Query: "Point to where items could be placed on the bike."

Query: "Point to the object that most likely contains policies."

Query: "Point to all the cups."

Query: "Point to what could drive people from one place to another in the airport."

Query: "Point to the animal that lays eggs."
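Since every gallery example pairs a query with an image and a mask, a natural way to score a predicted point is to check whether it lands inside the mask. The sketch below illustrates that idea. It assumes a binary mask stored as rows of 0/1 values indexed as `mask[y][x]`, points given in pixel coordinates, and one predicted point per example; the function names are hypothetical and this is not necessarily the benchmark's own scoring code.

```python
import math
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # assumed layout: mask[y][x], nonzero = target
Point = Tuple[float, float]     # assumed (x, y) in pixel coordinates


def point_in_mask(mask: Mask, x: float, y: float) -> bool:
    """Return True if (x, y) lies inside the image on a nonzero mask pixel."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    # floor() keeps slightly negative coordinates out of bounds
    # instead of rounding them toward pixel 0.
    col, row = math.floor(x), math.floor(y)
    if not (0 <= row < height and 0 <= col < width):
        return False
    return bool(mask[row][col])


def accuracy(points: List[Point], masks: List[Mask]) -> float:
    """Fraction of examples whose predicted point hits its mask."""
    if not masks:
        return 0.0
    hits = sum(point_in_mask(m, x, y) for (x, y), m in zip(points, masks))
    return hits / len(masks)


# Toy 3x4 mask with a 2x2 target region.
toy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
```

Queries such as "Point to all the cups." expect several points, so a fuller scorer would also need a rule for matching multiple predictions to multiple target regions; that rule is left out here because the page does not describe it.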

Dataset Analysis

Category Distribution

Comprehensive Dataset Analysis

About Point Arena

Our Mission

Point Arena is the first open and unified evaluation platform specifically designed to assess language-guided pointing capabilities in multi-modal large language models (MLLMs).

Despite recent advances in visual reasoning, existing benchmarks lack fine-grained grounding tasks that require precise spatial alignment between language and vision. Point Arena addresses this gap by offering standardized scenarios, diverse datasets, and rigorous evaluation protocols.

Research Findings

Performance Disparities: Significant differences across model types and prompt strategies
Current Limitations: Identified challenges in spatial reasoning and grounding fidelity
Future Directions: New paths for multi-modal alignment research

Point Arena is publicly available and aims to facilitate reproducible and transparent progress in multi-modal understanding.

Citation

@misc{cheng2025pointarenaprobingmultimodalgrounding,
      title={PointArena: Probing Multimodal Grounding Through Language-Guided Pointing}, 
      author={Long Cheng and Jiafei Duan and Yi Ru Wang and Haoquan Fang and Boyang Li and Yushan Huang and Elvis Wang and Ainaz Eftekhar and Jason Lee and Wentao Yuan and Rose Hendrix and Noah A. Smith and Fei Xia and Dieter Fox and Ranjay Krishna},
      year={2025},
      eprint={2505.09990},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.09990}, 
}
