E-BARD: A Multi-Task Extension of the Basketball Action Recognition Dataset for Player Detection, Team Attribution and Jersey Number Recognition

Project Description

The E-BARD (Extended Basketball Action Recognition Dataset) project expands the foundational BARD collection, which originally focused on classifying primary actions in NBA videos. To support advanced, multi-stage computer vision pipelines, we recognized the need for much finer visual detail. E-BARD addresses this by providing the granular annotations necessary for complex tasks like object detection, team attribution, and jersey number recognition (JNR). This upgrade successfully bridges the gap between simple, isolated action classification and comprehensive, end-to-end scene understanding.

Our initiative advances sports video analysis through three primary contributions. First, we enriched the original video data with dense visual labels, pinpointing key entities (players, referees, the ball, and the hoop) and integrating team colors and jersey numbers to directly support the original action captions. Second, we developed a robust benchmark to evaluate leading vision models on these specific tasks. Our extensive testing includes YOLO and RF-DETR for object detection; CLIP, SigLIP2, FashionCLIP, and the Perception Encoder for team identification; and olmOCR alongside Qwen2.5-VL (3B and 7B) for reading jersey numbers. Finally, we introduce a unified multimodal pipeline utilizing Qwen2.5-VL, proving that a single foundational model can efficiently tackle all these visual subtasks concurrently. Together, these enhancements establish E-BARD as an essential, all-in-one benchmark for multi-task basketball video comprehension.

Data Summary

E-BARD summary

GitHub folder

We have open-sourced the code for downloading the data and reproducing the results presented in our paper.
If you use E-BARD in your work, please cite the associated paper/page.

E-BARD GitHub folder