Enhancing Visual Spatial Reasoning through Multi-Task Learning
Jun 16, 2024

Developed a vision-language model that incorporates spatial features (depth maps, 3D coordinates, and edge maps) as auxiliary objectives through multi-task learning, improving spatial understanding and achieving state-of-the-art results on the Visual Spatial Reasoning (VSR) dataset.
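
The summary does not spell out how the tasks are combined, so the following is only a minimal sketch of one common multi-task setup, not the project's actual implementation. It assumes the main VSR task is a binary true/false judgment scored with cross-entropy, and that each spatial feature is predicted by an auxiliary head scored with mean squared error. The function names, the `AUX_WEIGHTS` values, and the task keys (`depth`, `coords_3d`, `edges`) are hypothetical and chosen for illustration.

```python
import math
from typing import Dict, Sequence


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-7) -> float:
    """Cross-entropy for one true/false prediction (assumed main VSR loss)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error over a flattened auxiliary map."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical weights: how much each auxiliary spatial task contributes.
AUX_WEIGHTS: Dict[str, float] = {"depth": 0.5, "coords_3d": 0.5, "edges": 0.25}


def multi_task_loss(
    vsr_prob: float,
    vsr_label: int,
    aux_preds: Dict[str, Sequence[float]],
    aux_targets: Dict[str, Sequence[float]],
    aux_weights: Dict[str, float] = AUX_WEIGHTS,
) -> float:
    """Main-task loss plus a weighted sum of auxiliary spatial losses."""
    loss = binary_cross_entropy(vsr_prob, vsr_label)
    for name, weight in aux_weights.items():
        loss += weight * mean_squared_error(aux_preds[name], aux_targets[name])
    return loss


# Usage: flattened toy maps stand in for real depth, coordinate, and edge outputs.
targets = {"depth": [0.0, 0.0], "coords_3d": [1.0, 2.0, 3.0], "edges": [0.0, 1.0]}
preds = {"depth": [1.0, 0.0], "coords_3d": [1.0, 2.0, 3.0], "edges": [0.0, 1.0]}
total = multi_task_loss(vsr_prob=0.5, vsr_label=1, aux_preds=preds, aux_targets=targets)
```

In this sketch the auxiliary terms act only as extra training signal: when every spatial prediction matches its target, the total reduces to the main VSR loss alone. The weights would normally be tuned on validation data.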