Cambrian-S
Towards Spatial Supersensing in Video
The world doesn't just exist around us. It flows through us, shaping what we feel and who we become. Supersensing is our mission to let machines share in that flow: to build richer world models that not only see, but anticipate, select, and organize experience, advancing multimodal intelligence that truly understands the world and creates within it.
Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
Simulated Instruction Tuning for Spatial Video Understanding
Simulated Instruction Tuning Dataset for Spatial Video Understanding, with Full 3D and Frame-Level Ground-Truth Annotations
10k In-Distribution VSI-Bench Training Examples
Large-Scale Instruction Tuning Dataset for Spatial Sensing
Two-Part Benchmark for Spatial Supersensing, including VSI-SUPER Recall (VSR) and VSI-SUPER Count (VSC)
Our Spatially-grounded Multimodal Large Language Models
Training and Evaluation Toolkits for Cambrian-S Models
How Multimodal Large Language Models See, Remember, and Recall Spaces
A Fully Open, Vision-Centric Exploration of Multimodal LLMs
10M Multimodal Instruction-Tuning Data for Cambrian-1 Models
Cambrian Vision-Centric Benchmark (CV-Bench) for Multimodal LLMs
If you are interested in our research, would like to support our lab, or explore collaboration opportunities, we would love to hear from you.
Contact Us