I am fascinated by the natural phenomenon of intelligence, and I work on understanding and advancing the limits of artificial intelligence (AI).
I am a co-founder and the Chief Scientist of Yutori.
I have been a professor, led research teams in industry, and built open-source communities.
I was a Senior Director at Meta, where I led FAIR Embodied AI (AI for robotics and smart glasses).
I was a tenured Associate Professor in the School of Interactive Computing at Georgia Tech.
My work has received best-paper awards, nominations, and honorable mentions in every area of AI.
Here are some representative projects:
- Summarizing the beliefs of AI agents via diverse, plausible predictions/hypotheses:
Diverse Beam Search, Multiple Choice Learning, and
tutorials on diversity at CVPR '13 and CVPR '16.
- Vision-and-language (or multimodal AI):
My colleagues and I developed the foundations of a new AI sub-field — new tasks, benchmarks, techniques, and models.
If the phrases
Visual Question Answering (VQA),
Text VQA,
Visual Dialog,
Audio-Video Dialog,
image-text cross-modal attention,
visuolinguistic pre-training, or
training visual chatbots with RL
sound familiar,
you have heard of our papers.
- Embodied AI and robotics:
Habitat: A Platform for Embodied AI,
Decentralized Distributed PPO,
Embodied Question Answering,
Sim2Real Predictivity,
ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation,
LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place.
- Explainable, Unbiased, Trustworthy AI:
Grad-CAM (Visual Explanations), Human-vs-Machine Attention, Counterfactual Visual Explanations.
- Platforms for reproducible AI research:
EvalAI, a platform for evaluating AI algorithms.
Bio for talks.
Google Scholar.