I am a Research Scientist at Google working primarily on transformers and video understanding. I completed my PhD with Philip Torr at the University of Oxford, where I focused on deep structured models for pixel-level scene understanding. Prior to that, I completed my undergraduate degree at the University of Cape Town.

An up-to-date list of publications is available on Google Scholar.
Multiview Transformers for Video Recognition
Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022

End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022

Learning with Neighbor Consistency for Noisy Labels
Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022

The Efficiency Misnomer
Mostafa Dehghani*, Anurag Arnab*, Lucas Beyer*, Ashish Vaswani, Yi Tay*
International Conference on Learning Representations (ICLR), 2022

Scenic: A JAX library for Computer Vision Research and Beyond
Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay
Computer Vision and Pattern Recognition (CVPR) Demo, 2022

ViViT: A Video Vision Transformer
Anurag Arnab*, Mostafa Dehghani*, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2021

Unified Graph Structured Models for Video Understanding
Anurag Arnab, Chen Sun, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2021

Compressive Visual Representations
Kuang-Huei Lee*, Anurag Arnab*, Sergio Guadarrama, John Canny, Ian Fischer*
Conference on Neural Information Processing Systems (NeurIPS), 2021

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
Conference on Neural Information Processing Systems (NeurIPS), 2021

Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
Conference on Neural Information Processing Systems (NeurIPS), 2021

Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid
European Conference on Computer Vision (ECCV), 2020

Dynamic Graph Message Passing Networks
Li Zhang, Dan Xu, Anurag Arnab, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2020
Oral presentation

Meta-Learning Deep Visual Words for Fast Video Object Segmentation
Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H.S. Torr
Intelligent Robots and Systems (IROS), 2020
NeurIPS Machine Learning for Autonomous Driving Workshop, 2019

Exploiting Temporal Context for 3D Human Pose Estimation In The Wild
Anurag Arnab*, Carl Doersch*, Andrew Zisserman
Computer Vision and Pattern Recognition (CVPR), 2019

Dual Graph Convolutional Network for Semantic Segmentation
Li Zhang*, Xiangtai Li*, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2019

Weakly- and Semi-Supervised Panoptic Segmentation
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
European Conference on Computer Vision (ECCV), 2018

On the Robustness of Semantic Segmentation Models to Adversarial Attacks
Anurag Arnab, Ondrej Miksik, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2018
Pattern Analysis and Machine Intelligence (PAMI), 2019

Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation
Anurag Arnab, Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Måns Larsson, Alexander Kirillov, Bogdan Savchynskyy, Carsten Rother, Fredrik Kahl, Philip H.S. Torr
IEEE Signal Processing Magazine, 2018

Revisiting Deep Structured Models for Pixel-Level Labeling with Gradient-Based Inference
Måns Larsson, Anurag Arnab, Shuai Zheng, Philip H.S. Torr, Fredrik Kahl
SIAM Journal on Imaging Sciences, 2018

Pixelwise Instance Segmentation with a Dynamically Instantiated Network
Anurag Arnab, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2017

Holistic, Instance-level Human Parsing
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2017

A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Måns Larsson, Anurag Arnab, Fredrik Kahl, Shuai Zheng, Philip H.S. Torr
Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2017

Higher Order Conditional Random Fields in Deep Neural Networks
Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip H.S. Torr
European Conference on Computer Vision (ECCV), 2016

Bottom-up Instance Segmentation using Deep Higher-Order CRFs
Anurag Arnab, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2016

Joint Object-Material Category Segmentation from Audio-Visual Cues
Anurag Arnab, Michael Sapienza, Stuart Golodetz, Julien Valentin, Ondrej Miksik, Shahram Izadi, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2015

SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes
Stuart Golodetz, Michael Sapienza, Julien Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor Adrian Prisacariu, Olaf Kaehler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H.S. Torr
ACM SIGGRAPH 2015 Emerging Technologies, 2015 (live demo)
arXiv 1510.03727, 2015

Pixel-level Scene Understanding with Deep Structured Models
Anurag Arnab
University of Oxford, 2019

Video Understanding with Imperfect Data
Invited talk at Learning from Limited and Imperfect Data (L2ID) workshop at CVPR. June 2021.
[Slides]

Transformers: A Review, and Recent Developments in Vision
Invited lecture at Deep Learning Indaba X Tanzania. June 2021.
[Slides]

Structured Models for Video Understanding
Invited talk at Ulsan National Institute of Science and Technology (UNIST), South Korea. June 2021
[Slides]

Video Understanding in the Wild with Incomplete Supervision
Invited talk at 1st Visual Intelligence Seminar at Fudan University, China. January 2021
[Slides]

Scene Understanding with Deep Structured Models
Invited talk at University of Warsaw. January 2020
[Slides]

Learning from Weak Supervision: Panoptic Segmentation and 3D Human Pose Estimation
Invited talk at Learning from Imperfect Data Workshop at CVPR. June 2019
[Slides]

Pixelwise Instance Segmentation with a Dynamically Instantiated Network
ETH Zurich, August 2017
[Slides]

Holistic Scene Understanding with Deep Learning and Dense Random Fields
Invited tutorial at Deep Learning Meets Model Optimization and Statistical Inference at European Conference on Computer Vision (ECCV), October 2016.
[Slides]

Joint Object-Material Category Segmentation from Audio-Visual Cues
Vision and Learning Seminar (Online), February 2016
[Video]

Joint Object-Material Category Segmentation from Audio-Visual Cues
CVSSP Seminar, University of Surrey, November 2015
[Slides]