I am a Research Scientist at Google working primarily on transformers and video understanding. I completed my PhD with Philip Torr at the University of Oxford, where I focused on deep structured models for pixel-level scene understanding. Prior to that, I completed my undergraduate degree at the University of Cape Town.

Up-to-date list on Google Scholar
How Can Objects Help Action Recognition?
Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2023

Token Turing Machines
Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
Computer Vision and Pattern Recognition (CVPR), 2023

Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver, Fantine Huot, Jasmijn Bastings, Mark Patrick Collier, Alexey Gritsenko, Vighnesh Birodkar, Cristina Vasconcelos, Yi Tay, Thomas Mensink, Alexander Kolesnikov, Filip Pavetić, Dustin Tran, Thomas Kipf, Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, Neil Houlsby
International Conference on Machine Learning (ICML), 2023

Adaptive Computation with Elastic Input Sequence
Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You
International Conference on Machine Learning (ICML), 2023

PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov*, Anurag Arnab*, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay, Mostafa Dehghani*
Transactions on Machine Learning Research (TMLR), 2022

Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer*, Alexey Gritsenko*, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby
European Conference on Computer Vision (ECCV), 2022

M&M Mix: A Multimodal Multiview Transformer Ensemble
Xuehan Xiong, Anurag Arnab, Arsha Nagrani, Cordelia Schmid
Winner of the Epic Kitchens Action Recognition Challenge at CVPR 2022

Multiview Transformers for Video Recognition
Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022

End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022

Learning with Neighbor Consistency for Noisy Labels
Ahmet Iscen, Jack Valmadre, Anurag Arnab, Cordelia Schmid
Computer Vision and Pattern Recognition (CVPR), 2022

The Efficiency Misnomer
Mostafa Dehghani*, Anurag Arnab*, Lucas Beyer*, Ashish Vaswani, Yi Tay*
International Conference on Learning Representations (ICLR), 2022

Scenic: A JAX library for Computer Vision Research and Beyond
Mostafa Dehghani, Alexey Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay
Computer Vision and Pattern Recognition (CVPR) Demo, 2022

ViViT: A Video Vision Transformer
Anurag Arnab*, Mostafa Dehghani*, Georg Heigold, Chen Sun, Mario Lucic, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2021

Unified Graph Structured Models for Video Understanding
Anurag Arnab, Chen Sun, Cordelia Schmid
International Conference on Computer Vision (ICCV), 2021

Compressive Visual Representations
Kuang-Huei Lee*, Anurag Arnab*, Sergio Guadarrama, John Canny, Ian Fischer*
Conference on Neural Information Processing Systems (NeurIPS), 2021

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
Conference on Neural Information Processing Systems (NeurIPS), 2021

Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, Chen Sun
Conference on Neural Information Processing Systems (NeurIPS), 2021

Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
Anurag Arnab, Chen Sun, Arsha Nagrani, Cordelia Schmid
European Conference on Computer Vision (ECCV), 2020

Dynamic Graph Message Passing Networks
Li Zhang, Dan Xu, Anurag Arnab, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2020
Oral presentation

Meta-Learning Deep Visual Words for Fast Video Object Segmentation
Harkirat Singh Behl, Mohammad Najafi, Anurag Arnab, Philip H.S. Torr
Intelligent Robots and Systems (IROS), 2020
NeurIPS Machine Learning for Autonomous Driving Workshop, 2019

Exploiting Temporal Context for 3D Human Pose Estimation In The Wild
Anurag Arnab*, Carl Doersch*, Andrew Zisserman
Computer Vision and Pattern Recognition (CVPR), 2019

Dual Graph Convolutional Network for Semantic Segmentation
Li Zhang*, Xiangtai Li*, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2019

Weakly- and Semi-Supervised Panoptic Segmentation
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
European Conference on Computer Vision (ECCV), 2018

On the Robustness of Semantic Segmentation Models to Adversarial Attacks
Anurag Arnab, Ondrej Miksik, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2018
Pattern Analysis and Machine Intelligence (PAMI), 2019

Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation
Anurag Arnab, Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Måns Larsson, Alexander Kirillov, Bogdan Savchynskyy, Carsten Rother, Fredrik Kahl, Philip H.S. Torr
IEEE Signal Processing Magazine, 2018

Revisiting Deep Structured Models for Pixel-Level Labeling with Gradient-Based Inference
Måns Larsson, Anurag Arnab, Shuai Zheng, Philip H.S. Torr, Fredrik Kahl
SIAM Journal on Imaging Sciences, 2018

Pixelwise Instance Segmentation with a Dynamically Instantiated Network
Anurag Arnab, Philip H.S. Torr
Computer Vision and Pattern Recognition (CVPR), 2017

Holistic, Instance-level Human Parsing
Qizhu Li*, Anurag Arnab*, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2017

A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Måns Larsson, Anurag Arnab, Fredrik Kahl, Shuai Zheng, Philip H.S. Torr
Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR), 2017

Higher Order Conditional Random Fields in Deep Neural Networks
Anurag Arnab, Sadeep Jayasumana, Shuai Zheng, Philip H.S. Torr
European Conference on Computer Vision (ECCV), 2016

Bottom-up Instance Segmentation using Deep Higher-Order CRFs
Anurag Arnab, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2016

Joint Object-Material Category Segmentation from Audio-Visual Cues
Anurag Arnab, Michael Sapienza, Stuart Golodetz, Julien Valentin, Ondrej Miksik, Shahram Izadi, Philip H.S. Torr
British Machine Vision Conference (BMVC), 2015

SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes
Stuart Golodetz, Michael Sapienza, Julien Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor Adrian Prisacariu, Olaf Kaehler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H.S. Torr
ACM SIGGRAPH 2015 Emerging Technologies, 2015 (live demo)
arXiv 1510.03727, 2015
Pixel-level Scene Understanding with Deep Structured Models
Anurag Arnab
University of Oxford, 2019

Large-Scale Video Understanding with Transformers
Invited talk at GIST Workshop for Accelerating Intelligence at GIST, South Korea. December 2022.
Invited talk at Google Visits POSTECH at POSTECH, South Korea. December 2022.
[Slides]

Large-Scale Video Understanding with Transformers
Invited talk at Holistic Video Understanding Workshop at CVPR. June 2022.
[Slides]

Winning entry to the Epic Kitchens Action Recognition Challenge
Invited talk at Epic Kitchens Workshop at CVPR. June 2022.
[Slides]

Video Understanding with Imperfect Data
Invited talk at Learning from Limited and Imperfect Data (L2ID) workshop at CVPR. June 2021.
[Slides]

Transformers: A Review, and Recent Developments in Vision
Invited lecture at Deep Learning Indaba X Tanzania. June 2021.
[Slides]

Structured Models for Video Understanding
Invited talk at Ulsan National Institute of Science and Technology (UNIST), South Korea. June 2021.
[Slides]

Video Understanding in the Wild with Incomplete Supervision
Invited talk at 1st Visual Intelligence Seminar at Fudan University, China. January 2021.
[Slides]

Scene Understanding with Deep Structured Models
Invited talk at University of Warsaw. January 2020.
[Slides]

Learning from Weak Supervision: Panoptic Segmentation and 3D Human Pose Estimation
Invited talk at Learning from Imperfect Data Workshop at CVPR. June 2019.
[Slides]

Pixelwise Instance Segmentation with a Dynamically Instantiated Network
ETH Zurich, August 2017
[Slides]

Holistic Scene Understanding with Deep Learning and Dense Random Fields
Invited tutorial at Deep Learning Meets Model Optimization and Statistical Inference at European Conference on Computer Vision (ECCV), October 2016.
[Slides]

Joint Object-Material Category Segmentation from Audio-Visual Cues
Vision and Learning Seminar (Online), February 2016
[Video]

Joint Object-Material Category Segmentation from Audio-Visual Cues
CVSSP Seminar, University of Surrey, November 2015
[Slides]