I am a Research Scientist at the Beijing Institute for General Artificial Intelligence (BIGAI). I received my Ph.D. from the Department of Statistics at the University of California, Los Angeles (UCLA), advised by Professor Song-Chun Zhu. During my Ph.D., I interned at Google Research, Microsoft Azure AI, and Amazon Alexa. Before UCLA, I obtained my Bachelor's and Master's degrees from the University of Science and Technology of China (USTC). My research interests lie in machine learning, computer vision, and multimodal learning.
2024.02 One paper is accepted by CVPR 2024.
2024.01 Two papers are accepted by ICLR 2024.
2023.06 One paper is accepted by ICCV 2023.
Ph.D., University of California, Los Angeles (UCLA), 2018.09 - Present
Major: Statistics, Advisor: Prof. Song-Chun Zhu
Master's, University of Science and Technology of China (USTC), 2015.08 - 2018.07
Major: Information and Communication Engineering, Advisor: Prof. Jiebo Luo
Bachelor's, University of Science and Technology of China (USTC), 2011.09 - 2015.07
Awarded the Guo Moruo Scholarship (郭沫若奖学金), given to the best graduate of the Department of Automation.
YouRefIt: Embodied Reference Understanding with Language and Gesture
Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
ICLR 2021 Embodied Multimodal Learning Workshop (Short Version).
PDF Project Code
A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics
Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
ICLR 2021 The Role of Mathematical Reasoning in General Artificial Intelligence Workshop (Short Version)
PDF Project Code
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning
Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu.
International Conference on Machine Learning (ICML), 2020
Best Paper Award in ICML 2020 Workshop on Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond.
PDF Supplementary Project Code
VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People.
Danna Gurari, Qing Li, Chi Lin, Yinan Zhao, Anhong Guo, Abigale J. Stangl, Jeffrey P. Bigham.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
PDF Project
VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search and Video Hyperlinking
Phuong Anh Nguyen, Qing Li, Zhi-Qi Cheng, Yi-Jie Lu, Hao Zhang, Xiao Wu, Chong-Wah Ngo.
NIST TRECVID Workshop (TRECVID'17), Gaithersburg, USA, Nov 2017
PDF BibTex Challenge Homepage
MSR Asia MSM at THUMOS Challenge 2015
Zhaofan Qiu, Qing Li, Ting Yao, Tao Mei, Yong Rui
In CVPR THUMOS Challenge Workshop, 2015 (2nd place in Action Classification task)
PDF BibTex Challenge Homepage
National Scholarship, 2017
ICMR’16 Student Travel Grants, 2016
Best Paper Finalist in ICMR'16, 2016
Outstanding Graduate in Anhui Province, China, 2015
Guo Moruo Scholarship (郭沫若奖学金), 2014
National Scholarship, 2013
Outstanding Student Scholarship (Gold Award), 2012
Visual Privacy Recognition (VizWiz-Privacy), 2018.07 - 2018.09
Supervised by Prof. Danna Gurari at the University of Texas at Austin
- Introduced the first visual privacy dataset originating from blind people. For each image, we manually annotated private regions according to a taxonomy of privacy concerns relevant to their images. We also annotated whether the private visual information is needed to answer questions asked about the private images.
- Proposed two tasks: identifying (1) whether private information appears in an image and (2) whether a question about an image asks about the image's private content.
Visual Question Answering with Explanation, 2018.01 - 2018.06
Supervised by Prof. Jianfei Cai at NTU, Singapore, and Prof. Jiebo Luo
- Constructed a new dataset of VQA with Explanation (VQA-E), which consists of 181,298 visual questions, answers, and explanations.
- Proposed a novel multi-task learning architecture to jointly predict an answer and generate an explanation for the answer.
Visual Question Answering for Blind People, 2017.10 - 2018.01
Supervised by Prof. Danna Gurari at UT Austin and Prof. Jiebo Luo
- Proposed VizWiz, the first goal-oriented VQA dataset arising from a natural setting. VizWiz consists of 31,000 visual questions originating from blind people.
- Analyzed the image-question relevance of VizWiz and benchmarked state-of-the-art VQA algorithms, revealing that VizWiz is a challenging dataset that can spur research on assistive technologies to eliminate accessibility barriers for blind people.
Video Captioning and Ad-hoc Video Search, 2017.02 - 2017.10
Supervised by Prof. Chong-Wah Ngo at City University of Hong Kong
- Proposed a novel framework that matches videos and text and generates descriptions for videos using spatio-temporal attention, and applied it to the Video-to-Text task of the TRECVID 2017 competition.
- Revised the framework to search for relevant videos given a text query and won 3rd place in the Ad-hoc Video Search task. Our notebook paper was accepted by the NIST TRECVID Workshop 2017.
- Devised a hierarchical co-attention network to improve the AVS system’s adaptability to queries of variable length.
Explainable Visual Question Answering, 2016.08 - 2017.02
Supervised by Dr. Tao Mei at Microsoft Research Asia and Prof. Jiebo Luo
- Proposed a novel framework for explainable VQA. Our framework generates attributes and captions for images to explain why the system predicts a specific answer to the question.
- Defined four measurements of explanation quality and demonstrated a strong relationship between explanation quality and VQA accuracy. Our system achieves performance comparable to the state of the art and improves as explanation quality increases.
Action and Activity Recognition in Video, 2014.12 - 2015.07
Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia and Prof. Jiebo Luo
- Proposed a hybrid framework that learns a deep multi-granular spatio-temporal representation for video action recognition using 2D/3D CNNs and LSTMs. Our paper was accepted and selected as a Best Paper Finalist at ICMR 2016 (acceptance rate: 17%, Best Paper Finalist rate: 1%). An improved version of the conference paper was accepted by IJMIR 2017.
- Won 2nd place in the Action Classification task of the THUMOS Challenge and presented our work at the CVPR THUMOS Workshop in Boston, June 2015. The challenge covers over 430 hours of video data and 45 million frames across 101 action classes.
Highlight Detection for First-Person Video Summarization, 2014.07 - 2014.12
Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia
- Collected a new large-scale dataset from YouTube for first-person video highlight detection. The dataset consists of 100 hours of video, mainly captured by GoPro cameras, across 15 sports-related categories.
- Proposed a pairwise deep ranking model to detect highlight segments in videos. My contribution was devising a two-stream CNN (frame and optical flow) to extract features for video segments.