About Me

Qing Li (李庆)

Google Scholar
dylan.liqing@gmail.com
Curriculum Vitae
GitHub (liqing-ustc)

I am a Research Scientist at the Beijing Institute for General Artificial Intelligence (BIGAI). I received my Ph.D. from the Department of Statistics at the University of California, Los Angeles (UCLA), advised by Professor Song-Chun Zhu. During my Ph.D., I interned at Google Research, Microsoft Azure AI, and Amazon Alexa. Before UCLA, I obtained my Bachelor's and Master's degrees from the University of Science and Technology of China (USTC). My research interests lie in machine learning, computer vision, and multimodal learning.

News

  • 2024.02 One paper accepted by CVPR 2024.

  • 2024.01 Two papers accepted by ICLR 2024.

  • 2023.06 One paper accepted by ICCV 2023.

Education

Ph.D., University of California, Los Angeles (UCLA), 2018.09 - Now

Major: Statistics, Advisor: Prof. Song-Chun Zhu


Master's, University of Science and Technology of China (USTC), 2015.08 - 2018.07

Major: Information and Communication Engineering, Advisor: Prof. Jiebo Luo


Bachelor's, University of Science and Technology of China (USTC), 2011.09 - 2015.07

Awarded the Guo Moruo Scholarship (郭沫若奖学金) as the best graduate in the Department of Automation.

Publications

VLGrammar: Grounded Grammar Induction of Vision and Language

Yining Hong, Qing Li, Song-Chun Zhu, Siyuan Huang
arXiv, 2021.
PDF Project Code


YouRefIt: Embodied Reference Understanding with Language and Gesture

Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
ICLR 2021 Embodied Multimodal Learning Workshop (Short Version).
PDF Project Code


A HINT from Arithmetic: On Systematic Generalization of Perception, Syntax, and Semantics

Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu
ICLR 2021 The Role of Mathematical Reasoning in General Artificial Intelligence Workshop (Short Version)
PDF Project Code


A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu
European Conference on Computer Vision (ECCV), 2020 (Oral).
PDF Project Code


Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu.
International Conference on Machine Learning (ICML), 2020
Best Paper Award in ICML 2020 Workshop on Bridge Between Perception and Reasoning: Graph Neural Networks & Beyond.
PDF Supplementary Project Code

Why Does a Visual Question Have Different Answers?

Nilavra Bhattacharya, Qing Li, Danna Gurari
IEEE International Conference on Computer Vision (ICCV), 2019
PDF Project

VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People.

Danna Gurari, Qing Li, Chi Lin, Yinan Zhao, Anhong Guo, Abigale J. Stangl, Jeffrey P. Bigham.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
PDF Project

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo
European Conference on Computer Vision (ECCV), 2018
PDF BibTex Dataset


VizWiz Grand Challenge: Answering Visual Questions from Blind People

Danna Gurari, Qing Li, Abigale Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey Bigham
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
PDF BibTex Project Code


Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018 (Oral)
PDF BibTex


Learning Hierarchical Video Representation for Action Recognition

Qing Li, Zhaofan Qiu, Ting Yao, Tao Mei, Yong Rui, Jiebo Luo
International Journal of Multimedia Information Retrieval (IJMIR), February 2017
PDF BibTex


Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation

Qing Li*, Zhaofan Qiu*, Ting Yao, Tao Mei, Yong Rui, Jiebo Luo (*equal contribution)
ACM International Conference on Multimedia Retrieval (ICMR), New York, USA, July 2016 (Best Paper Finalist)
PDF BibTex Code

Workshops

VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search and Video Hyperlinking

Phuong Anh Nguyen, Qing Li, Zhi-Qi Cheng, Yi-Jie Lu, Hao Zhang, Xiao Wu, Chong-Wah Ngo.
NIST TRECVID Workshop (TRECVID'17), Gaithersburg, USA, Nov 2017
PDF BibTex Challenge Homepage


MSR Asia MSM at THUMOS Challenge 2015

Zhaofan Qiu, Qing Li, Ting Yao, Tao Mei, Yong Rui
In CVPR THUMOS Challenge Workshop, 2015 (2nd place in Action Classification task)
PDF BibTex Challenge Homepage

Awards

  • National Scholarship, 2017

  • ICMR'16 Student Travel Grant, 2016

  • Best Paper Finalist in ICMR'16, 2016

  • Outstanding Graduate in Anhui Province, China, 2015

  • Guo Moruo Scholarship (郭沫若奖学金), 2014

  • National Scholarship, 2013

  • Outstanding Student Scholarship (Gold Award), 2012

Research Experiences

Visual Privacy Recognition (VizWiz-Privacy), 2018.07 - 2018.09

Supervised by Prof. Danna Gurari at the University of Texas at Austin

  • Introduced the first visual privacy dataset originating from blind people. For each image, we manually annotated private regions according to a taxonomy of privacy concerns relevant to these images, and annotated whether the private visual information is needed to answer questions asked about the images.
  • Proposed two tasks: identifying (1) whether private information is present in an image, and (2) whether a question about an image asks about its private content.


Visual Question Answering with Explanation, 2018.01 - 2018.06

Supervised by Prof. Jianfei Cai at NTU, Singapore, and Prof. Jiebo Luo

  • Constructed a new dataset of VQA with Explanation (VQA-E), which consists of 181,298 visual questions, answers, and explanations.
  • Proposed a novel multi-task learning architecture to jointly predict an answer and generate an explanation for the answer (a minimal sketch follows).
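
For illustration only, here is a minimal PyTorch sketch of such a multi-task design. It is not the exact VQA-E model: the module choices, dimensions, vocabulary sizes, and loss weighting are all assumptions, but it shows a shared image-question representation feeding both an answer classifier and an explanation decoder.

```python
# Hypothetical multi-task VQA sketch (answer classification + explanation generation).
# Dimensions, vocabulary sizes, and the loss weight are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskVQA(nn.Module):
    def __init__(self, img_dim=2048, q_vocab=10000, exp_vocab=10000,
                 num_answers=3000, hidden=512):
        super().__init__()
        self.q_embed = nn.Embedding(q_vocab, 300)
        self.q_rnn = nn.GRU(300, hidden, batch_first=True)          # question encoder
        self.img_proj = nn.Linear(img_dim, hidden)                  # image feature projection
        self.fuse = nn.Linear(2 * hidden, hidden)                   # shared fused representation
        self.answer_head = nn.Linear(hidden, num_answers)           # task 1: answer classifier
        self.exp_embed = nn.Embedding(exp_vocab, 300)
        self.exp_rnn = nn.GRU(300, hidden, batch_first=True)        # task 2: explanation decoder
        self.exp_out = nn.Linear(hidden, exp_vocab)

    def forward(self, img_feat, question, explanation_in):
        _, q_h = self.q_rnn(self.q_embed(question))
        fused = torch.tanh(self.fuse(torch.cat([q_h[-1], self.img_proj(img_feat)], dim=-1)))
        answer_logits = self.answer_head(fused)
        # decode the explanation conditioned on the fused feature (used as the initial hidden state)
        exp_h, _ = self.exp_rnn(self.exp_embed(explanation_in), fused.unsqueeze(0))
        return answer_logits, self.exp_out(exp_h)

# joint objective: answer loss plus a weighted explanation loss
model = MultiTaskVQA()
ans_logits, exp_logits = model(torch.randn(4, 2048),
                               torch.randint(0, 10000, (4, 12)),
                               torch.randint(0, 10000, (4, 20)))
loss = nn.CrossEntropyLoss()(ans_logits, torch.randint(0, 3000, (4,))) + \
       0.5 * nn.CrossEntropyLoss()(exp_logits.reshape(-1, 10000),
                                   torch.randint(0, 10000, (4 * 20,)))
loss.backward()
```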


Visual Question Answering for Blind People, 2017.10 - 2018.01

Supervised by Prof. Danna Gurari at UT Austin and Prof. Jiebo Luo

  • Proposed VizWiz, the first goal-oriented VQA dataset arising from a natural setting. VizWiz consists of 31,000 visual questions originating from blind people.
  • Analyzed image-question relevance in VizWiz and benchmarked state-of-the-art VQA algorithms, revealing that VizWiz is a challenging dataset that can spur research on assistive technologies to remove accessibility barriers for blind people.


Video Captioning and Ad-hoc Video Search, 2017.02 - 2017.10

Supervised by Prof. Chong-Wah Ngo at City University of Hong Kong

  • Proposed a framework that matches videos with text and generates video descriptions using spatio-temporal attention, and applied it to the Video-to-Text task of the TRECVID 2017 competition.
  • Revised the framework to retrieve relevant videos given a text query and won 3rd place in the Ad-hoc Video Search (AVS) task. Our notebook paper was accepted by the NIST TRECVID 2017 Workshop.
  • Devised a hierarchical co-attention network to improve the AVS system's adaptability to queries of variable length (a simplified sketch follows this list).
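
As a rough illustration, the sketch below shows a single-level query-video co-attention in PyTorch. The actual network was hierarchical; the hierarchy is omitted here, and the module names, dimensions, and mean-pooling choice are all assumptions.

```python
# Hypothetical single-level query-video co-attention (the real network was hierarchical).
import torch
import torch.nn as nn

class QueryVideoCoAttention(nn.Module):
    def __init__(self, dim=512, heads=4):
        super().__init__()
        self.q2v = nn.MultiheadAttention(dim, heads, batch_first=True)  # words attend to segments
        self.v2q = nn.MultiheadAttention(dim, heads, batch_first=True)  # segments attend to words

    def forward(self, query_words, video_segs):
        # query_words: (B, Lq, dim), Lq can vary across queries; video_segs: (B, Lv, dim)
        q_ctx, _ = self.q2v(query_words, video_segs, video_segs)
        v_ctx, _ = self.v2q(video_segs, query_words, query_words)
        # mean-pool both sides to fixed-size vectors so query length no longer matters downstream
        return q_ctx.mean(dim=1), v_ctx.mean(dim=1)

coatt = QueryVideoCoAttention()
q_vec, v_vec = coatt(torch.randn(2, 7, 512), torch.randn(2, 30, 512))
relevance = nn.functional.cosine_similarity(q_vec, v_vec)  # query-video matching score
```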


Explainable Visual Question Answering, 2016.08 - 2017.02

Supervised by Dr. Tao Mei at Microsoft Research Asia and Prof. Jiebo Luo

  • Proposed a novel framework for explainable VQA that generates attributes and captions for images to explain why the system predicts a specific answer to a question.
  • Defined four measures of explanation quality and demonstrated a strong relationship between explanation quality and VQA accuracy. Our system achieves performance comparable to the state of the art and improves as explanation quality improves.


Action and Activity Recognition in Video, 2014.12 - 2015.07

Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia, and Prof. Jiebo Luo

  • Proposed a hybrid framework to learn a deep multi-granular spatio-temporal representation for video action recognition using 2D/3D CNNs and LSTMs (a sketch of the general idea follows this list). Our paper was accepted and selected as a Best Paper Finalist at ICMR 2016 (acceptance rate: 17%, Best Paper Finalist rate: 1%), and an improved version was accepted by IJMIR in 2017.
  • Won 2nd place in the Action Classification task of the THUMOS Challenge and presented our work at the CVPR THUMOS Workshop in Boston, June 2015. The challenge comprises over 430 hours of video and 45 million frames covering 101 action classes.
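
The sketch below illustrates the general multi-granular idea in PyTorch rather than the model we used: frame-level 2D-CNN features are modeled temporally with an LSTM and late-fused with a clip-level 3D-convolution branch before classification. Dimensions and layer choices are illustrative assumptions.

```python
# Hypothetical multi-granular action classifier: 2D-CNN frame features + LSTM, fused with a 3D branch.
import torch
import torch.nn as nn

class MultiGranularActionNet(nn.Module):
    def __init__(self, frame_dim=2048, clip_channels=3, hidden=512, num_classes=101):
        super().__init__()
        self.lstm = nn.LSTM(frame_dim, hidden, batch_first=True)    # temporal model over frame features
        self.clip_branch = nn.Sequential(                           # tiny stand-in for a 3D CNN over clips
            nn.Conv3d(clip_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(hidden + 32, num_classes)       # late fusion of the two granularities

    def forward(self, frame_feats, clip):
        # frame_feats: (B, T, frame_dim) pre-extracted 2D-CNN features; clip: (B, C, T, H, W) raw frames
        _, (h, _) = self.lstm(frame_feats)
        fused = torch.cat([h[-1], self.clip_branch(clip)], dim=-1)
        return self.classifier(fused)

model = MultiGranularActionNet()
logits = model(torch.randn(2, 16, 2048), torch.randn(2, 3, 8, 32, 32))  # -> (2, 101) class scores
```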


Highlight Detection for First-Person Video Summarization, 2014.07 - 2014.12

Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia

  • Collected a new large dataset from YouTube for first-person video highlight detection. The dataset consists of 100 hours of video, mainly captured by GoPro cameras, across 15 sports-related categories.
  • Proposed a pairwise deep ranking model to detect highlight segments in videos. My contribution focused on devising a two-stream CNN (frame and flow) to extract features for video segments (a minimal sketch of the ranking idea follows).
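
The following PyTorch sketch illustrates only the pairwise ranking objective, not the exact model: a scorer over concatenated frame and flow features is trained so that a highlight segment scores higher than a non-highlight segment from the same video by a margin. Feature dimensions and the margin value are assumptions.

```python
# Hypothetical pairwise ranking sketch for highlight detection over two-stream (frame + flow) features.
import torch
import torch.nn as nn

class HighlightScorer(nn.Module):
    def __init__(self, frame_dim=2048, flow_dim=2048, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(frame_dim + flow_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),                      # scalar highlight score per segment
        )

    def forward(self, frame_feat, flow_feat):
        return self.mlp(torch.cat([frame_feat, flow_feat], dim=-1)).squeeze(-1)

scorer = HighlightScorer()
# each training pair: a highlight segment and a non-highlight segment from the same video
s_pos = scorer(torch.randn(8, 2048), torch.randn(8, 2048))
s_neg = scorer(torch.randn(8, 2048), torch.randn(8, 2048))
# hinge ranking loss: the highlight should outscore the non-highlight by a margin (assumed 1.0 here)
loss = torch.clamp(1.0 - (s_pos - s_neg), min=0).mean()
loss.backward()
```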