About Me

Qing Li (李庆)

Google Scholar
dylan.liqing@gmail.com
Curriculum Vitae
GitHub (liqing-ustc)

I am currently a first-year Ph.D. student in the Department of Statistics at the University of California, Los Angeles (UCLA), advised by Prof. Song-Chun Zhu. My research interests include, but are not limited to: vision and language (image/video captioning, visual question answering), activity and action recognition in videos, and deep learning.

News

  • 2018.10. I will attend EMNLP 2018 from Oct. 31 to Nov. 5 in Brussels, Belgium. Feel free to approach me for a chat!

  • 2018.09. I have been invited to serve as a reviewer for CVPR 2019.

  • 2018.09. I have started my Ph.D. at the Center for Vision, Cognition, Learning, and Autonomy (VCLA), University of California, Los Angeles (UCLA)!

  • 2018.08. Our paper "Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions" has been accepted by EMNLP 2018 (Oral).

  • 2018.07. Our paper "VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions" has been accepted by ECCV 2018.

  • 2018.04. I will serve as a student volunteer for CVPR 2018 in Salt Lake City, June 18-22.

  • 2018.03. I will be working as a Research Associate at the University of Texas at Austin, under the supervision of Prof. Danna Gurari, from June 25 to August 25, 2018.

  • 2018.03. The code for "VizWiz Grand Challenge: Answering Visual Questions from Blind People" has been released! Please see below.

  • 2018.03. Our paper "VizWiz Grand Challenge: Answering Visual Questions from Blind People" has been accepted by CVPR 2018 (Spotlight).

  • 2018.02. I will pursue my Ph.D. at the Center for Vision, Cognition, Learning, and Autonomy (VCLA), University of California, Los Angeles (UCLA), starting in Fall 2018!

  • 2018.01. I am visiting the Multimedia Lab at Nanyang Technological University (NTU), Singapore, supervised by Prof. Jianfei Cai and Prof. Shafiq Joty.

Education

Ph.D., University of California, Los Angeles (UCLA), 2018.09 - present

Major: Statistics, Advisor: Prof. Song-Chun Zhu


Master's degree, University of Science and Technology of China (USTC), 2015.08 - 2018.07

Major: Information and Communication Engineering, Advisor: Prof. Jiebo Luo


Bachelor's degree, University of Science and Technology of China (USTC), 2011.09 - 2015.07

Awarded the Guo Moruo Scholarship (郭沫若奖学金), given to the best graduate in the Department of Automation.

Publications

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo
European Conference on Computer Vision (ECCV), 2018
PDF BibTex Dataset


VizWiz Grand Challenge: Answering Visual Questions from Blind People

Danna Gurari, Qing Li, Abigale Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey Bigham
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 (Spotlight)
PDF BibTex Project Code


Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions

Qing Li, Jianlong Fu, Dongfei Yu, Tao Mei, Jiebo Luo
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018 (Oral)
PDF BibTex


Learning Hierarchical Video Representation for Action Recognition

Qing Li, Zhaofan Qiu, Ting Yao, Tao Mei, Yong Rui, Jiebo Luo
International Journal of Multimedia Information Retrieval (IJMIR), February 2017
PDF BibTex


Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation

Qing Li*, Zhaofan Qiu*, Ting Yao, Tao Mei, Yong Rui, Jiebo Luo (*equal contribution)
ACM International Conference on Multimedia Retrieval (ICMR), New York, USA, July 2016 (Best Paper Finalist)
PDF BibTex Code

Workshops

VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search and Video Hyperlinking

Phuong Anh Nguyen, Qing Li, Zhi-Qi Cheng, Yi-Jie Lu, Hao Zhang, Xiao Wu, Chong-Wah Ngo
NIST TRECVID Workshop (TRECVID'17), Gaithersburg, USA, Nov 2017
PDF BibTex Challenge Homepage


MSR Asia MSM at THUMOS Challenge 2015

Zhaofan Qiu, Qing Li, Ting Yao, Tao Mei, Yong Rui
CVPR THUMOS Challenge Workshop, 2015 (2nd place in the Action Classification task)
PDF BibTex Challenge Homepage

Awards

  • National Scholarship, 2017

  • ICMR’16 Student Travel Grants, 2016

  • Best Paper Finalist in ICMR'16, 2016

  • Outstanding Graduate in Anhui Province, China, 2015

  • Guo Moruo Scholarship (郭沫若奖学金), 2014

  • National Scholarship, 2013

  • Outstanding Student Scholarship (Gold Award), 2012

Research Experiences

VizWiz v2, 2018.07 - 2018.09

Supervised by Prof. Danna Gurari at the University of Texas at Austin


Visual Question Answering with Explanation, 2018.01 - 2018.06

Supervised by Prof. Jianfei Cai at NTU, Singapore, and Prof. Jiebo Luo

  • Constructed a new dataset of VQA with Explanation (VQA-E), which consists of 181,298 visual questions, answers, and explanations.
  • Proposed a novel multi-task learning architecture to jointly predict an answer and generate an explanation for the answer (see the sketch below).
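
For concreteness, here is a minimal PyTorch sketch of such a multi-task design, with a shared question-image encoder feeding an answer classifier and an LSTM explanation decoder. All module choices, dimensions, and the element-wise fusion are illustrative assumptions, not the exact architecture from the paper.

    import torch
    import torch.nn as nn

    class MultiTaskVQAE(nn.Module):
        """Shared encoder with two heads: an answer classifier and an
        LSTM decoder that generates an explanation (hypothetical sketch)."""

        def __init__(self, vocab_size, num_answers, img_dim=2048, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.q_rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.img_proj = nn.Linear(img_dim, hidden)
            self.answer_head = nn.Linear(hidden, num_answers)
            self.dec_rnn = nn.LSTM(hidden, hidden, batch_first=True)
            self.dec_out = nn.Linear(hidden, vocab_size)

        def forward(self, img_feat, question, expl_in):
            _, q = self.q_rnn(self.embed(question))         # (1, B, H)
            fused = q.squeeze(0) * self.img_proj(img_feat)  # element-wise fusion (assumed)
            answer_logits = self.answer_head(fused)
            # Teacher-forced explanation decoding, initialized from the fused feature.
            h0, c0 = fused.unsqueeze(0), torch.zeros_like(fused).unsqueeze(0)
            dec, _ = self.dec_rnn(self.embed(expl_in), (h0, c0))
            return answer_logits, self.dec_out(dec)

Training would then minimize a weighted sum of the answer cross-entropy and the per-token explanation cross-entropy, which is what makes the setup multi-task.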


Visual Question Answering for Blind People, 2017.10 - 2018.01

Supervised by Prof. Danna Gurari at UT Austin and Prof. Jiebo Luo

  • Proposed VizWiz, the first goal-oriented VQA dataset arising from a natural setting. VizWiz consists of 31,000 visual questions originating from blind people.
  • Analyzed the image-question relevance of VizWiz, benchmarked state-of-the-art VQA algorithms, and revealed that VizWiz is a challenging dataset that can spur research on assistive technologies to eliminate accessibility barriers for blind people.


Video Captioning and Ad-hoc Video Search, 2017.02 - 2017.10

Supervised by Prof. Chong-Wah Ngo at City University of Hong Kong

  • Proposed a novel framework that matches videos with text and generates video descriptions using spatio-temporal attention, and applied it to the Video-to-Text task of the TRECVID 2017 competition (a simplified sketch follows this list).
  • Revised the framework to retrieve relevant videos given a text query and won 3rd place in the Ad-hoc Video Search task. Our notebook paper was accepted by the NIST TRECVID Workshop 2017.
  • Devised a hierarchical co-attention network to improve the AVS system’s adaptability to queries of variable length.
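
As a rough illustration of the matching half, here is a minimal PyTorch sketch of query-conditioned temporal attention over frame features. The spatial part of the spatio-temporal attention and the hierarchical co-attention are omitted, and every layer choice here is an assumption rather than the system's actual design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentiveVideoTextMatcher(nn.Module):
        """Score a (video, query) pair: the query attends over frame
        features, and the attended video vector is compared with the
        query embedding by cosine similarity (hypothetical sketch)."""

        def __init__(self, vocab_size, frame_dim=2048, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.q_rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.frame_proj = nn.Linear(frame_dim, hidden)
            self.att = nn.Linear(hidden, 1)

        def forward(self, frames, query):
            # frames: (B, T, frame_dim); query: (B, L) token ids
            _, q = self.q_rnn(self.embed(query))
            q = q.squeeze(0)                              # (B, H)
            v = torch.tanh(self.frame_proj(frames))       # (B, T, H)
            scores = self.att(v * q.unsqueeze(1))         # query-conditioned attention
            alpha = F.softmax(scores, dim=1)              # (B, T, 1)
            v_att = (alpha * v).sum(dim=1)                # attended video vector
            return F.cosine_similarity(v_att, q, dim=-1)  # matching score per pair

For retrieval, such a score would be computed between the text query and every candidate video, and the videos ranked by it.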


Explainable Visual Question Answering, 2016.08 - 2017.02

Supervised by Dr. Tao Mei at Microsoft Research Asia and Prof. Jiebo Luo

  • Proposed a novel framework towards explainable VQA: it generates attributes and captions for the image to explain why the system predicts a specific answer to the question (see the sketch after this list).
  • Defined four measures of explanation quality and demonstrated a strong relationship between explanation quality and VQA accuracy. Our current system achieves performance comparable to the state of the art and improves as explanation quality improves.
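
A minimal sketch of how such an attribute-and-caption pathway could feed a VQA classifier is below; the fusion by concatenation and all names and dimensions are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class AttributeCaptionVQA(nn.Module):
        """Answer from generated attributes and a caption, so the
        intermediate text doubles as an explanation (hypothetical sketch)."""

        def __init__(self, vocab_size, num_attrs, num_answers, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.attr_emb = nn.Embedding(num_attrs, hidden)
            self.q_rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.cap_rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.classifier = nn.Linear(3 * hidden, num_answers)

        def forward(self, question, caption, attrs):
            # question/caption: (B, L) token ids; attrs: (B, K) attribute ids
            _, q = self.q_rnn(self.embed(question))
            _, c = self.cap_rnn(self.embed(caption))
            a = self.attr_emb(attrs).mean(dim=1)    # average attribute embedding
            feat = torch.cat([q.squeeze(0), c.squeeze(0), a], dim=-1)
            return self.classifier(feat)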


Action and Activity Recognition in Video, 2014.12 - 2015.07

Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia, and Prof. Jiebo Luo

  • Proposed a hybrid framework that learns a deep multi-granular spatio-temporal representation for video action recognition using 2D/3D CNNs and LSTMs (see the sketch after this list). Our paper was accepted by ICMR 2016 and selected as a Best Paper Finalist (acceptance rate: 17%; Best Paper Finalist rate: 1%). An improved version of the conference paper was accepted by IJMIR 2017.
  • Won 2nd place in the Action Classification task of the THUMOS Challenge and presented our work at the CVPR THUMOS Workshop in Boston, June 2015. The challenge contains over 430 hours of video data and 45 million frames across 101 action classes.
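
A minimal sketch of multi-granular fusion is below: per-frame 2D-CNN scores, per-clip 3D-CNN scores, and a video-level LSTM over frame features, averaged at the end. The equal-weight late fusion and all dimensions are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class MultiGranularFusion(nn.Module):
        """Fuse action scores from three granularities: frame (2D CNN
        features), clip (3D CNN features), and video (LSTM over frames)
        (hypothetical sketch)."""

        def __init__(self, num_classes, frame_dim=2048, clip_dim=4096, hidden=512):
            super().__init__()
            self.frame_head = nn.Linear(frame_dim, num_classes)
            self.clip_head = nn.Linear(clip_dim, num_classes)
            self.lstm = nn.LSTM(frame_dim, hidden, batch_first=True)
            self.video_head = nn.Linear(hidden, num_classes)

        def forward(self, frame_feats, clip_feats):
            # frame_feats: (B, T, frame_dim); clip_feats: (B, C, clip_dim)
            frame_logits = self.frame_head(frame_feats).mean(dim=1)
            clip_logits = self.clip_head(clip_feats).mean(dim=1)
            out, _ = self.lstm(frame_feats)
            video_logits = self.video_head(out[:, -1])   # last hidden state
            return (frame_logits + clip_logits + video_logits) / 3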


Highlight Detection for First-Person Video Summarization, 2014.07 - 2014.12

Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia

  • Collected a new large-scale dataset from YouTube for first-person video highlight detection. The dataset consists of 100 hours of video, mainly captured by GoPro cameras, across 15 sports-related categories.
  • Proposed a pairwise deep ranking model to detect highlight segments in videos (see the sketch below). My contribution focused on devising a two-stream CNN (frame and optical flow) to extract features for video segments.
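
A minimal sketch of the pairwise ranking idea: a scorer over concatenated two-stream features, trained with a hinge loss so that highlight segments outscore non-highlights by a margin. The MLP scorer and the margin value are assumptions for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SegmentScorer(nn.Module):
        """Score a segment from two-stream (frame + optical-flow)
        features (hypothetical sketch)."""

        def __init__(self, frame_dim=2048, flow_dim=2048, hidden=512):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(frame_dim + flow_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, frame_feat, flow_feat):
            return self.mlp(torch.cat([frame_feat, flow_feat], dim=-1)).squeeze(-1)

    def pairwise_ranking_loss(scorer, pos, neg, margin=1.0):
        """Hinge loss: the highlight (pos) should outscore the non-highlight
        (neg) by at least `margin`. Each argument is a (frame_feat, flow_feat) pair."""
        return F.relu(margin - scorer(*pos) + scorer(*neg)).mean()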