I am a Ph.D. student in the Department of Statistics at the University of California, Los Angeles (UCLA), advised by Prof. Song-Chun Zhu. My research interests include, but are not limited to, neural-symbolic learning and reasoning and its applications to vision-and-language tasks (image/video captioning, visual question answering), video analysis, and reinforcement learning.
2019.02. One paper has been accepted to CVPR 2019: "Detecting Private Information and Its Purpose in Pictures Taken by Blind People." Joint work with Prof. Danna Gurari at UT Austin.
2019.02. I have been invited to serve as a reviewer for CVPR 2019, ICCV 2019, and ACL 2019. Be an insightful reviewer!
2018.10. I will attend EMNLP 2018 in Brussels, Belgium, from Oct. 31 to Nov. 5. Feel free to approach me and chat!
2018.09. I am invited to be a reviewer for CVPR 2019.
2018.09. I have started my Ph.D. at the Center for Vision, Cognition, Learning, and Autonomy (VCLA), University of California, Los Angeles (UCLA)!
2018.08. Our paper "Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions" has been accepted to EMNLP 2018 (Oral).
2018.07. Our paper "VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions" has been accepted to ECCV 2018.
2018.03. I will work as a Research Associate at the University of Texas at Austin, under the supervision of Prof. Danna Gurari, from June 25 to August 25, 2018.
2018.04. I will serve as a student volunteer for CVPR 2018 in Salt Lake City, June 18-22.
2018.03. The code for "VizWiz Grand Challenge: Answering Visual Questions from Blind People" has been released! Please check below.
2018.03. Our paper "VizWiz Grand Challenge: Answering Visual Questions from Blind People" has been accepted to CVPR 2018.
2018.02. I will pursue my Ph.D. at the Center for Vision, Cognition, Learning, and Autonomy (VCLA), University of California, Los Angeles (UCLA), starting in Fall 2018!
Ph.D. at University of California, Los Angeles (UCLA), 2018.09 - Present
Major: Statistics, Advisor: Prof. Song-Chun Zhu
Master's at University of Science and Technology of China (USTC), 2015.08 - 2018.07
Major: Information and Communication Engineering, Advisor: Prof. Jiebo Luo
Bachelor's at University of Science and Technology of China (USTC), 2011.09 - 2015.07
Awarded the Guo Moruo Scholarship (郭沫若奖学金), given to the best graduate in the Department of Automation.
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu
Submitted to European Conference on Computer Vision (ECCV), 2020
Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning
Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu
Submitted to International Conference on Machine Learning (ICML), 2020
VizWiz-Priv: A Dataset for Recognizing the Presence and Purpose of Private Visual Information in Images Taken by Blind People
National Scholarship, 2017
ICMR'16 Student Travel Grant, 2016
Best Paper Finalist at ICMR'16, 2016
Outstanding Graduate in Anhui Province, China, 2015
Guo Moruo Scholarship (郭沫若奖学金), 2014
National Scholarship, 2013
Outstanding Student Scholarship (Gold Award), 2012
Visual Privacy Recognition (VizWiz-Privacy), 2018.07 - 2018.09
Supervised by Prof. Danna Gurari at the University of Texas at Austin
- Introduced the first visual privacy dataset originating from blind people. For each image, we manually annotated private regions according to a taxonomy that represents privacy concerns relevant to images taken by blind people. We also annotated whether the private visual information is needed to answer the questions asked about the private images.
- Proposed two tasks: identifying (1) whether private information is present in an image and (2) whether a question about an image asks about the private content in that image.
Visual Question Answering with Explanation, 2018.01 - 2018.06
Supervised by Prof. Jianfei Cai at NTU, Singapore, and Prof. Jiebo Luo
- Constructed a new dataset of VQA with Explanation (VQA-E), which consists of 181,298 visual questions, answers, and explanations.
- Proposed a novel multi-task learning architecture to jointly predict an answer and generate an explanation for the answer.
Visual Question Answering for Blind People, 2017.10 - 2018.01
Supervised by Prof. Danna Gurari at UT Austin and Prof. Jiebo Luo
- Proposed VizWiz, the first goal-oriented VQA dataset arising from a natural setting. VizWiz consists of 31,000 visual questions originating from blind people.
- Analyzed the image-question relevance of VizWiz and benchmarked state-of-the-art VQA algorithms, showing that VizWiz is a challenging dataset that can spur research on assistive technologies that eliminate accessibility barriers for blind people.
Video Captioning and Ad-hoc Video Search, 2017.02 - 2017.10
Supervised by Prof. Chong-Wah Ngo at City University of Hong Kong
- Proposed a novel framework that uses spatio-temporal attention to match videos with text and to generate video descriptions, and applied it to the Video-to-Text task of the TRECVID 2017 competition.
- Revised the framework to search for relevant videos given a text query and won 3rd place in the Ad-hoc Video Search (AVS) task. Our notebook paper was accepted to the NIST TRECVID Workshop 2017.
- Devised a hierarchical co-attention network to improve the AVS system’s adaptability to queries of variable length.
Explainable Visual Question Answering, 2016.08 - 2017.02
Supervised by Dr. Tao Mei at Microsoft Research Asia and Prof. Jiebo Luo
- Proposed a novel framework for explainable VQA. The framework generates attributes and captions for images to explain why the system predicts a specific answer to a question.
- Defined four measurements of explanation quality and demonstrated a strong relationship between explanation quality and VQA accuracy. Our current system achieves performance comparable to the state of the art and improves as explanation quality improves.
Action and Activity Recognition in Video, 2014.12 - 2015.07
Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia, and Prof. Jiebo Luo
- Proposed a hybrid framework that learns a deep multi-granular spatio-temporal representation for video action recognition using 2D/3D CNNs and LSTM. Our paper was accepted to ICMR 2016 and selected as a Best Paper Finalist (acceptance rate: 17%; Best Paper Finalist rate: 1%). An improved version of the conference paper was accepted by IJMIR 2017.
- Won 2nd place in the Action Classification Task of the THUMOS Challenge and presented our work at the CVPR THUMOS Workshop in Boston, June 2015. The challenge contains over 430 hours of video data and 45 million frames across 101 action classes.
Highlight Detection for First-Person Video Summarization, 2014.07 - 2014.12
Supervised by Dr. Ting Yao and Dr. Tao Mei at Microsoft Research Asia
- Collected a new large dataset from YouTube for first-person video highlight detection. The dataset consists of 100 hours of video, mainly captured by GoPro cameras, covering 15 sports-related categories.
- Proposed a pairwise deep ranking model to detect highlight segments in videos. My contribution focused on devising a two-stream CNN (frame and flow) to extract features for video segments.