Xiaofeng Gao

Ph.D. Candidate in Statistics at UCLA

Boelter Hall 9407
580 Portola Plaza
University of California, Los Angeles
Los Angeles, CA, 90095
Email: xfgao at ucla dot edu
[Google Scholar]   [GitHub]

About

I am a fourth-year Ph.D. candidate in the Department of Statistics at UCLA.

My research lies at the intersection of Robotics, Computer Vision, Machine Learning, and Cognitive Science, with a focus on Human-Machine Interaction and Explainable AI.

Currently, I work in the Center for Vision, Cognition, Learning, and Autonomy (VCLA) under the supervision of Prof. Song-Chun Zhu. Before that, I obtained a bachelor's degree in Electronic Engineering from Fudan University.


News

05/2021: Our paper "Show Me What You Can Do: Capability Calibration on Reachable Workspace for Human-Robot Collaboration" has been accepted for a spotlight presentation at the ICRA Workshop on Social Intelligence in Humans and Robots. [Link]

01/2021: We presented our paper "Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks" at the IJCAI 2020 Workshop on XAI. [Link]

01/2021: I started my internship at Honda Research Institute USA [Link]. I will be working on projects related to Human-Machine Interaction.

07/2020: One paper was accepted to RO-MAN 2020. [Link]

05/2019: One paper was accepted to the ICML Workshop on Reinforcement Learning for Real Life. [Link]

03/2019: VRKitchen was covered by TechXplore. [Link]

03/2019: I was invited as a reviewer for IROS 2019.

03/2019: I passed the Oral Qualifying Exam and advanced to candidacy!


Publications

  • Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions ICRA'17

    Tianmin Shu, Xiaofeng Gao, Michael S. Ryoo, Song-Chun Zhu
    IEEE International Conference on Robotics and Automation (ICRA), 2017

    PDF Website
    In this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and transfer the grammar to humanoids to enable real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning can automatically construct a hierarchical representation of an interaction with long-term joint sub-tasks of both agents and short-term atomic actions of individual agents. Based on a new RGB-D video dataset with rich instances of human interactions, our experiments with Baxter simulation, human evaluation, and real Baxter tests demonstrate that the model learned from limited training data successfully generates human-like behaviors in unseen scenarios and outperforms both baselines.
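
    As a rough illustration (not the paper's actual ST-AOG learner), the sketch below shows the generic Gibbs-sampling pattern the abstract refers to: resampling one latent sub-task label at a time, conditioned on all the others, under a simple Dirichlet-smoothed categorical model. The variable names, hyperparameters, and scoring rule are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(0)
      K, V = 3, 5                                # latent sub-tasks, atomic-action vocabulary
      actions = rng.integers(V, size=40)         # toy sequence of observed atomic actions
      z = rng.integers(K, size=actions.size)     # random initial sub-task labels
      alpha, beta = 1.0, 0.5                     # Dirichlet smoothing hyperparameters

      def gibbs_sweep(actions, z):
          """One sweep: resample each label conditioned on all the other labels."""
          for t in range(actions.size):
              z[t] = -1                          # drop the current assignment from the counts
              n_k = np.array([(z == k).sum() for k in range(K)])
              n_kv = np.array([((z == k) & (actions == actions[t])).sum() for k in range(K)])
              # collapsed conditional: P(sub-task) * P(action | sub-task), with smoothing
              p = (n_k + alpha) * (n_kv + beta) / (n_k + V * beta)
              z[t] = rng.choice(K, p=p / p.sum())
          return z

      for _ in range(50):
          z = gibbs_sweep(actions, z)
      print("inferred sub-task labels:", z)
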
  • VRKitchen: an Interactive 3D Environment for Learning Real Life Cooking Tasks RL4RealLife

    Xiaofeng Gao, Ran Gong, Tianmin Shu, Xu Xie, Shu Wang, Song-Chun Zhu
    ICML workshop on Reinforcement Learning for Real Life (RL4RealLife), 2019

    PDF Website
    One of the main challenges of applying reinforcement learning to real-world applications is the lack of realistic and standardized environments for training and testing AI agents. In this work, we design and implement a virtual reality (VR) system, VRKitchen, with integrated functions that i) enable embodied agents to perform real-life cooking tasks involving a wide range of object manipulations and state changes, and ii) allow human teachers to provide demonstrations for training agents. We also provide standardized evaluation benchmarks and data collection tools to facilitate broad use in research on learning real-life tasks. Video demos, code, and data will be available on the project website: sites.google.com/view/vr-kitchen.
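
    As a rough illustration of the interface such an environment exposes, here is a minimal reset/step loop in the style of standard RL environments. It is a placeholder sketch, not VRKitchen's actual API; the class name, action set, and reward are invented for the example.

      import random

      class ToyKitchenEnv:
          """Minimal reset/step environment for a linear cooking task."""
          STEPS = ("get_ingredient", "cut", "cook", "plate")

          def reset(self):
              self.progress = 0
              return self._observe()

          def step(self, action):
              # advance only when the agent picks the next required sub-step
              if action == self.STEPS[self.progress]:
                  self.progress += 1
              done = self.progress == len(self.STEPS)
              reward = 1.0 if done else 0.0
              return self._observe(), reward, done, {}

          def _observe(self):
              return {"progress": self.progress}

      env = ToyKitchenEnv()
      obs, done = env.reset(), False
      while not done:
          action = random.choice(ToyKitchenEnv.STEPS)   # stand-in for a learned policy
          obs, reward, done, info = env.step(action)
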
  • Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks RO-MAN'20

    Xiaofeng Gao, Ran Gong, Yizhou Zhao, Shu Wang, Tianmin Shu, Song-Chun Zhu
    International Conference on Robot & Human Interactive Communication (RO-MAN), 2020

    PDF Website Talk Slides
    Human collaborators can effectively communicate with their partners to finish a common task by inferring each other's mental states (e.g., goals, beliefs, and desires). Such mind-aware communication minimizes the discrepancy among collaborators' mental states and is crucial to the success of human ad-hoc teaming. We believe that robots collaborating with human users should demonstrate similar pedagogic behavior. Thus, in this paper, we propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations, where the robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communication, based on its online Bayesian inference of the user's mental state. To evaluate our framework, we conduct a user study on a real-time human-robot cooking task. Experimental results show that the explanations generated by our approach significantly improve collaboration performance and user perception of the robot.
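
    The core inference step can be illustrated with a toy Bayesian filter over candidate user goals; the goals, actions, and likelihood table below are made-up placeholders, not the paper's learned model.

      import numpy as np

      goals = ["make_salad", "make_soup"]
      belief = np.array([0.5, 0.5])                    # prior over the user's goal
      # assumed likelihoods P(observed action | goal)
      likelihood = {"chop_lettuce": np.array([0.8, 0.2]),
                    "boil_water":   np.array([0.1, 0.9])}

      def update(belief, action):
          """One Bayesian filtering step: posterior is proportional to likelihood times prior."""
          posterior = likelihood[action] * belief
          return posterior / posterior.sum()

      for action in ["chop_lettuce", "chop_lettuce"]:
          belief = update(belief, action)
      print(dict(zip(goals, belief.round(3))))
      # a mismatch between this belief and the robot's own plan is what the
      # framework would target with an explanation
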
  • Show Me What You Can Do: Capability Calibration on Reachable Workspace for Human-Robot Collaboration Under review

    Xiaofeng Gao, Luyao Yuan, Tianmin Shu, Hongjing Lu, Song-Chun Zhu
    Under review

    PDF Talk
    Aligning humans' assessment of what a robot can do with its true capability is crucial for establishing a common ground between human and robot partners when they collaborate on a joint task. In this work, we propose an approach to calibrate humans' estimate of a robot's reachable workspace through a small number of demonstrations before collaboration. We develop a novel motion planning method, REMP (Reachability-Expressive Motion Planning), which jointly optimizes the physical cost and the expressiveness of robot motion to reveal the robot's motion capability to a human observer. Our experiments with human participants demonstrate that a short calibration using REMP can effectively bridge the gap between what a non-expert user thinks a robot can reach and the ground truth. We show that this calibration procedure not only results in better user perception, but also promotes more efficient human-robot collaboration in a subsequent joint task.
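
    REMP's actual planner is more involved, but the trade-off it optimizes can be sketched as a single score: physical effort minus a bonus for how much of the workspace the motion reveals. The grid-cell expressiveness proxy, the lambda weight, and the toy trajectories below are assumptions for illustration only.

      import numpy as np

      def physical_cost(traj):
          """Path length of a sequence of 2-D waypoints."""
          return float(np.linalg.norm(np.diff(traj, axis=0), axis=1).sum())

      def expressiveness(traj, cell=0.2):
          """Proxy for how much reachable workspace the motion reveals:
          number of distinct grid cells the trajectory visits."""
          return len({tuple(np.floor(p / cell).astype(int)) for p in traj})

      def score(traj, lam=0.5):
          # lower is better: efficient motion that still shows off the workspace
          return physical_cost(traj) - lam * expressiveness(traj)

      candidates = [np.array([[0.0, 0.0], [0.2, 0.1], [0.4, 0.1]]),   # short, timid
                    np.array([[0.0, 0.0], [0.6, 0.5], [0.9, 0.1]])]   # longer, revealing
      best = min(candidates, key=score)
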
  • Predicting Task-Driven Attention via Integrating Bottom-Up Stimulus and Top-Down Guidance TIP

    Zhixiong Nan, Jingjing Jiang, Xiaofeng Gao, Sanping Zhou, Weiliang Zuo, Ping Wei, Nanning Zheng
    IEEE Transactions on Image Processing

    PDF
    Task-free attention has attracted intensive interest in the computer vision community, while relatively few works focus on task-driven attention (TDAttention). This paper therefore addresses the problem of predicting TDAttention in daily scenarios where a human is performing a task. Motivated by the cognitive mechanism that human attention allocation is jointly controlled by top-down guidance and bottom-up stimulus, this paper proposes a cognitively explanatory deep neural network model to predict TDAttention. Given an image sequence, bottom-up features, such as human pose and motion, are first extracted. At the same time, coarse-grained and fine-grained task information are embedded as a top-down feature. The bottom-up features are then fused with the top-down feature to guide the model to predict TDAttention. Two public datasets are re-annotated to make them suitable for TDAttention prediction, and our model is extensively compared with other models on these two datasets. In addition, ablation studies are conducted to evaluate the individual modules of our model. Experimental results demonstrate the effectiveness of our model.
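
    The fusion step described in the abstract, concatenating bottom-up features with a top-down task embedding before predicting attention, can be sketched as a tiny PyTorch module. The dimensions, layer sizes, task vocabulary, and region count below are placeholder assumptions, not the paper's architecture.

      import torch
      import torch.nn as nn

      class ToyFusion(nn.Module):
          """Fuse a bottom-up feature (e.g., pose/motion) with a top-down task
          embedding and predict a per-region attention score."""
          def __init__(self, bottom_up_dim=64, task_dim=16, regions=49):
              super().__init__()
              self.task_embed = nn.Embedding(10, task_dim)          # 10 toy task labels
              self.head = nn.Sequential(
                  nn.Linear(bottom_up_dim + task_dim, 128),
                  nn.ReLU(),
                  nn.Linear(128, regions))

          def forward(self, bottom_up, task_id):
              top_down = self.task_embed(task_id)                   # (B, task_dim)
              fused = torch.cat([bottom_up, top_down], dim=-1)      # concatenate the two cues
              return torch.sigmoid(self.head(fused))                # attention per region

      model = ToyFusion()
      scores = model(torch.randn(2, 64), torch.tensor([3, 7]))      # shape (2, 49)
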