Pengchuan Zhang


FAIR, AI@Meta

Menlo Park, CA 94025

United States

I’m an AI research scientist on the FAIR Computer Vision team at AI@Meta and an affiliate assistant professor in the Department of Electrical Engineering at the University of Washington. I was previously a principal researcher at Microsoft Research, Redmond. Before joining Microsoft, I obtained my PhD in Applied and Computational Mathematics from Caltech in 2017. My research interests are mainly in deep learning, computer vision, multimodal intelligence, and the theoretical foundations of deep learning.

news

Oct 22, 2022 Our ECCV2022 workshop “Computer Vision in the Wild” https://computer-vision-in-the-wild.github.io/eccv-2022/ is taking place 9:00am-6:00pm Israel Time, 11:00pm (October 22)-8:00am Pacific Time, 2:00pm-11:00pm Beijing Time. I will be chairing the morning session. Everyone is welcome to attend the workshop!
Sep 28, 2022 We have five papers accepted at NeurIPS2022, all of which are about #computervision and vision-language intelligence. Huge thanks and congratulations to all collaborators!
  1. GLIPv2: Unifying Localization and VL Understanding https://arxiv.org/abs/2206.05836
  2. Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone https://arxiv.org/abs/2206.07643
  3. K-LITE: Learning Transferable Visual Models with External Knowledge https://arxiv.org/abs/2204.09222
  4. 3DB: A Framework for Debugging Computer Vision Models https://arxiv.org/abs/2106.03805
  5. ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models https://arxiv.org/abs/2204.08790
We are also organizing an ECCV2022 workshop: https://computer-vision-in-the-wild.github.io/eccv-2022/. Last call for papers and challenge participation on open-set recognition and task-level visual transfer.
Jun 8, 2022 Two updates on our recent vision-and-language efforts: (i) Our CVPR 2022 tutorial will take place on 06/19/2022; (ii) Our Computer Vision in the Wild workshop will take place at ECCV2022 in October 2022. There will be two challenges associated with this workshop: Image Classification in the Wild and Object Detection in the Wild. The challenge setup and baselines can be found in our ELEVATER benchmark paper. Stay tuned for more details.
May 3, 2022 I’m starting a new position as Research Scientist at Meta AI for VR. I will continue my long-term pursuit of CV and multimodal intelligence in my new position. Looking forward to working with colleagues and the entire community to build intelligent and trustworthy technologies for the metaverse, and to push the research frontier of deep learning, CV, and multimodal intelligence.
Mar 11, 2022 Gave a talk at the Applied and Computational Mathematics Seminar at the University of Wisconsin-Madison. The talk is about my ICML 2021 work “Multiscale Invertible Generative Networks for High-Dimensional Bayesian Inference”. The talk slides can be found here.

selected publications

  1. Using statistics to automate stochastic optimization
    Lang, Hunter, Xiao, Lin, and Zhang, Pengchuan
    Advances in Neural Information Processing Systems 2019
  2. VinVL: Revisiting visual representations in vision-language models
    Zhang, Pengchuan, Li, Xiujun, Hu, Xiaowei, Yang, Jianwei, Zhang, Lei, Wang, Lijuan, Choi, Yejin, and Gao, Jianfeng
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021
  3. A convex relaxation barrier to tight robustness verification of neural networks
    Salman, Hadi, Yang, Greg, Zhang, Huan, Hsieh, Cho-Jui, and Zhang, Pengchuan
    arXiv preprint arXiv:1902.08722 2019
  4. AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks
    Xu, Tao, Zhang, Pengchuan, Huang, Qiuyuan, Zhang, Han, Gan, Zhe, Huang, Xiaolei, and He, Xiaodong
    In Proceedings of the IEEE conference on computer vision and pattern recognition 2018
  5. Provably robust deep learning via adversarially trained smoothed classifiers
    Salman, Hadi, Yang, Greg, Li, Jerry, Zhang, Pengchuan, Zhang, Huan, Razenshteyn, Ilya, and Bubeck, Sebastien
    arXiv preprint arXiv:1906.04584 2019
  6. Multi-scale vision longformer: A new vision transformer for high-resolution image encoding
    Zhang, Pengchuan, Dai, Xiyang, Yang, Jianwei, Xiao, Bin, Yuan, Lu, Zhang, Lei, and Gao, Jianfeng
    arXiv preprint arXiv:2103.15358 2021
  7. Multiscale Invertible Generative Networks for High-Dimensional Bayesian Inference
    Zhang, Shumao, Zhang, Pengchuan, and Hou, Thomas Y
    arXiv preprint arXiv:2105.05489 2021
  8. Florence: A New Foundation Model for Computer Vision
    Yuan, Lu, Chen, Dongdong, Chen, Yi-Ling, Codella, Noel, Dai, Xiyang, Gao, Jianfeng, Hu, Houdong, Huang, Xuedong, Li, Boxin, Li, Chunyuan, and others
    arXiv preprint arXiv:2111.11432 2021
  9. Grounded Language-Image Pre-training
    Li, Liunian Harold, Zhang, Pengchuan, Zhang, Haotian, Yang, Jianwei, Li, Chunyuan, Zhong, Yiwu, Wang, Lijuan, Yuan, Lu, Zhang, Lei, Hwang, Jenq-Neng, and others
    arXiv preprint arXiv:2112.03857 2021