Yizhong Wang

PhD student
Paul G. Allen School of Computer Science & Engineering
University of Washington, Seattle, WA

Email: yizhongw [at] cs.washington.edu

Hello! I am a final-year PhD student at the Paul G. Allen School of Computer Science & Engineering, University of Washington. I am fortunate to be co-advised by Hannaneh Hajishirzi and Noah Smith. I am also a student researcher at AI2. I have previously interned at Meta AI and Microsoft Research Asia. Prior to UW, I earned a master's degree at Peking University and a bachelor's degree at Shanghai Jiao Tong University.

These days, I am excited about data-centric approaches for understanding and advancing AI systems. I believe that data can serve as an effective, sustainable, auditable, and beneficial foundation for future human-AI collaboration and mutual improvement.

I am on the job market for Fall 2024! Please feel free to reach out if you would like to share opportunities, collaborate, or just chat :)

News

  • Sep. 25, 2024: Tülu 2.5 got accepted to NeurIPS 2024!
  • July 10, 2024: Two papers, on proxy tuning and hallucination detection, were accepted to the first COLM conference!
  • June 13, 2024: Tülu has grown to 2.5, which systematically explores RLHF data and techniques. Check out all the open artifacts!
  • May 16, 2024: OLMo 1 was accepted to the ACL 2024 main conference, and temporal alignment was accepted to Findings. See you in Bangkok!
  • Feb. 12, 2024: I have passed my PhD general exam! 🏃
  • Feb. 1, 2024: I am excited to be part of the first OLMo release. Check out the blog post and tech report.
  • Jan. 16, 2024: Self-RAG and BTR got accepted to ICLR 2024!
  • Nov. 18, 2023: We released Tülu 2, which tops open models on several benchmarks (e.g., AlpacaEval and Chatbot Arena)!
  • Sep. 22, 2023: 📢 We are organizing a Workshop on Instruction Tuning and Instruction Following at NeurIPS 2023. Please consider submitting your paper or joining us at the conference!

Selected Publications

* indicates equal contribution. For a full list of publications, please refer to my Google Scholar page.

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

NeurIPS 2024

Set the Clock: Temporal Alignment of Pretrained Language Models

Bowen Zhao*, Zander Brumbaugh*, Yizhong Wang*, Hannaneh Hajishirzi, Noah A. Smith

ACL 2024 Findings

OLMo: Accelerating the Science of Language Models (Best Theme Paper)

Groeneveld et al.

ACL 2024

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources (Spotlight)

Yizhong Wang*, Hamish Ivison*, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

NeurIPS 2023

Self-Instruct: Aligning Language Models with Self-Generated Instructions

Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi

ACL 2023

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Yizhong Wang*, Swaroop Mishra*, Pegah Alipoormolabashi, Yeganeh Kordi et al.

EMNLP 2022

Probing Across Time: What Does RoBERTa Know and When?

Leo Z. Liu*, Yizhong Wang*, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith

EMNLP 2021 Findings

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi

EMNLP 2020

Do Neural NLP Models Know Numbers? Probing Numeracy in Embeddings

Eric Wallace*, Yizhong Wang*, Sujian Li, Sameer Singh, Matt Gardner

EMNLP-IJCNLP 2019

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner

NAACL 2019

Multi-Passage Machine Reading Comprehension with Cross-Passage Answer Verification

Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li, Haifeng Wang

ACL 2018

A Two-Stage Parsing Method for Text-level Discourse Analysis (Outstanding Paper Award)

Yizhong Wang, Sujian Li, Houfeng Wang

ACL 2017