Publications

* denotes co-primary authors or equal contribution.

2025

  1. STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
    Yinfang Chen*, Jiaqi Pan*, Jackson Clark*, Yiming Su*, Noah Zheutlin, Bhavya Bhavya, Rohan Arora, Yu Deng, Saurabh Jha, and Tianyin Xu
    2025
  2. AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
    Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, and Saravan Rajmohan
    In the Eighth Annual Conference on Machine Learning and Systems (MLSys’25), May 2025
    Listed among the "Best AI Agent Papers of 2024" by Juteq
  3. ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
    Saurabh Jha, Rohan Arora, Yuji Watanabe, Takumi Yanagawa, Yinfang Chen, Jackson Clark, Bhavya Bhavya, Mudit Verma, Harshit Kumar, Hirokuni Kitahara, Noah Zheutlin, Saki Takano, Divya Pathak, Felix George, Xinbo Wu, Bekir O. Turkkan, Gerard Vanloo, Michael Nidd, Ting Dai, Oishik Chatterjee, and 23 more authors
    In the Forty-second International Conference on Machine Learning (ICML’25), Jul 2025
    Featured by CIO and the IBM Research Blog
    Spotlight (313/12108=2.6%)
    Oral Presentation (120/12108=0.99%)
  4. Large Language Models as Configuration Validators
    Xinyu Lian*, Yinfang Chen*, Runxiang Cheng, Jie Huang, Parth Thakkar, and Tianyin Xu
    In Proceedings of the 47th International Conference on Software Engineering (ICSE’25), Apr 2025
  5. An Empirical Study of Production Incidents in Generative AI Cloud Services
    Haoran Yan*, Yinfang Chen*, Minghua Ma, Ming Wen, Shan Lu, Shenglin Zhang, Tianyin Xu, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Chaoyun Zhang, and Dongmei Zhang
    Apr 2025
  6. Fidelity of Cloud Emulators: The Imitation Game of Testing Cloud-based Software
    Anna Mazhar, Saad Sher Alam, William Zheng, Yinfang Chen, Suman Nath, and Tianyin Xu
    In Proceedings of the 47th International Conference on Software Engineering (ICSE’25), Apr 2025

2024

  1. Automatic Root Cause Analysis via Large Language Models for Cloud Incidents
    Yinfang Chen, Huaibing Xie, Minghua Ma, Yu Kang, Xin Gao, Liu Shi, Yunjie Cao, Xuedong Gao, Hao Fan, Ming Wen, Jun Zeng, Supriyo Ghosh, Xuchao Zhang, Chaoyun Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, and Tianyin Xu
    In Proceedings of the 19th European Conference on Computer Systems (EuroSys’24), Apr 2024
    Deployed at Microsoft
  2. Building AI Agents for Autonomous Clouds: Challenges and Design Principles
    Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, and Saravan Rajmohan
    In Proceedings of the 15th ACM Symposium on Cloud Computing (SoCC’24), Nov 2024

2023

  1. Push-Button Reliability Testing for Cloud-Backed Applications with Rainmaker
    Yinfang Chen, Xudong Sun, Suman Nath, Ze Yang, and Tianyin Xu
    In Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI’23), Apr 2023
    Featured by The Weekend Read
  2. SoK: History is a Vast Early Warning System: Auditing the Provenance of System Intrusions
    Muhammad Adil Inam, Yinfang Chen, Akul Goyal, Jason Liu, Jaron Mink, Noor Michael, Sneha Gaur, Adam Bates, and Wajih Ul Hassan
    In Proceedings of the 44th IEEE Symposium on Security and Privacy (S&P’23), May 2023

2022

  1. ShadeWatcher: Recommendation-guided Cyber Threat Analysis Using System Audit Records
    Jun Zeng, Xiang Wang, Jiahao Liu, Yinfang Chen, Zhenkai Liang, Tat-Seng Chua, and Zheng Leong Chua
    In Proceedings of the 43rd IEEE Symposium on Security and Privacy (S&P’22), May 2022

2021

  1. WATSON: Abstracting Behaviors from Audit Logs via Aggregation of Contextual Semantics
    Jun Zeng, Zheng Leong Chua, Yinfang Chen, Kaihang Ji, Zhenkai Liang, and Jian Mao
    In Proceedings of the 28th Annual Network and Distributed System Security Symposium (NDSS’21), Feb 2021