Conference Recording

Watch the full conference recording

48 papers accepted

Outstanding Papers

Three best paper awardees presenting live on October 22, 2025

🎉 We thank Together AI for supporting the paper awards! 🏆

9:00 - 9:20 AM PT

Simulating Two-Sided Job Marketplaces with AI Agents

Silvia Terragni, Behnaz Nojavanasghari, Frank Yang, Andrew Rabinovich

We introduce a simulation framework for studying how artificial intelligence agents behave in economic marketplaces. Unlike traditional computer simulations that use predetermined rules, our approach uses large language models (LLMs) as intelligent agents that can make strategic decisions and adapt their behavior over time.

9:20 - 9:40 AM PT

The Impact of Reduced Towing Fees on Vehicle Redemption Rates for Low-Income Individuals: Evidence from San Francisco's 2020 Policy

Min Min Fong, Abhishek Nagaraj

This paper examines the impact of an August 2020 San Francisco policy that drastically lowered towing fees for low-income individuals. Leveraging a comprehensive dataset of towing incidents, we employ a difference-in-differences design to estimate the causal effect of the fee reduction on vehicle redemption rates.

9:40 - 10:00 AM PT

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

Fengqing Jiang, Yichen Feng, Radha Poovendran

The rapid advancement of Large Language Models (LLMs) as both research assistants and peer reviewers creates a critical vulnerability: the potential for fully automated AI-only publication loops where AI-generated research is evaluated by AI reviewers. We investigate this adversarial dynamic by introducing BadScientist.

Spotlight Papers

Selected for spotlight presentations at the conference

Echo: A multi-agent AI system for patient-centered pharmacovigilance

Megha Srivastava

We present Echo, a multi-agent AI system that transforms patient narratives from Reddit into structured drug safety intelligence. Echo deploys four specialized language model agents in concert to discover drug-symptom associations.

Multi-target Parallel Drug Discovery with Multi-agent Orchestration

AI Agent, Fuad Al Abir, Sixue Zhang, Jake Yue Chen

We introduce a modular, multi-agent framework that autonomously navigates the early-stage drug discovery pipeline, from target identification to the generation of optimized hit candidates for Alzheimer's Disease.

PsySpace: Simulating Emergent Psychological Dynamics in Long-Duration Space Missions using Multi-Agent LLMs

Ibrahim Khan, Mustafa Can Gursesli, Ruck Thawonmas

PsySpace uses Large Language Models to simulate the emergent psychological dynamics of astronaut crews on long-duration space missions, demonstrating that interventions can significantly reduce crew stress.

Thermodynamic Guardrails: A Bond Graph-Based Method for Self-Correcting Model Reduction in Autonomous Scientific Discovery

Jaeyoung Choi

This paper introduces a bond graph-based framework for thermodynamic consistency checking and self-correcting model reduction in stochastic biochemical kinetics, establishing an intrinsic, physically grounded correction trigger.

Visible Yet Unreadable: A Systematic Blind Spot of Vision–Language Models Across Writing Systems

Jie Zhang, Ting Xu, Gelei Deng, Runyi Hu, Han Qiu, Tianwei Zhang, Qing Guo, Ivor Tsang

We investigate whether state-of-the-art vision-language models share human resilience in recognizing fragmented or overlaid text, revealing severe performance drops under perturbations across Chinese and English writing systems.

LLM-Driven Discovery of High-Entropy Catalysts via Retrieval-Augmented Generation

Xinyi Lin, Ying Guo, Danqing Yin

We demonstrate how large language models augmented with retrieval-based grounding can accelerate catalyst discovery for CO₂ reduction, achieving an 82% thermodynamic stability rate with 200× computational efficiency.

Simulating Strategic Reasoning: A Digital Twin Approach to AI Advisors in Decision-Making

Dinithi N. Jayasekara, Qian Huang

This study investigates the feasibility of constructing AI digital twins as advisors in strategic decision-making, revealing high fidelity on simple tasks but significant gaps in complex reasoning.

MechSci: Scaling Clinical Science via Mechanistic Interpretability of Multimodal Medical Foundation Models

Robbie Holland, Ashwin Kumar, Eduardo Pontes Reis, Akshay S Chaudhari, Sergios Gatidis

We introduce a fully automated pipeline to uncover novel scientific knowledge by transforming the latent representations of medical foundation models into sparse, human-interpretable concepts using Sparse Autoencoders.

Diverse Inference for Solving ARC at a Human Level

Seunghwan Hyun, Gaston Longhitano, Mao Mao, Yuke Zhang, Ben Segev, Iddo Drori

We propose a diverse inference approach that aggregates multiple models and methods at test time, increasing success rates on ARC puzzles to 93.75% with reasoning models, exceeding the average human accuracy of 73.3-77.2%.

Reasoning Models Outperform Standard Language Models in De Novo Protein Design

Alfred Greisen, Longfei Cong, Per Jr. Greisen, Sergey Ovchinnikov

Testing five ChatGPT variants, we discovered a dramatic capability divide: the reasoning models achieved success rates of 44% and 20%, while all standard language models achieved 0% success in de novo protein design.

Towards Automatic Evaluation and Selection of PHI De-identification Models via Multi-Agent Collaboration

Guanchen Wu, Zuhui Chen, Yuzhang Xie, Carl Yang

Protected health information (PHI) de-identification is critical for enabling the safe reuse of clinical notes, yet evaluating and comparing PHI de-identification models typically depends on costly, small-scale expert annotations. We present TEAM-PHI, a multi-agent evaluation and selection framework that uses large language models (LLMs) to automatically measure de-identification quality and select the best-performing model without heavy reliance on gold labels.

Accepted Papers

All accepted papers at the conference

Multi-LLM and Multi-Prompt Strategies for COVID-19 Infodemic Detection in Chinese Social Media: An Empirical Evaluation

Teng Zuo, Hongwen Lin, Lingfeng He, Hongji Zeng, Lina Tang, Li He, Ning Li

Magellan: Guided MCTS for Latent Space Exploration and Novelty Generation

Lufan Chang

Parameter vs. Test-Time Scaling in LLMs: FLOPs-Aware, Cross-Domain, Domain-Dependent, Pareto-Optimal Compute Allocation

Bumjun Jung

Beyond Adam: AI-Authored Discovery of Symbolic Optimization Rules

Robert Yang

"You are a brilliant mathematician" Does Not Make LLMs Act Like One

Xiaoyan Bai, Ari Holtzman, Chenhao Tan

Behavioral Fingerprinting of Large Language Models

Zehua Pei, Hui-Ling Zhen, Ying Zhang, Zhiyuan Yang, Xing Li, Xianzhi Yu, Mingxuan Yuan, Bei Yu

AI-Assisted Exploratory Causal Modeling of Cumulative Advantage in Small-N Domains

Zach Huang

Scalable Oversight in Multi-Agent Systems: Provable Alignment via Delegated Debate and Hierarchical Verification

Michael Bronikowski

Sustainable Investment Decision-Making on Office Buildings using Reinforcement Learning and Large Language Models

Ziru Tao, Paul Baguley, Rashid Maqbool, Obuks Ejohwomu

Uncertainty-Aware Role-Switching Debate: Improving Truthfulness in Large Language Models

Zixuan Liu, Siavash H. Khajavi, Guangkai Jiang, Xinru Liu

Endocrine Unity and Diversity: A Cross-Tissue Single-Cell Regulatory Atlas

Endocrine Agents, Juanru Guo, Ronghan Li, Ting Wang, Robi Mitra, Brian Muegge

Multi-Agent Social Simulation: An Experimental Framework for Language-Native Social Experiments

胡晓李, Yang Shen, Keke You

Feasibility-Guided Fair Adaptive Reinforcement Learning for Medicaid Care Management

Sanjay Basu

Decontextualization, Everywhere: A Systematic Audit on PeerQA

Xanh Ho, Tian Cheng Xia, Khoa Duong, Yun-Ang Wu, Ha-Thanh Nguyen, Akiko Aizawa

Transit Timing Variations of Exoplanet WASP-4b: Evidence of Orbital Decay

Gaston Longhitano, Avi Shporer, Iddo Drori

From Borges' Library to Procedural Universes: A Formal Framework for Navigability and Limits in Large Language Models

Theodorich Kopetzky

The Architectural Immune System: A Framework for Correcting Synthetic Fallacies in AI-Driven Science

David Scott Lewis, Anar Batkhuu, Haley Yi

Dynamically Induced In-Group Bias: Experimental Evidence of Motivated Reasoning in Large Language Models

Yoon Bong Yoo

UnitMath: Unit-Aware Numerical Reasoning and Dimensional Consistency for Scientific Table Claims

Xanh Ho, Tian Cheng Xia, Khoa Duong, Yun-Ang Wu, Ha-Thanh Nguyen, Akiko Aizawa

Do Small Detours Deliver Big Gains? Online Accept/Reject Policies for Overlapping Bus Lines

Claudio Szwarcfiter

Survival of the Useful: Evolutionary Boids as a Sandbox for Agent Societies

Xisen Wang, Qi Zhang

Towards Automatic Evaluation and Selection of PHI De-identification Models via Multi-Agent Collaboration

Guanchen Wu, Zuhui Chen, Yuzhang Xie, Carl Yang

Mentor-Mind: Risk-Aware, Constraint-Grounded Advice Agents Beyond Chain-of-Thought

Yun Wing Kiang

QITT-Enhanced Multi-Scale Substructure Analysis with Learned Topological Embeddings for Cosmological Parameter Estimation

Denario Astropilotai, Pablo Bermejo, Boris Bolliet, Francisco Villaescusa-Navarro, Pablo Villanueva Domingo

Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation

Yubo Li, Weiyi Song

Hardware-Conditioned Generative Channel Modeling: A Diffusion-Based Approach for Location and Hardware-Aware Wireless Dataset Synthesis

Nitish Deshpande, Sanjay Ganapathy, Viraj Shah

Educational attainment and cognitive profile heterogeneity: age-stratified web-based analysis finds no detectable association

Gabrielle Wehr

Mind Guarding Mind: A Framework for Compensatory Human-AI Collaboration

CHAC AI, Jiawei Kong

Green by Design: Energy-Guided Reranking of LLM-Generated Programs

Yi Xia, José M. Aragón-Jurado, Ruck Thawonmas

The Consistency Confound: Why Stronger Alignment Can Break Black-Box Jailbreak Detection

Dhruv Trehan

AIs Fail to Recognize Themselves and Mostly Think They are GPT or Claude

Chenhao Tan, Ari Holtzman

IVTFuse: An Efficient Vision-Language Guided Infrared-Visible Image Fusion Network with Frequency-Strip and Hybrid Pooling Attention Modules

Zixuan Liu, Siavash H. Khajavi, Guangkai Jiang, Xinru Liu

Self-Spec: Model-Authored Specifications for Reliable LLM Code Generation

Zihao Xu, Xiao Cheng, Jingling Xue, Yuekang Li

Stylistic Contrastive Learning for Human-Like AI Text Generation

Michael Bronikowski

An AI-First Proof of Concept: Simulating and Refining a Teach-Back Protocol for Dialogic Learning in Programming Education

Marta Valentini