AIAS+ 2026 Workshop

Benchmarking and Evaluating AI Systems in High-Stakes Domains

A workshop at the Chen Institute Symposium for AI Advancing Science and Society, organized by Chen Institute and Kivira, for researchers building better ways to test, interpret, and safeguard AI systems before real-world deployment.

Dates: 5-7 November 2026
Location: The Ritz-Carlton, San Francisco
Host event: AIAS+ 2026
Organizers: Chen Institute and Kivira

Official Registration

Registration is through AIAS+ only

Everyone attending the workshop must register through the official AIAS+ 2026 registration portal.

https://www.aiasplus.org

Call for Participation

Poster abstracts from students and early-career researchers

We invite short poster abstracts on benchmarking, safe AI, interpretable AI, evaluation, validation, and stress testing of AI systems in high-stakes domains. Submissions may describe new benchmark datasets, evaluation protocols, safety analyses, interpretability methods, failure analyses, applied validation studies, domain-specific assessment frameworks, or methodological reflections.

Submission format

  • Title, author list, and affiliations.
  • Approximately 300 to 500 words.
  • A brief statement of relevance to the workshop theme.
  • Submission route and deadline to be announced.

Selection priorities

  • Clarity and relevance to AI evaluation, benchmarking, safety, or interpretability.
  • Methodological contribution and practical importance.
  • Diversity of domains, perspectives, and evaluation challenges.
  • Preliminary, in-progress, and recently completed work is welcome.

Accepted submissions will be presented as posters, with a small number selected for short spotlight talks. Accepted presenters must also register through the official AIAS+ 2026 portal.

Workshop Description

Why this workshop

AI systems are increasingly being proposed for settings where errors can affect health, safety, opportunity, access to services, scientific claims, and public trust. In these domains, evaluation cannot stop at aggregate accuracy. It must test reliability, robustness, calibration, subgroup performance, failure modes, interpretability, privacy constraints, and resilience to benchmark overfitting.

This workshop creates a focused forum for students, early-career researchers, and practitioners to discuss how AI systems should be evaluated, interpreted, validated, secured, and made safer when they are intended for consequential use.
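
As one illustration of why evaluation cannot stop at aggregate accuracy, the sketch below computes overall accuracy, per-subgroup accuracy, and a simple expected calibration error (ECE) on toy data. The function names, the toy labels, the group tags, and the equal-width confidence binning are all assumptions made for this example. It is not an evaluation protocol endorsed by the workshop.

```python
from collections import defaultdict


def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def subgroup_accuracy(labels, preds, groups):
    """Accuracy computed separately for each group tag."""
    buckets = defaultdict(lambda: ([], []))
    for y, p, g in zip(labels, preds, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(p)
    return {g: accuracy(ys, ps) for g, (ys, ps) in buckets.items()}


def expected_calibration_error(labels, preds, confidences, n_bins=5):
    """Bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = defaultdict(list)
    for y, p, c in zip(labels, preds, confidences):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((int(y == p), c))
    total = len(labels)
    ece = 0.0
    for items in bins.values():
        bin_acc = sum(hit for hit, _ in items) / len(items)
        bin_conf = sum(c for _, c in items) / len(items)
        ece += len(items) / total * abs(bin_acc - bin_conf)
    return ece


# Hypothetical toy data: a model that always predicts class 1 with confidence 0.9.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
preds = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
groups = ["A"] * 8 + ["B"] * 2
confidences = [0.9] * 10

overall = accuracy(labels, preds)
by_group = subgroup_accuracy(labels, preds, groups)
ece = expected_calibration_error(labels, preds, confidences)

print("overall accuracy:", overall)
print("accuracy by group:", by_group)
print("expected calibration error:", round(ece, 3))
```

On this toy data the overall accuracy is 0.8, yet every example in subgroup B is misclassified, and the model's stated confidence of 0.9 overstates its accuracy by about 0.1. An aggregate accuracy figure alone would hide both the subgroup failure and the miscalibration.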

Themes

Topics of interest

  • Benchmark design for high-stakes applications such as healthcare
  • Safe and interpretable AI for consequential settings
  • Evaluation beyond accuracy: reliability, robustness, calibration, and uncertainty
  • Stress testing and adversarial evaluation of AI systems
  • Distribution shift, generalization, and out-of-domain performance
  • Longitudinal evaluation and real-world validation
  • Human-in-the-loop evaluation and expert validation frameworks
  • Benchmarking generative AI and large language models
  • Fairness, bias, and subgroup performance analysis
  • Interpretability and explainability in evaluation
  • Standardization and reproducibility of benchmarks
  • Evaluation pipelines and infrastructure
  • Benchmark privacy, access control, and anti-gaming strategies
  • Closed versus open benchmarks: trade-offs in reproducibility and robustness
  • Secure evaluation protocols, including hidden test sets and API-based evaluation

Schedule

Provisional workshop program

Detailed timings will be confirmed with the AIAS+ 2026 program. The workshop will take place during the 5-7 November 2026 symposium window.

Keynote session

Invited perspectives on safe, interpretable, and high-stakes AI

One to two keynote talks from researchers or practitioners working on AI benchmarking, safety, interpretability, validation, or domain-specific deployment.

Spotlights

Selected short talks

Up to five accepted poster submissions will be selected for seven-minute spotlight talks followed by two minutes of questions.

Posters

Poster discussion forum

Approximately ten accepted posters will be presented in a relaxed discussion format designed for feedback, collaboration, and cross-domain exchange.

Organizing Team

Organized by Chen Institute and Kivira

Chen Institute

Co-organizer of the AIAS+ 2026 symposium workshop on high-stakes AI evaluation, safety, and interpretability.

Kivira

Workshop organizer and host team focused on rigorous evaluation, safety, and interpretability for AI systems in high-stakes settings.