AIAS+ 2026 Workshop

Benchmarking and Evaluating AI Systems in High-Stakes Domains

Name: Benchmarking and Evaluating AI Systems in High-Stakes Domains
Start: 2026-11-07T13:30:00-08:00
End: 2026-11-07T15:30:00-08:00
Location: The Ritz-Carlton, San Francisco

A workshop at the Chen Institute Symposium for AI Advancing Science and Society, organized by Chen Institute and Kivira, for researchers building better ways to test, interpret, and safeguard AI systems before real-world deployment.


Dates: November 5-7, 2026
Workshop time: Saturday, November 7, 2026, 1:30 PM – 3:30 PM
Location: The Ritz-Carlton, San Francisco
Host event: AIAS+ 2026
Organizers: Chen Institute and Kivira

Official Registration

Registration is through AIAS+ only

All workshop attendees must register through the official AIAS+ 2026 registration portal.

https://www.aiasplus.org


Call for Participation

Poster abstracts from students and early-career researchers

We invite short poster abstracts on benchmarking, safe AI, interpretable AI, evaluation, validation, and stress testing of AI systems in high-stakes domains. Submissions may describe new benchmark datasets, evaluation protocols, safety analyses, interpretability methods, failure analyses, applied validation studies, domain-specific assessment frameworks, or methodological reflections.

Submission format

Title, author list, and affiliations.
Approximately 300 to 500 words.
A brief statement of relevance to the workshop theme.
Submission route and deadline to be announced.

Selection priorities

Clarity and relevance to AI evaluation, benchmarking, safety, or interpretability.
Methodological contribution and practical importance.
Diversity of domains, perspectives, and evaluation challenges.
Preliminary, in-progress, and recently completed work is welcome.

Accepted submissions will be presented as posters, with a small number selected for short spotlight talks. Accepted presenters must also register through the official AIAS+ 2026 portal.

Workshop Description

Why this workshop

AI systems are increasingly being proposed for settings where errors can affect health, safety, opportunity, access to services, scientific claims, and public trust. In these domains, evaluation cannot stop at aggregate accuracy. It must test reliability, robustness, calibration, subgroup performance, failure modes, interpretability, privacy constraints, and resilience to benchmark overfitting.
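As a small illustration of why aggregate accuracy alone can mislead, the sketch below computes overall accuracy, per-subgroup accuracy, and a simple binned expected calibration error (ECE) on toy data. The function names, the toy labels, the subgroup tags, and the choice of five equal-width bins are all assumptions made for illustration; they are not a protocol prescribed by the workshop.

```python
from collections import defaultdict


def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def subgroup_accuracy(labels, preds, groups):
    """Accuracy computed separately for each subgroup tag."""
    buckets = defaultdict(lambda: ([], []))
    for y, p, g in zip(labels, preds, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(p)
    return {g: accuracy(ys, ps) for g, (ys, ps) in buckets.items()}


def expected_calibration_error(labels, preds, confidences, n_bins=5):
    """Equal-width binned ECE: weighted mean of |accuracy - confidence| per bin."""
    bins = defaultdict(list)
    for y, p, c in zip(labels, preds, confidences):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the top bin
        bins[idx].append((int(y == p), c))
    total = len(labels)
    ece = 0.0
    for items in bins.values():
        bin_acc = sum(hit for hit, _ in items) / len(items)
        bin_conf = sum(c for _, c in items) / len(items)
        ece += (len(items) / total) * abs(bin_acc - bin_conf)
    return ece


# Hypothetical toy data: eight examples from subgroup "A", two from subgroup "B".
labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
preds = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
groups = ["A"] * 8 + ["B"] * 2
confidences = [0.9] * 10  # the model reports 90% confidence on every example

print("overall accuracy:", accuracy(labels, preds))
print("subgroup accuracy:", subgroup_accuracy(labels, preds, groups))
print("ECE:", expected_calibration_error(labels, preds, confidences))
```

On this toy data the overall accuracy is 0.8, yet every example in subgroup "B" is misclassified, and the reported confidence of 0.9 overstates the observed accuracy by about 0.1. A real evaluation would need far more data per subgroup and per bin for these estimates to be meaningful.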

This workshop creates a focused forum for students, early-career researchers, and practitioners to discuss how AI systems should be evaluated, interpreted, validated, secured, and made safer when they are intended for consequential use.

Themes

Topics of interest

Benchmark design for high-stakes applications such as healthcare
Safe and interpretable AI for consequential settings
Evaluation beyond accuracy: reliability, robustness, calibration, and uncertainty
Stress testing and adversarial evaluation of AI systems
Distribution shift, generalization, and out-of-domain performance
Longitudinal evaluation and real-world validation
Human-in-the-loop evaluation and expert validation frameworks
Benchmarking generative AI and large language models
Fairness, bias, and subgroup performance analysis
Interpretability and explainability in evaluation
Standardization and reproducibility of benchmarks
Evaluation pipelines and infrastructure
Benchmark privacy, access control, and anti-gaming strategies
Closed versus open benchmarks: trade-offs in reproducibility and robustness
Secure evaluation protocols, including hidden test sets and API-based evaluation

Schedule

Provisional workshop program

The workshop will take place on Saturday, November 7, 2026, from 1:30 PM to 3:30 PM, on the final day of the symposium (November 5-7, 2026).

Keynote session

Invited perspectives on safe, interpretable, and high-stakes AI

One or two keynote talks from researchers or practitioners working on AI benchmarking, safety, interpretability, validation, or domain-specific deployment.

Spotlights

Selected short talks

Up to five accepted poster submissions will be selected for seven-minute spotlight talks followed by two minutes of questions.

Posters

Poster discussion forum

Approximately ten accepted posters will be presented in a relaxed discussion format designed for feedback, collaboration, and cross-domain exchange.

Organizing Team

Organized by Chen Institute and Kivira

Chen Institute

Co-organizer of the AIAS+ 2026 symposium workshop on high-stakes AI evaluation, safety, and interpretability.

Kivira

Workshop organizer and host team focused on rigorous evaluation, safety, and interpretability for AI systems in high-stakes settings.