The process of ensuring that an AI system's stated confidence in its conclusions matches the actual probability that those conclusions are correct. A well-calibrated model that reports 80% confidence should be right about 80% of the time, measured across all the conclusions it makes at that confidence level.
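One common way to check this property is to group predictions by stated confidence and compare each group's average confidence with its observed accuracy. The sketch below computes an expected calibration error (ECE) in that manner. The function name, the equal-width binning, and the default of 10 bins are illustrative assumptions, not something this entry prescribes.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted average gap between stated confidence and observed accuracy.

    A value near 0 means stated confidence tracks accuracy closely;
    larger values mean the system is over- or under-confident.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a confidence bin; a confidence of 1.0
    # falls into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_confidence = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of predictions it holds.
        ece += (len(members) / total) * abs(avg_confidence - accuracy)
    return ece
```

For instance, ten predictions made at 0.8 confidence with eight of them correct give an error of approximately 0, while ten predictions made at 0.9 confidence with only five correct give an error of approximately 0.4.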
Most large language models are poorly calibrated: they express high confidence even when they are wrong. In multi-expert decision intelligence, confidence calibration is improved through independent reasoning. When multiple experts with different biases reach the same conclusion with high conviction, their agreement is a stronger calibration signal than any single model's self-assessment.
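The intuition that agreement among independent experts strengthens the signal can be illustrated with a simple odds-pooling calculation. This is only a sketch under strong assumptions that this entry does not state: each expert is individually calibrated, the experts' errors are conditionally independent, and all share the same prior. The function name `pooled_confidence` and the default prior of 0.5 are hypothetical choices for illustration.

```python
import math
from typing import Sequence


def pooled_confidence(expert_confidences: Sequence[float], prior: float = 0.5) -> float:
    """Combine per-expert confidences by multiplying their likelihood ratios.

    Assumptions (illustrative only): experts are conditionally independent
    and each reported confidence is a calibrated posterior under `prior`.
    """
    if not expert_confidences:
        raise ValueError("at least one expert confidence is required")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")

    prior_log_odds = math.log(prior / (1.0 - prior))
    log_odds = prior_log_odds
    for p in expert_confidences:
        if not 0.0 < p < 1.0:
            raise ValueError("each confidence must be strictly between 0 and 1")
        # Each expert contributes the evidence it adds beyond the prior.
        log_odds += math.log(p / (1.0 - p)) - prior_log_odds

    # Convert the pooled log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under these assumptions, three experts who each report 0.8 pool to roughly 0.98, while two experts who disagree at 0.8 and 0.2 pool back to 0.5. If the experts share training data or biases, the independence assumption fails and the pooled value overstates the true confidence, which is why diversity among experts matters.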