Calibration Metrics
How well do our predictions match reality?
Live Brier score, log loss, and AUC-ROC metrics computed from scored predictions across all monitored regions — updated automatically as new outcomes resolve.
By Domain
| Domain | Brier | Log Loss | AUC-ROC | N |
|---|---|---|---|---|
| Caucasus | 0.097 | 0.354 | 0.81 * | 48 |
| Iran | 0.224 | 0.640 | 0.53 | 48 |
| Israel_Lebanon | 0.125 | 0.388 | 0.83 | 48 |
| Korean_Peninsula | 0.039 | 0.219 | N/A | 48 |
| Red_Sea | 0.039 | 0.211 | N/A | 48 |
| Sahel | 0.018 | 0.144 | N/A | 48 |
| South_China_Sea | 0.268 | 0.731 | 0.45 | 48 |
| Taiwan_Strait | 0.032 | 0.193 | N/A | 48 |
| Ukraine | 0.014 | 0.126 | N/A | 48 |
| Venezuela | 0.204 | 0.596 | 0.71 | 48 |
N/A: AUC is undefined for single-class regions (all outcomes positive or all negative). These regions have strong Brier scores but no discrimination to measure.
* AUC values marked with an asterisk are based on fewer than 5 positive outcomes, too few for reliable estimation.
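Overall AUC (0.95) is computed on data pooled across all regions. It can exceed every per-region AUC because predicted probabilities and base rates differ between regions, so the pooled figure also rewards separating high-risk regions from low-risk ones.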
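The pooling effect can be illustrated with a minimal sketch. The two regions and their numbers below are invented for illustration and are not taken from the table; `pairwise_auc` is a hypothetical helper, not the code behind this page.

```python
def pairwise_auc(probs, outcomes):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for p, o in zip(probs, outcomes) if o == 1]
    neg = [p for p, o in zip(probs, outcomes) if o == 0]
    score = sum(1.0 if p > n else 0.5 if p == n else 0.0
                for p in pos for n in neg)
    return score / (len(pos) * len(neg))

# Made-up data: each region gets one flat probability, so there is
# no discrimination inside either region.
region_a = ([0.8, 0.8, 0.8, 0.8], [1, 1, 1, 0])  # mostly positive outcomes
region_b = ([0.1, 0.1, 0.1, 0.1], [0, 0, 0, 1])  # mostly negative outcomes

auc_a = pairwise_auc(*region_a)
auc_b = pairwise_auc(*region_b)
auc_pooled = pairwise_auc(region_a[0] + region_b[0],
                          region_a[1] + region_b[1])
```

Here both per-region AUCs are 0.5, yet the pooled AUC is 0.75, purely because the forecast level differs between the two regions.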
Understanding the Metrics
Brier Score
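Measures the mean squared difference between predicted probabilities and actual outcomes. Ranges from 0 (perfect) to 1 (worst). An uninformative forecast of 50% for every event scores exactly 0.25. A climatological baseline (always predicting the base rate p) scores p(1 - p), which is at most 0.25 and much lower when events are rare or near-certain. Scores below 0.1 generally indicate accurate probabilities, but they should be read against the region's base-rate baseline.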
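As a rough sketch of the calculation, assuming predictions are probabilities in [0, 1] and outcomes are coded 0 or 1. The function names `brier_score` and `base_rate_brier` are illustrative and not part of this page's pipeline.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def base_rate_brier(outcomes):
    """Brier score of a baseline that always forecasts the observed base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    return brier_score([base_rate] * len(outcomes), outcomes)

# Always forecasting 50% scores 0.25 whatever the outcomes are.
coin_flip = brier_score([0.5, 0.5], [1, 0])

# With a 10% base rate, the base-rate baseline scores about 0.1 * 0.9 = 0.09.
rare_event_baseline = base_rate_brier([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```

Comparing a region's score with its own base-rate baseline shows whether a low Brier score reflects skill or simply a rare event.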
Log Loss (Cross-Entropy)
Penalizes confident wrong predictions more heavily than the Brier score. A prediction of 95% for an event that does not occur is punished severely. Ranges from 0 (perfect) upward with no upper bound. Lower values indicate more accurate probabilities.
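A minimal sketch of the calculation, assuming natural logarithms and 0/1 outcomes. The clipping constant `eps` is an assumption added here so that forecasts of exactly 0 or 1 do not produce an infinite penalty; the page does not state how it handles that case.

```python
import math

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes under the forecasts."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # keep log() finite at p = 0 or p = 1
        total -= o * math.log(p) + (1 - o) * math.log(1 - p)
    return total / len(probs)

# A 95% forecast for an event that does not occur:
confident_miss = log_loss([0.95], [0])   # -ln(0.05), roughly 3.0
# The same miss at 60% confidence costs far less:
hedged_miss = log_loss([0.60], [0])      # -ln(0.40), roughly 0.92
```

For comparison, the Brier penalty for the same two misses is 0.9025 and 0.36, so log loss separates them more sharply.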
AUC-ROC
Area Under the Receiver Operating Characteristic curve. Measures the model's ability to discriminate between positive and negative outcomes regardless of the chosen threshold. 0.5 = random guessing, 1.0 = perfect discrimination.
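One way to compute this, sketched under the assumption of 0/1 outcomes, is the pairwise (Mann-Whitney) form: the share of positive/negative pairs in which the positive case received the higher probability. The function name `auc_roc` is illustrative. Returning `None` for single-class input mirrors the N/A entries in the table.

```python
def auc_roc(probs, outcomes):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Returns None when only one class
    is present, because AUC is undefined in that case.
    """
    pos = [p for p, o in zip(probs, outcomes) if o == 1]
    neg = [p for p, o in zip(probs, outcomes) if o == 0]
    if not pos or not neg:
        return None
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Three of the four positive/negative pairs are ordered correctly.
example_auc = auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75

# A region where every outcome is negative has no AUC.
single_class = auc_roc([0.1, 0.2, 0.05], [0, 0, 0])        # None
```

The double loop is quadratic in the number of predictions, which is acceptable at N = 48 per region; a rank-based formula would be preferable for large samples.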
Resolved Predictions
The number of forecasts that have reached their resolution date and been scored against ground truth. As a rule of thumb, calibration metrics are only meaningful with a sufficient sample size (N > 30).