Figure: Fraction of Model A Wins For All A vs. B Battles (Weighted). Heatmap; rows are Model A, columns are Model B.

Model A \ Model B   gpt-4   claude   vicuna   gpt-3.5   bard
gpt-4               0.00    0.63     0.86     0.85      0.87
claude              0.37    0.00     0.78     0.78      0.82
vicuna              0.14    0.22     0.00     0.55      0.59
gpt-3.5             0.15    0.22     0.45     0.00      0.56
bard                0.13    0.18     0.41     0.44      0.00