# CircleGuardBench Leaderboard
CircleGuardBench is the first-of-its-kind benchmark for evaluating the protection capabilities of large language model (LLM) guard systems.
It tests how well guard models block harmful content, resist jailbreaks, avoid false positives, and operate efficiently in real-time settings, using a harm taxonomy that closely mirrors real-world data.
Learn more about us at whitecircle.ai
Cells marked "–" were not reported in the captured view.

| Model | Mode | Access_Type | Integral_Score | Macro_Accuracy | Macro_Recall | Micro_Error | Micro_Avg_time_ms | Total_Count |
|---|---|---|---|---|---|---|---|---|
| whitecircle-policy-guard-small | Strict | Open-Source | 0.726 | 0.931 | 0.930 | 13.954 | 2741.800 | 3920 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.615 | 0.891 | 0.958 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.647 | 0.900 | 0.911 | – | – | 120 |
| mistral-small-3.1-24b-instruct | Strict | Open-Source | 0.543 | 0.865 | 0.900 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.700 | 0.921 | 0.929 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.711 | 0.922 | 0.948 | – | – | 116 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.709 | 0.925 | 0.933 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.647 | 0.901 | 0.936 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.754 | 0.933 | 0.923 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.580 | 0.879 | 0.879 | – | – | 116 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.846 | 0.962 | 0.962 | – | – | 52 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.840 | 0.958 | 0.966 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.863 | 0.965 | 0.973 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.745 | 0.932 | 0.957 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.832 | 0.957 | 0.991 | – | – | 116 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.774 | 0.940 | 0.923 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.717 | 0.921 | 0.932 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 0.636 | 0.900 | 0.951 | – | – | 120 |
| whitecircle-policy-guard-small | Strict | Open-Source | 1.000 | 1.000 | 1.000 | – | – | 1960 |
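The distinction between the Macro and Micro columns matters when reading these scores: macro-averaged metrics weight every harm category equally, while micro-averaged metrics pool all prompts, so large categories dominate. A minimal sketch of the difference (the category names and toy numbers below are illustrative, not benchmark data):

```python
def macro_accuracy(per_category_results):
    """Mean of per-category accuracies: each category counts equally,
    no matter how many prompts it contains."""
    accs = [sum(r) / len(r) for r in per_category_results.values()]
    return sum(accs) / len(accs)

def micro_accuracy(per_category_results):
    """Accuracy over the pooled prompts: larger categories dominate."""
    flat = [x for r in per_category_results.values() for x in r]
    return sum(flat) / len(flat)

# Toy data: 1 = guard verdict matched the label, 0 = it did not.
results = {
    "self_harm":  [1, 1, 1, 0],    # 0.75 accuracy on 4 prompts
    "cybercrime": [1] * 9 + [0],   # 0.90 accuracy on 10 prompts
}
print(round(macro_accuracy(results), 3))  # 0.825 = (0.75 + 0.90) / 2
print(round(micro_accuracy(results), 3))  # 0.857 = 12 correct / 14 prompts
```

The two averages diverge exactly when per-category sizes differ, which is why the leaderboard reports both views.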
## Submit Your Model
To add your model to the CircleGuardBench leaderboard:
- Run your evaluation using the CircleGuardBench framework at https://github.com/whitecircle-ai/circle-guard-bench
- Upload your run results in .jsonl format using this form.
- Once validated, your model will appear on the leaderboard.
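JSONL means one JSON object per line, which lets the validator stream large result files record by record. A minimal sketch of writing and reading such a file; the field names here are hypothetical illustrations, not the CircleGuardBench schema (see the framework repository for the actual format):

```python
import json

# Hypothetical records; real field names are defined by the
# CircleGuardBench framework, not by this sketch.
records = [
    {"prompt_id": "cg-0001", "verdict": "unsafe"},
    {"prompt_id": "cg-0002", "verdict": "safe"},
]

with open("results.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

# Reading it back, line by line:
with open("results.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))  # 2
```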
✉️✨ Ready? Upload your results!