Andrew Reed
andrewrreed
AI & ML interests
Applied ML, Practical AI, Inference & Deployment, LLMs, Multi-modal Models, RAG
Recent Activity
- liked a model about 1 month ago: openai/privacy-filter
- upvoted an article 6 months ago: We Got Claude to Fine-Tune an Open Source LLM
- liked a Space 6 months ago: OpenEvals/evaluation-guidebook
LLM as a Judge
Curated resources that support the use of LLMs to serve as automatic evaluators of other LLM outputs.
- JudgeLM: Fine-tuned Large Language Models are Scalable Judges
  Paper • 2310.17631 • Published • 35
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 57
- Generative Judge for Evaluating Alignment
  Paper • 2310.05470 • Published • 1
- Calibrating LLM-Based Evaluator
  Paper • 2309.13308 • Published • 12
Hallucination Detection
- vectara/hallucination_evaluation_model
  Text Classification • Updated • 159k • 354
- notrichardren/HaluEval
  Viewer • Updated • 35k • 393
- TRUE: Re-evaluating Factual Consistency Evaluation
  Paper • 2204.04991 • Published • 1
- Fine-grained Hallucination Detection and Editing for Language Models
  Paper • 2401.06855 • Published • 4
Eval Leaderboards
- 🏆 Arena Leaderboard • Running • 4.89k
  View the LMArena model leaderboard
- 🏆 Open LLM Leaderboard • Runtime error • 14k
  Track, rank and evaluate open LLMs and chatbots
- 🥇 MTEB Leaderboard • Running on CPU Upgrade • 7.4k
  Embedding Leaderboard
- 🏆 LLM-Perf Leaderboard • Running • Agents • Featured • 587
  Explore LLM performance across hardware configurations
Small, but mighty chat models
AI x Audio
Awesome Spaces
- 🏆 StableDesign • Running on Zero • Agents • 119
  Generate a furnished interior from an empty room photo
- 👁 IllusionDiffusion • Running on Zero • Agents • Featured • 5.4k
  Generate stunning high quality illusion artwork
- 📚 InstantMesh • Running on Zero • Agents • Featured • 1.57k
  Create a 3D model from an image in 10 seconds!
- 🔥 Sing an idea ➡️ Music • Runtime error • Agents • Featured • 184
  Bring song ideas to life