sleeepeer/meta-llama-Meta-Llama-3-8B-Instruct-DPO-dpo_anchor_3epoch_llama3_2000-42 Updated Oct 4, 2025
sleeepeer/meta-llama-Llama-3.1-8B-Instruct-DPO-dpo_anchor_3epoch_no_instruction-42 Updated Oct 3, 2025
sleeepeer/Llama-3.1-8B-Instruct-GRPO-alpaca_mix_combine_naive-llm-judge-42 Text Generation • 8B • Updated Jul 16, 2025 • 7
sleeepeer/Llama-3.1-8B-Instruct-GRPO-alpaca_mix_combine_naive_least_similar-llm-judge-42 Text Generation • 8B • Updated Jul 16, 2025 • 6
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42 8B • Updated Jul 14, 2025 • 6
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42-checkpoint-3000 Updated Jul 14, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca-mix-injected-llm-judge-42-checkpoint-4000 Updated Jul 14, 2025
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-mixed-llm-judge-42 Text Generation • 8B • Updated Jul 10, 2025 • 8
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-mixed-cosine-42 Text Generation • 8B • Updated Jul 9, 2025 • 8
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-7 Text Generation • 8B • Updated Jul 7, 2025 • 4
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-6 Text Generation • 8B • Updated Jul 7, 2025 • 6
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-5 Text Generation • 8B • Updated Jul 6, 2025 • 4
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-4 Text Generation • 8B • Updated Jul 6, 2025 • 4
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-3 Text Generation • 8B • Updated Jul 6, 2025 • 7
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-2 Text Generation • 8B • Updated Jul 6, 2025 • 7
sleeepeer/Llama-3.1-8B-Instruct-GRPO-AT-short-10-NEW-TRAIN-directly-output-easy-2-AT-1 Text Generation • 8B • Updated Jul 5, 2025 • 6