sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-7 • 8B
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-6 • 8B
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-5 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-4 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-3 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-2 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-alpaca-combine-AT-1 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-7 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-6 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-5 • 8B • 5
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-4 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-3 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-2 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-rejected-AT-1 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-7 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-6 • 8B • 1
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-5 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-4 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-3 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-2 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-50-directly-output-task-complete-AT-1 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_500_no_KL • Text Generation • 8B • 4
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_100_no_KL • Text Generation • 8B • 5
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_50_no_KL • Text Generation • 8B • 4
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL-checkpoint-500 • 8B • 3
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL-checkpoint-400 • 8B • 3
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL-checkpoint-300 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL-checkpoint-200 • 8B • 3
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL-checkpoint-100 • 8B • 2
sleeepeer/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_50_no_KL-checkpoint-250 • 8B • 2