camgeodesic/sfm-sft_dolci_mcqa_instruct_unfiltered-DPO Text Generation • 7B • Updated 14 days ago • 1.28k • 1
camgeodesic/sfm-sft_dolci_mcqa_instruct_filtered-DPO Text Generation • 7B • Updated 14 days ago • 835 • 1
camgeodesic/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-DPO Text Generation • 7B • Updated 14 days ago • 819 • 1
camgeodesic/sfm-sft_dolci_mcqa_instruct_filtered_synth_align_mid-DPO Text Generation • 7B • Updated 15 days ago • 499
camgeodesic/sfm-sft_dolci_mcqa_instruct_unfiltered_synth_misalign_mid-DPO Text Generation • 7B • Updated 15 days ago • 536
Self-Fulfilling (Mis)alignment: Midtraining Ablations Collection Models where we try out various approaches to positive alignment during midtraining • 4 items • Updated 21 days ago
Self-Fulfilling (Mis)alignment: Post-Trained Models Collection Here is a selection of SFM models that have undergone DPO. • 8 items • Updated 18 days ago
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered_synthetic_alignment_mid-DPO Text Generation • 7B • Updated 23 days ago • 1.15k
geodesic-research/sfm-sft_dolci_instruct_blocklist_filtered-DPO Text Generation • 7B • Updated 23 days ago • 611