Alexandros Liapatis (alexliap)
3 followers · 7 following
alexliap · alexandros-liapatis
AI & ML interests
Generative AI + Traditional ML
Recent Activity
- liked a model 12 days ago: openai/privacy-filter
- liked a model 14 days ago: RedHatAI/Qwen3.6-35B-A3B-NVFP4
- reacted to sergiopaniego's post with 🔥 18 days ago:
Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy. And… it's already supported in TRL, built by Kashif Rasul. You can really feel the pace of development in the team 🔥

Paper by Ruixiang Zhang, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, and Yizhe Zhang at Apple.

How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. No labels or verifier needed.

You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder): https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
Or benchmark a checkpoint with the eval script: https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py

One neat insight from the paper: T_train and T_eval compose into an effective temperature T_eff = T_train × T_eval, so a broad band of configurations works well; even very noisy samples still help.

Want to dig deeper?
Paper: https://huggingface.co/papers/2604.01193
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer
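For intuition, here is a minimal sketch of that loop written with plain transformers rather than the TRL trainer linked above; the model name, prompt, sampling settings, and learning rate are illustrative assumptions, not the paper's or TRL's actual configuration.

```python
# Minimal sketch of the self-distillation loop described above: sample from
# the model at T_train with top_k/top_p truncation, then fine-tune on the
# sampled completion with plain cross-entropy. Illustrative only; this is
# NOT the TRL SSDTrainer API, and all hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B"  # assumption: any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompts = ["Write a Python function that reverses a linked list."]  # illustrative

T_train = 1.2  # sampling temperature for the self-generated data
# Paper's observation: the temperatures compose as T_eff = T_train * T_eval,
# so many (T_train, T_eval) pairs behave alike at evaluation time.

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # 1) Sample a completion from the model itself with truncated sampling.
    model.eval()
    with torch.no_grad():
        generated = model.generate(
            **inputs,
            do_sample=True,
            temperature=T_train,
            top_k=50,
            top_p=0.95,
            max_new_tokens=256,
        )

    # 2) Fine-tune on the sampled completion with plain cross-entropy,
    #    masking the prompt tokens so only the completion contributes.
    model.train()
    labels = generated.clone()
    labels[:, :prompt_len] = -100  # ignore_index for the LM loss
    loss = model(input_ids=generated, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The key design point the post highlights is in step 2: because the targets are the model's own samples, no labeled data or external verifier enters the loop, only the cross-entropy between the model and its truncated sampling distribution.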
Organizations
None yet
alexliap's datasets (2)

alexliap/tinystories-gr
Viewer • Updated Mar 14 • 2.14M • 69

alexliap/high-quality-gr-text
Viewer • Updated Feb 2 • 5.03M • 95 • 2