Large MoE Architecture Search (1B-2B)
Systematic search for 1B-2B MoE models. Best: bs=1, ctx=2048 achieves 0.32 loss. Top-8 routing beats top-2.
kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs4-ctx1024 • Updated 3 days ago • 21
kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs2-ctx2048 • Updated 3 days ago • 22
kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs2-ctx1024 • Updated 3 days ago • 18
kshitijthakkar/moe-1083m-781m-16x8-8L-large-moe-1.3b-bs1-ctx2048 • Updated 3 days ago • 27
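The card above compares top-8 with top-2 routing. As a rough illustration of what that setting controls, here is a minimal pure-Python sketch of top-k expert selection for one token. It assumes that "16x8" in the model names means 16 experts with 8 active per token, which the card does not state. The function name `top_k_route` and the logits are hypothetical and are not taken from these models' code.

```python
import math

def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits (ties broken by lower index).
    chosen = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Softmax over the chosen logits only, shifted by the max for stability.
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Hypothetical router logits for one token over 16 experts (illustrative only).
logits = [0.1 * ((7 * i) % 16) for i in range(16)]
top2 = top_k_route(logits, 2)  # 2 experts process the token
top8 = top_k_route(logits, 8)  # 8 experts process the token
```

With a larger k, more experts contribute to each token, so active parameters and compute per token rise. Whether that trade is worthwhile is what a search like this one measures.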
Mobile MoE Architecture Search
32 MoE models from 41 experiments exploring expert count, routing, and learning rates for mobile deployment.
kshitijthakkar/moe-141m-89m-8x2-10L-small-250m-8exp • Updated 8 days ago • 48
kshitijthakkar/moe-161m-123m-4x2-12L-4exp-large-experts • Updated 8 days ago • 25
kshitijthakkar/moe-198m-114m-8x2-12L-8exp-balanced • Updated 8 days ago • 36
kshitijthakkar/moe-202m-104m-12x2-10L-medium-300m-12exp • Updated 8 days ago • 40
E-Commerce Product Content Generator 🛒
Generate complete e-commerce product content from a description.
kshitijthakkar/nemotron-sft-general-focused-stage1-2-ChatML-V3 • Updated 10 days ago • 493k • 16