fleet-sft-full

This model is a fine-tuned version of Qwen/Qwen3-32B on the fleet_trajectories_train dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8278
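
A minimal loading sketch for quick sanity checks. The repo id below is taken from the hosting page's model tree and may differ from the working name fleet-sft-full; the BF16 dtype and device map are assumptions about a reasonable multi-GPU setup, not documented usage.

```python
# Hedged loading sketch; repo id and dtype/device settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FleetAI/fleet-sft-overfit-github-Qwen3-32B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # checkpoint is stored in BF16
    device_map="auto",       # shard the 33B-parameter model across GPUs
)

messages = [{"role": "user", "content": "Hello"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```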

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
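
A configuration sketch reproducing the listed hyperparameters with transformers.TrainingArguments. The output_dir, the bf16 flag, and the eval/logging cadence (every 5 steps, inferred from the results table) are assumptions rather than documented settings.

```python
# Hedged configuration sketch; commented assumptions are not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="fleet-sft-full",    # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,  # train_batch_size: 1
    per_device_eval_batch_size=1,   # eval_batch_size: 1
    seed=42,
    num_train_epochs=50.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",          # inferred: validation ran every 5 steps
    eval_steps=5,
    logging_steps=5,
    bf16=True,                      # assumed, since the weights are BF16
)
# Launched across 8 GPUs (e.g. via torchrun or accelerate), the effective
# batch size is 8 devices x 1 per device = 8 (total_train_batch_size).
```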

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0       | 0    | 1.4682          |
| 1.3518        | 0.7143  | 5    | 1.3431          |
| 0.9051        | 1.4286  | 10   | 1.2059          |
| 0.8625        | 2.1429  | 15   | 1.1277          |
| 0.6971        | 2.8571  | 20   | 1.1145          |
| 0.4243        | 3.5714  | 25   | 1.1612          |
| 0.1901        | 4.2857  | 30   | 1.3399          |
| 0.1171        | 5.0     | 35   | 1.3959          |
| 0.0593        | 5.7143  | 40   | 1.4992          |
| 0.0265        | 6.4286  | 45   | 1.5967          |
| 0.0229        | 7.1429  | 50   | 1.6269          |
| 0.0167        | 7.8571  | 55   | 1.6505          |
| 0.0119        | 8.5714  | 60   | 1.6626          |
| 0.0094        | 9.2857  | 65   | 1.6874          |
| 0.0089        | 10.0    | 70   | 1.7035          |
| 0.0062        | 10.7143 | 75   | 1.7053          |
| 0.0072        | 11.4286 | 80   | 1.7082          |
| 0.008         | 12.1429 | 85   | 1.6972          |
| 0.006         | 12.8571 | 90   | 1.6969          |
| 0.0044        | 13.5714 | 95   | 1.7049          |
| 0.0036        | 14.2857 | 100  | 1.7229          |
| 0.0035        | 15.0    | 105  | 1.7421          |
| 0.0026        | 15.7143 | 110  | 1.7550          |
| 0.0023        | 16.4286 | 115  | 1.7617          |
| 0.0021        | 17.1429 | 120  | 1.7656          |
| 0.0025        | 17.8571 | 125  | 1.7683          |
| 0.0018        | 18.5714 | 130  | 1.7750          |
| 0.002         | 19.2857 | 135  | 1.7667          |
| 0.0036        | 20.0    | 140  | 1.7492          |
| 0.0025        | 20.7143 | 145  | 1.7378          |
| 0.0017        | 21.4286 | 150  | 1.7389          |
| 0.0016        | 22.1429 | 155  | 1.7510          |
| 0.0016        | 22.8571 | 160  | 1.7623          |
| 0.0014        | 23.5714 | 165  | 1.7705          |
| 0.0013        | 24.2857 | 170  | 1.7751          |
| 0.0015        | 25.0    | 175  | 1.7802          |
| 0.0011        | 25.7143 | 180  | 1.7830          |
| 0.0012        | 26.4286 | 185  | 1.7873          |
| 0.0011        | 27.1429 | 190  | 1.7919          |
| 0.0012        | 27.8571 | 195  | 1.7959          |
| 0.0012        | 28.5714 | 200  | 1.7993          |
| 0.001         | 29.2857 | 205  | 1.8018          |
| 0.0012        | 30.0    | 210  | 1.8040          |
| 0.001         | 30.7143 | 215  | 1.8073          |
| 0.001         | 31.4286 | 220  | 1.8092          |
| 0.0014        | 32.1429 | 225  | 1.8116          |
| 0.0011        | 32.8571 | 230  | 1.8135          |
| 0.001         | 33.5714 | 235  | 1.8141          |
| 0.0011        | 34.2857 | 240  | 1.8167          |
| 0.0009        | 35.0    | 245  | 1.8182          |
| 0.001         | 35.7143 | 250  | 1.8190          |
| 0.0011        | 36.4286 | 255  | 1.8204          |
| 0.0012        | 37.1429 | 260  | 1.8216          |
| 0.0009        | 37.8571 | 265  | 1.8221          |
| 0.001         | 38.5714 | 270  | 1.8223          |
| 0.0013        | 39.2857 | 275  | 1.8238          |
| 0.0011        | 40.0    | 280  | 1.8247          |
| 0.0009        | 40.7143 | 285  | 1.8251          |
| 0.0011        | 41.4286 | 290  | 1.8253          |
| 0.001         | 42.1429 | 295  | 1.8262          |
| 0.001         | 42.8571 | 300  | 1.8267          |
| 0.0011        | 43.5714 | 305  | 1.8267          |
| 0.0012        | 44.2857 | 310  | 1.8272          |
| 0.0009        | 45.0    | 315  | 1.8278          |
| 0.0008        | 45.7143 | 320  | 1.8276          |
| 0.0009        | 46.4286 | 325  | 1.8282          |
| 0.001         | 47.1429 | 330  | 1.8282          |
| 0.001         | 47.8571 | 335  | 1.8279          |
| 0.0008        | 48.5714 | 340  | 1.8282          |
| 0.0012        | 49.2857 | 345  | 1.8281          |
| 0.001         | 50.0    | 350  | 1.8278          |
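
Note the trajectory: validation loss reaches its minimum of 1.1145 at step 20 (epoch ~2.9) and then climbs steadily to 1.8278 by epoch 50 while training loss falls below 0.001, i.e. the model memorizes the training set from roughly epoch 4 onward. If the validation minimum were the goal, early stopping would halt the run near step 20. A minimal sketch, assuming the model and dataset objects from the setup above; none of this was part of the original run:

```python
# Hypothetical early-stopping setup; NOT part of the original training run.
# model, train_dataset, and eval_dataset are assumed to be defined already.
from transformers import EarlyStoppingCallback, Trainer

trainer = Trainer(
    model=model,
    args=training_args,           # as sketched above, plus:
                                  #   load_best_model_at_end=True,
                                  #   metric_for_best_model="eval_loss",
                                  #   greater_is_better=False
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    # Stop after 3 consecutive evaluations (15 steps here) with no improvement.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```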

Framework versions

  • Transformers 4.52.4
  • Pytorch 2.10.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1