4b082f3f0ffeb2a6019a7cc94c048543

This model is a fine-tuned version of google/umt5-base on the Helsinki-NLP/opus_books [de-en] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0289
  • Data Size: 1.0 (fraction of the training data used)
  • Epoch Runtime: 308.2418 seconds
  • Bleu: 10.3974
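
Since the card gives no usage notes, here is a minimal inference sketch, assuming standard transformers seq2seq loading (whether the checkpoint expects a task prefix depends on how it was fine-tuned):

```python
# A minimal usage sketch (not from the original card): German-to-English
# translation with the fine-tuned checkpoint via transformers.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/4b082f3f0ffeb2a6019a7cc94c048543"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```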

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
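
The card names only the dataset; below is a hedged sketch of loading the Helsinki-NLP/opus_books de-en pairs with the datasets library (an assumption about the data pipeline, not the card's actual preprocessing):

```python
# A hedged sketch (assumed, not from the card): loading the de-en
# configuration of Helsinki-NLP/opus_books with the datasets library.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "de-en")
# opus_books ships a single "train" split; each row looks like
# {"id": "0", "translation": {"de": "...", "en": "..."}}
print(dataset["train"][0]["translation"])
```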

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
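
A hedged sketch of how these values might map onto transformers Seq2SeqTrainingArguments; the output_dir and predict_with_generate flag are assumptions, since the card does not include the training script:

```python
# A sketch under stated assumptions, not the card's actual training script:
# the listed hyperparameters expressed as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-base-opus-books-de-en",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # assumed, needed to compute BLEU at eval time
)
```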

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 10.0965 | 0 | 23.6622 | 0.1237 |
| No log | 1 | 1286 | 9.7308 | 0.0078 | 27.4981 | 0.1034 |
| 0.3339 | 2 | 2572 | 10.2505 | 0.0156 | 28.7657 | 0.1188 |
| 0.432 | 3 | 3858 | 9.7780 | 0.0312 | 33.6258 | 0.1033 |
| 0.5747 | 4 | 5144 | 5.9837 | 0.0625 | 41.8691 | 0.5828 |
| 4.6949 | 5 | 6430 | 3.2677 | 0.125 | 57.9934 | 9.0637 |
| 3.5816 | 6 | 7716 | 2.7448 | 0.25 | 92.7076 | 5.7746 |
| 3.1415 | 7 | 9002 | 2.5153 | 0.5 | 159.0431 | 6.3492 |
| 2.9083 | 8 | 10288 | 2.3561 | 1.0 | 292.2227 | 7.4540 |
| 2.7148 | 9 | 11574 | 2.2822 | 1.0 | 293.9180 | 7.7842 |
| 2.5949 | 10 | 12860 | 2.2177 | 1.0 | 292.4473 | 8.2885 |
| 2.4574 | 11 | 14146 | 2.1866 | 1.0 | 289.4107 | 8.6147 |
| 2.4596 | 12 | 15432 | 2.1481 | 1.0 | 289.9727 | 8.7403 |
| 2.3263 | 13 | 16718 | 2.1282 | 1.0 | 292.9969 | 8.8737 |
| 2.2863 | 14 | 18004 | 2.1034 | 1.0 | 289.8140 | 9.0711 |
| 2.277 | 15 | 19290 | 2.0859 | 1.0 | 306.6649 | 9.1682 |
| 2.1752 | 16 | 20576 | 2.0698 | 1.0 | 307.2018 | 9.3133 |
| 2.1554 | 17 | 21862 | 2.0599 | 1.0 | 306.9976 | 9.3497 |
| 2.0987 | 18 | 23148 | 2.0588 | 1.0 | 306.9791 | 9.5375 |
| 2.0736 | 19 | 24434 | 2.0338 | 1.0 | 308.9585 | 9.6984 |
| 1.9908 | 20 | 25720 | 2.0317 | 1.0 | 308.4367 | 9.7726 |
| 1.9864 | 21 | 27006 | 2.0297 | 1.0 | 308.6215 | 9.6938 |
| 1.9704 | 22 | 28292 | 2.0201 | 1.0 | 307.1069 | 9.9086 |
| 1.9303 | 23 | 29578 | 2.0163 | 1.0 | 308.1973 | 9.9983 |
| 1.8742 | 24 | 30864 | 2.0154 | 1.0 | 308.4859 | 10.0873 |
| 1.8359 | 25 | 32150 | 2.0161 | 1.0 | 309.6682 | 10.0811 |
| 1.837 | 26 | 33436 | 2.0092 | 1.0 | 309.4687 | 10.1728 |
| 1.7884 | 27 | 34722 | 2.0116 | 1.0 | 309.2874 | 10.0999 |
| 1.7442 | 28 | 36008 | 2.0112 | 1.0 | 310.3224 | 10.2208 |
| 1.7304 | 29 | 37294 | 2.0023 | 1.0 | 310.3041 | 10.3232 |
| 1.713 | 30 | 38580 | 2.0114 | 1.0 | 308.7972 | 10.2984 |
| 1.6343 | 31 | 39866 | 2.0239 | 1.0 | 311.5922 | 10.2938 |
| 1.606 | 32 | 41152 | 2.0194 | 1.0 | 308.8730 | 10.2996 |
| 1.5992 | 33 | 42438 | 2.0289 | 1.0 | 308.2418 | 10.3974 |
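
The card does not say which BLEU implementation produced the Bleu column; a minimal sketch assuming sacreBLEU via the evaluate library:

```python
# A minimal sketch (an assumption): scoring generated translations
# against references with sacreBLEU through the evaluate library.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The book is on the table."]
references = [["The book lies on the table."]]
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU for the toy pair above
```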

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1