Aravindan/output_dir


End of training

4c054f1 verified almost 2 years ago


	---
	license: mit
	library_name: peft
	tags:
	- trl
	- sft
	- generated_from_trainer
	base_model: Aravindan/gpt2out
	datasets:
	- generator
	model-index:
	- name: output_dir
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# output_dir

	This model is a fine-tuned version of [Aravindan/gpt2out](https://huggingface.co/Aravindan/gpt2out) on the generator dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.9619
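
Because the metadata lists `library_name: peft`, this repository presumably holds adapter weights that are applied on top of the base model rather than a standalone checkpoint. The sketch below shows one way such an adapter is commonly loaded. It assumes the adapter repo id is `Aravindan/output_dir`, that the base model is a causal language model, and that both repos are still reachable on the Hub; `load_adapter_model` is a hypothetical helper name, not something the card defines.

```python
BASE_MODEL_ID = "Aravindan/gpt2out"   # taken from `base_model` in the card metadata
ADAPTER_ID = "Aravindan/output_dir"   # assumption: repo id of this adapter


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imports are deferred so that defining this helper needs no heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the base checkpoint is a causal language model.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored in the adapter repo.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` downloads both repositories, so it needs network access plus the `peft` and `transformers` packages listed under the framework versions.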

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0001
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 10
	- total_train_batch_size: 80
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: constant
	- training_steps: 1000
	- mixed_precision_training: Native AMP
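
The reported `total_train_batch_size` follows from the per-device batch size and the gradient accumulation steps. The sketch below reproduces that arithmetic and maps the listed values onto the keyword names that `transformers.TrainingArguments` uses. The single-device count is an assumption (it is the value that makes the product equal 80), and the `training_kwargs` dictionary is illustrative, not the author's actual training script.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 10
training_steps = 1000
num_devices = 1  # assumption: the card does not state the device count

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Training examples consumed over the whole run.
examples_seen = total_train_batch_size * training_steps

# Illustrative mapping onto TrainingArguments keyword names.
# "Native AMP" means mixed precision was enabled; the card does not say
# whether fp16 or bf16 was used, so neither flag is set here.
training_kwargs = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": train_batch_size,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "max_steps": training_steps,
}

print(total_train_batch_size)  # 80
print(examples_seen)           # 80000
```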

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 2.6318 | 0.0147 | 30 | 2.4202 |
	| 2.5147 | 0.0294 | 60 | 2.3425 |
	| 2.4599 | 0.0440 | 90 | 2.2838 |
	| 2.4009 | 0.0587 | 120 | 2.2386 |
	| 2.394 | 0.0734 | 150 | 2.1971 |
	| 2.3459 | 0.0881 | 180 | 2.1614 |
	| 2.3057 | 0.1027 | 210 | 2.1324 |
	| 2.3085 | 0.1174 | 240 | 2.1076 |
	| 2.2675 | 0.1321 | 270 | 2.0891 |
	| 2.2348 | 0.1468 | 300 | 2.0716 |
	| 2.2167 | 0.1614 | 330 | 2.0594 |
	| 2.1827 | 0.1761 | 360 | 2.0481 |
	| 2.2049 | 0.1908 | 390 | 2.0390 |
	| 2.1803 | 0.2055 | 420 | 2.0303 |
	| 2.1709 | 0.2201 | 450 | 2.0250 |
	| 2.1915 | 0.2348 | 480 | 2.0183 |
	| 2.1583 | 0.2495 | 510 | 2.0120 |
	| 2.168 | 0.2642 | 540 | 2.0072 |
	| 2.1678 | 0.2788 | 570 | 2.0026 |
	| 2.1545 | 0.2935 | 600 | 1.9988 |
	| 2.1561 | 0.3082 | 630 | 1.9941 |
	| 2.1442 | 0.3229 | 660 | 1.9913 |
	| 2.1393 | 0.3375 | 690 | 1.9867 |
	| 2.1489 | 0.3522 | 720 | 1.9834 |
	| 2.1304 | 0.3669 | 750 | 1.9814 |
	| 2.1175 | 0.3816 | 780 | 1.9783 |
	| 2.113 | 0.3962 | 810 | 1.9753 |
	| 2.1025 | 0.4109 | 840 | 1.9729 |
	| 2.1181 | 0.4256 | 870 | 1.9711 |
	| 2.0947 | 0.4403 | 900 | 1.9688 |
	| 2.0868 | 0.4549 | 930 | 1.9665 |
	| 2.1061 | 0.4696 | 960 | 1.9638 |
	| 2.1096 | 0.4843 | 990 | 1.9619 |
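
Two quantities can be read off the table with a little arithmetic: the perplexity implied by the final validation loss, and the approximate size of one epoch. The sketch below assumes the loss is the mean per-token cross-entropy in nats (the usual convention for causal language modelling with the Trainer) and that the effective batch size is the 80 reported under the hyperparameters. The Epoch column is rounded to four decimals, so the per-epoch figures are estimates only.

```python
import math

# Last row of the results table.
final_eval_loss = 1.9619
last_step, last_epoch = 990, 0.4843

# From the hyperparameter list.
total_train_batch_size = 80
training_steps = 1000

# Perplexity, assuming the loss is mean per-token cross-entropy in nats.
perplexity = math.exp(final_eval_loss)

# Optimizer steps per epoch, estimated from the rounded Epoch column.
steps_per_epoch = last_step / last_epoch

# Approximate number of training examples in one epoch.
examples_per_epoch = steps_per_epoch * total_train_batch_size

# Share of one epoch covered by the full 1000-step run.
epoch_fraction = training_steps / steps_per_epoch

print(f"perplexity ~ {perplexity:.2f}")
print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"examples per epoch ~ {examples_per_epoch:.0f}")
print(f"epoch fraction ~ {epoch_fraction:.3f}")
```

Under these assumptions the run stops a little short of half an epoch, which is consistent with the validation loss still decreasing at the last logged step.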


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.2
	- Pytorch 2.1.2
	- Datasets 2.19.2
	- Tokenizers 0.19.1