Commit 233bb82 · mouseyy · verified · 1 parent: 6203e06

End of training

Files changed (5)
  1. README.md +4 -4
  2. all_results.json +16 -0
  3. eval_results.json +10 -0
  4. train_results.json +9 -0
  5. trainer_state.json +1510 -0
README.md CHANGED
@@ -23,7 +23,7 @@ model-index:
  metrics:
  - name: Wer
  type: wer
- value: 0.29856827886467024
+ value: 0.2984287348943652
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -33,9 +33,9 @@ should probably proofread and complete it, then remove this comment. -->

  This model is a fine-tuned version of [mouseyy/result_data_2-3](https://huggingface.co/mouseyy/result_data_2-3) on the common_voice_17_0 dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2134
- - Wer: 0.2986
- - Cer: 0.1513
+ - Loss: 0.2132
+ - Wer: 0.2984
+ - Cer: 0.1512

  ## Model description

all_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+     "epoch": 15.0,
+     "eval_cer": 0.1512119333778769,
+     "eval_loss": 0.21320246160030365,
+     "eval_runtime": 130.2967,
+     "eval_samples": 5000,
+     "eval_samples_per_second": 38.374,
+     "eval_steps_per_second": 1.205,
+     "eval_wer": 0.2984287348943652,
+     "total_flos": 1.21495045308783e+20,
+     "train_loss": 0.13136799893019088,
+     "train_runtime": 24963.2457,
+     "train_samples": 35144,
+     "train_samples_per_second": 21.117,
+     "train_steps_per_second": 0.66
+ }
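The throughput fields in all_results.json can be cross-checked against the sample counts and runtimes in the same file. The sketch below assumes that samples per second is simply samples processed divided by wall-clock runtime, with the training set counted once per epoch; the helper name `throughput` is hypothetical and not part of any library.

```python
# Values copied from the all_results.json diff above.
results = {
    "epoch": 15.0,
    "eval_runtime": 130.2967,
    "eval_samples": 5000,
    "eval_samples_per_second": 38.374,
    "train_runtime": 24963.2457,
    "train_samples": 35144,
    "train_samples_per_second": 21.117,
}


def throughput(samples, passes, runtime_s):
    """Samples processed per second over `passes` full passes (hypothetical helper)."""
    return samples * passes / runtime_s


# Evaluation sees each sample once; training sees each sample once per epoch (assumption).
eval_sps = throughput(results["eval_samples"], 1, results["eval_runtime"])
train_sps = throughput(results["train_samples"], results["epoch"], results["train_runtime"])
train_hours = results["train_runtime"] / 3600

print(f"eval:  {eval_sps:.3f} samples/s (reported {results['eval_samples_per_second']})")
print(f"train: {train_sps:.3f} samples/s (reported {results['train_samples_per_second']})")
print(f"train wall-clock: {train_hours:.2f} h")
```

Under that assumption both recomputed rates agree with the reported ones to three decimals, and the training run works out to a little under seven hours.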
eval_results.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "epoch": 15.0,
+     "eval_cer": 0.1512119333778769,
+     "eval_loss": 0.21320246160030365,
+     "eval_runtime": 130.2967,
+     "eval_samples": 5000,
+     "eval_samples_per_second": 38.374,
+     "eval_steps_per_second": 1.205,
+     "eval_wer": 0.2984287348943652
+ }
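For readers unfamiliar with the `eval_wer` and `eval_cer` fields, here is a minimal sketch of the usual definitions: edit distance between reference and hypothesis, divided by reference length, counted over words for WER and over characters for CER. This is an assumption about what the metrics mean, not the evaluation code used for this run, which the commit does not show; corpus-level tools typically sum errors over all utterances before dividing, whereas this sketch scores a single pair.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one utterance: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one utterance: character edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Illustrative strings only; not taken from common_voice_17_0.
print(wer("the cat sat", "the cat sit"))
print(cer("the cat sat", "the cat sit"))
```

One wrong word out of three gives a WER of 1/3, while the same error is a single character out of eleven for CER, which is why CER is usually the smaller of the two numbers, as it is in the results above.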
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 15.0,
+     "total_flos": 1.21495045308783e+20,
+     "train_loss": 0.13136799893019088,
+     "train_runtime": 24963.2457,
+     "train_samples": 35144,
+     "train_samples_per_second": 21.117,
+     "train_steps_per_second": 0.66
+ }
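Combining train_results.json with the `global_step`, `epoch` and `learning_rate` fields logged in trainer_state.json gives a rough picture of the optimizer schedule. The sketch below is an inference from the logged numbers, not something the commit states: it assumes no samples are dropped per epoch, and it models the learning rate as a linear decay from a peak of 4e-5 to zero with no warmup. The names `implied_batch` and `linear_lr` are hypothetical.

```python
import math

train_samples = 35144                      # train_results.json
num_epochs = 15                            # train_results.json ("epoch": 15.0)
global_step = 16485                        # trainer_state.json
first_logged_epoch = 0.09099181073703366   # trainer_state.json, step 100
first_logged_lr = 3.9757355171367915e-05   # trainer_state.json, step 100

# Optimizer steps per epoch implied by the totals.
steps_per_epoch = global_step // num_epochs

# Smallest effective batch size consistent with that step count, assuming the
# last partial batch of each epoch is kept (an assumption, not stated in the source).
implied_batch = next(
    b for b in range(1, 1025)
    if math.ceil(train_samples / b) == steps_per_epoch
)


def linear_lr(step, peak_lr=4e-5, total_steps=global_step):
    """Idealized linear decay to zero without warmup (assumed schedule)."""
    return peak_lr * (1 - step / total_steps)


print(steps_per_epoch, implied_batch)
print(100 / steps_per_epoch, first_logged_epoch)
print(linear_lr(100), first_logged_lr)
```

The early log entries match this idealized line closely. Some later ones, for example step 900, sit about one scheduler step above it; the commit gives no explanation, so the formula should be read as an approximation of the schedule rather than its definition.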
trainer_state.json ADDED
@@ -0,0 +1,1510 @@
+ {
+   "best_metric": 0.29856827886467024,
+   "best_model_checkpoint": "/home/senyk/result_wav2vec/best_model_2_copy/checkpoint-15500",
+   "epoch": 15.0,
+   "eval_steps": 500,
+   "global_step": 16485,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.09099181073703366,
+       "grad_norm": 1.8533661365509033,
+       "learning_rate": 3.9757355171367915e-05,
+       "loss": 0.1805,
+       "step": 100
+     },
+     {
+       "epoch": 0.18198362147406733,
+       "grad_norm": 1.234511137008667,
+       "learning_rate": 3.951471034273582e-05,
+       "loss": 0.1797,
+       "step": 200
+     },
+     {
+       "epoch": 0.272975432211101,
+       "grad_norm": 2.1243155002593994,
+       "learning_rate": 3.927206551410373e-05,
+       "loss": 0.185,
+       "step": 300
+     },
+     {
+       "epoch": 0.36396724294813465,
+       "grad_norm": 1.542136549949646,
+       "learning_rate": 3.902942068547165e-05,
+       "loss": 0.1808,
+       "step": 400
+     },
+     {
+       "epoch": 0.4549590536851683,
+       "grad_norm": 1.6958433389663696,
+       "learning_rate": 3.8786775856839554e-05,
+       "loss": 0.1936,
+       "step": 500
+     },
+     {
+       "epoch": 0.4549590536851683,
+       "eval_cer": 0.15896814599550807,
+       "eval_loss": 0.2081621140241623,
+       "eval_runtime": 132.4545,
+       "eval_samples_per_second": 37.749,
+       "eval_steps_per_second": 1.185,
+       "eval_wer": 0.3304680304764031,
+       "step": 500
+     },
+     {
+       "epoch": 0.545950864422202,
+       "grad_norm": 1.8750900030136108,
+       "learning_rate": 3.8544131028207465e-05,
+       "loss": 0.1848,
+       "step": 600
+     },
+     {
+       "epoch": 0.6369426751592356,
+       "grad_norm": 0.7346371412277222,
+       "learning_rate": 3.8301486199575376e-05,
+       "loss": 0.1899,
+       "step": 700
+     },
+     {
+       "epoch": 0.7279344858962693,
+       "grad_norm": 0.8798062205314636,
+       "learning_rate": 3.805884137094329e-05,
+       "loss": 0.1761,
+       "step": 800
+     },
+     {
+       "epoch": 0.818926296633303,
+       "grad_norm": 3.140528917312622,
+       "learning_rate": 3.7818622990597514e-05,
+       "loss": 0.1766,
+       "step": 900
+     },
+     {
+       "epoch": 0.9099181073703366,
+       "grad_norm": 1.5800111293792725,
+       "learning_rate": 3.7575978161965426e-05,
+       "loss": 0.174,
+       "step": 1000
+     },
+     {
+       "epoch": 0.9099181073703366,
+       "eval_cer": 0.15860186921906572,
+       "eval_loss": 0.20823511481285095,
+       "eval_runtime": 132.6784,
+       "eval_samples_per_second": 37.685,
+       "eval_steps_per_second": 1.183,
+       "eval_wer": 0.3283748709218275,
+       "step": 1000
+     },
+     {
+       "epoch": 1.0009099181073704,
+       "grad_norm": 0.9137107133865356,
+       "learning_rate": 3.733333333333334e-05,
+       "loss": 0.1746,
+       "step": 1100
+     },
+     {
+       "epoch": 1.091901728844404,
+       "grad_norm": 1.5409084558486938,
+       "learning_rate": 3.709068850470125e-05,
+       "loss": 0.175,
+       "step": 1200
+     },
+     {
+       "epoch": 1.1828935395814377,
+       "grad_norm": 1.2571290731430054,
+       "learning_rate": 3.684804367606915e-05,
+       "loss": 0.161,
+       "step": 1300
+     },
+     {
+       "epoch": 1.2738853503184713,
+       "grad_norm": 0.7620792388916016,
+       "learning_rate": 3.6605398847437065e-05,
+       "loss": 0.1704,
+       "step": 1400
+     },
+     {
+       "epoch": 1.364877161055505,
+       "grad_norm": 1.167091727256775,
+       "learning_rate": 3.6362754018804976e-05,
+       "loss": 0.1855,
+       "step": 1500
+     },
+     {
+       "epoch": 1.364877161055505,
+       "eval_cer": 0.15846501855533998,
+       "eval_loss": 0.19808036088943481,
+       "eval_runtime": 131.4602,
+       "eval_samples_per_second": 38.034,
+       "eval_steps_per_second": 1.194,
+       "eval_wer": 0.3291563171555357,
+       "step": 1500
+     },
+     {
+       "epoch": 1.4558689717925386,
+       "grad_norm": 1.6935396194458008,
+       "learning_rate": 3.612010919017289e-05,
+       "loss": 0.1698,
+       "step": 1600
+     },
+     {
+       "epoch": 1.5468607825295724,
+       "grad_norm": 1.1173765659332275,
+       "learning_rate": 3.58774643615408e-05,
+       "loss": 0.1705,
+       "step": 1700
+     },
+     {
+       "epoch": 1.6378525932666061,
+       "grad_norm": 1.6780610084533691,
+       "learning_rate": 3.563481953290871e-05,
+       "loss": 0.177,
+       "step": 1800
+     },
+     {
+       "epoch": 1.7288444040036397,
+       "grad_norm": 0.9590256214141846,
+       "learning_rate": 3.539217470427662e-05,
+       "loss": 0.1778,
+       "step": 1900
+     },
+     {
+       "epoch": 1.8198362147406733,
+       "grad_norm": 0.6319223642349243,
+       "learning_rate": 3.5149529875644526e-05,
+       "loss": 0.1724,
+       "step": 2000
+     },
+     {
+       "epoch": 1.8198362147406733,
+       "eval_cer": 0.15803031644703477,
+       "eval_loss": 0.19977422058582306,
+       "eval_runtime": 131.3497,
+       "eval_samples_per_second": 38.066,
+       "eval_steps_per_second": 1.195,
+       "eval_wer": 0.3265049817197399,
+       "step": 2000
+     },
+     {
+       "epoch": 1.910828025477707,
+       "grad_norm": 1.1627233028411865,
+       "learning_rate": 3.490688504701244e-05,
+       "loss": 0.1745,
+       "step": 2100
+     },
+     {
+       "epoch": 2.001819836214741,
+       "grad_norm": 0.90104740858078,
+       "learning_rate": 3.466424021838035e-05,
+       "loss": 0.1669,
+       "step": 2200
+     },
+     {
+       "epoch": 2.092811646951774,
+       "grad_norm": 0.7599895000457764,
+       "learning_rate": 3.4421595389748254e-05,
+       "loss": 0.1523,
+       "step": 2300
+     },
+     {
+       "epoch": 2.183803457688808,
+       "grad_norm": 0.6392490267753601,
+       "learning_rate": 3.417895056111617e-05,
+       "loss": 0.1596,
+       "step": 2400
+     },
+     {
+       "epoch": 2.2747952684258417,
+       "grad_norm": 0.7351034283638,
+       "learning_rate": 3.3936305732484083e-05,
+       "loss": 0.1667,
+       "step": 2500
+     },
+     {
+       "epoch": 2.2747952684258417,
+       "eval_cer": 0.15830804279400756,
+       "eval_loss": 0.19935080409049988,
+       "eval_runtime": 131.4244,
+       "eval_samples_per_second": 38.045,
+       "eval_steps_per_second": 1.195,
+       "eval_wer": 0.3288214116268036,
+       "step": 2500
+     },
+     {
+       "epoch": 2.3657870791628755,
+       "grad_norm": 1.4110583066940308,
+       "learning_rate": 3.369366090385199e-05,
+       "loss": 0.1592,
+       "step": 2600
+     },
+     {
+       "epoch": 2.4567788898999092,
+       "grad_norm": 1.1413319110870361,
+       "learning_rate": 3.34510160752199e-05,
+       "loss": 0.1557,
+       "step": 2700
+     },
+     {
+       "epoch": 2.5477707006369426,
+       "grad_norm": 1.064664363861084,
+       "learning_rate": 3.320837124658781e-05,
+       "loss": 0.1551,
+       "step": 2800
+     },
+     {
+       "epoch": 2.6387625113739763,
+       "grad_norm": 0.9104379415512085,
+       "learning_rate": 3.296572641795572e-05,
+       "loss": 0.1465,
+       "step": 2900
+     },
+     {
+       "epoch": 2.72975432211101,
+       "grad_norm": 0.9136043787002563,
+       "learning_rate": 3.272308158932363e-05,
+       "loss": 0.1635,
+       "step": 3000
+     },
+     {
+       "epoch": 2.72975432211101,
+       "eval_cer": 0.1568509857272808,
+       "eval_loss": 0.20498071610927582,
+       "eval_runtime": 131.4378,
+       "eval_samples_per_second": 38.041,
+       "eval_steps_per_second": 1.194,
+       "eval_wer": 0.3222907538165276,
+       "step": 3000
+     },
+     {
+       "epoch": 2.8207461328480434,
+       "grad_norm": 1.207688570022583,
+       "learning_rate": 3.248043676069154e-05,
+       "loss": 0.1662,
+       "step": 3100
+     },
+     {
+       "epoch": 2.911737943585077,
+       "grad_norm": 2.2350857257843018,
+       "learning_rate": 3.223779193205945e-05,
+       "loss": 0.1602,
+       "step": 3200
+     },
+     {
+       "epoch": 3.002729754322111,
+       "grad_norm": 1.0255531072616577,
+       "learning_rate": 3.199514710342736e-05,
+       "loss": 0.1485,
+       "step": 3300
+     },
+     {
+       "epoch": 3.0937215650591448,
+       "grad_norm": 1.2408232688903809,
+       "learning_rate": 3.175250227479527e-05,
+       "loss": 0.1528,
+       "step": 3400
+     },
+     {
+       "epoch": 3.1847133757961785,
+       "grad_norm": 0.6412222981452942,
+       "learning_rate": 3.1509857446163184e-05,
+       "loss": 0.14,
+       "step": 3500
+     },
+     {
+       "epoch": 3.1847133757961785,
+       "eval_cer": 0.1566980349854697,
+       "eval_loss": 0.2068762630224228,
+       "eval_runtime": 131.207,
+       "eval_samples_per_second": 38.108,
+       "eval_steps_per_second": 1.197,
+       "eval_wer": 0.32053249979068404,
+       "step": 3500
+     },
+     {
+       "epoch": 3.275705186533212,
+       "grad_norm": 1.1700862646102905,
+       "learning_rate": 3.126721261753109e-05,
+       "loss": 0.1522,
+       "step": 3600
+     },
+     {
+       "epoch": 3.3666969972702456,
+       "grad_norm": 2.105367660522461,
+       "learning_rate": 3.1024567788899e-05,
+       "loss": 0.1501,
+       "step": 3700
+     },
+     {
+       "epoch": 3.4576888080072794,
+       "grad_norm": 0.5450145602226257,
+       "learning_rate": 3.0784349408553233e-05,
+       "loss": 0.1446,
+       "step": 3800
+     },
+     {
+       "epoch": 3.548680618744313,
+       "grad_norm": 0.7369564771652222,
+       "learning_rate": 3.0541704579921145e-05,
+       "loss": 0.1509,
+       "step": 3900
+     },
+     {
+       "epoch": 3.6396724294813465,
+       "grad_norm": 1.5404269695281982,
+       "learning_rate": 3.0299059751289053e-05,
+       "loss": 0.1573,
+       "step": 4000
+     },
+     {
+       "epoch": 3.6396724294813465,
+       "eval_cer": 0.156923436078665,
+       "eval_loss": 0.2066684067249298,
+       "eval_runtime": 132.0661,
+       "eval_samples_per_second": 37.86,
+       "eval_steps_per_second": 1.189,
+       "eval_wer": 0.32067204376098907,
+       "step": 4000
+     },
+     {
+       "epoch": 3.7306642402183803,
+       "grad_norm": 1.83568274974823,
+       "learning_rate": 3.005641492265696e-05,
+       "loss": 0.1489,
+       "step": 4100
+     },
+     {
+       "epoch": 3.821656050955414,
+       "grad_norm": 0.9000252485275269,
+       "learning_rate": 2.9813770094024872e-05,
+       "loss": 0.1446,
+       "step": 4200
+     },
+     {
+       "epoch": 3.912647861692448,
+       "grad_norm": 1.9459755420684814,
+       "learning_rate": 2.957112526539278e-05,
+       "loss": 0.1541,
+       "step": 4300
+     },
+     {
+       "epoch": 4.003639672429482,
+       "grad_norm": 1.2044388055801392,
+       "learning_rate": 2.9328480436760695e-05,
+       "loss": 0.1555,
+       "step": 4400
+     },
+     {
+       "epoch": 4.094631483166515,
+       "grad_norm": 0.984316885471344,
+       "learning_rate": 2.9085835608128607e-05,
+       "loss": 0.1487,
+       "step": 4500
+     },
+     {
+       "epoch": 4.094631483166515,
+       "eval_cer": 0.15719713740611643,
+       "eval_loss": 0.21014198660850525,
+       "eval_runtime": 131.975,
+       "eval_samples_per_second": 37.886,
+       "eval_steps_per_second": 1.19,
+       "eval_wer": 0.3224023889927716,
+       "step": 4500
+     },
+     {
+       "epoch": 4.185623293903548,
+       "grad_norm": 0.879389762878418,
+       "learning_rate": 2.8843190779496515e-05,
+       "loss": 0.1384,
+       "step": 4600
+     },
+     {
+       "epoch": 4.276615104640582,
+       "grad_norm": 1.0293374061584473,
+       "learning_rate": 2.8600545950864426e-05,
+       "loss": 0.1419,
+       "step": 4700
+     },
+     {
+       "epoch": 4.367606915377616,
+       "grad_norm": 0.7723912596702576,
+       "learning_rate": 2.8357901122232334e-05,
+       "loss": 0.142,
+       "step": 4800
+     },
+     {
+       "epoch": 4.45859872611465,
+       "grad_norm": 1.056503176689148,
+       "learning_rate": 2.8115256293600245e-05,
+       "loss": 0.1492,
+       "step": 4900
+     },
+     {
+       "epoch": 4.549590536851683,
+       "grad_norm": 1.1004029512405396,
+       "learning_rate": 2.7872611464968153e-05,
+       "loss": 0.1501,
+       "step": 5000
+     },
+     {
+       "epoch": 4.549590536851683,
+       "eval_cer": 0.1563116331114206,
+       "eval_loss": 0.2110494077205658,
+       "eval_runtime": 131.8894,
+       "eval_samples_per_second": 37.911,
+       "eval_steps_per_second": 1.19,
+       "eval_wer": 0.3176020764142781,
+       "step": 5000
+     },
+     {
+       "epoch": 4.640582347588717,
+       "grad_norm": 1.6329824924468994,
+       "learning_rate": 2.7629966636336068e-05,
+       "loss": 0.1398,
+       "step": 5100
+     },
+     {
+       "epoch": 4.731574158325751,
+       "grad_norm": 1.094870924949646,
+       "learning_rate": 2.7387321807703976e-05,
+       "loss": 0.1382,
+       "step": 5200
+     },
+     {
+       "epoch": 4.822565969062785,
+       "grad_norm": 1.094946026802063,
+       "learning_rate": 2.7144676979071888e-05,
+       "loss": 0.1471,
+       "step": 5300
+     },
+     {
+       "epoch": 4.9135577797998184,
+       "grad_norm": 1.1323785781860352,
+       "learning_rate": 2.6902032150439796e-05,
+       "loss": 0.1487,
+       "step": 5400
+     },
+     {
+       "epoch": 5.004549590536851,
+       "grad_norm": 1.0509577989578247,
+       "learning_rate": 2.6659387321807707e-05,
+       "loss": 0.1486,
+       "step": 5500
+     },
+     {
+       "epoch": 5.004549590536851,
+       "eval_cer": 0.15565957994896276,
+       "eval_loss": 0.20399489998817444,
+       "eval_runtime": 131.3537,
+       "eval_samples_per_second": 38.065,
+       "eval_steps_per_second": 1.195,
+       "eval_wer": 0.3159275487706176,
+       "step": 5500
+     },
+     {
+       "epoch": 5.095541401273885,
+       "grad_norm": 1.0751926898956299,
+       "learning_rate": 2.6416742493175615e-05,
+       "loss": 0.1311,
+       "step": 5600
+     },
+     {
+       "epoch": 5.186533212010919,
+       "grad_norm": 1.638108730316162,
+       "learning_rate": 2.6174097664543526e-05,
+       "loss": 0.1437,
+       "step": 5700
+     },
+     {
+       "epoch": 5.277525022747953,
+       "grad_norm": 0.8366700410842896,
+       "learning_rate": 2.5931452835911434e-05,
+       "loss": 0.1334,
+       "step": 5800
+     },
+     {
+       "epoch": 5.368516833484986,
+       "grad_norm": 0.9168310761451721,
+       "learning_rate": 2.568880800727935e-05,
+       "loss": 0.1291,
+       "step": 5900
+     },
+     {
+       "epoch": 5.45950864422202,
+       "grad_norm": 0.8272154331207275,
+       "learning_rate": 2.5446163178647257e-05,
+       "loss": 0.1342,
+       "step": 6000
+     },
+     {
+       "epoch": 5.45950864422202,
+       "eval_cer": 0.15511217729405988,
+       "eval_loss": 0.20407769083976746,
+       "eval_runtime": 130.9722,
+       "eval_samples_per_second": 38.176,
+       "eval_steps_per_second": 1.199,
+       "eval_wer": 0.31436465630320115,
+       "step": 6000
+     },
+     {
+       "epoch": 5.550500454959054,
+       "grad_norm": 0.7047191262245178,
+       "learning_rate": 2.520351835001517e-05,
+       "loss": 0.131,
+       "step": 6100
+     },
+     {
+       "epoch": 5.641492265696087,
+       "grad_norm": 1.3190622329711914,
+       "learning_rate": 2.4960873521383077e-05,
+       "loss": 0.1335,
+       "step": 6200
+     },
+     {
+       "epoch": 5.732484076433121,
+       "grad_norm": 1.6135904788970947,
+       "learning_rate": 2.472065514103731e-05,
+       "loss": 0.1352,
+       "step": 6300
+     },
+     {
+       "epoch": 5.823475887170154,
+       "grad_norm": 1.2571635246276855,
+       "learning_rate": 2.447801031240522e-05,
+       "loss": 0.1405,
+       "step": 6400
+     },
+     {
+       "epoch": 5.914467697907188,
+       "grad_norm": 1.103220820426941,
+       "learning_rate": 2.423536548377313e-05,
+       "loss": 0.1396,
+       "step": 6500
+     },
+     {
+       "epoch": 5.914467697907188,
+       "eval_cer": 0.15522085282113618,
+       "eval_loss": 0.20565420389175415,
+       "eval_runtime": 131.5948,
+       "eval_samples_per_second": 37.995,
+       "eval_steps_per_second": 1.193,
+       "eval_wer": 0.31433674750914015,
+       "step": 6500
+     },
+     {
+       "epoch": 6.005459508644222,
+       "grad_norm": 1.2131693363189697,
+       "learning_rate": 2.399272065514104e-05,
+       "loss": 0.1379,
+       "step": 6600
+     },
+     {
+       "epoch": 6.096451319381256,
+       "grad_norm": 0.8314226865768433,
+       "learning_rate": 2.375007582650895e-05,
+       "loss": 0.1289,
+       "step": 6700
+     },
+     {
+       "epoch": 6.1874431301182895,
+       "grad_norm": 0.929862380027771,
+       "learning_rate": 2.350743099787686e-05,
+       "loss": 0.1291,
+       "step": 6800
+     },
+     {
+       "epoch": 6.278434940855323,
+       "grad_norm": 1.2670739889144897,
+       "learning_rate": 2.326478616924477e-05,
+       "loss": 0.1349,
+       "step": 6900
+     },
+     {
+       "epoch": 6.369426751592357,
+       "grad_norm": 0.979325532913208,
+       "learning_rate": 2.3022141340612677e-05,
+       "loss": 0.136,
+       "step": 7000
+     },
+     {
+       "epoch": 6.369426751592357,
+       "eval_cer": 0.15454464954155028,
+       "eval_loss": 0.20977585017681122,
+       "eval_runtime": 132.5521,
+       "eval_samples_per_second": 37.721,
+       "eval_steps_per_second": 1.184,
+       "eval_wer": 0.3130529429823337,
+       "step": 7000
+     },
+     {
+       "epoch": 6.460418562329391,
+       "grad_norm": 1.1290801763534546,
+       "learning_rate": 2.277949651198059e-05,
+       "loss": 0.1419,
+       "step": 7100
+     },
+     {
+       "epoch": 6.551410373066424,
+       "grad_norm": 0.8063333034515381,
+       "learning_rate": 2.2536851683348503e-05,
+       "loss": 0.1251,
+       "step": 7200
+     },
+     {
+       "epoch": 6.6424021838034575,
+       "grad_norm": 0.7427828907966614,
+       "learning_rate": 2.229420685471641e-05,
+       "loss": 0.137,
+       "step": 7300
+     },
+     {
+       "epoch": 6.733393994540491,
+       "grad_norm": 0.6900395154953003,
+       "learning_rate": 2.2051562026084322e-05,
+       "loss": 0.1359,
+       "step": 7400
+     },
+     {
+       "epoch": 6.824385805277525,
+       "grad_norm": 1.7373096942901611,
+       "learning_rate": 2.180891719745223e-05,
+       "loss": 0.1266,
+       "step": 7500
+     },
+     {
+       "epoch": 6.824385805277525,
+       "eval_cer": 0.15423472303840674,
+       "eval_loss": 0.2094658762216568,
+       "eval_runtime": 132.3336,
+       "eval_samples_per_second": 37.783,
+       "eval_steps_per_second": 1.186,
+       "eval_wer": 0.310596969104965,
+       "step": 7500
+     },
+     {
+       "epoch": 6.915377616014559,
+       "grad_norm": 0.7071816921234131,
+       "learning_rate": 2.156627236882014e-05,
+       "loss": 0.1267,
+       "step": 7600
+     },
+     {
+       "epoch": 7.006369426751593,
+       "grad_norm": 1.2711293697357178,
+       "learning_rate": 2.132362754018805e-05,
+       "loss": 0.1284,
+       "step": 7700
+     },
+     {
+       "epoch": 7.097361237488626,
+       "grad_norm": 0.9852485656738281,
+       "learning_rate": 2.108098271155596e-05,
+       "loss": 0.1223,
+       "step": 7800
+     },
+     {
+       "epoch": 7.188353048225659,
+       "grad_norm": 1.4875195026397705,
+       "learning_rate": 2.0838337882923872e-05,
+       "loss": 0.1273,
+       "step": 7900
+     },
+     {
+       "epoch": 7.279344858962693,
+       "grad_norm": 0.8052563071250916,
+       "learning_rate": 2.0595693054291784e-05,
+       "loss": 0.1283,
+       "step": 8000
+     },
+     {
+       "epoch": 7.279344858962693,
+       "eval_cer": 0.1537718457934521,
+       "eval_loss": 0.2159704566001892,
+       "eval_runtime": 132.3477,
+       "eval_samples_per_second": 37.779,
+       "eval_steps_per_second": 1.186,
+       "eval_wer": 0.3085317183444503,
+       "step": 8000
+     },
+     {
+       "epoch": 7.370336669699727,
+       "grad_norm": 2.028435230255127,
+       "learning_rate": 2.0353048225659692e-05,
+       "loss": 0.1346,
+       "step": 8100
+     },
+     {
+       "epoch": 7.461328480436761,
+       "grad_norm": 0.8297127485275269,
+       "learning_rate": 2.0110403397027603e-05,
+       "loss": 0.1235,
+       "step": 8200
+     },
+     {
+       "epoch": 7.552320291173794,
+       "grad_norm": 1.299826741218567,
+       "learning_rate": 1.986775856839551e-05,
+       "loss": 0.1243,
+       "step": 8300
+     },
+     {
+       "epoch": 7.643312101910828,
+       "grad_norm": 2.9055380821228027,
+       "learning_rate": 1.9625113739763423e-05,
+       "loss": 0.1199,
+       "step": 8400
+     },
+     {
+       "epoch": 7.734303912647862,
+       "grad_norm": 1.5928077697753906,
+       "learning_rate": 1.9382468911131334e-05,
+       "loss": 0.1229,
+       "step": 8500
+     },
+     {
+       "epoch": 7.734303912647862,
+       "eval_cer": 0.1537919708910588,
+       "eval_loss": 0.21750004589557648,
+       "eval_runtime": 131.9305,
+       "eval_samples_per_second": 37.899,
+       "eval_steps_per_second": 1.19,
+       "eval_wer": 0.30755491055231504,
+       "step": 8500
+     },
+     {
+       "epoch": 7.825295723384896,
+       "grad_norm": 1.1604926586151123,
+       "learning_rate": 1.9139824082499242e-05,
+       "loss": 0.1237,
+       "step": 8600
+     },
+     {
+       "epoch": 7.916287534121929,
+       "grad_norm": 2.2635440826416016,
+       "learning_rate": 1.8897179253867153e-05,
+       "loss": 0.1248,
+       "step": 8700
+     },
+     {
+       "epoch": 8.007279344858963,
+       "grad_norm": 0.6642023324966431,
+       "learning_rate": 1.8654534425235065e-05,
+       "loss": 0.1276,
+       "step": 8800
+     },
+     {
+       "epoch": 8.098271155595997,
+       "grad_norm": 2.0704445838928223,
+       "learning_rate": 1.8411889596602973e-05,
+       "loss": 0.1124,
+       "step": 8900
+     },
+     {
+       "epoch": 8.18926296633303,
+       "grad_norm": 0.8484503030776978,
+       "learning_rate": 1.8169244767970884e-05,
+       "loss": 0.1267,
+       "step": 9000
+     },
+     {
+       "epoch": 8.18926296633303,
+       "eval_cer": 0.15306746737721677,
+       "eval_loss": 0.21141663193702698,
+       "eval_runtime": 132.5307,
+       "eval_samples_per_second": 37.727,
+       "eval_steps_per_second": 1.185,
+       "eval_wer": 0.30568502135022746,
+       "step": 9000
+     },
+     {
+       "epoch": 8.280254777070065,
+       "grad_norm": 0.8810114860534668,
+       "learning_rate": 1.7926599939338792e-05,
+       "loss": 0.1191,
+       "step": 9100
+     },
+     {
+       "epoch": 8.371246587807097,
+       "grad_norm": 0.343791663646698,
+       "learning_rate": 1.7686381558993026e-05,
+       "loss": 0.1135,
+       "step": 9200
+     },
+     {
+       "epoch": 8.46223839854413,
+       "grad_norm": 1.2659159898757935,
+       "learning_rate": 1.7443736730360937e-05,
+       "loss": 0.1215,
+       "step": 9300
+     },
+     {
+       "epoch": 8.553230209281164,
+       "grad_norm": 2.19396710395813,
+       "learning_rate": 1.7201091901728845e-05,
+       "loss": 0.1228,
+       "step": 9400
+     },
+     {
+       "epoch": 8.644222020018198,
+       "grad_norm": 0.6617141366004944,
+       "learning_rate": 1.6958447073096757e-05,
+       "loss": 0.1127,
+       "step": 9500
+     },
+     {
+       "epoch": 8.644222020018198,
+       "eval_cer": 0.1528179161668934,
+       "eval_loss": 0.20628662407398224,
+       "eval_runtime": 131.7465,
+       "eval_samples_per_second": 37.952,
+       "eval_steps_per_second": 1.192,
+       "eval_wer": 0.3068571907007898,
+       "step": 9500
+     },
+     {
+       "epoch": 8.735213830755232,
+       "grad_norm": 1.5183671712875366,
+       "learning_rate": 1.6715802244464668e-05,
+       "loss": 0.1253,
+       "step": 9600
+     },
+     {
+       "epoch": 8.826205641492265,
+       "grad_norm": 0.6270197629928589,
+       "learning_rate": 1.6473157415832576e-05,
+       "loss": 0.1245,
+       "step": 9700
+     },
+     {
+       "epoch": 8.9171974522293,
+       "grad_norm": 0.7786601185798645,
+       "learning_rate": 1.6230512587200488e-05,
+       "loss": 0.1268,
+       "step": 9800
+     },
+     {
+       "epoch": 9.008189262966333,
+       "grad_norm": 1.2779630422592163,
+       "learning_rate": 1.5987867758568396e-05,
+       "loss": 0.1285,
+       "step": 9900
+     },
+     {
+       "epoch": 9.099181073703367,
+       "grad_norm": 0.8640280365943909,
+       "learning_rate": 1.5745222929936307e-05,
+       "loss": 0.1165,
+       "step": 10000
+     },
+     {
+       "epoch": 9.099181073703367,
+       "eval_cer": 0.15322041811902787,
+       "eval_loss": 0.20939494669437408,
+       "eval_runtime": 130.9926,
+       "eval_samples_per_second": 38.17,
+       "eval_steps_per_second": 1.199,
+       "eval_wer": 0.3048477575283972,
+       "step": 10000
+     },
911
+ {
912
+ "epoch": 9.1901728844404,
913
+ "grad_norm": 0.9638440012931824,
914
+ "learning_rate": 1.550257810130422e-05,
915
+ "loss": 0.1076,
916
+ "step": 10100
917
+ },
918
+ {
919
+ "epoch": 9.281164695177434,
920
+ "grad_norm": 0.8095070719718933,
921
+ "learning_rate": 1.526235972095845e-05,
922
+ "loss": 0.1268,
923
+ "step": 10200
924
+ },
925
+ {
926
+ "epoch": 9.372156505914468,
927
+ "grad_norm": 0.8781161308288574,
928
+ "learning_rate": 1.5019714892326358e-05,
929
+ "loss": 0.1207,
930
+ "step": 10300
931
+ },
932
+ {
933
+ "epoch": 9.463148316651502,
934
+ "grad_norm": 1.6324824094772339,
935
+ "learning_rate": 1.4777070063694268e-05,
936
+ "loss": 0.1256,
937
+ "step": 10400
938
+ },
939
+ {
940
+ "epoch": 9.554140127388536,
941
+ "grad_norm": 0.8956096172332764,
942
+ "learning_rate": 1.453442523506218e-05,
943
+ "loss": 0.1222,
944
+ "step": 10500
945
+ },
946
+ {
947
+ "epoch": 9.554140127388536,
948
+ "eval_cer": 0.1532043180409425,
949
+ "eval_loss": 0.20789633691310883,
950
+ "eval_runtime": 131.5404,
951
+ "eval_samples_per_second": 38.011,
952
+ "eval_steps_per_second": 1.194,
953
+ "eval_wer": 0.3066897379364238,
954
+ "step": 10500
955
+ },
+ {
+ "epoch": 9.64513193812557,
+ "grad_norm": 1.19681978225708,
+ "learning_rate": 1.4291780406430089e-05,
+ "loss": 0.1197,
+ "step": 10600
+ },
+ {
+ "epoch": 9.736123748862603,
+ "grad_norm": 5.549036026000977,
+ "learning_rate": 1.4049135577797999e-05,
+ "loss": 0.1128,
+ "step": 10700
+ },
+ {
+ "epoch": 9.827115559599637,
+ "grad_norm": 0.7990231513977051,
+ "learning_rate": 1.380649074916591e-05,
+ "loss": 0.1164,
+ "step": 10800
+ },
+ {
+ "epoch": 9.918107370336669,
+ "grad_norm": 0.9332329630851746,
+ "learning_rate": 1.356384592053382e-05,
+ "loss": 0.1236,
+ "step": 10900
+ },
+ {
+ "epoch": 10.009099181073703,
+ "grad_norm": 1.4885659217834473,
+ "learning_rate": 1.332120109190173e-05,
+ "loss": 0.1127,
+ "step": 11000
+ },
+ {
+ "epoch": 10.009099181073703,
+ "eval_cer": 0.15307551741625947,
+ "eval_loss": 0.20891791582107544,
+ "eval_runtime": 131.2946,
+ "eval_samples_per_second": 38.082,
+ "eval_steps_per_second": 1.196,
+ "eval_wer": 0.30557338617398344,
+ "step": 11000
+ },
+ {
+ "epoch": 10.100090991810736,
+ "grad_norm": 0.5533010959625244,
+ "learning_rate": 1.307855626326964e-05,
+ "loss": 0.1158,
+ "step": 11100
+ },
+ {
+ "epoch": 10.19108280254777,
+ "grad_norm": 2.3450381755828857,
+ "learning_rate": 1.283591143463755e-05,
+ "loss": 0.1136,
+ "step": 11200
+ },
+ {
+ "epoch": 10.282074613284804,
+ "grad_norm": 0.6852346062660217,
+ "learning_rate": 1.259326660600546e-05,
+ "loss": 0.1296,
+ "step": 11300
+ },
+ {
+ "epoch": 10.373066424021838,
+ "grad_norm": 0.872776985168457,
+ "learning_rate": 1.235062177737337e-05,
+ "loss": 0.1069,
+ "step": 11400
+ },
+ {
+ "epoch": 10.464058234758872,
+ "grad_norm": 0.547275722026825,
+ "learning_rate": 1.210797694874128e-05,
+ "loss": 0.1084,
+ "step": 11500
+ },
+ {
+ "epoch": 10.464058234758872,
+ "eval_cer": 0.15258044001513407,
+ "eval_loss": 0.2116706520318985,
+ "eval_runtime": 131.7531,
+ "eval_samples_per_second": 37.95,
+ "eval_steps_per_second": 1.192,
+ "eval_wer": 0.30317322988473666,
+ "step": 11500
+ },
+ {
+ "epoch": 10.555050045495905,
+ "grad_norm": 1.4432127475738525,
+ "learning_rate": 1.1865332120109191e-05,
+ "loss": 0.1237,
+ "step": 11600
+ },
+ {
+ "epoch": 10.646041856232939,
+ "grad_norm": 0.5424668192863464,
+ "learning_rate": 1.1622687291477101e-05,
+ "loss": 0.116,
+ "step": 11700
+ },
+ {
+ "epoch": 10.737033666969973,
+ "grad_norm": 0.6486382484436035,
+ "learning_rate": 1.138004246284501e-05,
+ "loss": 0.1097,
+ "step": 11800
+ },
+ {
+ "epoch": 10.828025477707007,
+ "grad_norm": 1.37655770778656,
+ "learning_rate": 1.113739763421292e-05,
+ "loss": 0.1026,
+ "step": 11900
+ },
+ {
+ "epoch": 10.91901728844404,
+ "grad_norm": 0.7191163897514343,
+ "learning_rate": 1.0894752805580833e-05,
+ "loss": 0.1155,
+ "step": 12000
+ },
+ {
+ "epoch": 10.91901728844404,
+ "eval_cer": 0.15271326565933846,
+ "eval_loss": 0.20751111209392548,
+ "eval_runtime": 131.6857,
+ "eval_samples_per_second": 37.969,
+ "eval_steps_per_second": 1.192,
+ "eval_wer": 0.3045407607937261,
+ "step": 12000
+ },
+ {
+ "epoch": 11.010009099181074,
+ "grad_norm": 1.8931193351745605,
+ "learning_rate": 1.0652107976948743e-05,
+ "loss": 0.1154,
+ "step": 12100
+ },
+ {
+ "epoch": 11.101000909918108,
+ "grad_norm": 0.43597936630249023,
+ "learning_rate": 1.0409463148316651e-05,
+ "loss": 0.0999,
+ "step": 12200
+ },
+ {
+ "epoch": 11.191992720655142,
+ "grad_norm": 1.1339422464370728,
+ "learning_rate": 1.0166818319684561e-05,
+ "loss": 0.1107,
+ "step": 12300
+ },
+ {
+ "epoch": 11.282984531392175,
+ "grad_norm": 0.9059270620346069,
+ "learning_rate": 9.924173491052472e-06,
+ "loss": 0.1045,
+ "step": 12400
+ },
+ {
+ "epoch": 11.373976342129207,
+ "grad_norm": 0.8777015209197998,
+ "learning_rate": 9.681528662420384e-06,
+ "loss": 0.0955,
+ "step": 12500
+ },
+ {
+ "epoch": 11.373976342129207,
+ "eval_cer": 0.15231076370720398,
+ "eval_loss": 0.21829599142074585,
+ "eval_runtime": 131.8689,
+ "eval_samples_per_second": 37.916,
+ "eval_steps_per_second": 1.191,
+ "eval_wer": 0.3025871452094555,
+ "step": 12500
+ },
+ {
+ "epoch": 11.464968152866241,
+ "grad_norm": 0.7121880650520325,
+ "learning_rate": 9.438883833788293e-06,
+ "loss": 0.1068,
+ "step": 12600
+ },
+ {
+ "epoch": 11.555959963603275,
+ "grad_norm": 1.0068190097808838,
+ "learning_rate": 9.196239005156203e-06,
+ "loss": 0.1047,
+ "step": 12700
+ },
+ {
+ "epoch": 11.646951774340309,
+ "grad_norm": 1.2295094728469849,
+ "learning_rate": 8.953594176524115e-06,
+ "loss": 0.1041,
+ "step": 12800
+ },
+ {
+ "epoch": 11.737943585077343,
+ "grad_norm": 1.1067237854003906,
+ "learning_rate": 8.710949347892024e-06,
+ "loss": 0.1071,
+ "step": 12900
+ },
+ {
+ "epoch": 11.828935395814376,
+ "grad_norm": 0.9286106824874878,
+ "learning_rate": 8.468304519259934e-06,
+ "loss": 0.1146,
+ "step": 13000
+ },
+ {
+ "epoch": 11.828935395814376,
+ "eval_cer": 0.15205316245783793,
+ "eval_loss": 0.21160683035850525,
+ "eval_runtime": 131.9777,
+ "eval_samples_per_second": 37.885,
+ "eval_steps_per_second": 1.19,
+ "eval_wer": 0.30147079344701516,
+ "step": 13000
+ },
+ {
+ "epoch": 11.91992720655141,
+ "grad_norm": 1.0843188762664795,
+ "learning_rate": 8.225659690627844e-06,
+ "loss": 0.1099,
+ "step": 13100
+ },
+ {
+ "epoch": 12.010919017288444,
+ "grad_norm": 0.8679298162460327,
+ "learning_rate": 7.983014861995755e-06,
+ "loss": 0.1076,
+ "step": 13200
+ },
+ {
+ "epoch": 12.101910828025478,
+ "grad_norm": 1.5552619695663452,
+ "learning_rate": 7.742796481649985e-06,
+ "loss": 0.1167,
+ "step": 13300
+ },
+ {
+ "epoch": 12.192902638762511,
+ "grad_norm": 1.5181180238723755,
+ "learning_rate": 7.500151653017895e-06,
+ "loss": 0.1096,
+ "step": 13400
+ },
+ {
+ "epoch": 12.283894449499545,
+ "grad_norm": 0.6448826789855957,
+ "learning_rate": 7.257506824385806e-06,
+ "loss": 0.1094,
+ "step": 13500
+ },
+ {
+ "epoch": 12.283894449499545,
+ "eval_cer": 0.15156613509575523,
+ "eval_loss": 0.2089649885892868,
+ "eval_runtime": 131.4101,
+ "eval_samples_per_second": 38.049,
+ "eval_steps_per_second": 1.195,
+ "eval_wer": 0.2992939075102565,
+ "step": 13500
+ },
+ {
+ "epoch": 12.374886260236579,
+ "grad_norm": 0.68089759349823,
+ "learning_rate": 7.014861995753716e-06,
+ "loss": 0.1042,
+ "step": 13600
+ },
+ {
+ "epoch": 12.465878070973613,
+ "grad_norm": 0.5364871025085449,
+ "learning_rate": 6.7722171671216266e-06,
+ "loss": 0.1042,
+ "step": 13700
+ },
+ {
+ "epoch": 12.556869881710647,
+ "grad_norm": 0.9213688969612122,
+ "learning_rate": 6.529572338489536e-06,
+ "loss": 0.1075,
+ "step": 13800
+ },
+ {
+ "epoch": 12.64786169244768,
+ "grad_norm": 0.8983300924301147,
+ "learning_rate": 6.286927509857447e-06,
+ "loss": 0.1085,
+ "step": 13900
+ },
+ {
+ "epoch": 12.738853503184714,
+ "grad_norm": 1.987417459487915,
+ "learning_rate": 6.0442826812253566e-06,
+ "loss": 0.1072,
+ "step": 14000
+ },
+ {
+ "epoch": 12.738853503184714,
+ "eval_cer": 0.151727135876609,
+ "eval_loss": 0.2124236822128296,
+ "eval_runtime": 131.355,
+ "eval_samples_per_second": 38.065,
+ "eval_steps_per_second": 1.195,
+ "eval_wer": 0.3002148977142698,
+ "step": 14000
+ },
+ {
+ "epoch": 12.829845313921748,
+ "grad_norm": 1.3773497343063354,
+ "learning_rate": 5.801637852593267e-06,
+ "loss": 0.1077,
+ "step": 14100
+ },
+ {
+ "epoch": 12.920837124658782,
+ "grad_norm": 1.0006029605865479,
+ "learning_rate": 5.558993023961178e-06,
+ "loss": 0.0996,
+ "step": 14200
+ },
+ {
+ "epoch": 13.011828935395814,
+ "grad_norm": 1.134149193763733,
+ "learning_rate": 5.316348195329087e-06,
+ "loss": 0.1089,
+ "step": 14300
+ },
+ {
+ "epoch": 13.102820746132847,
+ "grad_norm": 1.6549540758132935,
+ "learning_rate": 5.073703366696998e-06,
+ "loss": 0.1143,
+ "step": 14400
+ },
+ {
+ "epoch": 13.193812556869881,
+ "grad_norm": 0.8590063452720642,
+ "learning_rate": 4.831058538064908e-06,
+ "loss": 0.1125,
+ "step": 14500
+ },
+ {
+ "epoch": 13.193812556869881,
+ "eval_cer": 0.15173116089613034,
+ "eval_loss": 0.2130936086177826,
+ "eval_runtime": 131.6874,
+ "eval_samples_per_second": 37.969,
+ "eval_steps_per_second": 1.192,
+ "eval_wer": 0.3000195361558427,
+ "step": 14500
+ },
+ {
+ "epoch": 13.284804367606915,
+ "grad_norm": 0.8164013028144836,
+ "learning_rate": 4.588413709432818e-06,
+ "loss": 0.0982,
+ "step": 14600
+ },
+ {
+ "epoch": 13.375796178343949,
+ "grad_norm": 0.8457829356193542,
+ "learning_rate": 4.345768880800728e-06,
+ "loss": 0.0988,
+ "step": 14700
+ },
+ {
+ "epoch": 13.466787989080983,
+ "grad_norm": 0.952691912651062,
+ "learning_rate": 4.1031240521686385e-06,
+ "loss": 0.1007,
+ "step": 14800
+ },
+ {
+ "epoch": 13.557779799818016,
+ "grad_norm": 0.4639749825000763,
+ "learning_rate": 3.860479223536548e-06,
+ "loss": 0.1088,
+ "step": 14900
+ },
+ {
+ "epoch": 13.64877161055505,
+ "grad_norm": 1.1107044219970703,
+ "learning_rate": 3.6178343949044588e-06,
+ "loss": 0.1058,
+ "step": 15000
+ },
+ {
+ "epoch": 13.64877161055505,
+ "eval_cer": 0.1514735596467643,
+ "eval_loss": 0.217019721865654,
+ "eval_runtime": 132.0061,
+ "eval_samples_per_second": 37.877,
+ "eval_steps_per_second": 1.189,
+ "eval_wer": 0.29923808992213446,
+ "step": 15000
+ },
+ {
+ "epoch": 13.739763421292084,
+ "grad_norm": 1.299850583076477,
+ "learning_rate": 3.375189566272369e-06,
+ "loss": 0.1036,
+ "step": 15100
+ },
+ {
+ "epoch": 13.830755232029118,
+ "grad_norm": 1.5436201095581055,
+ "learning_rate": 3.1325447376402795e-06,
+ "loss": 0.0973,
+ "step": 15200
+ },
+ {
+ "epoch": 13.921747042766151,
+ "grad_norm": 1.0776885747909546,
+ "learning_rate": 2.8898999090081896e-06,
+ "loss": 0.0963,
+ "step": 15300
+ },
+ {
+ "epoch": 14.012738853503185,
+ "grad_norm": 0.9970951080322266,
+ "learning_rate": 2.6472550803760997e-06,
+ "loss": 0.1104,
+ "step": 15400
+ },
+ {
+ "epoch": 14.103730664240219,
+ "grad_norm": 0.5252935886383057,
+ "learning_rate": 2.40461025174401e-06,
+ "loss": 0.0951,
+ "step": 15500
+ },
+ {
+ "epoch": 14.103730664240219,
+ "eval_cer": 0.15132463392447454,
+ "eval_loss": 0.21604645252227783,
+ "eval_runtime": 132.1387,
+ "eval_samples_per_second": 37.839,
+ "eval_steps_per_second": 1.188,
+ "eval_wer": 0.29856827886467024,
+ "step": 15500
+ },
+ {
+ "epoch": 14.194722474977253,
+ "grad_norm": 1.1974434852600098,
+ "learning_rate": 2.16196542311192e-06,
+ "loss": 0.1019,
+ "step": 15600
+ },
+ {
+ "epoch": 14.285714285714286,
+ "grad_norm": 0.5327410697937012,
+ "learning_rate": 1.9193205944798306e-06,
+ "loss": 0.0968,
+ "step": 15700
+ },
+ {
+ "epoch": 14.376706096451318,
+ "grad_norm": 0.8405203819274902,
+ "learning_rate": 1.6766757658477407e-06,
+ "loss": 0.0941,
+ "step": 15800
+ },
+ {
+ "epoch": 14.467697907188352,
+ "grad_norm": 0.38401368260383606,
+ "learning_rate": 1.4340309372156508e-06,
+ "loss": 0.1036,
+ "step": 15900
+ },
+ {
+ "epoch": 14.558689717925386,
+ "grad_norm": 1.1276684999465942,
+ "learning_rate": 1.191386108583561e-06,
+ "loss": 0.1035,
+ "step": 16000
+ },
+ {
+ "epoch": 14.558689717925386,
+ "eval_cer": 0.15134475902208125,
+ "eval_loss": 0.21339672803878784,
+ "eval_runtime": 132.1196,
+ "eval_samples_per_second": 37.844,
+ "eval_steps_per_second": 1.188,
+ "eval_wer": 0.29856827886467024,
+ "step": 16000
+ },
+ {
+ "epoch": 14.64968152866242,
+ "grad_norm": 0.5266655683517456,
+ "learning_rate": 9.487412799514711e-07,
+ "loss": 0.098,
+ "step": 16100
+ },
+ {
+ "epoch": 14.740673339399454,
+ "grad_norm": 0.8043445348739624,
+ "learning_rate": 7.060964513193813e-07,
+ "loss": 0.1005,
+ "step": 16200
+ },
+ {
+ "epoch": 14.831665150136487,
+ "grad_norm": 1.0907572507858276,
+ "learning_rate": 4.6345162268729147e-07,
+ "loss": 0.1023,
+ "step": 16300
+ },
+ {
+ "epoch": 14.922656960873521,
+ "grad_norm": 2.0936965942382812,
+ "learning_rate": 2.2080679405520171e-07,
+ "loss": 0.0963,
+ "step": 16400
+ },
+ {
+ "epoch": 15.0,
+ "step": 16485,
+ "total_flos": 1.21495045308783e+20,
+ "train_loss": 0.13136799893019088,
+ "train_runtime": 24963.2457,
+ "train_samples_per_second": 21.117,
+ "train_steps_per_second": 0.66
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 16485,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 15,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 1.21495045308783e+20,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }