Commit ecfcae6 (verified), committed by jackJessada
1 Parent(s): d4b5785

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -25,7 +25,7 @@ The training corpus consists of the following datasets:
 |----------|-------------|
 | Business & Finance | 736,071,807 |
 | News | 1,700,662,378 |
-| Education | 554,889,778 |
+| Education | 576,489,778 |
 | Social | 211,000,000 |
 | Government | 40,492,117 |
 | Medical | 42,987,587 |
@@ -34,7 +34,6 @@ The training corpus consists of the following datasets:
 | Research Articles | 4,185,649,758 |
 | Law | 467,994,847 |
 | Travel | 6,948,290 |
-| Buddhism | 21,600,000 |
 | Others | 4,410,619 |
 
 *Token counts calculated using Qwen3 Tokenizer
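
The commit message does not say why the Buddhism row was dropped. One possible reading, which is an assumption and not something the commit states, is that its tokens were folded into Education, since the Education increase equals the removed Buddhism count. A minimal sketch of that arithmetic check, using only the figures from the diff (the variable and function names are illustrative):

```python
# Token counts copied from the diff; the names here are hypothetical.
old_education = 554_889_778     # removed Education row
new_education = 576_489_778     # added Education row
removed_buddhism = 21_600_000   # removed Buddhism row


def delta_matches(old: int, removed: int, new: int) -> bool:
    """Return True if the old count plus the removed row equals the new count."""
    return old + removed == new


# Assumption being tested: Buddhism tokens were merged into Education.
print(delta_matches(old_education, removed_buddhism, new_education))
```

If the check holds, the overall token total of the table is unchanged by this commit, which is consistent with a re-categorization and not a change to the corpus itself.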