Tokenizers, check!

Great job finishing this chapter!

After this deep dive into tokenizers, you should:

Be able to train a new tokenizer using an existing one as a template
Understand how to use offsets to map each token's position back to its span in the original text
Know the differences between BPE, WordPiece, and Unigram
Be able to mix and match the building blocks of the 🤗 Tokenizers library to build your own tokenizer
Be able to use that tokenizer inside the 🤗 Transformers library
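
As a quick refresher on the offsets idea, here is a minimal sketch in plain Python. It is a toy illustration, not the 🤗 Tokenizers API: the helper name `tokenize_with_offsets` and the regex-based splitting rule are assumptions made for this example. It only shows what an offset pair means, namely that each token carries a `(start, end)` pair such that `text[start:end]` recovers the span it came from.

```python
import re


def tokenize_with_offsets(text):
    """Toy tokenizer (hypothetical helper, not a library function).

    Splits on runs of word characters and single punctuation marks,
    and returns (token, (start, end)) pairs where start/end are
    character offsets into the original text.
    """
    return [
        (match.group(), (match.start(), match.end()))
        for match in re.finditer(r"\w+|[^\w\s]", text)
    ]


text = "Hello, tokenizers!"
for token, (start, end) in tokenize_with_offsets(text):
    # Slicing the original text with the offsets gives back the token's span.
    print(f"{token!r} -> text[{start}:{end}] = {text[start:end]!r}")
```

Real subword tokenizers usually produce tokens that differ from the raw substring (for example through normalization or continuation prefixes), which is why mapping through offsets, rather than comparing token strings, is the reliable way to get back to the original text.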