Adaptive Engram Memory System for Indonesian Language Model: Generative AI Based on TOBA LM for Batak and Minang Language
Paper • 2603.10006
The TOBA model is a trilingual language model based on the GPT-2 architecture with 1.2 billion parameters, trained on a corpus of Indonesian, Batak, and Minangkabau text using syllabic-agglutinative tokenization. Its architecture integrates Engram Memory, an adaptive n-gram-based memory system whose 500,000 × 768 embedding table captures morphological dependencies through bigram and trigram pathways.
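To make the idea of an n-gram memory table more concrete, here is a minimal sketch of how bigrams and trigrams of token IDs could be mapped to rows of a 500,000 × 768 table. The table shape comes from the description above; the polynomial hash, the function names `ngram_slot` and `ngram_slots`, and the sample token IDs are illustrative assumptions, not the model's actual implementation.

```python
# Sketch only: the hashing scheme below is an assumption, not TOBA's real code.
TABLE_ROWS = 500_000  # rows in the memory table (from the model description)
EMBED_DIM = 768       # width of each row (from the model description)


def ngram_slot(ngram, table_rows=TABLE_ROWS):
    """Map a tuple of token IDs to a row index using a simple polynomial hash."""
    h = 0
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % table_rows
    return h


def ngram_slots(token_ids, n, table_rows=TABLE_ROWS):
    """Return one table row index per n-gram window in the token sequence."""
    return [
        ngram_slot(tuple(token_ids[i:i + n]), table_rows)
        for i in range(len(token_ids) - n + 1)
    ]


token_ids = [5, 17, 42, 8]                # hypothetical token IDs
bigram_rows = ngram_slots(token_ids, 2)   # bigram pathway: 3 windows
trigram_rows = ngram_slots(token_ids, 3)  # trigram pathway: 2 windows

# Size of the table itself: 500,000 * 768 = 384,000,000 values.
table_params = TABLE_ROWS * EMBED_DIM
print(len(bigram_rows), len(trigram_rows), table_params)
```

In a real model, each row index would select a 768-dimensional vector that is combined with the hidden state; how TOBA performs that combination, and how the memory adapts, is described in the paper rather than here.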
model.safetensors
Install PyTorch first according to your CPU/CUDA environment, then install the repo requirements:
pip install -r requirements.txt
Single prompt, chat mode:
python infer.py --mode chat --prompt "Horas amang inang saluhutna"
Single prompt, completion mode:
python infer.py --mode completion --prompt "Horas amang inang saluhutna"
Interactive:
python infer.py --interactive --mode chat
Further examples:
python infer.py --prompt "Horas!"
python infer.py --mode completion --prompt "Patorang ma aha do dalihan natolu "
python infer.py --interactive
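The commands above can also be driven from Python, for example when scripting several prompts. The following is a minimal sketch that uses only the flags shown above (`--mode`, `--prompt`, `--interactive`). The helper names `build_infer_command` and `run_infer` are hypothetical, and the sketch assumes that `infer.py` is in the current directory and prints its result to standard output.

```python
import subprocess
import sys


def build_infer_command(prompt=None, mode=None, interactive=False, script="infer.py"):
    """Build the argument list for infer.py from the flags shown above."""
    cmd = [sys.executable, script]
    if interactive:
        cmd.append("--interactive")
    if mode is not None:
        if mode not in ("chat", "completion"):
            raise ValueError("mode must be 'chat' or 'completion'")
        cmd += ["--mode", mode]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    return cmd


def run_infer(prompt, mode=None):
    """Run one prompt and return captured stdout (assumes output goes to stdout)."""
    result = subprocess.run(
        build_infer_command(prompt=prompt, mode=mode),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if infer.py exits with an error
    )
    return result.stdout
```

For instance, `run_infer("Horas!", mode="chat")` would correspond to `python infer.py --mode chat --prompt "Horas!"`. Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or punctuation.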