

GLM-5-MLX-4.8bit

NOTICE: This model has been superseded by GLM5.1-Q4.8-INF.

See GLM-5 MLX in action: demonstration video

Originally tested on an M3 Ultra with 512 GB of RAM using the Inferencer app

Single inference: ~16.6 tokens/s @ 1000 tokens
Batched inference: ~31.8 total tokens/s across six inferences
Memory usage: ~417 GiB
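
To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the batched throughput is split evenly across the six concurrent inferences and that rates stay constant over a generation; neither is stated in the card, so treat the results as estimates only. The function names are hypothetical helpers, not part of any library.

```python
# Approximate figures quoted in this model card.
SINGLE_TOKENS_PER_S = 16.6         # one inference at a time
BATCHED_TOTAL_TOKENS_PER_S = 31.8  # summed over all concurrent inferences
BATCH_SIZE = 6                     # "six inferences"
MEMORY_GIB = 417                   # reported memory usage


def per_stream_rate(total_tokens_per_s: float, streams: int) -> float:
    """Tokens/s seen by each stream, ASSUMING an even split across streams."""
    return total_tokens_per_s / streams


def seconds_for(tokens: int, tokens_per_s: float) -> float:
    """Time to generate `tokens` tokens, ASSUMING a constant rate."""
    return tokens / tokens_per_s


def gib_to_gb(gib: float) -> float:
    """Convert binary gibibytes (2**30 bytes) to decimal gigabytes (10**9 bytes)."""
    return gib * 2**30 / 1e9


per_stream = per_stream_rate(BATCHED_TOTAL_TOKENS_PER_S, BATCH_SIZE)
aggregate_speedup = BATCHED_TOTAL_TOKENS_PER_S / SINGLE_TOKENS_PER_S
memory_gb = gib_to_gb(MEMORY_GIB)

print(f"Per-stream rate when batched: ~{per_stream:.1f} tokens/s")
print(f"Aggregate speedup from batching: ~{aggregate_speedup:.2f}x")
print(f"1000 tokens, single inference: ~{seconds_for(1000, SINGLE_TOKENS_PER_S):.0f} s")
print(f"1000 tokens, one batched stream: ~{seconds_for(1000, per_stream):.0f} s")
print(f"Memory usage in decimal units: ~{memory_gb:.0f} GB")
```

Under these assumptions, batching roughly doubles total throughput while each individual response arrives more slowly, so it suits workloads where several requests can run in parallel.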

