GLM-5-MLX-4.8bit

NOTICE - This model has been superseded by GLM5.1-Q4.8-INF: available here

See GLM-5 MLX in action: demonstration video

Originally tested on an M3 Ultra with 512 GB of RAM using the Inferencer app:

  • Single inference ~16.6 tokens/s @ 1000 tokens
  • Batched inference ~31.8 total tokens/s across six inferences
  • Memory usage: ~417 GiB
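
To put the two throughput figures side by side, the short Python sketch below works out the per-request rate under batching and the aggregate gain over a single inference. It is only arithmetic on the numbers quoted above. It assumes the batched total is shared evenly across the six inferences and that the rates stay roughly constant over a generation; neither is stated in the measurements, and the variable and function names are illustrative.

```python
# Figures quoted in the benchmark list above.
single_tps = 16.6         # tokens/s, single inference (measured at 1000 tokens)
batched_total_tps = 31.8  # tokens/s, summed across the batch
batch_size = 6            # number of concurrent inferences

# ASSUMPTION: the batched total is split evenly across the six inferences.
per_request_tps = batched_total_tps / batch_size

# Aggregate throughput gain from batching relative to one inference.
aggregate_gain = batched_total_tps / single_tps


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Rough generation time, ASSUMING a constant token rate."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    print(f"per-request rate when batched: {per_request_tps:.1f} tokens/s")
    print(f"aggregate gain from batching:  {aggregate_gain:.2f}x")
    print(f"1000 tokens, single inference: {seconds_for(1000, single_tps):.0f} s")
    print(f"1000 tokens, each of six:      {seconds_for(1000, per_request_tps):.0f} s")
```

Under these assumptions, batching roughly doubles total output while each individual request runs at about a third of the single-inference speed, so it suits workloads where aggregate throughput matters more than per-request latency.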

Model tree for inferencerlabs/archived-GLM-5-MLX-4.8bit

Base model: zai-org/GLM-5 (this model is listed among its 38 fine-tuned derivatives)