Enjoying this one in multi-user chat. + laptop perf
#8
by BingoBird - opened
We are enjoying this model in multi-user chat, particularly the relative absence of the mind-numbingly dominant 'assistant' or 'personal moral counselor' behavioral ruts.
BingoBird changed discussion title from "Enjoying this one in multi-user chat." to "Enjoying this one in multi-user chat. + laptop perf"
On a ThinkPad T495 with 16 GB RAM and an integrated Vega 8 GPU.
Side processes: no browser, no media playback, only light performance monitoring and terminal-based chat clients:
$ llama-bench -m LFM2-8B-A1B-US-Q5_K_XL.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null > myresults.txt
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 0 | 5 | pp512 | 78.87 ± 0.50 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 0 | 5 | tg128 | 14.11 ± 0.12 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 30 | 5 | pp512 | 109.15 ± 0.85 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 30 | 5 | tg128 | 17.09 ± 0.08 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 99 | 5 | pp512 | 108.31 ± 0.73 |
| lfm2moe 8B.A1B Q5_K - Medium | 5.51 GiB | 8.34 B | Vulkan | 99 | 5 | tg128 | 17.08 ± 0.04 |
$ llama-bench -m LFM2-8B-A1B-Q4_K_S.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 0 | 5 | pp512 | 84.27 ± 0.59 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 0 | 5 | tg128 | 17.75 ± 0.09 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 30 | 5 | pp512 | 111.52 ± 0.91 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 30 | 5 | tg128 | 22.48 ± 0.08 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 99 | 5 | pp512 | 111.47 ± 0.95 |
| lfm2moe 8B.A1B Q4_K - Small | 4.42 GiB | 8.34 B | Vulkan | 99 | 5 | tg128 | 22.46 ± 0.08 |
$ llama-bench -m llama-2-7b.Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 0 | 5 | pp512 | 29.30 ± 0.09 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 0 | 5 | tg128 | 4.96 ± 0.06 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 30 | 5 | pp512 | 30.17 ± 0.24 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 30 | 5 | tg128 | 4.80 ± 0.01 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 99 | 5 | pp512 | 30.57 ± 0.03 |
| llama 7B Q4_K - Medium | 3.80 GiB | 6.74 B | Vulkan | 99 | 5 | tg128 | 5.08 ± 0.01 |
$ llama-bench -m Qwen3-4B-Instruct-2507-Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 0 | 5 | pp512 | 50.29 ± 0.28 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 0 | 5 | tg128 | 5.65 ± 0.15 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 30 | 5 | pp512 | 50.94 ± 0.09 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 30 | 5 | tg128 | 6.78 ± 0.03 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 99 | 5 | pp512 | 53.77 ± 0.21 |
| qwen3 4B Q4_K - Medium | 2.32 GiB | 4.02 B | Vulkan | 99 | 5 | tg128 | 7.55 ± 0.01 |
$ llama-bench -m granite-4.0-h-tiny-Q4_K_M.gguf -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
| model | size | params | backend | ngl | threads | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 0 | 5 | pp512 | 61.95 ± 1.04 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 0 | 5 | tg128 | 8.63 ± 0.04 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 30 | 5 | pp512 | 47.91 ± 0.17 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 30 | 5 | tg128 | 11.26 ± 0.13 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 99 | 5 | pp512 | 90.32 ± 2.17 |
| granitehybrid 1B Q4_K - Medium | 3.96 GiB | 6.94 B | Vulkan | 99 | 5 | tg128 | 13.36 ± 0.05 |
build: e1f15b454 (7502)
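For anyone reproducing this: the five runs above are the same flags looped over the model files. A minimal sketch, assuming `llama-bench` from the llama.cpp build above is on PATH and the .gguf files sit in the working directory:

```bash
#!/usr/bin/env bash
# Sweep of the models benchmarked above: 5 CPU threads, 512-token prompt
# (pp512), 128-token generation (tg128), at three GPU offload levels:
# none (-ngl 0), partial (30), and full (99).
MODELS=(
  LFM2-8B-A1B-US-Q5_K_XL.gguf
  LFM2-8B-A1B-Q4_K_S.gguf
  llama-2-7b.Q4_K_M.gguf
  Qwen3-4B-Instruct-2507-Q4_K_M.gguf
  granite-4.0-h-tiny-Q4_K_M.gguf
)
: > myresults.txt  # truncate, then append one table per model
for m in "${MODELS[@]}"; do
  llama-bench -m "$m" -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
done
```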
This MoE model is in another league compared to the other models runnable on this laptop: at full offload (-ngl 99), the Q4_K_S quant generates at 22.46 t/s, roughly 4x Llama 2 7B (5.08 t/s) and 3x Qwen3 4B (7.55 t/s) on the same hardware.
It's a truly great gift to everyone. Thank you.
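For a quick side-by-side of generation speed, the tg128 rows can be pulled out of the combined results file; a minimal sketch, assuming myresults.txt holds the tables above:

```bash
# Print only the generation-speed (tg128) rows from the combined results,
# one line per model/offload setting, so the t/s figures line up directly.
grep tg128 myresults.txt
```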
Awesome, thanks a lot for your message! We're working on new models for the LFM2.5 generation. I hope you'll like them. :)
I would love to see a scale-up of this model with more mid/late attention layers. I think there's a lot of room for powerful edge sMoEs in the 12-24B parameter range.