Enjoying this one in multi-user chat. + laptop perf

#8
by BingoBird - opened

We are enjoying this model in multi-user chat, particularly the relative absence of the overwhelmingly dominant 'assistant' or 'personal moral counselor' behavioral ruts.


Setup: ThinkPad T495, 16 GB RAM, integrated Vega 8 GPU. Side processes kept light: no browser, no media playback, just a performance monitor and terminal-based chat clients.
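In the runs below, -t 5 pins llama-bench to five threads, -p 512 and -n 128 set the prompt-processing and token-generation test lengths, and -ngl 0,30,99 sweeps the number of layers offloaded to the GPU (0 = CPU only, 99 = offload everything):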

$ llama-bench -m LFM2-8B-A1B-US-Q5_K_XL.gguf -t 5  -p 512 -n 128 -ngl 0,30,99 2> /dev/null > myresults.txt
| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q5_K - Medium   |   5.51 GiB |     8.34 B | Vulkan     |   0 |       5 |           pp512 |         78.87 ± 0.50 |
| lfm2moe 8B.A1B Q5_K - Medium   |   5.51 GiB |     8.34 B | Vulkan     |   0 |       5 |           tg128 |         14.11 ± 0.12 |
| lfm2moe 8B.A1B Q5_K - Medium   |   5.51 GiB |     8.34 B | Vulkan     |  30 |       5 |           pp512 |        109.15 ± 0.85 |
| lfm2moe 8B.A1B Q5_K - Medium   |   5.51 GiB |     8.34 B | Vulkan     |  30 |       5 |           tg128 |         17.09 ± 0.08 |
| lfm2moe 8B.A1B Q5_K - Medium   |   5.51 GiB |     8.34 B | Vulkan     |  99 |       5 |           pp512 |        108.31 ± 0.73 |
| lfm2moe 8B.A1B Q5_K - Medium   |   5.51 GiB |     8.34 B | Vulkan     |  99 |       5 |           tg128 |         17.08 ± 0.04 |

$ llama-bench -m LFM2-8B-A1B-Q4_K_S.gguf -t 5  -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt

| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| lfm2moe 8B.A1B Q4_K - Small    |   4.42 GiB |     8.34 B | Vulkan     |   0 |       5 |           pp512 |         84.27 ± 0.59 |
| lfm2moe 8B.A1B Q4_K - Small    |   4.42 GiB |     8.34 B | Vulkan     |   0 |       5 |           tg128 |         17.75 ± 0.09 |
| lfm2moe 8B.A1B Q4_K - Small    |   4.42 GiB |     8.34 B | Vulkan     |  30 |       5 |           pp512 |        111.52 ± 0.91 |
| lfm2moe 8B.A1B Q4_K - Small    |   4.42 GiB |     8.34 B | Vulkan     |  30 |       5 |           tg128 |         22.48 ± 0.08 |
| lfm2moe 8B.A1B Q4_K - Small    |   4.42 GiB |     8.34 B | Vulkan     |  99 |       5 |           pp512 |        111.47 ± 0.95 |
| lfm2moe 8B.A1B Q4_K - Small    |   4.42 GiB |     8.34 B | Vulkan     |  99 |       5 |           tg128 |         22.46 ± 0.08 |

$ llama-bench -m llama-2-7b.Q4_K_M.gguf -t 5  -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt

| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| llama 7B Q4_K - Medium         |   3.80 GiB |     6.74 B | Vulkan     |   0 |       5 |           pp512 |         29.30 ± 0.09 |
| llama 7B Q4_K - Medium         |   3.80 GiB |     6.74 B | Vulkan     |   0 |       5 |           tg128 |          4.96 ± 0.06 |
| llama 7B Q4_K - Medium         |   3.80 GiB |     6.74 B | Vulkan     |  30 |       5 |           pp512 |         30.17 ± 0.24 |
| llama 7B Q4_K - Medium         |   3.80 GiB |     6.74 B | Vulkan     |  30 |       5 |           tg128 |          4.80 ± 0.01 |
| llama 7B Q4_K - Medium         |   3.80 GiB |     6.74 B | Vulkan     |  99 |       5 |           pp512 |         30.57 ± 0.03 |
| llama 7B Q4_K - Medium         |   3.80 GiB |     6.74 B | Vulkan     |  99 |       5 |           tg128 |          5.08 ± 0.01 |

$ llama-bench -m Qwen3-4B-Instruct-2507-Q4_K_M.gguf -t 5  -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt

| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| qwen3 4B Q4_K - Medium         |   2.32 GiB |     4.02 B | Vulkan     |   0 |       5 |           pp512 |         50.29 ± 0.28 |
| qwen3 4B Q4_K - Medium         |   2.32 GiB |     4.02 B | Vulkan     |   0 |       5 |           tg128 |          5.65 ± 0.15 |
| qwen3 4B Q4_K - Medium         |   2.32 GiB |     4.02 B | Vulkan     |  30 |       5 |           pp512 |         50.94 ± 0.09 |
| qwen3 4B Q4_K - Medium         |   2.32 GiB |     4.02 B | Vulkan     |  30 |       5 |           tg128 |          6.78 ± 0.03 |
| qwen3 4B Q4_K - Medium         |   2.32 GiB |     4.02 B | Vulkan     |  99 |       5 |           pp512 |         53.77 ± 0.21 |
| qwen3 4B Q4_K - Medium         |   2.32 GiB |     4.02 B | Vulkan     |  99 |       5 |           tg128 |          7.55 ± 0.01 |

$ llama-bench -m granite-4.0-h-tiny-Q4_K_M.gguf -t 5  -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt

| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| granitehybrid 1B Q4_K - Medium |   3.96 GiB |     6.94 B | Vulkan     |   0 |       5 |           pp512 |         61.95 ± 1.04 |
| granitehybrid 1B Q4_K - Medium |   3.96 GiB |     6.94 B | Vulkan     |   0 |       5 |           tg128 |          8.63 ± 0.04 |
| granitehybrid 1B Q4_K - Medium |   3.96 GiB |     6.94 B | Vulkan     |  30 |       5 |           pp512 |         47.91 ± 0.17 |
| granitehybrid 1B Q4_K - Medium |   3.96 GiB |     6.94 B | Vulkan     |  30 |       5 |           tg128 |         11.26 ± 0.13 |
| granitehybrid 1B Q4_K - Medium |   3.96 GiB |     6.94 B | Vulkan     |  99 |       5 |           pp512 |         90.32 ± 2.17 |
| granitehybrid 1B Q4_K - Medium |   3.96 GiB |     6.94 B | Vulkan     |  99 |       5 |           tg128 |         13.36 ± 0.05 |

build: e1f15b454 (7502)
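
For repeat runs, the five invocations above can be collapsed into a single loop; a minimal sketch, assuming the GGUF files sit in the current directory under the names used above:

$ for m in LFM2-8B-A1B-US-Q5_K_XL.gguf LFM2-8B-A1B-Q4_K_S.gguf llama-2-7b.Q4_K_M.gguf Qwen3-4B-Instruct-2507-Q4_K_M.gguf granite-4.0-h-tiny-Q4_K_M.gguf; do
      # same settings as the runs above; append each table to one results file
      llama-bench -m "$m" -t 5 -p 512 -n 128 -ngl 0,30,99 2> /dev/null >> myresults.txt
  done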

This MoE model is in another league compared to other models runnable on this laptop.
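To put numbers on it: at full offload (-ngl 99), the Q4_K_S quant generates at 22.46 t/s, roughly 4.4x llama-2-7b Q4_K_M (5.08 t/s), 3x Qwen3-4B (7.55 t/s), and 1.7x Granite 4.0 H Tiny (13.36 t/s), with prompt processing similarly ahead (111.47 t/s vs. 30.57, 53.77, and 90.32).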

It's a truly great gift to everyone. Thank you.

Liquid AI org

Awesome, thanks a lot for your message! We're working on new models for the LFM2.5 generation. I hope you'll like them. :)

I would love to see a scaled-up version of this model with more mid/late attention layers. I think there's a lot of room for powerful edge sMoEs (sparse MoEs) in the 12-24B parameter range.
