Feasibility of 32B language model base
This model is super promising, given the vision encoder used and the strength of the recent Qwen3 models. I'm very curious how much difficulty/time it would take to use Qwen3-32B as the language model for this architecture. Given the performance of this 8B model, I have to imagine a larger-parameter model built with this architecture/training method would be open-source SOTA.
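For context on what I mean by "swapping the backbone," here's a rough sketch of the kind of wiring I'm imagining, using Hugging Face transformers. To be clear, this is just an illustration under my own assumptions: the SigLIP checkpoint name, the plain linear projector, and the splice-in step are guesses on my part, not Keye's actual design.

```python
# Minimal sketch (NOT the actual Keye implementation): pairing a SigLIP-style
# vision encoder with Qwen3-32B as the text backbone via a linear projector.
# The checkpoint names, projector design, and token-merging details are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, SiglipVisionModel

# Larger Qwen3 backbone in place of the 8B language model.
llm = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B", torch_dtype=torch.bfloat16
)

# Stand-in vision encoder; which encoder Keye actually uses is an assumption here.
vision = SiglipVisionModel.from_pretrained(
    "google/siglip-so400m-patch14-384", torch_dtype=torch.bfloat16
)

# Project vision features into the LLM embedding space
# (hidden sizes are read from the two configs, so this adapts to either backbone).
projector = nn.Linear(
    vision.config.hidden_size, llm.config.hidden_size, dtype=torch.bfloat16
)

def encode_image(pixel_values: torch.Tensor) -> torch.Tensor:
    """Return image tokens of shape (batch, num_patches, llm_hidden_size),
    ready to be spliced into the LLM input embeddings at the image placeholders."""
    patch_embeds = vision(pixel_values=pixel_values).last_hidden_state
    return projector(patch_embeds)
```

Obviously the wiring itself is the easy part; I assume the real cost would be re-running the multimodal training stages with the 32B backbone, which is exactly what I'm curious about.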
Thanks for raising this in the community — we've received your question and shared it with our algorithm team.
I’m not the best person to answer the details, so I don’t want to give you an inaccurate reply. The team is currently looking into it, and we’ll follow up with a more detailed response as soon as we have an update.
In the meantime, if you can share any extra info below, it will help us speed up the investigation:
model/version
your prompt + expected vs actual output
a minimal reproducible example (sample input; redacted is OK)
logs / request_id
As a thank-you for helping us improve Keye, we can offer:
a small Kwai merch gift (currently we can only ship within Mainland China due to logistics/policy restrictions)
early access to upcoming Keye model updates/features
If you’d like either, just let us know (and for merch, you can share the shipping info later after we confirm).
For faster follow-up, you’re also welcome to join our communities:
Discord: https://discord.gg/4Q6AmzxpEK
WeChat ID: seeutomorrowo_O
Sorry again for the delay, and thanks for your patience — we’ll get back to you soon.