oMLX — LLM Inference, Optimized for Your Mac

aside llm mac-os-x

oMLX is a bit like my MLX Server, but with a stronger focus on efficient execution and, as an extra, a persisted KV cache so that the cache for earlier prompt prefixes can be reloaded later. I may try it on my upcoming MacBook Pro to see whether it is useful for me; the example with Qwen 3.5 122B A10B looks particularly promising, since that is currently my favorite model on my DGX.