antirez/ds4: DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm is an interesting project that aims to run DeepSeek (specifically Q2 quantizations) as efficiently as possible on various machines with unified memory (UMA). Supported platforms are MacBooks, the DGX Spark, and Strix Halo machines via ROCm. Quite cool if it works: DeepSeek V4 Flash is already a good base model for various purposes, for example as the main model for Hermes.