#MLX | Latent Space

Jaesol Shin

Towards observable, reliable, scalable AI

Apple Silicon LLM Inference — Five Backends Compared

Benchmarking Qwen3.5-9B on Apple Silicon across MLX, llama.cpp, Ollama, omlx, and vLLM Metal: single-request throughput, prefill scaling, decode throughput versus input length, and behavior under concurrent requests

#LLM #Apple Silicon #MLX #llama.cpp
