June 25, 2026 · AI
Speeding up LLM inference with MTP and diffusion
MTP and diffusion inference on Gemma 4 and Qwen 3.6, fp8 on one H100
May 27, 2026 · AI
Why spec-driven development is a requirements-inference architecture
May 27, 2026 · AI
Why skills and harnesses overlap in implementation
May 26, 2026 · AI
How harness design determines whether agents actually adapt
May 22, 2026 · AI
Why agent memory moved from RAG storage to the foundation of policy adaptation
May 22, 2026 · AI
Weights, prompts, and code as parameters at different layers of a learnable policy space
May 21, 2026 · AI
Eight graph engines measured across OLTP, memory, analytics, and differentiation queries
May 21, 2026 · AI
Source-level comparison of RAG-Anything, ApeRAG, and EdgeQuake as LightRAG derivatives
May 20, 2026 · AI
Five PDF-to-Markdown converters (markitdown, pdftotext, pymupdf, mineru, opendataloader-pdf) scored against a seven-criterion 100-point rubric
May 20, 2026 · AI
Resume the same Claude Code session across accounts by pointing each account's projects directory at one shared physical path
May 20, 2026 · AI
Benchmarking Qwen3.5-9B on Apple Silicon across MLX, llama.cpp, Ollama, omlx, and vLLM Metal: single-request throughput, prefill scaling, decode speed vs. input length, and behavior under concurrency
May 20, 2026 · AI
Implementing LightRAG's BaseGraphStorage on plain PostgreSQL with RCTE — why a 1-hop-dominant retrieval pattern fits flat SQL
May 20, 2026 · AI
A Python SDK over the codex app-server stdio interface — install, first call, thread model, main methods
May 20, 2026 · AI
Run isolated Claude Code accounts on the same Mac with one env var and a small zsh function
May 20, 2026 · AI
NIAH limits, the Lost in the Middle effect, alternative benchmarks, and measured recall across four reasoning-effort modes
May 20, 2026 · AI
NVIDIA's build.nvidia.com offers 100+ models on H100 infrastructure for free. Plug it directly into Claude Code, Cursor, or any OpenAI-compatible coding agent.
May 20, 2026 · AI
Sync matrix, migration steps, common failures, and diagnostic commands for running multiple Claude Code accounts on one machine
May 20, 2026 · AI
On a 1.14M-edge knowledge-graph workload, PostgreSQL RCTE beats Apache AGE by 290x: tracing the cypher() wrapper's 13 ms cost and accumulated PostgreSQL plan-generation overhead
May 15, 2026 · AI
I ran 30 trials per configuration across GPT-4, GPT-5, and o-series models on three reasoning problems. gpt-5-nano with minimal reasoning effort scored 4.4%. o1 scored lower than gpt-4o.
May 15, 2026 · AI
How to easily call modern AI models such as GPT-4.1 and DeepSeek R1 through GitHub's free model inference API
May 15, 2026 · AI
An experimental record of implementing web search with GPT-5 through OpenAI's Responses API, focusing on how tool support differs between models and how parameters shape responses, with particular attention to web search tool compatibility between gpt-5 and gpt-5-chat-latest.