AI Posts | Latent Space

June 25, 2026 · AI

Speeding up LLM inference with MTP and diffusion

MTP and diffusion inference on Gemma 4 and Qwen 3.6, fp8 on one H100

#LLM #vLLM #MTP #Speculative Decoding

May 27, 2026 · AI

Requirements as Latent State

Why spec-driven development is a requirements-inference architecture

#AI #Agent #Requirements #Spec-Driven

May 27, 2026 · AI

Skill and Harness

Why skills and harnesses overlap in implementation

#AI #Agent #Harness #Skills

May 26, 2026 · AI

Harness as Environment

How harness design determines whether agents actually adapt.

#AI #Agent #Harness #Memory

May 22, 2026 · AI

Memory as Adaptation

Why agent memory moved from RAG storage to the foundation of policy adaptation

#AI #Agent #Memory #RAG

May 22, 2026 · AI

Weights, Prompts, Codes as Parameters

Weights, prompts, and code as parameters at different layers of a learnable policy space

#AI #Agent #LLM #POMDP

May 21, 2026 · AI

GraphDB Benchmark (2/2) — Workload Matrix and Final Recommendations

Eight graph engines measured across OLTP, memory, analytics, and differentiation queries

#GraphDB #PostgreSQL #RCTE #Neo4j

May 21, 2026 · AI

Comparing Four LightRAG Variants — Same Root, Different Production Strategies

Source-level comparison of RAG-Anything, ApeRAG, and EdgeQuake as LightRAG derivatives

#LightRAG #RAG-Anything #ApeRAG #EdgeQuake

May 20, 2026 · AI

PDF to Markdown — Five Tools Compared

Five PDF-to-Markdown converters (markitdown, pdftotext, pymupdf, mineru, opendataloader-pdf) scored against a seven-criterion 100-point rubric

#PDF #Markdown #RAG #Ingestion

May 20, 2026 · AI

Sharing Claude Code Sessions via Symlinked .jsonl

Resume the same Claude Code session across accounts by pointing every projects directory at one shared physical path

#Claude Code #Multi-account #Session Sharing #Symlink

May 20, 2026 · AI

Apple Silicon LLM Inference — Five Backends Compared

Benchmarking Qwen3.5-9B on Apple Silicon across MLX, llama.cpp, Ollama, omlx, and vLLM Metal — single-request throughput, prefill scaling, decode vs input length, and concurrency response

#LLM #Apple Silicon #MLX #llama.cpp

May 20, 2026 · AI

LightRAG Without Apache AGE — Graph Storage in Recursive CTE

Implementing LightRAG's BaseGraphStorage on plain PostgreSQL with RCTE — why a 1-hop-dominant retrieval pattern fits flat SQL

#LightRAG #PostgreSQL #RAG #GraphRAG

May 20, 2026 · AI

Codex App Server Python SDK — JSON-RPC v2 over stdio

A Python SDK over the codex app-server stdio interface — install, first call, thread model, main methods

#Codex #OpenAI #Python SDK #JSON-RPC

May 20, 2026 · AI

Running Multiple Claude Code Accounts on One Machine

Run isolated Claude Code accounts on the same Mac with one env var and a small zsh function

#Claude Code #Multi-account #zsh #Dotfiles

May 20, 2026 · AI

Long-Context Evaluation — NIAH and Lost in the Middle

NIAH limits, the Lost in the Middle effect, alternative benchmarks, and measured recall across four reasoning-effort modes

#LLM #Long-context #NIAH #Benchmark

May 20, 2026 · AI

NVIDIA NIM API — Free Inference for GLM, Kimi, Nemotron, and Gemma 4

NVIDIA's build.nvidia.com offers 100+ models on H100 infrastructure for free. Plug it directly into Claude Code, Cursor, or any OpenAI-compatible coding agent.

#NVIDIA #NIM #AI #API

May 20, 2026 · AI

Claude Code Settings Sync and Troubleshooting

Sync matrix, migration steps, common failures, and diagnostic commands for running multiple Claude Code accounts on one machine

#Claude Code #Multi-account #Settings Sync #Troubleshooting

May 20, 2026 · AI

GraphDB Benchmark, Eight Engines (Part 1) — Decomposing the RCTE 290x Gap

On a 1.14M-edge knowledge-graph workload, PostgreSQL RCTE beats Apache AGE by 290x. The post traces the gap to the cypher() wrapper's 13 ms cost and the accumulated overhead of PostgreSQL plan generation

#GraphDB #PostgreSQL #RCTE #Apache AGE

May 15, 2026 · AI

10 OpenAI Models Through Quick Benchmarks — The Model Isn't as Smart as You Pay

I ran 30 trials per configuration across GPT-4, GPT-5, and o-series models using three reasoning problems. gpt-5-nano on minimal scored 4.4%. o1 scored lower than gpt-4o.

#OpenAI #GPT #model comparison #AI

May 15, 2026 · AI

GitHub Models Inference API — Free Model Access Tested

How to easily call modern AI models such as GPT-4.1 and DeepSeek R1 through GitHub's free model inference API

#GitHub #Inference API #AI #API

May 15, 2026 · AI

Experimenting with the GPT-5 Responses API Web Search Tool

An experimental record of implementing web search with OpenAI's GPT-5 Responses API, focused on tool-support differences between models and how parameters shape responses. The analysis centers on web search tool compatibility between gpt-5 and gpt-5-chat-latest.

#AI #GPT-5 #API #web search