DEV Community

# inference

Posts

KV Cache Is Eating Your VRAM — Here's How to Estimate It Before You Run Out

6 min read
I Benchmarked Speculative Decoding — a = 3.5 Wasn't Enough

7 min read
Lossless, But Not Free: When Speculative Decoding Actually Pays Off (and When It Doesn't)

6 min read
96% of cuBLAS, no `unsafe`: what cuTile Rust proves

8 min read
Extract Structured JSON from Messy Text with Telnyx AI Inference

2 min read
Running LLMs on an iGPU: VRAM Limits of Intel Arc and Radeon 780M

3 min read
How to Build a Secure Homelab for LLM Inference

4 min read
Google's DiffusionGemma Generates Text Sideways

3 min read
Sipp: a local-first runtime for Hybrid AI Applications

11 min read
Speculative decoding: when and why it actually speeds up inference

9 min read
Can You Tell When an LLM API Swaps in a Cheaper Model?

3 min read
ReFlect: Training-Free Error Recovery for Long-Horizon LLM Reasoning

4 min read
Why Most Browser AI Demos Fail on Real Hardware

4 min read
The Inference Inversion

7 min read
First Confirmed Directional Move on the AI Inference Frontier Index in 2026

4 min read