DEV Community

# transformers

Posts

How Modern Transformer Blocks Work — From RMSNorm to MoE

5 min read
Why Positional Embeddings Matter — APE, RPE, and RoPE Explained for Developers

5 min read
🧠 The Direction of AI Development: Has It Already Hit Its Limit?

1 min read
Why KV Cache Matters — How MQA, GQA, and MLA Make LLM Inference Faster

5 min read
Why Attention Becomes the Bottleneck — And How Efficient Attention Fixes It

3 min read
How Transformer Architecture Works — Encoder, Decoder, Tokens, and Context

6 min read
Attention Is All You Need, Building a Transformer for Thanglish-to-Tamil

3 min read
Someone Is Taking the Transformer Apart: Memory Caching and CTM Each Carried Off Half

3 min read
Flash Attention: what it does and why it matters

8 min read
How Self-Attention Works — QKV, Softmax, and Matrix Computation

5 min read
How Attention Actually Works — From Next-Token Prediction to QKV Intuition

3 min read
MoE Architectures Keep Solving the Wrong Problem

3 min read
Chapter 12: Inference - Generating New Text

9 min read
Chapter 11: The Full GPT - Assembling the Model

10 min read
Chapter 9: Single-Head Attention - Tokens Looking at Each Other

9 min read