Inside Constitutional AI: Anthropic's Alignment Architecture Explained

AI Tools Kit

Related Articles

Grouped Query Attention (GQA): How Modern LLMs Shrink KV Cache

How Speculative Decoding Works: Draft Models and 3x Speedup

Inside Model Context Protocol: How MCP Servers Actually Work