GPU Roofline

Each GPU is drawn as a roofline: a memory-bound ramp that rises with arithmetic intensity until it meets a compute-bound ceiling. Each workload is a vertical line at its arithmetic intensity (FLOPs per byte of HBM traffic). Hover over an intersection for the attainable-performance breakdown.
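As a rough sketch of what each intersection represents, the standard roofline model takes the smaller of the memory-bound ramp (bandwidth times intensity) and the compute ceiling (peak throughput). The function names and the GPU numbers below are hypothetical, chosen only to keep the arithmetic easy to follow; they are not taken from the chart or from any real GPU.

```python
def attainable_tflops(intensity, peak_tflops, bandwidth_tbs):
    """Roofline value at a given arithmetic intensity.

    intensity     : FLOPs per byte of memory traffic
    peak_tflops   : compute ceiling in TFLOP/s
    bandwidth_tbs : memory bandwidth in TB/s
    (TB/s * FLOP/byte = TFLOP/s, so the units line up.)
    """
    memory_ramp = bandwidth_tbs * intensity
    return min(peak_tflops, memory_ramp)


def ridge_point(peak_tflops, bandwidth_tbs):
    """Intensity at which the ramp meets the ceiling."""
    return peak_tflops / bandwidth_tbs


# Hypothetical GPU: 1000 TFLOP/s peak, 2 TB/s bandwidth (illustrative only).
PEAK, BW = 1000.0, 2.0

print(ridge_point(PEAK, BW))              # 500.0 FLOP/byte
print(attainable_tflops(100, PEAK, BW))   # 200.0, memory-bound
print(attainable_tflops(1000, PEAK, BW))  # 1000.0, compute-bound
```

A workload to the left of the ridge point is limited by bandwidth; one to the right is limited by peak compute.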

Peak = dense tensor-core throughput (no sparsity). GEMM bytes = read A and B + write C. Attention bytes = FlashAttention-style ideal HBM traffic (read Q, K, V; write O).
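The byte-counting conventions above can be turned into arithmetic intensities along the following lines. This is a minimal sketch under stated assumptions, not necessarily how the chart computes its values: GEMM is C = A·B with shapes (M×K)·(K×N), every tensor moves through HBM exactly once, attention is a single non-causal forward head with equal query and key lengths, and softmax FLOPs are ignored. The function names are hypothetical.

```python
def gemm_intensity(m, n, k, bytes_per_elem):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2 * m * n * k                            # one multiply + one add per term
    traffic = (m * k + k * n + m * n) * bytes_per_elem  # read A, B; write C
    return flops / traffic


def attention_intensity(seq_len, head_dim, bytes_per_elem):
    """FLOPs per byte for one attention head with ideal HBM traffic.

    Assumes Q, K, V are read once and O is written once, and that the
    seq_len x seq_len score matrix never touches HBM.
    """
    flops = 4 * seq_len * seq_len * head_dim         # Q @ K^T plus P @ V
    traffic = 4 * seq_len * head_dim * bytes_per_elem   # Q, K, V, O
    return flops / traffic


# 2-byte elements (e.g. FP16 or BF16)
print(gemm_intensity(4096, 4096, 4096, 2))     # about 1365.3 FLOP/byte
print(attention_intensity(4096, 128, 2))       # 2048.0 FLOP/byte
```

Under these assumptions the attention intensity simplifies to seq_len / bytes_per_elem, so it grows with sequence length and is independent of head dimension.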