Roofline Studio — GPU & Workload Arithmetic Intensity

Peak = dense tensor-core throughput (no sparsity speedup). GEMM bytes = one read each of A and B plus one write of C. Attention bytes = FlashAttention-style ideal HBM traffic (read Q, K, V once; write O once).
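
The accounting above can be sketched in a few lines. This is a minimal illustration, not Roofline Studio's actual code: the function names are hypothetical, and it assumes 2-byte (FP16/BF16) elements, a single attention head with no causal masking, softmax FLOPs ignored, and placeholder GPU numbers that do not describe any real part.

```python
def gemm_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2 * m * n * k                                   # one multiply + one add per MAC
    bytes_moved = (m * k + k * n + m * n) * bytes_per_elem  # read A, B; write C
    return flops / bytes_moved


def attention_intensity(seq_len, head_dim, bytes_per_elem=2):
    """FLOPs per byte for one attention head under ideal HBM traffic."""
    # Q @ K^T and P @ V each cost 2 * S * S * d FLOPs (softmax ignored here).
    flops = 4 * seq_len * seq_len * head_dim
    # Q, K, V are each read once and O is written once: 4 tensors of S * d elements.
    bytes_moved = 4 * seq_len * head_dim * bytes_per_elem
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * bandwidth)


# Placeholder hardware numbers, for illustration only.
PEAK_FLOPS = 1e15   # dense tensor-core FLOP/s (assumed)
BANDWIDTH = 2e12    # HBM bytes/s (assumed)

ridge_point = PEAK_FLOPS / BANDWIDTH  # intensity at which the two roofs meet
gemm_bound = attainable_flops(gemm_intensity(4096, 4096, 4096), PEAK_FLOPS, BANDWIDTH)
attn_bound = attainable_flops(attention_intensity(4096, 128), PEAK_FLOPS, BANDWIDTH)
```

Under these assumptions the attention intensity reduces to `seq_len / bytes_per_elem`, so it grows with sequence length and does not depend on head dimension. A workload whose intensity falls below `ridge_point` is bandwidth-bound; above it, the workload is compute-bound.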

GPU Roofline