Visualising LLMs with Open Hypergraphs and Catgrad
Let's visualise some LLM architectures! In this blog post, I'll show you how to generate these diagrams using open hypergraphs and catgrad. We'll produce three examples, including the attention diagram above and a huge SVG of every single op in the Qwen architecture.
Open Hypergraphs
Let's start with a simple example -- a residual connection around a linear layer. In catgrad, layers and architectures are represented as Open Hypergraphs: a data structure for representing "circuit-like" syntax. Here's the code defining our layer:
pub fn residual(x: Var<NdArrayType, Operation>) -> Var<NdArrayType, Operation> {
    linear_layer("linear", x.clone()) + x
}
Under the hood, this constructs an open hypergraph which we can visualise with the open-hypergraphs-dot library:
Let's break this down:
1. Nodes are depicted as black circles ● labeled with an array shape (e.g., [8, 8]).
2. Hyperedges are depicted as boxes with multiple inputs and outputs. They correspond to operations like MatrixMultiply.
3. Some nodes are designated as inputs and outputs: these are depicted as dashed, open-ended lines.

Point (3) is why these are open hypergraphs, and not just hypergraphs.
Importantly, copying is explicit in the hypergraph structure. See the multiple outgoing edges of the top-right node ●, which encode the reuse of the x variable inside the residual function. Aside from being a useful way to visualise variable sharing, representing copying explicitly is important to how our ahead-of-time autodiff algorithm works, enabling decentralised training on the Hellas Network.1
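To make the structure concrete, here is a minimal sketch of what an open hypergraph looks like as plain data. These are NOT catgrad's actual types (catgrad's representation lives in the open-hypergraphs crate); the names and layout below are illustrative assumptions, but they show the three ingredients from the list above: labeled nodes, hyperedges connecting lists of sources to lists of targets, and the distinguished input/output boundary.

```rust
// A simplified sketch of an open hypergraph -- NOT catgrad's actual types.
#[derive(Debug)]
struct Hyperedge {
    op: &'static str,    // operation label, e.g. "MatrixMultiply"
    sources: Vec<usize>, // indices of the edge's input nodes
    targets: Vec<usize>, // indices of the edge's output nodes
}

#[derive(Debug)]
struct SketchHypergraph {
    node_labels: Vec<Vec<usize>>, // one array shape per node
    edges: Vec<Hyperedge>,
    inputs: Vec<usize>,  // designated input nodes (the dashed open lines)
    outputs: Vec<usize>, // designated output nodes
}

// The residual example above: x is explicitly copied, one copy goes
// through the linear layer, and the two results are added.
fn residual_graph() -> SketchHypergraph {
    SketchHypergraph {
        node_labels: vec![vec![8, 8]; 5],
        edges: vec![
            Hyperedge { op: "Copy",   sources: vec![0],    targets: vec![1, 2] },
            Hyperedge { op: "Linear", sources: vec![1],    targets: vec![3] },
            Hyperedge { op: "Add",    sources: vec![2, 3], targets: vec![4] },
        ],
        inputs: vec![0],
        outputs: vec![4],
    }
}

fn main() {
    let g = residual_graph();
    println!("{} nodes, {} hyperedges", g.node_labels.len(), g.edges.len());
}
```

Note that the Copy edge is an ordinary hyperedge with one source and two targets: this is exactly the explicit copying described above.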
Attention Please
Now let's do a complete, self-contained example: an Attention layer. Here's the code:
use catgrad::core::nn::layers::*;
use catgrad::core::{Dtype, NdArrayType, Operation, Shape};
use open_hypergraphs::lax::{OpenHypergraph, functor::*, var, var::Var};
use std::cell::RefCell;
use std::rc::Rc;
// 1. Create an OpenHypergraph for Gemma's attention layer,
// 2. Turn explicit copy operations into *nodes* in the hypergraph
// 3. Save as an SVG.
pub fn main() -> std::io::Result<()> {
    let arrow = attention_arrow();
    let arrow = var::forget::Forget.map_arrow(&arrow);
    save_svg(&arrow, "images/attention.svg")
}
pub fn attention(
    builder: &Rc<RefCell<OpenHypergraph<NdArrayType, Operation>>>,
    dim: usize,
    name: &str,
    x: Var<NdArrayType, Operation>,
) -> Var<NdArrayType, Operation> {
    let num_heads = 4;
    let head_dim = dim / num_heads;
    let b = x.label.shape.0[0];
    let s = x.label.shape.0[1];

    // Project the input to keys, queries, and values
    let k = linear(builder, dim, dim, &format!("{name}.key"), x.clone());
    let q = linear(builder, dim, dim, &format!("{name}.query"), x.clone());
    let v = linear(builder, dim, dim, &format!("{name}.value"), x);

    // Split into heads: [b, s, dim] -> [b, num_heads, s, head_dim]
    let q = reshape(builder, Shape(vec![b, s, num_heads, head_dim]), q);
    let k = reshape(builder, Shape(vec![b, s, num_heads, head_dim]), k);
    let v = reshape(builder, Shape(vec![b, s, num_heads, head_dim]), v);
    let q = transpose(builder, 1, 2, q);
    let k = transpose(builder, 1, 2, k);
    let v = transpose(builder, 1, 2, v);

    // Scaled dot-product attention
    let tk = transpose(builder, 2, 3, k);
    let attn = mat_mul(builder, q, tk);
    let denom = constant(builder, attn.label.clone(), f32::sqrt(head_dim as f32));
    let attn = attn / denom;
    let attn = softmax(builder, attn);
    let attn = mat_mul(builder, attn, v);

    // Recombine heads and apply the output projection
    let x = transpose(builder, 1, 2, attn);
    let x = reshape(builder, Shape(vec![b, s, dim]), x);
    linear(builder, dim, dim, &format!("{name}.proj"), x)
}
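It's worth checking that the shapes flowing through this function line up, since every node in the resulting diagram is labeled with one. The standalone sketch below traces them for b = 1, s = 1, dim = 8, num_heads = 4 (the values used in this post). The `transpose` and `mat_mul` helpers here are hypothetical shape-only stand-ins written for this illustration; in catgrad itself, shapes are tracked on each Var's label.

```rust
// Trace shapes through the attention function, using shape-only helpers.
fn transpose(mut shape: Vec<usize>, i: usize, j: usize) -> Vec<usize> {
    shape.swap(i, j);
    shape
}

// Batched matmul shape rule: leading dims agree, inner dims contract.
fn mat_mul(x: &[usize], y: &[usize]) -> Vec<usize> {
    let (n, m) = (x.len(), y.len());
    assert_eq!(x[..n - 2], y[..m - 2]);
    assert_eq!(x[n - 1], y[m - 2]);
    let mut out = x[..n - 1].to_vec();
    out.push(y[m - 1]);
    out
}

fn attention_shapes() -> Vec<usize> {
    let (b, s, dim, num_heads): (usize, usize, usize, usize) = (1, 1, 8, 4);
    let head_dim = dim / num_heads;

    // reshape to [b, s, num_heads, head_dim], then transpose(1, 2)
    // gives [b, num_heads, s, head_dim] for each of q, k, v
    let q = transpose(vec![b, s, num_heads, head_dim], 1, 2);
    let k = transpose(vec![b, s, num_heads, head_dim], 1, 2);
    let v = transpose(vec![b, s, num_heads, head_dim], 1, 2);

    let tk = transpose(k, 2, 3);   // [b, num_heads, head_dim, s]
    let attn = mat_mul(&q, &tk);   // [b, num_heads, s, s]
    let attn = mat_mul(&attn, &v); // [b, num_heads, s, head_dim]
    let x = transpose(attn, 1, 2); // [b, s, num_heads, head_dim]
    vec![x[0], x[1], dim]          // final reshape: [b, s, dim]
}

fn main() {
    println!("output shape: {:?}", attention_shapes()); // [1, 1, 8]
}
```

The output shape matches the input shape [1, 1, 8], as a residual-friendly attention block requires.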
// Build the open hypergraph by creating a Var and calling the attention function
fn attention_arrow() -> OpenHypergraph<NdArrayType, Operation> {
    let dim = 8;
    let name = "attention";
    var::build(|state| {
        let x = Var::new(
            state.clone(),
            NdArrayType::new(Shape(vec![1, 1, 8]), Dtype::F32),
        );
        let y = attention(&state, dim, name, x.clone());
        (vec![x], vec![y])
    })
    .unwrap()
}
use graphviz_rust::cmd::{CommandArg, Format};
// Render an OpenHypergraph to an SVG using `open-hypergraphs-dot`
fn save_svg(arrow: &OpenHypergraph<NdArrayType, Operation>, filename: &str) -> std::io::Result<()> {
    let dot_graph = open_hypergraphs_dot::generate_dot(arrow);
    let svg_bytes = graphviz_rust::exec(
        dot_graph,
        &mut graphviz_rust::printer::PrinterContext::default(),
        vec![CommandArg::Format(Format::Svg)],
    )?;
    std::fs::write(filename, svg_bytes)?;
    Ok(())
}
This produces the following diagram:
See here for a GitHub repo with both examples.
Bonus: Qwen3-0.6B
To conclude, you can download the whole Qwen3-0.6B architecture as a diagram here. I haven't included it on this page because it's 7MB!
The Qwen code is more complex and spread across multiple functions in catgrad, so to reproduce this diagram, see the code on this branch and run:
cargo run --release --example llm -- -m Qwen/Qwen3-0.6B -p "Catgrad is " -s 10 --model-svg qwen.svg
Finally, I want to emphasize that open hypergraphs are a general data structure for syntax, not just for neural networks and catgrad. Any kind of "circuit-like" term is a natural fit: from actual circuits to the kind of boxes-and-wires visual languages used in game engines.
If you have an idea for how you could use open hypergraphs and you want some help with the library, hop in our discord and let us know!
For further reading about open hypergraphs for (differentiable) syntax, you should know that the diagrams produced here are called string diagrams, a formal graphical syntax for Symmetric Monoidal Categories (SMCs). "Open Hypergraphs" (aka cospans of hypergraphs) formally correspond to arrows of SMCs, and our autodiff algorithm builds on this correspondence to implement Reverse Derivatives for AoT autodiff.
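To give a flavour of that correspondence: an arrow of an SMC has a typed input and output boundary, arrows compose sequentially when those boundaries match, and the tensor product places two arrows side by side. The toy sketch below (not catgrad's API; all names here are made up for illustration) models arrows by their boundaries alone, which is exactly the "open" part of an open hypergraph.

```rust
// A toy model of SMC arrows as typed boundaries -- illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Arrow {
    source: Vec<&'static str>, // input boundary types
    target: Vec<&'static str>, // output boundary types
}

// Sequential composition g . f is defined only when f's outputs
// match g's inputs -- gluing the output boundary to the input boundary.
fn compose(f: &Arrow, g: &Arrow) -> Option<Arrow> {
    if f.target == g.source {
        Some(Arrow { source: f.source.clone(), target: g.target.clone() })
    } else {
        None
    }
}

// The tensor product runs two arrows in parallel: boundaries concatenate.
fn tensor(f: &Arrow, g: &Arrow) -> Arrow {
    let mut source = f.source.clone();
    source.extend(&g.source);
    let mut target = f.target.clone();
    target.extend(&g.target);
    Arrow { source, target }
}

fn main() {
    let linear = Arrow { source: vec!["[1,8]"], target: vec!["[1,8]"] };
    let softmax = Arrow { source: vec!["[1,8]"], target: vec!["[1,8]"] };
    println!("compose: {:?}", compose(&linear, &softmax));
    println!("tensor:  {:?}", tensor(&linear, &softmax));
}
```

In the real library, the arrows carry the whole hypergraph between the two boundaries, but composition and tensor work on the boundaries in just this way.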