
Visualising LLMs with Open Hypergraphs and Catgrad

Let's visualise some LLM architectures! In this blog post, I'll show you how to generate these diagrams using open hypergraphs and catgrad. We'll produce three examples, including the attention diagram above and a huge SVG of every single op in the Qwen architecture.

Open Hypergraphs

Let's start with a simple example -- a residual connection around a linear layer. In catgrad, layers and architectures are represented as Open Hypergraphs: a datastructure for representing "circuit-like" syntax. Here's the code defining our layer:

pub fn residual(x: Var<NdArrayType, Operation>) -> Var<NdArrayType, Operation> {
    linear_layer("linear", x.clone()) + x
}

Under the hood, this constructs an open hypergraph which we can visualise with the open-hypergraphs-dot library:

[Diagram: residual connection]

Let's break this down:

  1. Nodes are depicted as black circles ● labelled with an array shape (e.g., [8, 8]).
  2. Hyperedges are depicted as boxes with multiple inputs and outputs. They correspond to operations like MatrixMultiply.
  3. Some nodes are designated as inputs and outputs: these are depicted as dashed, open-ended lines.

Point (3) is why these are open hypergraphs, and not just hypergraphs.

Importantly, copying is explicit in the hypergraph structure: see the multiple outgoing edges of the top-right node ●, which encode the reuse of the x variable inside the residual function. Aside from being a useful way to visualise variable sharing, representing copying explicitly is important to how our ahead-of-time autodiff algorithm works, enabling decentralised training on the Hellas Network.1
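To make this concrete, here's a minimal sketch of the data an open hypergraph stores, with the residual example written out by hand. These are illustrative types only -- the actual representation in the open-hypergraphs crate differs:

// Illustrative sketch only: not the open-hypergraphs crate's real types.
struct Edge {
    label: String,       // the operation, e.g. "MatrixMultiply"
    sources: Vec<usize>, // indices of input nodes
    targets: Vec<usize>, // indices of output nodes
}

struct OpenHypergraph {
    nodes: Vec<String>,  // node labels: array shapes like "[8, 8]"
    edges: Vec<Edge>,
    inputs: Vec<usize>,  // the designated boundary nodes
    outputs: Vec<usize>, // that make the hypergraph "open"
}

fn main() {
    // The residual example: node 0 carries x and is a source of *two*
    // edges (the linear layer and the Add), so the copy is explicit.
    let residual = OpenHypergraph {
        nodes: vec!["[8, 8]".into(), "[8, 8]".into(), "[8, 8]".into()],
        edges: vec![
            Edge { label: "Linear".into(), sources: vec![0], targets: vec![1] },
            Edge { label: "Add".into(), sources: vec![1, 0], targets: vec![2] },
        ],
        inputs: vec![0],
        outputs: vec![2],
    };

    let uses_of_x = residual.edges.iter().filter(|e| e.sources.contains(&0)).count();
    println!("node 0 (x) has {uses_of_x} outgoing edges"); // prints 2
    println!("{} input(s), {} output(s)", residual.inputs.len(), residual.outputs.len());
}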

Attention Please

Now let's do a complete, self-contained example: an Attention layer. Here's the code:

use catgrad::core::nn::layers::*;
use catgrad::core::{Dtype, NdArrayType, Operation, Shape};
use open_hypergraphs::lax::{OpenHypergraph, functor::*, var, var::Var};

use std::cell::RefCell;
use std::rc::Rc;

// 1. Create an OpenHypergraph for an attention layer,
// 2. Turn explicit copy operations into *nodes* in the hypergraph,
// 3. Save it as an SVG.
pub fn main() -> std::io::Result<()> {
    let arrow = attention_arrow();
    let arrow = var::forget::Forget.map_arrow(&arrow);
    save_svg(&arrow, "images/attention.svg")
}

pub fn attention(
    builder: &Rc<RefCell<OpenHypergraph<NdArrayType, Operation>>>,
    dim: usize,
    name: &str,
    x: Var<NdArrayType, Operation>,
) -> Var<NdArrayType, Operation> {
    let num_heads = 4;
    let head_dim = dim / num_heads;
    let b = x.label.shape.0[0];
    let s = x.label.shape.0[1];

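    // project the input into query, key and value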
    let k = linear(builder, dim, dim, &format!("{name}.key"), x.clone());
    let q = linear(builder, dim, dim, &format!("{name}.query"), x.clone());
    let v = linear(builder, dim, dim, &format!("{name}.value"), x);

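    // split dim into num_heads heads: [b, s, dim] -> [b, s, num_heads, head_dim]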
    let q = reshape(builder, Shape(vec![b, s, num_heads, head_dim]), q);
    let k = reshape(builder, Shape(vec![b, s, num_heads, head_dim]), k);
    let v = reshape(builder, Shape(vec![b, s, num_heads, head_dim]), v);

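    // move the head dimension before the sequence dimension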
    let q = transpose(builder, 1, 2, q);
    let k = transpose(builder, 1, 2, k);
    let v = transpose(builder, 1, 2, v);

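    // scaled dot-product attention: softmax(q k^T / sqrt(head_dim)) v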
    let tk = transpose(builder, 2, 3, k);
    let attn = mat_mul(builder, q, tk);
    let denom = constant(builder, attn.label.clone(), f32::sqrt(head_dim as f32));
    let attn = attn / denom;
    let attn = softmax(builder, attn);
    let attn = mat_mul(builder, attn, v);
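    // merge the heads back together: [b, num_heads, s, head_dim] -> [b, s, dim]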
    let x = transpose(builder, 1, 2, attn);
    let x = reshape(builder, Shape(vec![b, s, dim]), x);
    linear(builder, dim, dim, &format!("{name}.proj"), x)
}

// Build the open hypergraph by creating a Var and calling the attention function
fn attention_arrow() -> OpenHypergraph<NdArrayType, Operation> {
    let dim = 8;
    let name = "attention";
    var::build(|state| {
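        // a single input of shape [batch, sequence, dim] = [1, 1, 8]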
        let x = Var::new(
            state.clone(),
            NdArrayType::new(Shape(vec![1, 1, dim]), Dtype::F32),
        );
        let y = attention(&state, dim, name, x.clone());
        (vec![x], vec![y])
    })
    .unwrap()
}

use graphviz_rust::cmd::{CommandArg, Format};

// Render an OpenHypergraph to an SVG using `open-hypergraphs-dot`
fn save_svg(arrow: &OpenHypergraph<NdArrayType, Operation>, filename: &str) -> std::io::Result<()> {
    let dot_graph = open_hypergraphs_dot::generate_dot(arrow);
    let svg_bytes = graphviz_rust::exec(
        dot_graph,
        &mut graphviz_rust::printer::PrinterContext::default(),
        vec![CommandArg::Format(Format::Svg)],
    )?;
    std::fs::write(filename, svg_bytes)?;

    Ok(())
}

This produces the following diagram:

[Diagram: attention layer]
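To follow the wires in that diagram, it helps to track the activation shapes step by step. For the input built in attention_arrow we have b = 1, s = 1, dim = 8, num_heads = 4 and head_dim = 2; here's a small standalone sketch of the shape bookkeeping (plain Rust, independent of catgrad):

fn main() {
    let (b, s, dim, num_heads) = (1, 1, 8, 4);
    let head_dim = dim / num_heads;

    println!("x, q, k, v      {:?}", [b, s, dim]); // input and the three linear outputs
    println!("reshape         {:?}", [b, s, num_heads, head_dim]); // split into heads
    println!("transpose(1, 2) {:?}", [b, num_heads, s, head_dim]); // heads before sequence
    println!("tk              {:?}", [b, num_heads, head_dim, s]); // transpose(2, 3) of k
    println!("q @ tk          {:?}", [b, num_heads, s, s]); // attention scores
    println!("scores @ v      {:?}", [b, num_heads, s, head_dim]);
    println!("transpose(1, 2) {:?}", [b, s, num_heads, head_dim]);
    println!("reshape         {:?}", [b, s, dim]); // back to the input shape
}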

See here for a GitHub repo with both examples.
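If you'd rather inspect or post-process the Graphviz source than render straight to SVG, the same dot graph can be written out as text. A minimal sketch that could be dropped into the example above, assuming graphviz_rust's print helper alongside the same generate_dot call:

use graphviz_rust::printer::PrinterContext;

// Write the Graphviz source to a .dot file, which you can render later
// with e.g. `dot -Tsvg attention.dot -o attention.svg`.
fn save_dot(arrow: &OpenHypergraph<NdArrayType, Operation>, filename: &str) -> std::io::Result<()> {
    let dot_graph = open_hypergraphs_dot::generate_dot(arrow);
    let dot_source = graphviz_rust::print(dot_graph, &mut PrinterContext::default());
    std::fs::write(filename, dot_source)
}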

Bonus: Qwen3-0.6B

To conclude, you can download the whole Qwen3-0.6B architecture as a diagram here. I haven't included it on this page because it's 7MB!

The Qwen code is more complex and spread across multiple functions in catgrad, so to reproduce this diagram, check out the code on this branch and run:

cargo run --release --example llm -- -m Qwen/Qwen3-0.6B -p "Catgrad is " -s 10 --model-svg qwen.svg

Finally, I want to emphasize that open hypergraphs are a general datastructure for syntax, not just for neural networks and catgrad. Any kind of "circuit-like" term is a natural fit: from actual circuits to the kind of boxes-and-wires visual languages used in game engines.

If you have an idea for how you could use open hypergraphs and you want some help with the library, hop into our Discord and let us know!

1. For further reading about open hypergraphs for (differentiable) syntax: the diagrams produced here are called string diagrams, a formal graphical syntax for Symmetric Monoidal Categories (SMCs). "Open Hypergraphs" (aka cospans of hypergraphs) formally correspond to arrows of SMCs, and our autodiff algorithm builds on this correspondence to implement Reverse Derivatives for AoT autodiff.