06. Graph Embeddings — The graph embedding between distant stations¶
~15 min read. Learn a vector for every node and relation so you can skip multi-hop traversal.
Continues from the first-principles overview in 00-first-principles.md. The graph embedding is a shortcut embedding — it lets you jump from one entity directly to a distant one without following every relationship in between.
1) Why embeddings for graphs?¶
The full knowledge graph has millions of entities and billions of relationships. Traversal answers exact questions ("who is the CEO of X?"). But what about fuzzy questions ("which companies are similar to Google?")?
Traversal cannot answer similarity questions — there's no "similarity" relationship. Embeddings solve this by placing every entity in a vector space where proximity means semantic or structural similarity.
graph embedding (embedding space)
station A ─────────────────────────────────▶ station D
│ │
│ relationship 1 │
▼ │
station B │
│ │
│ relationship 2 │
▼ │
station C ──── relationship 3 ──────────────▶ station D
The graph embedding skips B and C entirely for approximate queries. For exact multi-hop answers, still follow the relationships explicitly.
2) TransE: the translation model¶
TransE is the classic knowledge graph embedding model.
The idea: if (head, relation, tail) is a true triple, then h + r ≈ t in vector space.
head vector h relation vector r tail vector t
─────── + ───────── ≈ ─────────
[1.2, 0.3] + [-0.8, 1.1] ≈ [0.4, 1.4]
Score a triple: f(h, r, t) = -||h + r - t||₂
Higher (less negative) = more plausible triple.
This means: head entity + relation type = approximate position of tail entity. The graph embedding is literally a vector addition.
3) Worked numerical example: TransE scoring¶
Entities and their 2D embeddings:
Score for (Paris, capital_of, France):
h + r = [0.5 + 0.0, 0.2 + 0.8] = [0.5, 1.0]
t = [0.5, 1.0]
||h + r - t|| = ||[0.0, 0.0]|| = 0.0
score = -0.0 = 0.0 (perfect)
Now try a wrong triple: (Paris, capital_of, Germany). Germany embedding: t_wrong = [1.2, 0.9]
h + r = [0.5, 1.0]
t_wrong = [1.2, 0.9]
||h + r - t_wrong|| = ||[-0.7, 0.1]|| = √(0.49 + 0.01) = 0.71
score = -0.71 (lower = less plausible)
The correct triple scores 0.0, the wrong triple scores -0.71. TransE learns embeddings that make true triples score near zero.
4) Node2Vec: learning from random walks¶
TransE works on relation triples. Node2Vec learns entity embeddings purely from graph structure — no relation types.
Idea: run biased random walks from each node. Each walk is a "sentence" of node IDs. Train a word2vec-style model on these sentences.
Nodes that co-occur in walks get similar embeddings. Strongly-connected entities cluster together in the graph embedding space.
Bias parameters:
- p (return probability): how often to go back to the previous node.
- q (in-out ratio): how much to explore outward vs. stay local.
p=1, q=0.5 → DFS-like (explores **multi-hop junctions** far away)
p=1, q=2 → BFS-like (stays near the starting **entity**)
5) GraphSAGE and when embeddings beat traversal¶
TransE and Node2Vec are transductive — they can't embed a new entity added after training. GraphSAGE is inductive: it learns an aggregation function.
┌───────────────────────────────────────────────────────┐
│ For each station v: │
│ │
│ 1. Sample k neighbours │
│ 2. Aggregate neighbour features (mean / LSTM / pool) │
│ 3. Concatenate v's own features + aggregated │
│ 4. Apply learned linear layer + activation │
│ 5. Normalise → embedding │
└───────────────────────────────────────────────────────┘
A new entity added tomorrow gets an embedding immediately by running the learned aggregation over its neighbours. No retraining needed.
When to use graph embeddings vs. traversal
┌───────────────────────────────────┬────────────────────────┐
│ Use traversal (**graph query engine**)│ Use embedding │
│ │ (**graph embedding**) │
├───────────────────────────────────┼────────────────────────┤
│ Exact relationship queries │ Similarity queries │
│ Multi-hop factual reasoning │ Link prediction │
│ Compliance, audit trails │ Fuzzy entity matching │
│ Provenance needed │ Candidate generation │
└───────────────────────────────────┴────────────────────────┘
Neither is universally better. Production Graph RAG systems use both: the graph embedding for candidate generation, the graph query engine for precise traversal.
Where this lives in the wild¶
- Meta's PyG (PyTorch Geometric) — GraphSAGE deployed for social friend recommendation; engineers embed new users (new entities) without retraining the full model.
- Google DeepMind (TransE variants) — knowledge graph completion in Freebase; researchers predict missing relationships by scoring unseen triples.
- Uber Eats' food graph — Node2Vec embeddings on restaurant-dish-cuisine graph power "similar restaurants" recommendations in milliseconds.
- Microsoft Academic Graph — paper-citation graph embeddings let researchers find conceptually similar work without exact keyword overlap.
- Alibaba's product knowledge graph — TransE-style embeddings score product attribute compatibility, powering "frequently bought together" without explicit rules.
Pause and recall¶
- In TransE, what does it mean when
||h + r - t||is close to zero? - Why can TransE not embed a new entity added after training, but GraphSAGE can?
- What do the parameters
pandqcontrol in Node2Vec? - When should you use the graph embedding instead of the graph query engine?
Interview Q&A¶
Q: Why use TransE instead of just computing cosine similarity between node feature vectors? A: TransE encodes relation types as translations in embedding space — it models the asymmetry and directionality of relationships (CEO_OF is different from WORKS_AT). Pure cosine similarity on feature vectors treats all relations as the same.
Common wrong answer to avoid: "TransE is always better" — TransE fails on symmetric relations like MARRIED_TO; models like RotatE or ComplEx handle those cases.
Q: Why does Node2Vec use biased random walks instead of uniform walks? A: Uniform walks mix BFS and DFS arbitrarily. The bias parameters let engineers tune whether embeddings capture local neighbourhood (BFS) or global structure (DFS). Choosing the wrong bias for the task produces useless graph embeddings.
Common wrong answer to avoid: "Bias is just for speed" — the bias directly controls what structural information the embedding captures.
Q: Why is inductive embedding (GraphSAGE) important for production knowledge graphs? A: Knowledge graphs grow continuously — new entities (products, people, events) appear daily. Transductive models require full retraining on the new graph. GraphSAGE embeds new nodes at inference time using the learned aggregator.
Common wrong answer to avoid: "You can just retrain nightly" — nightly retraining misses same-day additions and adds operational cost.
Q: Why do embeddings lose precision compared to graph traversal for factual queries? A: Embeddings compress the full knowledge graph into a continuous vector space — structural detail is approximated, not preserved. Two entities with similar neighbourhoods get similar embeddings even if they differ in exact relation types.
Common wrong answer to avoid: "Larger embedding dimensions fix this" — dimensionality helps but cannot recover exact relational structure destroyed by compression.
Apply now (5 min)¶
Exercise. Pick three entities you know well (person, organisation, place). Assign toy 2D vectors to each. Define a relation vector. Compute the TransE score for one true triple and one false triple.
Sketch from memory. Draw the graph embedding shortcut next to a multi-hop relationship path. Label when you'd take the graph embedding vs. follow the full path.
Bridge. We now have both the explicit graph structure and learned vector shortcuts. The next step is to combine them inside a full Graph RAG architecture — where the graph query engine and the graph embedding work together to answer questions. → 07-graph-rag-architecture.md