11. Graph Maintenance: Keeping the knowledge graph accurate over time

~14 min read. A stale or inconsistent knowledge graph silently produces wrong answers.

Continues from the first-principles overview in 00-first-principles.md. The knowledge graph needs updates whenever a new entity appears, an existing relationship changes, or a previously valid fact stops being true. Without maintenance, the graph query engine confidently follows paths that are no longer correct.

1) Why graphs go stale

Facts have temporal scope. "Sundar Pichai is CEO of Google" was false before 2015 and is still true today. "Elon Musk is CEO of Twitter" became true in 2022 and stopped being true in 2023. "Twitter" as a brand became "X" in 2023.

A knowledge graph without timestamps and versioning silently serves outdated facts. The graph query engine follows a relationship labelled CEO_OF and lands at a former CEO — no error message, just a wrong answer.

┌────────────────────────────────────────────────────┐
│  Without temporal metadata:                        │
│  (Musk, CEO_OF, Twitter)  ← may be stale           │
│                                                    │
│  With temporal metadata:                           │
│  (Musk, CEO_OF, Twitter, since:2022, until:2023)   │
│  (Dorsey, CEO_OF, Twitter, since:2015, until:2021) │
└────────────────────────────────────────────────────┘
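Here is a minimal Python sketch of what the temporal version buys you. It assumes year granularity with both ends of the interval inclusive, with `None` standing in for until:now. `TemporalTriple` and `subjects_as_of` are illustrative names and do not come from any graph store's API. The data encodes Dorsey's 2015–2021 tenure and Musk's 2022–2023 tenure as CEO of Twitter. Other CEOs are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemporalTriple:
    subject: str
    predicate: str
    obj: str
    since: int                   # first year the fact holds (inclusive)
    until: Optional[int] = None  # last year it holds (inclusive); None means "until:now"

    def holds_in(self, year: int) -> bool:
        return self.since <= year and (self.until is None or year <= self.until)


def subjects_as_of(triples: List[TemporalTriple], predicate: str, obj: str, year: int) -> List[str]:
    """Return every subject s for which (s, predicate, obj) holds in `year`."""
    return [t.subject for t in triples
            if t.predicate == predicate and t.obj == obj and t.holds_in(year)]


graph = [
    TemporalTriple("Dorsey", "CEO_OF", "Twitter", since=2015, until=2021),
    TemporalTriple("Musk", "CEO_OF", "Twitter", since=2022, until=2023),
]

print(subjects_as_of(graph, "CEO_OF", "Twitter", 2020))  # ['Dorsey']
print(subjects_as_of(graph, "CEO_OF", "Twitter", 2022))  # ['Musk']
print(subjects_as_of(graph, "CEO_OF", "Twitter", 2024))  # []
```

If no stored interval covers the requested year, the query returns an empty list. That is a visible gap. A bare (Musk, CEO_OF, Twitter) triple would instead give a silently stale answer.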

2) Incremental update pipeline

Batch reprocessing of the full corpus is too slow for frequently changing domains. Incremental updates process only new or changed documents.

new document arrives
        │
        ▼
┌───────────────────┐
│  NLP extraction   │  extract new triples
└────────┬──────────┘
         │
         ▼
┌───────────────────┐
│  Conflict check   │  does a contradicting triple exist?
└────────┬──────────┘
         │
    ┌────┴────┐
    │         │
    ▼         ▼
 no conflict  conflict detected
    │         │
    │         ▼
    │    ┌──────────────┐
    │    │  Resolution  │  choose winner by provenance + recency
    │    └──────┬───────┘
    │           │
    └─────┬─────┘
          │
          ▼
   write to graph
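The pipeline can be sketched as a toy Python class. It rests on three hypothetical assumptions:

1. Documents carry an ID and are re-processed only when their SHA-256 content hash changes.
2. `extract` and `resolve` are caller-supplied callbacks that stand in for the NLP extraction and resolution boxes.
3. Each (subject, predicate) pair holds a single value. This is what makes a differing object a conflict.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class IncrementalGraph:
    """Toy in-memory graph for the incremental update pipeline (hypothetical design)."""

    def __init__(self,
                 extract: Callable[[str], List[Triple]],
                 resolve: Callable[[Triple, Triple], Triple]):
        self.extract = extract                        # stands in for NLP extraction
        self.resolve = resolve                        # stands in for conflict resolution
        self.doc_hashes: Dict[str, str] = {}          # doc_id -> SHA-256 of last seen text
        self.edges: Dict[Tuple[str, str], str] = {}   # (subject, predicate) -> object
        self.archive: List[Triple] = []               # losing triples, kept rather than deleted

    def ingest(self, doc_id: str, text: str) -> int:
        """Process a new or changed document; return the number of triples extracted."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self.doc_hashes.get(doc_id) == digest:
            return 0                                  # unchanged: skip extraction entirely
        self.doc_hashes[doc_id] = digest

        triples = self.extract(text)
        for s, p, o in triples:
            current = self.edges.get((s, p))
            if current is not None and current != o:  # conflict check
                winner = self.resolve((s, p, current), (s, p, o))
                loser_obj = o if winner[2] == current else current
                self.archive.append((s, p, loser_obj))
                o = winner[2]
            self.edges[(s, p)] = o                    # write to graph
        return len(triples)


def line_extractor(text: str) -> List[Triple]:
    """Hypothetical extractor: one "subject|predicate|object" triple per non-empty line."""
    triples = []
    for line in text.splitlines():
        if line.strip():
            s, p, o = (part.strip() for part in line.split("|"))
            triples.append((s, p, o))
    return triples


g = IncrementalGraph(line_extractor, resolve=lambda old, new: new)  # placeholder: newest wins
g.ingest("doc-1", "Twitter|BRAND_NAME|Twitter")
print(g.ingest("doc-1", "Twitter|BRAND_NAME|Twitter"))  # 0 (same content, skipped)
g.ingest("doc-2", "Twitter|BRAND_NAME|X")                # conflicts with the stored triple
print(g.edges[("Twitter", "BRAND_NAME")])                # X
print(g.archive)                                         # [('Twitter', 'BRAND_NAME', 'Twitter')]
```

Passing resolution in as a function keeps the pipeline independent of the scoring policy. The newest-wins lambda is only a placeholder for provenance-and-recency scoring.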

3) Conflict resolution: worked numerical example

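Two sources disagree about Elon Musk's current role at Tesla. For this example, treat "role at Tesla" as a single-valued attribute, so the two triples below contradict each other.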

Source A (news article, published 2024-01-10): Triple: (Musk, CHAIRMAN_OF, Tesla) — confidence: 0.91

Source B (SEC filing, published 2024-02-01): Triple: (Musk, CEO_OF, Tesla) — confidence: 0.98

Resolution scoring:

feature               Source A            Source B
───────────────────────────────────────────────────────────
Source authority      0.5                 0.9   ← SEC filing > news
Recency               0.5                 0.9   ← Feb > Jan
Extraction conf.      0.91                0.98
Combined score        0.5×0.5×0.91=0.23   0.9×0.9×0.98=0.79

Source B wins. Update the relationship to (Musk, CEO_OF, Tesla). Archive Source A's triple rather than deleting it, and record its combined score of 0.23.

Higher authority × more recent × higher extraction confidence = winning triple.
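The same arithmetic as a short Python sketch. The authority and recency values come from the table. `Candidate` and `resolve` are hypothetical names. The multiplicative score is the combination used in this example, not a standard formula.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    triple: Tuple[str, str, str]
    authority: float    # trust in the source type, 0..1
    recency: float      # newer sources score higher, 0..1
    extraction: float   # extractor's confidence in this triple, 0..1

    @property
    def score(self) -> float:
        # Multiplicative combination from the worked example: any weak factor drags the score down.
        return self.authority * self.recency * self.extraction


def resolve(candidates: List[Candidate]) -> Tuple[Candidate, List[Candidate]]:
    """Return (winner, losers to archive), ranked by combined score."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return ranked[0], ranked[1:]


news = Candidate(("Musk", "CHAIRMAN_OF", "Tesla"), authority=0.5, recency=0.5, extraction=0.91)
sec = Candidate(("Musk", "CEO_OF", "Tesla"), authority=0.9, recency=0.9, extraction=0.98)

winner, archived = resolve([news, sec])
print(winner.triple, round(winner.score, 2))               # ('Musk', 'CEO_OF', 'Tesla') 0.79
print([(c.triple, round(c.score, 2)) for c in archived])   # [(('Musk', 'CHAIRMAN_OF', 'Tesla'), 0.23)]
```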

4) Provenance tracking

Every triple in the knowledge graph should carry provenance metadata.

triple: (Apple Inc., CEO_IS, Tim Cook)
provenance:
  source: Apple 10-K filing, 2023
  extracted: 2023-11-03
  confidence: 0.99
  extractor: rule-based NER v2.1

Why does this matter?

1. Auditing: show users which relationship came from which source.
2. Debugging: when an answer is wrong, trace it to the source.
3. Refresh scheduling: triples from volatile sources get re-checked more often.

The graph query engine can optionally filter traversal to triples with confidence > threshold. Low-confidence relationships are still kept but flagged.
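A minimal Python sketch of edges that carry provenance, plus traversal filtered by a confidence threshold. The first edge uses the provenance fields from the Apple example in this section. The second edge, its source, its extractor name and the 0.8 threshold are all hypothetical. `outgoing` is an illustrative helper, not a query-engine API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Provenance:
    source: str
    extracted: str      # ISO date of extraction
    confidence: float
    extractor: str


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    prov: Provenance


def outgoing(edges: List[Edge], node: str, threshold: float) -> Tuple[List[Edge], List[Edge]]:
    """Split the outgoing edges of `node` into (traversable, flagged).

    Edges at or below `threshold` are not deleted: they come back as flagged so the
    caller can skip them during traversal or surface them with a warning.
    """
    out = [e for e in edges if e.subject == node]
    traversable = [e for e in out if e.prov.confidence > threshold]
    flagged = [e for e in out if e.prov.confidence <= threshold]
    return traversable, flagged


edges = [
    Edge("Apple Inc.", "CEO_IS", "Tim Cook",
         Provenance("Apple 10-K filing, 2023", "2023-11-03", 0.99, "rule-based NER v2.1")),
    # Hypothetical low-confidence edge with a placeholder object and source.
    Edge("Apple Inc.", "PARTNER_OF", "Example Corp",
         Provenance("unverified blog post", "2024-01-15", 0.40, "hypothetical extractor")),
]

ok, flagged = outgoing(edges, "Apple Inc.", threshold=0.8)
print([e.obj for e in ok])                           # ['Tim Cook']
print([(e.obj, e.prov.source) for e in flagged])     # [('Example Corp', 'unverified blog post')]
```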

5) Detecting staleness at scale

For large graphs, re-validating every triple is impractical. Use a risk-based refresh schedule instead.

┌────────────────────────────────────────────────────────┐
│  Triple type             │  Refresh frequency          │
├──────────────────────────┼─────────────────────────────┤
│  Immutable facts         │  Never (born_in, founded_in)│
│  Slowly changing (roles) │  Monthly                    │
│  Frequently changing     │  Daily (stock price, title) │
│  Real-time               │  Stream-based update        │
└────────────────────────────────────────────────────────┘
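One way to turn the table into a scheduler, sketched in Python. The category keys are hypothetical, and "Monthly" is assumed to mean 30 days. Immutable and real-time triples both map to `None`, for different reasons. Immutable triples are never re-checked, and real-time triples are updated by a stream rather than a timer.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed day counts for the table's rows. None = not re-checked on a timer.
REFRESH_DAYS = {
    "immutable": None,            # never
    "slowly_changing": 30,        # monthly
    "frequently_changing": 1,     # daily
    "real_time": None,            # handled by the stream-based update path instead
}


def next_refresh(triple_type: str, last_checked: date) -> Optional[date]:
    days = REFRESH_DAYS[triple_type]
    return None if days is None else last_checked + timedelta(days=days)


def due_for_refresh(triple_type: str, last_checked: date, today: date) -> bool:
    nxt = next_refresh(triple_type, last_checked)
    return nxt is not None and today >= nxt


print(due_for_refresh("slowly_changing", date(2024, 1, 1), date(2024, 1, 15)))     # False
print(due_for_refresh("slowly_changing", date(2024, 1, 1), date(2024, 2, 1)))      # True
print(due_for_refresh("frequently_changing", date(2024, 1, 1), date(2024, 1, 2)))  # True
print(due_for_refresh("immutable", date(2000, 1, 1), date(2024, 1, 1)))            # False
```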

Staleness score for a triple:

staleness = days_since_update / expected_update_interval

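Example: a CEO-role triple has not been updated in 400 days, and its expected interval is 30 days. staleness = 400 / 30 ≈ 13.3. The triple is more than 13 expected intervals overdue, where a score of 1 would mean it is exactly due. Flag it for immediate re-extraction.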
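The formula and the example as Python. The `needs_reextraction` threshold of 1.0 is an assumed policy that flags anything past its expected interval. The text itself only says that a score of 13.3 is very stale.

```python
def staleness(days_since_update: float, expected_update_interval: float) -> float:
    """staleness = days_since_update / expected_update_interval (1.0 means exactly due)."""
    if expected_update_interval <= 0:
        raise ValueError("expected_update_interval must be positive")
    return days_since_update / expected_update_interval


def needs_reextraction(score: float, threshold: float = 1.0) -> bool:
    # Assumed policy: anything past its expected interval is queued for re-extraction.
    return score > threshold


score = staleness(400, 30)
print(round(score, 1))                        # 13.3
print(needs_reextraction(score))              # True
print(needs_reextraction(staleness(10, 30)))  # False
```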

The graph query engine can attach staleness scores to traversal paths and warn users when a path relies on high-staleness relationships.
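A sketch of path-level warnings in Python. It assumes a path is only as fresh as its stalest edge, so path staleness is the maximum edge staleness, and it uses a warning threshold of 1.0. Entities, predicates and scores are hypothetical placeholders.

```python
from typing import List, NamedTuple


class ScoredEdge(NamedTuple):
    subject: str
    predicate: str
    obj: str
    staleness: float   # days_since_update / expected_update_interval


def path_staleness(path: List[ScoredEdge]) -> float:
    # Assumed aggregation: the stalest edge determines the staleness of the whole path.
    return max(edge.staleness for edge in path)


def stale_edges(path: List[ScoredEdge], threshold: float = 1.0) -> List[ScoredEdge]:
    """Edges whose staleness exceeds `threshold`; a non-empty result should trigger a warning."""
    return [edge for edge in path if edge.staleness > threshold]


# Hypothetical two-hop path with placeholder entities.
path = [
    ScoredEdge("Acme Corp", "CEO_IS", "Jane Doe", 13.3),
    ScoredEdge("Jane Doe", "ALUMNUS_OF", "Example University", 0.2),
]

print(path_staleness(path))  # 13.3
for e in stale_edges(path):
    print(f"warning: answer relies on stale edge {e.subject} -{e.predicate}-> {e.obj} "
          f"(staleness {e.staleness})")
```

Taking the maximum is a conservative choice: one badly outdated hop is enough to make the answer suspect, however fresh the other hops are.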

Where this lives in the wild

Wikidata's bot infrastructure — automated bots monitor Wikipedia changes and update Wikidata triples within minutes; human editors review flagged conflicts.
Google's Knowledge Graph freshness team — entity facts are refreshed on per-entity schedules based on "update velocity" — high-velocity entities (CEOs, prices) refresh daily; stable facts (founding year) refresh annually.
Bloomberg's entity graph — incremental NLP pipeline processes news 24/7; new triples land in the graph within 5 minutes of article publication.
Salesforce CRM graph — account-contact-opportunity relations update in real time from sales team actions; stale opportunity edges trigger automated reminders.
Amazon Product Knowledge Graph — seller-supplied attributes trigger conflict resolution against authoritative product data; losers are archived with low confidence.

Pause and recall

What does temporal metadata on an edge prevent compared to a bare triple?
In the conflict resolution example, why did Source B win despite Source A having good extraction confidence?
What is the staleness score formula, and what does a score of 13 mean?
Why keep low-confidence triples in the graph instead of deleting them?

Interview Q&A

Q: Why not simply delete old triples when new contradicting triples arrive? A: Historical facts have value. "Dorsey was CEO of Twitter from 2015 to 2021" is true and relevant for queries about that period. Deleting the old triple loses a temporally valid fact. Instead, close it with an until timestamp and archive it.

Common wrong answer to avoid: "Deletions save storage" — the storage cost is minor; the knowledge loss is significant.
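Versioning instead of deleting, sketched in Python with hypothetical entities and dates. The sketch assumes half-open validity intervals [since, until). It treats (predicate, object) as a single-valued slot, as with one CEO per company. `supersede` is an illustrative name.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class VersionedTriple:
    subject: str
    predicate: str
    obj: str
    since: date
    until: Optional[date] = None   # None = still valid; otherwise valid on [since, until)


def supersede(graph: List[VersionedTriple], new: VersionedTriple) -> List[VersionedTriple]:
    """Add `new`, closing (not deleting) any open triple that fills the same slot."""
    updated = []
    for t in graph:
        if t.until is None and (t.predicate, t.obj) == (new.predicate, new.obj):
            updated.append(replace(t, until=new.since))  # archive with an end date
        else:
            updated.append(t)
    updated.append(new)
    return updated


# Hypothetical entities and dates.
graph = [VersionedTriple("Alice", "CEO_OF", "Acme Corp", since=date(2019, 1, 1))]
graph = supersede(graph, VersionedTriple("Bob", "CEO_OF", "Acme Corp", since=date(2024, 3, 1)))

for t in graph:
    print(t.subject, t.since, t.until)
# Alice 2019-01-01 2024-03-01
# Bob 2024-03-01 None
```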

Q: Why is source authority a better resolution signal than extraction confidence alone? A: Extraction confidence measures how sure the model is that it extracted a fact correctly from a document, not whether that document is authoritative. A highly confident extraction from a rumour blog loses to a moderately confident extraction from an SEC filing.

Common wrong answer to avoid: "High confidence means accurate" — confidence measures the extractor's certainty, not the source's reliability.
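A quick numeric check of that claim in Python, using the multiplicative score from the conflict-resolution example. All feature values are hypothetical. Recency is held equal, so only authority and extraction confidence differ between the two sources.

```python
# Hypothetical scores; recency is held equal so only authority and extraction confidence differ.
sources = {
    "rumour blog": {"authority": 0.2, "recency": 0.9, "extraction": 0.99},
    "SEC filing": {"authority": 0.9, "recency": 0.9, "extraction": 0.70},
}


def combined(features: dict) -> float:
    # Same multiplicative score as in the conflict-resolution example.
    return features["authority"] * features["recency"] * features["extraction"]


by_confidence = max(sources, key=lambda name: sources[name]["extraction"])
by_combined = max(sources, key=lambda name: combined(sources[name]))

print(by_confidence)  # rumour blog
print(by_combined)    # SEC filing
for name, features in sources.items():
    print(name, round(combined(features), 3))
# rumour blog 0.178
# SEC filing 0.567
```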

Q: Why use risk-based refresh scheduling instead of uniform refresh? A: Uniform refresh wastes compute re-validating immutable facts (birth year, founding date) and under-serves frequently changing facts (executive role, product availability). Risk-based scheduling allocates refresh effort in proportion to how often a fact type actually changes.

Common wrong answer to avoid: "Daily refresh of everything is safest" — it's operationally infeasible at scale and wasteful for stable facts.

Q: Why does a wrong entity link during ingestion compound into multiple wrong triples? A: Entity linking assigns a mention a single entity ID, and every triple extracted from that document that involves the mention inherits that ID. If the link is wrong, every relationship pointing to or from that node is anchored incorrectly. One entity-link error therefore contaminates every triple in that source that involves the mislinked entity.

Common wrong answer to avoid: "Each triple is extracted independently." Entity linking is shared by all triples in a document that involve the same mention, so one error propagates to all of them.
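A toy Python illustration of the propagation. The mentions, the entity IDs and the one-mapping-per-document assumption are all hypothetical. The third triple does not involve the mislinked mention, so it keeps its correct IDs.

```python
from typing import Dict, List, Tuple

RawTriple = Tuple[str, str, str]  # triples over surface mentions, before linking


def link(mention_to_id: Dict[str, str], raw: List[RawTriple]) -> List[RawTriple]:
    """Replace every mention with its entity ID.

    Each mention is linked once per document, so the same (possibly wrong) ID is
    reused by every triple that contains that mention.
    """
    return [(mention_to_id.get(s, s), p, mention_to_id.get(o, o)) for s, p, o in raw]


raw = [
    ("Jordan", "WORKS_FOR", "Acme"),
    ("Jordan", "LIVES_IN", "Springfield"),
    ("Acme", "LOCATED_IN", "Springfield"),
]

# Hypothetical IDs: "Jordan" (a person in this document) is mislinked to the country.
mislinked = {"Jordan": "E_country_jordan", "Acme": "E_acme", "Springfield": "E_springfield"}
triples = link(mislinked, raw)

contaminated = [t for t in triples if "E_country_jordan" in (t[0], t[2])]
print(len(contaminated), "of", len(triples))  # 2 of 3
```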

Apply now (5 min)

Exercise. Model the maintenance lifecycle of one volatile fact in your domain (e.g., a job title). Design: the triple format with temporal metadata, provenance fields, and staleness score. Compute the staleness score if the triple hasn't been updated in 90 days with an expected interval of 30 days.

Sketch from memory. Draw the incremental update pipeline from new document to graph write. Label the conflict check branch and the resolution scoring formula.

Bridge. The knowledge graph is maintained. But how do we know it's any good? How do we measure extraction quality, traversal accuracy, and answer faithfulness? That requires a disciplined evaluation framework. → 12-graph-evaluation.md