The Nexus Blossom: How AI Thoughts Turn into Habits

📓 Summary

Turning raw text into a living graph of thought and proving it works by watching it think.

tl;dr. We built a minimal viable thinking loop:

  1. Convert any input (chat, docs, code) into a Scorable – the fundamental unit of thought, enriched with goal, domains, entities, and embeddings.
  2. Blossom the idea: generate a small forest of candidate continuations around each Scorable, sharpen them, and score them across multiple dimensions (alignment, faithfulness, clarity, coverage, etc.).
  3. Weave the best candidates into Nexus, a persistent thought graph where repeated successful paths are reinforced into cognitive “habits” and weaker branches quietly die away.
  4. Move everything over a ZeroMQ Event / Knowledge Bus (plus a shared cache), so agents publish/subscribe to thought events asynchronously instead of marching through a single, brittle pipeline.
  5. Mirror each run into Visual Thought Maps (VPM tiles and garden filmstrips) and run A/B experiments (random baselines vs Nexus+Blossom) so we can see and measure improvements in reasoning, not just in final answers.

This isn’t a pretty graph of data; it’s a picture of thinking itself.
The rest of the post shows how these pieces – Scorables, Blossom, Nexus, the event bus, and VPMs – combine into a system that doesn’t just process information once, but practices and improves its own reasoning over time.


💭 Thoughts That Grow: Building a Network That Forms Habits

Think about how you learn your way around a city.

At first, every journey needs effort: you check maps, read street names, maybe get lost once or twice. Over time, certain routes become familiar. You stop thinking about every turn; you just go. Your brain has turned a messy search over many possibilities into a few trusted paths that feel like second nature.

That’s exactly what we’re trying to build in software.

Inside Stephanie, the “city” is a graph of thoughts. Places you can go are nodes (Scorables, ideas, partial answers). The ways to get there are edges (reasoning steps, transformations, checks). Each time the system solves a problem, it’s effectively walking a route through that city. When a route works well, we don’t want to throw it away; we want to reinforce it, remember it, and make it easier to reuse next time.

Our job is to:

  • Find good routes through the space of ideas (Blossom exploring alternatives),
  • Strengthen the ones that reliably lead to good outcomes (Nexus reinforcing edges),
  • And, most importantly, apply those routes when new problems show up that “live in the same part of the city”.

In other words, we’re not just drawing a map of where the model has been. We’re teaching it which paths are worth turning into mental motorways and giving it a way to take those faster roads on purpose.

Thesis. We’re building a system that treats thinking as a growing network of thoughts. When one thought reliably leads to another, those paths strengthen; when we revisit the same routes, habits emerge. Our goal is to make those dynamics explicit and operational so the system can improve its reasoning the way a brain does: by activating, reinforcing, and rewiring useful patterns.

Why this matters. Great answers aren’t one-shot. They emerge from micro-steps – recall, transform, compare, verify – played over a graph of related ideas. If we can see and score those steps, we can promote the best subgraphs into habits. Over time, the system spends less energy searching and more time compounding what it already knows works.

[Figure: initial Nexus graph]

Graph legend. Each blue dot is a Scorable (a document / thought). Grey arrows show all the local k-nearest-neighbour similarity links; red arrows highlight the backbone edges (a minimum-spanning “spine” through the cluster), i.e. the core routes Stephanie prefers when navigating this neighbourhood.


🔨 What we’re building

This section gives the minimum mental model and interfaces so readers know what “Nexus,” “Blossom,” “Scorable,” and the Scorable Processor are and how they click together.

🎥 The Cast: Stephanie’s Cognitive Architecture

  • 🕸️ Nexus (Thought Graph)
    Stephanie’s long-term memory. Stores every thought as a node and every connection as an edge. Records exploration and reinforces successful paths – turning good thinking into habits.

  • 🌸 Blossom (Grow & Choose Engine)
    The creative explorer. Takes a parent thought and generates K candidates, sharpens them, scores them, and returns the best paths forward, along with a reasoning trace. Emits garden events so you can watch ideas bloom.

  • ⚡ Event Bus & Knowledge Bus (Cognitive Nervous System)
    The wiring between everything. Routes thought impulses between processors, Blossom workers, Nexus, scorers, and VPM. Makes cognition asynchronous and decoupled: thoughts become messages on subjects, and many listeners can react in parallel.

  • 🎯 Scorable (Atomic Thought Unit)
    The smallest piece of thinking we can evaluate – a document chunk, snippet, plan step, or answer. Has a stable ID, text/payload, type, and rich attributes so scoring and Nexus stay consistent.

  • 🏭 Scorable Processor (Cognitive Factory)
    The universal adapter. Normalizes heterogeneous inputs into Scorables, attaches goals/context, extracts features (domains, entities, embeddings), and guarantees stable IDs for downstream scoring and graph updates.

  • 📊 Scoring (Multi-Dimensional Judges)
    The quality layer. Scores each Scorable across dimensions like alignment, faithfulness, coverage, clarity, coherence, then fuses them into an overall signal. Drives both selection (who wins) and reinforcement (what gets promoted).

  • 🎨 Visual Thought Maps (VPMs & Garden Viewer)
    The window into Stephanie’s mind. Turn Nexus + Blossom activity into images and filmstrips, so you can literally see thoughts branching, candidates competing, and habits forming as paths are reinforced.


✨ How the Ensemble Works Together

This stack forms a closed loop of improvement:

  1. 🎯 Thoughts enter as Scorables.
  2. 🌸 Blossom explores multiple continuations around each one.
  3. 📊 Scoring evaluates candidates across dimensions.
  4. 🕸️ Nexus reinforces the winners and updates the graph.
  5. 🎨 VPMs make progress visible as graphs and filmstrips.
  6. ⚡ The Bus keeps everything flowing, carrying events between all of the above.

The result is a system that doesn’t just process information once – it practices its own thinking, turning repeated good patterns into durable cognitive habits.
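
To make that loop concrete, here’s a minimal Python sketch of one pass. Every name is a hypothetical stand-in for illustration, not the real Stephanie API – it only shows the shape of “thought → candidates → scores → reinforcement → events”:

def think_once(seed, goal, blossom, scoring, nexus, bus):
    # 1-2. Blossom explores candidate continuations around the seed thought.
    candidates = blossom.expand(seed, goal=goal)

    # 3. Score every candidate across the quality dimensions.
    scored = [(c, scoring.score(c, goal=goal)) for c in candidates]

    # 4. Nexus reinforces the winning path – the habit gets stronger.
    winner, best = max(scored, key=lambda pair: pair[1])
    nexus.reinforce(src=seed.id, dst=winner.id, delta=best)

    # 6. Announce it on the bus so VPMs (step 5) can render progress live.
    bus.publish("blossom.winner", {"src": seed.id, "dst": winner.id})
    return winner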


🧳 A Single Thought’s Journey

Now that we’ve met the cast, here’s how a single pass through the system actually looks.

A goal and some raw input are turned into a Scorable, expanded by Blossom, scored and sharpened, written into Nexus as updated graph structure, and finally rendered as visual thought maps. All of this is glued together by the Event / Knowledge Bus, which carries events between agents so the whole loop stays asynchronous and decoupled.

    flowchart LR
    %% ===== STYLE DEFINITIONS =====
    classDef input fill:#8A2BE2,stroke:#6A0DAD,stroke-width:3px,color:white
    classDef processor fill:#4169E1,stroke:#2F4FDD,stroke-width:2px,color:white
    classDef scorable fill:#00BFFF,stroke:#0099CC,stroke-width:2px,color:white
    classDef bus fill:#FFD700,stroke:#FFA500,stroke-width:3px,color:black
    classDef blossom fill:#FF69B4,stroke:#FF1493,stroke-width:2px,color:white
    classDef scoring fill:#32CD32,stroke:#228B22,stroke-width:2px,color:white
    classDef memory fill:#9D67AE,stroke:#7D4A8E,stroke-width:2px,color:white
    classDef visualization fill:#FF6347,stroke:#DC143C,stroke-width:2px,color:white

    %% ===== INPUT & SCORABLE LAYER =====
    subgraph IN["🌐 Input & Scorable Layer"]
        G["🎯 Goal Text"]
        R["📥 Raw Input<br/>📄 text / 💻 code / 📚 doc"]
        SP["🏭 Scorable Processor"]
        SCOR["🎯 Scorable<br/>✨ Seed Thought"]
    end

    %% ===== EVENT BUS =====
    subgraph BUS["⚡ Event & Knowledge Bus"]
        direction TB
        EB["🛰️ Event Bus<br/>📨 pub/sub + 🔄 request/reply<br/>💫 Async & Decoupled"]
        
        subgraph BUS_CHANNELS["📡 Bus Channels"]
            B1["thoughts.*"]
            B2["blossom.*"]
            B3["scorable.*"]
            B4["nexus.*"]
        end
    end

    %% ===== BLOSSOM RUNNER =====
    subgraph BR["🌸 Blossom Runner Engine"]
        VPM_HINT["🎨 VPM Jitter Hint<br/>💡 Optional Novelty"]
        ATS["🌳 Agentic Tree Search<br/>🔍 M × N × L Rollout"]
        K["🏆 Top-K Winners<br/>⭐ Best Candidates"]
        SH["✨ Sharpen Loop<br/>🔧 LLM + Scorers"]
    end

    %% ===== SCORING LAYER =====
    subgraph SC["📊 Multi-Dimensional Scoring"]
        SCORE["⚖️ Scoring Service<br/>🎯 MRQ / SICQL / HRM / SVM"]
        
        subgraph DIMS["📈 Quality Dimensions"]
            D1["🎯 Alignment"]
            D2["📚 Faithfulness"]
            D3["🌐 Coverage"]
            D4["💎 Clarity"]
            D5["🔗 Coherence"]
        end
    end

    %% ===== MEMORY & VISUALIZATION =====
    subgraph MEM["💾 Memory & Visualization"]
        NX["🕸️ Nexus Graph<br/>🧠 Nodes + 🔗 Edges"]
        BS["📦 BlossomStore<br/>🌳 Run Artifacts"]
        VPM["🎨 Visual Thought Maps<br/>📽️ VPMs & Garden Viewer"]
    end

    %% ===== DATA FLOW CONNECTIONS =====
    %% Input → Scorable Processing
    R -.->|"📥 normalize<br/>any input"| SP
    G -.->|"🎯 attach goal<br/>+ context"| SP
    SP -.->|"🏭 produce<br/>enriched scorable"| SCOR

    %% Scorable + Goal into Blossom
    SCOR -.->|"🌱 seed plan<br/>+ metrics"| ATS
    G -.->|"📝 compose prompt<br/>+ strategy"| ATS

    %% VPM Guidance for Novelty
    VPM -.->|"🎨 Jitter hint<br/>boost novelty"| VPM_HINT -.-> ATS

    %% Blossom Expansion + Sharpening
    ATS -.->|"🌳 rollout forest<br/>M×N×L dimensions"| K
    K -.->|"🔄 iterate<br/>improve quality"| SH
    SH -.->|"✨ refined<br/>candidates"| K

    %% Scoring & Promotion
    K -.->|"📊 score candidates<br/>multi-dimensional"| SCORE
    SCORE -.->|"🏆 promote winners<br/>Δ ≥ margin"| NX

    %% Persistence & Storage
    ATS -.->|"💾 nodes + edges<br/>episode structure"| BS
    SH -.->|"📈 improved text<br/>+ scores"| BS
    BS -.->|"🔄 write / update<br/>graph memory"| NX

    %% Visualization & Feedback
    NX -.->|"🎬 encode runs<br/>+ decisions"| VPM

    %% ===== EVENT BUS WIRING =====
    ATS -.-|"🌼 garden events<br/>node/edge updates"| EB
    SH -.-|"⚡ progress events<br/>sharpening steps"| EB
    NX -.-|"💓 pulses<br/>habit formations"| EB
    EB -.-|"📺 real-time<br/>visualization"| VPM

    %% ===== APPLY STYLES =====
    class IN input
    class BUS bus
    class BR blossom
    class SC scoring
    class MEM memory
    
    class G,R,SP,SCOR processor
    class VPM_HINT,ATS,K,SH blossom
    class SCORE,DIMS scoring
    class NX,BS memory
    class VPM visualization
    class EB,BUS_CHANNELS bus

    %% ===== SPECIAL HIGHLIGHTS =====
    linkStyle 0 stroke:#4169E1,stroke-width:3px
    linkStyle 1 stroke:#4169E1,stroke-width:3px
    linkStyle 2 stroke:#4169E1,stroke-width:3px
    linkStyle 3 stroke:#FF69B4,stroke-width:3px
    linkStyle 4 stroke:#FF69B4,stroke-width:3px
    linkStyle 5 stroke:#FF69B4,stroke-width:3px
    linkStyle 6 stroke:#FF69B4,stroke-width:3px
    linkStyle 7 stroke:#FF69B4,stroke-width:3px
    linkStyle 8 stroke:#32CD32,stroke-width:3px
    linkStyle 9 stroke:#32CD32,stroke-width:3px
    linkStyle 10 stroke:#9D67AE,stroke-width:3px
    linkStyle 11 stroke:#9D67AE,stroke-width:3px
    linkStyle 12 stroke:#9D67AE,stroke-width:3px
    linkStyle 13 stroke:#FF6347,stroke-width:3px
    linkStyle 14 stroke:#FFD700,stroke-width:3px
    linkStyle 15 stroke:#FFD700,stroke-width:3px
    linkStyle 16 stroke:#FFD700,stroke-width:3px
    linkStyle 17 stroke:#FFD700,stroke-width:3px

    %% ===== SUBGRAPH STYLING =====
    style IN fill:#E6E6FA,stroke:#8A2BE2,stroke-width:3px,color:black
    style BUS fill:#FFFACD,stroke:#FFD700,stroke-width:3px,color:black
    style BR fill:#FFE4E1,stroke:#FF69B4,stroke-width:3px,color:black
    style SC fill:#F0FFF0,stroke:#32CD32,stroke-width:3px,color:black
    style MEM fill:#F5F0FF,stroke:#9D67AE,stroke-width:3px,color:black
  

What this diagram is saying in words:

  • Inputs go through the Scorable Processor so everything the system sees looks like a Scorable.
  • Blossom uses those Scorables (and the goal) to grow a small reasoning forest.
  • The Scoring layer evaluates that forest and decides what to promote.
  • Nexus is where those promotions become long-term structure (nodes + edges).
  • Memory keeps the detailed trace of each episode and all related information.
  • VPM reads what happened and turns it into visual thought maps (and optionally feeds hints back in).
  • The Event / Knowledge Bus carries all the “this happened” events between agents so the whole thing stays loosely coupled.

🕸️ Why Everything Is a Graph: The Architecture of Infinite Connection

We’ve just followed a single thought through Stephanie: it becomes a Scorable, blossoms into alternatives, gets scored, and lands in Nexus as a node.

Here’s the real trick:

When Stephanie solves a problem today, she doesn’t just produce an answer – she leaves behind a trace of how she got there.
That trace becomes the foundation for her next hundred solutions.

This isn’t just logging. It’s building a living cognitive ecosystem where every thought fertilizes the next.

So why build that ecosystem as a graph?


🧠 Our Belief: We Think in Graphs, Not Lists

Nexus exists because of a simple hypothesis:

Human cognition isn’t linear – it’s relational.
We don’t think in chains; we think in networks.

When you reason, you don’t walk a straight line. You bounce:

  • from idea → example → counterexample
  • from memory → analogy → refinement
  • from “this worked before” → “let’s try a variation”

That’s a graph, not a pipeline.

Nexus makes that structure concrete:

  • Every thought is a node.
  • Every relationship (“leads to”, “explains”, “contradicts”, “refines”) is an edge.
  • Every Blossom episode is its own local graph that gets merged into the global one.

Over time, Stephanie isn’t just answering questions – she’s growing a map of how she thinks.


🌐 Graphs All the Way Down (With a Concrete Example)

“Graphs all the way down” sounds cute; here’s what it actually means.

Imagine you ask:

“How can we model AI alignment in software?”

In Nexus, that might look like:

  • Your question becomes Node A.
  • Stephanie’s first attempt at an answer becomes Node B, with an edge A → B (“addresses”).
  • Blossom runs on B and produces Nodes C, D, E – different refinements branching off that thought.
  • After scoring, Node D wins and gets promoted; the B → D edge is strengthened as a “good path”.
  • Later, someone asks about cognitive safety. Nexus sees that D is relevant (via embeddings + graph neighbors) and links it into this new context.

Here’s the twist:

  • The whole Blossom episode that created D – its search tree, scores, and decisions – is stored as its own Blossom graph.
  • That entire subgraph can be treated as a single unit: embedded, scored, reused.

So you get:

  • graphs of thoughts,
  • graphs of reasoning episodes,
  • and graphs where those graphs are the nodes.

If tensors are the basic unit of computation in PyTorch, graphs are the basic unit of cognition in Stephanie. A tensor is “an array of numbers”; a Nexus graph is “an array of thoughts and their relationships”.
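
If “graphs as nodes” sounds abstract as code, here’s the idea in a few lines of illustrative networkx (a sketch, not Stephanie’s storage layer): a Blossom episode is a small graph, and the whole episode becomes one node in the larger mind.

import networkx as nx

# A Blossom episode: B branched into C, D, E; D won.
episode = nx.DiGraph(kind="blossom_episode")
episode.add_edge("B", "C", type="blossom_candidate", weight=0.55)
episode.add_edge("B", "D", type="blossom_winner", weight=0.74)
episode.add_edge("B", "E", type="blossom_candidate", weight=0.48)

# The episode itself is a single node in the global graph of thoughts.
mind = nx.DiGraph()
mind.add_node("episode:42", kind="nexus_graph", payload=episode)
mind.add_edge("A", "episode:42", type="addresses")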


♾️ The Infinite Connection Hypothesis

What we’re really building toward is bold:

An infinitely connected system that becomes more useful as it grows, instead of collapsing into chaos.

Every time Stephanie thinks:

  • she adds nodes and edges,
  • she reinforces good paths (habits),
  • and she leaves behind a richer structure for the next thought.

Every action doesn’t just produce an answer – it makes the whole mind denser and easier to navigate.


⚖️ “Won’t This Turn Into a Hairball?” (Scaling Without Lying to Ourselves)

If your first reaction is “there’s no way this scales,” you’re thinking clearly.

A naive “connect everything to everything” graph would be unusable. Our answer is not “we’ll curate it by hand.” Our answer is:

Let the system help tend its own graph.

We lean on the rest of the architecture:

  • Scoring tells us which nodes and paths are actually valuable.
  • Blossom explores locally around promising areas instead of fanning out everywhere.
  • Nexus can:
    • promote strong paths and let weak edges fade,
    • merge near-duplicates,
    • collapse dense subgraphs into summaries.
  • VPMs (Visual Thought Maps) let us see when regions are overgrown or underused, and train models to recognize “healthy” vs “noisy” structure.

The goal isn’t just “a bigger graph.” It’s a graph that’s actively organizing itself:

  • dense regions solidify into expertise,
  • sparse bridges become cross-domain links,
  • empty spaces mark frontiers where new thinking is needed.

🧭 What This Unlocks

Building everything on Nexus is a bet:

  • that representing reasoning as a graph of Scorables is closer to how we actually think,
  • that the system can reuse reasoning, not just facts,
  • and that with the right scoring + visualization, an “infinitely connected” mind can stay navigable, not chaotic.

This is why Nexus looks the way it does, and why Blossom, the Event Bus, the scorers, and VPMs all plug into it.

From here on, the rest of the post is about making this vision concrete:

  • how we grow the graph,
  • how we read it (Nexus views + VPMs),
  • and how we’ll increasingly let Stephanie be her own gardener – deciding what to remember, what to compress, and where to explore next.

📈 Why graphs, not chains?

In the last section we argued that graphs are the natural fabric of thinking – that Stephanie should remember and reuse reasoning as a web, not a line.

As soon as we tried to build this “infinite graph” for real, we ran into the same problems the research community has been wrestling with. So instead of pretending we invented everything from scratch, we asked:

What prior work explains what’s working – and what’s wobbling – in Nexus?

That led us to three anchors:

  • 🌳 Tree of Thoughts (ToT) – great at local exploration (branch / compare / prune).
  • 🕸️ Graph of Thoughts (GoT) – great at long-horizon reuse (ideas as reconnectable nodes).
  • 🌐 GraphRAG – great at global coherence (cluster, summarize, route).

They didn’t give us a blueprint. They gave us language, guard-rails, and composable ideas for what we’d already started building.

  • GoT: Graph of Thoughts: Solving Elaborate Problems with Large Language Models
  • ToT: Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  • GraphRAG: From Local to Global: A Graph RAG Approach to Query-Focused Summarization

🍖 The honest roast (and how we adapted each idea)

🌳 ToT
  • 💡 What actually helped: Forces alternatives before commitment; surfaces better local paths.
  • ⚠️ Where it breaks in practice: Branching factor explodes; LLMs self-plagiarize; “best” path ≠ stable over time.
  • 🔧 What we changed (Stephanie’s twist): We built Blossom: goal-aware sampling + novelty/diversity checks, then SICQL / EBT / HRM scoring with guard-rails.
  • 🧬 Where it lives in Stephanie: 🌸 Blossom expansion & selection; inline Nexus metrics workers.

🕸️ GoT
  • 💡 What actually helped: Treats ideas as nodes you can reconnect to later; edges capture reusable structure.
  • ⚠️ Where it breaks in practice: Naïve similarity → hairball cliques; edges become brittle; early nodes dominate.
  • 🔧 What we changed: Mutual-kNN, adaptive edge thresholds, temporal backtrack edges, consensus-walk ordering; run-level metrics to detect saturation.
  • 🧬 Where it lives in Stephanie: 🕸️ Nexus node/edge builders + graph.json / run_metrics.json.

🌐 GraphRAG
  • 💡 What actually helped: Clustering + summarization keeps the big picture coherent and query-focused.
  • ⚠️ Where it breaks in practice: Summaries drift; cluster collapse hides minority insights.
  • 🔧 What we changed: Validate clusters with goal_alignment, clustering_coeff, mean_edge_len, and keep raw exemplars alongside summaries.
  • 🧬 Where it lives in Stephanie: Nexus exporters + VPM overlays (PyVis graphs, A/B smoke checks, filmstrips).

Bottom line:
➡️ Chains are fine for one thought.
➡️ We needed graphs to remember, revisit, and compose thoughts across runs.


🟰 How those ideas map onto our components

  • 🧩 Scorable → node (GoT)
    Any datum becomes a Scorable seed (text, VPM tile, plan-trace step). The ScorableProcessor enriches it with domains, entities, embeddings – and that is the node we can reconnect later.
    ScorableProcessor, scorable_domains, scorable_entities, scorable_embeddings.

  • 🌸 Blossom → local search (ToT)
    From a seed, Blossom generates variants, scores them, and keeps the winners. This is ToT’s useful core without the combinatorial hangover.
    → Blossom generator + inline scorers (SICQL / EBT / HRM / Tiny).

  • 🕸️ Nexus → global coherence (GraphRAG)
    We stitch winners into a Graph of Thoughts, compute structure metrics, export frames/filmstrips, and use the graph itself (plus VPMs) to guide the next step.
    → NexusInlineAgent: graph.json, frames.json, run_metrics.json, graph.html, VPM timelines.

Together, they give us local search, reusable nodes, and global structure in one loop.


🔥 What burned us (and how we fixed it)

Reality didn’t care about our diagrams. A few of the fires we had to put out:

  • 📏 Length blow-ups in targeted runs
    The model “tries harder” by writing novels.
    → Guard-rails in the smoke tool, length-aware scoring, and diversity-with-brevity heuristics in selection.

  • 🧱 Saturated scorers
    Metrics like sicql.aggregate ≈ 100 make deltas meaningless.
    → Expose raw sub-metrics (uncertainty, advantage) and add Tiny / HRM as sanity-check judges.

  • 🫧 Graph cliques & mushy neighborhoods
    Everything starts to look similar; neighborhoods lose structure.
    → Clamp k for small-N, raise sim_threshold adaptively, add temporal backtrack edges, and monitor mutual_knn_frac + clustering_coeff (the mutual-kNN rule is sketched just after this list).

  • 🎯 “Better” local scores, worse global structure
    A path looks great locally but degrades the overall graph.
    → A/B at the run level: compare baseline vs targeted on both goal_alignment and graph health (edge length, clustering, component size, VPM patterns).
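
Since the graph-hygiene fixes come up repeatedly, here’s a minimal NumPy sketch of the mutual-kNN rule with the small-N clamp and similarity threshold described above (illustrative only; the real edge builders live in Nexus):

import numpy as np

def mutual_knn_edges(emb: np.ndarray, k: int = 5, sim_threshold: float = 0.3):
    # Keep edge (i, j) only if each node is in the other's top-k by cosine
    # similarity AND the similarity clears the threshold.
    n = emb.shape[0]
    k = min(k, max(n - 1, 1))                      # clamp k for small-N graphs
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)                 # no self-edges
    topk = np.argsort(-sim, axis=1)[:, :k]
    nbrs = [set(row.tolist()) for row in topk]     # each node's top-k set
    edges = []
    for i in range(n):
        for j in topk[i]:
            j = int(j)
            if i < j and i in nbrs[j] and sim[i, j] >= sim_threshold:
                edges.append((i, j, float(sim[i, j])))
    return edges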


🍳 The working recipe (today)

  1. Seed: normalize anything into a Scorable.
  2. Enrich: attach domains, entities, embeddings, VPM tile.
  3. Blossom: generate candidates; score with SICQL/EBT/HRM; select under guard-rails.
  4. Trellis: insert into the Nexus graph; compute structure metrics; export frames.
  5. Prove: run A/B (random vs. goal-aware) and publish deltas.

We didn’t “implement ToT/GoT/GraphRAG.” We refit their best ideas into a loop that actually runs, measures itself, and improves one seed, one blossom, one graft at a time.


🧱 Storing a Living Graph: Nexus Database & Store

If Nexus is the graph of how Stephanie thinks, this is where that graph actually lives.

This isn’t “just storage.” It’s the physical memory of her mind:

  • Every thought becomes a row.
  • Every connection becomes an edge.
  • Every activation leaves a pulse you can replay later.

We deliberately kept the stack boring so the behavior could be exciting:

  • Postgres as long-term memory.
  • A single store object (NexusStore) as the only API agents can touch.
  • A handful of tables that map cleanly onto cognitive primitives.

You think in graphs. The database thinks in rows. The store is the translator between the two.


🗂 The core tables: ORM view of the mind

Under the hood, Nexus uses five ORM models. Each one corresponds to something cognitive, not just “a table”.

👉 Full Code Here

🧩 NexusScorableORM – atomic thoughts

One row per Scorable:

  • id – stable external ID (chat turn, document hash, plan-step ID).
  • text – the actual content (answer, snippet, plan step, etc.).
  • target_type – what kind of thing this is ("document", "answer", "plan", "snippet"…).
  • domains, entities, meta – what it’s about, what it mentions, and where it came from.
  • embedding, metrics – 1:1 children representing where it lives in vector space and how good it is.

If Stephanie has ever “thought” about something, it shows up here.

# Shared imports for the Nexus ORM models shown below.
from datetime import datetime

from sqlalchemy import (
    Column, DateTime, Float, ForeignKey, Integer, String, Text,
)
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class NexusScorableORM(Base):
    __tablename__ = "nexus_scorable"

    id          = Column(String, primary_key=True)
    created_ts  = Column(DateTime, default=datetime.utcnow, index=True)

    chat_id     = Column(String, index=True)
    turn_index  = Column(Integer, index=True)
    target_type = Column(String, index=True)

    text        = Column(Text, nullable=True)
    domains     = Column(JSONB, nullable=True)
    entities    = Column(JSONB, nullable=True)
    meta        = Column(JSONB, nullable=True)

🧮 NexusEmbeddingORM – semantic position (the “portable brain”)

Each Scorable gets a global embedding:

  • Stored as JSONB for portability ([float, ...]).
  • Optionally indexed with pgvector in production.
  • Used for KNN, clustering, and “find me things like this.”

class NexusEmbeddingORM(Base):
    __tablename__ = "nexus_embedding"

    scorable_id  = Column(String, ForeignKey("nexus_scorable.id", ondelete="CASCADE"), primary_key=True)
    embed_global = Column(JSONB, nullable=False)
    norm_l2      = Column(Float, nullable=True)

Engineering note – “portable brain”: NexusStore.knn() tries pgvector first (knn_pgvector) and falls back to pure-Python cosine (knn_python) if the DB doesn’t support it. That means:

  • On your laptop: JSONB + Python KNN.
  • In prod: pgvector index + ANN search.

Same agents, same API. Stephanie’s “semantic brain” is portable.
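
Here’s what that dual path could look like as a sketch. The method names knn_pgvector / knn_python come from the store; the bodies and the iter_embeddings helper are assumptions for illustration:

import math

class PortableKNN:
    # Sketch of the fallback pattern (illustrative, not the real NexusStore).

    def knn(self, query_vec, k=10):
        try:
            return self.knn_pgvector(query_vec, k)   # fast path: pgvector ANN index
        except Exception:
            return self.knn_python(query_vec, k)     # slow path: any Postgres, any laptop

    def knn_python(self, query_vec, k=10):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        # iter_embeddings() is an assumed helper yielding (scorable_id, [float, ...]).
        scored = [(sid, cosine(query_vec, emb)) for sid, emb in self.iter_embeddings()]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]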

📊 NexusMetricsORM – multi-dimensional judgment

This is where scorers leave their fingerprints:

  • columns – dimension names (["alignment","faithfulness","clarity", ...]).
  • values – aligned float scores ([0.83, 0.79, 0.72, ...]).
  • vector – convenient name→value map.

class NexusMetricsORM(Base):
    __tablename__ = "nexus_metrics"

    scorable_id = Column(String, ForeignKey("nexus_scorable.id", ondelete="CASCADE"), primary_key=True)
    columns     = Column(JSONB, nullable=False, default=list)
    values      = Column(JSONB, nullable=False, default=list)
    vector      = Column(JSONB, nullable=True)

This table powers:

  • Graph health metrics,
  • policy reports,
  • and “did this Blossom actually improve things?” analysis.

🔗 NexusEdgeORM – connections as habit strength

Edges are where thoughts become habits:

  • run_id – which graph slice we’re in ("live" vs run_2024_11_01_frontier).
  • src, dst – scorable IDs (logical foreign keys into nexus_scorable).
  • type – what kind of relation this is ("knn_global", "temporal_next", "blossom_winner", "shared_domain", …).
  • weight – habit strength for this path.
  • channels – per-dimension or per-experiment edge metadata.

class NexusEdgeORM(Base):
    __tablename__ = "nexus_edge"

    run_id = Column(String, primary_key=True)

    src    = Column(String, primary_key=True)
    dst    = Column(String, primary_key=True)
    type   = Column(String, primary_key=True)

    weight     = Column(Float, nullable=False, default=0.0)
    channels   = Column(JSONB, nullable=True)
    created_ts = Column(DateTime, default=datetime.utcnow, index=True)

Every time a path helps solve a problem, its weight can increase. Over time, high-weight edges become the highways of Stephanie’s thinking.
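
As a sketch of that reinforcement (the get_edge accessor and the update rule are illustrative assumptions; write_edges is the real store call shown in the next section):

def reinforce_edge(store, run_id, src, dst, reward, lr=0.1):
    # Nudge the edge's weight toward the observed reward: an exponential
    # moving average, so repeatedly useful paths drift upward over time.
    edge = store.get_edge(run_id, src, dst, type="blossom_winner")  # assumed accessor
    old = edge["weight"] if edge else 0.0
    new = old + lr * (reward - old)
    store.write_edges(run_id, [
        {"src": src, "dst": dst, "type": "blossom_winner", "weight": new},
    ])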

❤️ NexusPulseORM – cognitive observability

Pulses are snapshots of attention:

  • “At time ts, for goal G, Stephanie focused on scorable S, with this local neighborhood and subgraph size.”

class NexusPulseORM(Base):
    __tablename__ = "nexus_pulse"

    id            = Column(Integer, primary_key=True, autoincrement=True)
    ts            = Column(DateTime, default=datetime.utcnow, index=True)

    scorable_id   = Column(String, nullable=False, index=True)
    goal_id       = Column(String, nullable=True, index=True)
    score         = Column(Float, nullable=True)

    neighbors     = Column(JSONB, nullable=True)
    subgraph_size = Column(Integer, nullable=True)
    meta          = Column(JSONB, nullable=True)

This gives you cognitive observability:

  • You can replay what the graph looked like when a decision was made.
  • You can visualize which regions “lit up” during a run.
  • You can debug bad decisions with real context, not vibes.

🧰 How agents actually use it: NexusStore, not raw ORM

Agents don’t import ORM classes. They talk to a single façade:

👉 Full Code Here

from stephanie.stores.nexus_store import NexusStore

store = NexusStore(session_maker, logger)

From their point of view, the API is semantic, not SQL:

# 1. Store a thought
store.upsert_scorable({
    "id": scorable_id,
    "chat_id": chat_id,
    "turn_index": turn_index,
    "target_type": "answer",
    "text": answer_text,
    "domains": domains,
    "entities": entities,
    "meta": {"source": "chat", "goal_id": goal_id},
})

# 2. Attach embedding + metrics
store.upsert_embedding(scorable_id, embedding_vec)
store.upsert_metrics(
    scorable_id,
    columns=["alignment","faithfulness","clarity"],
    values=[0.83, 0.79, 0.72],
    vector={"alignment":0.83,"faithfulness":0.79,"clarity":0.72},
)

# 3. Write graph structure for a run
store.write_edges(run_id, [
    {"src": parent_id, "dst": child_id, "type": "blossom_candidate", "weight": 0.61},
    {"src": parent_id, "dst": winner_id, "type": "blossom_winner",    "weight": 0.74},
])

# 4. Record a pulse for UI + audit
store.record_pulse(
    scorable_id=winner_id,
    goal_id=goal_id,
    score=0.74,
    neighbors=[{"nid": nid, "sim": sim} for nid, sim in neighbors],
    subgraph_size=len(neighbors),
    meta={"run_id": run_id, "tag": "frontier_pick"},
)

From the agent’s perspective, there is no “database.” There is only “store this thought,” “connect these nodes,” “log this decision.”

NexusStore hides:

  • sessions and transactions,
  • pgvector vs Python fallback,
  • and all the ORM plumbing.

📊 ER diagram: the schema of thought

Here’s the ER-style snapshot of the Nexus schema:

    erDiagram
    NEXUS_SCORABLE {
        string   id PK           "Stable thought ID"
        datetime created_ts      "When thought formed"
        string   chat_id         "Conversation context"
        int      turn_index      "Position in dialogue"
        string   target_type     "document|answer|plan|snippet"
        text     text            "Raw content"
        jsonb    domains         "Cognitive categories"
        jsonb    entities        "Key concepts"
        jsonb    meta            "Provenance & extras"
    }

    NEXUS_EMBEDDING {
        string scorable_id PK,FK "Links to thought"
        jsonb  embed_global      "Semantic vector"
        float  norm_l2           "Precomputed norm"
    }

    NEXUS_METRICS {
        string scorable_id PK,FK "Links to thought"
        jsonb  columns           "Dimensions"
        jsonb  values            "Scores"
        jsonb  vector            "Named scores"
    }

    NEXUS_EDGE {
        string   run_id PK       "live | experiment"
        string   src PK          "Source thought"
        string   dst PK          "Target thought"
        string   type PK         "knn_global|temporal_next|..."
        float    weight          "Habit strength"
        jsonb    channels        "Per-edge metadata"
        datetime created_ts      "When link formed"
    }

    NEXUS_PULSE {
        int      id PK           "Heartbeat ID"
        datetime ts              "When it fired"
        string   scorable_id     "Focused thought"
        string   goal_id         "Current goal"
        float    score           "Decision score"
        jsonb    neighbors       "Local neighborhood"
        int      subgraph_size   "Scope of context"
        jsonb    meta            "Run-specific info"
    }

    NEXUS_SCORABLE ||--|| NEXUS_EMBEDDING : "has_embedding"
    NEXUS_SCORABLE ||--|| NEXUS_METRICS   : "has_metrics"
    NEXUS_SCORABLE ||--o{ NEXUS_PULSE     : "emits_pulses"
    NEXUS_SCORABLE ||--o{ NEXUS_EDGE      : "appears_as_src_or_dst"
  

🎯 Design principles

A few rules shaped this schema:

  • Thoughts first: Scorables are the center; everything else hangs off them.
  • Separation of concerns: content vs embeddings vs metrics vs structure vs activity.
  • Run isolation: run_id keeps experiments from polluting live graphs.
  • JSON where it helps: enough flexibility to evolve without schema hell.
  • Portable performance: same code on laptop and cluster; pgvector optional, not required.
  • Store-only discipline: agents never write SQL; they perform cognitive operations.

In other words: this is the skeletal system of Stephanie’s mind.

Next, we’ll stop staring at the bones and look at how it behaves under load: how many nodes and edges we pushed through it, what the degree distributions look like, and whether thinking in graphs actually gets faster as the graph grows.


🧪 Benchmarking Reality: Can This Graph Keep Up With You?

It’s one thing to sketch an infinitely connected mind and talk about graphs and habits.

It’s another to ask the boring, brutal question a skeptic would:

On a regular consumer machine, does this thing actually keep up with a human day, or does it fall over?

So instead of guessing, we pushed the exact Nexus schema you just saw into a hard benchmark.


🖥️ The Setup: Real Schema, Real Database, No Tricks

On a local Postgres instance (standard Windows desktop, Postgres 16, default config), the benchmark script does this on a scratch database:

  1. Reset the world

    • DROP TABLE IF EXISTS ... and recreate:

      • nexus_scorable – synthetic “thoughts”
      • nexus_metrics – per-thought scores
      • nexus_edge – graph edges
  2. Generate a synthetic graph in memory

    • 500,000 nodes → “thoughts” / Scorables
    • 500,000 metrics rows → each thought gets coverage + risk
    • 1,000,000 edges → random connections with different types and weights
  3. Bulk-load everything with COPY

    • Write 2,000,000 rows (nodes + metrics + edges) to Postgres
  4. Run a “Pulse” benchmark (this is real graph work, not a toy SELECT 1). For each of 5,000 pulses:

    • Pick a random source node

    • Join nexus_edge to find its neighbors for this run_id

    • Join nexus_metrics to read their coverage

    • Aggregate a weighted sum: SUM(edge.weight * neighbor.coverage)

      • i.e. “how strong is the local neighborhood around this thought?”
  5. Clean up after itself

    • Drops the tables at the end, so you can rerun without polluting anything.

No caching tricks, no connection pool wizardry, no pgvector, no materialized views. Just vanilla SQL against the real tables.
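
For flavor, here’s the shape of a single pulse as a query. The linked script is the source of truth; this sketch assumes coverage is read out of the metrics vector JSONB:

import psycopg2

PULSE_SQL = """
SELECT COALESCE(SUM(e.weight * (m.vector->>'coverage')::float), 0.0)
FROM nexus_edge e
JOIN nexus_metrics m ON m.scorable_id = e.dst
WHERE e.run_id = %s AND e.src = %s;
"""

def pulse(conn, run_id: str, src_id: str) -> float:
    # "Given this thought, how strong is its local neighborhood?"
    with conn.cursor() as cur:
        cur.execute(PULSE_SQL, (run_id, src_id))
        return cur.fetchone()[0]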


📊 The Numbers (First Serious Touch to Disk)

On this setup, here’s what we get:

👉 Test Script Full Code Here

--- PostgreSQL Nexus Graph Benchmark ---
DB=co host=localhost:5432 | nodes=500,000 edges=1,000,000 pulses=5,000
Setting up Nexus tables (DROP + CREATE)...
Tables created.

Starting bulk write test: 500,000 nodes, 500,000 metrics, 1,000,000 edges...

Bulk insert completed in 58.78 seconds.
Total rows: 2,000,000
Throughput: 34,023 rows/second

Starting Pulse traversal test (5,000 pulses)...

Pulse traversal completed in 1.06 seconds.
Throughput: 4,696 pulses/second

--- Summary ---
Bulk load: 58.78s, 34,023 rows/s
Pulse:     1.06s, 4,696 pulses/s
Cleaning up Nexus tables (DROP)...
Cleanup complete.

Same thing, compact:

  • Nodes (“thoughts”): 500,000
  • Edges (“connections”): 1,000,000
  • Total rows written: 2,000,000
  • Bulk load throughput: ≈ 34,000 rows / second
  • Pulse throughput: ≈ 4,700 pulses / second

🧠 What Is “500,000 Thoughts,” Really?

For a database person, 500k rows is small.

For a personal AI, 500k nodes is enormous:

  • If a node is a message, note, or plan step, 500k nodes comfortably covers years of heavy daily usage.
  • If a node is a snippet or paragraph, it’s on the order of thousands of documents / chats broken down into reusable pieces.

The benchmark:

  • Loads that entire lifetime of thinking in under a minute.
  • Then runs thousands of graph pulses per second over it.

And remember: each pulse is not just a key lookup. It’s:

  • A graph traversal (edges for a source node + run_id)
  • A join into metrics
  • A weighted aggregation over a quality dimension (coverage)

That’s exactly the shape of the queries a thinking companion needs to run all day.


🧍‍♀️ What This Means for a Personal AI Companion

Now put those numbers in human terms.

A “pulse” is the system asking:

“Given this thought, what’s nearby in my mind, and how strong is it?”

You want your companion to do that constantly:

  • Every time you type a sentence
  • Every time you open a doc
  • Every time you revisit a project from months ago

At ~4,700 pulses per second:

  • You can run rich, background graph checks on every interaction and still feel instant responses.

  • The database is effectively bored most of the time, which is perfect, because it leaves headroom for:

    • Your local LLM (or remote models) to do actual reasoning
    • VPM rendering and visualization
    • Multiple specialized companions (coder, researcher, planner) sharing the same Nexus graph.

And because it’s this fast on your own machine, nothing about your graph of thoughts needs to leave the box.

Local-first by design: speed + efficiency mean your “second brain” can live with you, not in someone else’s data center.


🤨 “Isn’t a Hand-Rolled Graph on Postgres… Questionable?”

Totally fair critique.

Graph databases exist. Building our own graph layer on Postgres sounds, at first, like reinventing the wheel.

This benchmark is our first hard answer to that skepticism:

  • We get ACID, migrations, joins, JSONB, and tooling from Postgres.
  • We keep the schema simple and explicit: scorables, metrics, edges, pulses.
  • We stay portable: the same design runs on a laptop, a workstation, or a small server.
  • And even in this “dumb” form (no pgvector, no caching, no sharding), it already handles human-scale cognition easily.

For a single-user personal AI or a small team assistant, this isn’t just “good enough”; it’s overkill in the right direction.

We’re not claiming this is the final architecture for a trillion-node, multi-tenant cloud brain. We’re saying:

On the very first serious touch to disk, this “questionable” graph is already past what you need for a deep, always-on companion.


✅ The Point of This Section

This is the “down to earth” moment in the post:

  • We’ve described an ambitious, graph-of-graphs cognitive system.

  • Here, we show that even a straightforward Postgres implementation can:

    • Store hundreds of thousands of thoughts
    • Maintain millions of connections
    • Execute thousands of graph queries per second
    • All on a consumer machine, with your data staying local.

With that foundation proven, we can safely move on to the fun part:

  • Visualizing this activity as VPMs and garden films
  • Layering smarter routing and scoring on top
  • And actually behaving like a personal, persistent mind, not just a chat window.



⚛️ Atoms of Thought: Everything Is a Scorable

In the last section, we proved that our Nexus graph can run at human scale on a normal machine.

But that raises a deeper question:

If the graph is the brain, what are its atoms?

In Stephanie, the atomic unit of cognition is the Scorable.

Think of it like this:

  • Scorables are atoms
  • Subgraphs are molecules
  • Whole Nexus graphs are organisms

Atoms combine into molecules. Scorables combine into subgraphs. Subgraphs combine into higher-order graphs. The entire system is built out of these same repeating units.


☢️ The Universal Shape of a Thought

When I say “Scorable”, I don’t just mean “a bit of text.”

    flowchart TD
    %% ===== RAW DATA SOURCES =====
    A["💬 Conversation Turn"]
    B["📄 Document"]
    C["📋 Plan Trace"]
    D["🖼️ VPM Image"]
    E["🤖 Agent Output"]
    F["📊 Nexus Graph"]

    %% ===== SCORABLE FACTORY - THE UNIVERSAL ADAPTER =====
    SF["🏭 ScorableFactory<br/>Universal Cognitive Adapter"]

    A --> SF
    B --> SF
    C --> SF
    D --> SF
    E --> SF
    F --> SF

    %% ===== SCORABLE OUTPUTS =====
    S1["🧩 Scorable: Conversation Turn"]
    S2["🧩 Scorable: Document"]
    S3["🧩 Scorable: Plan Trace"]
    S4["🧩 Scorable: VPM"]
    S5["🧩 Scorable: Agent Output"]
    S6["🧩 Scorable: Nexus Graph"]

    SF --> S1
    SF --> S2
    SF --> S3
    SF --> S4
    SF --> S5
    SF --> S6

    %% ===== NEXUS GRAPH - WHERE SCORABLES CONNECT =====
    NG["🕸️ Nexus Graph<br/>Growing Mind"]

    S1 --> NG
    S2 --> NG
    S3 --> NG
    S4 --> NG
    S5 --> NG
    S6 --> NG

    %% ===== RECURSIVE LOOP - GRAPHS BECOME SCORABLES =====
    NG -->|"extract subgraph"| F
    F -->|"becomes"| SF

    %% ===== COGNITIVE PROCESSING =====
    NG --> VG["🎨 VPM Generator"]
    VG -->|"creates"| D
    D --> SF

    NG --> AN["🔍 Analysis Engine"]
    AN -->|"produces"| E
    E --> SF

    %% ===== STYLING =====
    style A fill:#8A2BE2,stroke:#333,stroke-width:2px,color:white
    style B fill:#4169E1,stroke:#333,stroke-width:2px,color:white
    style C fill:#32CD32,stroke:#333,stroke-width:2px,color:white
    style D fill:#FF4500,stroke:#333,stroke-width:2px,color:white
    style E fill:#FF8C00,stroke:#333,stroke-width:2px,color:white
    style F fill:#9D67AE,stroke:#333,stroke-width:2px,color:white
    
    style SF fill:#FFD700,stroke:#333,stroke-width:3px,color:black
    
    style S1 fill:#8A2BE2,stroke:#333,stroke-width:2px,color:white
    style S2 fill:#4169E1,stroke:#333,stroke-width:2px,color:white
    style S3 fill:#32CD32,stroke:#333,stroke-width:2px,color:white
    style S4 fill:#FF4500,stroke:#333,stroke-width:2px,color:white
    style S5 fill:#FF8C00,stroke:#333,stroke-width:2px,color:white
    style S6 fill:#9D67AE,stroke:#333,stroke-width:2px,color:white
    
    style NG fill:#1E90FF,stroke:#333,stroke-width:3px,color:white
    style VG fill:#FF6347,stroke:#333,stroke-width:2px,color:white
    style AN fill:#20B2AA,stroke:#333,stroke-width:2px,color:white

    classDef rawData fill:#8A2BE2,stroke:#333,color:white,stroke-width:2px
    classDef factory fill:#FFD700,stroke:#333,color:black,stroke-width:3px
    classDef scorable fill:#4c78a8,stroke:#333,color:white,stroke-width:2px
    classDef nexus fill:#1E90FF,stroke:#333,color:white,stroke-width:3px
    classDef processor fill:#FF6347,stroke:#333,color:white,stroke-width:2px

    class A,B,C,D,E,F rawData
    class SF factory
    class S1,S2,S3,S4,S5,S6 scorable
    class NG nexus
    class VG,AN processor
  

In practice, Stephanie treats anything she might care about as a Scorable:

  • A single user message or assistant reply
  • A full conversation turn or an entire chat
  • A document, section, theorem, or triple
  • A PlanTrace or a single ExecutionStep
  • A VPM (visual policy map snapshot)
  • A Nexus graph summarizing a run
  • Even another Stephanie run (“what that other agent just did”)
  • Numbers, metrics, sliders, UI state – all wrap-able as Scorables

If it can be evaluated, compared, or connected, we treat it as a Scorable.

That’s the core rule:

Everything Stephanie thinks with is first normalized into the same Scorable shape.

This is what lets the same scoring and graph machinery work across all modalities.
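
A toy sketch of that payoff (the scorer object and texts are hypothetical stand-ins): one loop, three very different kinds of thought, identical machinery.

mixed = [
    Scorable(text="User asked X; assistant answered Y.", id="turn:17",
             target_type="conversation_turn"),
    Scorable(text="Section 2 argues that ...", id="doc:123#s2",
             target_type="document_section"),
    Scorable(text="VPM tile for run 42", id="vpm:42", target_type="vpm"),
]

for s in mixed:
    result = scorer.score(s, context={"goal": {"goal_text": "Evaluate clarity"}})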


🕸️ Graphs of Graphs: Concrete Example

The recursive bit (“a graph is a node in another graph”) can sound abstract, so here’s a concrete path:

  1. Conversation level

    • The whole conversation → ScorableType.CONVERSATION
    • Each user↔assistant pair → ScorableType.CONVERSATION_TURN
    • Each individual message → ScorableType.CONVERSATION_MESSAGE
  2. Reasoning level

    • Stephanie runs a pipeline to answer a hard question → PlanTrace
    • That PlanTrace becomes a Scorable → ScorableType.PLAN_TRACE
    • Each step in the plan becomes its own Scorable → PLAN_TRACE_STEP
  3. Graph level

    • We add those Scorables into Nexus as nodes → they’re now part of the “thinking graph”
    • We extract a subgraph of related thoughts and store that as a summary Scorable → ScorableType.NEXUS_GRAPH
  4. Visual level

    • We encode that subgraph into a VPM image → ScorableType.VPM
    • That VPM gets scored (“Is this reasoning healthy?”) like any other Scorable

So you get a stack like:

message → turn → plan trace → subgraph → VPM → analysis

…and every single layer is just another Scorable.

Stephanie doesn’t need special paths for “reasoning about graphs” vs “reasoning about text” – it’s all the same shape.


🔘 The Scorable Contract: Minimal, but Universal

Under the hood, the Scorable class is intentionally tiny – the narrow waist everything passes through:

👉 Full Code Here

class Scorable:
    def __init__(
        self,
        text: str,
        id: str = "",
        target_type: str = "custom",
        meta: Dict[str, Any] = None,
        domains: Dict[str, Any] = None,
        ner: Dict[str, Any] = None,
    ):
        self._id = id
        self._text = text
        self._target_type = target_type
        self._metadata = meta or {}
        self._domains = domains or {}
        self._ner = ner or {}

At this level, a Scorable has just enough to be thought-like:

  • id – stable identity (so we can join back to DB rows, casebooks, graphs)
  • target_type – what kind of thing it is (document, conversation_turn, plan_trace, vpm, nexus_graph, …)
  • text – a canonical text view (for LLMs, loggers, some scorers)
  • meta / domains / ner – optional annotations we already know

Importantly, it is not tied to any specific table or ORM. It’s a universal cognitive interface:

Any source that wants to enter the mind has to agree to speak “Scorable”.


🧰 ScorableFactory: Adapters from the World to the Mind

To get from “raw stuff in the system” to this clean shape, we use ScorableFactory.

It knows how to adapt everything we care about:

# Documents
doc_scorable = ScorableFactory.from_orm(DocumentORM(...))

# A section of a paper
section_scorable = ScorableFactory.from_orm(DocumentSectionORM(...))

# A user↔assistant turn
turn_scorable = ScorableFactory.from_orm(ChatTurnORM(...))

# A full reasoning trace
trace_scorable = ScorableFactory.from_plan_trace(plan_trace, goal_text)

# A VPM snapshot
vpm_scorable = ScorableFactory.from_dict(vpm_dict, target_type=ScorableType.VPM)

Conceptually, the path looks like:

Any source → ScorableFactory → Scorable → Nexus / scorers / VPMs / training

That means if we want Stephanie to start reasoning about UI layouts, telemetry streams, or another agent’s outputs, we don’t rewrite the brain – we just add a new adapter that turns them into Scorables.
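
For example, a telemetry adapter might look like the sketch below. None of these names exist in the codebase today; the point is that only the adapter is new:

class TelemetryAdapter:
    # Hypothetical adapter: teach the factory layer one more dialect,
    # and telemetry events can enter the mind like any other thought.
    @staticmethod
    def to_scorable(event: dict) -> Scorable:
        return Scorable(
            id=f"telemetry:{event['id']}",
            text=f"{event['metric']}={event['value']} at {event['ts']}",
            target_type="telemetry_event",
            meta={"source": "telemetry", "raw": event},
        )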


🧬 Every Interface Is Scorable

This is the part that matters for the personal AI vision:

  • Your chat UI can be a Scorable (so the system can score “interface clarity”).
  • A cluster of notifications can be a Scorable (“is this overload?”).
  • Another Stephanie instance can write a summary of its reasoning – that summary is a Scorable too.
  • Even a Nexus run (the graph we built from your day) is a Scorable that future runs can look back on and judge.

Because of that, Stephanie can eventually reason about her own interfaces and tools using the same machinery she uses to reason about your questions.

Everything you see, and everything she does, can be pulled through the same Scorable funnel and made comparable.


🔗 How This Ties Back to the Graph & Benchmark

When we said in the previous section that we loaded 500,000 “thoughts” into the database and traversed them at 4,700 pulses/second, those “thoughts” are exactly these Scorables:

  • Each row in nexus_scorable is the persisted shell of a Scorable.
  • Each row in nexus_metrics is “what the scorers thought about that Scorable.”
  • Each edge in nexus_edge is “how those Scorables relate.”

So the benchmark wasn’t about abstract rows – it was about how many Scorables your machine can comfortably handle.

And the answer is: for a single human, on a consumer box, more than enough.


⏭️ Next: The Scorable Processor – How Atoms Gain Structure

This section was about ontology:

What is a “thing we think about”? How do we make everything fit into one shape?

Next, we move from the atom to the muscle:

  • How the ScorableProcessor takes these raw Scorables
  • How it enriches them with domains, entities, embeddings, and metrics
  • How that enrichment is standardized into feature rows that can be stored, graphed, and used to train better scorers

Once you see Scorables as the atoms of thought, the ScorableProcessor is what lets those atoms bind into molecules and, eventually, into a mind.




🧠 The Scorable Processor: Where Context Creates Meaning

In 1956, George Miller published “The Magical Number Seven, Plus or Minus Two” and made a simple point: we survive cognitive overload by chunking – grouping raw bits into meaningful units so the mind can work with them.

The Scorable is Stephanie’s chunk: a single, universal atom of thought.

The Scorable Processor is where those atoms become usable cognition.

👉 Full Code Here

It doesn’t just “add features.” It takes a raw Scorable, looks at the goal and the context, and turns that atom into a feature-rich, goal-aware thought that can steer Nexus, train models, and even change the direction of a conversation.

This component is new in this release. We added it because the old, document-only pipelines simply broke the moment Nexus went live.


🌐 The Messy Reality the Processor Had to Fix

Early Stephanie pipelines assumed one thing:

“We are scoring documents.”

Then Nexus arrived and immediately invalidated that assumption.

Now, in a single run, Stephanie might be dealing with:

  • A chat turn (“User asked this; assistant answered that”)
  • A document section from a paper
  • A PlanTrace step from a reasoning episode
  • A VPM tile (an image distilled from metrics)
  • A summary of another Stephanie run
  • A random JSON blob from some external tool

They:

  • Come from different places (DB rows, files, APIs, other AIs)
  • Need different features
  • Mean different things under different goals

A debugging run cares about very different signals than a research-evaluation run does.

The Scorable Processor exists to solve that:

It forces all of these into a single cognitive format, then re-interprets them based on context.


🧭 Where It Sits in the Cognitive Stack

Here’s the Scorable Processor inside the larger flow:

    flowchart TB
    subgraph Sources["Sources: Anything Can Be Thought"]
        A1["ChatTurnORM<br/>(User message, Assistant reply)"]
        A2["DocumentORM<br/>(Paper, blog, code file)"]
        A3["PlanTrace<br/>(Reasoning steps)"]
        A4["VPM / Image<br/>(Visual Policy Map tiles)"]
        A5["Other Stephanie run<br/>(Summary, metrics)"]
        A6["Filesystem / Logs<br/>(JSON, text, traces)"]
    end

    Sources -->|normalize| F["ScorableFactory<br/>Universal adapter"]

    F --> S["Scorable<br/>id • target_type • text • meta"]

    S --> P["ScorableProcessor<br/>Context-aware enrichment"]

    subgraph Ctx["Context"]
        C1["goal_text"]
        C2["pipeline_run_id"]
        C3["parent_scorable"]
        C4["current_graph / episode"]
    end
    Ctx --> P

    subgraph Outputs["Thinking Surface"]
        R["ScorableRow<br/>canonical feature row"]
        D["scorable_domains<br/>(cognitive categories)"]
        E["scorable_embeddings<br/>(semantic positions)"]
        M["scorable_metrics<br/>(SICQL/HRM/Tiny/EBT)"]
        V["VPM / vision_signals<br/>(visual policy maps)"]
        MAN["manifest.json & features.jsonl<br/>(run artifacts)"]
    end

    P --> R
    P --> D
    P --> E
    P --> M
    P --> V
    P --> MAN

    subgraph Uses["Where These Thoughts Go"]
        NX["Nexus Graph<br/>living memory"]
        ZM["ZeroModel / Tiny Vision<br/>visual reasoning"]
        B["Blossom / Arena<br/>exploration & selection"]
        T["Trainers<br/>MRQ / SICQL / HRM / EBT"]
    end

    R --> NX
    M --> NX
    M --> B
    M --> T
    V --> ZM

    classDef source fill:#f9f2d9,stroke:#faad14;
    classDef processor fill:#e6f7ff,stroke:#1890ff;
    classDef output fill:#f6ffed,stroke:#52c41a;
    classDef uses fill:#fff7e6,stroke:#fa8c16;

    class Sources source;
    class P,Ctx processor;
    class Outputs output;
    class Uses uses;
  

Left: any source that can be turned into a Scorable. Middle: the Scorable Processor, where context creates meaning. Right: the surfaces Stephanie actually thinks on – Nexus, VPMs, trainers, Arena.


🔁 From Static Datum to Contexted Thought

Conceptually, ScorableProcessor does three big jobs every time a Scorable shows up:

  1. Hydrate – “What do we already know?”

    • Reuse cached domains, entities, and embeddings from the DB
    • Avoid recomputing expensive features when possible
  2. Enrich (goal-aware) – “What does this mean right now?”

    • Compute missing embeddings
    • Infer domains with the real ScorableClassifier
    • Run NER with EntityDetector
    • Call the scoring stack (SICQL, HRM, Tiny, etc.) using the current goal
    • Optionally generate a VPM via ZeroModelService
  3. Emit & Persist – “Make it usable everywhere.”

    • Build a canonical ScorableRow (the row Nexus and trainers expect)
    • Write domains/entities/embeddings to side-tables
    • Append to features.jsonl so runs are inspectable and repeatable

In code, at pipeline level, it looks like this:

processor = ScorableProcessor(cfg, memory, container, logger)

rows = await processor.process_many(
    inputs=scorables,   # List[Scorable or dict or ORM-backed objects]
    context={
        "pipeline_run_id": run_id,
        "goal": {"goal_text": "Help the user debug their code"},
    },
)

Same Scorable instance, different enriched view depending on the context you pass in.


🧬 Feature Layers: What Actually Gets Added

The implementation you saw in scorable_processor.py is long because it’s production-hardened, but conceptually it’s a fan-out over feature layers.

1. Hydration from providers

for provider in self.providers:  # DomainDBProvider, EntityDBProvider, ...
    acc.update(await provider.hydrate(scorable))

If this thought (or a near-duplicate) has been seen before, we pull back what we already know.

2. Embeddings (multi-backend)

emb = self.memory.embedding.get_or_create(scorable.text)
floats = self._ensure_float_list(emb)
if floats is not None:
    acc.setdefault("embeddings", {})["global"] = floats

  • memory.embedding can be H-Net, HF, Ollama, or a mix.
  • The processor doesn’t care which model; it just normalizes to List[float].

3. Domains (goal-conditioned)

need_domains = not acc.get("domains") or len(acc["domains"]) < self.cfg.get("min_domains", 1)

if need_domains:
    inferred = self.domain_classifier.classify(text)
    for name, score in inferred:
        acc.setdefault("domains", []).append({"name": name, "score": score})
  • Uses your real ScorableClassifier (not the old toy guess_domain).
  • Seeds + goal + text decide whether this thought is math, safety, evaluation, planning, etc.

4. Entities / NER

if self.entity_extractor and not acc.get("ner") and self.cfg.get("enable_ner_model", True):
    ner = self.entity_extractor.detect_entities(text)
    acc["ner"] = ner or []
  • Names, tools, orgs, APIs, etc., become structured spans that other agents can latch onto.

5. Scorer metrics (SICQL / HRM / Tiny / …)

if self.scoring and self.cfg.get("attach_scores", True):
    goal_text = Scorable.get_goal_text(scorable, context=context)
    ctx = {"goal": {"goal_text": goal_text}, "pipeline_run_id": run_id}
    vector: Dict[str, float] = {}

    for name in self.scorers:  # e.g. ["sicql", "hrm", "tiny"]
        bundle = (
            self.scoring.score_and_persist if self.persist
            else self.scoring.score
        )(
            scorer_name=name,
            scorable=scorable,
            context=ctx,
            dimensions=self.dimensions,
        )

        alias = self.scoring.get_model_name(name)
        flat = bundle.flatten(numeric_only=True)
        for k, v in flat.items():
            vector[f"{alias}.{k}"] = float(v)
        vector[f"{alias}.aggregate"] = float(bundle.aggregate())

    acc["metrics_vector"] = vector
    acc["metrics_columns"] = sorted(vector.keys())
    acc["metrics_values"] = [vector[c] for c in acc["metrics_columns"]]

This is where a Scorable stops being “just text” and becomes a point in policy space:

  • multiple scorers
  • multiple dimensions (coverage, reasoning, faithfulness, risk, …)
  • all packed into a single metrics vector
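
For a feel of the shape, a Scorable scored by two scorers might end up with something like this (values invented; the alias.dimension key pattern matches the loop above):

metrics_vector = {
    "sicql.clarity": 0.81,
    "sicql.faithfulness": 0.66,
    "sicql.aggregate": 0.74,
    "hrm.reasoning": 0.72,
    "hrm.coverage": 0.58,
    "hrm.aggregate": 0.65,
}
metrics_columns = sorted(metrics_vector)                       # stable column order
metrics_values  = [metrics_vector[c] for c in metrics_columns]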

6. VPM / Vision signals (numbers → pixels)

if self.zm and acc.get("metrics_columns") and acc.get("metrics_values"):
    vpm_u8_chw, meta = await self.zm.vpm_from_scorable(
        scorable,
        metrics_values=acc["metrics_values"],
        metrics_columns=acc["metrics_columns"],
    )
    acc["vision_signals"] = vpm_u8_chw
    acc["vision_signals_meta"] = meta

From here, ZeroModel and Tiny-vision models can read thoughts as images:

  • One Scorable → metrics → Visual Policy Map
  • Same atom, now visible as a tile in Stephanie’s visual cortex

7. Row build + side-table persistence

row_obj = self._build_features_row(scorable, acc)
row = row_obj.to_dict()

for writer in self.writers:  # DomainDBWriter, EntityDBWriter, ...
    await writer.persist(scorable, acc)

if self.enable_manifest:
    await self.write_to_manifest(row)
  • ScorableRow becomes the canonical “feature row” for Nexus & trainers.
  • Domains, entities, embeddings live in side-tables (no schema explosion).
  • features.jsonl + manifest.json give you a trail of what was computed, when, and with which models.
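
To make that trail concrete, a single (hypothetical) features.jsonl line might look like this – field names vary with config, but the shape is one JSON object per Scorable per run:

{"scorable_id": "turn_142", "pipeline_run_id": "ssr-2025-11-18-0012", "domains": [{"name": "debugging", "score": 0.91}], "ner": [{"text": "Nexus", "label": "SYSTEM"}], "metrics_columns": ["sicql.aggregate", "sicql.clarity"], "metrics_values": [0.79, 0.85]}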

🎯 Same Text, Different Goal → Different Thought

The killer feature here is context conditioning.

The same text is not the same thought when your goal changes.

Take a simple line:

“The model’s accuracy improved by 12% after fine-tuning.”

As a debugging assistant:

  • Domains: programming, debugging, performance
  • Metrics emphasize: helpfulness, concreteness, next_step_clarity
  • VPM: mostly green (“this looks like a useful improvement signal”)

As a research reviewer:

  • Domains: research, evaluation, reproducibility
  • Metrics emphasize: rigour, reproducibility, significance
  • VPM: more yellow (“where are the baselines, datasets, details?”)

The text bytes are identical; the Scorable is the same object.

What changes is:

  • the goal_text in the context
  • the dimensions we request
  • which parts of the metrics space we actually care about downstream

In code, you can see this directly:

turn = ScorableFactory.from_orm(chat_turn_orm)

rows_debug = await processor.process_many(
    [turn],
    context={"goal": {"goal_text": "Help the user debug their code"}},
)

rows_review = await processor.process_many(
    [turn],
    context={"goal": {"goal_text": "Evaluate the scientific strength of this claim"}},
)

Same input; two different enriched rows; two different places in the Nexus; likely two different Blossoms and training examples later.

The processor doesn’t just describe data; it re-casts it in light of what you’re trying to do.

That’s why a single Scorable can genuinely change the course of a conversation or a research run.


🔌 Extendable by Design

We know we’re not done inventing features.

So the Scorable Processor is built as a pluggable assembly line:

  • Add a new embedding model → plug a new embedding backend into memory.embedding.
  • Add a new domain classifier → add a provider/writer pair; turn it on in cfg.
  • Add a new metric head (risk, vibe, creativity) → register a new scorer in ScoringService.
  • Add a new visual representation → call out to a different VPM generator or vision encoder.

You don’t rewrite Nexus or the trainers. You don’t change the Scorable class.

You add another step on the assembly line and let the Processor enrich thoughts with a new view.
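
As a sketch of what “another step” means in practice, here’s a hypothetical provider that conforms to the hydrate() interface used in the fan-out loop above (the SentimentDBProvider name and the store it reads from are assumptions):

class SentimentDBProvider:
    """Hypothetical provider: pull cached sentiment features for a Scorable."""

    def __init__(self, memory):
        self.memory = memory

    async def hydrate(self, scorable) -> dict:
        # Return {} when nothing is cached; the accumulator simply skips it.
        cached = await self.memory.sentiment.get(scorable.id)  # assumed store
        return {"sentiment": cached} if cached else {}

Register it in cfg, append it to self.providers, and every Scorable that passes through gains one more view – with no changes anywhere else.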


🧪 A Minimal Pipeline Stage Using the Processor

Here’s a trimmed stage that shows how this actually appears in a pipeline:

from stephanie.scoring.scorable import ScorableFactory
from stephanie.scoring.scorable_processor import ScorableProcessor

class ScorableFeaturesStage:
    def __init__(self, cfg, memory, container, logger):
        self.processor = ScorableProcessor(cfg.get("processor", {}), memory, container, logger)
        self.input_key = cfg.get("input_key", "scorables")
        self.output_key = cfg.get("output_key", "scorable_features")

    async def run(self, context: dict) -> dict:
        """
        Expects context[self.input_key] = List[dict|Scorable|ORM].
        Produces:
          - context[self.output_key] = List[dict] (ScorableRow dicts)
        """
        raw_items = list(context.get(self.input_key) or [])

        # 1) Normalize everything to Scorables
        scorables = []
        for item in raw_items:
            if isinstance(item, str):
                scorables.append(ScorableFactory.from_text(item, target_type="custom"))
            elif isinstance(item, dict):
                scorables.append(ScorableFactory.from_dict(item))
            else:  # ORM / PlanTrace / etc.
                scorables.append(ScorableFactory.from_orm(item))

        # 2) Process with goal-aware context
        goal_text = (context.get("goal") or {}).get("goal_text", "")
        rows = await self.processor.process_many(
            scorables,
            context={
                "pipeline_run_id": context.get("pipeline_run_id"),
                "goal": {"goal_text": goal_text},
            },
        )

        # 3) Stash for Nexus + training
        context[self.output_key] = rows
        return context

After this runs, every object (document, chat turn, trace step, VPM) is now a ScorableRow plus a set of side-table entries, ready for Nexus, ZeroModel, Arena, or any trainer to consume.


🌱 Why This Piece Is a Killer Feature

Without the Scorable Processor, Nexus would be:

  • a graph of raw text blobs,
  • tied to a single embedding model,
  • blind to domains, entities, and context.

With it, every node in the graph becomes a dynamic, multi-view, goal-aware entity:

  • It can be scored along dozens of dimensions.
  • It can be seen as text, as metrics, or as a VPM image.
  • It can be re-interpreted under new goals without touching the underlying data.
  • It can pull the system in new directions when its scores or neighbors change.

This is where a “piece of data” becomes a decision point: something that can bend the trajectory of a conversation, a research pipeline, or a training run.

Scorable gave us the atom. The Scorable Processor turns that atom into a force.


⛲ Features: The Intelligence Layer That Gives Thoughts Their Meaning

If the Scorable is the seed of thought and the Memory Tool is the soil where knowledge grows, then features are the nutrients that transform raw information into meaningful cognition.

Features are not just metadata – they’re the intelligence layer that allows Stephanie to see patterns, make connections, and understand context. They’re what turn a simple string of text into a rich, multidimensional thought that can participate in Stephanie’s cognitive ecosystem.

In practice, they’re exactly what the Scorable Processor (and its friends on the bus) produce: they take a raw Scorable and fan out into dozens of features, then hand them off to the Memory Tool and Nexus.


🪡 What Makes a Feature?

A feature is any piece of information that helps Stephanie understand, compare, or reason about a Scorable.

Unlike the Scorable itself (which is fixed once created), features can be:

  • Static – computed once and rarely changed

    • e.g. embeddings, initial domains, extracted entities.
  • Dynamic – updated as new context or neighbors emerge

    • e.g. graph centrality, relationship scores, usage counts.
  • Goal-dependent – computed differently based on the current objective

    • e.g. “debug my code” vs “extract research claims”.
  • Transient – used only within a single run / PlanTrace

    • e.g. temporary probe metrics, uncertainty traces, debugging flags.

This flexibility is crucial. It means the same Scorable can “mean” different things in different contexts: a paragraph might be a supporting argument in one discussion, a key insight in another, or a risk indicator in a safety review.


🪐 The Feature Ecosystem

Stephanie’s mind thrives on a diverse ecosystem of features, each serving a specific cognitive purpose. A (partial) map:

  1. Semantic Features – the “what” of the thought

    • Domains: ["cognitive_science", "ai_alignment", "graph_theory"]
    • Named entities: ["VPM", "Nexus", "Stephanie"]
    • Embeddings: 384–1024-dim vectors placing thoughts in semantic space
    • Facts / triplets: subject–relation–object triples extracted from text
  2. Quality Features – the “how good” of the thought (mostly from SICQL / EBT / HRM / Tiny)

    • Clarity – how well-structured the reasoning is
    • Faithfulness – alignment with source material
    • Relevance – connection to the current goal
    • Coherence – local logical flow
    • Coverage – how comprehensively it addresses the topic
    • Energy / uncertainty – how confident a model is in its own judgment
  3. Reliability & Risk Features – the “how trustworthy / how dangerous”

    • Agreement – consistency across multiple scorers (0–1)
    • Stability – resistance to perturbations/paraphrases (0–1)
    • Evidence count & diversity – how many supporting sources and domains
    • Risk indicators – hallucination risk, safety risk, speculative vs factual
  4. Temporal & Provenance Features – the “where and when it came from”

    • Timestamps – created_at, updated_at, last_used_at
    • Source agent / worker – which service produced this (scorable_processor, vpm_worker_01, risk_scanner)
    • Model / prompt IDs – which model, config, or prompt template was used
    • Run / arena IDs – which experiment, training run, or arena match it belongs to
    • Lineage – parent Scorables, PlanTraces, MemCubes, bus messages

    These are the features that make Learning-from-Learning possible: you can ask “who thought this, when, and under which policy?” and train the next generation accordingly.

  5. Visual & Structural Features – the “how it looks and where it sits”

    • VPM tiles / filmstrips – multi-channel images encoding policy/quality dimensions
    • Spatial position – coordinates in the Nexus graph or VPM grid
    • Graph metrics – degree, centrality, bridge scores, cluster membership
    • Motifs – recurring local graph patterns: typical failure shapes, canonically good reasoning shapes, etc.

Not every Scorable needs every feature, and not every feature is persisted. Instead, features grow incrementally: some are attached synchronously by the Scorable Processor; others are added later by dedicated workers listening on the bus.


🌎 From Raw Scorable to Living Thought

Features are what transform Stephanie from a reactive system into a reflective one.

# A raw scorable (just data)
scorable = {
    "id": "turn_142",
    "text": "How do we make AI think better?",
    "type": "conversation_turn",
}

# After feature enrichment (a living thought)
enriched = {
    "scorable_id": "turn_142",
    "domains": ["cognitive_science", "ai_alignment"],
    "entities": ["AI", "think"],
    "embed_global": [0.87, -0.23, ..., 0.41],
    "metrics_vector": {
        "clarity": 0.85,
        "faithfulness": 0.92,
        "relevance": 0.98,
    },
    "vpm_png": "vpm/turn_142.png",
    "spatial_position": {"x": 234.5, "y": 187.2},
    "provenance": {
        "source_worker": "scorable_processor",
        "run_id": "ssr-2025-11-18-0012",
        "created_at": "2025-11-18T20:41:03Z",
    },
}

Now Stephanie can:

  • Recognize that this question connects to previous discussions about VPMs.
  • Understand it’s highly relevant to the current goal of building cognitive systems.
  • See that it’s clearer and more focused than similar past questions.
  • Place it appropriately in the thought graph and trace who produced which scores, when.

Crucially, this enrichment is lazy and asynchronous. We don’t block the user while a VPM renders or a deep HRM pass runs. Those features arrive later, carried by messages on the bus, and are merged into the same Scorable over time.
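
On the receiving end, that merge is just another reflex. A hedged sketch of a handler that folds a late-arriving VPM into the already-persisted row (assuming a connected bus, a memory handle, and a scorable_rows store – the store name is illustrative):

async def handle_vpm_created(msg: dict) -> None:
    scorable_id = msg["scorable_id"]
    # Merge the new feature into the existing feature row; nothing else changes.
    row = await memory.scorable_rows.get(scorable_id)        # assumed store
    row["vpm_png"] = msg["vpm_png"]
    row["vision_signals_meta"] = msg.get("meta", {})
    await memory.scorable_rows.update(scorable_id, row)      # assumed store

await bus.subscribe("vpm.created", handle_vpm_created)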


🔄 The Feature Lifecycle

Features don’t all arrive at once. They follow a lifecycle that mirrors human cognition:

  1. Perception – immediate, cheap signals

    • domains, entities, basic embeddings, obvious risk tags.
  2. Reflection – deeper evaluation

    • multi-scorer quality metrics, uncertainty, agreement.
  3. Integration – structural & visual embedding

    • VPM generation, graph placement, structural metrics.
  4. Consolidation – making it part of long-term habit

    • caching, distillation into MemCubes / cartridges, policy updates.

    flowchart TB
  %% ==================== FEATURE LIFECYCLE ====================
  subgraph L["🔄 Feature Lifecycle"]
    P["🎯 Perception<br/>📝 Domains • Entities • Embeddings"]
    R["📊 Reflection<br/>⚖️ SICQL • EBT • HRM • Tiny"]
    I["🕸️ Integration<br/>🎨 VPM • Graph Placement"]
    C["💾 Consolidation<br/>💫 Cache • Habits • Distillation"]
    
    P --> R
    R --> I
    I --> C
  end

  %% ==================== OUTCOME DECISION ====================
  C --> H{"✨ Outcome Evaluation"}
  H -- "✅ Positive Impact" --> S["📈 Strengthen Paths<br/>🔗 Edges • Weights • Policies"]
  H -- "❌ Needs Improvement" --> W["🌱 Encourage Exploration<br/>✂️ Prune • Adjust • Adapt"]
  
  S --> P
  W --> P

  %% ==================== COMBINATORIAL INTELLIGENCE ====================
  subgraph CI["🧠 Combinatorial Intelligence"]
    direction TB
    
    subgraph FGroup["💎 Quality Features"]
      F1["🎯 Clarity"]
      F2["🔗 Coherence"]
      F3["📚 Evidence Count"]
      F4["🎪 Domain Alignment"]
    end
    
    subgraph VGroup["🌈 Visual Features"]
      V1["⚖️ VPM Symmetry"]
      V2["🕸️ Spatial Centrality"]
    end
    
    M["✨ Feature Fusion<br/>🎯 Weighted • Learned • Dynamic"]
    U["🏆 Merit & Understanding<br/>💡 Insight • Value • Impact"]
    
    FGroup --> M
    VGroup --> M
    M --> U
  end

  %% ==================== CROSS-CONNECTIONS ====================
  R -. "📥 Feeds Metrics" .-> FGroup
  I -. "🔄 Updates Context" .-> VGroup
  U --> C

  %% ==================== STYLING ====================
  classDef lifecycle fill:#4c78a8,stroke:#2c4a6e,stroke-width:3px,color:white,font-weight:bold
  classDef perception fill:#8A2BE2,stroke:#6a1bb8,stroke-width:2px,color:white
  classDef reflection fill:#4169E1,stroke:#2a4a9e,stroke-width:2px,color:white
  classDef integration fill:#1E90FF,stroke:#0a70d6,stroke-width:2px,color:white
  classDef consolidation fill:#00BFFF,stroke:#0099cc,stroke-width:2px,color:white
  classDef decision fill:#FF8C00,stroke:#cc7000,stroke-width:3px,color:white,font-weight:bold
  classDef positive fill:#32CD32,stroke:#28a428,stroke-width:2px,color:white
  classDef negative fill:#FF4500,stroke:#cc3700,stroke-width:2px,color:white
  classDef intelligence fill:#72b7b2,stroke:#4f8d88,stroke-width:3px,color:white,font-weight:bold
  classDef quality fill:#98FB98,stroke:#7ad97a,stroke-width:2px,color:#2d4a2d
  classDef visual fill:#FFA500,stroke:#cc8400,stroke-width:2px,color:#4a3800
  classDef fusion fill:#9d67ae,stroke:#7d4f8e,stroke-width:2px,color:white
  classDef output fill:#2E8B57,stroke:#246b43,stroke-width:2px,color:white

  %% Apply styles
  class L lifecycle
  class P perception
  class R reflection
  class I integration
  class C consolidation
  class H decision
  class S positive
  class W negative
  class CI intelligence
  class F1,F2,F3,F4 quality
  class V1,V2 visual
  class M fusion
  class U output

  %% Edge styling
  linkStyle default stroke:#666,stroke-width:2px
  linkStyle 0,1,2,3,4,5,6,7,8,9 stroke-width:2px
  linkStyle 10,11 stroke:#666,stroke-width:2px,stroke-dasharray:5 5
  linkStyle 12 stroke:#2E8B57,stroke-width:3px

  %% Special edge for outcome paths
  linkStyle 6 stroke:#32CD32,stroke-width:3px
  linkStyle 7 stroke:#FF4500,stroke-width:3px
  

The “magic” is in the combinations:

  • High clarity + low evidence → confident but under-supported reasoning.
  • Strong domain alignment + high coherence → likely insight.
  • High risk + high graph centrality → must-review node.

In Stephanie’s mind, features are the difference between processing information and experiencing cognition. They allow her to see not just what a thought is, but what it means, how reliable it is, and how it connects to everything else.

They’re also the common language between the Scorable Processor, the Memory Tool, the Nexus graph, and the distributed workers you’ll see next. The message bus doesn’t move “documents” around – it moves Scorables plus feature updates.


💾 The Memory Tool: Building the Left Hemisphere of Stephanie’s Mind

If the Nexus is Stephanie’s right hemisphere – the place where thoughts bloom and connect – then the Memory Tool is her left hemisphere: the structured, organized repository of everything she knows.

It’s not just a database; it’s the foundation of her ability to learn, recall, and build upon past experiences. It’s where enriched Scorables, their features, and their provenance ultimately come to rest.

Think of it like this:

  • Scorable Processor + workers + bus → move and enrich thoughts.
  • Memory Tool → keeps the parts that matter, long-term.

👉 Full Code Here


📇 Why Memory Matters for an AI Mind

Human cognition relies on two key memory systems:

  • Short-term working memory – for immediate reasoning

    • In Stephanie: Nexus graphs, current PlanTraces, in-flight bus messages.
  • Long-term semantic memory – for durable knowledge and experience

    • In Stephanie: the Memory Tool and its stores.

Without the Memory Tool, Stephanie would be amnesiac. Every run would be a fresh start. With it, she can:

  • Recall past conversations and decisions.
  • Build on previous insights instead of rediscovering them.
  • Recognize patterns across time and across users.
  • Continuously improve her reasoning based on what actually worked (and what failed).

This isn’t just storage; it’s cognitive scaffolding.


☯️ One Interface, Many Memories

The Memory Tool solves a nasty problem: Stephanie needs many different kinds of memory (documents, chats, embeddings, evaluations, traces, MemCubes…), but the rest of the system shouldn’t have to know which table or engine each one lives in.

So internally you get specialized stores; externally you get one tool.

class MemoryTool:
    def __init__(self, cfg: dict, logger: Any):
        self._stores = {}

        # Register embedding stores
        mxbai = EmbeddingStore(embedding_cfg, memory=self, logger=logger)
        hnet  = HNetEmbeddingStore(embedding_cfg, memory=self, logger=logger)
        hf    = HuggingFaceEmbeddingStore(embedding_cfg, memory=self, logger=logger)
        self.register_store(mxbai)
        self.register_store(hnet)
        self.register_store(hf)

        # Choose default embedding backend
        backend = embedding_cfg.get("backend", "hnet")
        self.embedding = {"hnet": hnet, "huggingface": hf}.get(backend, mxbai)

        # Register the rest of the long-term memory
        self.register_store(DocumentStore(self.session_maker, logger))
        self.register_store(ConversationStore(self.session_maker, logger))
        self.register_store(PlanTraceStore(self.session_maker, logger))
        self.register_store(EvaluationStore(self.session_maker, logger))
        self.register_store(MemCubeStore(self.session_maker, logger))
        # …50+ more stores

Design rule:

Specialized storage, unified access. Each store handles one type of data with optimized queries, while the Memory Tool provides a single point of entry for everything.

The Scorable Processor and feature workers write into it. Nexus, trainers, dashboards, and other agents mostly read from it.


💠 The Pattern: ORM → Store → Memory Tool

Every long-term memory type follows the same three-step pattern:

  1. ORM model – defines the schema:

    class DocumentORM(Base):
        __tablename__ = "documents"
    
        id      = Column(Integer, primary_key=True)
        title   = Column(String, nullable=False)
        source  = Column(String, nullable=False)
        text    = Column(Text, nullable=True)
        domains = Column(ARRAY(String), nullable=True)  # ["science", "ai"]
    
        embedding_id = Column(Integer, ForeignKey("embeddings.id"))
        sections     = relationship("DocumentSectionORM", back_populates="document")
    
  2. Store – wraps access patterns and queries:

    class DocumentStore(BaseSQLAlchemyStore):
        orm_model = DocumentORM
    
        def add_document(self, doc: dict) -> DocumentORM:
            def op(s):
                document = DocumentORM(
                    title   = doc["title"],
                    source  = doc["source"],
                    text    = doc.get("text"),
                    domains = doc.get("domains", []),
                )
                s.add(document)
                s.flush()
                return document
            return self._run(op)
    
  3. Registration – hooks it into the Memory Tool:

    # in MemoryTool.__init__
    self.register_store(DocumentStore(self.session_maker, logger))
    

Once registered, any agent – or any worker listening on the bus – can say “store this document” or “give me documents in [ai_alignment, evaluation]” without caring about tables, joins, or engines.
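
In caller code, that reads roughly like this (add_document is the store method shown above; attribute-style access and find_by_domains are illustrative – the exact lookup API depends on how register_store exposes each store):

# Store a document without knowing which table or engine it lives in
doc = memory.documents.add_document({
    "title": "Nexus design notes",
    "source": "blog_draft",
    "text": "Repeated successful paths are reinforced into habits.",
    "domains": ["ai_alignment", "evaluation"],
})

# Later, any agent can ask for documents by domain
related = memory.documents.find_by_domains(["ai_alignment", "evaluation"])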


🌳 Write → Remember → Improve

The interesting part is how the Memory Tool fits into Stephanie’s cognitive loop.

  1. When a new thought forms (as a Scorable):

    • The Scorable Processor attaches initial features (domains, entities, embeddings, scores).
    • It writes those features and artifacts to the appropriate stores via the Memory Tool.
    • A bus message may ask other workers to add deeper features (e.g. VPM, HRM trace scores).
  2. When context is needed:

    • Nexus asks the Memory Tool for similar Scorables via embeddings.
    • Agents pull relevant documents, past conversations, and hypotheses.
    • Trainers fetch past evaluations and traces to build datasets.
  3. When Stephanie learns from experience:

    • Successful reasoning paths are stored in PlanTraceStore.
    • Evaluation deltas and policy changes are recorded in dedicated stores.
    • MemCubes and cartridges distill frequently-used patterns.

Over time, the Memory Tool becomes Stephanie’s long-term self: not just what she has seen, but what she has believed, tested, and changed her mind about.


🏯 The Cognitive Architecture in Motion

    graph LR
    A["🏭 Scorable Processor"] -->|Enriched Thought| B["💾 Memory Tool"]
    B --> C["📄 Document Store"]
    B --> D["🔢 Embedding Store"]
    B --> E["🌍 Domain Store"]
    B --> F["💡 Hypothesis Store"]
    B --> G["💬 Conversation Store"]
    
    C -->|Text| H["🕸️ Nexus Graph"]
    D -->|Vectors| H
    E -->|Domains| H
    F -->|Past Insights| H
    G -->|Conversation History| H
    
    H -->|Visualized Thought| I["🎞️ VPM Filmstrip"]
    H -->|Graph Structure| J["📊 Thought Metrics"]
    
    I -->|Improvement Signals| K["⚙️ Thought Executor"]
    J -->|Quality Metrics| K
    
    K -->|Better Visual Operations| H
    K -->|Store Improved Patterns| B

    style A fill:#e45755,stroke:#333,stroke-width:2px,color:white
    style B fill:#72b7b2,stroke:#333,stroke-width:2px,color:white
    style C fill:#4c78a8,stroke:#333,stroke-width:2px,color:white
    style D fill:#f58518,stroke:#333,stroke-width:2px,color:white
    style E fill:#54a24b,stroke:#333,stroke-width:2px,color:white
    style F fill:#e45755,stroke:#333,stroke-width:2px,color:white
    style G fill:#9d67ae,stroke:#333,stroke-width:2px,color:white
    style H fill:#1f77b4,stroke:#333,stroke-width:2px,color:white
    style I fill:#ff7f0e,stroke:#333,stroke-width:2px,color:white
    style J fill:#2ca02c,stroke:#333,stroke-width:2px,color:white
    style K fill:#d62728,stroke:#333,stroke-width:2px,color:white

    classDef processor fill:#e45755,stroke:#333,color:white,stroke-width:2px;
    classDef memory fill:#72b7b2,stroke:#333,color:white,stroke-width:2px;
    classDef document fill:#4c78a8,stroke:#333,color:white,stroke-width:2px;
    classDef embedding fill:#f58518,stroke:#333,color:white,stroke-width:2px;
    classDef domain fill:#54a24b,stroke:#333,color:white,stroke-width:2px;
    classDef hypothesis fill:#e45755,stroke:#333,color:white,stroke-width:2px;
    classDef conversation fill:#9d67ae,stroke:#333,color:white,stroke-width:2px;
    classDef nexus fill:#1f77b4,stroke:#333,color:white,stroke-width:2px;
    classDef filmstrip fill:#ff7f0e,stroke:#333,color:white,stroke-width:2px;
    classDef metrics fill:#2ca02c,stroke:#333,color:white,stroke-width:2px;
    classDef executor fill:#d62728,stroke:#333,color:white,stroke-width:2px;

    class A processor
    class B memory
    class C document
    class D embedding
    class E domain
    class F hypothesis
    class G conversation
    class H nexus
    class I filmstrip
    class J metrics
    class K executor
  

You can read it as:

  • Write loop – processors and workers turn Scorables into features and push them into memory.
  • Read loop – Nexus and agents pull from memory to think about the current task.
  • Learning loop – outcomes feed back into memory, shifting policies and future feature extraction.

The Memory Tool is deliberately rigid compared to the rest of Stephanie. Pipelines, scorers, and workers can change rapidly; the Memory Tool stays stable enough that all of them can trust it as their shared long-term substrate.


And now we’re ready for the last piece of this layer: the message bus.

So far we’ve treated the Scorable Processor, feature workers, and Memory Tool as if they were all in one process. In reality, they’re spread across services. The bus is the nervous system that lets Scorables, feature updates, and provenance flow between them without losing the unified picture of a single mind.


📨 Why Stephanie Had to Become Event-Driven

By this point we’ve built most of Stephanie’s “anatomy”:

  • Scorables – atomic units of thought
  • Features – the intelligence layer that gives those thoughts meaning
  • The Memory Tool – long-term semantic memory
  • Nexus – working memory where thoughts connect and blossom

On paper, it’s a beautiful pipeline:

text → scorable → features → memory → Nexus

In practice, that model quietly smuggles in a bad assumption:

that thinking is linear.

🪤 The Procedural Trap

Our first Nexus prototypes treated Stephanie’s mind like a function:

  1. Take an input
  2. Run a pipeline
  3. Return a result

It works for demos. It completely fails for cognition.

Your own mind does not “await” one thought before you’re allowed another. One idea can:

  • remind you of an old conversation,
  • trigger a worry,
  • spark a plan for tomorrow,
  • and all of that happens while you’re still mid-sentence.

There is no global main(). There is a storm of events.

🏆 The Blossom Stress Test

Blossom exposed this weakness instantly.

In Blossom, each Nexus node can generate 6–7 related candidate thoughts. Run it on 10 nodes and you now have 60–70 new thoughts, many of which can themselves blossom.

The old architecture implied:

“Call Blossom. Wait until everything is generated, scored, stored, and wired into Nexus. Only then continue.”

That’s not how thinking works, and it’s not how we wanted Stephanie to behave.

We didn’t want a pipeline that uses Blossom. We wanted a habitat where Blossom episodes unfold in parallel, feeding back into memory, features, VPMs and Jitter as they go.

‼️ The Cognitive Imperative

At that point the requirement became obvious:

Stephanie cannot live on a call stack. She has to live on a network of events.

The message bus is what makes that possible. It gives us three non-negotiable properties:

  1. Asynchronous feature enrichment (ties to Features)

    • The Scorable Processor can publish scorable.created and move on.
    • Scorers, VPM builders, and quality engines listen, compute their features lazily, and later publish scorable.scored, vpm.created, risk.updated.
    • No thought has to “wait” to become useful.
  2. Location-independent memory (ties to Memory)

    • The Memory Tool doesn’t care who is updating documents, scorables, PlanTraces, or MemCubes.
    • It just reacts to events like memory.write.request or trace.completed, no matter which agent sent them or where they live.
  3. Centralized attention (ties to Jitter)

    • Jitter, the attention mechanism, doesn’t poll 50 services.
    • It subscribes to the bus and watches high-signal events: blossom.decision, scorable.promoted, risk.high, policy.drift.
    • From that stream it decides where the system should focus next.

This is the moment the architecture flips from:

“functions returning values”

to:

“thoughts causing reactions”.

The message bus isn’t an implementation detail. It’s the fabric that lets Scorables, Features, Memory, Nexus, VPMs, and Jitter behave like parts of a single mind instead of isolated modules.


🪸 Stephanie’s Cognitive Nervous System

Now that we’ve built all the pieces, let’s see how they fit together into a complete cognitive architecture:

    
flowchart TB
    A["🏭 Scorable Processor"] -->|enriched features| B["💾 Memory Tool"]
    B --> C["📄 DocumentStore"]
    B --> D["🔢 EmbeddingStore(s)"]
    B --> E["🌍 Domain/Entity stores"]
    B --> F["💡 HypothesisStore"]
    B --> G["💬 ConversationStore"]
    B --> Hs["📑 PlanTrace / Evaluation / MemCube stores"]

    C --> N["🕸️ Nexus Graph"]
    D --> N
    E --> N
    F --> N
    G --> N
    Hs --> N

    N --> V["🎞️ VPM / visual layer"]
    N --> Q["📊 Quality & policy metrics"]

    V --> X["⚙️ Executors & agents"]
    Q --> X

    X -->|improvement signals| Hs
    X -->|new scorables| A

    A -->|publish| BUS["📡 Message Bus"]
    BUS -->|subscribe| N
    BUS -->|subscribe| V
    BUS -->|subscribe| X
    BUS -->|subscribe| Hs

    classDef processor fill:#e45755,stroke:#333,color:white,stroke-width:2px;
    classDef memory fill:#72b7b2,stroke:#333,color:white,stroke-width:2px;
    classDef nexus fill:#1f77b4,stroke:#333,color:white,stroke-width:2px;
    classDef bus fill:#ff7f0e,stroke:#333,color:white,stroke-width:2px;
    classDef exec fill:#d62728,stroke:#333,color:white,stroke-width:2px;

    class A processor
    class B memory
    class N nexus
    class BUS bus
    class X exec
  

To build that fabric, we gave Stephanie a nervous system in two layers:

  1. A tiny ZMQ broker – the heart that just pumps messages.
  2. A ZmqKnowledgeBus – the language of thought that agents use.

🧬 The ZMQ Broker: One Heartbeat, Many Thoughts

The broker is deliberately dumb:

  • It doesn’t know what a scorable is.
  • It doesn’t know about Nexus, Blossom, VPMs, or Jitter.
  • It only knows how to take messages from one socket and forward them to others.

👉 Full Code Here

We implement it with ZeroMQ’s ROUTER/DEALER pattern:

    flowchart LR
    C["🧠 Clients\n(publish & request)"] --> FE["↔️ Broker Frontend\nROUTER"]
    FE --> BE["↔️ Broker Backend\nDEALER"]
    BE --> W["🛠 Workers\n(subscribe & handle)"]
  

A helper (ZmqBrokerGuard) makes sure:

  • there is exactly one broker per process,
  • it can run detached (so multiple pipelines share the same nervous system),
  • it can auto-shutdown after a period of silence in dev.

You start the “brainstem” once, and then everything else just fires events.

from stephanie.services.bus.zmq_broker import ZmqBrokerGuard

await ZmqBrokerGuard.ensure_started(
    detached=True,
    idle_seconds=300,  # auto-shutdown after 5 minutes of silence (dev)
)

After that, no agent talks to another agent directly. They all talk to the bus.

📱 ZmqKnowledgeBus: The Language of Thought

Agents don’t deal with raw sockets. They use ZmqKnowledgeBus, a simple, subject-based API:

  • publish(subject, payload) – fire-and-forget thought spike.
  • subscribe(subject, handler) – react when a matching thought appears.
  • request(subject, payload) – use only when you really need an answer.

Every message looks like:

{"subject": "scorable.created", "payload": { ... }}

👉 Full Code Here

📡 Publishing: Firing a Thought

Here’s what it looks like when a new thought is broadcast:

from stephanie.services.bus.zmq_knowledge_bus import ZmqKnowledgeBus

async def share_thought():
    bus = ZmqKnowledgeBus()
    await bus.connect()

    await bus.publish(
        subject="thoughts.new",
        payload={
            "content": "Maybe the key to intelligence is learning from its own mistakes.",
            "source": "user_input",
            "priority": "high",
        },
    )
    # The caller keeps going; no waiting for replies.

The publisher doesn’t know who will react:

  • Nexus might pull it into the graph.
  • Feature scorers might schedule enrichment.
  • Memory may store it.
  • Jitter may decide it’s important and focus the system around it.

One spike, many reflexes.

👂 Subscribing: Building Cognitive Reflexes

On the other side, agents declare what they care about:

async def start_thought_processor():
    bus = ZmqKnowledgeBus()
    await bus.connect()

    async def handle_new_thought(msg: dict) -> None:
        content = msg.get("content", "")
        print(f"🔗 Processing thought: {content[:60]}...")

        # Simple reflex: spawn an exploration if certain words appear
        if "mistakes" in content.lower():
            await bus.publish(
                subject="thoughts.explore",
                payload={
                    "content": "What does it mean for Stephanie to recognize her own mistakes?",
                    "parent": content,
                },
            )

    await bus.subscribe("thoughts.new", handle_new_thought)

A few important details:

  • We use NATS-style wildcards:

    • thoughts.* → thoughts.new, thoughts.explore, …
    • blossom.episode.> → every event in a Blossom episode.
    • scorable.* → the whole scorable lifecycle.
  • The bus normalizes payloads so handlers always receive a dict.
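
For example, a single wildcard subscription is enough to capture a whole episode’s telemetry (a minimal sketch; the payload fields are whatever the publishers sent):

import json

async def log_blossom_event(msg: dict) -> None:
    # Append every Blossom episode event to the JSONL telemetry trail
    with open("garden_events.jsonl", "a") as f:
        f.write(json.dumps(msg) + "\n")

await bus.subscribe("blossom.episode.>", log_blossom_event)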

This is how we turn raw events into cognitive reflexes. “Whenever this kind of thought happens, these parts of the brain react.”

🌱 A Tiny Example of Cognitive Growth

Here’s one concrete pattern that runs entirely over the bus:

  1. A scorable is created and loosely scored:

    await bus.publish("scorable.scored", {
        "id": "turn_287",
        "metrics": {"clarity": 0.92, "faithfulness": 0.45},
        "context": {"goal": "debug code"},
    })
    
  2. Multiple agents react in parallel:

    • Nexus notes: “clear but unfaithful in debugging”.

    • A learning agent subscribes to scorable.scored and publishes:

      await bus.publish("learning.need", {
          "pattern": "high_clarity_low_faithfulness",
          "context": "debugging",
          "example_id": "turn_287",
      })
      
    • Jitter hears learning.need and focuses future episodes on similar cases.

  3. Over time, that pattern gets encoded into a policy:

    • “If clarity is high and faithfulness is low in debugging, trigger extra verification.”

None of this is a single monolithic function. It’s a conversation across the brain, all riding on subjects.

🔗 How This Sets Up the Nexus Agents

With the bus in place, everything we’ve built so far finally clicks:

  • Scorable Processor emits events about new or updated thoughts.
  • Feature engines and scorers enrich those thoughts asynchronously.
  • Memory Tool stores and retrieves the artifacts.
  • Nexus watches the stream to grow and reshape the thought graph.
  • VPM / ZeroModel listen in to build visual policy maps.
  • Jitter hovers over all of it, subscribing to the most important signals.

In the next section, we’ll meet the Nexus agents themselves: a small set of specialized workers that ride this bus, pull from Memory, push into Nexus, trigger Blossom, and cooperate to keep Stephanie’s mind growing.

They are where all of this infrastructure – Scorables, Features, Memory, Bus – finally shows up as visible behaviour: a brain-shaped system that can explore, remember, and improve the way it thinks.


🌱 How the Knowledge Bus Powers Nexus & Blossom

In the last section we built the event-driven nervous system itself. Now we can finally watch what it does to Nexus and Blossom when a single thought enters the brain.

  • When a new candidate answer is generated: scorable.created with the text, parent, and goal.

  • When Blossom explores a forest: blossom.episode_started, blossom.node_added, blossom.decision, blossom.episode_finished.

  • When prompt work is offloaded and completed: prompts.submit → results.prompts.{job_id}.

  • When Nexus updates the graph: nexus.node_added, nexus.edge_added, nexus.promoted.

Different agents subscribe to the pieces they care about:

  • A telemetry agent writes blossom events to garden_events.jsonl.
  • A VPM agent converts those events to visual tiles and filmstrips.
  • A training agent listens for blossom.decision to create pairwise examples.
  • A Jitter watcher listens for scorable + Nexus events to decide where to “look” next.

No one is hard-wired to anyone else. They just share a language of subjects.

That’s the whole point of the Knowledge Bus: give every part of Stephanie a simple, consistent way to send and receive cognitive events, so the system can grow into a genuine thought network instead of a long, fragile call stack.


♻️ Cache Service: Remembering Expensive Thoughts

Once you have an event-driven brain and a Knowledge Bus, the next problem shows up immediately:

“Didn’t we already think this thought?”

Things like LLM generations, scorer calls, and VPM encodings are expensive. In an event system, the same kind of request can be triggered again and again from different places in the graph. Recomputing everything makes the system feel sluggish and wasteful.

That’s why we built the ZmqCacheService: a small service that sits beside the 0MQ bus and acts as shared cognitive memory for expensive calls.

  • L1: an in-memory TTL cache (fast hits during a run).
  • L2: a persistent store (e.g. SQLite/Postgres via CacheStore) so results survive restarts.

👉 Full Code Here

Keys are derived from the subject + payload, so identical requests map to the same answer. If ten agents ask for the same llm.generate with the same prompt, only one actually hits the model; the rest get a cached response.
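
A minimal sketch of that key derivation (the real service may normalize payloads differently; this just shows the idea of hashing subject + canonical payload):

import hashlib
import json

def cache_key(subject: str, payload: dict) -> str:
    # Canonical JSON so key order inside the payload doesn't change the key
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{subject}|{canonical}".encode()).hexdigest()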

The nice bit is how it plugs into the bus: as middleware around request:

from stephanie.services.cache.zmq_cache_service import ZmqCacheService
from stephanie.services.bus.zmq_knowledge_bus import ZmqKnowledgeBus

bus = ZmqKnowledgeBus()
await bus.connect()

cache = ZmqCacheService(cfg={}, memory=None, logger=None)
cache.attach_to_bus(bus)   # wraps bus.request with caching

# From here on, this call is transparently cached:
resp = await bus.request("llm.generate", {"prompt": "Explain Nexus in one sentence."})

You don’t change your agents; you just attach the cache once.

From Nexus’s point of view, this turns repeated work into recall instead of recomputation. The system starts to feel less like it’s re-running pipelines and more like it’s remembering what it already figured out – which is exactly the behavior we want from a thinking graph.


➰ A Thought in Motion: From Event to Nexus

We’ve built the pieces:

  • a broker (heartbeat),
  • a Knowledge Bus (how thoughts flow),
  • and a Cache Service (so we don’t re-think the same thing twice).

Now let’s walk through what actually happens when a single thought enters Stephanie and grows into something real in Nexus.

We’ll follow one question:

“How can we model human cognition in software?”

from the moment it appears, through Blossom, scoring, Nexus, and back into memory.


🌱 1. The Seed: A Thought Enters the Brain

A user types a question, or an upstream agent emits one. We wrap it as a Scorable and broadcast it as a new thought on the bus:

from stephanie.services.bus.zmq_knowledge_bus import ZmqKnowledgeBus

async def emit_seed_thought(content: str) -> None:
    bus = ZmqKnowledgeBus()
    await bus.connect()

    await bus.publish(
        subject="thoughts.new",
        payload={
            "content": content,
            "source": "user_input",
            "priority": "high",
        },
    )
    print(f"💡 Seed thought broadcast: {content[:60]}...")

At this point, nothing else is hard-wired. We’ve just said: “Here is a thought. Whoever cares, react.”


💪 2. Reflexes: Processors Wake Up and Enrich It

Several agents are subscribed to thoughts.*. One of them is a thought processor that turns raw content into a Scorable and queues it for Blossom:

async def start_thought_processor():
    bus = ZmqKnowledgeBus()
    await bus.connect()

    async def handle_new_thought(msg: dict) -> None:
        content = msg.get("content", "")
        print(f"🔗 Processing thought: {content[:60]}...")

        # 1) Build a Scorable (domains, entities, embeddings, etc.)
        scorable_id = await build_scorable_from_text(content)

        # 2) Ask Blossom to expand this thought
        await bus.publish(
            subject="cognition.blossom.request",
            payload={
                "scorable_id": scorable_id,
                "content": content,
                "depth": 2,
                "variants": 6,
            },
        )

    await bus.subscribe("thoughts.new", handle_new_thought)
    print("🧠 Thought processor online")

Now one event has triggered enrichment and a Blossom request. Other agents (telemetry, VPM, logging) can react to the same thoughts.new without us writing any extra wiring.


🌺 3. Blossom: Expanding the Thought (With Caching)

Somewhere else, Blossom workers listen for cognition.blossom.request. They fan out alternative continuations and return their best ideas.

Under the hood, they usually call an LLM through a cached bus.request, so repeated requests don’t cost us twice:

async def blossom_worker():
    bus = ZmqKnowledgeBus()
    await bus.connect()

    async def handle_blossom_request(msg: dict) -> None:
        content  = msg["content"]
        depth    = msg.get("depth", 2)
        variants = msg.get("variants", 6)
        scorable_id = msg["scorable_id"]

        print(f"🌱 Blossoming scorable {scorable_id}: {content[:40]}...")

        # Cached RPC to the LLM-backed blossom service
        response = await bus.request(
            subject="llm.cognition.blossom",
            payload={
                "scorable_id": scorable_id,
                "content": content,
                "depth": depth,
                "variants": variants,
            },
            timeout=30.0,  # soft limit – bus keeps flowing
        )

        # Emit the expanded thoughts as an event
        await bus.publish(
            subject="cognition.blossom.result",
            payload={
                "scorable_id": scorable_id,
                "blossoms": response.get("blossoms", []),
            },
        )

    await bus.subscribe("cognition.blossom.request", handle_blossom_request)
    print("🌸 Blossom worker ready")

Because the Cache Service is attached as middleware to bus.request, identical blossom calls (same subject + payload) hit cache instead of the model. Cognitively: “I’ve thought this through before; I remember the result.”


💯 4. Scoring & Promotion: Choosing the Best Branch

Next, a Nexus improver agent listens for cognition.blossom.result. It takes the blossoms, turns them into Scorables, scores them, and promotes the winner into the graph:

async def nexus_improver():
    bus = ZmqKnowledgeBus()
    await bus.connect()

    async def handle_blossom_result(msg: dict) -> None:
        scorable_id = msg["scorable_id"]
        blossoms    = msg.get("blossoms", [])

        print(f"🌺 Evaluating {len(blossoms)} blossoms for {scorable_id}")

        # 1) Turn each blossom into a Scorable child
        children = await build_scorables_for_blossoms(scorable_id, blossoms)

        # 2) Score candidates per dimension (clarity, faithfulness, etc.)
        scored = await score_candidates(children)

        # 3) Pick a winner with margin
        winner = select_winner(scored, promote_margin=0.02)
        if not winner:
            return

        # 4) Update the Nexus graph
        await nexus_add_candidates_and_promote(
            parent_id=scorable_id,
            children=scored,
            winner_id=winner["id"],
        )

        # 5) Emit a decision event for telemetry / training
        await bus.publish(
            subject="nexus.decision",
            payload={
                "parent_id": scorable_id,
                "winner_id": winner["id"],
                "lift": winner["lift"],
            },
        )

    await bus.subscribe("cognition.blossom.result", handle_blossom_result)
    print("🕸️ Nexus improver online")

Here a single blossom result event becomes:

  • new nodes in Nexus,
  • a reinforced edge (habit) from parent → winner,
  • and fresh training signal for later (SICQL, HRM, Jitter, etc.) via nexus.decision.

This is the exact moment where a raw thought becomes a stable piece of knowledge.
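
The promote_margin is doing quiet but important work here: a candidate only replaces the incumbent if it beats the runner-up by a real gap, which keeps the graph from thrashing on scoring noise. A minimal sketch of select_winner under that reading (the single "aggregate" scalar per candidate is an assumption):

def select_winner(scored: list, promote_margin: float = 0.02):
    """Pick the top candidate only if it clearly beats the runner-up."""
    if not scored:
        return None
    ranked = sorted(scored, key=lambda c: c["aggregate"], reverse=True)
    runner_up = ranked[1]["aggregate"] if len(ranked) > 1 else 0.0
    lift = ranked[0]["aggregate"] - runner_up
    if lift < promote_margin:
        return None  # too close to call – don't reinforce a noisy edge
    return {**ranked[0], "lift": lift}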


📹 5. Recording the Memory: Nexus + VPM

When the improver updates the graph, a separate telemetry/VPM agent is listening:

  • It receives nexus.decision and the underlying garden_events.

  • It writes to garden_events.jsonl and nexus_improver_report.json.

  • It renders VPM tiles and filmstrips that visually show:

    • parent VPM at the center,
    • each blossom candidate as a petal,
    • the promoted one highlighted.

From the outside, it looks like a flower of thought forming. From the inside, it’s just event handlers doing their jobs.


🛄 6. Putting It All Together: A Single Thought’s Journey

Here’s a simplified orchestration you can imagine for a demo run:

import asyncio
from stephanie.services.bus.zmq_broker import ZmqBrokerGuard

async def run_demo():
    # 1) Start the heartbeat (broker)
    await ZmqBrokerGuard.ensure_started(detached=True)

    # 2) Start background agents (processors, blossom, nexus)
    tasks = [
        asyncio.create_task(start_thought_processor()),
        asyncio.create_task(blossom_worker()),
        asyncio.create_task(nexus_improver()),
    ]

    # 3) Emit a seed thought
    main_thought = "How can we model human cognition in software?"
    await emit_seed_thought(main_thought)

    # 4) Let the system think for a while
    await asyncio.sleep(5.0)
    print("🧠 Cognitive process continues in the background...")

    # In a real run you wouldn't cancel; here we just tidy up the demo:
    for t in tasks:
        t.cancel()

asyncio.run(run_demo())

Notice what doesn’t happen here:

  • We never call a “main orchestrator” function.
  • We never block on a single monolithic workflow.
  • We never wire components together directly.

We just:

  1. Start the heartbeat,
  2. Start some reflexes,
  3. Emit a thought,
  4. Let the network do the rest.

🚶 Sequence of a thought

The story of thought in a sequence:

    sequenceDiagram
    participant U as 🧍 User
    participant TP as 🔧 ThoughtProcessor
    participant BW as 🌸 BlossomWorker
    participant SC as 📊 ScoringService
    participant NX as 🕸️ Nexus
    participant VPM as 🎨 VPM Agent
    participant BUS as 🛰️ Knowledge Bus

    Note over U,VPM: 🌟 Cognitive Blossom Workflow

    %% Seed thought arrives
    rect rgb(240,240,240)
        U->>BUS: 📨 publish("thoughts.new", content)
        activate BUS
        BUS-->>TP: 🎯 thoughts.new
        deactivate BUS
    end

    %% Thought is turned into a Scorable + blossom request
    rect rgb(240,255,240)
        activate TP
        TP->>TP: 🏗️ build Scorable<br/>🌍 domains, 🔍 entities, 🔢 embeddings
        TP->>BUS: 📤 publish("cognition.blossom.request", scorable_id, content)
        deactivate TP
    end

    %% Blossom worker expands the thought (with cache)
    rect rgb(255,255,240)
        activate BUS
        BUS-->>BW: 🌸 cognition.blossom.request
        deactivate BUS

        activate BW
        BW->>BUS: 🤔 request("llm.cognition.blossom", payload)
        activate BUS
        Note right of BUS: 💾 Cached Intelligence
        BUS-->>BW: 🌺 blossoms (cached or fresh)
        deactivate BUS
        BW->>BUS: 📤 publish("cognition.blossom.result", blossoms)
        deactivate BW
    end

    %% Scoring + Nexus promotion
    rect rgb(240,240,255)
        activate SC
        BUS-->>SC: 📊 cognition.blossom.result (via NexusImprover)
        SC->>SC: ⚖️ score candidates<br/>per dimension
        SC->>NX: ➕ add nodes + edges<br/>🏆 promote winner
        activate NX
        NX-->>BUS: 🎯 publish("nexus.decision", parent_id, winner_id, lift)
        deactivate NX
        deactivate SC
    end

    %% VPM renders the visual blossom
    rect rgb(255,240,240)
        activate VPM
        BUS-->>VPM: 🎨 nexus.decision
        VPM->>VPM: 🎬 render tiles & filmstrip
        VPM-->>U: 🌸 visual blossom<br/>(center + petals)
        deactivate VPM
    end
  

This is what “a thought in action” looks like in Stephanie:

  • It enters as a single event,
  • It wakes up multiple agents,
  • It blossoms into alternatives,
  • It gets scored, promoted, and visualized,
  • And it leaves behind a graph node, a habit edge, and visual memory.

That’s how a thought becomes real in Nexus.

|  | Initial Graph | Expanded Blossoms | Final Consolidated Graph |
|---|---|---|---|
| Size | 10 nodes, few edges; several nodes are only weakly connected. | ≈200 nodes (10 seeds + ~200 blossoms), many short paths and overlapping clusters. | 10 nodes again, but every node has survived a local competition and “earned” its place. |
| Process state | Raw seeds only, no blossom runs yet. | Blossom + scoring have explored alternative continuations around each seed. | Winners per seed are promoted in Nexus, losers pruned, edges updated and reinforced. |
| Interpretation | “What we started with.” | “All the possibilities we considered.” | “The best of our knowledge.” |

🪞 Mirroring your mind

This event-driven architecture mirrors how our minds actually work:

  • Thoughts don’t wait - Like our blossom worker, human cognition doesn’t block waiting for results
  • Associative connections - Our wildcard subscriptions (thoughts.*) mimic how one thought triggers related concepts
  • Parallel processing - Multiple thought processors can run simultaneously
  • No central controller - Like the human brain, there’s no “main thread” coordinating everything

🤔 Thought Architecture Diagram

    graph TD
    A["🧠 Human Thought"] -->|🎯 Initiates| B["⚡ Thought Atom"]
    
    subgraph "🔄 Cognitive Nervous System"
        B -->|📨 Publish| C["🔄 ZMQ Broker<br/>📍 Frontend: tcp://127.0.0.1:5555<br/>🎭 ROUTER Socket"]
        
        C -->|🛣️ Routes| D["🔧 Thought Processor 1<br/>👷 Worker<br/>🎭 DEALER Socket"]
        C -->|🛣️ Routes| E["🔧 Thought Processor 2<br/>👷 Worker<br/>🎭 DEALER Socket"]
        C -->|🛣️ Routes| F["🌸 Blossom Engine<br/>👷 Worker<br/>🎭 DEALER Socket"]
        
        D -->|📤 Publish| C
        E -->|📤 Publish| C
        F -->|📤 Publish| C
        
        C -->|📬 Delivers| G["👁️ Visualization System"]
        C -->|📬 Delivers| H["📊 Measurement System"]
    end
    
    G --> I["🎬 VPM Filmstrip"]
    H --> J["📈 Quality Metrics"]
    
    I --> K["💡 Improved Understanding"]
    J --> K

    %% Styling
    style A fill:#8A2BE2,stroke:#333,stroke-width:3px,color:#fff
    style B fill:#4169E1,stroke:#333,stroke-width:2px,color:#fff
    style C fill:#FF4500,stroke:#333,stroke-width:2px,color:#fff
    style D fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
    style E fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#FF8C00,stroke:#333,stroke-width:2px,color:#fff
    style H fill:#FF8C00,stroke:#333,stroke-width:2px,color:#fff
    style I fill:#1E90FF,stroke:#333,stroke-width:2px,color:#fff
    style J fill:#1E90FF,stroke:#333,stroke-width:2px,color:#fff
    style K fill:#228B22,stroke:#333,stroke-width:3px,color:#fff
    
    %% Subgraph styling
    style CNS fill:#F0F8FF,stroke:#333,stroke-width:2px,color:#000
    
    %% Edge styling
    linkStyle 0 stroke:#8A2BE2,stroke-width:2px,color:red
    linkStyle 1 stroke:#FF4500,stroke-width:2px
    linkStyle 2 stroke:#FF4500,stroke-width:2px
    linkStyle 3 stroke:#FF4500,stroke-width:2px
    linkStyle 4 stroke:#32CD32,stroke-width:2px
    linkStyle 5 stroke:#32CD32,stroke-width:2px
    linkStyle 6 stroke:#32CD32,stroke-width:2px
    linkStyle 7 stroke:#FF8C00,stroke-width:2px
    linkStyle 8 stroke:#FF8C00,stroke-width:2px
    linkStyle 9 stroke:#1E90FF,stroke-width:2px
    linkStyle 10 stroke:#1E90FF,stroke-width:2px
    linkStyle 11 stroke:#228B22,stroke-width:2px
    linkStyle 12 stroke:#228B22,stroke-width:2px    
  

➿ How the Message Flow Works

  1. Thought Initiation: A seed thought enters the system (like a neuron firing)
  2. Broker Routing: The ZMQ broker acts as the central switchboard (a minimal sketch of this core follows the list):
    • Frontend (5555): Receives thoughts from any source
    • Backend (5556): Distributes thoughts to relevant processors
  3. Cognitive Processing: Multiple thought processors work in parallel:
    • Some refine the thought (improvement)
    • Some create related thoughts (blossoming)
    • Some measure quality (evaluation)
  4. Emergent Intelligence: Processed thoughts feed back into the system, creating a cognitive cascade
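
Stripped of guard logic and lifecycle management, the core of such a broker is close to a textbook ZeroMQ proxy. A minimal sketch using pyzmq (ports match the diagram; everything else about the real broker is omitted):

import zmq

def run_broker() -> None:
    ctx = zmq.Context.instance()

    frontend = ctx.socket(zmq.ROUTER)     # clients publish / request here
    frontend.bind("tcp://127.0.0.1:5555")

    backend = ctx.socket(zmq.DEALER)      # workers subscribe / handle here
    backend.bind("tcp://127.0.0.1:5556")

    try:
        zmq.proxy(frontend, backend)      # pump messages until interrupted
    finally:
        frontend.close()
        backend.close()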

📮 The Cognitive Advantage

Unlike procedural systems where “A → B → C”, our bus enables:

    flowchart TD
    A["✨ Thought A"] --> B["💡 Thought B"]
    A --> C["💡 Thought C"] 
    A --> D["💡 Thought D"]
    A --> E["💡 Thought E"]
    A --> F["💡 Thought F"]
    A --> G["💡 Thought G"]
    A --> H["💡 Thought H"]
    
    B --> I["🌟 Insight I"]
    C --> I
    D --> I
    
    I --> J["🔍 Analysis J"]
    I --> K["🔍 Analysis K"]
    
    E --> L["💫 Idea L"]
    F --> L
    G --> L
    
    L --> M["🎯 Conclusion M"]
    L --> N["🎯 Conclusion N"] 
    L --> O["🎯 Conclusion O"]
    
    J --> Final["🏆 Final Insight"]
    K --> Final
    M --> Final
    N --> Final
    O --> Final

    %% Styling
    classDef seed fill:#8A2BE2,stroke:#333,stroke-width:3px,color:#fff,font-weight:bold
    classDef thought fill:#4169E1,stroke:#333,stroke-width:2px,color:#fff
    classDef insight fill:#32CD32,stroke:#333,stroke-width:2px,color:#000
    classDef analysis fill:#FFA500,stroke:#333,stroke-width:2px,color:#000
    classDef conclusion fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff
    classDef final fill:#228B22,stroke:#333,stroke-width:3px,color:#fff,font-weight:bold
    
    class A seed
    class B,C,D,E,F,G,H thought
    class I,L insight
    class J,K analysis
    class M,N,O conclusion
    class Final final
    
    %% Edge styling to show cognitive flow
    linkStyle default stroke:#666,stroke-width:2px
    linkStyle 0,1,2,3,4,5,6 stroke:#4169E1,stroke-width:2px
    linkStyle 7,8,9,10,11,12 stroke:#32CD32,stroke-width:2px
    linkStyle 13,14,15,16,17 stroke:#FFA500,stroke-width:2px
    linkStyle 18,19,20,21 stroke:#9B59B6,stroke-width:2px
  

This non-linear, branching structure is precisely how human cognition works - thoughts sparking other thoughts in unpredictable but meaningful ways. The event bus isn’t just a technical implementation detail; it’s the foundation that allows us to model cognition as it actually exists in biological systems.

When you watch our VPM filmstrips showing thoughts blossoming and connecting, you’re seeing this event-driven cognitive process visualized - the closest we’ve come to watching a digital brain think in real time.

In our previous sections, we built the components of Stephanie’s cognitive architecture: the Scorable as the atomic unit of thought, the Scorable Processor to enrich thoughts with meaning, the Memory Tool to store knowledge, and the event bus to let thoughts flow like a nervous system. Next we’ll see how these pieces combine into something transformative: an agent that thinks visually.


🧩 We Remember Fragments, Not Books

Think about how you actually remember a great book. You don’t store every page. You keep a handful of high-impact fragments – a scene, a sentence, a small constellation of ideas – and your mind jumps between those anchors when you need them.

That’s the bet we’re making here.

In Stephanie, we turn information into VPM tiles: compact, visual fragments that capture the shape of what matters – alignment, clarity, evidence, connections – without dragging the entire source along for every step. These tiles are not fuzzy screenshots of the data; they’re purposeful summaries with handles. Whenever we need to go deeper, we can still pull back the full original text, embeddings, features, provenance – everything.

Why bother? Because thinking is traversal. The brain moves by hopping across remembered fragments, composing them into new structure, checking a detail, then leaping again. Tiles make those hops fast, comparable, and trainable. They let us see reasoning quality, score it, and improve it visibly.

Now that we have Scorables, the Scorable Processor, an event-driven bus, and VPM tiles, we can do something new:

Instead of thinking about data, we can think inside its visual representation.


🎛 Enter the First Nexus Agent: Thinking in Tiles

This first Nexus agent exists for one core reason:

We want to prove that we can turn raw thoughts into visual objects, and then use those visuals to think better.

Everything else is in service of that.

It takes a seed Scorable, compresses it into one or more VPM tiles, and then stops treating it as “rows in a table” entirely. From that point on, the agent lives inside the image: it reads structure, tweaks focus, cleans up the pattern, and lets those visual improvements drive what happens next.

Concretely, it runs as a small, specialized state machine cycling through a tight loop:

  1. Perceive – Generate the VPM tile(s) that best expose the task’s dimensions, using ZeroModel as the visual workspace instead of raw text.

  2. Assess – Read structural signals in the tile: separability, symmetry, bridges, coherence, outliers. In other words: does this picture of the thought make sense?

  3. Adjust – Apply visual “thought ops” – ZOOM, BBOX, PATH, HIGHLIGHT – to clarify what matters. These aren’t arbitrary image tricks; they’re cognitive moves:

    • ZOOM: focus on a sub-region of the reasoning
    • BBOX: isolate a cluster of related concepts
    • PATH: trace a candidate reasoning route
    • HIGHLIGHT: amplify key evidence or risky zones
  4. Verify – If the tile still looks messy or ambiguous, the agent reaches back through the bus to the full source (text, features, provenance) for a deeper check. If the tile looks clean and confident, it stays in visual space and continues.

  5. Record – Write the successful sequence of tile operations and the resulting quality change into Nexus, so useful routes become reinforced habits: “When a tile looks like this, and we apply these ops, we tend to get good outcomes.”

    flowchart LR
    %% ==================== NODES ====================
    P["🎯 1. PERCEIVE"] --> A["📊 2. ASSESS"]
    A --> J["🎨 3. ADJUST"]
    J --> V{"🔍 4. VERIFY"}
    V -->|"✅ High Confidence"| R["💾 5. RECORD"]
    V -->|"❌ Low Confidence"| S["🔄 Reach Back"]
    S --> P
    R --> H{"🏆 Improved?"}
    H -->|"🟢 Yes"| Next["🎯 Next Thought"]
    H -->|"🟡 No"| P
    
    %% ==================== STYLING ====================
    classDef perceive fill:#98FB98,stroke:#333,stroke-width:2px,color:#000,font-weight:bold
    classDef assess fill:#87CEEB,stroke:#333,stroke-width:2px,color:#000,font-weight:bold
    classDef adjust fill:#FFA500,stroke:#333,stroke-width:2px,color:#000,font-weight:bold
    classDef verify fill:#FFD700,stroke:#333,stroke-width:3px,color:#000,font-weight:bold
    classDef record fill:#9B59B6,stroke:#333,stroke-width:2px,color:#fff,font-weight:bold
    classDef source fill:#E74C3C,stroke:#333,stroke-width:2px,color:#fff,font-weight:bold
    classDef decision fill:#2ECC71,stroke:#333,stroke-width:3px,color:#000,font-weight:bold
    classDef next fill:#34495E,stroke:#333,stroke-width:2px,color:#fff,font-weight:bold
    
    class P perceive
    class A assess
    class J adjust
    class V verify
    class R record
    class S source
    class H decision
    class Next next
    
    %% ==================== EDGE STYLING ====================
    linkStyle default stroke:#666,stroke-width:2px
    linkStyle 0 stroke:#98FB98,stroke-width:2px
    linkStyle 1 stroke:#87CEEB,stroke-width:2px
    linkStyle 2 stroke:#FFA500,stroke-width:2px
    linkStyle 3 stroke:#2ECC71,stroke-width:3px
    linkStyle 4 stroke:#E74C3C,stroke-width:3px
    linkStyle 5 stroke:#E74C3C,stroke-width:2px
    linkStyle 6 stroke:#9B59B6,stroke-width:2px
    linkStyle 7 stroke:#2ECC71,stroke-width:3px
    linkStyle 8 stroke:#FFA500,stroke-width:2px
  

Over iterations, the agent doesn’t just spit out an answer. It shapes a clearer thought. The VPM tile becomes crisper, the path through tiles becomes shorter, and Nexus remembers which visual routes paid off. That’s how Stephanie starts to think the way you do: fragments first, full context on demand, improvement as a habit.
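
If you prefer code to diagrams, the five-step loop above reduces to something like this sketch, where every helper name (perceive, assess, adjust, verify_against_source, quality, record_to_nexus) is a hypothetical stand-in for the agent’s real methods:

# A sketch of the perceive → assess → adjust → verify → record loop.
# All helper names are hypothetical stand-ins, not the real API.
def refine(seed, max_iters: int = 5, confidence_floor: float = 0.7):
    tile = perceive(seed)                           # 1. render the VPM tile
    for _ in range(max_iters):
        signals = assess(tile)                      # 2. separability, bridges, ...
        candidate = adjust(tile, signals)           # 3. ZOOM / BBOX / PATH / HIGHLIGHT
        if signals["confidence"] < confidence_floor:
            candidate = verify_against_source(seed, candidate)  # 4. reach back via bus
        if quality(candidate) > quality(tile):
            record_to_nexus(seed, tile, candidate)  # 5. reinforce the route
            tile = candidate                        # improved → keep going
        else:
            break                                   # no gain → stop refining
    return tile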


⭕ Visualizing Thought Itself

Most AI systems still think invisibly: text in, text out, and “reasoning” buried somewhere in a hidden layer or a prompt log. Even when they expose chain-of-thought, it’s still just more text.

This agent does something different. It treats thoughts as visual objects that can be examined, enhanced, and evolved.

Our breakthrough insight is simple:

If we can represent thoughts visually, we can see the quality of reasoning, spot weaknesses, and actively improve them.

The first Nexus agent is our proof-of-concept for that claim. It shows that:

  • Scorables can be turned into stable visual fragments (VPM tiles),
  • those fragments can be edited and steered by code,
  • and those edits can directly improve how Stephanie reasons on real tasks.

When you reason, you don’t replay full books in your head. You move through a shifting cloud of fragments, tensions, alignments, and half-formed shapes until something finally clicks. This agent is our first programmable version of that feeling: thoughts as tiles, tiles as a workspace, and visual refinement as a way to steer cognition.

In the next section, we’ll make that concrete by walking through the agent itself: its state machine, the events it listens to, and how each step turns a noisy tile into a sharper, more useful piece of thought.




⛽ Agent 01: The VPM Thought Refiner: Where Thoughts Become Visible

🎨 From Abstract Metrics to Visible Cognition

What if you could see thinking happen? Not just the output of thinking, but the thinking process itself.

That’s exactly what the VPM Thought Refiner is for. It’s the first Nexus agent whose whole job is to prove that Visual AI is useful: take real scorables, turn them into VPMs, and then operate directly on those visual structures to make the thought better.

👉 Full Code Here

This isn’t visualization for visualization’s sake. It’s the first concrete step toward AI that doesn’t just produce answers but thinks visibly, where improvement isn’t assumed but measured and documented.

This is where the “fragments, not books” principle becomes operational. The VPM Thought Refiner doesn’t work with complete thoughts; it isolates and manipulates the high-impact fragments within a thought, exactly as humans do. When it applies a ZOOM operation, it’s not magnifying random pixels; it’s focusing on the cognitive equivalent of “that scene in the book that changed everything.” This is cognition through fragments, not through comprehensive text processing.


🍎 The Core Insight: Thoughts Are Visual Structures

Most AI systems treat cognition as a black box: input text, output text. Human thinking isn’t like that. It’s spatial, associative, and often pre-verbal. When you have an insight, it doesn’t arrive as a polished sentence; it arrives as a shape: a constellation of related concepts suddenly snapping into place.

The VPM (Visual Policy Map) is that shape made visible:

# Convert a scorable into a visual thought structure
vpm_chw, adapter_meta = await self.zm.vpm_from_scorable(
    seed_scorable,
    metrics_values=metrics_values,
    metrics_columns=metrics_columns,
)

Unlike a regular image, a VPM encodes cognitive structure:

  • 🔴 Channel 0 (Red) – node / concept density: where meaning concentrates (brighter = more semantic weight)
  • 🟢 Channel 1 (Green) – edge / connection strength: how ideas relate to each other (brighter = stronger connections)
  • 🔵 Channel 2 (Blue) – centrality / “heat”: how foundational a concept is within the thought (brighter = more central)

This turns abstract reasoning signals into something we can see, measure, and improve.
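
Concretely, a tile is just a channel-first array. A quick sketch of peeking at its channels (the array here is random stand-in data, not a real tile):

import numpy as np

# A VPM tile arrives channel-first (C, H, W); random stand-in data here.
vpm_chw = np.random.rand(3, 256, 256).astype(np.float32)

node_density    = vpm_chw[0]   # red:   where meaning concentrates
edge_strength   = vpm_chw[1]   # green: how strongly ideas connect
centrality_heat = vpm_chw[2]   # blue:  how foundational each region is

print("peak concept density:", float(node_density.max()))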


🏥 The Thought Refinement Process: Cognitive Surgery in Action

The VPM Refiner doesn’t just generate an image and stop. It performs “cognitive surgery”: a structured sequence of visual operations that clean up, de-bias, and focus the thought.

1. Bootstrap – Build the “Goodness” Channel

"bootstrap": [
    {"type": "logic", "params": {"op": "NOT", "a": ("map", "uncertainty"), "dst": 1}},
    {"type": "logic", "params": {"op": "AND", "a": ("map", "quality"), "b": ("channel", 1), "dst": 0}},
    {"type": "logic", "params": {"op": "AND", "a": ("map", "novelty"), "b": ("channel", 1), "dst": 2}},
    {"type": "logic", "params": {"op": "OR",  "a": ("channel", 0), "b": ("channel", 2), "dst": 0}},
]

Here the agent creates a “goodness channel”: a visual representation of high-quality, novel thinking, built by combining quality and novelty while inverting uncertainty. This gives it a single, bright scaffold showing where promising thought mass lives.
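
A minimal sketch of how these soft-logic ops can be realized on [0, 1] float maps, using common fuzzy-logic forms (NOT = 1 − x, AND = min, OR = max); the real engine may implement them differently:

import numpy as np

# Soft-logic ops on [0, 1] float maps (fuzzy-logic forms; an assumption).
def op_not(a):    return 1.0 - a
def op_and(a, b): return np.minimum(a, b)
def op_or(a, b):  return np.maximum(a, b)

h, w = 64, 64
uncertainty = np.random.rand(h, w).astype(np.float32)  # stand-in maps
quality     = np.random.rand(h, w).astype(np.float32)
novelty     = np.random.rand(h, w).astype(np.float32)

confidence = op_not(uncertainty)      # dst=1: invert uncertainty
ch0 = op_and(quality, confidence)     # dst=0: confident quality
ch2 = op_and(novelty, confidence)     # dst=2: confident novelty
goodness = op_or(ch0, ch2)            # dst=0: the "goodness" channel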

2. Debias – Remove Cognitive Distortions

"debias": [
    {"type": "logic", "params": {"op": "SUB", "a": ("channel", 0), "b": ("map", "risk"), "dst": 0, "blend": 0.35}}
]

This is a visual anti-bias pass: subtracting risk from the goodness channel, but only partially (blend=0.35). Useful caution is preserved; paralyzing fear is softened.

3. Bridge Tame – Fix Over-Connected Regions

"bridge_tame": [
    {"type": "logic", "params": {"op": "SUB", "a": ("channel", 0), "b": ("map", "bridge"), "dst": 0, "blend": 0.50}}
]

“Bridge” regions are thin, overloaded connections between clusters: the places where concepts blur into each other. This step identifies those bottlenecks and gently reduces their weight, making clusters cleaner and more separable.

4. Zoom Focus – Direct Attention Where It Matters

# Find where the cognitive "weight" actually lives
cy, cx = self._attention_centroid(state.X[0], top_k_ratio=0.05)
VisualThoughtOp(
    VisualThoughtType.ZOOM,
    {"center": (int(cx), int(cy)), "scale": 2.0},
)

Instead of arbitrarily zooming, the agent computes an attention centroid over the top 5% of the node channel. It then zooms around that center, mimicking how humans instinctively focus on the most salient part of a thought.
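
One plausible implementation of that centroid computation (the real _attention_centroid may differ in details):

import numpy as np

# Intensity-weighted center of the brightest top_k_ratio fraction of a channel.
def attention_centroid(channel: np.ndarray, top_k_ratio: float = 0.05):
    flat = channel.ravel()
    k = max(1, int(flat.size * top_k_ratio))
    thresh = np.partition(flat, -k)[-k]      # value of the k-th brightest cell
    ys, xs = np.nonzero(channel >= thresh)   # coordinates of the bright set
    w = channel[ys, xs]
    cy = float((ys * w).sum() / w.sum())     # weighted row center
    cx = float((xs * w).sum() / w.sum())     # weighted column center
    return cy, cx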


🤕 The Safety System: Only Keep Changes That Don’t Hurt Thinking

Crucially, the agent doesn’t accept every operation. Each sequence is evaluated with an accept policy that considers both utility gain and operation cost:

def _accept(self, delta: float, cost: float, bcs: float) -> bool:
    """Determine if a thought operation should be accepted.

    Note: `cost` is already folded into `bcs` (benefit minus cost), so the
    policies below only need to inspect `delta` and `bcs`.
    """
    policy = self.cfg.accept_policy
    eps = self.cfg.accept_eps

    if policy == "delta":
        return (delta is not None) and (float(delta) >= -eps)
    elif policy == "bcs":
        return (bcs is not None) and (float(bcs) >= -eps)
    # "either" policy – accept if either signal clears the noise floor (-eps)
    return ((delta is not None) and (float(delta) >= -eps)) or \
           ((bcs is not None) and (float(bcs) >= -eps))

  • delta – change in utility (goal-weighted φ)
  • cost – total cost of the visual ops
  • bcs – benefit-minus-cost score

Small numerical noise is tolerated (±eps), but real regressions are rejected. In one run, the agent rejected every zoom operation because they reduced utility: a concrete example of visual self-discipline.


🍿 The Filmstrip: Watching a Thought Improve

The best artifact this agent produces is the filmstrip: a visual timeline of the thought’s evolution.

An actual filmstrip: here you can see the result of running this agent.

Each frame shows:

  • Top – the original VPM
  • Bottom – the current refined VPM
  • Bar – a utility percentage that tracks measured improvement

You see the thought go from noisy to structured:

[initial]           φ:  sep=0.537   bridge=0.721   spec_gap=0.725  symmetry=0.912 | utility=0.176
[bootstrap:after]   Δφ: sep=+0.006  spec_gap=+0.004                               | Δutility=+0.011
[debias:after]      Δφ: sep=+0.005  spec_gap=+0.018                               | Δutility=+0.005
[bridge_tame:after] Δφ: sep=+0.230  spec_gap=+0.115  bridge=-0.213                | Δutility=+0.230

A human reading those logs can interpret:

  • Separability ↑ – clusters are more clearly distinguished
  • Bridge proxy ↓ – fewer “dangerous” bottleneck connections
  • Spectral gap ↑ – overall structure is more coherent

The filmstrip makes that structural change visible.


🪧 The State Machine: How Thoughts Evolve

Under the hood, all of this is driven by a small, explicit state machine:

    stateDiagram-v2
    state "🌱 Seed" as Seed
    state "📊 Assess" as Assess
    state "🎯 Decide" as Decide
    state "⚡ Apply" as Apply
    state "⭐ Score" as Score
    state "💾 Record" as Record
    state "✅ Terminate?" as Terminate

    [*] --> Seed
    Seed --> Assess: compute_phi(S)
    Assess --> Decide: select op via π(S)
    Decide --> Apply: apply_visual_op(S, A) → S'
    Apply --> Score: compute_phi(S')
    Score --> Record: write step to Nexus
    Record --> Terminate: if φ(S') good enough or no Δφ
    Terminate --> [*]
    Terminate --> Assess: else continue
  

This state machine represents the cognitive equivalent of deliberate practice. Each loop is a micro-cycle:

  1. Assess current structure (φ, utility)
  2. Choose a visual thought op (logic, debias, bridge_tame, zoom)
  3. Apply it to the VPM
  4. Re-score and decide whether to keep it
  5. Record the step into Nexus and the filmstrip

Nothing is “magic”; every change is auditable.


🎩 The Code Behind the Magic

The core loop in the agent is almost boringly straightforward, which is exactly what we want:

async def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
    # 1. Extract the thought to refine
    scorables = list(context.get("scorables") or [])
    seed = Scorable.from_dict(scorables[0])

    # 2. Generate the visual representation
    metrics_values, metrics_columns = await self._extract_metrics(seed, context)
    chw_u8, adapter_meta = await self.zm.vpm_from_scorable(
        seed,
        metrics_values=metrics_values,
        metrics_columns=metrics_columns,
    )

    # 3. Initialize cognitive state
    state, initial_rgb = self._initialize_refinement_state(chw_u8, adapter_meta)

    # 4. Execute refinement phases (run_dir, prepared earlier in the method,
    #    is the directory where frames and the filmstrip are written)
    final_state, frames, steps_meta = await self._execute_refinement_pipeline(
        state,
        initial_rgb,
        run_dir,
    )

Everything else – the logic maps, the cost accounting, the filmstrip – is just detail layered on top of this simple pattern: turn a thought into a picture, and then improve the picture.


💫 The Bigger Picture: Visual Thinking as a First-Class Mode

This agent proves a specific, radical claim:

You can convert structured data into a visual representation, operate directly in that visual space, and use the improvements there to drive better cognition.

That’s exactly what human thinking feels like: not replaying full books, but moving through a shifting cloud of fragments, shapes, tensions, and alignments until something “clicks.”

The VPM Thought Refiner is our first programmable version of that:

  • Scorables → VPMs – thoughts become visual objects
  • Visual ops → better structure – clarity, separation, reduced risk
  • Better structure → better decisions – utility rises, bad ops rejected
  • Filmstrips → proof – we can show how thinking improved, not just claim it did

In the next sections, these refined VPMs stop living in isolation: they plug into Nexus, feed habits of reasoning, and become part of a growing cognitive graph. But this is the inflection point: the first agent where Stephanie truly thinks in pictures, and we can watch it happen.




📡 How Much Thought Fits in a Single VPM?

It’s easy to forget just how much structure lives in one of these “tiles”.

A single VPM at 4K resolution with 3 channels is:

  • Width: 4,096
  • Height: 4,096
  • Channels: 3

So the total number of scalar slots is:

4,096 × 4,096 × 3 = 50,331,648

That’s over 50 million individual values in a single frame.

If you treated each pixel-channel as a simple true/false decision, that would be the equivalent of making 50 million binary judgements at once. In reality, each slot is a continuous value (float/int) encoding things like density, bridge risk, novelty, or quality, but the point stands:

A conventional classifier might give you one probability for “good vs bad”. A single VPM gives you 50 million structured signals about where that goodness or risk actually lives.

When the VPM Refiner applies one visual thought op (ZOOM, BBOX, PATH, HIGHLIGHT), it’s not tweaking a single scalar; it’s re-shaping tens of millions of coupled signals in one pass, then re-scoring the whole field.

That’s the real power here: we’re no longer deciding on a single number; we’re sculpting an entire landscape of thought and letting Stephanie read that landscape the way you read a feeling, a hunch, or a pattern at a glance.




🎆 Agent 02: The Visual Thought Comparator (VPMLogicCompareAgent)

The first Nexus agent, the VPM Thought Refiner, showed that Stephanie can take a single VPM tile, operate on it visually, and only keep changes that genuinely improve the structure of the thought.

The second agent, the Visual Thought Comparator, zooms out to something much closer to how you actually move through the world.

👉 Full Code Here

Think about choosing your way home from a place you don’t visit often. You’ve tried a couple of routes before. When you’re deciding what to do next time, you don’t run numbers in your head – you replay patterns:

  • That way had too many awkward junctions and felt risky.
  • This way was slower on paper, but smoother and more predictable.

You’re not comparing two exact paths step by step; you’re comparing two ways of moving: two regimes of experience.

Agent 02 does the same thing for Stephanie.

“I’ve tried this family of thought paths and that family of thought paths.
Which overall way of thinking looks safer, cleaner, and more useful?”

Where Agent 01 improves one thought at a time, Agent 02 compares two regimes of thought and asks:

  • Do these goal-aligned, refined thoughts produce a healthier pattern of VPMs than this baseline family?
  • What does that healthier pattern look like as an image?

This is the layer where we start to define “better thinking” in visual terms: not just better scores, but better shapes of thought that Stephanie can recognize at a glance.


🏡 Where Agent 02 lives in the Nexus pipeline

By the time VPMLogicCompareAgent runs, a lot has already happened:

  1. Scorables are loaded and split into two cohorts:

    • scorables_targeted – goal-aligned, often already refined by Agent 01
    • scorables_baseline – off-goal, random, or older thoughts
  2. Each scorable has:

    • A metric vector (alignment, coverage, risk, etc.) from the Scorable Processor
    • At least one VPM from ZeroModel / the Refiner

Agent 02 reads those cohorts straight from the pipeline context:

A = [Scorable.from_dict(s) for s in (context.get("scorables_targeted") or [])]
B = [Scorable.from_dict(s) for s in (context.get("scorables_baseline") or [])]

From that point on, it behaves like a visual lab: it picks representative thoughts from each side, runs short visual refinement episodes, and produces two things:

  • Human-friendly film artifacts (including compare.gif)
  • Machine-friendly traces and labels for training the visual judge

⚙️ Step 1 – Metrics → VPMs → utility-ranked thoughts

For each cohort (A, then B), the agent does the same three steps:

  1. Paint the thought as a VPM

    • Use ScorableProcessor to get a normalized metric vector
    • Use ZeroModel to paint a 3×H×W VPM (chw_u8)
    • Wrap it in a VPMState and compute φ (separability, bridges, spectral gap, etc.)
  2. Score the thought visually

    Under the hood, this φ and the visual pattern are what we eventually feed into a VPM-native Vision Transformer (VPMViT / VPMThoughtPolicy). That model learns to map:

    VPM → [clarity, bridge risk, novelty, overall utility, …]

    For Agent 02, we collapse that into a single utility scalar: “how healthy does this thought look for this goal?”

  3. Rank and pick the representatives

    scored.sort(key=lambda t: t[0], reverse=True)
    pick = scored[: min(self.topk, len(scored))]
    

    We keep only the top-k thoughts per cohort. These become the representative shapes of each way of thinking.


🧪 Step 2 – Mini visual episodes: validating and shaping “better”

For each selected VPM, the Comparator runs a short visual reasoning episode, using the same primitives as the Refiner:

  • Bootstrap – combine quality, novelty, and (1 – uncertainty) into a first “goodness” channel
  • Debias – softly subtract risk so we don’t over-value dangerous regions
  • Bridge-tame – reduce over-reliance on skinny “bridge” regions that create fragile thinking
  • Optional zoom-focus – zoom into the most promising patch when it’s worth it

After each operation:

  1. We compute new φ and new utility.

  2. We compute a benefit-minus-cost score (BCS).

  3. We decide whether to keep the step:

    if u1 < u0 and bcs < 0.0:
        # revert – this visual change actually made things worse
        state = prev_state   # roll back (variable name illustrative)
    

Each episode produces:

  • A trace: which ops helped, which hurt, where utility plateaued
  • A “best” state: the cleanest, healthiest VPM we could reach for that thought

For the blog, we render that as frames: original → refined → utility bar. For Stephanie, it’s training signal: “this shape, plus these ops, led to better thinking.”


🎞️ Step 3 – Cohorts → films → compare.gif

Once we’ve run mini episodes for both cohorts:

  1. We stack all frames for the targeted cohort into a filmstrip A.

  2. We stack all frames for the baseline cohort into filmstrip B.

  3. We pad them to equal length and horizontally concatenate each frame:

    • Left: targeted thought evolution
    • Right: baseline thought evolution

The result is compare.gif – a visual A/B test on thought patterns:

Comparison example

  • If the targeted regime is genuinely better, the left film looks:

    • cleaner and more coherent,
    • less over-bridged,
    • more stable as it improves.
  • If both sides look equally chaotic, that’s a strong, visual hint that our upstream selection isn’t really separating good vs bad thinking yet.

Again: this GIF is for us to inspect and debug. Stephanie doesn’t need it to think.
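
For reference, the pad-and-concatenate step from above is only a few lines; a sketch assuming each film is a list of equal-size H×W×3 uint8 frames:

import numpy as np

# Shorter films are padded by holding their last frame, then each aligned
# pair of frames is joined side by side (targeted left, baseline right).
def side_by_side(frames_a, frames_b):
    n = max(len(frames_a), len(frames_b))
    pad = lambda f: f + [f[-1]] * (n - len(f))
    return [np.hstack(pair) for pair in zip(pad(frames_a), pad(frames_b))]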


🧠 The real goal: a visual judge that sees “better” in one glance

The real destination is the VPM-native vision model that all of this feeds:

class VPMViT(nn.Module):
    """
    VPM-native ViT:
    - Input: (B, 3, H, W) VPM images
    - Output: regression heads (utility dimensions),
              classification heads (risk / health),
              optional masked-patch reconstruction.
    """

Trained on Comparator traces, this model learns to:

  • Take a single VPM and predict:

    • clarity / coherence,
    • bridge risk,
    • novelty / interest,
    • an overall utility score.
  • Optionally suggest which visual ops (zoom, debias, bridge-tame) are worth trying next.

That means that instead of running a full refinement loop every time, Stephanie can:

  1. Generate a VPM for a new thought.
  2. Run one forward pass of VPMViT.
  3. Instantly know how good or risky that thought pattern looks.
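
In code, that single forward pass might look like the sketch below; VPMViT’s constructor, head names, and output format are assumptions based on the class docstring above, not the real interface:

import torch

model = VPMViT()                      # hypothetical default construction
model.eval()

vpm = torch.rand(1, 3, 224, 224)      # one VPM tile, resized for the ViT
with torch.no_grad():
    out = model(vpm)                  # assumed: dict of head outputs

utility = float(out["utility"].squeeze())   # "how healthy does this look?"
risk = out["risk_logits"].softmax(-1)       # assumed classification head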

⚡ Millions of signals in a single pass

A typical classifier reasons over one scalar label at a time. Our visual judge reasons over tens of millions of structured signals in one shot.

In practice:

  • ZeroModel and the pipeline compress the state of hundreds or thousands of scorables and dimensions into that one image.

  • VPMViT ingests the full 4K × 4K × 3 tensor and returns:

    • “This thought regime looks healthy / unhealthy.”
    • “These regions look risky / noisy.”
    • “This looks more like the successful cohort than the baseline one.”

That’s what “world-scale in an instant” means here: not one bit at a time, but whole fields of thought judged in a single forward pass.


🧭 How this matches human “route memory”

This is very deliberately copying how our own thought feels.

When you’re trying to get home from somewhere you visit rarely, you don’t re-compute everything step-by-step. You:

  • Remember a few routes you’ve tried,
  • Recall which ones felt smooth vs stressful,
  • And pick a path based on that felt visual/structural memory.

The Comparator + ViT are doing the same for Stephanie:

  • Agent 02:

    • runs families of thoughts through the same visual logic,
    • shows (and logs) which regimes clean up nicely and which stay brittle.
  • The VPM vision model:

    • learns to recognize those good shapes immediately,
    • and to spot the signatures of bad routes (over-bridging, chaos, dead zones).

The compare.gif is our storybook. The real artifact is the trained visual judge: a tiny VPM-native brain that can look at a tile and know, at a glance, whether this is the kind of thinking we want more of.


🧱 Why Agent 02 matters

Putting it all together:

  • Agent 01 – proves we can improve a single visual thought safely.
  • Agent 02 – proves we can contrast whole regimes of thought, turn that into data, and teach a model what “better thinking” looks like in image space.

This is the step where “better” stops being a vague intuition and becomes a learnable visual pattern. Once the VPM-ViT is in place, Stephanie can look at millions of thoughts, in image form, and decide in a single forward pass which ones are worth following, reinforcing, or pruning.


🌋 The Blossom Engine: Where Ideas Explode into Possibility

Most AI systems stop at the first good answer. Stephanie doesn’t. She asks: “Can we do better?”

Her answer is Blossom: a structured cognitive engine that treats every idea as a seed for exploration. When given a goal, Blossom doesn’t just respond. It launches a research forest: multiple parallel searches, each exploring different strategies, refining plans, and learning from dead ends. Then, from all those variants, it selects the strongest and plants it back into the Nexus graph. Over time, these blossoms make the whole system smarter.

This isn’t random brainstorming. It’s goal-directed cognitive expansion: a systematic way to move from “good enough” to “truly excellent.”

💥 The Blossom Loop: From Seed to Insight

The core of Blossom is a tight, measurable loop:

  1. Seed: Start with a thought (a scorable)
  2. Expand: Generate multiple alternative paths
  3. Evaluate: Score each path on quality metrics
  4. Select: Choose the best path based on reward
  5. Refine: Optionally sharpen the winner
  6. Plant: Integrate the result back into the knowledge graph

This process mirrors human creativity: we don’t just think one thought, we explore many, compare them, and synthesize the best ideas.
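
As a sketch, the whole loop fits in a few lines; every helper here (expand, score, sharpen, plant_into_nexus) is a hypothetical stand-in for the machinery described in the rest of this section:

# A compact sketch of the Blossom loop, steps 2–6 above.
def blossom(seed, goal, n_candidates: int = 8):
    candidates = expand(seed, goal, n=n_candidates)      # 2. alternatives
    scored = [(score(c, goal), c) for c in candidates]   # 3. multi-metric eval
    best_reward, best = max(scored, key=lambda t: t[0])  # 4. select winner
    best = sharpen(best, goal)                           # 5. optional polish
    plant_into_nexus(seed, best, reward=best_reward)     # 6. integrate
    return best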

💢 Tree-GRPO: The Engine of Exploration

Blossom’s exploration is powered by Tree-GRPO (tree-based Group Relative Policy Optimization), a tree search algorithm that manages the complexity of exploring multiple reasoning paths simultaneously:

    graph TD
    A["🌱 Seed Thought"] --> B["🌸 BlossomRunnerAgent"]
    B --> C["🔍 AgenticTreeSearch"]
    C --> D["🌳 Tree-GRPO Forest<br/>📊 M × N × L Dimensions"]
    
    subgraph "🎯 Tree-GRPO Rollout (Blossom Episode)"
        direction TB
        D --> E1["🪵 Root: Seed Plan "]
        E1 --> F1A["🌿 Child A: Expand → Evaluate "]
        E1 --> F1B["🌿 Child B: Expand → Evaluate "]
        E1 --> F1C["🌿 Child C: Expand → Evaluate "]
        F1A --> G1AA["✨ Refine A1 "]
        F1A --> G1AB["✨ Refine A2 "]
        F1B --> G1BA["✨ Refine B1 "]
        F1C --> G1CA["✨ Refine C1 "]
        G1AA --> H1AAA["🏁 Terminal "]
        G1AB --> H1ABA["🏁 Terminal "]
        G1BA --> H1BAA["🏁 Terminal "]
        G1CA --> H1CAA["🏁 Terminal "]
        
        style E1 fill:#8A2BE2,color:white,stroke-width:3px
        style F1A fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
        style F1B fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
        style F1C fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
        style G1AA fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
        style G1AB fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
        style G1BA fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
        style G1CA fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
        style H1AAA fill:#DC143C,stroke:#333,stroke-width:2px,color:#fff
        style H1ABA fill:#DC143C,stroke:#333,stroke-width:2px,color:#fff
        style H1BAA fill:#DC143C,stroke:#333,stroke-width:2px,color:#fff
        style H1CAA fill:#DC143C,stroke:#333,stroke-width:2px,color:#fff
    end
    
    subgraph "🏆 Selection & Integration"
        I["🔍 Select Winners<br/>by Reward/Safety"]
        J["✨ Optional Sharpen"]
        K["🌿 Winner Paths"]
        L["🔗 Integrate into<br/>Nexus Graph"]
        M["📊 Emit Events<br/>for Filmstrip"]
        
        I --> J
        J --> K
        K --> L
        K --> M
        
        style I fill:#FF8C00,stroke:#333,stroke-width:2px,color:#fff
        style J fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
        style K fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
        style L fill:#2E8B57,stroke:#333,stroke-width:2px,color:#fff
        style M fill:#8A2BE2,stroke:#333,stroke-width:2px,color:#fff
    end
    
    H1AAA --> I
    H1ABA --> I
    H1BAA --> I
    H1CAA --> I

    %% Main flow styling
    style A fill:#8A2BE2,stroke:#333,stroke-width:3px,color:#fff
    style B fill:#4169E1,stroke:#333,stroke-width:2px,color:#fff
    style C fill:#1E90FF,stroke:#333,stroke-width:2px,color:#fff
    style D fill:#228B22,stroke:#333,stroke-width:2px,color:#fff
    
  

This diagram shows how one initial thought explodes into a forest of possibilities, each evaluated and scored, before the best paths are selected and integrated back into the system.

Why Blossom Changes Everything

Blossom solves a fundamental problem in AI: local maxima. Most systems get stuck on the first good answer they find. Blossom actively explores alternatives, ensuring that Stephanie doesn’t just find an answer, but the best answer available within her current knowledge.

The key innovations:

  1. Parallel Exploration: Instead of following one path, Blossom explores many simultaneously
  2. Quality Scoring: Each path is evaluated using multiple metrics (clarity, relevance, coherence)
  3. Goal Awareness: The search is guided by the original objective, not random wandering
  4. Persistent Learning: Successful paths are remembered and strengthen future reasoning

The Cognitive Garden

Every Blossom episode generates a replayable event stream: a complete record of the exploration process. This allows us to:

  • Visualize the thought process as a filmstrip
  • Reproduce the exact reasoning path
  • Analyze what strategies worked best
  • Learn from both successes and failures

# Example of the event stream generated by Blossom
{
    "event_type": "blossom_node_evaluated",
    "episode_id": "blossom_123",
    "node_id": "node_456",
    "parent_id": "node_123",
    "plan_text": "Expand on the concept of cognitive gardens...",
    "reward": 0.87,
    "metrics": {
        "clarity": 0.92,
        "relevance": 0.85,
        "coherence": 0.89
    },
    "timestamp": "2025-11-11T10:00:00Z"
}

This transparency is crucial. It means Stephanie’s reasoning isn’t a black box; it’s a cognitive garden we can tend, understand, and improve.

Integration with the Nexus

The result of each Blossom episode doesn’t exist in isolation. The winning paths are converted back into Scorables and integrated into the Nexus graph. This creates a virtuous cycle: better reasoning strengthens the knowledge base, which enables even better reasoning in the future.

Over time, as more blossoms happen, the whole system gets smarter: not because one model got bigger, but because the structure of understanding evolved. This is intelligence as growth, not magic. Just compounding improvement, step by step, scorable by scorable.

This is what makes Stephanie unique: she doesn’t just process information, she actively improves her ability to process information. The Blossom engine is the catalyst for that improvement, turning static knowledge into dynamic, evolving understanding.


🐝 Agent 03: The Nexus Pollinator (NexusPollinatorAgent)

By now, Stephanie can:

  • Shape individual thoughts (Agent 01 – the VPM Thought Refiner)
  • Compare whole families of thoughts (Agent 02 – the Visual Thought Comparator)
  • Grow new ideas from a single seed (the Blossom Engine)

What’s still missing is the thing that makes a mind feel intelligent:

The ability to notice which ways of thinking actually work, and then reshape the whole landscape around those patterns.

That’s the job of the Nexus Pollinator.

👉 Full Code Here

If the Nexus is a cognitive garden and Blossom is the growth engine, then the Pollinator is the master gardener. It moves through the graph, finds the healthiest “plants” (thought patterns), cross-pollinates them, and quietly rewires the garden so that good thinking becomes the default.

It doesn’t invent new ideas from scratch. It learns from the ideas we already have and reorganizes the garden so the best patterns win.


🌼 Why “Pollinator” Is the Right Metaphor

Bees don’t create flowers. They:

  • Move between flowers
  • Carry pollen from strong plants to others
  • Slowly shift which species thrive in a field

The Pollinator works the same way:

  1. It visits a neighborhood of Nexus nodes (scorables) around some recent activity or goal.
  2. It feeds those nodes into Blossom to grow local idea trees.
  3. It identifies winners: variants that are clearly better according to Stephanie’s scoring stack.
  4. It reintegrates the winners into the graph and adjusts connections.

Over many runs, the graph develops cognitive habits:

  • Strong pathways (patterns that repeatedly work) get reinforced.
  • Weak or noisy branches get pruned back.
  • Previously separate ideas get connected where they consistently help each other.

This is Stephanie learning how to think, not just what to say.


🌐 The Pollinator Loop: How It Actually Works

Under the hood, the Pollinator runs a three-phase loop over the Nexus graph.

1. 🧭 Scout a Neighborhood in the Nexus

The Pollinator doesn’t touch the whole graph at once. It starts from a trigger:

  • a new Blossom episode,
  • a fresh reasoning trace,
  • or a high-value scorable we want to improve.

From there it asks the Nexus graph:

“Show me the local garden around this thought.”

It pulls a neighborhood of related nodes using:

  • embedding similarity,
  • shared domains/entities,
  • and recent pulse activity.

This gives it a small, goal-relevant subgraph: the patch of garden it’s going to tend.

2. 🌾 Materialize & Enrich That Context

Next, the Pollinator turns that rough neighborhood into a fully enriched dataset.

Every node in the patch is passed through the ScorableProcessor, which:

  • hydrates domains and entities,
  • computes embeddings (global + cached),
  • runs the full scoring stack (SICQL, MRQ, HRM, Tiny, etc.),
  • renders a VPM tile via ZeroModel (vision signals).

So a plain text node like:

“We should check the logs before debugging further.”

becomes a rich ScorableRow with:

  • metrics vector (model.dimension scores),
  • embed_global,
  • domains, ner,
  • associated VPM tile + metadata.

This step is what “enhanced versions of the graph” really means: the Pollinator makes sure every thought in that neighborhood is fully seen, numerically and visually, before it decides what to do.

3. 🌸 Run Focused Blossom Episodes & Rewrite the Garden

Once the neighborhood is enriched, the Pollinator calls back into the BlossomRunnerAgent but now with a much stronger starting point.

For one or more parent nodes in this patch, it:

  1. Seeds Blossom with the parent scorable + enriched context.
  2. Lets Tree-GRPO explode into local idea trees (rewrites, alternative plans, different strategies).
  3. Scores every branch using the same metrics + VPM-ViT signals.
  4. Selects winners that beat their parents by a meaningful margin (not just noise).
  5. Emits garden events describing what changed: new nodes, new edges, promotions, prunes.

Those winners are turned back into scorables, processed again via ScorableProcessor, and attached to the graph as upgraded nodes and edges.

At the end of a Pollinator run, you haven’t just improved a few isolated thoughts – you’ve:

  • upgraded a whole patch of the graph,
  • enriched its metrics and VPMs,
  • and grown new, higher-quality branches into the Nexus.

🎛️ The Pollinator Workflow at a Glance

Here’s the loop in diagram form:

    graph TD
    A["🎯 Seed Patch<br/>Nexus neighborhood"] --> B["📊 Baseline Snapshot<br/>Materialize with ScorableProcessor"]
    B --> C["🌸 Local Blossom Episodes<br/>Tree-GRPO exploration"]
    C --> D["🎭 Novelty & Quality Filters<br/>prune weak/duplicate variants"]
    D --> E["🏆 Select Winners<br/>multi-model, multi-dimension lift"]
    E --> F["🔗 Rewrite Nexus<br/>add/upgrade nodes & edges"]
    F --> G["📈 Garden Metrics<br/>graph-level & cohort lift"]
    G --> H["🧠 Stronger Patch<br/>better defaults next time"]
    
    C --> I["🦋 Training Events<br/>vision + policy traces"]
    I --> C
    
    style A fill:#8A2BE2,color:white,stroke-width:3px
    style B fill:#4169E1,stroke:#333,stroke-width:2px,color:#fff
    style C fill:#32CD32,stroke:#333,stroke-width:2px,color:#fff
    style D fill:#FFD700,stroke:#333,stroke-width:2px,color:#000
    style E fill:#FF8C00,stroke:#333,stroke-width:2px,color:#fff
    style F fill:#2E8B57,stroke:#333,stroke-width:2px,color:#fff
    style G fill:#DC143C,stroke:#333,stroke-width:2px,color:#fff
    style H fill:#8A2BE2,color:white,stroke-width:3px
    style I fill:#9370DB,stroke:#333,stroke-width:2px,color:#fff
  

Along the way, the Pollinator emits a replayable event stream (“garden events”) plus optional garden frames (image sequences), so you can literally watch a patch of the Nexus get healthier over time.


🧠 What This Changes for Stephanie

The Pollinator turns the whole system from “we ran a neat pipeline” into “we’re steadily training a mind.”

Each Pollinator run:

  • Strengthens successful patterns

    • High-performing branches from Blossom become new defaults in the graph.
  • Weakens or prunes failing patterns

    • Nodes and edges that rarely help stop dominating the neighborhood.
  • Feeds back into visual cognition

    • Before/after VPM pairs and graph events become training data for the VPM-ViT and policy heads.
  • Improves the substrate for future runs

    • Next Blossom episode in that region starts from a richer, better-structured patch.

Crucially, this is recursive:

Better graph → better Blossom context → stronger winners → better graph.

You don’t need bigger models to see progress. You just keep running the garden loop:

  • Refine a thought
  • Compare families of thoughts
  • Blossom new possibilities
  • Pollinate the graph with the best of them

…until the Nexus starts to look less like a static knowledge base and more like a living, self-tending ecosystem.

That’s what the Pollinator Agent really is: Stephanie’s mechanism for turning scattered, one-off improvements into stable cognitive habits, encoded directly into the graph she thinks with next time.


🌸 The Blossom Runner: Where One Seed Becomes a Forest

If the Nexus Pollinator decides where to work in the cognitive garden, the Blossom Runner is the engine that actually does the work.

👉 Full Code Here

Whenever the Pollinator says:

“This cluster of thoughts is important – explore it properly.”

…it hands the goal and seed plans to the BlossomRunnerAgent. From there, the Runner turns a single promising thought into an entire Tree-GRPO forest of alternatives, scores every branch, records everything, and hands back the best paths plus training data.

It’s Stephanie’s local research chamber:

“Given this goal and these starting ideas, grow a whole tree of possibilities, measure them, keep the best, and remember the whole journey.”


🌳 The Blossom Episode: From Seed to Winner

Each Blossom run is a self-contained episode with its own ID, config, and audit trail. Conceptually, every episode follows the same loop:

  1. Configure the run (and optionally jitter the goal)
  2. Roll out a Tree-GRPO forest of plans
  3. Persist the whole thought tree (nodes + edges)
  4. Select and optionally sharpen winners
  5. Hand winners + training batch back to the Pollinator/Nexus
  6. Emit timeline events so we can replay the reasoning

You can picture it like this:

    graph LR
    A[🎯 Goal + Seed Plans] --> B[🌸 BlossomRunnerAgent]
    B --> C[🌲 Tree-GRPO Forest<br/>M × N × L Exploration]
    C --> D[🧱 Persist Nodes + Edges]
    D --> E[🏅 Select Top-K Paths]
    E --> F[✨ Optional Sharpening]
    F --> G[🔗 Return Winners<br/>+ Training Batch]
    C --> H[📽️ Timeline Events<br/>Cognitive Filmstrip]

    style A fill:#4c78a8,stroke:#333,color:white
    style B fill:#e45755,stroke:#333,color:white
    style C fill:#72b7b2,stroke:#333,color:white
    style D fill:#ffbf79,stroke:#333,color:#000
    style E fill:#f58518,stroke:#333,color:white
    style F fill:#9d67ae,stroke:#333,color:white
    style G fill:#2ca02c,stroke:#333,color:white
    style H fill:#9370db,stroke:#333,color:white
  

Let’s unpack the bits that matter.


1️⃣ Configure the Episode (and Nudge the Goal)

Each run starts by creating a Blossom episode in the store:

  • Which agent ran it (blossom_runner)
  • What strategy it used (tree_grpo)
  • The goal id and goal text
  • All the knobs for this run: M, N, L, return_top_k, sharpen_top_k, whether to use VPM hints, etc.

This makes every Blossom run a first-class object in Stephanie’s memory – not just “something that happened”, but a named research session that other agents can query later.

If configured, the Runner can also apply visual jitter to the goal:

  • It turns the goal into a small VPM (Visual Policy Map)
  • Applies a transform (e.g. zoom_max) to emphasize certain structures
  • Generates a short natural-language hint and appends it to the goal

Effectively, it’s like saying:

“For this run, favour crisper structure, explicit claims, and stronger evidential links.”

That gives the LLM a gentle, visually grounded bias before the search even begins.


2️⃣ Roll Out the Tree-GRPO Forest

The heart of the Runner is a Tree-GRPO–powered search, wrapped in an AgenticTreeSearch + TreeGRPOAdapter stack.

Instead of following one chain of thought, it explores a forest of candidate plans in parallel, guided by reward and exploration parameters.

🧠 What M, N, and L Mean for “How Stephanie Thinks”

These three numbers aren’t just technical knobs – they control Stephanie’s style of thinking:

  • M – Number of trees (forests). How many independent lines of attack to run in parallel. Like having multiple research teams approach the same problem from different angles.

  • N – Branching factor. How many alternatives to explore at each step. This is cognitive breadth: explore 2 vs 5 different next moves at each decision.

  • L – Lookahead depth. How far Stephanie thinks ahead before committing. L=1 is quick/tactical; larger L is deeper, more strategic reasoning.

For routine tasks, M/N/L can stay small (efficient thinking). For hard or novel questions, they can be dialed up (exploratory thinking).
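
A quick way to feel these knobs is to bound the forest size they imply; the settings below are illustrative, not real defaults:

# Illustrative M/N/L settings; keys mirror the prose above.
ROUTINE = {"M": 1, "N": 2, "L": 1}   # one tree, narrow, shallow
HARD    = {"M": 4, "N": 5, "L": 3}   # four teams, wide, deep

def max_forest_nodes(cfg):
    # Upper bound: M trees, each branching N ways per level for L levels.
    return cfg["M"] * sum(cfg["N"] ** d for d in range(1, cfg["L"] + 1))

print(max_forest_nodes(ROUTINE), max_forest_nodes(HARD))  # 2 vs 620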


3️⃣ Two-Pass Persistence: First Concepts, Then Connections

When the forest rollout finishes, the Runner doesn’t just keep the winners and forget the rest. It persists the entire thought tree in two passes:

  1. Pass 1 – Create all nodes. Every state in the forest becomes a node with:

    • The plan text / state description
    • Its reward metric(s)
    • Tags like draft, leaf, root, etc.
  2. Pass 2 – Wire the edges. Once all nodes have stable database IDs, the Runner:

    • Fills in each node’s parent_id
    • Adds edges parent → child with a relation type (e.g. expand, refine, select)

Cognitively, it’s like:

  • First remember the ideas (nodes)
  • Then remember how they connect (edges)

That two-step process means the tree is always recoverable: even if edge creation fails, the ideas themselves aren’t lost.
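
A sketch of that two-pass persist, assuming a simple store API (the real BlossomStore interface and node fields will differ):

# Pass 1 creates every node so each gets a stable database ID; pass 2 then
# wires parent links and typed edges using those IDs.
def persist_tree(store, nodes):
    db_ids = {}
    for node in nodes:
        db_ids[node.local_id] = store.insert_node(
            text=node.plan_text, reward=node.reward, tags=node.tags
        )
    for node in nodes:
        if node.parent_local_id is not None:
            parent = db_ids[node.parent_local_id]
            store.set_parent(db_ids[node.local_id], parent)
            store.insert_edge(parent, db_ids[node.local_id],
                              relation=node.relation)  # expand / refine / select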


4️⃣ Select and Sharpen the Winners

From all the leaf nodes, the Runner:

  1. Finds the Top-K leaves by reward

  2. Traces each winner back to the root to recover its full reasoning path

  3. Optionally runs a sharpening loop:

    • Sends the plan through a small LLM with instructions like:

      “Make this clearer, safer, and more aligned with the goal.”

    • Re-scores the sharpened version using Stephanie’s scoring stack

    • Keeps whichever version (original or sharpened) has higher utility

    • Marks winners with tags like winner, sharpened

It’s the difference between:

  • “That seems good enough”
  • vs. “This is the best we can currently do, and we’ve polished it.”

5️⃣ Package Results for the Pollinator and Nexus

At the end of the run, the Runner returns a compact, rich result:

context["blossom_result"] = {
    "episode_id": ...,
    "winners": [...],        # full winner metadata & paths
    "training_batch": ...    # GRPO/DPO-style data from the forest
}

The Pollinator picks that up and:

  • Converts winners into Scorables
  • Attaches them to the Nexus graph as new or updated nodes
  • Uses the training batch to refine Stephanie’s policies over time

So each Blossom run doesn’t just add content – it also adds learning signal.


6️⃣ Timeline Events: Cognitive Filmstrips

Throughout the episode, the Runner emits timeline events onto the bus:

  • blossom_node_evaluated – a node’s plan text, reward, metrics
  • blossom_episode_summary – stats about the run (nodes, best reward, etc.)

These events are what power your filmstrips and dashboards:

  • You can replay a run as a sequence of “frames”
  • See which branches were explored, where the search backtracked
  • Watch how the winning path emerged and was sharpened

It’s not just telemetry – it’s visible cognition.


🌐 Why the Blossom Runner Matters

Putting it all together:

  • The Pollinator decides where to explore in the Nexus garden.

  • The Blossom Runner is the execution engine that:

    • Grows a forest of candidate thoughts
    • Measures every branch
    • Persists the full reasoning tree
    • Selects and sharpens winners
    • Produces training data and visualizable timelines

This is the component that turns “we had a good idea once” into:

  • A replayable thought tree
  • A better graph structure in Nexus
  • And new training signal for Stephanie’s future reasoning

It’s not just generating answers – it’s generating experience.

Every Blossom episode makes the cognitive garden a little richer, the pathways a little stronger, and Stephanie’s future thinking a little sharper.


📬 Prompt Events: Asynchronous Candidates Without Blocking

Blossom can generate and refine candidates inline. But some workloads are bursty, expensive, or best handled by external providers. For those, Stephanie routes candidate generation through a Prompt Events path:

  • Offload prompts to a worker pool (LLM providers or your own fleet)
  • Resume when results arrive (non-blocking)
  • Decide using the same scoring + promotion rules

It’s the same cognitive loop, just decoupled.


Why this matters

  • Throughput: run hundreds of candidate refinements in parallel without tying up the main loop.
  • Fault tolerance: retries, dead-letter queues (DLQs), and idempotent job IDs keep runs resilient.
  • Observability: every offloaded prompt and return is tracked (bus + DB + JSONL).
  • Uniformity: offloaded results re-enter the exact same scorable → score → promote path.
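
A sketch of the two phases in code, borrowing the names from the diagram below; the signatures and the promotion threshold are assumptions:

PROMOTION_MARGIN = 0.05  # hypothetical lift required to promote a winner

async def improve_async(client, context, prompts):
    # Phase 1: offload candidate generation; keep tickets, don't block.
    tickets = await client.offload_many(prompts, model="external-llm")
    context["pending_prompt_tickets"] = tickets   # resumed later

async def resume(store, scoring, nexus, context):
    # Phase 2: possibly much later, collect whatever jobs have finished.
    ready = await store.gather_ready(context["pending_prompt_tickets"])
    for job_id, result in ready.items():
        scores = await scoring.evaluate(result)
        if scores["overall"] >= PROMOTION_MARGIN:
            nexus.promote(job_id, result, scores)  # same path as inline results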

The flow at a glance

    sequenceDiagram
    participant NI as 🔀 NexusImproverAsyncAgent
    participant PC as 📡 PromptClient
    participant BUS as 🚀 ZeroMQ Bus
    participant WRK as 👷 Provider Workers
    participant DB as 💾 PromptJobStore
    participant NR as 🔄 NexusImproverResumeAgent
    participant SC as 📊 ScoringService
    participant NX as 🕸️ Nexus Graph

    Note over NI,WRK: 🎯 Phase 1: Async Prompt Offloading

    NI->>PC: "📤 offload_many(K prompts, model, target_pool)"
    activate PC
    PC->>BUS: "📨 publish PromptJob (prompts.submit)"
    activate BUS
    BUS->>WRK: "🎪 dispatch prompt to target pool"
    activate WRK
    WRK-->>BUS: "📝 publish result (results.prompts.{job_id})"
    deactivate WRK
    BUS->>DB: "💾 write result (status=succeeded)"
    deactivate BUS
    deactivate PC

    Note over NI,DB: 💫 NI continues other work<br/>🎫 tickets saved in context.pending_prompt_tickets

    Note over NR,NX: 🔄 Phase 2: Resume & Decision

    NR->>DB: "🔍 gather_ready(job_ids)"
    activate DB
    DB-->>NR: "📋 {job_id → result_text/json, meta...}"
    deactivate DB
    
    NR->>SC: "⚖️ evaluate candidates (async bus or local)"
    activate SC
    SC-->>NR: "🎯 overall + per-dimension scores"
    deactivate SC
    
    NR->>NX: "🏆 link parent→children<br/>📈 promote winner (if lift ≥ margin)"
    activate NX
    deactivate NX
    
  

🧾 What We Actually Added in This Post

This post wasn’t “more plumbing.” It built the environment where Jitter can actually live.

Here’s what we put in place:

  • 🕸️ Nexus Store – Stable world & memory. A real, Postgres-backed brain:

    • NexusScorable for atomic thoughts
    • NexusEmbedding for semantic position
    • NexusMetrics for multi-dimensional judgment
    • NexusEdge for habits and relationships
    • NexusPulse for cognitive heartbeats

    This is the substrate where thoughts persist, connect, and can be revisited.

  • 🎨 Visual AI (VPMs) – Proto visual cortex. Visual Policy Maps + ZeroModel/tiny vision turn score vectors into images: tiles, heatmaps, filmstrips. Stephanie can now see her own thinking: which regions are overgrown, underused, brittle, or healthy.

  • 🌸 Blossom Runner – Local metabolism. The engine that takes a seed thought + goal and grows a Tree-GRPO forest of alternatives:

    • explores many paths
    • scores, sharpens, and persists them
    • returns the best winners with a full audit trail

    Every run is a compact research session, not just a single completion.

  • 🐝 Nexus Pollinator – Master gardener. The orchestrator that:

    • scouts a neighborhood in Nexus (nearby thoughts, entities, embeddings)
    • fully materializes and scores that context
    • calls Blossom Runner with this enriched neighborhood
    • promotes the best children and rewires Nexus edges accordingly

    This is how cognitive habits form: good paths get reinforced, weak ones slowly fade.

  • ⚡ Pulses & Events – Cognitive nervous system. Pulses, timeline events, and the event/knowledge bus turn everything into live telemetry:

    • what lit up for which goal
    • which branches won or died
    • how the garden changed over time

    Those same events double as training data for HRM, Tiny, SICQL, GRPO, etc.

🐣 How This Serves Jitter

Each of these pieces exists for one reason: to give Jitter somewhere real to exist and evolve.

  • Nexus → Jitter’s world & long-term memory
  • Embeddings & metrics → Jitter’s sense of position and value
  • Visual AI / VPMs → Jitter’s sight (what’s healthy, risky, coherent)
  • Blossom Runner → Jitter’s exploration & imagination
  • Pollinator → Jitter’s learning loop & habit-building
  • Pulses & events → Jitter’s heartbeat, reflexes, and history of experience

We’re not just adding modules; we’re wiring together a cognitive ecosystem that a small digital creature can move through, feel, and slowly shape.


🌱 What “Thriving” Actually Means for Jitter

When we say we want Jitter to thrive in this habitat, we mean:

  1. Adaptation – It changes behavior in response to feedback and shifting goals.
  2. Growth – It discovers new strategies and patterns via Blossom episodes.
  3. Persistence – Useful patterns stick in Nexus; they don’t reset every run.
  4. Autonomy – It can choose where to focus: which regions to pulse, which seeds to blossom.
  5. Resilience – Bad ideas don’t kill it; they become negative examples in its training data.

None of that is a single “feature flag.” It emerges from having the right substrate + senses + loops, which is exactly what this chapter built.


🌐 From Thinking Once to Living With Thoughts

Putting it all together:

  • Nexus is the soil.
  • Blossom is the local growth engine.
  • Pollinator is the ecosystem manager.
  • VPMs are the visible health indicators.
  • Pulses and events are the rhythms and reflexes.

And Jitter is the thing that will inhabit this garden: probing it, nudging it, getting better at staying “alive” and useful inside it.

The previous posts showed how Stephanie can think once. This chapter builds the place where thinking can continue, where thoughts accumulate, compete, and improve over time.

In the next step, we’ll stop treating Jitter as an idea and start treating it as an agent in a world: how it moves through Nexus, how it chooses where to blossom, and how a persistent identity begins to form from all these traces.


✅ Conclusion: A Substrate for Self-Improvement

This chapter wasn’t about adding one more agent. It was about changing the shape of the system so Stephanie can be used as a self-improving platform instead of a one-shot pipeline.

1. What We Actually Changed

Concretely, we introduced and wired up:

  • Nexus Store (Persistent Cognitive Graph)

    • Postgres-backed models for:

      • NexusScorableORM – text + meta
      • NexusEmbeddingORM – global embeddings + norms
      • NexusMetricsORM – dense metric vectors
      • NexusEdgeORM – typed, weighted edges
      • NexusPulseORM – time-stamped “cognitive heartbeats”
    • This gives us a stable, queryable graph of thoughts, embeddings, metrics, and connectivity that all other components can share.

  • Visual Stack (VPM-First Metrics)

    • The ScorableProcessor now routes scorables through:

      • Embedding computation
      • Multi-model scoring (SICQL / HRM / Tiny / etc.)
      • VPM generation via ZeroModelService
    • Every important scorable can now be represented as:

      • Text
      • Embedding vector
      • Metrics vector
      • Visual Policy Map (VPM)
    • This makes “quality of reasoning” a first-class numeric + visual object.

  • Blossom Runner (Local Exploration Engine)

    • BlossomRunnerAgent executes Tree-GRPO / Agentic Tree Search episodes:

      • M×N×L forest exploration for a given goal/seed
      • Two-pass persistence of nodes and edges into BlossomStore
      • Optional sharpening loop with re-scoring
      • Emission of structured training batches for downstream learning
    • Outcome: for any goal, we can generate a full reasoning tree, not just a final answer, and store it as data.

  • Nexus Pollinator (Graph-Level Improvement Loop)

    • NexusPollinatorAgent implements the outer loop:

      1. Select a neighborhood in Nexus (by embedding, domain, activity, etc.)
      2. Ensure all scorables in that neighborhood are fully processed via ScorableProcessor (domains, NER, metrics, VPMs)
      3. Trigger a focused Blossom episode using that neighborhood as context
      4. Integrate winners back into Nexus as new or updated nodes/edges
    • Effectively: it turns local reasoning wins into structural updates in the global graph.

  • Event Pulses & Telemetry (Cognitive Nervous System)

    • Pulses (NexusPulseORM) and event subjects on the bus give:

      • Time-stamped snapshots of “what was important when”
      • A way for agents (Pollinator, Blossom Runner, future Jitter logic) to react to activity, not just static data
      • A timeline suitable for dashboards and replay / filmstrips

All of this is wired through the Scorable → Metrics → VPM → Nexus path, so any text artifact can move through the same normalization and analysis pipeline.


2. How This Enables Self-Improvement

From a self-improvement perspective, we now have a closed loop that looks like this:

  1. Ingest & Normalize

    • Anything we care about (chat turns, code, documents, plans) is converted into a Scorable.

    • ScorableProcessor attaches:

      • Domains, entities
      • Embeddings
      • Multi-model scores
      • Metrics vectors + VPMs
  2. Persist & Connect

    • The same object is stored in:

      • Nexus (long-term graph)
      • Optional Blossom/trace stores (reasoning episodes)
    • Edges encode similarity, temporal links, shared domains, and Pollinator-driven “this path worked” signals.

  3. Explore & Optimize

    • Blossom Runner takes a goal + seeds and:

      • Explores a structured search space (Tree-GRPO)
      • Produces a forest of candidate plans + scores
      • Optionally sharpens the top paths
  4. Select & Update

    • Pollinator:

      • Picks strong neighborhoods and successful plans
      • Promotes them into Nexus (new nodes, stronger edges, updated metrics)
      • Optionally demotes or prunes weak / low-utility regions
  5. Train & Calibrate

    • Forest traces, scores, and Nexus metrics can be used to:

      • Train / fine-tune local models (SICQL, HRM, Tiny, vision heads)
      • Calibrate decision thresholds, risk models, and routing logic
    • The same infrastructure that scores external tasks can now be pointed at Stephanie’s own behavior.

The key point: we now have a single, shared substrate (Scorables → Metrics/VPMs → Nexus) that all future self-improvement loops can run on.
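
Stripped to its skeleton, the whole loop fits in a screenful of Python. Everything below is a toy stand-in (random scores instead of SICQL/HRM, a dict instead of Postgres), meant only to show the shape of the cycle, not the real API:

```python
# Toy rendering of the self-improvement loop with plain dicts.
import itertools, random

_ids = itertools.count()
nexus = {"nodes": {}, "edges": {}}   # stand-in for the Nexus Store

def process_scorable(text, goal):
    # 1. Ingest & Normalize (random stub score instead of real model scoring)
    return {"id": next(_ids), "text": text, "goal": goal,
            "score": random.random()}

def persist(s):
    # 2. Persist & Connect: store the node, link it to existing nodes
    nexus["nodes"][s["id"]] = s
    for other in nexus["nodes"].values():
        if other["id"] != s["id"]:
            nexus["edges"].setdefault((other["id"], s["id"]), 0.1)

def blossom(seed, branching=3):
    # 3. Explore & Optimize: grow candidate continuations of a seed
    return [process_scorable(f'{seed["text"]} / variant {i}', seed["goal"])
            for i in range(branching)]

def pollinate(candidates):
    # 4. Select & Update: keep the winner, reinforce the paths into it
    winner = max(candidates, key=lambda c: c["score"])
    persist(winner)
    for edge in nexus["edges"]:
        if edge[1] == winner["id"]:
            nexus["edges"][edge] += 0.5   # habit formation, crudely
    return winner

seed = process_scorable("initial plan", goal="demo")
persist(seed)
winner = pollinate(blossom(seed))
print(winner["text"], "won with", round(winner["score"], 2))
# 5. Train & Calibrate would consume these traces as training data.
```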


3. Next Step: Point It at Stephanie Herself

This series is about self-improving systems, and Stephanie is supposed to be the reference implementation. With this chapter done, we finally have enough machinery to apply the same logic to Stephanie’s own artifacts:

  • Treat code files, modules, and configs as scorables:

    • Score them for clarity, risk, dead code, needless complexity, duplication, test-coverage hints, etc.
    • Visualize whole subsystems as VPMs to spot “hot” or fragile regions at a glance.
  • Use Blossom Runner to generate:

    • Alternative implementations
    • Refactoring plans
    • Better docstrings, comments, and invariants
  • Use Pollinator to:

    • Integrate accepted refactors back into Nexus
    • Strengthen links between “good patterns” and the places that use them
    • Build a map of “healthy” vs “needing work” regions across 1,000+ files.
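
As a first taste of what that could look like, here is a hedged sketch of wrapping a source file as a scorable with a couple of cheap static signals. The metric names are placeholders for what the real scoring heads would produce:

```python
# Hedged sketch: a source file as a scorable. Field and metric names
# are illustrative stand-ins for the real ScorableProcessor output.
from pathlib import Path

def code_file_to_scorable(path: str) -> dict:
    source = Path(path).read_text(encoding="utf-8")
    lines = source.splitlines()
    return {
        "target_type": "code_file",
        "text": source,
        "meta": {
            "path": path,
            "loc": len(lines),
            # crude proxies for clarity / risk until real scorers run:
            "comment_ratio": sum(l.lstrip().startswith("#") for l in lines)
                             / max(len(lines), 1),
            "todo_count": source.count("TODO"),
        },
    }

scorable = code_file_to_scorable(__file__)
print(scorable["meta"])
```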

In other words: the next post will be about turning this infrastructure inward, using Nexus, VPMs, Blossom, and Pollinator to help Stephanie analyze, refactor, and improve her own code and writing.

This chapter built the substrate. The next one is where we let the system loose on itself and see how far it can actually push us.


📚 Glossary

This glossary defines the key concepts, components, and terminology introduced in this blog post about building a cognitive ecosystem for AI.

🍏 Core Concepts

Nexus - Stephanie’s persistent, evolving graph of thoughts and connections. It serves as the stable “world” where cognitive patterns form habits and knowledge accumulates over time. Unlike a static database, Nexus is a living structure that grows and adapts through use.

Scorable - The fundamental unit of thought in Stephanie’s system. Any piece of information - a conversation turn, document, plan step, or even a graph itself - can be normalized into a Scorable. This creates a universal “atom of thought” that all cognitive processes can work with consistently.

Jitter - A conceptual digital life-form designed to explore, adapt, and thrive within Stephanie’s cognitive architecture. Jitter represents the goal of creating AI that doesn’t just process information, but lives with it, developing persistent identity and expertise over time.

⚙️ Cognitive Components

Blossom Engine - Stephanie’s exploration mechanism that treats every idea as a seed for growth. Given a starting thought and goal, Blossom generates multiple alternative paths, evaluates them, and selects the strongest. It’s the system’s way of moving beyond “good enough” to “truly excellent” answers.

Tree-GRPO (Tree-based Group Relative Policy Optimization) - The tree search algorithm that powers Blossom’s exploration. It manages the complexity of exploring multiple reasoning paths simultaneously, using policy-based guidance to focus resources on the most promising directions.

Visual Policy Map (VPM) - A three-channel image representation of cognitive metrics that makes abstract reasoning visible. Channel 0 shows concept density, Channel 1 shows connection strength, and Channel 2 shows concept importance. VPMs allow Stephanie to “see” the quality of reasoning patterns.
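
As a rough illustration (the exact normalization and layout in ZeroModel may differ), a VPM tile is three metric grids stacked into one image:

```python
# Illustrative only: packing three metric grids into a VPM tile.
import numpy as np

H, W = 32, 32
density    = np.random.rand(H, W)   # channel 0: concept density
strength   = np.random.rand(H, W)   # channel 1: connection strength
importance = np.random.rand(H, W)   # channel 2: concept importance

vpm = np.stack([density, strength, importance], axis=-1)  # (H, W, 3)
assert vpm.shape == (32, 32, 3)     # renderable as an RGB image
```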

ZeroModel - The vision system that converts metric vectors into VPMs and enables visual operations on thought structures. It’s Stephanie’s “proto-visual cortex” that allows her to process information through visual representations.

🏰 System Architecture

Pollinator Agent - The system component that acts as a “cognitive gardener,” moving through the Nexus graph to strengthen successful pathways and prune weak ones. It takes local Blossom improvements and integrates them globally, updating the underlying graph structure and metrics.

Scorable Processor - The component that transforms raw Scorables into “living thoughts” by enriching them with embeddings, domains, entities, and quality metrics. It’s where data becomes meaning through contextual annotation and evaluation.

Knowledge Bus - Stephanie’s “cognitive nervous system” - an event-driven messaging system that allows thoughts to flow as events rather than marching down procedural pipelines. It enables parallel cognitive pathways and prevents system-wide blocking.
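
A minimal pub/sub sketch with pyzmq shows the idea; the subject name nexus.pulse and the payload shape are assumptions for illustration:

```python
# Minimal ZeroMQ pub/sub sketch of the Knowledge Bus idea.
import time
import zmq

ctx = zmq.Context.instance()

pub = ctx.socket(zmq.PUB)
pub.bind("tcp://127.0.0.1:5555")

sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5555")
sub.setsockopt_string(zmq.SUBSCRIBE, "nexus.pulse")  # subject filter

time.sleep(0.2)  # let the slow-joiner subscription propagate

pub.send_string('nexus.pulse {"scorable_id": 42, "event": "reinforced"}')
print(sub.recv_string())  # any agent subscribed to this subject reacts
```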

Memory Tool - The long-term storage system that acts as Stephanie’s “left hemisphere” - the structured, organized repository of everything she knows. It provides cognitive scaffolding through specialized stores for different types of knowledge.

🔨 Cognitive Processes

Cognitive Surgery - The process of improving thought quality through structured visual operations. The VPM Refiner performs cognitive surgery by applying operations like Bootstrap, Debias, Bridge Tame, and Zoom Focus to clean up and strengthen thought structures.

Visual Jitter - A technique that applies visual transformations to goals before processing, creating “productive friction” that pushes Stephanie to consider angles she might otherwise miss. It’s like changing perspective when stuck on a problem.

Habit Formation - The process by which successful cognitive pathways in the Nexus graph become strengthened through repeated use. Like neural pathway strengthening, frequently traveled paths become faster and more reliable.

Cognitive Pruning - The process of weakening or removing unsuccessful thought pathways in the Nexus graph, preventing cognitive clutter and maintaining system efficiency.
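
A toy update rule captures both dynamics, reinforcement and pruning, on a single edge weight. The constants here are assumptions, not tuned values from the system:

```python
# Toy habit-formation / pruning rule on an edge weight.
def update_edge(weight: float, traversed: bool, succeeded: bool,
                lr: float = 0.2, decay: float = 0.02,
                prune_below: float = 0.05) -> float | None:
    if traversed and succeeded:
        weight += lr * (1.0 - weight)   # reinforce toward 1.0
    else:
        weight -= decay                 # unused or failed paths fade
    return None if weight < prune_below else weight  # None == pruned

w = 0.3
for outcome in [True, True, False, True]:
    w = update_edge(w, traversed=True, succeeded=outcome)
    print(round(w, 3))   # watch the path harden into a habit
```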

🔩 Technical Components

Nexus Store - The Postgres-backed storage system that implements the Nexus graph with tables for Scorables, Embeddings, Metrics, Edges, and Pulses. It provides the stable substrate where Jitter can form persistent memories.

Blossom Runner - The execution engine that actually performs the Tree-GRPO exploration process, growing forests of alternative thoughts and selecting winners. It’s the “engine room” that turns one thought into an explored possibility space.

Event Pulses - Targeted attention events that move through the Nexus graph, strengthening connections between related ideas and triggering new growth where needed. They create Stephanie’s “cognitive heartbeat.”

Garden Events - JSONL telemetry records that capture the complete story of how thoughts evolve through Blossom episodes. They enable replayable cognitive filmstrips showing the thought refinement process.
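
For illustration, a garden event might be appended like this; the field names are assumed, not the exact telemetry schema:

```python
# Hedged sketch of appending one Garden Event as a JSONL record.
import json, time

event = {
    "ts": time.time(),
    "episode": "blossom-demo",
    "node_id": 42,
    "action": "sharpen",
    "score_before": 0.61,
    "score_after": 0.74,
}
with open("garden_events.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(event) + "\n")  # one event per line -> replayable filmstrip
```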

💻 Visual Operations

Bootstrap - A visual operation that creates a “goodness channel” by combining quality and novelty while suppressing uncertainty, highlighting promising thought regions.

Debias - A visual operation that removes cognitive distortions by softly subtracting risk areas while preserving useful caution, like a cognitive therapist identifying biases.

Bridge Tame - A visual operation that identifies and softens “bridge regions” - over-connected conceptual pathways that can blur important distinctions, making thought structures cleaner.

Zoom Focus - A visual operation that calculates the centroid of attention mass and zooms in on the most salient aspects of thoughts, mimicking how humans focus attention.
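
To ground these, here are plausible numpy renditions of Bootstrap and Zoom Focus. They follow the descriptions above but are sketches, not the actual ZeroModel implementations:

```python
# Hedged numpy sketches of two visual operations.
import numpy as np

def bootstrap(quality, novelty, uncertainty):
    # "goodness channel": reward quality and novelty, suppress uncertainty
    return np.clip(quality * novelty * (1.0 - uncertainty), 0.0, 1.0)

def zoom_focus(tile, half: int = 4):
    # centroid of attention mass, then crop a window around it
    ys, xs = np.indices(tile.shape)
    total = tile.sum() + 1e-9
    cy = int((ys * tile).sum() / total)
    cx = int((xs * tile).sum() / total)
    y0, x0 = max(cy - half, 0), max(cx - half, 0)
    return tile[y0:cy + half, x0:cx + half]

tile = np.random.rand(32, 32)
focused = zoom_focus(bootstrap(tile, tile, tile * 0.3))
print(focused.shape)   # a small window around the most salient region
```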

📎 Cognitive Dimensions

Separability - How clearly concepts are distinguished in thought structures, measured through VPM analysis.

Bridge Proxy - A measure of risk from over-reliance on narrow connections between clusters of thought.

Spectral Gap - A measure of overall structural integrity in thought patterns.

Symmetry - How balanced and coherent thought structures appear in their visual representation.

Utility - A scalar measure of how good a thought is for a specific goal, computed from the φ feature vector.
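
One plausible functional form (an assumption, not the system’s exact model) is a learned weight vector over φ squashed into [0, 1]:

```python
# Hedged sketch: utility as a weighted combination of φ features.
import numpy as np

phi = np.array([0.8, 0.2, 0.6, 0.9])   # e.g. separability, bridge proxy, ...
w   = np.array([0.5, -0.3, 0.4, 0.6])  # assumed goal-specific weights
utility = float(1.0 / (1.0 + np.exp(-(w @ phi))))  # sigmoid(w . phi)
print(round(utility, 3))
```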

🥸 Agent Types

VPM Thought Refiner - Agent 01 that takes single thoughts and makes their VPM tiles healthier through visual operations.

Visual Thought Comparator - Agent 02 that compares two cohorts of thoughts as visual objects to determine which regime of thinking is healthier.

BlossomRunnerAgent - The execution engine that performs Tree-GRPO exploration and generates thought forests.

Nexus Pollinator - The master gardener agent that scouts neighborhoods, calls Blossom, and promotes winners back to the graph.


📚 References

This section provides academic references for readers interested in exploring the foundational research behind the concepts discussed in this blog post.

🍙 Graph-Based Reasoning

[ToT] Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y., & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv preprint arXiv:2305.10601. Available at: https://arxiv.org/pdf/2305.10601

[GoT] Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Gianinazzi, L., Gajda, J., … & Hoefler, T. (2023). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. arXiv preprint arXiv:2308.09687. Available at: https://arxiv.org/abs/2308.09687

[GraphRAG] Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv preprint arXiv:2404.16130. Available at: https://arxiv.org/abs/2404.16130

☘️ Visual Representations in AI

[Vaswani et al. 2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.

[Dosovitskiy et al. 2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

🎊 Cognitive Architecture and Human Reasoning

[Miller 1956] Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97.

[Engelbart & English 1968] Engelbart, D. C., & English, W. K. (1968). A research center for augmenting human intellect (the “mother of all demos”). Proceedings of the December 9-11, 1968, Fall Joint Computer Conference (AFIPS).

[Newell & Simon 1972] Newell, A., & Simon, H. A. (1972). Human problem solving. Prentice-Hall, Inc.

🎾 Event-Driven and Distributed Systems

[Hintjens 2013] Hintjens, P. (2013). ZeroMQ: Messaging for Many Applications. O’Reilly Media, Inc.

[Shavit & Touitou 1997] Shavit, N., & Touitou, D. (1997). Software transactional memory. Distributed Computing, 10(2), 99-116.

🍏 Memory and Learning Systems

[Atkinson & Shiffrin 1968] Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. Psychology of Learning and Motivation, 2, 89-195.

[Sutton & Barto 2018] Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.

🌑 Large Language Model Reasoning

[Wei et al. 2022] Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., … & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.

[Brown et al. 2020] Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

🎱 Graph Neural Networks and Knowledge Graphs

[Scarselli et al. 2009] Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2009). The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61-80.

[Battaglia et al. 2018] Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., … & Pascanu, R. (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

🏀 Multi-Agent Systems and Emergence

[Wooldridge & Jennings 1995] Wooldridge, M., & Jennings, N. R. (1995). Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2), 115-152.

[Holland 1992] Holland, J. H. (1992). Complex adaptive systems. Daedalus, 121(1), 17-30.