The Self-Aware Pipeline: Empowering AI to Choose Its Own Path to the Goal

🔧 Summary

Modern AI systems require more than just raw processing power: they need contextual awareness, strategic foresight, and adaptive learning capabilities. In this post, we walk through how we implemented a self-aware pipeline system inspired by the Devil’s Advocate paper.

Unlike brittle, static workflows, this architecture empowers agents to reflect on their own steps, predict failure modes, and adapt their strategies in real time.


🧠 Grounding in Research

Devil’s Advocate (ReReST)

ReReST: Devil's Advocate: Anticipatory Reflection for LLM Agents introduces a self-training framework for LLM agents. The core idea is to have a “reflector” agent anticipate failures and revise the original plan before execution, which is a powerful method for reducing hallucinations and improving sample quality. Our implementation draws heavily on these ideas to enable dynamic planning and feedback loops within the pipeline.
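To make the reflect-then-execute idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm or our pipeline code: the function name `run_with_anticipatory_reflection` and the toy reflector and executor are hypothetical stand-ins for what would normally be LLM calls.

```python
from typing import Callable, List, Optional


def run_with_anticipatory_reflection(
    plan: List[str],
    reflect: Callable[[str], Optional[str]],
    execute: Callable[[str], str],
) -> List[str]:
    """Run each step only after a reflector has had a chance to revise it.

    `reflect` returns a revised step when it anticipates a failure,
    or None when the original step looks fine.
    """
    results = []
    for step in plan:
        revised = reflect(step)
        chosen = revised if revised is not None else step
        results.append(execute(chosen))
    return results


# Toy stand-ins (assumptions for illustration only).
def toy_reflector(step: str) -> Optional[str]:
    # Anticipate that guessing leads to hallucinated answers.
    if "guess" in step:
        return step.replace("guess", "look up")
    return None


def toy_executor(step: str) -> str:
    return f"done: {step}"


outputs = run_with_anticipatory_reflection(
    ["guess the answer", "write summary"], toy_reflector, toy_executor
)
```

The point of the structure is that revision happens before a step runs, not after it fails.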

General Reasoner: The smarter Local Agent

🔧 Summary

The General Reasoner paper shows how we can train LLMs to reason across domains using diverse data and a generative verifier. In this post, I walk through our open-source implementation, showing how we built a modular reasoning agent capable of generating multiple hypotheses, evaluating them with an LLM-based judge, and selecting the best answer.


🧠 What We Built

We built a GeneralReasonerAgent that:

  • Dynamically generates multiple hypotheses using different reasoning strategies (e.g., cot, debate, verify_then_answer)
  • Evaluates each pair of hypotheses using either a local LLM judge or our custom MR.Q evaluator
  • Classifies the winning hypothesis using rubric dimensions
  • Logs structured results to a PostgreSQL-backed system
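
The generate, compare, and select loop described above can be sketched roughly as follows. This is an illustrative outline rather than the actual GeneralReasonerAgent code: `select_best_hypothesis` and `toy_judge` are hypothetical names, and the toy judge simply prefers the longer answer where a real system would call an LLM judge or the MR.Q evaluator.

```python
from itertools import combinations
from typing import Callable, Dict


def select_best_hypothesis(
    hypotheses: Dict[str, str],
    first_wins: Callable[[str, str], bool],
) -> str:
    """Compare every pair of hypotheses and return the strategy with most wins.

    `hypotheses` maps a strategy name to the text it produced.
    `first_wins(a, b)` returns True when text `a` beats text `b`.
    """
    wins = {name: 0 for name in hypotheses}
    for name_a, name_b in combinations(hypotheses, 2):
        if first_wins(hypotheses[name_a], hypotheses[name_b]):
            wins[name_a] += 1
        else:
            wins[name_b] += 1
    # Ties resolve to the strategy listed first.
    return max(wins, key=wins.get)


# Toy judge (assumption for illustration): prefer the longer answer.
def toy_judge(text_a: str, text_b: str) -> bool:
    return len(text_a) >= len(text_b)


candidates = {
    "cot": "short",
    "debate": "a much longer answer",
    "verify_then_answer": "medium one",
}
best = select_best_hypothesis(candidates, toy_judge)
```

A round-robin comparison costs one judge call per pair, so it is only practical for a handful of hypotheses per question.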

All of this was integrated with our existing co_ai framework, which includes:

Building a Self-Improving Chain-of-Thought Agent: Local LLMs Meet the CoT Encyclopedia

Most AI systems generate answers. Ours examines how they think. This isn’t just prompt engineering; this is structured reasoning at scale.

🔧 Summary

Large Language Models are transforming every field, yet their internal reasoning remains a formidable black box. We can get brilliant outputs, but without understanding how those conclusions were reached, we’re left guessing how to improve, debug, or even trust them. This opacity limits our ability to build truly reliable and self-improving AI systems.

Self-Improving Agents: Applying the Sharpening Framework to Local LLMs

This is the second post in a 100-part series, where we take breakthrough AI papers and turn them into working code, building the next generation of AI one idea at a time.

🔧 Summary

In my previous post, I introduced co_ai, a modular implementation of the AI co-scientist concept, inspired by DeepMind’s recent paper Towards an AI Co-Scientist.

But now, we’re going deeper.

This isn’t just about running prompts through an agent system; it’s about building something radically different:

Building Clipper: An AI Image Generator You Control

“If you’ve ever pasted 50 prompts into an image generator one-by-one, this is for you. I hit my limit and built Clipper to solve it.”

📖 Summary

In the previous blog post I wrote a research paper: Cross-Modal Cognitive Mapping. This paper is about turning your conversations into images to gradually map your thought patterns. The implementation of this paper is an application called Prism.

A component of this app is image generation from prompts or your conversations. All of the foundation models support this, but it’s a pretty janky process: you have to generate the prompt, paste it into a text box, and download the image. I just went through a week of doing this while building a prompt toolkit. The whole time, I kept wishing I had built the app that I’m going to share with you now.

Cross-Modal Cognitive Mapping: A Technical Overview

Cross-Modal Cognitive Mapping

A Technical Overview of System Design and Implementation

Author: Ernan Hughes
Published: April 2025


Abstract

Cross-Modal Cognitive Mapping is a new framework designed to extend traditional text-based cognition modeling into multimodal representations.
This system combines text prompts, visual generation, human selection behavior, and semantic memory retrieval to better understand and track human conceptual architectures.

This post presents a technical overview of the core architecture, database design, embedding workflows, search functionality, and resonance mapping built during the initial research phase.

Uncovering Reasoning in LLMs with Sparse Autoencoders

Summary

Large Language Models (LLMs) like DeepSeek-R1 show remarkable reasoning abilities, but how these abilities are internally represented has remained a mystery. This paper explores the mechanistic interpretability of reasoning in LLMs using Sparse Autoencoders (SAEs) — a tool that decomposes LLM activations into human-interpretable features. In this post, we’ll:

  • Explain the SAE architecture used
  • Compute and visualize ReasonScore
  • Explore feature steering with sample completions
  • Provide live visualizations using Python + Streamlit
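
As a rough picture of what "decomposing activations into features" means, here is a minimal sparse autoencoder forward pass in plain Python. It assumes the common form (a ReLU encoder followed by a linear decoder), which may differ from the exact architecture in the paper, and the hand-picked weights exist only to make the arithmetic easy to follow. `sae_forward` and `matvec` are hypothetical helper names.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_forward(
    x: Vector, w_enc: Matrix, b_enc: Vector, w_dec: Matrix, b_dec: Vector
) -> Tuple[Vector, Vector]:
    """Encode an activation into non-negative features, then reconstruct it."""
    pre = matvec(w_enc, x)
    # ReLU keeps only the features that fire, so most entries are zero.
    features = [max(0.0, p + b) for p, b in zip(pre, b_enc)]
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, recon


# A 2-dimensional "activation" expanded into 4 features (toy weights).
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0, 0.0, 0.0, 0.0]
w_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b_dec = [0.0, 0.0]

features, recon = sae_forward([1.0, -2.0], w_enc, b_enc, w_dec, b_dec)
```

Here only two of the four features are active for this input, and the decoder recovers the original activation from them. In a trained SAE the weights are learned with a reconstruction loss plus a sparsity penalty.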

Optimizing Prompt Generation with MARS and DSPy

🕒 TL;DR

  • We explore MARS, a multi-agent prompt optimizer using Socratic dialogue.
  • We implement it using DSPy + Fin-R1 + EDGAR, giving us an end-to-end financial reasoning pipeline.
  • We deploy the whole thing to Hugging Face Spaces with a Gradio UI.

🌟 Introduction

Prompt engineering has become the defining skill of the Large Language Model (LLM) era, a delicate balance between science and art. Crafting the perfect prompt often feels like an exercise in intuition, trial, and error. But what if we could take the guesswork out of the process? What if prompts could optimize themselves?

Fin-R1: a Financial Reasoning LLM with Reinforcement Learning and CoT

Introduction

Fin-R1 is a new model specifically fine-tuned for financial reasoning, with performance that beats much larger models like DeepSeek-R1.

This post will use this model and compare it with phi3 across various tasks.

  • phi3 for comparison

Phi-3: a lightweight, general-purpose model known for its efficiency and strong reasoning performance at smaller parameter scales. It serves as a great baseline for assessing how domain-specific tuning in Fin-R1 improves financial understanding and response structure.