• Thoughts of Algorithms

    How a self-evolving AI learns to reflect, score, and rewrite its own reasoning

    🧪 Summary

    What if an AI could think: not just solve problems, but reevaluate its beliefs in the face of new information?

    In this post, we introduce a system that does exactly that. At the core of our pipeline is a lightweight scoring model called MR.Q, which evaluates ideas and chooses the best ones. When it encounters a new domain, a new goal, or a shift in task format, it doesn’t freeze; it adapts.

  • General Reasoner: The smarter Local Agent

    🔧 Summary

    The General Reasoner paper shows how we can train LLMs to reason across domains using diverse data and a generative verifier. In this post, I walk through our open-source implementation and show how we built a modular reasoning agent that generates multiple hypotheses, evaluates them with an LLM-based judge, and selects the best answer.


    🧠 What We Built

    We built a GeneralReasonerAgent that:

    • Dynamically generates multiple hypotheses using different reasoning strategies (e.g., cot, debate, verify_then_answer)
    • Evaluates each pair of hypotheses using either a local LLM judge or our custom MR.Q evaluator
    • Classifies the winning hypothesis using rubric dimensions
    • Logs structured results to a PostgreSQL-backed system
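The first two steps above (generate one hypothesis per strategy, then compare every pair and keep the winner) can be sketched in a few lines of Python. This is a minimal illustration, not the actual GeneralReasonerAgent code: the names `Hypothesis`, `generate_hypotheses`, `select_best`, `toy_generate`, and `toy_judge` are hypothetical, and the toy generator and judge stand in for the real LLM calls and the MR.Q evaluator. Rubric classification and PostgreSQL logging are left out.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hypothesis:
    strategy: str  # e.g. "cot", "debate", "verify_then_answer"
    text: str      # the generated answer


# A judge receives the goal and two hypotheses and returns the one it prefers.
# In a real system this would wrap an LLM judge or a learned scorer.
Judge = Callable[[str, Hypothesis, Hypothesis], Hypothesis]


def generate_hypotheses(
    goal: str,
    strategies: List[str],
    generate: Callable[[str, str], str],
) -> List[Hypothesis]:
    """Produce one hypothesis per reasoning strategy."""
    return [Hypothesis(s, generate(goal, s)) for s in strategies]


def select_best(goal: str, hypotheses: List[Hypothesis], judge: Judge) -> Hypothesis:
    """Compare every pair of hypotheses and return the one with the most wins."""
    wins: Dict[str, int] = {h.strategy: 0 for h in hypotheses}
    for a, b in combinations(hypotheses, 2):
        winner = judge(goal, a, b)
        wins[winner.strategy] += 1
    # On a tie, max() keeps the earliest hypothesis in the list.
    return max(hypotheses, key=lambda h: wins[h.strategy])


# --- Stand-ins for illustration only (assumptions, not the real components) ---

def toy_generate(goal: str, strategy: str) -> str:
    """Placeholder for an LLM call: just labels the goal with the strategy."""
    return f"[{strategy}] answer to: {goal}"


def toy_judge(goal: str, a: Hypothesis, b: Hypothesis) -> Hypothesis:
    """Placeholder judge: prefers the longer answer."""
    return a if len(a.text) >= len(b.text) else b


strategies = ["cot", "debate", "verify_then_answer"]
candidates = generate_hypotheses("Why is the sky blue?", strategies, toy_generate)
best = select_best("Why is the sky blue?", candidates, toy_judge)
```

Because the judge is just a callable, swapping a local LLM judge for a different evaluator only means passing a different function to `select_best`. Comparing all pairs costs a number of judge calls that grows quadratically with the number of strategies, which is acceptable for a handful of hypotheses.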

    All of this was integrated with our existing stephanie framework, which includes: