AlphaForge

AI agent · Quant Strategy

Two Claude models in a 24/7 plan-execute loop. Canary tests guard the engine.

By XJB (@xjb)

Built with: Claude
Stack: Python, Claude Code, Anthropic API, PostgreSQL
Dev time: 80 hours



AlphaForge is an autonomous quant strategy research loop where two Claude models collaborate to iterate — one plans, one executes — running 24/7 with canary tests guarding the engine.

Why two models

Splitting plan and execute across two models fixes the context-switching cost of single-model loops:

  • Opus = planner. Reads recent results, decides the next hypothesis, picks parameters to sweep. No code. Pure judgment.
  • Sonnet = executor. Takes the plan, modifies the research code, runs the backtest, commits if green / rolls back if red. No planning. Pure execution.

Each model stays inside its own context window. Both are "thinking deeply" about different layers of the same problem.
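
Here's a minimal sketch of what that split can look like, using the Anthropic Python SDK. The model aliases, system prompts, and function names are illustrative, not AlphaForge's actual source:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def plan(research_log: str) -> str:
    """Opus as planner: pure judgment, no code."""
    msg = client.messages.create(
        model="claude-opus-4-0",  # illustrative alias; pin whatever Opus build you run
        max_tokens=1024,
        system="You are a research planner. Read the latest results and propose "
               "the next hypothesis and parameter ranges. Never write code.",
        messages=[{"role": "user", "content": research_log}],
    )
    return msg.content[0].text

def execute(plan_text: str) -> str:
    """Sonnet as executor: pure execution, no planning."""
    msg = client.messages.create(
        model="claude-sonnet-4-0",  # illustrative alias
        max_tokens=4096,
        system="You are an executor. Implement the plan exactly, run the "
               "backtest, and report the results. Never revise the plan.",
        messages=[{"role": "user", "content": plan_text}],
    )
    return msg.content[0].text
```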

The /auto loop

A scheduler triggers every 20 minutes. Each tick:

  1. Opus reads the latest backtest output + research log
  2. Opus emits a plan (next hypothesis, parameter ranges, expected outcome)
  3. Sonnet picks up the plan, writes the code, runs the backtest
  4. Result is judged against the canary tests
  5. All canaries green → commit. Any red → rollback + log why.

Hard cap of 8 steps per batch. Auto-locks during execution.
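
A minimal sketch of one tick, with hypothetical stub helpers standing in for the real repo plumbing; the 20-minute cadence, 8-step cap, and lock behavior come straight from the description above:

```python
import time

TICK_SECONDS = 20 * 60   # 20-minute cadence
MAX_STEPS = 8            # hard cap per batch

def read_latest_results() -> str: ...     # backtest output + research log
def plan(log: str) -> str: ...            # Opus: next hypothesis + param ranges
def execute(plan_text: str) -> str: ...   # Sonnet: modify code, run backtest
def canaries_green() -> bool: ...         # do all 14 Layer 1 canaries pass?
def commit(result: str): ...              # keep the change
def rollback(result: str): ...            # revert + log why

running = False  # auto-lock: a new tick is skipped while a batch executes

def tick():
    global running
    if running:
        return
    running = True
    try:
        for _ in range(MAX_STEPS):
            log = read_latest_results()    # 1. read the latest output
            next_plan = plan(log)          # 2. Opus emits a plan
            result = execute(next_plan)    # 3. Sonnet writes code, runs backtest
            if canaries_green():           # 4. judge against the canaries
                commit(result)             # 5. all green → commit
            else:
                rollback(result)           #    any red → rollback + log

    finally:
        running = False

while True:
    tick()
    time.sleep(TICK_SECONDS)
```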

Canary tests

The biggest risk in autonomous research is silent regression — a refactor that passes type checks but breaks a core invariant nobody notices. AlphaForge has 14 canaries covering Layer 1 (factor library, feature engine, meta-labeling). Any Layer 1 change must pass all 14 before it ships.
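
The canaries themselves aren't shown here, so this is a hedged sketch of what one Layer 1 canary might look like (pytest style, with an invented factor and golden value):

```python
import numpy as np

def test_momentum_factor_golden_value():
    # Canary pattern: a pinned input must keep producing a pinned output.
    # A refactor that passes type checks but shifts this number goes red.
    prices = np.array([100.0, 101.0, 103.0, 102.0, 105.0])
    returns = np.diff(prices) / prices[:-1]
    momentum = returns.sum()  # stand-in for the real factor computation
    assert abs(momentum - 0.0495) < 1e-4  # golden value pinned in the repo
```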

The difference between "agent that writes code" and "agent that ships safely."

Stack

Python + Claude Code + Anthropic API (planner + executor calls). Walk-forward validation + bootstrap resampling. Postgres for run history.
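
For the evaluation side, a minimal walk-forward split generator; the window sizes are illustrative, since the real ones aren't given here:

```python
import numpy as np

def walk_forward_splits(n_bars: int, train: int, test: int):
    """Yield (train_idx, test_idx) windows that roll forward through time,
    so every test window is strictly out-of-sample."""
    start = 0
    while start + train + test <= n_bars:
        yield (np.arange(start, start + train),
               np.arange(start + train, start + train + test))
        start += test

# e.g. 1000 bars: fit on 250, test on the next 50, roll forward by 50
for train_idx, test_idx in walk_forward_splits(1000, train=250, test=50):
    pass  # fit the strategy on train_idx, score it on test_idx
```

Rolling the test window forward keeps every evaluation strictly out-of-sample, which is what makes the backtest results trustworthy enough for the loop to act on.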

What I learned

  • Plan-execute splits beat single-model loops on long-horizon research
  • Canaries are the only thing between "autonomous" and "autonomous and broken"
  • 20-minute cadence > continuous (model fatigue is real)
  • The framework outlasted any single hypothesis I tested with it

