Shipped, not promised

Changelog

What shipped, and when. Most entries link to the live tool or page; a few are infrastructure (eval coverage, publish prep) that don't have their own URL but still count.

  1. May 2026

    Workflow Replay — flame-graph timeline at /workflow/replay

    feature
    • Visualise any workflow JSON trace as a Gantt-style timeline — parallel batches share an x-axis range so fan-out is literally visible.
    • Three metric cards show wall-clock total vs sum of per-step latencies vs the effective speedup multiplier.
    • Click any bar to drill into that step's output, assertion results, and error message.
  2. May 2026

    /api/workflow/run — workflows as REST APIs

    feature
    • POST a workflow JSON, get a typed trace back. Same step semantics as the browser runner and workflow-cli.
    • Rate-limited, SSRF-guarded, 20-step cap, 600-token output cap per step.
    • A new "Run via API" button on /workflow invokes the endpoint inline and surfaces the curl equivalent for external callers.
  3. May 2026

    Workflow cost estimator with model picker

    feature
    • Per-step token-count + USD estimate, switchable between Gemini Flash Lite / Flash / Pro, Claude Haiku 4.5 / Sonnet 4.6, GPT-5 Mini / GPT-5.
    • Toolbar pill shows the typical-to-max cost range for the whole workflow before you click Run.
    • Flags runIf-conditional steps as may-not-fire and excludes them from the headline total.
  4. May 2026

    commit-changelog-cli — git log → polished changelog

    release
    • Two modes: a free heuristic that groups Conventional Commits, and `--ai` that hands the commits to an LLM to rewrite each entry as a user-facing one-liner.
    • Reads `git log` directly or accepts a piped log from stdin. Writes to stdout or a file.
    • Slot into release CI: --since=<previous tag> --until=<new tag> --ai --out=NOTES.md.
  5. May 2026

    workflow-cli — same workflows, on the command line

    release
    • New OSS package that runs the exact JSON shape the browser builder produces — design in /workflow, ship the JSON to CI.
    • Supports every step type and modifier (llm/fetch/assert · parallel · runIf · contains/regex/equals/llm-judge).
    • Non-zero exit on assertion failure so it drops into any GitHub Action / Jenkins / GitLab CI.
  6. May 2026

    Workflow Builder: self-improving + trace export + AI generator

    feature
    • Describe a workflow in plain English; an LLM emits the JSON, validated against the schema, and loads into the builder.
    • After a run, "Suggest improvements" sends the workflow + actual run trace back to an LLM that proposes a revised JSON; a diff summary previews the changes before applying.
    • Share a customised workflow as a base64url ?w= link; export a run as a markdown post-mortem (prompts + outputs + latencies + assertion results); auto-rendered Mermaid flow diagram updates with every edit.
  7. May 2026

    Workflow Builder v0.4 — parallel step execution

    feature
    • Any step can declare `parallel: true` — consecutive parallel-flagged steps run as a single Promise.allSettled batch.
    • New preset: three reviewers (correctness · security · style) fan out concurrently, then a synthesiser sees all three outputs.
    • UI shows wall-clock vs would-be-sequential time so the speedup is visible.
  8. May 2026

    Prompt A/B Tester — new AI Lab

    feature
    • Test two system prompts side by side against the same input. Both stream in parallel; differing tokens are highlighted.
    • Optional LLM-as-judge declares a winner with a one-sentence reason.
  9. May 2026

    /api-to-mcp now ships a downloadable .zip

    feature
    • Goes from "paste an OpenAPI URL" to "unzip && npm install && run" in two clicks.
    • Bundles src/index.ts + package.json (with SDK pinned) + tsconfig + README with Claude Desktop snippet + .gitignore.
    • Backed by lib/build-zip.ts — a dependency-free stored-mode ZIP encoder with a real CRC-32 table.
  10. May 2026

    PR Reviewer eval coverage — 4/4 dimensions

    infra
    • Added security, style, and tests suites alongside the existing correctness suite.
    • Each suite tests both detection (does it catch real bugs?) and lane discipline (does it refuse to comment on out-of-scope issues?).
    • Every reviewer dimension in components/pr-reviewer.tsx now has a regression net.
  11. May 2026

    Workflow Builder v0.3 — runIf conditional steps

    feature
    • Any step can declare a `runIf` condition referencing a prior output — false condition → skipped.
    • Two mutually-exclusive conditions in sequence form an if/else branch without needing a DAG.
    • New preset: classify an email then route to apology / answer / thanks based on the label.
  12. May 2026

    /toolkit landing page + live embedded demo

    feature
    • A single narrative page tying every AI tool into one end-to-end story (design → evaluate → review → integrate → embed).
    • Embedded a real 2-step workflow that runs in-browser so visitors see the chaining principle working, not just described.
  13. May 2026

    Four OSS packages — npm-publish ready

    infra
    • ai-eval, create-mcp-server, @amitshrivastava/portfolio-mcp, @siteask/widget now have repository, bugs, homepage, prepack, and publishConfig fields.
    • @siteask/widget tsup config moved to tsup.config.ts so outExtension is a function (newer tsup API).
    • Root PUBLISH.md documents the end-to-end runbook.
  14. May 2026

    Workflow Builder v0.2 — fetch + assert step types

    feature
    • Added `fetch` step (server proxy with SSRF guards, 8s timeout, 50KB cap, per-IP rate limit).
    • Added `assert` step with contains / not-contains / regex / equals / llm-judge assertions.
    • Added "Export as ai-eval YAML" so a designed workflow can be regression-tested in CI.
  15. May 2026

    Self-dogfooded eval suites

    infra
    • Started running ai-eval against this portfolio's own system prompts — Ask Amit grounding, PR Reviewer correctness, Commit Message Generator format.
    • Same harness shipped as OSS now catches its own author's regressions.
  16. April 2026

    Workflow Builder v0.1 — initial release

    release
    • Three curated presets (Research → Summarise → Tweet · Code → Review → Refactor · Email → Classify → Reply).
    • Per-step streaming, in-place editing of prompts and inputs, {{stepId.output}} substitution.
  17. April 2026

    ai-eval — open-source LLM eval harness

    release
    • CLI + GitHub Action + web viewer. YAML test cases, five assertion types, OpenRouter-backed.
    • Non-zero exit on failure for CI gating.
  18. April 2026

    Multi-Agent PR Reviewer

    release
    • Four specialist reviewers (correctness · security · style · tests) run in parallel on any public PR.
    • A lead reviewer synthesises a structured verdict (LGTM / LGTM_WITH_NITS / NEEDS_CHANGES / BLOCK).
  19. April 2026

    AI Commit Message Generator

    release
    • Paste a git diff, get a Conventional Commits message with the right type, scope, subject, and a body explaining the WHY.
  20. March 2026

    create-mcp-server — scaffolder CLI

    release
    • `npm create mcp-server@latest` walks you through naming and template selection, then writes a complete TypeScript MCP server.
    • Two templates: blank (one example tool) and markdown-rag (exposes ./content/ as MCP resources with a keyword search tool).
  21. March 2026

    portfolio-mcp — real working MCP server

    release
    • Exposes this portfolio's blog posts to Claude Desktop, Cursor, and Cline via stdio MCP.
    • Companion to the MCP Inspector demo at /ai-labs/mcp-inspector.
  22. March 2026

    SiteAsk — drop-in AI chat widget

    release
    • Browser-side RAG with all-MiniLM-L6-v2 embeddings. BYOK LLM. Source-citable answers. One <script> tag.
    • No vector DB, no per-chat SaaS fees.
  23. March 2026

    AI Labs — 15 browser-runnable demos

    release
    • Multi-Agent Orchestrator · Tool-Use Simulator · Agent Memory · Reasoning Visualizer · RAG Playground · Hallucination Detector · …
    • All demos run client-side (or through the rate-limited /api/demo/generate proxy).

The whole story is on /toolkit

This changelog is one timeline; the toolkit page arranges the same pieces by role — design, evaluate, review, integrate, embed.

Open the toolkit