written on October 17, 2025
In August I wrote about my experiments with replacing MCP (Model Context Protocol) with code. Since then I have used that idea to explore non-coding agents at Earendil. And I’m not alone: in the meantime, multiple people have explored this space, so I felt it was worth sharing some updated findings. The general idea is pretty simple: agents are very good at writing code, so why don’t we let them write throw-away code to solve problems that are not related to code at all?
I want to show you how and what I’m doing to give you some ideas of what works and why this is much simpler than you might think.
The first thing you have to realize is that Pyodide is secretly becoming a pretty big deal for a lot of agentic interactions. What is Pyodide? Pyodide is an open source project that makes a standard Python interpreter available via a WebAssembly runtime. What is neat about it is that it has an installer called micropip that allows it to install dependencies from PyPI. It also targets the emscripten runtime environment, which means there is a pretty good standard Unix setup around the interpreter that you can interact with.
Getting Pyodide to run is shockingly simple if you have a Node environment. You can directly install it from npm. What makes this so cool is that you can also interact with the virtual file system, which allows you to create a persistent runtime environment that interacts with the outside world. You can also get hosted Pyodide at this point from a whole bunch of startups, but you can actually get this running on your own machine and infrastructure very easily if you want to.
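To give a sense of how little is involved, here is a minimal sketch of the Node side (the package installed through micropip is an arbitrary example, and this assumes an ES module with top-level await):

// npm install pyodide
import { loadPyodide } from "pyodide";

const pyodide = await loadPyodide();

// micropip can pull pure-Python wheels straight from PyPI
await pyodide.loadPackage("micropip");
const micropip = pyodide.pyimport("micropip");
await micropip.install("pypdf"); // arbitrary example package

// run throw-away code the way an agent would
const result = await pyodide.runPythonAsync(`
import sys
f"hello from Python {sys.version.split()[0]} inside WebAssembly"
`);
console.log(result);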
The way I found this to work best is to banish Pyodide into a web worker. That allows you to interrupt it if it runs into time limits.
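In a Node environment the same pattern works with worker_threads. A minimal sketch (the worker file name and the time limit are placeholders; the worker itself would just load Pyodide and post back the result of running whatever code it receives):

// main.js: run the interpreter in a worker so it can be killed on a time limit
import { Worker } from "node:worker_threads";

function runWithTimeout(code, timeoutMs = 30_000) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL("./pyodide-worker.js", import.meta.url));
    const timer = setTimeout(() => {
      worker.terminate(); // hard stop if it takes too long
      reject(new Error("execution timed out"));
    }, timeoutMs);
    worker.once("message", (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });
    worker.once("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    worker.postMessage({ code });
  });
}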
A big reason why Pyodide is such a powerful runtime is that Python has an amazing ecosystem of well-established libraries that the models know about. From manipulating PDFs or Word documents to creating images, it’s all there.
Another vital ingredient to a code interpreter is having a file system.
Not just any file system though. I like to set up a virtual file system that I intercept so that I can provide it with access to remote resources from specific file system locations. For instance, you can have a folder on the file system that exposes files which are just resources that come from your own backend API. If the agent then chooses to read from those files, you can from outside the sandbox make a safe HTTP request to bring that resource into play. The sandbox itself does not have network access, so it’s only the file system that gates access to resources.
The reason the file system is so good is that agents just know so much about how file systems work, and you can provide safe access to resources through some external system outside of the sandbox. You can provide read-only access to some resources and write access to others, then access the created artifacts from the outside again.
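The interception part needs the sync trick described next, but moving plain bytes in and out through Pyodide’s Emscripten file system from the host side is already enough for a lot of cases. A rough sketch (the /data and /output folder names and bytesFromBackend are placeholders):

// before running agent code: place inputs into the in-memory FS
pyodide.FS.mkdir("/data");
pyodide.FS.writeFile("/data/report.xlsx", new Uint8Array(bytesFromBackend));
pyodide.FS.mkdir("/output");

// ... agent-generated Python runs, writing artifacts into /output ...

// afterwards: pull the artifacts back out
for (const name of pyodide.FS.readdir("/output")) {
  if (name === "." || name === "..") continue;
  const bytes = pyodide.FS.readFile(`/output/${name}`); // Uint8Array
  // hand the bytes to your backend, object storage, the user, ...
}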
Now actually doing that is a tad tricky because the emscripten file system is sync, and most of the interesting things you can do are async. The option that I ended up going with is to move the fetch-like async logic into another web worker and use Atomics.wait to block. If your entire Pyodide runtime is in a web worker, that’s not as bad as it looks.
That said, I wish the emscripten file system API were changed to support stack switching instead of this. While it’s now possible to hide async promises behind sync abstractions within Pyodide with call_sync, the same approach does not work for the emscripten JavaScript FS API.
I have a full example of this at the end, but the simplified pseudocode that I ended up with looks like this:
// main thread: wrap a worker so fetch() looks synchronous
// (receiveMessageOnPort, MessageChannel and parentPort come from "node:worker_threads")
fetch(url) {
  const signalBuffer = new SharedArrayBuffer(4);
  const signal = new Int32Array(signalBuffer);
  const { port1, port2 } = new MessageChannel();
  this.worker.postMessage({ url, signalBuffer, port: port2 }, [port2]);
  Atomics.wait(signal, 0, 0);                   // park until worker flips the signal
  const message = receiveMessageOnPort(port1);  // MessageChannel gives the payload
  port1.close();
  if (message.message.status !== "ok") {
    throw new Error(message.message.error.message);
  }
  return message.message.data;
}

// worker thread: perform async fetch, then wake the main thread
parentPort.on("message", async ({ url, signalBuffer, port }) => {
  const signal = new Int32Array(signalBuffer);
  try {
    const bytes = await fetch(url).then((r) => {
      if (!r.ok) throw new Error(`HTTP ${r.status}`);
      return r.arrayBuffer();
    });
    port.postMessage({ status: "ok", data: new Uint8Array(bytes) });
    Atomics.store(signal, 0, 1);   // mark success
  } catch (error) {
    port.postMessage({ status: "error", error: serialize(error) });
    Atomics.store(signal, 0, -1);  // mark failure
  } finally {
    Atomics.notify(signal, 0);     // unblock the waiting main thread
    port.close();
  }
});
Lastly, now that you have agents running, you really need durable execution. I would describe durable execution as the idea of being able to retry a complex workflow safely without losing progress. The reason for this is that agents can take a very long time, and if they are interrupted, you want to bring them back to the state they were in. This has become a pretty hot topic. There are a lot of startups in that space and you can buy yourself a tool off the shelf if you want to.
What is a little bit disappointing is that there is no truly simple durable execution system. By that I mean something that just runs on top of Postgres and/or Redis, in the way that pgmq, for instance, does for queueing.
The easiest way to shoehorn this yourself is to use queues to restart your tasks and to cache away the temporary steps from your execution. Basically, you compose your task from multiple steps and each of the steps just has a very simple cache key. It’s really just that simple:
function myAgenticLoop(taskID, initialState) {
  let stepCount = 0;
  let state = initialState;
  while (stepCount < MAX_STEPS) {
    let cacheKey = `${taskID}:${stepCount}`;
    let cachedState = loadStateFromCache(cacheKey);
    if (cachedState !== null) {
      // replay: this step already ran in a previous attempt
      state = cachedState;
    } else {
      state = runAgenticStep(state);
      storeStateInCache(cacheKey, state);
    }
    stepCount++;
    if (reachedEndCondition(state)) {
      break;
    }
  }
  return state;
}
You can improve on this greatly, but this is the general idea. The state is basically the conversation log and whatever else you need to keep around for the tool execution (e.g., whatever was thrown on the file system).
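If it helps, here is a minimal file-backed sketch of those two cache helpers, along the lines of the agent-cache directory the demo at the end writes to (the file naming and the synchronous fs calls are just choices for this sketch; it assumes the state is JSON-serializable):

import fs from "node:fs";
import path from "node:path";

const CACHE_DIR = "agent-cache";

function cachePath(cacheKey) {
  // cache keys contain ":" which is awkward in file names, so flatten it
  return path.join(CACHE_DIR, `${cacheKey.replace(/:/g, "-")}.json`);
}

function loadStateFromCache(cacheKey) {
  const file = cachePath(cacheKey);
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

function storeStateInCache(cacheKey, state) {
  fs.mkdirSync(CACHE_DIR, { recursive: true });
  fs.writeFileSync(cachePath(cacheKey), JSON.stringify(state));
}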
What tools does an agent need that are not code? Well, the code needs to be able to do something interesting, so you need to give it access to something. The most interesting access you can provide is via the file system, as mentioned. But there are also other tools you might want to expose. What Cloudflare proposed is connecting to MCP servers and exposing their tools to the code interpreter. I think this is quite an interesting approach and to some degree it’s probably where you want to go.
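As for the mechanics of exposing such tools to the interpreter, Pyodide’s registerJsModule makes a JavaScript object importable from Python, whether the functions behind it are MCP-backed or hand-written. A rough sketch (the module name, search_tickets and callMyBackend are placeholders I made up):

// host side: expose a tools module into the sandboxed interpreter
pyodide.registerJsModule("host_tools", {
  // each entry is just a JS function the host controls
  async search_tickets(query) {
    return await callMyBackend("/tickets/search", { query }); // placeholder
  },
});

// the agent's Python code can then do:
//   import host_tools
//   results = await host_tools.search_tickets("billing errors")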
Some tools that I find interesting:
Describe: a tool that just lets the agent run more inference, mostly with files that the code interpreter generated. For instance, if you have a zip file it’s quite fun to see the code interpreter use Python to unpack it. But if the unpacked file is then a jpg, you will need to go back to inference to understand it.

Help: a tool that just … brings up help. Again, this can be backed by inference for basic RAG, or similar. I found it quite interesting to let the AI ask it for help. For example, you want the manual tool to allow a query like “Which Python code should I write to create a chart for the given XLSX file?” On the other hand, you can also just stash away some instructions in .md files on the virtual file system and have the code interpreter read them. It’s all an option. One possible shape for these tool definitions is sketched below.
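To make that concrete, here is one shape such a tool surface could take with Anthropic-style tool definitions (the descriptions and the describe/help input schemas are invented for this sketch; only execute_python mirrors the demo below):

const tools = [
  {
    name: "execute_python",
    description: "Run Python code in the sandboxed Pyodide interpreter.",
    input_schema: {
      type: "object",
      properties: { code: { type: "string" } },
      required: ["code"],
    },
  },
  {
    name: "describe",
    description: "Run inference over a file the interpreter produced (e.g. an image).",
    input_schema: {
      type: "object",
      properties: { path: { type: "string" }, question: { type: "string" } },
      required: ["path"],
    },
  },
  {
    name: "help",
    description: "Ask for guidance, e.g. which Python libraries fit a task.",
    input_schema: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
];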
If you want to see what this roughly looks like, I vibe-coded a simple version of this together. It uses a made-up example but it does show how a sandbox with very little tool availability can create surprising results: mitsuhiko/mini-agent.
When you run it, it looks up the current IP from a special network drive that triggers an async fetch, and then it (usually) uses pillow or matplotlib to make an image of that IP address. Pretty pointless, but a lot of fun!
Prompt: Figure out the current ip address and make me a picture of it.

Step 1:
Stop reason: tool_use
Tool call: execute_python
code:

# Read the current IP address
with open('/network/current-ip', 'r') as f:
    ip_address = f.read().strip()

print(f"Current IP Address: {ip_address}")

Tool result: { stdout: 'Current IP Address: 255.255.255.255', stderr: '', success: true }
[Cache] Saved state to: agent-cache/task-0-step-1.json

Step 2:
Stop reason: tool_use
Tool call: execute_python
code:

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.patches import FancyBboxPatch
import numpy as np

# Create figure and axis
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)
ax.axis('off')

# Add background gradient
gradient = np.linspace(0, 1, 256).reshape(1, -1)
gradient = np.vstack([gradient] * 256)
ax.imshow(gradient, extent=[0, 10, 0, 10], aspect='auto', cmap='cool', alpha=0.3)

# Title
ax.text(5, 9, 'Current IP Address', fontsize=28, weight='bold',
        ha='center', va='top', color='#1a1a1a')

# IP Address Box
fancy_box = FancyBboxPatch((0.5, 4.5), 9, 2.5,
                           boxstyle="round,pad=0.2",
                           edgecolor='#2196F3', facecolor='#E3F2FD',
                           linewidth=3)
ax.add_patch(fancy_box)

# IP Address Text
ax.text(5, 5.75, ip_address, fontsize=48, weight='bold',
        ha='center', va='center', color='#1976D2', family='monospace')

# Decorative elements - network nodes
circle_positions = [(2, 2), (8, 2), (2, 8), (8, 8), (5, 1), (5, 9)]
for pos in circle_positions:
    circle = patches.Circle(pos, 0.3, color='#FF5722', alpha=0.7)
    ax.add_patch(circle)

# Connection lines
for i in range(len(circle_positions)):
    for j in range(i+1, len(circle_positions)):
        x = [circle_positions[i][0], circle_positions[j][0]]
        y = [circle_positions[i][1], circle_positions[j][1]]
        ax.plot(x, y, color='#2196F3', alpha=0.2, linewidth=1)

# Info text
ax.text(5, 3.8, '🌐 Your Network Location 🌐', fontsize=14,
        ha='center', va='top', color='#555', style='italic')

plt.tight_layout()
plt.savefig('/output/ip_address.png', dpi=150, bbox_inches='tight', facecolor='white')
print("✅ Picture saved to /output/ip_address.png")
plt.close()

print(f"\n📍 Your IP Address: {ip_address}")

Tool result:
  stdout: 'Loading Pillow, contourpy, cycler, fonttools, kiwisolver, matplotlib, numpy, packaging, pyparsing, python-dateutil, pytz, sixLoaded Pillow, contourpy, cycler, fonttools, kiwisolver, matplotlib, numpy, packaging, pyparsing, python-dateutil, pytz, six'
  stderr:
Traceback (most recent call last):
  File "/lib/python313.zip/_pyodide/_base.py", line 597, in eval_code_async
    await CodeRunner(
    ...<9 lines>...
    .run_async(globals, locals)
  File "/lib/python313.zip/_pyodide/_base.py", line 411, in run_async
    coroutine = eval(self.code, globals, locals)
  File "<exec>", line 8, in <module>
  File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 1613, in subplots
    fig = figure(**fig_kw)
  File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 934, in figure
    manager = new_figure_manager(
        num, figsize=figsize, dpi=dpi,
        facecolor=facecolor, edgecolor=edgecolor, frameon=frameon,
        FigureClass=FigureClass, **kwargs)
  File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 464, in new_figure_manager
    _warn_if_gui_out_of_main_thread()
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 441, in _warn_if_gui_out_of_main_thread
    canvas_class = cast(type[FigureCanvasBase], _get_backend_mod().FigureCanvas)
    ~~~~~~~~~~~~~~~~^^
  File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 280, in _get_backend_mod
    switch_backend(rcParams._get("backend"))  # type: ignore[attr-defined]
    ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.13/site-packages/matplotlib/pyplot.py", line 342, in switch_backend
    module = importlib.import_module(cbook._backend_module_name(newbackend))
  File "/lib/python313.zip/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1026, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/lib/python3.13/site-packages/matplotlib/backends/backend_webagg.py", line 23, in <module>
    from js import document
ImportError: cannot import name 'document' from 'js' (unknown location)
  success: false
[Cache] Saved state to: agent-cache/task-0-step-2.json

Step 3:
Stop reason: tool_use
Tool call: execute_python
code:

from PIL import Image, ImageDraw, ImageFont
import os

# Read the current IP address
with open('/network/current-ip', 'r') as f:
    ip_address = f.read().strip()

# Create a new image with a nice gradient-like background
width, height = 1200, 800
image = Image.new('RGB', (width, height), color='white')
draw = ImageDraw.Draw(image)

# Draw background with gradient effect using rectangles
for i in range(height):
    ratio = i / height
    r = int(30 + (100 * ratio))
    g = int(100 + (150 * ratio))
    b = int(200 + (55 * ratio))
    draw.rectangle([(0, i), (width, i+1)], fill=(r, g, b))

# Draw main box
box_margin = 100
box_top = 250
box_height = 300
draw.rectangle(
    [(box_margin, box_top), (width - box_margin, box_top + box_height)],
    fill=(240, 248, 255),
    outline=(33, 150, 243),
    width=5
)

# Draw rounded corners effect
for i in range(20):
    draw.arc([(box_margin, box_top), (box_margin + 40, box_top + 40)],
             0, 90, fill=(33, 150, 243), width=5)

# Add decorative network nodes
node_radius = 15
nodes = [(150, 150), (1050, 150), (150, 650), (1050, 650), (600, 100), (600, 700)]
for node in nodes:
    draw.ellipse(
        [(node[0] - node_radius, node[1] - node_radius),
         (node[0] + node_radius, node[1] + node_radius)],
        fill=(255, 87, 34),
        outline=(200, 50, 0),
        width=2
    )

# Draw connection lines
for i, node1 in enumerate(nodes):
    for node2 in nodes[i+1:]:
        draw.line([node1, node2], fill=(33, 150, 243, 50), width=1)

# Try to use a large default font
try:
    title_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 60)
    ip_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuMono-Bold.ttf", 90)
    subtitle_font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 30)
except:
    title_font = ImageFont.load_default()
    ip_font = ImageFont.load_default()
    subtitle_font = ImageFont.load_default()

# Draw title
draw.text((width // 2, 80), "Current IP Address", font=title_font,
          fill=(255, 255, 255), anchor="mm")

# Draw IP address
draw.text((width // 2, box_top + box_height // 2), ip_address, font=ip_font,
          fill=(25, 118, 210), anchor="mm")

# Draw subtitle
draw.text((width // 2, box_top + box_height + 50), "🌐 Your Network Location 🌐",
          font=subtitle_font, fill=(100, 100, 100), anchor="mm")

# Save the image
output_path = '/output/ip_address.png'
image.save(output_path)
print(f"✅ Picture saved to {output_path}")
print(f"\n📍 Your IP Address: {ip_address}")

Tool result: { stdout: 'Pillow already loaded from default channelNo new packages to load✅ Picture saved to /output/ip_address.png📍 Your IP Address: 255.255.255.255', stderr: '', success: true }
[Cache] Saved state to: agent-cache/task-0-step-3.json

Step 4:
Stop reason: end_turn
Final result:
Perfect! 🎉 I've successfully:
1. **Found your IP address**: **255.255.255.255**
2. **Created a picture** showing your IP address with:
   - A nice gradient blue background
   - Your IP address prominently displayed in the center
   - Decorative network nodes and connection lines
   - Professional styling and formatting
The image has been saved to `/output/ip_address.png` and is ready for you to download!
[Cache] Saved state to: agent-cache/task-0-step-4.json

Total steps: 4
Making 1 file(s) available in ./output:
  ✓ ip_address.png
The same approach has also been leveraged by Anthropic and Cloudflare. There is some further reading that might give you more ideas: