40. Direct Code Modification by Agents

Status: Accepted
Date: 2025-07-06

Context

The Mercury experimental framework is designed to be augmented by AI agents that can propose and run new experiments. We need a simple and transparent way for these agents to modify experiment configurations. Building a complex API for the agent to call would be a significant engineering effort and would obscure the changes being made.

Decision

The primary way for an AI agent to interact with the experimental framework will be direct modification of the version-controlled TypeScript configuration files. The agent will be able to read the configuration files, edit them to change experimental parameters, and commit the changes to a Git branch. This approach treats the agent as a developer and reuses the existing, well-understood workflow of code changes and commits.
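As a rough illustration of what such an edit might look like, the sketch below shows a configuration object before and after an agent's change, plus a small helper that lists the parameters that differ. The `ExperimentConfig` shape, the parameter names, and `changedParameters` are all hypothetical and are not taken from the Mercury codebase.

```typescript
// Hypothetical config shape; the real Mercury types are not shown in this ADR.
interface ExperimentConfig {
  name: string;
  enabled: boolean;
  parameters: Record<string, number | string | boolean>;
}

// The file as it might exist on the main experimental branch (illustrative values).
const before: ExperimentConfig = {
  name: "example-experiment",
  enabled: true,
  parameters: { sampleRate: 0.1, variant: "control" },
};

// The same file after the agent's edit on its own Git branch.
const after: ExperimentConfig = {
  name: "example-experiment",
  enabled: true,
  parameters: { sampleRate: 0.25, variant: "control", warmupSeconds: 30 },
};

// List parameter keys that were added, removed, or changed. This is roughly
// the information a reviewer would read off the Git diff of the file.
function changedParameters(a: ExperimentConfig, b: ExperimentConfig): string[] {
  const keys = Object.keys(a.parameters).concat(Object.keys(b.parameters));
  return keys
    .filter((k, i) => keys.indexOf(k) === i) // de-duplicate keys
    .filter((k) => a.parameters[k] !== b.parameters[k])
    .sort();
}

console.log(changedParameters(before, after));
```

Because the configuration is ordinary typed code, an edit that violates the declared shape (for example, a misspelled field name) would be caught by the type checker rather than at runtime.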

Consequences

Positive:

  • Simplicity: This is far simpler than designing, building, and maintaining a complex API for experiment orchestration.
  • Transparency & Auditability: All changes made by the agent are captured directly in Git history. This makes it easy to see exactly what was changed, when, and by which agent. Pull requests provide a natural venue for human review.
  • Flexibility: The agent has the full power of the TypeScript configuration language to define complex experiments, without being limited by a restrictive API.

Negative:

  • Security Risk: Allowing an agent to modify source code directly is a security risk unless the agent is properly sandboxed and monitored.
  • Risk of Invalid Code: The agent could generate syntactically incorrect or logically flawed configuration code that breaks the application on restart.

Mitigation:

  • Sandboxing & Permissions: The agent will operate in a sandboxed environment with limited permissions that allow it to modify only specific configuration files.
  • Manual Restart Control: As per adr://manual-restart-control, no changes made by an agent can affect the running system until a human operator reviews the committed code and manually restarts the instance.
  • CI/CD Validation: All code committed by an agent will go through the same CI/CD pipeline as human-written code, including linting, type-checking (tsc), and tests. If any of these checks fail, the application cannot be restarted.
  • Code Review: Changes made by an agent should be submitted as a pull request, allowing for human review and approval before being merged into the main experimental branch.
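The mitigations above could be combined into a single pre-restart check. The following is a minimal sketch, assuming a hypothetical allowlist directory (`config/experiments/`) and hypothetical names (`isAgentWritable`, `rejectedFiles`, `eligibleForRestart`); it is not the actual Mercury enforcement mechanism, which this ADR does not specify.

```typescript
// Hypothetical allowlist of directories the agent may modify (an assumption).
const ALLOWED_PREFIXES = ["config/experiments/"];

// Resolve "." and ".." segments without touching the filesystem.
// Returns null if the path would escape the repository root.
function normalize(path: string): string | null {
  const out: string[] = [];
  for (const seg of path.replace(/\\/g, "/").split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out.join("/");
}

// True only for relative paths to .ts files inside an allowed directory.
function isAgentWritable(path: string, allowed: string[] = ALLOWED_PREFIXES): boolean {
  if (path.charAt(0) === "/") return false; // reject absolute paths
  const clean = normalize(path);
  if (clean === null) return false;
  const isTypeScript = clean.slice(-3) === ".ts";
  return isTypeScript && allowed.some((p) => clean.indexOf(p) === 0);
}

// Files in a change set that fall outside the agent's permissions.
function rejectedFiles(changed: string[]): string[] {
  return changed.filter((f) => !isAgentWritable(f));
}

interface ChangeSet {
  changedFiles: string[];
  ciPassed: boolean; // lint, tsc, and tests all succeeded
  humanApproved: boolean; // pull request approved by a reviewer
}

// A change is eligible for a manual restart only if every mitigation holds.
// The restart itself remains a human action.
function eligibleForRestart(c: ChangeSet): boolean {
  return rejectedFiles(c.changedFiles).length === 0 && c.ciPassed && c.humanApproved;
}
```

Normalizing the path before the prefix check matters: a path such as `config/experiments/../../src/main.ts` starts with an allowed prefix as a raw string but resolves to a file outside it.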