79. Lock-Based Concurrency Control
Status: Accepted
Date: 2025-07-06
Context
The michi system operates by executing a sequence of commands: git pull, modify files, git commit, git push. This sequence is not atomic. If two processes (e.g., two AI agents, or an agent and a human) run a michi command at the same time, they can interfere with each other, leading to a race condition. For example, both could pull the same initial state and commit on top of it; once the first push lands, the second process's commit is based on an outdated state, so its push is rejected. We need a simple mechanism to ensure that only one process is modifying the task files at any given time.
Decision
We will implement a simple file-based locking mechanism to ensure mutual exclusion for all michi operations.
Before starting its sequence of Git operations, a michi script must first acquire a lock. The lock will be implemented as a simple lockfile (e.g., .michi.lock) in the root of the repository.
- Acquire Lock: A script will attempt to create the .michi.lock file. If the file already exists, another process holds the lock, and the script will wait or exit.
- Write PID and Timestamp: Upon acquiring the lock, the script will write its process ID (PID) and a timestamp into the lockfile. This information is used to identify stale locks.
- Time-To-Live (TTL): The lock is considered "stale" if it has existed for longer than a predefined TTL (e.g., 5 minutes). A new process is allowed to forcibly take over a stale lock, which prevents a crashed agent from holding the lock indefinitely.
- Release Lock: After the git push command completes successfully, the script will delete the .michi.lock file, releasing the lock for other processes.
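The four steps above (acquire, record PID and timestamp, take over after the TTL, release) can be sketched in Python as follows. This is a minimal illustration, not the michi implementation: it assumes os.open with O_CREAT | O_EXCL as the atomic create, and the JSON layout, the helper names (try_acquire, acquire, release) and the 30-second default wait are hypothetical choices.

```python
import json
import os
import time
from typing import Optional

# Assumptions for illustration: the ADR fixes the lockfile name and gives
# 5 minutes as an example TTL; the JSON layout and helper names are hypothetical.
LOCK_PATH = ".michi.lock"
TTL_SECONDS = 5 * 60


def _create(path: str) -> bool:
    """Atomically create the lockfile and record our PID and a timestamp."""
    try:
        # O_CREAT | O_EXCL turns "does it exist?" and "create it" into one
        # atomic step, so at most one concurrent caller succeeds.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        json.dump({"pid": os.getpid(), "timestamp": time.time()}, f)
    return True


def _lock_age(path: str) -> Optional[float]:
    """Seconds since the lock was taken, or None if the lockfile has vanished."""
    try:
        with open(path) as f:
            return time.time() - float(json.load(f)["timestamp"])
    except FileNotFoundError:
        return None
    except (ValueError, KeyError, TypeError):
        # Empty or corrupt lockfile (e.g. holder crashed mid-write):
        # fall back to the file's modification time.
        try:
            return time.time() - os.path.getmtime(path)
        except FileNotFoundError:
            return None


def try_acquire(path: str = LOCK_PATH, ttl: float = TTL_SECONDS) -> bool:
    """One attempt to take the lock, taking over a stale lock if necessary."""
    if _create(path):
        return True
    age = _lock_age(path)
    if age is not None and age <= ttl:
        return False  # held by a process whose lock is not yet stale
    # Stale or already removed. Delete-then-create is not atomic: two processes
    # that judge the same lock stale at the same moment can still race here.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
    return _create(path)


def acquire(path: str = LOCK_PATH, ttl: float = TTL_SECONDS,
            timeout: float = 30.0, poll: float = 0.5) -> bool:
    """Wait up to `timeout` seconds; False tells the caller to give up and exit."""
    deadline = time.monotonic() + timeout
    while not try_acquire(path, ttl):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def release(path: str = LOCK_PATH) -> None:
    """Delete the lockfile, but only if this process still owns it."""
    try:
        with open(path) as f:
            owner = json.load(f).get("pid")
    except (OSError, ValueError, AttributeError):
        return
    if owner == os.getpid():
        os.remove(path)
```

In use, a script would call acquire() before git pull and release() after git push. Calling release() from a finally block would also free the lock when an intermediate step fails, instead of leaving it for the TTL to clear; that goes slightly beyond what this ADR specifies.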
Consequences
Positive:
- Prevents Race Conditions: Prevents processes that run michi commands (and therefore take the lock) from interfering with each other, keeping the Git-based operations clean and minimizing conflicts.
- Simple to Implement: A file-based lock is extremely simple to implement using standard shell commands (mkdir or flock) and requires no external services.
- Stale Lock Cleanup: The TTL mechanism automatically clears stale locks left behind by crashed or hung processes.
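The flock option listed above works differently from a PID/timestamp lockfile: the kernel holds the lock and releases it automatically when the holding process exits, even after a crash, so crashed processes cannot leave a stale lock behind (a hung process, however, keeps holding it). The sketch below uses Python's Unix-only fcntl module; the helper names try_flock and release_flock are hypothetical. Dropping LOCK_NB makes the call block until the lock is free, which covers the "wait" option. The file is deliberately left in place on release, because unlinking it can let two processes end up locking different files at the same path.

```python
import fcntl
import os
from typing import Optional

# Hypothetical helper names; the lock path reuses the ADR's .michi.lock name.
LOCK_PATH = ".michi.lock"


def try_flock(path: str = LOCK_PATH) -> Optional[int]:
    """Return a file descriptor holding an exclusive lock, or None if it is taken."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB: fail immediately instead of blocking while another holder has it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_flock(fd: int) -> None:
    """Drop the lock and close the descriptor; the file itself is kept."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```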
Negative:
- Reduces Concurrency: The lock is global for the entire michi system. While one process is working, all others must wait. This effectively serializes all task operations.
- May Be Unreliable on Network Filesystems: Simple file-based locks may not be reliable on certain types of network filesystems (like NFS) if not implemented carefully.
Mitigation:
- Acceptable for Single-Agent Workflow: The primary workflow is a single developer and their AI agent. True concurrency is rare, and serializing operations is an acceptable trade-off for correctness and simplicity. The operations are also very fast, so the lock is not held for long.
- Targeted for Local Filesystems: The system is designed to run on a local developer machine where the filesystem is local and file locking is reliable.
- Robust Implementation: Create the lock with an atomic operation (e.g., mkdir, which on POSIX systems either creates the directory or fails if it already exists) so that there is no race between checking for the lock and creating it.
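A mkdir-based lock can be sketched in Python, where os.mkdir raises FileExistsError if the directory already exists. Because the lock is then a directory rather than a file, this sketch assumes (hypothetically) that the PID and timestamp go into an "owner" file inside it; the helper names are also hypothetical. For a plain lockfile, os.open with O_CREAT | O_EXCL gives the same all-or-nothing creation.

```python
import os
import shutil
import time

# Hypothetical: the ADR's lock name used as a directory, with an "owner" file
# inside it carrying the PID and timestamp.
LOCK_DIR = ".michi.lock"


def try_mkdir_lock(path: str = LOCK_DIR) -> bool:
    """Return True if this call created the lock directory, False if it existed."""
    # Racy check-then-create, for contrast:
    #     if not os.path.exists(path): open(path, "w").close()
    # Two processes can both pass the exists() check before either creates
    # the file, and both then believe they hold the lock.
    try:
        os.mkdir(path)  # check and create happen in one atomic system call
    except FileExistsError:
        return False
    with open(os.path.join(path, "owner"), "w") as f:
        f.write(f"{os.getpid()} {time.time()}\n")
    return True


def release_mkdir_lock(path: str = LOCK_DIR) -> None:
    """Remove the owner file and the lock directory."""
    shutil.rmtree(path)
```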