54. BullMQ for Core Bot Job Processing
Status: Accepted
Date: 2025-07-06
Context
The mercury-bot orchestrates many tasks: fetching data, running analysis, executing trades, and sending notifications. Many of these tasks are asynchronous and must run reliably in the background. A simple in-memory queue would lose pending jobs if the application crashed, and direct, synchronous calls between services would create a tightly coupled system in which every caller blocks on its callee. We need a robust solution for managing background jobs and asynchronous inter-service communication.
Decision
We will use BullMQ as the exclusive queueing and job processing library for all asynchronous tasks within the mercury-bot's scope. This includes, but is not limited to:
- Triggering a new analysis from the Thoth module.
- Requesting market data from the Minerva module.
- Executing a trade via the Hermes module.
- Sending a notification via the Telegram module.
The mercury-bot application will act as both a producer (adding jobs to queues) and a consumer (processing jobs from queues), leveraging the pattern defined in adr://consumer-pattern. This is distinct from the dedicated admin-jobs queue (adr://bullmq-admin-queue); the bot will manage queues related to the core trading workflow.
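The producer side of this decision could be sketched roughly as follows. To keep the example runnable without Redis, it is written against a small QueueLike interface that mirrors the shape of BullMQ's `Queue.add(name, data, opts)`. The queue names, the `BotProducer` class, and the `InMemoryQueue` stand-in are hypothetical and are not defined by this ADR.

```typescript
// Minimal producer-facing shape; BullMQ's Queue exposes add(name, data, opts).
interface QueueLike {
  add(name: string, data: unknown, opts?: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical queue names, one per module the bot integrates with.
const QUEUE_NAMES = {
  analysis: "thoth-analysis",
  marketData: "minerva-market-data",
  trade: "hermes-trade",
  notification: "telegram-notification",
} as const;

type TaskKind = keyof typeof QUEUE_NAMES;

// In-memory stand-in used only to exercise the sketch; it offers no persistence.
class InMemoryQueue implements QueueLike {
  readonly jobs: Array<{ name: string; data: unknown }> = [];

  async add(name: string, data: unknown): Promise<void> {
    this.jobs.push({ name, data });
  }
}

// Routes each task kind to its own queue, so the orchestration logic
// never calls a module directly.
class BotProducer {
  private readonly queues: Record<TaskKind, QueueLike>;

  constructor(queues: Record<TaskKind, QueueLike>) {
    this.queues = queues;
  }

  async dispatch(kind: TaskKind, data: unknown): Promise<void> {
    // The task kind doubles as the job name in this sketch.
    await this.queues[kind].add(kind, data);
  }
}
```

In a real deployment each entry would presumably be a BullMQ `Queue` constructed with the matching name from `QUEUE_NAMES` and a Redis connection, with a corresponding worker consuming it as described in adr://consumer-pattern.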
Consequences
Positive:
- Reliability & Persistence: BullMQ, backed by Redis, ensures that jobs are not lost if the application restarts. It provides robust features for retries, error handling, and job lifecycle management.
- Asynchronous & Non-Blocking: Enables the bot to manage a high throughput of tasks without blocking its main event loop, leading to better performance and responsiveness.
- Decoupled Communication: Provides a powerful mechanism for decoupled, asynchronous communication between the bot's orchestration logic and the specialized modules it integrates, as described in adr://comprehensive-module-integration.
- Excellent Tooling: BullMQ has good monitoring and dashboarding capabilities, which are essential for observing the state of our job queues in a production environment.
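The retry behaviour mentioned under Reliability & Persistence is configured per job. A minimal sketch of a helper that builds such options follows. The field names (`attempts`, `backoff`, `removeOnComplete`) follow BullMQ's job options, while the helper name, the local type, and the numeric defaults are illustrative assumptions rather than values decided by this ADR.

```typescript
// Local type mirroring a few BullMQ job option fields so the sketch runs standalone.
interface RetryJobOptions {
  attempts: number;
  backoff: { type: "exponential" | "fixed"; delay: number };
  removeOnComplete: boolean;
}

// Hypothetical helper; the defaults (3 attempts, 1000 ms base delay) are placeholders.
function buildRetryOptions(attempts: number = 3, baseDelayMs: number = 1000): RetryJobOptions {
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  if (!Number.isFinite(baseDelayMs) || baseDelayMs < 0) {
    throw new RangeError("baseDelayMs must be a non-negative number");
  }
  return {
    attempts,
    backoff: { type: "exponential", delay: baseDelayMs },
    removeOnComplete: true,
  };
}
```

The resulting object could be passed as the third argument to `queue.add(name, data, opts)`, or set once per queue through `defaultJobOptions`.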
Negative:
- Added Dependency: It introduces a dependency on a Redis server, which must be highly available.
- Complexity of Asynchronous Logic: Debugging and reasoning about asynchronous, queue-based workflows can be more complex than traditional synchronous code.
Mitigation:
- High-Availability Redis: Our infrastructure plan already includes a highly available Redis cluster, as it's a core component for multiple services. This dependency is already accounted for.
- Observability: We will heavily leverage BullMQ's monitoring tools and integrate queue metrics (e.g., queue length, wait times, failure rates) into our Grafana dashboards. This visibility is key to managing the complexity of asynchronous systems.
- Structured Logging: All queue jobs and workers will use structured logging, including a unique jobId in every log message, so that the lifecycle of a specific task can be traced across producers and consumers.
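One way the structured-logging mitigation could look is sketched below: a small factory that binds a job's identity to every log line. The JobLike interface declares a subset of the fields a BullMQ `Job` exposes (`id`, `name`, `queueName`, `attemptsMade`) so the example runs on its own. The function name, the log field names, and the JSON-lines format are assumptions, not conventions fixed by this ADR.

```typescript
// Subset of the fields available on a BullMQ Job, declared locally for a standalone sketch.
interface JobLike {
  id?: string;
  name: string;
  queueName: string;
  attemptsMade: number;
}

type LogLevel = "info" | "warn" | "error";
type LogSink = (line: string) => void;

// Returns a logger that stamps jobId, queue, and job name on every message.
function createJobLogger(job: JobLike, sink: LogSink = console.log) {
  const base = {
    jobId: job.id ?? "unknown",
    queue: job.queueName,
    jobName: job.name,
    attemptsMade: job.attemptsMade,
  };
  return (level: LogLevel, message: string, extra: Record<string, unknown> = {}): void => {
    // `base` is spread after `extra`, so callers cannot overwrite the jobId.
    sink(JSON.stringify({ ...extra, ...base, level, message, timestamp: new Date().toISOString() }));
  };
}
```

A worker's processor would create one logger per job at the start of processing. If the producer logs the id of the job it enqueued under the same `jobId` field, a single search on that field returns the task's full lifecycle.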