
54. BullMQ for Core Bot Job Processing

Status: Accepted
Date: 2025-07-06

Context

The mercury-bot orchestrates many tasks: fetching data, running analysis, executing trades, and sending notifications. Many of these tasks are asynchronous and must run reliably in the background. A simple in-memory queue would lose pending work if the application crashed, and direct, synchronous calls between services would tightly couple them and make callers wait for every downstream step to finish. We need a robust solution for managing background jobs and asynchronous inter-service communication.

Decision

We will use BullMQ as the exclusive queueing and job processing library for all asynchronous tasks within the mercury-bot's scope. This includes, but is not limited to:

  • Triggering a new analysis from the Thoth module.
  • Requesting market data from the Minerva module.
  • Executing a trade via the Hermes module.
  • Sending a notification via the Telegram module.

The mercury-bot application will act as both a producer (adding jobs to queues) and a consumer (processing jobs from queues), following the pattern defined in adr://consumer-pattern. These queues are separate from the dedicated admin-jobs queue (adr://bullmq-admin-queue); the bot manages only the queues that belong to the core trading workflow.
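A minimal sketch of the producer side, assuming a TypeScript codebase. The queue names, the job name, the TradeOrder payload, and the option values are hypothetical; only the option keys (attempts, backoff, removeOnComplete, removeOnFail) are real BullMQ JobsOptions fields. The helper returns the (name, data, opts) arguments that BullMQ's Queue#add accepts, so it can be exercised without a Redis server.

```typescript
// Hypothetical queue names for the core trading workflow (not fixed by this ADR).
// They are deliberately distinct from the admin-jobs queue.
const CoreQueues = {
  analysis: "thoth-analysis",
  marketData: "minerva-market-data",
  trades: "hermes-trades",
  notifications: "telegram-notifications",
} as const;

type CoreQueueName = (typeof CoreQueues)[keyof typeof CoreQueues];

// Hypothetical payload for a trade request handed to the Hermes module.
interface TradeOrder {
  symbol: string;
  side: "buy" | "sell";
  quantity: number;
}

// Subset of BullMQ JobsOptions used here; the values below are assumptions, not tuned defaults.
interface CoreJobOptions {
  attempts: number;
  backoff: { type: "exponential" | "fixed"; delay: number };
  removeOnComplete: number;
  removeOnFail: number;
}

const defaultJobOptions: CoreJobOptions = {
  attempts: 5, // total tries, including the first one
  backoff: { type: "exponential", delay: 1_000 }, // base retry delay in ms
  removeOnComplete: 1_000, // keep at most 1,000 completed jobs for inspection
  removeOnFail: 5_000, // keep more failed jobs around for debugging
};

// Producer side: builds the arguments for queue.add(name, data, opts).
function buildTradeJob(order: TradeOrder): {
  queue: CoreQueueName;
  name: string;
  data: TradeOrder;
  opts: CoreJobOptions;
} {
  // Fail fast on obviously invalid orders (non-positive or NaN) before anything is enqueued.
  if (!(order.quantity > 0)) {
    throw new Error(`quantity must be positive, got ${order.quantity}`);
  }
  return {
    queue: CoreQueues.trades,
    name: "execute-trade",
    data: order,
    opts: { ...defaultJobOptions },
  };
}
```

In the bot, the result would be passed to a long-lived Queue instance from bullmq (for example tradesQueue.add(job.name, job.data, job.opts)), while a Worker constructed with the same queue name consumes it. That is how a single application acts as both producer and consumer.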

Consequences

Positive:

  • Reliability & Persistence: BullMQ stores jobs in Redis, so queued jobs are not lost if the application restarts. It also provides configurable retries with backoff, failure handling, and job lifecycle management.
  • Asynchronous & Non-Blocking: Producers enqueue a job and return immediately instead of waiting for the work to finish, so the bot can handle a high volume of tasks while its main event loop stays responsive.
  • Decoupled Communication: Queues decouple the bot's orchestration logic from the specialized modules it integrates, as described in adr://comprehensive-module-integration. Producers and consumers share only a queue name and a job payload, so neither side calls the other directly.
  • Excellent Tooling: BullMQ exposes queue events, job counts, and metrics, and dashboards such as Bull Board and Taskforce.sh build on them. This is essential for observing the state of our job queues in production.
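Retries are only useful if their timing fits the job. A small helper makes the retry window explicit, which is worth checking for time-sensitive jobs such as trade execution. It assumes BullMQ's built-in "exponential" backoff without jitter, which its retry guide describes as waiting 2^(attemptsMade - 1) * delay milliseconds, where attemptsMade is the number of failed tries so far. The function names are hypothetical.

```typescript
// Delays that BullMQ's built-in "exponential" backoff is assumed to apply (no jitter):
// the retry after the n-th failed try waits 2^(n - 1) * baseDelayMs.
function exponentialRetryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // `attempts` counts the first try, so there are at most attempts - 1 retries.
  for (let failedTries = 1; failedTries < attempts; failedTries++) {
    delays.push(Math.round(Math.pow(2, failedTries - 1) * baseDelayMs));
  }
  return delays;
}

// Total time spent waiting between tries before the job is finally marked as failed
// (processing time of each try is not included).
function worstCaseRetryWindowMs(attempts: number, baseDelayMs: number): number {
  return exponentialRetryDelays(attempts, baseDelayMs).reduce((sum, d) => sum + d, 0);
}
```

With attempts: 5 and a 1-second base delay, the job waits 1, 2, 4, and 8 seconds between tries, so its last retry can start up to 15 seconds after the first failure, plus processing time.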

Negative:

  • Added Dependency: It introduces a dependency on a Redis server, which must be highly available.
  • Complexity of Asynchronous Logic: Debugging and reasoning about asynchronous, queue-based workflows is harder than with synchronous code, because a single task can span several processes, retries, and points in time.

Mitigation:

  • High-Availability Redis: Our infrastructure plan already includes a highly available Redis cluster because several services depend on it, so this dependency is accounted for.
  • Observability: We will use BullMQ's monitoring tools and export queue metrics (e.g., queue length, wait times, failure rates) to our Grafana dashboards. This visibility is key to managing the complexity of asynchronous systems.
  • Structured Logging: Producers and workers will emit structured logs, and every log message about a job will include its unique jobId, so the lifecycle of a specific task can be traced from the producer that enqueued it to the worker that processed it.
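A minimal sketch of the structured-logging convention, assuming JSON log lines. JobLike mirrors a few fields that BullMQ's Job exposes (id, name, queueName, attemptsMade, timestamp, processedOn); the jobLog helper and the output field names are hypothetical. It also derives a per-job wait time, one of the metrics listed above, from the creation and processing timestamps.

```typescript
// Subset of the fields BullMQ's Job exposes; declared locally so the sketch runs without Redis.
interface JobLike {
  id?: string;
  name: string;
  queueName: string;
  attemptsMade: number;
  timestamp: number; // ms since epoch when the job was created
  processedOn?: number; // ms since epoch when the latest processing attempt started (assumed semantics)
}

type Level = "debug" | "info" | "warn" | "error";

// Hypothetical helper: every line carries jobId so one task can be traced
// across the producer that enqueued it and the worker that processed it.
function jobLog(
  level: Level,
  job: JobLike,
  msg: string,
  extra: Record<string, unknown> = {},
): string {
  const entry: Record<string, unknown> = {
    ...extra, // spread first so callers cannot overwrite the correlation fields below
    level,
    msg,
    jobId: job.id ?? null,
    queue: job.queueName,
    jobName: job.name,
    attemptsMade: job.attemptsMade,
  };
  // Time from creation to the latest processing start. For delayed or retried jobs
  // this also includes the deliberate delay and any earlier attempts.
  if (job.processedOn !== undefined) {
    entry["waitMs"] = job.processedOn - job.timestamp;
  }
  return JSON.stringify(entry);
}
```

Inside a worker processor, a call might look like console.log(jobLog("info", job, "trade executed", { symbol: job.data.symbol })). Keeping jobId in a fixed top-level field lets the log pipeline filter one task's history with a single query.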