rohans-vault

Task Queues and Background Jobs


Suppose you’re a backend engineer assigned a task: implement an email verification workflow that runs when a user signs up on your platform.

Usually you’ll use a third-party provider such as Resend or Brevo to send the emails.

Now ask: what if we don’t do it in the background? What if we send the verification email synchronously inside the signup request?

What goes wrong when it’s synchronous

  • If you don’t have robust error handling, a failure in the “send verification email” step can cause the entire signup request to fail.
  • That creates a bad UX: the user has to come back, hit signup again, and/or trigger “resend email”.
  • Worse, you can end up with partial success: the user record exists, but the client saw an error and the user assumes signup didn’t work.
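The partial-success failure can be reproduced in a few lines. This is a minimal sketch, not real signup code. It assumes a hypothetical in-memory `users` dict as the database and a hypothetical `send_verification_email` function that always fails, standing in for a provider timeout.

```python
# Hypothetical in-memory "database", for illustration only.
users: dict[str, dict] = {}


class EmailProviderError(Exception):
    """Stand-in for a timeout or 5xx response from an email provider."""


def send_verification_email(email: str) -> None:
    # Hypothetical provider call that simulates the provider being down.
    raise EmailProviderError("provider timed out")


def signup_sync(email: str, name: str) -> dict:
    # Step 1: the user record is written first...
    users[email] = {"name": name, "verified": False}
    # Step 2: ...then the email is sent inside the same request.
    # If this raises, the request fails even though the user now exists.
    send_verification_email(email)
    return {"status": "ok"}


try:
    signup_sync("ada@example.com", "Ada")
except EmailProviderError:
    print("client sees an error")

print("ada@example.com" in users)  # True: the record exists anyway
```

The client receives an error, but the user row was already committed. That is the partial success described above.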

The correct approach: enqueue the verification task

After signup is complete:

  • Take the data required to send a verification email (email, name, username, userId, etc.)
  • Serialize it (often JSON) and push a task into a queue
  • Once the task is enqueued, return a success response from the signup request
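The three steps above can be sketched with Python's standard library. This is an illustration under stated assumptions. An in-process `queue.Queue` stands in for a real broker such as Redis or RabbitMQ. The `enqueue` and `signup` functions and the message shape (`id`, `task`, `payload`) are hypothetical choices, not any particular library's format.

```python
import json
import queue
import uuid

# In-process stand-in for a real broker (assumption: a production system
# would push to Redis, RabbitMQ, SQS, etc.).
task_queue: "queue.Queue[str]" = queue.Queue()
users: dict[str, dict] = {}


def enqueue(task_name: str, payload: dict) -> str:
    """Serialize the task to JSON, push it onto the queue, and return its id."""
    task_id = str(uuid.uuid4())
    message = json.dumps({"id": task_id, "task": task_name, "payload": payload})
    task_queue.put(message)
    return task_id


def signup(email: str, name: str, username: str) -> dict:
    user_id = str(uuid.uuid4())
    users[user_id] = {"email": email, "name": name, "username": username}
    # Only the data the worker needs to send the email goes into the task.
    enqueue("send_verification_email", {
        "userId": user_id, "email": email, "name": name, "username": username,
    })
    # Respond as soon as the task is enqueued; no email provider call here.
    return {
        "status": "ok",
        "userId": user_id,
        "message": "Verification email will be sent to your inbox.",
    }


response = signup("ada@example.com", "Ada", "ada")
print(response["status"], task_queue.qsize())  # ok 1
```

Pushing to a broker is usually faster and more dependable than calling an external email API. It can still fail, though, so the enqueue step itself still deserves error handling.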

Because of this workflow, signup succeeds regardless of whether the email provider is slow or temporarily down. The UI can confidently show: “Verification email will be sent to your inbox.”

On the other side, you run consumers / workers that:

  • Pull tasks from the queue
  • Deserialize the payload into native types
  • Execute the registered handler (the code that actually sends the email)
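A worker following these three steps might look like the sketch below. The assumptions: a stdlib `queue.Queue` plays the broker, and `HANDLERS`, `register`, and `run_worker` are hypothetical names chosen for illustration. Libraries like Celery, BullMQ, and asynq provide their own versions of this registry-and-loop. A real worker loops forever. This one returns when the queue stays empty, so the example terminates.

```python
import json
import queue
from typing import Callable

# Hypothetical registry mapping task names to handler functions.
HANDLERS: dict[str, Callable[[dict], None]] = {}


def register(name: str):
    """Decorator that registers a handler under a task name."""
    def wrap(fn: Callable[[dict], None]) -> Callable[[dict], None]:
        HANDLERS[name] = fn
        return fn
    return wrap


sent: list[str] = []


@register("send_verification_email")
def send_verification_email(payload: dict) -> None:
    # A real handler would call the email provider's API here.
    sent.append(payload["email"])


def run_worker(q: "queue.Queue[str]", timeout: float = 0.1) -> None:
    """Process tasks until the queue stays empty for `timeout` seconds."""
    while True:
        try:
            raw = q.get(timeout=timeout)      # 1. pull a task
        except queue.Empty:
            return
        message = json.loads(raw)             # 2. deserialize into a dict
        handler = HANDLERS[message["task"]]   # 3. look up the registered handler
        handler(message["payload"])           #    and execute it
        q.task_done()


q: "queue.Queue[str]" = queue.Queue()
q.put(json.dumps({"id": "1", "task": "send_verification_email",
                  "payload": {"email": "ada@example.com"}}))
run_worker(q)
print(sent)  # ['ada@example.com']
```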

This same pattern can power other delivery channels too (in-app notifications, mobile push, etc.) based on configuration.
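Config-driven routing could be sketched like this. `CHANNELS_BY_EVENT` and `fan_out` are hypothetical. A real system might load the mapping from a database or feature-flag service. Each returned pair would be enqueued as a separate task, so a failing push provider does not block the email for the same event.

```python
# Hypothetical per-event channel configuration.
CHANNELS_BY_EVENT: dict[str, list[str]] = {
    "user_signed_up": ["email"],
    "order_shipped": ["email", "push", "in_app"],
}


def fan_out(event: str, payload: dict) -> list[tuple[str, dict]]:
    """Return one (task_name, payload) pair per channel configured for the event."""
    return [(f"send_{channel}", payload)
            for channel in CHANNELS_BY_EVENT.get(event, [])]


print([name for name, _ in fan_out("order_shipped", {"orderId": 42})])
# ['send_email', 'send_push', 'send_in_app']
```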

Many systems also enforce expiry (e.g. the verification task expires after N minutes). After expiry, the user can request a resend.
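One way to enforce that expiry is to stamp each task with an absolute deadline and have the worker check it before running the handler. In this sketch, `make_task`, `process`, and the `expires_at` field are hypothetical. The 15-minute TTL is an arbitrary example value for N. The optional `now` parameter exists only so the example is deterministic.

```python
import time
from typing import Optional


def make_task(payload: dict, ttl_seconds: float, now: Optional[float] = None) -> dict:
    """Attach an absolute expiry timestamp to the task."""
    now = time.time() if now is None else now
    return {"payload": payload, "expires_at": now + ttl_seconds}


def process(task: dict, now: Optional[float] = None) -> str:
    """Skip tasks whose expiry has passed instead of running their handler."""
    now = time.time() if now is None else now
    if now >= task["expires_at"]:
        # Expired: drop it; a resend request would enqueue a fresh task.
        return "expired"
    # ... the real handler (e.g. sending the email) would run here ...
    return "sent"


task = make_task({"email": "ada@example.com"}, ttl_seconds=15 * 60, now=1_000.0)
print(process(task, now=1_000.0 + 60))       # sent
print(process(task, now=1_000.0 + 16 * 60))  # expired
```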


Retries and exponential backoff

When a task fails (email provider timeout, rate limit, network blip), you typically retry. But retrying immediately in a tight loop can overload a struggling dependency.

Exponential backoff is a retry strategy where the delay increases after each failure. A common pattern looks like:

  • Attempt 1: wait 2s
  • Attempt 2: wait 4s
  • Attempt 3: wait 8s
  • Attempt 4: wait 16s

In general, the delay grows as $\text{delay} = \text{baseDelay} \times 2^{\text{attempt}-1}$ (here baseDelay = 2s). It is often capped at a maximum and usually combined with jitter (randomness), so a fleet of workers doesn’t retry at the exact same time.
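The formula, the cap, and the jitter can be combined in a small retry helper. In this sketch, `backoff_delay`, `with_jitter`, and `retry` are hypothetical names. The 60s cap and 5 attempts are arbitrary example values. For jitter it uses the "full jitter" variant, which sleeps a uniformly random time between 0 and the computed delay. That is one common choice among several. The `sleep` parameter is injectable so the logic can be tested without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Delay after failed attempt number `attempt` (1-based): base * 2**(attempt-1), capped."""
    return min(cap, base * 2 ** (attempt - 1))


def with_jitter(delay: float) -> float:
    """Full jitter: pick uniformly in [0, delay] so workers don't retry in lockstep."""
    return random.uniform(0, delay)


def retry(fn: Callable[[], T], max_attempts: int = 5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying failures with capped exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the error propagate
            sleep(with_jitter(backoff_delay(attempt)))
    raise AssertionError("unreachable")


print([backoff_delay(n) for n in range(1, 5)])  # [2.0, 4.0, 8.0, 16.0]
```

Without jitter, workers that failed at the same moment would all retry at the same moment too. The random spread keeps the retries from hitting the struggling dependency in synchronized waves.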


Libraries (by language)

For most languages there are libraries that handle these workflows, but it’s still valuable to understand what’s happening under the hood.

  • Python: Celery
  • Node.js: BullMQ
  • Go: asynq

Where task queues are useful

  1. Sending emails
  2. Processing images/videos into multiple formats and resolutions for delivery optimization (e.g. streaming/CDN pipelines)
  3. Generating reports (especially long-running financial reports)
  4. Sending push notifications

Rohan Sawai — Backend engineer writing about distributed systems.
