How to Set Up Cron Job Failure Notifications That Actually Work
Your cron job failed last Tuesday. You found out on Friday when a customer asked why their report was missing. Sound familiar? Here are three approaches to cron job failure alerts—from quick-and-dirty to bulletproof—and why only one of them catches every type of failure.
The Problem: Cron Jobs Fail Silently
Cron was designed in the 1970s. It runs your command on a schedule. If the command fails, cron does exactly nothing about it. There is no built-in retry, no alert, no dashboard. The job either runs or it doesn't, and unless you actively check, you won't know the difference.
This creates a specific category of operational failure that is surprisingly hard to detect. Your server is up, your application is responding, your uptime monitor is green—but the nightly database backup stopped running three days ago because someone edited the crontab and introduced a syntax error.
There are three approaches to solving this, each with different trade-offs in reliability, setup effort, and maintenance burden.
Approach 1: Parsing Cron's Email Output
By default, cron sends the output of every job to the local user's mailbox (controlled by the MAILTO variable in your crontab). You can set MAILTO=you@company.com at the top of your crontab to forward this output to an external email address.
MAILTO=ops@company.com
# Any output a job prints (stdout or stderr) gets emailed
0 2 * * * /home/deploy/scripts/backup-db.sh
*/5 * * * * /home/deploy/scripts/process-queue.sh
This works, but it has serious limitations:
- Noisy: Successful jobs that print any output also generate emails. You quickly train yourself to ignore them.
- No alert on silence: If the cron daemon itself stops, or the server reboots and cron doesn't restart, you get zero emails. No email means either "everything is fine" or "everything is broken"—and you can't tell which.
- Depends on mail delivery: Many servers don't have a working MTA configured. Cloud instances often block port 25. The emails may never arrive.
- No escalation: You can't page someone at 2am based on a cron email without building additional infrastructure.
Verdict: Better than nothing, but unreliable for anything you actually care about. The fundamental problem is that absence of email is ambiguous.
Approach 2: Wrapper Scripts with Exit Code Checking
A more robust approach is wrapping each cron job in a script that checks the exit code and sends an alert on failure. Here's a basic example:
#!/bin/bash
# cron-wrapper.sh — Run a command and alert on failure
COMMAND="$*"                      # human-readable form for the alert text
OUTPUT=$("$@" 2>&1)               # "$@" preserves arguments containing spaces
EXIT_CODE=$?
if [ $EXIT_CODE -ne 0 ]; then
  curl -s -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"Cron job failed: $COMMAND\nExit code: $EXIT_CODE\nOutput: $OUTPUT\"}"
fi
exit "$EXIT_CODE"                 # propagate the code so cron's own MAILTO still fires
Your crontab becomes:
0 2 * * * /home/deploy/cron-wrapper.sh /home/deploy/scripts/backup-db.sh
This is significantly better than email parsing. You get alerts only on failure, and you can send them to Slack, PagerDuty, or any webhook. But it still has gaps:
- Still can't detect "didn't run": If cron itself fails to execute the job, the wrapper never runs, so no alert is sent.
- Exit codes aren't universal: Some scripts exit 0 even on partial failure. A backup that skips half the tables might still exit successfully.
- Maintenance overhead: You're now maintaining a custom alerting script. Every new notification channel requires code changes.
- No history or dashboard: You have no record of when jobs ran successfully, making it hard to spot patterns like gradually increasing runtime.
Verdict: Good for catching explicit failures. Useless for silent ones. You need discipline to ensure every job uses the wrapper.
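The "exit codes aren't universal" gap is worth seeing concretely. Here is a minimal sketch with a simulated three-step backup (the steps are hypothetical, not a real backup tool): the lenient version swallows a failed step and still exits 0, while bash strict mode turns the same failure into a nonzero exit that a wrapper can actually catch:

```shell
#!/bin/bash
# Simulated three-step backup in which the "orders" step fails.
# Each variant runs in its own bash process via bash -c.

lenient='
for table in users orders invoices; do
  if [ "$table" = orders ]; then
    echo "dump failed for $table" >&2
    continue                # failure swallowed; loop carries on
  fi
  echo "dumped $table"
done
'

strict='
set -euo pipefail           # abort on the first failing command
for table in users orders invoices; do
  [ "$table" != orders ]    # fails for "orders": script stops here
  echo "dumped $table"
done
'

bash -c "$lenient" >/dev/null 2>&1 && echo "lenient backup exited 0"
bash -c "$strict"  >/dev/null 2>&1 || echo "strict backup exited nonzero"
```

The lenient script "succeeds" because its final command succeeded; a wrapper checking its exit code learns nothing about the skipped table.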
Approach 3: Dead Man's Switch Monitoring (The Reliable Way)
A dead man's switch flips the model. Instead of waiting for failure signals, it expects success signals. Your cron job pings a monitoring endpoint every time it completes. If the ping doesn't arrive within the expected window, you get alerted.
This catches every failure mode:
- Script error: Job runs, fails, doesn't ping. Alert.
- Cron not running: Daemon stopped, job never executes. No ping. Alert.
- Server down: Machine rebooted, cron hasn't started. No ping. Alert.
- Crontab deleted: Someone ran crontab -r instead of crontab -e. No ping. Alert.
- Schedule typo: Job set to run on Feb 30th. Never runs. No ping. Alert.
Key insight: Exit code monitoring asks "did the job fail?" A dead man's switch asks "did the job succeed?" The second question has no ambiguous cases. Either the ping arrived, or it didn't.
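The checker logic on the monitoring side is small enough to sketch. This is an illustrative file-based version (an assumption for teaching purposes, not CronPerek's actual implementation): the ping handler stores a timestamp, and a periodic check alerts when the newest timestamp is older than interval plus grace:

```shell
#!/bin/bash
# Illustrative dead man's switch checker: one monitor, file-based state.
STATE="${TMPDIR:-/tmp}/last_ping.demo"
INTERVAL=86400              # expected seconds between successful runs
GRACE=3600                  # extra slack before alerting

# The ping handler would do only this on every successful run:
date +%s > "$STATE"

# The checker, run periodically, compares the age of the last ping:
now=$(date +%s)
last=$(cat "$STATE" 2>/dev/null || echo 0)
age=$(( now - last ))

if [ "$age" -gt $(( INTERVAL + GRACE )) ]; then
  echo "ALERT: last ping ${age}s ago (limit $(( INTERVAL + GRACE ))s)"
else
  echo "OK: last ping ${age}s ago"
fi
```

Note that a missing state file counts as a ping at time 0, so a monitor that has never been pinged alerts too, which matches the "crontab deleted" failure mode above.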
This is the approach used by services like CronPerek, Cronitor, Healthchecks.io, and Dead Man's Snitch. The concept is the same across all of them—the difference is pricing and feature set.
Setting Up CronPerek in 2 Minutes
CronPerek is a dead man's switch monitoring API. Free for up to 5 monitors, $9/mo for 50, $29/mo for unlimited. Here's how to set it up.
Step 1: Create a monitor
Use the API to create a monitor with your expected interval:
curl -X POST https://cronpeek.web.app/api/v1/monitors \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "Nightly DB Backup",
        "interval": 86400,
        "grace_period": 3600
      }'
# Response includes your monitor ID and ping URL
The interval is how often the job should run (in seconds). The grace_period is how long to wait after a missed ping before alerting—this prevents false alarms from jobs that run a few minutes late.
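With the values above, the timing is simple arithmetic: a ping is expected every 86400 seconds, plus 3600 seconds of grace, so the alert fires after 25 hours of silence:

```shell
INTERVAL=86400   # the job should run once a day
GRACE=3600       # allow it to run up to an hour late
echo "alert fires after $(( (INTERVAL + GRACE) / 3600 )) hours of silence"
```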
Step 2: Add a ping to your cron job
Append a single curl call to the end of your existing crontab entry:
# Before (unmonitored)
0 2 * * * /home/deploy/scripts/backup-db.sh
# After (monitored)
0 2 * * * /home/deploy/scripts/backup-db.sh && curl -fsS --retry 3 https://cronpeek.web.app/api/v1/ping/YOUR_MONITOR_ID
The && operator means the ping only fires if the backup script exits with code 0. If the script fails, no ping is sent, and CronPerek alerts you after the grace period.
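The short-circuit behavior of && is easy to verify in any shell; here fake_job is a hypothetical stand-in for your real script:

```shell
#!/bin/bash
fake_job() { return "$1"; }       # stand-in for backup-db.sh

fake_job 0 && echo "ping sent"    # exit 0: the right-hand side runs
fake_job 1 && echo "ping sent"    # exit 1: && short-circuits, no ping
echo "done"
```

The second line prints nothing, which is exactly what you want: no ping, so the monitor's grace period starts counting down.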
Step 3: Configure your alert channels
Set up where you want to receive notifications:
- Email: Immediate notification to one or more addresses
- Webhook: POST to any URL—pipe it into Slack, Discord, PagerDuty, Telegram, or your own internal alerting system
That's it. No agent to install, no SDK dependency, no daemon to run. One HTTP GET request per job execution.
Combining Approaches for Maximum Coverage
The best setup uses a dead man's switch as the foundation, with a wrapper script for richer failure context:
#!/bin/bash
# monitored-job.sh — Best of both worlds
set -euo pipefail
MONITOR_ID="abc123def456"
WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
# Run the actual job
OUTPUT=$(/home/deploy/scripts/backup-db.sh 2>&1) || {
  EXIT_CODE=$?
  # Send failure details to Slack
  curl -s -X POST "$WEBHOOK" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"backup-db.sh failed (exit $EXIT_CODE): $OUTPUT\"}"
  exit "$EXIT_CODE"
}
# Job succeeded — ping CronPerek (bounded so a monitoring outage can't hang the job)
curl -fsS --max-time 10 --retry 3 "https://cronpeek.web.app/api/v1/ping/$MONITOR_ID"
This gives you detailed failure messages in Slack when the job runs and fails, plus dead man's switch alerting when the job doesn't run at all. The wrapper script handles the "what went wrong" question. CronPerek handles the "did it run?" question.
Common Mistakes to Avoid
After working with thousands of cron jobs, these are the patterns that cause the most missed alerts:
- Pinging before the job completes: Put the curl at the end of your script, not the beginning. If your job takes 20 minutes and you ping at the start, you'll never know when it hangs mid-execution.
- Using
;instead of&&: The commandbackup.sh ; curl ping-urlsends a ping even if the backup failed. Always use&&to chain the ping conditionally on success. - Setting grace periods too tight: If your job normally takes 5–30 minutes depending on load, set the grace period to at least 45 minutes. False alarms cause alert fatigue, and alert fatigue causes real failures to be ignored.
- Only monitoring "important" jobs: The backup you didn't monitor is the one that fails. If a job is worth running, it's worth monitoring. Use a service with a generous free tier or flat-rate pricing so cost doesn't force you to make triage decisions.
- Letting the monitoring call block your job: Always use --max-time 10 and --retry 3 with curl. If the monitoring service is briefly unreachable, you don't want your cron job to hang forever waiting for the ping to succeed.
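Putting the last two points together, a fully guarded version of the earlier crontab entry looks like this (YOUR_MONITOR_ID is a placeholder for your own):

```shell
# Ping only on success; bound the call so it can never hang the job
0 2 * * * /home/deploy/scripts/backup-db.sh && curl -fsS --max-time 10 --retry 3 https://cronpeek.web.app/api/v1/ping/YOUR_MONITOR_ID > /dev/null
```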
Stop finding out about failures from your customers
Set up dead man's switch monitoring in under 2 minutes. Free tier includes 5 monitors. No credit card required.
Get started free →