How to Monitor Cron Jobs & Scheduled Tasks in AWS Lambda
You're using Amazon EventBridge (formerly CloudWatch Events) to trigger Lambda functions on a schedule. It works like a serverless cron job. But what happens when the schedule rule gets disabled, the Lambda times out, or the function silently errors? CloudWatch metrics alone won't catch every failure mode. This guide shows you how to add dead man's switch monitoring to your Lambda functions with CronPeek.
The AWS Lambda "Cron Job" Pattern
AWS Lambda doesn't have a built-in cron scheduler. Instead, you use Amazon EventBridge (or the older CloudWatch Events) to create schedule rules that invoke your Lambda function at fixed intervals. A typical setup looks like this:
- EventBridge rule: rate(1 hour) or cron(0 2 * * ? *)
- Target: Your Lambda function ARN
- Lambda function: Runs your business logic (ETL, reports, cleanup, etc.)
This is effectively a serverless cron job. No servers to manage, no crontab to maintain. But it comes with its own set of silent failure modes that are easy to miss.
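As a rough sketch of the setup above, the wiring can be done with boto3's put_rule, add_permission, and put_targets calls. The function name schedule_lambda, the statement ID, and the target ID are illustrative choices, not part of any official recipe. The clients are passed in as arguments, so the snippet assumes nothing about how credentials are configured.

```python
def schedule_lambda(events_client, lambda_client, rule_name, schedule, function_arn):
    """Create a schedule rule and point it at a Lambda function.

    events_client and lambda_client are expected to be boto3 clients,
    e.g. boto3.client("events") and boto3.client("lambda").
    """
    # 1. Create (or update) the schedule rule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,  # e.g. "rate(1 hour)" or "cron(0 2 * * ? *)"
        State="ENABLED",
    )

    # 2. Allow EventBridge to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )

    # 3. Register the function as the rule's target.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "scheduled-lambda", "Arn": function_arn}],  # hypothetical target ID
    )
    return rule["RuleArn"]
```

Note that the rule's State field is exactly what flips to DISABLED in the first silent failure mode discussed below: the function and its permissions stay intact, but nothing invokes it.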
Why CloudWatch Alone Isn't Enough
Most teams assume that CloudWatch metrics and alarms cover Lambda monitoring. And for many failure types, they do. But there are critical gaps.
What CloudWatch catches
- Invocation errors: If your Lambda throws an unhandled exception, CloudWatch records it as an error in the Errors metric. You can set an alarm on this.
- Duration and timeout: If your function exceeds its configured timeout, CloudWatch logs a timeout event.
- Throttling: If you hit concurrency limits, CloudWatch tracks the Throttles metric.
- Cold starts and memory: CloudWatch Logs include billed duration and max memory used.
What CloudWatch misses
- Disabled EventBridge rules. If someone disables the schedule rule (through the console, an IaC deploy, or an API call), the Lambda simply isn't invoked. CloudWatch has nothing to report because nothing happened. There's no "invocations dropped to zero" alarm out of the box.
- Deleted or misconfigured triggers. A Terraform apply removed the EventBridge rule. A CloudFormation stack update changed the schedule expression. The Lambda still exists but never gets called.
- Partial success. Your Lambda runs, processes 95% of the work, but silently skips a critical step due to a logic bug. It returns a 200 status. CloudWatch sees a successful invocation. But the actual business outcome failed.
- EventBridge delivery failures. In rare cases, EventBridge can fail to deliver an event to the target Lambda. These are logged in EventBridge's own metrics, but most teams don't have alarms on FailedInvocations for scheduled rules.
- Alarm fatigue and configuration drift. Lambda publishes no Invocations data points when a function is never called, so a "zero invocations" alarm just sits in INSUFFICIENT_DATA unless it is configured to treat missing data as breaching. This is non-obvious, error-prone, and breaks if someone changes the metric period or evaluation window.
The core problem: CloudWatch monitors what did happen. A dead man's switch monitors what should have happened but didn't. These are fundamentally different. Out of the box, CloudWatch does not alert on the absence of an expected event; you have to build and maintain that alarm yourself.
The Dead Man's Switch Solution
A dead man's switch flips the monitoring model. Instead of watching for errors, it watches for the absence of a success signal. Your Lambda function sends an HTTP "heartbeat" ping to CronPeek after completing its work. If the ping doesn't arrive within the expected window, CronPeek alerts you.
This catches every failure mode listed above:
- Disabled schedule rule? No invocation, no ping, alert fired.
- Lambda timeout? Function never reaches the ping call, alert fired.
- Unhandled exception? Function crashes before the ping, alert fired.
- Partial success? You place the ping after the critical steps, so it only fires on full success.
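To make the model concrete, here is a minimal sketch of the check a dead man's switch performs. It is a conceptual illustration only, not CronPeek's actual implementation; the function name is_overdue and its parameters are assumptions chosen to mirror the interval and grace period used in the steps that follow.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping, now, interval_seconds, grace_seconds):
    """Return True if no ping arrived within interval + grace of the last one."""
    deadline = last_ping + timedelta(seconds=interval_seconds + grace_seconds)
    return now > deadline


last_ping = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)

# 65 minutes later: still inside the 60-minute interval plus 10-minute grace.
print(is_overdue(last_ping, last_ping + timedelta(minutes=65), 3600, 600))  # False

# 71 minutes later: the window has passed, so an alert would fire.
print(is_overdue(last_ping, last_ping + timedelta(minutes=71), 3600, 600))  # True
```

The check never inspects why the ping is missing. A disabled rule, a timeout, and a crash all look identical from the monitor's side, which is why one mechanism covers every case in the list.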
Step 1: Create a CronPeek Monitor
First, create a monitor through the CronPeek API. Set the expected interval to match your EventBridge schedule, plus a grace period for execution time.
# Create a monitor that expects a ping every hour
# with a 10-minute grace period
curl -X POST https://cronpeek.web.app/api/v1/monitors \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "nightly-etl-lambda",
    "interval_seconds": 3600,
    "grace_seconds": 600
  }'
# Response includes your monitor ID and ping URL:
# { "id": "mon_abc123", "ping_url": "https://cronpeek.web.app/api/v1/ping/mon_abc123" }
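If you would rather create monitors from a deploy script than from the shell, the same request can be sketched in Python using only the standard library. The endpoint, header names, and JSON fields are taken from the curl call above; the function names and the 10-second timeout are illustrative assumptions.

```python
import json
import urllib.request

API_BASE = "https://cronpeek.web.app/api/v1"


def build_create_monitor_request(api_key, name, interval_seconds, grace_seconds):
    """Build (but do not send) the POST request that creates a monitor."""
    body = json.dumps({
        "name": name,
        "interval_seconds": interval_seconds,
        "grace_seconds": grace_seconds,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/monitors",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_monitor(api_key, name, interval_seconds, grace_seconds):
    """Send the request and return the parsed JSON response."""
    req = build_create_monitor_request(api_key, name, interval_seconds, grace_seconds)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)  # expected to contain "id" and "ping_url"
```

Building the request separately from sending it keeps the payload easy to inspect in a unit test without touching the network.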
Step 2: Add the Heartbeat Ping to Your Lambda
Add the ping call at the end of your handler function, after all business logic has completed successfully. This is important—the ping should be the last thing that happens, so it only fires when everything worked.
Node.js Lambda Handler
const https = require('https');

// Helper: send heartbeat ping to CronPeek
function pingCronPeek(monitorId) {
  return new Promise((resolve) => {
    const req = https.get(
      `https://cronpeek.web.app/api/v1/ping/${monitorId}`,
      (res) => {
        res.resume(); // drain the response
        resolve(res.statusCode);
      }
    );
    req.on('error', () => resolve(null)); // non-critical
    req.setTimeout(5000, () => {
      req.destroy();
      resolve(null);
    });
  });
}

exports.handler = async (event) => {
  // ---- Your business logic ----
  const records = await fetchNewRecords();
  const transformed = transformRecords(records);
  await writeToDatabase(transformed);
  console.log(`Processed ${transformed.length} records`);
  // ---- End business logic ----

  // Heartbeat: signal successful completion
  await pingCronPeek(process.env.CRONPEEK_MONITOR_ID);

  return {
    statusCode: 200,
    body: JSON.stringify({ processed: transformed.length }),
  };
};
Python Lambda Handler
import os
import json
import urllib.request
import urllib.error


def ping_cronpeek(monitor_id):
    """Send heartbeat ping to CronPeek. Best-effort: failures are swallowed."""
    try:
        url = f"https://cronpeek.web.app/api/v1/ping/{monitor_id}"
        req = urllib.request.Request(url, method="GET")
        # Blocks for at most 5 seconds before giving up
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status
    except (urllib.error.URLError, TimeoutError):
        return None  # Don't let monitoring break the job


def handler(event, context):
    # ---- Your business logic ----
    records = fetch_new_records()
    transformed = transform_records(records)
    write_to_database(transformed)
    print(f"Processed {len(transformed)} records")
    # ---- End business logic ----

    # Heartbeat: signal successful completion
    ping_cronpeek(os.environ["CRONPEEK_MONITOR_ID"])

    return {
        "statusCode": 200,
        "body": json.dumps({"processed": len(transformed)})
    }
Both examples use only built-in modules: Node's https and Python's urllib. No external dependencies are required. Each ping helper swallows its own failures (the Node version resolves with null on an error or timeout, and the Python version catches the exception), so a network hiccup in the monitoring call never breaks your actual job.
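If you have several scheduled handlers, repeating the ping call in each one gets tedious. One possible way to factor it out is a decorator that pings only after the handler returns normally and, optionally, only when the result passes a check. The names with_heartbeat and validate are hypothetical; the demo below uses a stand-in ping that records calls rather than hitting the network.

```python
import functools


def with_heartbeat(ping, validate=None):
    """Wrap a Lambda handler so `ping` runs only after a successful return.

    ping:     zero-argument callable that sends the heartbeat.
    validate: optional callable(result) -> bool; a falsy return skips the ping.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            result = handler(event, context)  # exceptions propagate, so no ping
            if validate is None or validate(result):
                ping()
            return result
        return wrapper
    return decorator


# Demo with a stand-in ping that just records that it was called.
sent = []


@with_heartbeat(ping=lambda: sent.append("ping"),
                validate=lambda result: result["processed"] > 0)
def handler(event, context):
    return {"processed": len(event["records"])}


handler({"records": [1, 2]}, None)  # succeeds and validates: ping sent
handler({"records": []}, None)      # returns normally but fails validation: no ping
print(sent)  # ['ping']
```

In a real function the ping argument would be something like a lambda that calls the ping helper from the Python example with the monitor ID. The validate hook is one way to address the partial-success case: a run that returns normally but processed nothing is treated as a missed heartbeat.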
Step 3: Set the Environment Variable
Store the monitor ID as an environment variable on your Lambda function. This keeps it out of your code and makes it easy to change per environment.
# AWS CLI
# Note: --environment replaces the function's entire set of variables,
# so include any existing ones in the same call.
aws lambda update-function-configuration \
  --function-name my-etl-function \
  --environment "Variables={CRONPEEK_MONITOR_ID=mon_abc123}"

# Or in your SAM/CloudFormation template:
# Environment:
#   Variables:
#     CRONPEEK_MONITOR_ID: mon_abc123

# Or in Terraform:
# environment {
#   variables = {
#     CRONPEEK_MONITOR_ID = "mon_abc123"
#   }
# }
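One detail worth handling on the code side: looking the variable up with a plain dictionary access raises KeyError when it is unset, which would fail the invocation after the real work has already finished. A small sketch of a more forgiving lookup follows; the helper name monitor_id_from_env is an assumption for illustration.

```python
import os


def monitor_id_from_env(environ=os.environ):
    """Return the configured monitor ID, or None if the variable is unset or blank."""
    value = environ.get("CRONPEEK_MONITOR_ID", "").strip()
    return value or None


# Passing a plain dict makes the behaviour easy to check locally.
print(monitor_id_from_env({"CRONPEEK_MONITOR_ID": "mon_abc123"}))  # mon_abc123
print(monitor_id_from_env({}))                                     # None
```

A handler can then skip the ping when the result is None, which is convenient for local runs and tests. In a deployed function, a missing variable simply means no heartbeat is sent, so the monitor alerts and surfaces the misconfiguration.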
Step 4: Configure Alert Channels
Set up where you want to receive alerts when a ping is missed. CronPeek supports email and webhook alerts. Use webhooks to integrate with Slack, Discord, PagerDuty, or any other notification system.
# Add a webhook alert channel
curl -X POST https://cronpeek.web.app/api/v1/monitors/mon_abc123/alerts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "webhook",
    "url": "https://hooks.slack.com/services/T00/B00/xxxxx"
  }'
Architecture Overview
Here's how all the pieces fit together:
EventBridge Schedule Rule (rate or cron expression)
        |
        v
AWS Lambda Function
        |
        |-- 1. Execute business logic
        |-- 2. On success: HTTP GET to CronPeek ping URL
        |
        v
CronPeek Dead Man's Switch
        |
        |-- Ping received on time? --> All good, reset timer
        |-- Ping missing? ----------> Fire alert (email / webhook)
The entire monitoring layer is a single HTTP request. No Lambda layers, no additional infrastructure, no CloudWatch alarm configuration. One GET request at the end of your function.
Common Lambda Scheduling Patterns
Here are the most common EventBridge schedule patterns and how to set up the corresponding CronPeek monitor intervals:
| Schedule | EventBridge Expression | CronPeek Interval | Grace Period |
|---|---|---|---|
| Every 5 minutes | rate(5 minutes) | 300s | 120s |
| Every hour | rate(1 hour) | 3600s | 600s |
| Daily at 2 AM UTC | cron(0 2 * * ? *) | 86400s | 1800s |
| Weekdays at 9 AM | cron(0 9 ? * MON-FRI *) | 86400s | 1800s |
| Weekly on Sunday | cron(0 0 ? * SUN *) | 604800s | 3600s |
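Set the grace period to account for your function's maximum expected execution time plus some buffer. If your Lambda typically runs for 3 minutes but occasionally takes 8, set a 15-minute grace period. Schedules with uneven gaps need extra care: the weekday-only job is silent from Friday morning to Monday morning, so a fixed 86400s interval would likely report the weekend as a missed run unless the interval is widened or alerts are paused for that window.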
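For rate-based schedules, the interval can be derived mechanically from the expression. The following is a small sketch, not an official tool: rate_to_seconds handles only rate(...) expressions (cron expressions still need to be worked out by hand), and the 2x safety factor in suggested_grace is an assumed rule of thumb rather than a CronPeek recommendation.

```python
import re

_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def rate_to_seconds(expression):
    """Convert an EventBridge rate expression such as 'rate(5 minutes)' to seconds."""
    match = re.fullmatch(r"rate\((\d+) (minute|hour|day)s?\)", expression.strip())
    if not match:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * _UNIT_SECONDS[unit]


def suggested_grace(max_runtime_seconds, buffer_factor=2.0):
    """Grace period as worst-case runtime times a safety factor (assumed default)."""
    return int(max_runtime_seconds * buffer_factor)


print(rate_to_seconds("rate(5 minutes)"))  # 300
print(rate_to_seconds("rate(1 hour)"))     # 3600
print(suggested_grace(8 * 60))             # 960
```

With the 8-minute worst case from the paragraph above, the helper suggests 960 seconds (16 minutes), in line with the 15-minute figure.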
Advanced: Monitoring Multi-Step Lambda Workflows
If your scheduled workflow involves multiple Lambda functions (e.g., a Step Functions state machine triggered by EventBridge), place the CronPeek ping in the final step of the workflow. This ensures you're monitoring end-to-end completion, not just the first invocation.
// In your Step Functions definition, the final Lambda
// (pingCronPeek is the helper defined in Step 2):
exports.handler = async (event) => {
  // Final step: upload results to S3
  await uploadResults(event.processedData);

  // Ping CronPeek only after the entire workflow completes
  await pingCronPeek(process.env.CRONPEEK_MONITOR_ID);

  return { status: 'complete', records: event.processedData.length };
};
For monitoring other scheduled task platforms (Airflow, GitHub Actions, Kubernetes CronJobs), the same pattern applies—one ping at the end of the successful execution path.
Why Not Just Use CloudWatch Alarms?
You can set up a CloudWatch alarm that triggers when Invocations drops to zero over a period. But there are practical issues:
- Configuration complexity. You need to create a metric alarm with TreatMissingData: breaching, set the evaluation period to match your schedule, and handle edge cases like deployment windows where invocations are legitimately paused.
- Per-function setup. Each Lambda needs its own CloudWatch alarm. If you have 20 scheduled Lambdas, that's 20 alarms to configure and maintain.
- No success validation. A CloudWatch invocations alarm only knows the function was called. It doesn't know if the function completed its actual work. A dead man's switch ping placed after the business logic confirms the work was done.
- Cost and noise. CloudWatch alarms cost $0.10/alarm/month. CronPeek's free tier gives you 5 monitors at no cost, and the Pro plan covers 50 for $9/mo with clearer semantics.
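For comparison, here is roughly what the alarm from the first bullet looks like when expressed as arguments to CloudWatch's put_metric_alarm call. The parameter names are those of the CloudWatch API; the helper name and the alarm naming scheme are illustrative. The snippet only builds the arguments, so it runs without boto3 or AWS credentials.

```python
def zero_invocation_alarm_params(function_name, period_seconds):
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{function_name}-no-invocations",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,        # should match the schedule interval
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data points means "not invoked"
    }


params = zero_invocation_alarm_params("my-etl-function", 3600)
# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["TreatMissingData"])  # breaching
```

A usable alarm would also need AlarmActions pointing at a notification target, and the whole set has to be repeated and kept in sync for every scheduled function.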
CloudWatch alarms are the right choice for monitoring Lambda errors and performance. A dead man's switch is the right choice for monitoring whether the job ran at all and completed successfully. Use both for full coverage, just like you'd use Uptime Robot alongside CronPeek for different monitoring concerns.
Monitoring Across Environments
Use separate CronPeek monitors for each environment (dev, staging, production). Store the monitor ID in environment-specific configuration:
# terraform/production.tfvars
cronpeek_monitor_id = "mon_prod_abc123"
# terraform/staging.tfvars
cronpeek_monitor_id = "mon_staging_def456"
This way, a missed ping in staging doesn't trigger your production alert channel, and vice versa. CronPeek's flat pricing at $9/mo for 50 monitors means you can monitor all environments without worrying about per-monitor costs.
Quick Reference: The Complete Setup Checklist
- Create a CronPeek account at cronpeek.web.app (free tier: 5 monitors)
- Create a monitor via the API with interval matching your EventBridge schedule
- Add the ping helper function to your Lambda code (Node.js or Python)
- Call the ping function at the end of your handler, after all business logic
- Store the monitor ID as a Lambda environment variable
- Configure email or webhook alerts in CronPeek
- Test by temporarily disabling your EventBridge rule and verifying you receive an alert
The Bottom Line
AWS Lambda scheduled functions are serverless cron jobs, and they need the same kind of monitoring as traditional cron—dead man's switch monitoring that detects when a job doesn't run. CloudWatch is essential for error tracking and performance metrics, but it has a blind spot for missing executions and partial failures.
Adding CronPeek heartbeat pings to your Lambda handlers takes five minutes and zero additional infrastructure. One HTTP request at the end of your function gives you confidence that your scheduled workloads are actually running, not just deployed.