Building a Dead Simple Health Check for a Headless Python Bot
TL;DR / Key Takeaways
- Write a heartbeat timestamp to a file on every bot cycle. If the file goes stale, the bot is dead.
- Expose a minimal HTTP health endpoint using Python's built-in http.server. No Flask required.
- A single cron job checks the heartbeat age and emails you if it exceeds your threshold.
- The Predict & Profit weather and econ bots both use this pattern. It has caught two silent failures that would have run for hours undetected.
The bot crashes at 2:47am. You wake up at 7. By then it has missed four trading cycles, maybe more. You have no idea what happened. The original process is gone, the logs stopped mid-line, and systemd restarted it automatically, so now it looks healthy again.
This is the monitoring gap that gets solo deployments. Not a catastrophic failure. A quiet one.
I am not going to tell you to pay for Datadog. I am not going to tell you to set up Prometheus and Grafana. You are one person running a Python bot on a VPS. You need three things: a heartbeat file, something that checks it, and something that tells you when it is stale.
Step 1: Write a Heartbeat File
Every time your main bot loop completes a cycle, write the current UTC timestamp to a file.
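```python
import datetime
import os
import pathlib

HEARTBEAT_PATH = pathlib.Path("/var/run/mybot/heartbeat.txt")

def write_heartbeat():
    HEARTBEAT_PATH.parent.mkdir(parents=True, exist_ok=True)
    tmp = HEARTBEAT_PATH.with_suffix(".tmp")
    tmp.write_text(datetime.datetime.utcnow().isoformat())
    os.replace(tmp, HEARTBEAT_PATH)  # atomic rename: readers never see a half-written file
```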
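Call this at the bottom of your main loop, after all processing is done. Not before. If the loop hangs or throws before it gets there, the file does not update. That is the point.
One practical note: on most modern distros /var/run is a symlink to /run, a root-owned tmpfs that is wiped at every boot. If the bot does not run as root, create the mybot directory for it (for example with RuntimeDirectory=mybot in its systemd unit). Alternatively, point HEARTBEAT_PATH, and the cron script below, at a directory the bot's user owns.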
In my Weather Bot, the call sits just before the sleep at the end of the scan cycle. If the GRIB download hangs, if the Kalshi API times out and retries forever, if something blocks the event loop, the heartbeat goes stale and I find out about it.
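The ordering is the whole trick, so here is a minimal sketch of a scan loop that follows it. `run_cycle`, `main_loop`, and the temp-directory heartbeat path are hypothetical stand-ins for illustration, not the Weather Bot's actual code. The timestamp is kept as naive UTC so it has the same format as `utcnow().isoformat()` above.

```python
import datetime
import pathlib
import tempfile
import time

# Hypothetical demo location; the post's bot uses /var/run/mybot/heartbeat.txt.
HEARTBEAT_PATH = pathlib.Path(tempfile.gettempdir()) / "mybot-demo" / "heartbeat.txt"


def write_heartbeat():
    HEARTBEAT_PATH.parent.mkdir(parents=True, exist_ok=True)
    # Naive UTC timestamp: same string format as datetime.utcnow().isoformat().
    now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    HEARTBEAT_PATH.write_text(now.isoformat())


def run_cycle():
    """Hypothetical placeholder for one scan: fetch data, evaluate, place orders."""


def main_loop(cycle=run_cycle, interval_seconds=60, max_cycles=None):
    """Run scan cycles forever (or max_cycles times); return how many completed."""
    completed = 0
    while max_cycles is None or completed < max_cycles:
        cycle()            # if this raises or hangs, the heartbeat below is skipped
        write_heartbeat()  # reached only after the whole cycle has finished
        completed += 1
        time.sleep(interval_seconds)  # sleep comes after the heartbeat, never before
    return completed
```

In a real bot you would call `main_loop()` with no `max_cycles` and your actual interval. A cycle that raises or never returns never reaches `write_heartbeat()`, so the file's age keeps growing. That growing age is the only signal the rest of this setup relies on.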
Step 2: Expose a Health Endpoint
You want to be able to curl your server and get a response. This is useful for manual checks and can be hit by uptime monitors if you ever want to add one later.
Python has http.server built in. No dependencies, no pip install. This is the whole thing:
```python
import datetime
import pathlib
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

HEARTBEAT_PATH = pathlib.Path("/var/run/mybot/heartbeat.txt")
MAX_STALE_SECONDS = 300  # 5 minutes


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        try:
            raw = HEARTBEAT_PATH.read_text().strip()
            last = datetime.datetime.fromisoformat(raw)
            age = (datetime.datetime.utcnow() - last).total_seconds()
            if age <= MAX_STALE_SECONDS:
                body = f"OK age={int(age)}s last={raw}\n".encode()
                self.send_response(200)
            else:
                body = f"STALE age={int(age)}s last={raw}\n".encode()
                self.send_response(503)
        except Exception as e:
            # Heartbeat file missing, unreadable, or not a valid timestamp.
            body = f"ERROR {e}\n".encode()
            self.send_response(500)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress default Apache-style access logs


def start_health_server(port=8765):
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    # Daemon thread: serves requests without blocking the bot's main loop.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```
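Call start_health_server() once at bot startup, before your main loop begins. It runs in a daemon thread, so it does not block the main loop. It also keeps answering while the loop is stuck (on a blocked network call, say), which is what lets it report STALE, and it exits when the process does.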
Now you can do this from anywhere:
```bash
curl http://your-vps-ip:8765/health
# OK age=47s last=2026-05-27T03:14:22.104891
```
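If the bot is hung, you get a 503 with STALE age=612s. If it crashed and systemd has not restarted it yet, you get connection refused, because the health server lives inside the same process and dies with it. ERROR (a 500) means the server is up but could not read or parse the heartbeat file. This happens, for example, after a reboot has wiped /var/run and before the first cycle has finished.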
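If you would rather check from Python than with curl, say from another machine, here is a sketch that maps those outcomes to a single label. `probe` and `classify` are hypothetical helper names. They use only `urllib` from the standard library and rely on the status codes the handler above sends (200, 503, 500).

```python
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status from /health (or None for no answer) to a label."""
    if status is None:
        return "DOWN"    # connection refused or timed out: the process is gone
    if status == 200:
        return "OK"
    if status == 503:
        return "STALE"   # server alive, but the main loop stopped writing heartbeats
    return "ERROR"       # 500 (bad heartbeat file) or anything unexpected


def probe(url, timeout=5.0):
    """Fetch the health URL and return (label, body_text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status), resp.read().decode(errors="replace")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the body is still readable.
        return classify(err.code), err.read().decode(errors="replace")
    except (urllib.error.URLError, OSError):
        # Refused connection, DNS failure, or timeout: nothing answered.
        return classify(None), ""
```

Usage: `label, body = probe("http://your-vps-ip:8765/health")`. The `HTTPError` clause has to come first because `HTTPError` is a subclass of `URLError`, which is itself an `OSError`.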
Step 3: The Cron Job That Actually Tells You
The HTTP endpoint is for when you are actively checking. Cron is for when you are asleep.
Create a small shell script. I put mine at /usr/local/bin/check_bot_heartbeat.sh:
```bash
#!/bin/bash
HEARTBEAT="/var/run/mybot/heartbeat.txt"
MAX_AGE=300  # seconds
EMAIL="you@yourdomain.com"
BOT_NAME="WeatherBot"

if [ ! -f "$HEARTBEAT" ]; then
    echo "$BOT_NAME heartbeat file missing on $(hostname)" \
        | mail -s "[$BOT_NAME] HEARTBEAT MISSING" "$EMAIL"
    exit 1
fi

LAST_MOD=$(stat -c %Y "$HEARTBEAT")
NOW=$(date +%s)
AGE=$((NOW - LAST_MOD))

if [ "$AGE" -gt "$MAX_AGE" ]; then
    LAST_CONTENT=$(cat "$HEARTBEAT")
    echo "$BOT_NAME has not written a heartbeat in ${AGE}s. Last seen: $LAST_CONTENT" \
        | mail -s "[$BOT_NAME] STALE HEARTBEAT" "$EMAIL"
fi
```
Make it executable:
```bash
chmod +x /usr/local/bin/check_bot_heartbeat.sh
```
Then add to crontab (crontab -e):
```
*/5 * * * * /usr/local/bin/check_bot_heartbeat.sh
```
Every five minutes, cron checks whether the heartbeat file is fresh. If not, you get an email. That is the whole system. No SaaS. No monthly bill. No YAML config files for a monitoring stack you barely understand.
For the email side, I use mailutils on Ubuntu with a simple SMTP relay through my domain. If you do not have that set up, ssmtp or a SendGrid relay both work. That configuration is a separate post. For now, just know that the mail command has to be able to send email from your server for the cron alerting to work.
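The shell script's logic is small enough to mirror in Python, which makes the thresholds easier to reason about and test. Here is a sketch that assumes the same modification-time check as `stat -c %Y`. `heartbeat_state` and `worst_case_alert_delay` are hypothetical names, and the email step is left out.

```python
import os
import time


def heartbeat_state(path, max_age=300, now=None):
    """Return (state, age) where state is "missing", "stale", or "ok".

    Like the shell script, this checks the file's modification time
    (stat -c %Y), not the timestamp written inside the file.
    """
    now = time.time() if now is None else now
    try:
        last_mod = os.stat(path).st_mtime
    except FileNotFoundError:
        return "missing", None
    age = int(now - last_mod)
    return ("stale" if age > max_age else "ok"), age


def worst_case_alert_delay(max_age, cron_interval):
    """Rough upper bound, in seconds, from the last heartbeat to the alert:
    the file has to age past max_age, and the next cron run can be up to
    one full interval after that."""
    return max_age + cron_interval
```

With the settings above (MAX_AGE=300, cron every five minutes), `worst_case_alert_delay(300, 300)` is 600 seconds, which is consistent with the roughly ten-minute alert described below. The flip side is that MAX_AGE must be comfortably longer than one normal cycle plus its sleep. Otherwise a healthy but slow cycle will trigger false alarms.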
What This Catches
The two silent failures I mentioned in the TL;DR were both caused by a hung external API call that never timed out properly. The bot was technically running. Systemd showed it as active. The process was alive. But no cycles were completing, no trades were being evaluated, and the log had stopped updating.
Without the heartbeat, I would not have known until I manually SSHed in and checked. With it, I got an email 10 minutes after the hang started.
That is the whole value proposition. Not dashboards. Not uptime percentages. Just: is my bot doing the thing it is supposed to be doing right now, and if not, tell me before I lose a full trading session.
Integrating With systemd
One small addition worth making: if your bot is managed by systemd, you can use systemd's built-in watchdog to restart a hung service automatically instead of just emailing you. Add this to your service file:
```ini
[Service]
Restart=on-failure
WatchdogSec=600
NotifyAccess=main
```
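Full watchdog integration is more involved. Once WatchdogSec is set, the bot itself must send WATCHDOG=1 through sd_notify more often than every 600 seconds (from Python, the python-systemd package exposes this as systemd.daemon.notify). If the pings stop, systemd treats the service as hung, kills it, and restarts it if a Restart= policy is configured. If you never send pings at all, systemd will kill a perfectly healthy bot every ten minutes. The file-based heartbeat above covers the detection side with zero additional dependencies and works whether or not you are using systemd.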
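If you do go the watchdog route, the sd_notify protocol is simple enough that you may not need python-systemd at all. systemd passes the path of a Unix datagram socket in the NOTIFY_SOCKET environment variable, and the service sends plain-text messages such as WATCHDOG=1 to it. Below is a minimal stdlib-only sketch. `sd_notify` and `watchdog_ping_interval` are hypothetical helper names. Pinging at half of WATCHDOG_USEC follows the interval systemd's documentation recommends.

```python
import os
import socket


def sd_notify(state, socket_path=None):
    """Send a notification such as "WATCHDOG=1" to systemd.

    Returns False when not running under systemd (no NOTIFY_SOCKET),
    so the same bot code still works when started by hand.
    """
    if socket_path is None:
        socket_path = os.environ.get("NOTIFY_SOCKET", "")
    if not socket_path:
        return False
    if socket_path.startswith("@"):
        # A leading "@" means a Linux abstract-namespace socket.
        socket_path = "\0" + socket_path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), socket_path)
    return True


def watchdog_ping_interval():
    """Seconds between WATCHDOG=1 pings: half of WATCHDOG_USEC, or None if unset."""
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 1_000_000 / 2 if usec else None
```

In the bot, calling `sd_notify("WATCHDOG=1")` right after `write_heartbeat()` keeps the two signals meaning the same thing: a cycle really finished. The trade-off is that one normal cycle plus its sleep then has to fit inside the ping interval (300 seconds with WatchdogSec=600), or systemd will restart a healthy bot.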
The monitoring setup I just described costs nothing and takes maybe two hours to wire in. I have been running it on both bots since the second week of deployment. It is not sophisticated. It does not need to be. It just needs to tell me when something stops working, and it does that reliably.
If you want to see how this fits into the full bot architecture, the Weather Bot v2.1 and Econ Bot both ship with this pattern baked in. The source code bundle is at predictandprofit.gumroad.com.