Deploying Your Python Trading Bot as a systemd Service on Ubuntu: The Production Setup No Tutorial Covers
TL;DR / Key Takeaways
- A terminal session is not a deployment. If the SSH connection drops, your bot stops trading.
- systemd gives you automatic restart on failure, log management via journald, and proper process isolation without any third-party tools.
- Environment variables for API keys and key paths belong in a systemd EnvironmentFile, not a .env committed to your repo.
- A RestartSec delay combined with StartLimitIntervalSec prevents a crash-looping bot from hammering the Kalshi API during an outage.
Every bot tutorial ends at the same place: python bot.py. The code runs. The trades execute. The tutorial is done.
What the tutorial does not cover is what happens at 2 AM when your SSH session times out, the terminal closes, and the bot silently stops trading. Or when the bot crashes on an unexpected API response and stays down until you notice the next morning. Or when the server reboots after a security patch and nobody restarts the process.
Running a trading bot in production means running it as a service. On Ubuntu, that means systemd.
Why systemd and not screen, tmux, or nohup
Screen and tmux keep a session alive across disconnects. They work fine for development. For production, they have the wrong failure model: if the Python process crashes, nothing restarts it. You have to notice and intervene manually.
nohup keeps the process running after logout but has the same problem. No automatic restart, no structured logging, no dependency management.
systemd manages service lifecycle as a first-class concern. Crashes trigger automatic restart according to a configurable policy. Logs go to journald with timestamps, severity levels, and log rotation. The service starts automatically on boot. You can check status, tail logs, and stop the service with standard system tools. No extra dependencies, no hidden state in a terminal multiplexer.
Creating the service user
Do not run your trading bot as root. Create a dedicated system user with no login shell and no home directory.
sudo useradd --system --no-create-home --shell /usr/sbin/nologin tradingbot
This user will own the bot process. It cannot log in interactively. If the bot is compromised, the blast radius is limited to what this user can access.
Give this user read access to the project directory and the private key:
sudo mkdir -p /opt/predictandprofit
sudo chown -R tradingbot:tradingbot /opt/predictandprofit
sudo chmod 750 /opt/predictandprofit
# Private key in a restricted directory
sudo mkdir -p /etc/predictandprofit/keys
sudo cp ~/.kalshi/kalshi_private_key.pem /etc/predictandprofit/keys/
sudo chown tradingbot:tradingbot /etc/predictandprofit/keys/kalshi_private_key.pem
sudo chmod 600 /etc/predictandprofit/keys/kalshi_private_key.pem
The environment file
API keys, key IDs, database paths, and feature flags belong in an environment file that systemd loads at service start. This keeps secrets out of your unit file and out of version control.
Create /etc/predictandprofit/bot.env:
sudo nano /etc/predictandprofit/bot.env
Contents:
KALSHI_API_KEY_ID=your-key-id-from-dashboard
KALSHI_PRIVATE_KEY_PATH=/etc/predictandprofit/keys/kalshi_private_key.pem
KALSHI_ENV=production
DB_PATH=/opt/predictandprofit/trades.db
MIN_CONFIDENCE=0.30
MIN_ENSEMBLE_EDGE=0.10
LOG_LEVEL=INFO
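On the Python side, systemd injects these variables into the process environment before ExecStart runs, so the bot reads them with plain os.environ — no dotenv library required. A sketch, assuming the variable names above (the dict layout and defaults are illustrative):

```python
import os

def load_config() -> dict:
    """Read configuration injected by systemd's EnvironmentFile."""
    return {
        # Required: KeyError here crashes the service fast, which is what you want.
        "api_key_id": os.environ["KALSHI_API_KEY_ID"],
        "private_key_path": os.environ["KALSHI_PRIVATE_KEY_PATH"],
        # Optional, with defaults mirroring the file above.
        "db_path": os.environ.get("DB_PATH", "/opt/predictandprofit/trades.db"),
        "min_confidence": float(os.environ.get("MIN_CONFIDENCE", "0.30")),
        "min_ensemble_edge": float(os.environ.get("MIN_ENSEMBLE_EDGE", "0.10")),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }
```

Letting a missing required variable raise KeyError is deliberate: combined with Restart=on-failure, a bad environment file shows up as a failed service, not a bot trading with wrong settings.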
Lock it down:
sudo chown root:tradingbot /etc/predictandprofit/bot.env
sudo chmod 640 /etc/predictandprofit/bot.env
The tradingbot user can read it; apart from root, nobody else can. The file never touches your Git repository.
The systemd unit file
Create /etc/systemd/system/predictandprofit.service:
[Unit]
Description=Predict and Profit Kalshi Weather Trading Bot
Documentation=https://predictandprofit.io
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=tradingbot
Group=tradingbot
WorkingDirectory=/opt/predictandprofit
EnvironmentFile=/etc/predictandprofit/bot.env
ExecStart=/opt/predictandprofit/venv/bin/python /opt/predictandprofit/bot.py
Restart=on-failure
RestartSec=30
StartLimitIntervalSec=300
StartLimitBurst=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=predictandprofit
# Basic sandboxing: private /tmp, no privilege escalation
PrivateTmp=true
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
A few decisions worth explaining.
After=network-online.target and Wants=network-online.target delay the service start until the network is actually up. Without this, the bot can start before the network interface is ready and immediately fail on the first API call, then enter a crash loop before the system finishes booting.
Restart=on-failure triggers a restart only when the process exits with a non-zero code or is killed by an unclean signal. If you stop the service manually with systemctl stop, it stays stopped. Restart=always would fight you every time you deliberately shut it down.
RestartSec=30 waits 30 seconds between restart attempts. If Kalshi's API is down or returning errors that crash the bot, you do not want it hammering the endpoint every second.
StartLimitIntervalSec=300 and StartLimitBurst=5 together mean: if the service fails to start 5 times within 5 minutes, systemd stops trying and marks it as failed. This prevents a fundamentally broken configuration from spinning forever. You get an alert from your monitoring (if you have any) or you notice it when you check status. (Recent systemd releases document these two settings as [Unit] options; they are still accepted in [Service] for backwards compatibility.)
PrivateTmp=true gives the process its own /tmp namespace. NoNewPrivileges=true prevents any child process from gaining elevated privileges via setuid. Neither of these affects normal bot operation and both reduce the attack surface.
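One behavior that pairs with Restart=on-failure: systemctl stop delivers SIGTERM. If the bot should flush state before exiting — cancel resting orders, close the SQLite connection — it needs to catch that signal, finish the current cycle, and exit 0 deliberately. A sketch of the pattern; Bot, trade_cycle, and cleanup are hypothetical names, not the tutorial's actual code:

```python
import signal
import sys
import time

class Bot:
    """Shutdown-aware main loop; trade_cycle and cleanup are illustrative stubs."""

    def __init__(self, poll_interval: float = 30.0):
        self.poll_interval = poll_interval
        self.running = True
        # systemctl stop sends SIGTERM; flip a flag instead of dying mid-trade.
        signal.signal(signal.SIGTERM, self._handle_sigterm)

    def _handle_sigterm(self, signum, frame):
        self.running = False

    def run(self):
        while self.running:
            self.trade_cycle()          # hypothetical: one poll/decide/trade pass
            if self.running:
                time.sleep(self.poll_interval)
        self.cleanup()                  # hypothetical: flush SQLite, log shutdown
        sys.exit(0)                     # exit 0: Restart=on-failure stays stopped
```

Exiting 0 matters: a clean exit tells systemd the stop was intentional, so the restart policy does not kick in.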
Enable and start
sudo systemctl daemon-reload
sudo systemctl enable predictandprofit.service
sudo systemctl start predictandprofit.service
Check that it came up:
sudo systemctl status predictandprofit.service
You should see Active: active (running) and recent log output.
Working with logs
All stdout and stderr from the bot process goes to journald automatically because of the StandardOutput=journal and StandardError=journal directives.
# Follow live logs
sudo journalctl -u predictandprofit.service -f
# Last 100 lines
sudo journalctl -u predictandprofit.service -n 100
# Logs since this morning
sudo journalctl -u predictandprofit.service --since today
# Filter by priority (err and above)
sudo journalctl -u predictandprofit.service -p err
journald handles log rotation automatically. You do not need a separate logrotate config for the service logs. If you also log to SQLite or a file inside /opt/predictandprofit, handle rotation for those separately.
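One detail that makes the -p err filter actually useful: journald parses a leading "&lt;N&gt;" syslog-priority prefix on each line a service writes (SyslogLevelPrefix= is on by default). A Python logging formatter can emit that prefix so the bot's ERROR-level messages land at err priority instead of the default info. A sketch:

```python
import logging
import sys

# Map Python logging levels to syslog priorities. journald strips the "<N>"
# prefix and records the line at that priority, so `journalctl -p err` works.
_SYSLOG_PRIORITY = {
    logging.CRITICAL: 2,  # crit
    logging.ERROR: 3,     # err
    logging.WARNING: 4,   # warning
    logging.INFO: 6,      # info
    logging.DEBUG: 7,     # debug
}

class JournaldFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        prio = _SYSLOG_PRIORITY.get(record.levelno, 6)
        return f"<{prio}>{record.name}: {record.getMessage()}"

handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(JournaldFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
```

Without the prefix, every line the bot prints arrives at the same priority and -p err returns nothing.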
Handling the virtual environment
Your bot almost certainly runs in a Python virtual environment. The ExecStart line in the unit file points to the Python binary inside the venv:
ExecStart=/opt/predictandprofit/venv/bin/python /opt/predictandprofit/bot.py
This is cleaner than activating the venv in a wrapper script because the activation step is implicit and cannot be skipped. There is no shell to source, no path manipulation, no environment variable confusion. The interpreter is exactly the one the venv was built against.
Create the venv as the tradingbot user or transfer ownership after creation:
cd /opt/predictandprofit
sudo python3 -m venv venv  # the directory is owned by tradingbot, so sudo is needed here
sudo chown -R tradingbot:tradingbot venv
sudo -u tradingbot venv/bin/pip install -r requirements.txt
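A cheap startup check catches the classic regression where an edited ExecStart points at the system Python instead of the venv: log which interpreter the process is actually running under. A small sketch (the helper name is mine):

```python
import logging
import sys

def log_runtime_info() -> bool:
    """Log the interpreter path at startup so a wrong ExecStart is obvious in the journal."""
    # In a venv, sys.prefix differs from the base interpreter's prefix.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    logging.info("python=%s prefix=%s venv=%s", sys.executable, sys.prefix, in_venv)
    if not in_venv:
        logging.warning("not running inside a virtual environment")
    return in_venv
```

One log line per boot, and the journal tells you exactly which interpreter the service used.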
Testing the restart policy
Before you trust the restart policy in production, test it:
# Get the PID of the running bot
sudo systemctl status predictandprofit.service | grep "Main PID"
# Kill it hard
sudo kill -9 <PID>
# Watch systemd bring it back after 30 seconds
sudo journalctl -u predictandprofit.service -f
You should see the exit logged, the 30-second pause, and then a fresh start entry. If you do not see that, check that Restart=on-failure is set correctly and that systemctl daemon-reload was run after your last edit to the unit file.
The difference this makes
Before systemd, my bot ran in a tmux session on a VM. It worked until it did not. A kernel update triggered an automatic reboot. The tmux session came back, but the bot was not configured to auto-start inside it. It sat stopped for four hours before I noticed.
With systemd, that scenario does not happen. The service is declared as a system dependency. It starts when the machine starts. When it fails, it logs the failure and tries again. When it exceeds the retry limit, it stays failed and waits for human attention rather than spinning in a crash loop.
For a system designed to trade without supervision, that is the correct behavior.
The full bot source code, including the systemd unit template and deployment scripts, is part of the Predict & Profit package.
Frequently Asked Questions
Q: Why not use Docker instead of a systemd service?
A: Docker is a reasonable choice for more complex deployments. For a single Python bot on a dedicated VM, it adds complexity without clear benefit. systemd is already present, requires no additional installation, and integrates directly with the OS logging and process management layer.
Q: Can I run both the Kalshi bot and the Alpaca bot as separate services?
A: Yes. Create two unit files with separate SyslogIdentifier values, separate environment files, and separate working directories. Each service manages its own lifecycle independently. If one crashes, the other continues running.
Q: How do I update the bot without downtime?
A: Deploy the new code to /opt/predictandprofit, then run sudo systemctl restart predictandprofit.service. The restart is fast. The only window where you are not trading is the few seconds between the old process stopping and the new one initializing.