Let's Encrypt expiry monitoring without certbot --staple
Let's Encrypt made HTTPS accessible to everyone, and its widespread adoption reflects that. One of its core design choices, 90-day certificate validity, makes proactive monitoring critical. certbot and other ACME clients automate renewals, but making sure those renewals actually happen, are deployed correctly, and are publicly visible is still your responsibility.
Many engineers, when considering Let's Encrypt expiry monitoring, might immediately think of certbot --staple. It's a convenient feature, but it's far from a universal solution. In this article, we'll dive into why certbot --staple might not be enough and explore more robust, practical methods for monitoring your Let's Encrypt certificates, especially for non-standard setups or when you need true public-facing validation.
The certbot --staple Command: A Quick Review and Its Limitations
First, let's be precise about what certbot --staple does. The flag as documented by Certbot is --staple-ocsp, an installer option that turns on OCSP stapling in the Apache or Nginx configuration that Certbot manages. It does not check expiry, and it does not change the certificate itself: stapling is something the web server does at handshake time. The expiry logic lives in certbot renew, which looks at every certificate Certbot manages and attempts to renew any that are inside their renewal window. certbot renew --dry-run exercises the same path against the staging environment without saving any certificates to disk.
More importantly for monitoring, certbot typically integrates with your system's scheduler (like cron or systemd timers) to run certbot renew periodically. If you've installed Certbot using a package manager, this setup is often automatic. The monitoring aspect here is implicit: if certbot renew fails (e.g., due to ACME challenge issues, rate limits, or file permissions), it might log an error locally, and potentially send an email to root, depending on your system's configuration.
The core limitation of this approach for monitoring is that it's an internal check.
* Local-only: It only verifies that the certificate files on the local filesystem are up to date.
* No deployment validation: It doesn't tell you if your web server (Nginx, Apache, etc.) has actually loaded the new certificate. A successful renewal followed by a failed server reload means your users still see an expired certificate.
* Limited scope: It only monitors certificates managed directly by that specific Certbot instance. If you use acme.sh or lego, manage certificates manually, or rely on certificates issued by cloud providers (e.g., AWS ACM or Google-managed certificates on a Cloud Load Balancer), Certbot on your web server won't know anything about them.
* No public visibility check: Most crucially, it doesn't verify the certificate that your users actually see when they connect to your service from the internet. This is the ultimate source of truth.
For these reasons, relying solely on certbot --staple or similar internal client-side checks is often insufficient for robust production monitoring.
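The deployment gap described above can be made concrete by comparing the certificate on disk with the one the server actually presents. The following is a minimal sketch, assuming the openssl command-line tool is installed; the function names, host name, and path are illustrative and not part of Certbot.

```bash
#!/usr/bin/env bash
# Sketch: compare the certificate on disk with the one actually being served.
# Function names, host names and paths are illustrative assumptions.

# Print the SHA-256 fingerprint of a PEM certificate read from stdin.
cert_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Print (as PEM) the leaf certificate a server presents for a host name (SNI).
served_cert() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" \
    </dev/null 2>/dev/null | openssl x509
}

# Return 0 only if both fingerprints are non-empty and identical.
same_cert() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Possible usage (not executed here):
#   on_disk="$(cert_fingerprint < /etc/letsencrypt/live/example.com/cert.pem)"
#   served="$(served_cert example.com | cert_fingerprint)"
#   same_cert "$on_disk" "$served" || echo "example.com is serving a stale certificate"
```

A mismatch right after a renewal usually points at a reload that never happened, which is exactly the case a local-only check cannot see.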
Why Monitor Externally? The Case for Public-Facing Checks
The goal of certificate monitoring isn't just to know if a file on a server is renewed; it's to prevent your public-facing services from going down due to an expired certificate. An expired certificate means downtime, broken trust, and a poor user experience.
External monitoring validates the entire chain:
1. Successful Renewal: The ACME client successfully obtained a new certificate.
2. Correct Deployment: Your web server or load balancer picked up the new certificate.
3. Correct Configuration: Your server is serving the correct certificate, including intermediate certificates.
4. Public Accessibility: The certificate is reachable and valid from the internet, across various network conditions.
If any of these steps fail, external monitoring will catch it. Internal checks, by their nature, cannot.
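As a rough illustration of a public-facing check, the sketch below asks a server for its certificate over the network and reports how many days remain. It assumes the openssl CLI and GNU date (for date -d); the function names are hypothetical, and for a true external view it should run from a machine outside your own network.

```bash
#!/usr/bin/env bash
# Sketch: days until the publicly served certificate expires.
# Assumes the openssl CLI and GNU date; function names are illustrative.

# Print the notAfter date of the certificate served at host:port.
served_enddate() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "${host}:${port}" -servername "$host" \
    </dev/null 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2
}

# Whole days between two epoch timestamps, truncated toward zero.
days_between() {
  local now="$1" expiry="$2"
  echo $(( (expiry - now) / 86400 ))
}

# Print the days left for a host; return 1 if the certificate cannot be read.
days_left() {
  local enddate expiry
  enddate="$(served_enddate "$@")"
  [ -n "$enddate" ] || return 1
  expiry="$(date -u -d "$enddate" +%s)" || return 1
  days_between "$(date -u +%s)" "$expiry"
}

# Possible usage (not executed here):
#   left="$(days_left example.com)" || echo "could not fetch certificate"
#   [ "${left:-0}" -ge 14 ] || echo "example.com expires in ${left:-?} days"
```

Treating a failed fetch as an alert in its own right is a deliberate choice here: an unreachable endpoint is as much a problem for users as an expired certificate.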
Method 1: Leveraging ACME Client Hooks (for acme.sh users)
If you're using an ACME client other than certbot, like acme.sh, you often have more granular control over what happens post-renewal via hooks. This is still an internal check, but it allows for custom actions.
acme.sh offers the --renew-hook and --deploy-hook options. The renew hook is a command that runs after each successful renewal, while the deploy hook names a deploy script that acme.sh uses to push the certificate to a target. You can use the renew hook to trigger alerts or update an external system.
Example: Sending an email after renewal
```bash
# Example acme.sh command with a renew hook.
# The message avoids apostrophes and "!" so the nested quoting stays valid.
acme.sh --issue -d example.com -d www.example.com \
  --webroot /var/www/html \
  --renew-hook "echo 'LE certificate for example.com renewed successfully' | mail -s 'LE Cert Renewed' admin@yourdomain.com" \
  --reloadcmd "systemctl reload nginx"
```
In this example, after example.com is renewed, an email is sent to admin@yourdomain.com. The reloadcmd ensures Nginx picks up the new certificate.
Pitfalls:
* Still internal: This only confirms the renewal and local reload command execution, not that the certificate is actually publicly visible and valid.
* Requires local mail setup: For the mail command to work, your server needs a working MTA (Mail Transfer Agent) configured.
* Custom scripting: You're responsible for writing and maintaining the hook script. If it fails, you might not know.
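One way to soften the last two pitfalls is to keep the hook in its own file and have it leave a trace that something else can check, so a hook that silently stops running becomes visible. The sketch below is only an illustration: STAMP_DIR, the function names, and the idea of passing the domain as an argument are all assumptions, not acme.sh features.

```bash
#!/usr/bin/env bash
# Sketch of a renewal "heartbeat" written by a hook script.
# STAMP_DIR and the function names are hypothetical.

STAMP_DIR="${STAMP_DIR:-/var/lib/cert-renewals}"

# Record the time (epoch seconds) of a successful renewal for a domain.
record_renewal() {
  local domain="$1"
  mkdir -p "$STAMP_DIR" && date -u +%s > "${STAMP_DIR}/${domain}.stamp"
}

# Return 0 if the domain was renewed within the last max_age_days days,
# 1 if the stamp is missing, empty, or too old.
renewed_recently() {
  local domain="$1" max_age_days="$2" stamp now
  stamp="$(cat "${STAMP_DIR}/${domain}.stamp" 2>/dev/null)" || return 1
  [ -n "$stamp" ] || return 1
  now="$(date -u +%s)"
  [ $(( now - stamp )) -le $(( max_age_days * 86400 )) ]
}

# Possible usage (not executed here):
#   in the hook script:     record_renewal example.com
#   in a separate cron job: renewed_recently example.com 75 || echo "no recent renewal"
```

The check still runs on the same machine, so it complements rather than replaces an external check.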
Method 2: Scripting with openssl for Local Checks
You can write a simple script to check the expiry date of certificate files on your server. This is a step up from just relying on Certbot's logs, as it gives you explicit control over the check.
Example: Checking a specific certificate file
```bash
# Get the expiry date of a local certificate file
openssl x509 -in /etc/letsencrypt/live/example.com/cert.pem -noout -enddate
# Sample output: notAfter=Jan 15 12:34:56 2024 GMT
```
You can parse this output, convert it to a timestamp, and compare it against the current date to calculate the remaining days.
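When only a pass/fail answer is needed, the date parsing can be skipped: openssl x509 has a -checkend option that takes a number of seconds and exits non-zero if the certificate expires within that window. A minimal sketch, with an illustrative function name and default threshold:

```bash
#!/usr/bin/env bash
# Sketch: threshold check without parsing dates, using `openssl x509 -checkend`.
# The function name and the 14-day default are illustrative assumptions.

# Return 0 if the certificate is still valid `days` days from now,
# 1 if it expires within that window, 2 if the file cannot be read.
cert_valid_for_days() {
  local cert_file="$1" days="${2:-14}"
  [ -r "$cert_file" ] || return 2
  openssl x509 -in "$cert_file" -noout -checkend $(( days * 86400 )) >/dev/null
}

# Possible usage (not executed here):
#   cert_valid_for_days /etc/letsencrypt/live/example.com/cert.pem 14 \
#     || echo "example.com certificate needs attention"
```

This trades the exact number of remaining days for portability, since it does not depend on how the local date command parses timestamps.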
Script idea:
```bash
#!/bin/bash
# Configuration
ALERT_THRESHOLD_DAYS=14
ALERT_EMAIL="your-email@example.com"
LE_CERTS_DIR="/etc/letsencrypt/live"
for DOMAIN_DIR in "${LE