Free Tier Limits of PagerDuty for Cert Expiry Monitoring

SSL/TLS certificates are the unsung heroes of secure communication. They ensure your users' data is encrypted, build trust, and are a non-negotiable part of modern web infrastructure. Yet despite that critical role, expired certificates remain a surprisingly common cause of outages, often leading to frantic, late-night debugging sessions and significant reputational damage.

The problem isn't usually a lack of awareness that certs expire; it's the challenge of consistent, reliable monitoring and timely alerting. Many engineering teams turn to incident management tools like PagerDuty to handle critical alerts. PagerDuty offers a free tier, which can be tempting for small teams or those just starting out with incident management. But when it comes to the specific, nuanced task of certificate expiry monitoring, how far can PagerDuty's free tier really take you? As engineers, we need to be honest about the trade-offs.

This article explores the practical limitations and hidden complexities you'll encounter if you try to build a robust certificate expiry monitoring system using only PagerDuty's free tier.

Understanding PagerDuty's Free Tier for Monitoring

PagerDuty's free tier is designed to help individuals or very small teams get started with basic incident management. It's generous in some ways, offering:

  • 1 User: A single user can receive notifications and manage incidents.
  • 1 Integration: You can connect one external service or tool to send events to PagerDuty. This is typically an Events API v2 integration or an email integration.
  • Email & Push Notifications: The primary notification channels available.
  • Basic Incident Management: You can create, acknowledge, and resolve incidents.

For a single person managing a handful of alerts from a single source, this can be quite effective. However, certificate expiry monitoring isn't just about receiving an alert; it's about proactively identifying the expiry risk and ensuring the right person gets notified with enough lead time.

The Core Challenge: PagerDuty is an Alerting Tool, Not a Monitoring Tool

This is the fundamental disconnect. PagerDuty excels at taking an event (like "certificate expiring soon") and transforming it into an actionable incident. What PagerDuty doesn't do, especially in its free tier, is the actual monitoring. It won't go out and check your certificates for you.

To use PagerDuty for cert expiry, you need a separate mechanism that:

  1. Discovers your certificates (domains, ports, internal services).
  2. Checks their expiry dates.
  3. Calculates the remaining validity period.
  4. Triggers an event to PagerDuty if an expiry threshold is crossed.
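
Steps 3 and 4 are small enough to sketch on their own. The following is a minimal illustration rather than a definitive implementation: `days_until_expiry` and `threshold_crossed` are hypothetical helper names, and the 30-day default is an assumed threshold. It relies on `ssl.cert_time_to_seconds` from the standard library and keeps all arithmetic in UTC.

```python
import datetime
import ssl


def days_until_expiry(not_after: str, now: datetime.datetime) -> float:
    """Return the fractional days between `now` and a certificate's notAfter.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware.
    """
    expiry_ts = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    expiry = datetime.datetime.fromtimestamp(expiry_ts, tz=datetime.timezone.utc)
    return (expiry - now).total_seconds() / 86400


def threshold_crossed(days_left: float, threshold_days: int = 30) -> bool:
    """True when the certificate is inside the alerting window (or expired)."""
    return days_left <= threshold_days


if __name__ == "__main__":
    now = datetime.datetime.now(datetime.timezone.utc)
    remaining = days_until_expiry("Jan  5 09:34:43 2018 GMT", now)
    print(f"days left: {remaining:.1f}, alert: {threshold_crossed(remaining)}")
```

Working in fractional days avoids rounding a certificate with a few hours left down to zero, and passing `now` in explicitly keeps the function easy to test.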

This "separate mechanism" is where the complexities and limitations of the free tier truly begin to show.

Option 1: Scripting Your Own Monitoring (The PagerDuty Free Tier Way)

Since PagerDuty won't monitor for you, your first instinct as an engineer might be to write a script. This is a common approach when trying to get by on free tiers and existing tools.

Building the Monitoring Script

You'll need a script that can connect to your services, retrieve certificate details, and then, if necessary, send an event to PagerDuty. Let's consider a basic Python script that checks a public domain's certificate and sends an event to PagerDuty's Events API v2.

import ssl
import socket
import datetime
import requests
import os

# PagerDuty configuration (replace with your actual integration key)
PAGERDUTY_ROUTING_KEY = os.getenv("PAGERDUTY_ROUTING_KEY", "YOUR_PAGERDUTY_ROUTING_KEY")
PAGERDUTY_EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"

# Threshold for expiry (e.g., 30 days)
EXPIRY_THRESHOLD_DAYS = 30

def check_ssl_certificate(hostname, port=443):
    """Checks the SSL certificate expiry for a given hostname."""
    try:
        context = ssl.create_default_context()
        with socket.create_connection((hostname, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as ssock:
                cert = ssock.getpeercert()
                not_after_timestamp = ssl.cert_time_to_seconds(cert['notAfter'])
                not_after_date = datetime.datetime.fromtimestamp(not_after_timestamp)
                return not_after_date
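    except ssl.SSLCertVerificationError as e:
        # An already-expired (or otherwise untrusted) certificate fails
        # verification during the handshake and lands here, so the caller's
        # "has expired" branch is never reached for it.
        print(f"Certificate verification failed for {hostname}:{port} - {e}")
        return None
    except Exception as e:
        # Network errors, timeouts, DNS failures, and so on.
        print(f"Error checking {hostname}:{port} - {e}")
        return None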

def send_pagerduty_event(summary, severity, source, component, group, custom_details=None):
    """Sends an event to PagerDuty's Events API v2."""
    if not PAGERDUTY_ROUTING_KEY or PAGERDUTY_ROUTING_KEY == "YOUR_PAGERDUTY_ROUTING_KEY":
        print("PagerDuty routing key not set. Skipping PagerDuty event.")
        return

    payload = {
        "routing_key": PAGERDUTY_ROUTING_KEY,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "severity": severity,
            "source": source,
            "component": component,
            "group": group,
            "custom_details": custom_details if custom_details else {}
        }
    }
    try:
        response = requests.post(PAGERDUTY_EVENTS_API_URL, json=payload, timeout=10)
        response.raise_for_status()
        print(f"PagerDuty event sent successfully: {response.json()}")
    except requests.exceptions.RequestException as e:
        print(f"Error sending PagerDuty event: {e}")

if __name__ == "__main__":
    # List of domains to monitor (you'd likely load this from a config file or API)
    domains_to_monitor = [
        "example.com",
        "another-service.net",
        "sub.domain.org"
    ]

    for domain in domains_to_monitor:
        expiry_date = check_ssl_certificate(domain)
        if expiry_date:
            days_left = (expiry_date - datetime.datetime.now()).days
            print(f"Certificate for {domain} expires on {expiry_date}. Days left: {days_left}")

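            # .days floors the timedelta, so a certificate with under 24 hours
            # left reports 0; treat that as "expiring", not "expired".
            if 0 <= days_left <= EXPIRY_THRESHOLD_DAYS: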
                summary = f"Certificate for {domain} expires in {days_left} days!"
                details = {
                    "domain": domain,
                    "expiry_date": expiry_date.isoformat(),
                    "days_left": days_left
                }
                send_pagerduty_event(
                    summary=summary,
                    severity="warning",
                    source="cert-expiry-monitor",
                    component="ssl-certificate",
                    group="security",
                    custom_details=details
                )
            elif days_left <= 0:
                summary = f"Certificate for {domain} has expired!"
                details = {
                    "domain": domain,
                    "expiry_date": expiry_date.isoformat(),
                    "days_left": days_left
                }
                send_pagerduty_event(
                    summary=summary,
                    severity="critical",
                    source="cert-expiry-monitor",
                    component="ssl-certificate",
                    group="security",
                    custom_details=details
                )
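
One detail worth considering if the script runs on a schedule: the Events API v2 accepts an optional `dedup_key`, and trigger events that share a key are folded into the same open alert instead of opening a new one on every run. A minimal sketch, assuming a hypothetical `build_cert_event` helper and a `cert-expiry-<domain>` key format chosen purely for illustration:

```python
def build_cert_event(domain: str, routing_key: str, action: str = "trigger",
                     summary: str = "", severity: str = "warning") -> dict:
    """Build an Events API v2 request body with a per-domain dedup_key.

    The helper name and key format are assumptions for this example.
    """
    event = {
        "routing_key": routing_key,
        "event_action": action,  # "trigger", "acknowledge", or "resolve"
        "dedup_key": f"cert-expiry-{domain}",
    }
    if action == "trigger":
        # Only trigger events carry a payload; resolve/acknowledge need just
        # the routing key, the action, and the dedup_key.
        event["payload"] = {
            "summary": summary or f"Certificate for {domain} is nearing expiry",
            "severity": severity,
            "source": "cert-expiry-monitor",
        }
    return event


if __name__ == "__main__":
    print(build_cert_event("example.com", "YOUR_PAGERDUTY_ROUTING_KEY"))
    print(build_cert_event("example.com", "YOUR_PAGERDUTY_ROUTING_KEY", action="resolve"))
```

Posting the `resolve` variant with the same key once the certificate has been renewed closes the alert, so the single free-tier user isn't left clearing stale incidents by hand.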

The monitoring script works for a single user and a single PagerDuty integration. But consider the implications:

Where to Run the Script?

This is your first major hurdle. The script needs to run somewhere reliably and periodically.

  • Your Laptop: Unreliable. Your laptop isn't always on, connected, or stable.
  • A Dedicated Server/VM: Now you're maintaining infrastructure purely