PagerDuty CloudWatch integration

You can send your own custom payload to the PagerDuty CloudWatch integration from a Lambda function instead of from a CloudWatch alarm. PagerDuty does not document the internals, but if you publish a custom message to an SNS topic that has an HTTPS subscription to PagerDuty, and the message follows the simple rules below, the event will appear in PagerDuty.

PagerDuty Integration Config:

  • "Derive name from" should be set to "Alarm Description".

  • If it is left at the default it will not work, because the default parses fields such as Trigger.Statistic to generate the name.

SNS Subject:

  • The message subject is important: it must start with

ALARM: 
# Note the space after the colon
  • It doesn't matter what you put after the colon; PagerDuty does not process it and it is not visible anywhere in PagerDuty.

  • The alarm status (in this case ALARM) must match the NewStateValue in the SNS message body, or the message will be discarded.

  • You can also clear the incident in PagerDuty by following the same rules and replacing ALARM with OK.
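
The subject rules above can be captured in a small helper. This is only a sketch, not PagerDuty's own logic: the name build_subject and the ValueError checks are assumptions made for illustration. The truncation reflects the SNS requirement that a subject be shorter than 100 characters.

```python
import json

VALID_STATES = ("ALARM", "OK")


def build_subject(state, message_body, text=""):
    """Return an SNS subject that agrees with NewStateValue in message_body.

    build_subject is a hypothetical helper, not part of boto3 or PagerDuty.
    """
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {VALID_STATES}, got {state!r}")
    body = json.loads(message_body)
    if body.get("NewStateValue") != state:
        raise ValueError("subject state does not match NewStateValue in the body")
    # The space after the colon is required; the text after it is not shown in PD.
    subject = f"{state}: {text}"
    # SNS subjects must be shorter than 100 characters.
    return subject[:99]


body = json.dumps({"NewStateValue": "ALARM", "foo": "bar"})
print(build_subject("ALARM", body, "disk full"))
```

Checking the subject against the body before publishing catches a mismatch locally, rather than having the message silently discarded.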

SNS Message:

  • The integration is very strict when it parses the JSON message; any slight syntax error will cause the message to be discarded.

  • You can put anything else you want into the JSON payload and it will be visible in Pagerduty.

  • A minimal message looks like this:

{
  "NewStateValue": "ALARM",
  "foo": "bar"
}
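
Because the parser is strict, it is safer to build the body from a dictionary and serialize it than to write the JSON by hand. A minimal sketch, assuming a hypothetical helper named build_message; passing allow_nan=False makes json.dumps raise on NaN or Infinity rather than emit tokens that are not valid JSON.

```python
import json


def build_message(state, **extra):
    """Serialize a minimal message body; build_message is a hypothetical helper."""
    body = {"NewStateValue": state}
    body.update(extra)  # any extra keys become visible in PagerDuty
    # allow_nan=False raises ValueError instead of writing NaN/Infinity,
    # which a strict JSON parser would reject.
    return json.dumps(body, allow_nan=False)


print(build_message("ALARM", foo="bar"))
```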

For comparison, this is what SNS sends to PagerDuty for a real CloudWatch alarm:

{
  "Type" : "Notification",
  "MessageId" : "c2228c71-f550-5e3d-b92c-d7dada9f6d76",
  "TopicArn" : "arn:aws:sns:ap-southeast-1:003422198502:testbc",
  "Subject" : "ALARM: \"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c...\" in Asia Pacific (Singapore)",
  "Message" : "{\"AlarmName\":\"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"AlarmDescription\":\"DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.\",\"AWSAccountId\":\"003422198502\",\"AlarmConfigurationUpdatedTimestamp\":\"2022-09-26T04:41:30.103+0000\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).\",\"StateChangeTime\":\"2022-09-26T04:41:51.610+0000\",\"Region\":\"Asia Pacific (Singapore)\",\"AlarmArn\":\"arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"OldStateValue\":\"INSUFFICIENT_DATA\",\"OKActions\":[],\"AlarmActions\":[\"arn:aws:sns:ap-southeast-1:003422198502:testbc\",\"arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e\"],\"InsufficientDataActions\":[],\"Trigger\":{\"MetricName\":\"CPUUtilization\",\"Namespace\":\"AWS/ECS\",\"StatisticType\":\"Statistic\",\"Statistic\":\"AVERAGE\",\"Unit\":\"Percent\",\"Dimensions\":[{\"value\":\"testservice\",\"name\":\"ServiceName\"},{\"value\":\"test\",\"name\":\"ClusterName\"}],\"Period\":60,\"EvaluationPeriods\":1,\"DatapointsToAlarm\":1,\"ComparisonOperator\":\"LessThanThreshold\",\"Threshold\":11.700000000000001,\"TreatMissingData\":\"breaching\",\"EvaluateLowSampleCountPercentile\":\"\"}}",
  "Timestamp" : "2022-09-26T04:41:51.652Z",
  "SignatureVersion" : "1",
  "Signature" : "Zr8NlG6+KlEfOcj1ZS96BU4Z3K3aKWpJpf8pWc9/u84rbG6Q5kPdqJEY0jiLK4WCbEwmrZFols/ULvKB/W0Z5goBnyQmMlW7XIxpDIoU7I4aGd9XvQNyDed/TEUQ3IK280PerWmBRPPsxgTKN48emazGbch5Ea84DThT/tpw8L98KvC0yzgV04mB2fPgXGdytoRupn/bYitwcgTkkccynzHFHDAWCQkhcYql/wCt41eANLtIAfbdg02uKVs44LPwcoiJv5fO/jo/qMOQZd7i2xNBh6yD9Vn8kkNE6FCmEiIzRmiiOA6sqB9HZB/xQueBhJz/kboyR/Qe6IMpcjb21A==",
  "SigningCertURL" : "https://sns.ap-southeast-1.amazonaws.com/SimpleNotificationService-56e67fcb41f6fec09b0196692625d385.pem",
  "UnsubscribeURL" : "https://sns.ap-southeast-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:ap-southeast-1:003422198502:testbc:894babc8-8186-4b49-b68d-ff18e204e59a"
}

The Message field extracted from the payload above, cleaned up and pretty-printed:

{
    "AlarmName": "2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
    "AlarmDescription": "DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.",
    "AWSAccountId": "003422198502",
    "AlarmConfigurationUpdatedTimestamp": "2022-09-26T04:41:30.103+0000",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).",
    "StateChangeTime": "2022-09-26T04:41:51.610+0000",
    "Region": "Asia Pacific (Singapore)",
    "AlarmArn": "arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
    "OldStateValue": "INSUFFICIENT_DATA",
    "OKActions": [],
    "AlarmActions": ["arn:aws:sns:ap-southeast-1:003422198502:testbc", "arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e"],
    "InsufficientDataActions": [],
    "Trigger": {
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/ECS",
        "StatisticType": "Statistic",
        "Statistic": "AVERAGE",
        "Unit": "Percent",
        "Dimensions": [{
            "value": "testservice",
            "name": "ServiceName"
        }, {
            "value": "test",
            "name": "ClusterName"
        }],
        "Period": 60,
        "EvaluationPeriods": 1,
        "DatapointsToAlarm": 1,
        "ComparisonOperator": "LessThanThreshold",
        "Threshold": 11.700000000000001,
        "TreatMissingData": "breaching",
        "EvaluateLowSampleCountPercentile": ""
    }
}
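
The Message field in the SNS envelope is itself a JSON string, so reading it takes two decoding passes. A minimal sketch; extract_alarm is a hypothetical helper name, and the shortened envelope below is a made-up stand-in for the full payload.

```python
import json


def extract_alarm(envelope_json):
    """Decode the SNS envelope, then decode the JSON string inside Message."""
    envelope = json.loads(envelope_json)
    return json.loads(envelope["Message"])


# Shortened stand-in for a real envelope (illustrative values only).
envelope_json = json.dumps({
    "Type": "Notification",
    "Subject": "ALARM: example",
    "Message": json.dumps({
        "NewStateValue": "ALARM",
        "Trigger": {"Statistic": "AVERAGE"},
    }),
})

alarm = extract_alarm(envelope_json)
print(json.dumps(alarm, indent=4))  # pretty-print the inner message
```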

This code publishes a custom alarm to PagerDuty through the SNS topic.

import json

import boto3

sns = boto3.client('sns')
topic_arn = 'arn:aws:sns:us-east-1:123123123123:pd-sns'
# "Derive name from" must be set to "Alarm Description" in PagerDuty
# The alarm type must be one of [ ALARM | OK ]
alarm_type = "ALARM"  # To create a new incident
# alarm_type = "OK"  # To clear an existing incident
alarm_message = "This is an alarm about an alarm"
# The alarm message in the subject field is not parsed and is not visible in PD
# Only the AlarmDescription from the message body is used.
subject = f"{alarm_type}: {alarm_message}"  # The space after the : is important
message = {
    "AlarmDescription": "other description",
    "NewStateValue": alarm_type,
    # The keys above must be left unchanged
    # Put any JSON data you want here
    "foo": "bar"
}
message_str = json.dumps(message)
response = sns.publish(
    TopicArn=topic_arn,
    MessageStructure="string",
    Message=message_str,
    Subject=subject
)

This is a small tool that was run behind ngrok, with the SNS HTTPS subscription pointed at it, to inspect the SNS content of a CloudWatch alarm payload.

"""
Very simple HTTP server in python for logging requests
Usage::
    ./server.py [<port>]
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging

class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        logging.info("GET request,\nPath: %s\nHeaders:\n%s\n", str(self.path), str(self.headers))
        self._set_response()
        self.wfile.write("GET request for {}".format(self.path).encode('utf-8'))

    def do_POST(self):
        content_length = int(self.headers['Content-Length']) # <--- Gets the size of data
        post_data = self.rfile.read(content_length) # <--- Gets the data itself
        logging.info("POST request,\nPath: %s\nHeaders:\n%s\n\nBody:\n%s\n",
                str(self.path), str(self.headers), post_data.decode('utf-8'))

        self._set_response()
        self.wfile.write("POST request for {}".format(self.path).encode('utf-8'))

def run(server_class=HTTPServer, handler_class=S, port=8080):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')

if __name__ == '__main__':
    from sys import argv

    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()
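
Before SNS delivers notifications to an HTTP or HTTPS endpoint, it first POSTs a SubscriptionConfirmation message, and the SubscribeURL in that message has to be visited to confirm the subscription. With a logging server like the one above, that URL shows up in the logged body. The sketch below pulls out the relevant field from a raw body; summarize_sns_post is a hypothetical helper, and the sample body is made up for illustration.

```python
import json


def summarize_sns_post(body):
    """Return (message type, field of interest) for a raw SNS POST body."""
    doc = json.loads(body)
    msg_type = doc.get("Type")
    if msg_type == "SubscriptionConfirmation":
        # Opening this URL confirms the subscription.
        return msg_type, doc.get("SubscribeURL")
    if msg_type == "Notification":
        # For alarms, the subject carries the "ALARM: ..." / "OK: ..." prefix.
        return msg_type, doc.get("Subject")
    return msg_type, None


# Made-up confirmation body; a real one carries more fields.
sample = json.dumps({
    "Type": "SubscriptionConfirmation",
    "SubscribeURL": "https://example.invalid/confirm",
})
print(summarize_sns_post(sample))
```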

Java

package org.example;


import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.PublishRequest;
import com.amazonaws.services.sns.model.PublishResult;
import com.google.gson.Gson;

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        AmazonSNS snsClient = AmazonSNSClientBuilder.standard()
                .withRegion(Regions.US_EAST_1).build();
        String topicArn = "arn:aws:sns:us-east-1:123123123123:brent";
        String alarmType = "ALARM"; // Use "OK" to clear an existing incident
        String alarmMessage = "This is only a test";
        String subject = alarmType + ": " + alarmMessage; // Space after ':' is important

        Map<String, String> message = new HashMap<>();
        message.put("AlarmDescription", "Brents Test ALARM");
        message.put("AlarmName", "Brents Test ALARM FROM JAVA");
        message.put("NewStateValue", alarmType);
        // Add any additional JSON data here
        message.put("foo", "bar");

        Gson gson = new Gson();
        String messageStr = gson.toJson(message);

        PublishRequest publishRequest = new PublishRequest()
                .withTopicArn(topicArn)
                .withMessage(messageStr)
                .withSubject(subject);
        System.out.println(messageStr);
        PublishResult publishResponse = snsClient.publish(publishRequest);
        System.out.println(publishResponse);
    }
}
