PagerDuty CloudWatch integration
You can send your own custom payload to the PagerDuty CloudWatch integration from a Lambda function, instead of going through a CloudWatch alarm. PagerDuty does not document the internals, but if you publish a custom message to an SNS topic that has an HTTPS subscription to PagerDuty, and the message follows the simple rules below, the event will appear in PagerDuty.
PagerDuty Integration Config:
"Derive name from" should be set to "Alarm Description".
If it is left at the default, the integration will not work, because it parses data such as Trigger.Statistics to generate the name.
SNS Subject:
The message subject is important: it must start with "ALARM: " (note the space after the colon).
It doesn't matter what you put after the colon; it is not processed by PagerDuty and is not visible there at all.
The alarm status in the subject (in this case ALARM) must match the NewStateValue in the SNS message body, or the message will be discarded.
You can also clear the incident in PagerDuty by following the same rules and replacing ALARM with OK.
SNS Message:
The integration is very strict when it parses the JSON message; any slight syntax error will cause the message to be discarded.
You can put anything else you want into the JSON payload and it will be visible in PagerDuty.
A minimal message looks like this:
{
  "NewStateValue": "ALARM",
  "foo": "bar"
}
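The subject and message rules above can be checked locally before publishing. The following is a minimal sketch, not PagerDuty's own validation (which is undocumented). The function name `check_pd_payload` is hypothetical, and the checks encode only the rules observed in this article.

```python
import json

VALID_STATES = ("ALARM", "OK")


def check_pd_payload(subject, message_str):
    """Return a list of problems; an empty list means the observed rules hold."""
    problems = []

    # Rule 1: the subject must start with "<STATE>: " (note the trailing space).
    state_from_subject = None
    for state in VALID_STATES:
        if subject.startswith(f"{state}: "):
            state_from_subject = state
            break
    if state_from_subject is None:
        problems.append('subject must start with "ALARM: " or "OK: "')

    # Rule 2: the body must parse as a JSON object.
    try:
        body = json.loads(message_str)
    except json.JSONDecodeError as exc:
        problems.append(f"message is not valid JSON: {exc}")
        return problems
    if not isinstance(body, dict):
        problems.append("message must be a JSON object")
        return problems

    # Rule 3: NewStateValue must match the state used in the subject.
    new_state = body.get("NewStateValue")
    if new_state not in VALID_STATES:
        problems.append("NewStateValue must be ALARM or OK")
    elif state_from_subject is not None and new_state != state_from_subject:
        problems.append("NewStateValue does not match the subject prefix")
    return problems
```

Calling `check_pd_payload("ALARM: test", '{"NewStateValue": "ALARM"}')` returns an empty list, while a subject of `"ALARM:test"` (no space) or a mismatched `NewStateValue` returns a description of the problem.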
This is what CloudWatch sends to PagerDuty through SNS:
{
"Type" : "Notification",
"MessageId" : "c2228c71-f550-5e3d-b92c-d7dada9f6d76",
"TopicArn" : "arn:aws:sns:ap-southeast-1:003422198502:testbc",
"Subject" : "ALARM: \"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c...\" in Asia Pacific (Singapore)",
"Message" : "{\"AlarmName\":\"2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"AlarmDescription\":\"DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.\",\"AWSAccountId\":\"003422198502\",\"AlarmConfigurationUpdatedTimestamp\":\"2022-09-26T04:41:30.103+0000\",\"NewStateValue\":\"ALARM\",\"NewStateReason\":\"Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).\",\"StateChangeTime\":\"2022-09-26T04:41:51.610+0000\",\"Region\":\"Asia Pacific (Singapore)\",\"AlarmArn\":\"arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3\",\"OldStateValue\":\"INSUFFICIENT_DATA\",\"OKActions\":[],\"AlarmActions\":[\"arn:aws:sns:ap-southeast-1:003422198502:testbc\",\"arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e\"],\"InsufficientDataActions\":[],\"Trigger\":{\"MetricName\":\"CPUUtilization\",\"Namespace\":\"AWS/ECS\",\"StatisticType\":\"Statistic\",\"Statistic\":\"AVERAGE\",\"Unit\":\"Percent\",\"Dimensions\":[{\"value\":\"testservice\",\"name\":\"ServiceName\"},{\"value\":\"test\",\"name\":\"ClusterName\"}],\"Period\":60,\"EvaluationPeriods\":1,\"DatapointsToAlarm\":1,\"ComparisonOperator\":\"LessThanThreshold\",\"Threshold\":11.700000000000001,\"TreatMissingData\":\"breaching\",\"EvaluateLowSampleCountPercentile\":\"\"}}",
"Timestamp" : "2022-09-26T04:41:51.652Z",
"SignatureVersion" : "1",
"Signature" : "Zr8NlG6+KlEfOcj1ZS96BU4Z3K3aKWpJpf8pWc9/u84rbG6Q5kPdqJEY0jiLK4WCbEwmrZFols/ULvKB/W0Z5goBnyQmMlW7XIxpDIoU7I4aGd9XvQNyDed/TEUQ3IK280PerWmBRPPsxgTKN48emazGbch5Ea84DThT/tpw8L98KvC0yzgV04mB2fPgXGdytoRupn/bYitwcgTkkccynzHFHDAWCQkhcYql/wCt41eANLtIAfbdg02uKVs44LPwcoiJv5fO/jo/qMOQZd7i2xNBh6yD9Vn8kkNE6FCmEiIzRmiiOA6sqB9HZB/xQueBhJz/kboyR/Qe6IMpcjb21A==",
"SigningCertURL" : "https://sns.ap-southeast-1.amazonaws.com/SimpleNotificationService-56e67fcb41f6fec09b0196692625d385.pem",
"UnsubscribeURL" : "https://sns.ap-southeast-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:ap-southeast-1:003422198502:testbc:894babc8-8186-4b49-b68d-ff18e204e59a"
}
Cleaned-up Message field extracted from the payload above:
{
"AlarmName": "2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
"AlarmDescription": "DO NOT EDIT OR DELETE. For TargetTrackingScaling policy arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e.",
"AWSAccountId": "003422198502",
"AlarmConfigurationUpdatedTimestamp": "2022-09-26T04:41:30.103+0000",
"NewStateValue": "ALARM",
"NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [0.010321114212274551 (26/09/22 04:40:00)] was less than the threshold (11.700000000000001) (minimum 1 datapoint for OK -> ALARM transition).",
"StateChangeTime": "2022-09-26T04:41:51.610+0000",
"Region": "Asia Pacific (Singapore)",
"AlarmArn": "arn:aws:cloudwatch:ap-southeast-1:003422198502:alarm:2TargetTracking-service/test/testservice-AlarmLow-12f1d1bb-c839-47f4-9b31-b1c4f8e2aec3",
"OldStateValue": "INSUFFICIENT_DATA",
"OKActions": [],
"AlarmActions": ["arn:aws:sns:ap-southeast-1:003422198502:testbc", "arn:aws:autoscaling:ap-southeast-1:003422198502:scalingPolicy:e13f37fd-a0ae-48c2-bfae-9d7b1fb80803:resource/ecs/service/test/testservice:policyName/ttt:createdBy/3ef1f6b6-e824-4ad4-bb81-ee0c7460124e"],
"InsufficientDataActions": [],
"Trigger": {
"MetricName": "CPUUtilization",
"Namespace": "AWS/ECS",
"StatisticType": "Statistic",
"Statistic": "AVERAGE",
"Unit": "Percent",
"Dimensions": [{
"value": "testservice",
"name": "ServiceName"
}, {
"value": "test",
"name": "ClusterName"
}],
"Period": 60,
"EvaluationPeriods": 1,
"DatapointsToAlarm": 1,
"ComparisonOperator": "LessThanThreshold",
"Threshold": 11.700000000000001,
"TreatMissingData": "breaching",
"EvaluateLowSampleCountPercentile": ""
}
}
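The `Message` field in the SNS envelope is itself a JSON document encoded as a string, so it has to be decoded twice to get the alarm document. A minimal sketch of that extraction follows. The function name `extract_alarm` and the shortened sample envelope are illustrative and hand-written, not part of the captured payload.

```python
import json


def extract_alarm(envelope_json):
    """Decode an SNS notification envelope and return the inner alarm document."""
    envelope = json.loads(envelope_json)    # first pass: the SNS envelope
    return json.loads(envelope["Message"])  # second pass: the embedded alarm JSON


# Shortened, hand-written envelope with the same shape as the capture above.
sample = json.dumps({
    "Type": "Notification",
    "Subject": "ALARM: \"example\" in Asia Pacific (Singapore)",
    "Message": json.dumps({"AlarmName": "example", "NewStateValue": "ALARM"}),
})

alarm = extract_alarm(sample)
print(json.dumps(alarm, indent=2))
```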
The following Python code publishes a message that triggers the PagerDuty alarm:
import json

import boto3

sns = boto3.client('sns')
topic_arn = 'arn:aws:sns:us-east-1:123123123123:pd-sns'

# "Derive name from" must be set to "Alarm Description" in PagerDuty.
# The alarm type must be one of [ ALARM | OK ].
alarm_type = "ALARM"  # To create a new incident
# alarm_type = "OK"  # To clear an existing incident
alarm_message = "This is an alarm about an alarm"

# The alarm message in the subject field is not parsed and is not visible in PagerDuty.
# Only the AlarmDescription from the message body is used.
subject = f"{alarm_type}: {alarm_message}"  # The space after the ':' is important

message = {
    "AlarmDescription": "other description",
    "NewStateValue": alarm_type,
    # The keys above must stay as they are
    # Put any JSON data you want here
    "foo": "bar"
}
message_str = json.dumps(message)

response = sns.publish(
    TopicArn=topic_arn,
    MessageStructure="string",
    Message=message_str,
    Subject=subject
)
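Since the goal is to publish from a Lambda, the same call can be wrapped in a handler. This is a sketch under assumptions: the environment variable `PD_TOPIC_ARN`, the event keys `state`, `description` and `details`, and the function names are all hypothetical. Only the `publish` call with `TopicArn`, `Subject` and `Message` is taken from the example above.

```python
import json
import os


def publish_state(sns_client, topic_arn, state, description, details=None):
    """Publish an ALARM or OK event in the shape described in this article."""
    if state not in ("ALARM", "OK"):
        raise ValueError("state must be 'ALARM' or 'OK'")
    message = dict(details or {})
    # Set these last so caller-supplied details cannot override them.
    message["AlarmDescription"] = description
    message["NewStateValue"] = state
    return sns_client.publish(
        TopicArn=topic_arn,
        Subject=f"{state}: {description}",  # the space after ':' matters
        Message=json.dumps(message),
    )


def lambda_handler(event, context):
    import boto3  # imported here so publish_state can be exercised without AWS

    sns = boto3.client("sns")
    response = publish_state(
        sns,
        os.environ["PD_TOPIC_ARN"],    # hypothetical environment variable
        event.get("state", "ALARM"),   # hypothetical event keys
        event.get("description", "Custom alarm from Lambda"),
        event.get("details"),
    )
    return {"MessageId": response["MessageId"]}
```

Passing the client in as an argument keeps `publish_state` easy to test with a stub object, and writing `NewStateValue` after merging `details` ensures the body always agrees with the subject prefix.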
This is a small tool that was run behind ngrok. The SNS HTTPS subscription was pointed at it to inspect the content that SNS delivers for a CloudWatch alarm.
"""
Very simple HTTP server in python for logging requests
Usage::
    ./server.py [<port>]
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging


class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        logging.info("GET request,\nPath: %s\nHeaders:\n%s\n", str(self.path), str(self.headers))
        self._set_response()
        self.wfile.write("GET request for {}".format(self.path).encode('utf-8'))

    def do_POST(self):
        content_length = int(self.headers['Content-Length'])  # <--- Gets the size of data
        post_data = self.rfile.read(content_length)  # <--- Gets the data itself
        logging.info("POST request,\nPath: %s\nHeaders:\n%s\n\nBody:\n%s\n",
                     str(self.path), str(self.headers), post_data.decode('utf-8'))
        self._set_response()
        self.wfile.write("POST request for {}".format(self.path).encode('utf-8'))


def run(server_class=HTTPServer, handler_class=S, port=8080):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')


if __name__ == '__main__':
    from sys import argv
    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()
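Before SNS delivers notifications to an HTTPS endpoint, it first POSTs a message of type `SubscriptionConfirmation`, and the `SubscribeURL` it contains must be visited to confirm the subscription. With the logging server above, that URL appears in the logged body. The sketch below picks it out of a raw body; `summarize_sns_post` is a hypothetical helper and is not part of the original tool.

```python
import json


def summarize_sns_post(body):
    """Describe a raw SNS POST body: confirmation request or notification."""
    doc = json.loads(body)
    msg_type = doc.get("Type")
    if msg_type == "SubscriptionConfirmation":
        # Visiting this URL (for example in a browser) confirms the subscription.
        return f"confirm subscription at: {doc['SubscribeURL']}"
    if msg_type == "Notification":
        return f"notification with subject: {doc.get('Subject')}"
    return f"unhandled SNS message type: {msg_type}"
```

It could be called from `do_POST` with `post_data.decode('utf-8')` to log a one-line summary next to the full body.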
Java
package org.example;

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sns.AmazonSNS;
import com.amazonaws.services.sns.AmazonSNSClientBuilder;
import com.amazonaws.services.sns.model.PublishRequest;
import com.amazonaws.services.sns.model.PublishResult;
import com.google.gson.Gson;

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        AmazonSNS snsClient = AmazonSNSClientBuilder.standard()
                .withRegion(Regions.US_EAST_1).build();

        String topicArn = "arn:aws:sns:us-east-1:123123123123:brent";
        String alarmType = "ALARM"; // Use "OK" to clear an existing incident
        String alarmMessage = "This is only a test";
        String subject = alarmType + ": " + alarmMessage; // Space after ':' is important

        Map<String, String> message = new HashMap<>();
        message.put("AlarmDescription", "Brents Test ALARM");
        message.put("AlarmName", "Brents Test ALARM FROM JAVA");
        message.put("NewStateValue", alarmType);
        // Add any additional JSON data here
        message.put("foo", "bar");

        Gson gson = new Gson();
        String messageStr = gson.toJson(message);

        PublishRequest publishRequest = new PublishRequest()
                .withTopicArn(topicArn)
                .withMessage(messageStr)
                .withSubject(subject);

        System.out.println(messageStr);
        PublishResult publishResponse = snsClient.publish(publishRequest);
        System.out.println(publishResponse);
    }
}