Making your Enterprise Solutions (ESB, EAI, SOA, other) proactive with OpsGenie


#1

Regardless of the tool, architecture, or framework used within your organization to integrate your enterprise applications (webMethods/Software AG, Informatica, Mule, Oracle, IBM, Tibco, others), these solutions will periodically experience infrastructure, process and data processing exceptions which need to be reported and addressed as soon as possible. Many monitoring tools (Datadog, Loggly, Nagios, SCOM, Solarwinds, etc.), infrastructure techniques (HA, etc.) and programming practices (logging) can be used to address some portions of the overall problem. However, what are you currently using to call timely attention to exceptions encountered within the business process itself, such as SLA violations, data processing exceptions, or external service failures? How effective is it? Do you really want to wait for someone to see and read an email, or to review an application's logs/UI, in order to determine that a step in your business process could not be completed or is at risk of violating an SLA?

This topic provides a specific example of how you could leverage OpsGenie’s Alert API in order to enhance your notification process such that someone could immediately be notified to take action when business process exceptions occur (notifications via Voice, SMS, Mobile Application, etc.). For the purpose of significantly simplifying the steps involved, let’s presume the following:

  1. Your SOA, EAI, ESB solution supports the capability to make HTTP POST calls sending JSON data.
    For this example, I will be referencing Software AG’s webMethods Integration Server (wMIS) solution. Specifically, I will be leveraging wMIS’s pub.client:http method. Fortunately, most tools and programming frameworks provide similar functionality.
  2. You are reasonably comfortable implementing API calls from within your organization's tool framework.
  3. You have an OpsGenie account that has been configured with Users, Teams, Routing Rules, Schedules and Escalations. This will allow OpsGenie to route the Alerts created to the right individuals to take immediate action (Business Analysts, DBAs, Network Admins, their bosses, whoever is appropriate).
  4. You have taken a look at OpsGenie's API Framework (https://docs.opsgenie.com/docs/alert-api).

With those presumptions in place, the included image was taken from a Software AG webMethods Integration Server instance and annotated in order to depict the values which need to be set on the pub.client:http method to create an Alert in OpsGenie.
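For anyone not on webMethods, the same HTTP POST can be sketched in a few lines of Python. Treat this as a rough illustration only: the endpoint URL, the `GenieKey` authorization header and the payload field names are my reading of the Alert API docs linked above, so verify them there before relying on them. The function names, the API key and the sample alert text are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint and header format as I understand them from the
# OpsGenie Alert API docs; confirm against the docs page for your account.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_request(api_key, message, description="", priority="P3", tags=None):
    """Build (but do not send) an HTTP POST that asks OpsGenie to create an Alert."""
    payload = {
        "message": message,          # short summary shown in the notification
        "description": description,  # longer detail about the exception
        "priority": priority,        # assumed to range from P1 (highest) to P5
        "tags": tags or [],
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and return (HTTP status, response body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

From the exception-handling step of a business process you would then call something like `send(build_alert_request(my_key, "Order feed at risk of SLA violation", tags=["esb"]))`. Building the request separately from sending it makes it easy to inspect the URL, headers and JSON before anything goes over the wire.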

This is just one of many possibilities for leveraging OpsGenie’s capabilities within your enterprise. If you have others that you would like to explore, feel free to engage us here to discuss the possibilities.


#2

Here is another scenario where OpsGenie might come in handy in your ESB/SOA: monitoring jobs (Mainframe/JCL, etc.) or other processes (BPMs, etc.) which run on a set schedule.

Occasionally, those jobs might not complete on schedule. In such cases, you might want more than just an email to let you know of this condition. After all, when was the last time that an email woke you up when there was a problem that required immediate attention?

Above I referenced OpsGenie's CreateAlert API. However, OpsGenie also provides a Heartbeat API (https://docs.opsgenie.com/docs/heartbeat-api). The configuration would be the same as the image above for webMethods IS. You would just be using a different URL and a different JSON payload. Examples should be available on the docs page.

The idea is that upon job completion, you would call the Heartbeat API. Then, as long as the message is received by OpsGenie within the specified time period for the Heartbeat, no Alert is generated. However, as soon as the job is late - no “heartbeat ping” was received within the specified time period - an Alert would be generated within OpsGenie and the Notifications would start flowing to the appropriate people.
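As a rough sketch of the "ping on job completion" step in Python: the ping URL pattern and the `GenieKey` header below are my understanding of the Heartbeat API docs linked above, and I am assuming a plain GET to the ping URL is enough, so please check the docs page for the exact method and any payload. The function name, heartbeat name and API key are hypothetical, and the heartbeat itself is assumed to already exist in OpsGenie with its expected interval configured.

```python
import urllib.parse
import urllib.request

# Assumption: base URL as I understand it from the OpsGenie Heartbeat API docs.
OPSGENIE_HEARTBEATS_URL = "https://api.opsgenie.com/v2/heartbeats"


def build_heartbeat_ping_request(api_key, heartbeat_name):
    """Build (but do not send) the request that tells OpsGenie the job finished."""
    # Escape the name so spaces or slashes cannot break the URL path.
    safe_name = urllib.parse.quote(heartbeat_name, safe="")
    return urllib.request.Request(
        f"{OPSGENIE_HEARTBEATS_URL}/{safe_name}/ping",
        headers={"Authorization": f"GenieKey {api_key}"},
        method="GET",
    )


def ping(api_key, heartbeat_name, timeout=10):
    """Send the ping as the last step of a job and return the HTTP status."""
    request = build_heartbeat_ping_request(api_key, heartbeat_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The job would call `ping(my_key, "nightly batch")` only after it completes successfully. If the job fails or runs long, no ping is sent, and OpsGenie raises the Alert once the heartbeat interval expires.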

Feel free to propose any other ideas/scenarios and I will see what additional solutions we can come up with.