Notification Policies and Maintenance


#1

Hello OpsGenie Community!

I’m trying to understand how maintenance windows and notification policies work together.
I have alerts coming in from SolarWinds, and maintenance windows from ServiceNow. I want to suppress the alerts coming in from SolarWinds (or rather delay them) during the maintenance window so that when the window is over, I still get the alerts that were suppressed/delayed during that window. To further complicate things, there may be multiple maintenance windows.

This may be an API-related question, but the current solution I’m thinking of is to create policies based on the maintenance windows (from ServiceNow) and delay them for the number of minutes that the window lasts. However, that seems to be sidestepping the maintenance functionality in OpsGenie. Has anyone had a similar use case, and if so, can you recommend a best practice?
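To make the idea concrete, here is a minimal sketch of how the delay could be derived from a window's start and end times. The function name `delay_minutes` and the sample timestamps are made up for illustration; nothing here calls ServiceNow or OpsGenie.

```python
import math
from datetime import datetime, timezone


def delay_minutes(start: datetime, end: datetime) -> int:
    """Return the window length in whole minutes, rounded up.

    Rounding up means the delay never expires before the window does.
    """
    if end <= start:
        raise ValueError("window must end after it starts")
    return math.ceil((end - start).total_seconds() / 60)


# Hypothetical window: 22:00 to 00:30 UTC the next day.
window_start = datetime(2024, 1, 10, 22, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 1, 11, 0, 30, tzinfo=timezone.utc)
print(delay_minutes(window_start, window_end))  # 150
```

One caveat with this approach: if the delay is counted from the moment each alert is created, an alert that arrives late in the window would be held well past the end of the window.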

Maybe I’m just missing something simple with regard to how maintenance works in OpsGenie, but it seems to be better suited to policies that all follow the same schedule.

Thank you,

Arun


#2

Hey Arun,

I think you are on the right track. Maintenance windows in OpsGenie are almost always used together with Policies. The maintenance feature serves a single purpose: the ability to enable/disable functions automatically in the future, for a single time period.

There are four typical cases for using maintenance windows, as far as my experience goes. When the scheduled time comes, a maintenance window could:

  • Enable a Notification Policy that suppresses a subset of the alerts created during the window.

  • Enable a Notification Policy that delays alerts created during the window. The delay can run until a specific hour (First 3:00 for example) or for a number of minutes relative to the alert creation. If all your windows end around the same time (not uncommon), setting up the specific time in the policy is very neat.

  • Disable an integration completely. This is the more barbaric version, but honestly, it does work as well and is indeed the solution that minimizes the noise.

  • Enable an Alert Policy that tags alerts. Additional policies on the team level could then do the suppress/delay function. It really only works if your alerting is owned by the teams themselves and there is a good default config across all teams. This is probably not the solution you are looking for.

Once the window ends, either the Policy is disabled or the Integration is re-enabled automatically.

In short: the maintenance feature is not a lot more than a scheduler. You can set up a future date when your policy will be enabled, and it will be disabled again afterwards, automatically.
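As a rough illustration of the "scheduler" idea, the sketch below builds the body of a request that schedules a maintenance window which enables one policy. The field names (`time`, `rules`, `entity`, and so on) follow my recollection of the Maintenance API and should be treated as assumptions to check against the official docs; the policy ID is a placeholder and the helper name `maintenance_payload` is made up. It only builds the payload and sends nothing.

```python
import json
from datetime import datetime, timezone


def maintenance_payload(description, start, end, policy_id):
    """Build a maintenance request body that enables one policy for a window.

    Field names are assumptions based on the Maintenance API docs; verify
    them before sending this to a real account.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "description": description,
        "time": {
            "type": "schedule",
            "startDate": start.astimezone(timezone.utc).strftime(fmt),
            "endDate": end.astimezone(timezone.utc).strftime(fmt),
        },
        # "enabled" means the policy is switched on for the duration of the
        # window and switched off again when the window ends.
        "rules": [
            {"state": "enabled", "entity": {"id": policy_id, "type": "policy"}}
        ],
    }


payload = maintenance_payload(
    "Patching window from ServiceNow",
    datetime(2024, 1, 10, 22, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 11, 0, 30, tzinfo=timezone.utc),
    "your-policy-id",  # placeholder, not a real ID
)
print(json.dumps(payload, indent=2))
```

For the "disable an integration" case, the same shape should apply with the rule pointing at the integration and the state set to disabled, again assuming the docs confirm those values.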

Hope this helps!


#3

@Daniel Thanks for your response! Good to know that I’m on the right track (I’m new to OpsGenie).
I realized a few things:

  1. Disabling the integration would be ideal in one sense (it is the easiest solution). However, it is also not ideal, because no other communications from that integration would get through.

  2. Using policies would mean creating individualized policies, as each window would have a different delay. Based on this, the policies would need to be deleted after use (as each would likely delay by an arbitrary number of minutes). This can be done, but would require more API/coding to accommodate. I’m glad the API is pretty decent, and will allow me to do this nonetheless.

  3. I also thought of a case where I might create different policies based on different preconfigured time intervals (30 min/1 hour/2 hour/3 hour/etc) so that if a maintenance window matches one of those durations, it would select that policy (with a matching delay). The only issue then becomes what would happen if there are multiple windows using the same policy (let’s say the 1 hour one) back to back or at overlapping intervals. I believe OpsGenie knows how to handle that, and I’m thinking this might be a good option (maybe better than option 2).

  4. After giving this all a lot of thought, maybe I can make a feature request for this:
    Have a policy option that indicates the following:
    (Delay for time period matching maintenance routine)
    This would ensure that during a maintenance window you could have ONE policy, so that alerts are delayed appropriately during that window. What do you think? If you think it isn’t a bad idea, is there another place in the forum to create a feature request?
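For option 3 above, picking the preconfigured policy could look roughly like this. It is only a sketch: the preset list and the function name `pick_preset` are made up, and it chooses the smallest preset that covers the whole window, falling back to `None` (meaning a one-off policy as in option 2) when the window is longer than every preset.

```python
from bisect import bisect_left

# Hypothetical preconfigured delays, in minutes, kept in ascending order.
PRESET_DELAYS = [30, 60, 120, 180]


def pick_preset(window_minutes, presets=PRESET_DELAYS):
    """Return the smallest preset delay >= window_minutes, or None if none fits."""
    i = bisect_left(presets, window_minutes)
    return presets[i] if i < len(presets) else None


print(pick_preset(45))   # 60
print(pick_preset(60))   # 60
print(pick_preset(200))  # None
```

Rounding up to the next preset means some alerts are held a bit longer than strictly necessary, which is the trade-off for not having to create and delete a policy per window.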

Thank you,

Arun