Retries before Alerting - Suggestions required
Hi all,
We are moving toward providing retries (before generating an alert) at the Monitor/attribute level instead of at the "Global" level. (The current global option is available under "Admin" --> "Global Settings" --> "Retry Polls before Reporting an Error".) We are doing this because, as many of you have requested, retries are needed for certain monitors and their attributes, while for others they should not take effect.
Retries need to be configured for
1. Availability
2. Numeric/String attributes (such as Response Time, CPU Utilization, etc.)
For availability, we will provide a separate option on a per-monitor basis so that you can configure retries at the Monitor level.
For other attributes, we are planning to provide the retries option at the Threshold level. That is, if you create a Threshold Profile and associate it with the "Response Time" attribute of all "tomcat servers", the retries will take effect for all of these attributes. Changes to the Threshold Profile will affect the retry options for all of them. This way you need not configure retries under each tomcat monitor --> Response Time, yet you still have the option to control retries at the attribute level.
Drawback: If a particular Threshold is associated with 10 attributes and you need a different retry setting for one of them, this cannot be done using the already associated Threshold. However, it can be achieved by creating and associating another Threshold with a different retry configuration.
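To make the Threshold-level idea concrete, here is a minimal sketch in Python. It is only an illustration of the proposal, not how Applications Manager implements it: the class names, the fields, and the rule that an alert fires once a breach persists for more than `retries` consecutive polls are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdProfile:
    """Hypothetical threshold profile that also carries the retry count."""
    name: str
    critical_above: float  # values above this limit count as a breach
    retries: int           # assumed meaning: breaching polls tolerated before alerting


@dataclass
class Attribute:
    """Hypothetical monitored attribute, e.g. Response Time of one tomcat monitor."""
    monitor: str
    name: str
    profile: ThresholdProfile
    breaches: int = 0      # consecutive breaching polls seen so far

    def poll(self, value: float) -> bool:
        """Record one poll and return True when an alert should be generated."""
        if value > self.profile.critical_above:
            self.breaches += 1
        else:
            self.breaches = 0  # a healthy poll resets the retry counter
        return self.breaches > self.profile.retries


# One profile shared by the Response Time attribute of every tomcat monitor.
shared = ThresholdProfile("tomcat-response-time", critical_above=2000.0, retries=3)
tomcats = [Attribute(f"tomcat{i}", "Response Time", shared) for i in range(1, 4)]

# Changing the shared profile changes the retries for all associated attributes.
shared.retries = 1

# Workaround for the drawback: give one attribute its own profile.
no_retry = ThresholdProfile("tomcat-response-time-no-retry", 2000.0, retries=0)
tomcats[2].profile = no_retry
```

In this sketch, tomcat1 and tomcat2 follow the shared profile, while tomcat3 alerts on the first breaching poll because it is associated with the separate profile.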
We decided on the above approach after also analysing the following options:
* Providing retries at the Monitor level seems simple, and the setting would take effect for all attributes (including availability). However, a common retry count for all attributes in a monitor leaves no way to configure retries down to the attribute level, which makes it too coarse to be useful. For example, for CPU Utilization we may want to retry 3 times before alerting, whereas for Disk Usage on the same system no retries are required. This is not possible with a Monitor-level retry option.
* We also considered configuring retries for each attribute of each monitor independently, which gives the user full control. However, it becomes very tedious to configure retries for each and every attribute of each monitor. For example, for CPU Utilization you would go to Monitor 1 and configure retries for CPU, then go to Monitor 2 and configure retries for CPU, and so on. The work grows further when retries are needed for more attributes (Response Time, etc.) and more monitors.
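The trade-off between the three options can be sketched as a rough count of how many retry settings an administrator would have to maintain. The function name and the example figures below are hypothetical and serve only to illustrate the comparison; the Threshold-level count also excludes the separate per-monitor availability setting described above.

```python
def retry_settings_to_maintain(monitors: int, attributes: int, profiles: int) -> dict[str, int]:
    """Rough count of retry settings under each option (illustrative only)."""
    return {
        # One retry value per monitor, shared by all of its attributes.
        "monitor level": monitors,
        # One retry value for every attribute of every monitor.
        "attribute level": monitors * attributes,
        # One retry value per Threshold Profile, however many monitors use it.
        "threshold level": profiles,
    }


# Hypothetical setup: 50 monitors, 2 attributes each (CPU Utilization and
# Response Time), and one Threshold Profile per attribute type.
counts = retry_settings_to_maintain(monitors=50, attributes=2, profiles=2)
```

With these assumed figures, the attribute-level option needs 100 separate settings, the Monitor-level option needs 50 but cannot distinguish attributes, and the Threshold-level option needs 2 while still distinguishing them.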
Please feel free to post your suggestions/inputs on whether the approach we have taken meets your requirements, so that we can make sure we deliver the best options as part of this feature.
Thanks in advance.
- The Applications Manager Team -