As someone responsible for deploying and implementing IT management solutions, you come across a lot of content on the web prescribing thresholds as part of 'best practices'. It is tempting to assume these recommendations cannot go wrong and to update your tools with the 'best' thresholds. The monitoring tools then start firing alerts based on those values, and you face a new set of challenges.
The thresholds you configured may not be appropriate for your environment at all. Say the server that runs your Oracle DB has an expected memory usage of 90%, but, following the recommended standard, you set the value at 75%. You end up being bombarded with unwanted alerts and are finally forced to reconfigure the threshold to a value that actually suits and is relevant to your environment.
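To see why this generates so much noise, here is a minimal Python sketch of a plain static-threshold check (the sample values and metric are hypothetical, not taken from any particular tool): when the configured threshold sits below the server's normal operating range, every single polling cycle raises an alert even though nothing is wrong.

```python
# Hypothetical polled memory-usage samples (%) from a server whose
# normal operating range is around 90%.
samples = [88.5, 91.2, 89.7, 90.4, 92.1, 89.9]

THRESHOLD = 75.0  # value copied from a generic 'best practice' guide

# A static-threshold check fires whenever the current reading exceeds
# the configured limit -- here, on every poll.
for value in samples:
    if value > THRESHOLD:
        print(f"ALERT: memory usage {value}% exceeds threshold {THRESHOLD}%")
```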
It's important that we understand the metrics and the reasoning behind their configuration. I have seen many users configure a threshold only for the overall disk utilization (all drives combined) in OpManager and AppManager. One bad day, they come back complaining that the tool didn't alert them when the F drive, where the SQL log files are written, was full. The problem is simple and straightforward: the users overlooked configuring thresholds for the individual drives and ended up in deep trouble.
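A quick sketch of the difference (the drive letters and usage figures are hypothetical): an aggregate check can sit comfortably under its threshold while one critical drive is already full, which is exactly why each drive needs its own check.

```python
# Hypothetical per-drive usage (%) on a database server.
drive_usage = {"C:": 40.0, "E:": 55.0, "F:": 98.0}  # F: holds the SQL log files

THRESHOLD = 90.0

# Threshold on the combined utilization: the average looks healthy,
# so no alert is raised even though F: is nearly full.
overall = sum(drive_usage.values()) / len(drive_usage)
if overall > THRESHOLD:
    print(f"ALERT: overall disk usage {overall:.1f}%")

# Threshold on each individual drive: the full F: drive is caught.
for drive, usage in drive_usage.items():
    if usage > THRESHOLD:
        print(f"ALERT: drive {drive} at {usage:.1f}%")
```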
I'd therefore suggest that you stop searching for universal standards. Let your tool monitor performance for a few days, observe the behavior, and set baseline values based on that past performance. Then configure thresholds for the metrics that are relevant to you. After all, no one knows your network better than you!
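As a rough illustration of what setting a baseline from past performance can look like (this is just one common approach, not how any specific tool derives its values): take the observed history and place the threshold a few standard deviations above the mean, so only genuine deviations from normal behavior trigger alerts.

```python
import statistics

# Hypothetical history: memory usage (%) sampled over a few days.
history = [87.0, 89.5, 90.2, 88.8, 91.0, 90.5, 89.1, 90.8]

# Baseline = mean of observed behavior; threshold = baseline plus a
# margin of three standard deviations.
baseline = statistics.mean(history)
threshold = baseline + 3 * statistics.stdev(history)

print(f"baseline={baseline:.1f}%  threshold={threshold:.1f}%")
```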
Thank you
Rameshkumar Ramachandran