We have been trying to monitor the state of a series of scripted jobs we run on one of our servers and have been using the Syslog feature to do so.
Our syslog entries contain either an "ERROR" or an "OK" substring, along with the script name, e.g.:
[INSTANCEA_standby_mover.sh] ERROR : file could not be transferred
[INSTANCEA_standby_mover.sh] File transfer OK
[INSTANCEB_standby_mover.sh] ERROR : file could not be transferred
[INSTANCEB_standby_mover.sh] File transfer OK
Currently, our 'Critical' match is (?=.*_standby_mover)(?=.*ERROR) and our 'Clear' match is (?=.*_standby_mover)(?=.*OK)
We have had the following issues with this and are wondering if there are any workarounds or plans to improve this feature:
We are unable to make a 'Clear' rule for the OK messages that takes the preceding failure into account. For example, INSTANCEB/OK can currently clear INSTANCEA/ERROR, whereas we would like INSTANCEB/OK to clear only INSTANCEB/ERROR, without having to create a separate Syslog Rule for each INSTANCEx. Can OpManager (be made to) understand replacement/group RegEx syntax, so that (?=(.*)_standby_mover)(?=.*ERROR) might be cleared with (?=$1_standby_mover)(?=.*OK), or similar?
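To illustrate the per-instance behaviour we are after, here is a minimal Python sketch. It is not OpManager functionality and does not use any OpManager API; the pattern, the group names and the function names are our own assumptions, and it uses Python's regex engine rather than whatever OpManager uses internally.

```python
import re

# Hypothetical pattern: capture the instance prefix and the status keyword.
RULE = re.compile(
    r"\[(?P<instance>\w+?)_standby_mover\.sh\].*?(?P<status>ERROR|OK)"
)

def classify(message):
    """Return (instance, status) for a standby_mover message, else None."""
    m = RULE.search(message)
    return (m.group("instance"), m.group("status")) if m else None

def critical_instances(messages):
    """Replay messages in order and return the instances left in Critical."""
    critical = set()
    for msg in messages:
        parsed = classify(msg)
        if parsed is None:
            continue  # not a standby_mover message
        instance, status = parsed
        if status == "ERROR":
            critical.add(instance)
        else:
            # An OK clears only the instance that logged it.
            critical.discard(instance)
    return critical

if __name__ == "__main__":
    log = [
        "[INSTANCEA_standby_mover.sh] ERROR : file could not be transferred",
        "[INSTANCEB_standby_mover.sh] File transfer OK",
    ]
    print(critical_instances(log))
```

In this sketch the INSTANCEB/OK message leaves INSTANCEA in Critical, which is the behaviour we would like a single Syslog Rule pair to provide.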
We appear to get 'Clear' notifications even if there was no original 'Critical' state. Our script uses the 'File transfer OK' message to denote any successful transfer, and a successful transfer logically clears a previous Error-state transfer. We should not need to be notified every time a transfer succeeds, but we would want to know if it failed, and we would want to know if it succeeded again after the failure. Can OpManager (be made to) send 'Clear' notifications only if that rule is already in 'Critical'?
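The notification logic we have in mind could be sketched as follows in Python. Again, this is only an illustration of the desired behaviour under our own assumptions (the pattern, the `notifications` function and the choice to notify on every ERROR are ours), not a description of how OpManager works.

```python
import re

# Hypothetical pattern: group 1 is the instance, group 2 the status keyword.
PATTERN = re.compile(r"\[(\w+?)_standby_mover\.sh\].*?(ERROR|OK)")

def notifications(messages):
    """Return the (instance, severity) notifications we would want sent."""
    in_critical = set()   # instances currently in the Critical state
    sent = []
    for msg in messages:
        m = PATTERN.search(msg)
        if m is None:
            continue
        instance, status = m.groups()
        if status == "ERROR":
            # Every failure is worth a notification.
            in_critical.add(instance)
            sent.append((instance, "Critical"))
        elif instance in in_critical:
            # Notify on OK only when it actually clears a failure.
            in_critical.remove(instance)
            sent.append((instance, "Clear"))
        # An OK with no outstanding failure is silently ignored.
    return sent

if __name__ == "__main__":
    log = [
        "[INSTANCEA_standby_mover.sh] File transfer OK",
        "[INSTANCEA_standby_mover.sh] ERROR : file could not be transferred",
        "[INSTANCEA_standby_mover.sh] File transfer OK",
        "[INSTANCEA_standby_mover.sh] File transfer OK",
    ]
    for note in notifications(log):
        print(note)
```

Here the first and last OK messages produce no notification, because there is no outstanding failure for them to clear.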
OpManager's Regex appears not to apply to syslog tags, only to message text, but when using the Syslog Viewer, both the tag and the message text appear under the heading 'message text'. This makes it confusing as to what can be matched in the rule and what can't.
Similarly, there seems to be no way to test your Regex against a Syslog message to determine whether it would fall under a rule. Have I missed something or is this something that could be added to OpManager?
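As a stopgap, we have been checking our patterns outside OpManager with something like the Python sketch below. It assumes Python's `re` engine treats these lookahead patterns the same way OpManager's engine does, which we have not confirmed, and it only sees the message text, not the syslog tag. The `RULES` table and `matching_rules` function are our own names.

```python
import re

# Our current rule patterns, copied from the Syslog Rules.
RULES = {
    "Critical": r"(?=.*_standby_mover)(?=.*ERROR)",
    "Clear": r"(?=.*_standby_mover)(?=.*OK)",
}

def matching_rules(message, rules=RULES):
    """Return the names of all rules whose pattern matches the message."""
    return [name for name, pattern in rules.items()
            if re.search(pattern, message)]

if __name__ == "__main__":
    samples = [
        "[INSTANCEA_standby_mover.sh] ERROR : file could not be transferred",
        "[INSTANCEB_standby_mover.sh] File transfer OK",
        "some unrelated message",
    ]
    for line in samples:
        print(matching_rules(line), line)
```

A built-in equivalent that runs a pasted message through the actual rule engine, including the tag handling, would remove the guesswork.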