Various issues preventing implementation

Our company purchased OpManager a few years ago (maybe v3?) and stopped using it after a year due to excessive issues.  We ended up going to HP SiteScope, which, with its latest versions, has gone to crap and is no longer good enough for us.

The latest OpManager sure seems to have a lot of improvements over the old one, and many of the issues we used to have are no longer there, but unfortunately a few more have crept up.

Here are the "serious" issues we have, which right now count heavily against implementation:
#1.  User access is "read only" or "full control".  We need more granular control.  For example, some people may only need the ability to schedule maintenance, disable or acknowledge alerts, and view everything.

#2.  The application has stopped responding twice, I have no idea why, and that is disconcerting.

#3.  There is no API, or even good database documentation I can find.  This would help with various automated tasks.  For example when we push out web code to 20 web servers, it would be nice to be able to use our code push process to also disable alerts for those 20 servers, so while the code is being pushed nobody gets a page.

#4.  I added a CSV of 1000 servers and used the discovery feature on it.  Only about a third of those servers ended up being added, though all showed up in the list of which ones in the file you want to add.


These are some "annoyances" I have encountered or various suggestions:
- Ability to Edit Credentials (Right now you can only delete and add new)
- Some of the popup windows don't have scroll bars, so you have to select text and drag to scroll
- When you click "troubleshoot" on a monitor that is empty and then click "more", it opens in the same window, which can't be maximized
- When looking at an infrastructure view, when it shows servers and monitors, the ability to click on a server/monitor and see a graph or link or something would be nice, not just on little icons
- Choose the columns seen on list views, do not hard-code to CPU(%) and Memory(%) (we may want to see free space, or some service, etc).  We could have pages for disk space, some for CPU/Memory, others for disk IO
- Ability to disable alerts (not monitoring) for an infrastructure group
- It would be nice if devices could be in more than 1 infrastructure group (basically just pointers)
- The ability to tier devices, so like we could have DATA_CENTER->CAGE->RACK->SERVER
- The network dependency view is cool, what about a dependency view for servers?
- Device templates that look at a name, so part of a name could indicate if it's a SQL or WEB Windows 2003 server
- Is there any way the Event_YYYY_MM_DD_YY table that has syslog messages can include columns with the parsed parts of each message, instead of just TEXT, which makes it hard to query the table for DISTINCT?
- The "Jump To" list up top displays outside the browser window
- Add CSV import for URL monitors
- Add "export configuration" option, to make it easy to back up the server configuration, without all the collected data
- You can't rename URL monitors
- Discovery of type of device is still very inaccurate
- All screens need to auto-refresh after a couple minutes
- Rate of increase/decrease/consumption/etc for monitors (like how much disk space is used per month or CPU increase over last 6 months, etc)
- How can I change various parameters on all monitors of a specific type, on specific servers?
- Dependencies for a monitor, so if a server is mostly broke we don't get 20 pages for each monitor on it (make all the URL monitors dependent upon the W3svc working for example)
- Multiple dependencies, not just 1
- Allow a monitor to determine if alerts go off. We could set a reg key, or the existence of a file, that would prevent alerting from going off on that particular device
- Ping/URL monitors need a threshold for failures so they don't alert on the first one
- Ability to use an instant messenger to send alerts would be nifty
- The report generator is too hard-coded.  If I create new monitors there is no way to run reports with them.
- Allow notes on alerts to be tied to the alert string, not just the device+alert string.  If we have 100 web servers, and it's a web error, the same note could apply to all 100 servers and would help with troubleshooting.
- When a Windows service fails, allow running a script, not just start/restart.
- Event log rules should be able to be applied to a particular server.  Some errors on some servers are MUCH more critical than the same errors on other servers.
- SQL Queries as a monitor? The ability to use a SQL query returned value as a monitor could replace a number of our custom applications that do monitoring on those values now.
- Some of the default graphs on the server pages seem to rely on SNMP.  Can't they just use WMI as an alternative, so that for the hundreds of non-SNMP Windows servers we have we can see the pretty little graphs for CPU/Memory/etc?
- Apparently the business view designer hasn't been updated since the old version, and it's just crappy.  Any way to do something like HTML files, and simply replace strings like %SERVERNAME% or %STATUSICON% within the file so the maps could be easier to make and look better?
- How about a single page, with no menus, with *all* devices and a single small icon for status.  This could be nice to put on our monitoring wall to see just how many issues there are at any given time.  This could be done easily with an API or sql query.
- The ability to filter the alarm page by infrastructure types.
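
To make the ping/URL failure-threshold request concrete, here is a rough Python sketch of what I mean.  The class and names are made up for illustration and are not anything in OpManager; the idea is just that an alert fires only after N consecutive failed polls, and only once.

```python
class FailureThreshold:
    """Hypothetical helper: alert only after `threshold` consecutive failed polls."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, success):
        """Record one poll result; return True only on the poll that should alert."""
        if success:
            # Any successful poll resets the failure streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, on the poll that reaches the threshold, not on every poll after it.
        return self.consecutive_failures == self.threshold


monitor = FailureThreshold(threshold=3)
results = [True, False, False, True, False, False, False, False]
alerts = [monitor.record(ok) for ok in results]
print(alerts)
```

With a threshold of 3, the two isolated failures early in the list never page anyone; only the third failure in a row does.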

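For the business view idea above, this is roughly the kind of string replacement I have in mind, sketched in Python.  The function name, the template, and the values are all hypothetical; only the %SERVERNAME% and %STATUSICON% placeholder style comes from my suggestion.

```python
import html
import re


def render_map(template, values):
    """Replace %NAME% placeholders with values; unknown placeholders are left as-is."""

    def substitute(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)
        # Escape values so a device name cannot break the surrounding HTML.
        return html.escape(str(values[key]))

    return re.sub(r"%([A-Z_]+)%", substitute, template)


template = '<td><img src="%STATUSICON%"> %SERVERNAME%</td>'
page = render_map(template, {"SERVERNAME": "web01", "STATUSICON": "green.png"})
print(page)
```

The same template file could then be rendered once per device to build a whole map or status wall.
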
I'm sure there are more issues, as I haven't yet tested network equipment, MSSQL, or Linux.
Sooooo, how much of this stuff is "fixed in the next release", "fixed by extensive SQL documentation", or "Nope, never happen"?

Or, since we've exhausted all the monitoring solutions on the market, do we just write our own in house? :(

Thanks,
Eric S. Smith
Match.com
