Administrators would agree that network monitoring tools are the living embodiment of the “heart” in a Service Provider ecosystem. This bionic heart monitors & manages the servers and applications of customer networks around the clock and floats their boat.
Thought I'd share an instance where more than one heart skipped a beat :)
OpManager is an application that has been trusted by thousands of Service Providers for more than a decade. In one of my recent consultation programs, I worked closely with one such customer, who has been using OpManager since 2007.
The customer is a Service Provider monitoring the networks of credit unions & financial institutions. Their live instance has a ginormous 25 GB OpManager database & monitors over 1,000 devices in a heterogeneous environment spread across 40 client networks. My task was to upgrade this OpManager instance to the latest build & then migrate the OpManager server to a ‘hip’ n beefy VM.
Detailed procedures with backup plans were drafted and revisited umpteen times prior to D-Day. As planned, the application was stopped at 6:30 pm after approval from the management and their clients.
The application upgrades were smooth, with test plans executed between upgrades. The migration of the OpManager instance was a breeze, and the upgrade and migration task was completed by 11 pm.
The new instance of OpManager was live & monitoring all client networks except one critical client network. During testing, we found that this client's network was not reachable from the OpManager server, and all of its devices were reported as down. A manual ping from any other server, however, reached the client's network without trouble.
In this hair-trigger situation, the customer’s client was apprised of the issue and, in parallel with the troubleshooting, a temporary instance of OpManager was installed on a different server to monitor this network.
Several rounds of troubleshooting, covering network routes, DNS lookups, static routes and even a rollback to the old instance, were in vain: the heartbeat stayed offline for this client’s network.
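For anyone who wants to script this kind of check, here is a minimal sketch of how one might separate a DNS failure from a plain reachability failure. It is not what OpManager does internally and not the exact procedure we followed that night; the function names (`resolve`, `tcp_reachable`, `diagnose`) are hypothetical, and a TCP connection attempt stands in for the manual ping, since raw ICMP usually needs elevated privileges.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a hostname resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry's sockaddr tuple starts with the IP address; de-duplicate them.
    return sorted({info[4][0] for info in infos})


def tcp_reachable(address, port, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds within the timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def diagnose(hostname, port, timeout=3.0):
    """Classify a target as 'dns-failure', 'reachable' or 'unreachable'."""
    addresses = resolve(hostname)
    if not addresses:
        return "dns-failure"
    if any(tcp_reachable(address, port, timeout) for address in addresses):
        return "reachable"
    return "unreachable"
```

Running `diagnose` with the same hostname and port from the monitoring server and from a second server gives a quick comparison: if only the monitoring server reports "unreachable", the problem is more likely on the path from that server than on the client's side.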
The ISP’s priority helpline was put to full use, and by 5 am we had made headway: the client’s network was back online in OpManager.
This esteemed customer continues to love OpManager for its robust monitoring and alerting abilities, and is now also evaluating our plugins/applications that integrate with OpManager. Migration to the highly scalable OpManager Large Enterprise Edition (LEE) is also on the cards right now. I'm sure you'll agree that for those in customer-facing roles, there is nothing more gratifying than putting a smile on the customer's face. I headed home with a smile too :)
Thanks for reading this & I'd sure love to hear your story of when your network skipped a heartbeat!