Crashed WebLogic server causes AppManager checks to get blocked

Hi,

We use AppManager to monitor various WebLogic managed servers. The application deployed on one of these managed servers got stuck in a deadlock (an application issue), and the other managed servers in the same WebLogic cluster also crashed as a result (with out-of-memory errors). Strangely enough, the Java process (the managed server) is still running at the operating-system level. According to netstat, this process is still listening on its web server port. When a connection attempt is made to this port, the connection gets stuck in the SYN_SENT state and only times out after a long period of time.
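
For comparison, this is what a port check looks like when the TCP handshake is explicitly bounded, so that a listener which never completes the connection is reported as unreachable after a few seconds instead of hanging. It is a minimal sketch using only java.net; the class name PortProbe, the host, the port and the 5-second value are hypothetical and not AppManager settings. Any later reads on such a socket would need their own limit via Socket.setSoTimeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortProbe {

    /**
     * Returns true if a TCP connection to host:port completes within connectTimeoutMs.
     * The timeout value is an illustrative assumption, not an AppManager default.
     */
    static boolean canConnect(String host, int port, int connectTimeoutMs) {
        try (Socket socket = new Socket()) {
            // Without an explicit timeout, a connect stuck in SYN_SENT waits for the
            // operating system's own SYN retry limit, which can take minutes.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return true;
        } catch (IOException e) {
            // Covers SocketTimeoutException (no answer in time), ConnectException
            // (refused) and UnknownHostException (name does not resolve).
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical managed-server address; replace with a real host and listen port.
        System.out.println(canConnect("wls-managed-1.example.com", 7001, 5000));
    }
}
```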

Normally we would expect AppManager to detect this kind of error, but that does not seem to be the case in our environment. I suspect that AppManager tries to connect to the broken managed server, never times out, and keeps waiting for data that will never arrive. Eventually this prevents other checks from being performed, which makes the results shown in AppManager unreliable because they are outdated.
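
One way to make outdated results visible, regardless of whether the collector ever returns, is to compare each monitor's last successful update with its polling interval. The sketch below shows that rule; the class StalenessCheck, the method isStale and the threshold of three missed intervals are hypothetical choices, not AppManager features.

```java
import java.time.Duration;
import java.time.Instant;

final class StalenessCheck {

    /**
     * A result counts as stale once more than `missedPolls` polling intervals
     * have passed since the last update. The threshold is an assumption.
     */
    static boolean isStale(Instant lastUpdate, Duration pollInterval, int missedPolls, Instant now) {
        Duration age = Duration.between(lastUpdate, now);
        return age.compareTo(pollInterval.multipliedBy(missedPolls)) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Duration interval = Duration.ofMinutes(5);
        // Last update 20 minutes ago with a 5-minute interval: beyond 3 intervals, so stale.
        System.out.println(isStale(now.minus(Duration.ofMinutes(20)), interval, 3, now)); // true
        // Last update 6 minutes ago: still within the allowed window.
        System.out.println(isStale(now.minus(Duration.ofMinutes(6)), interval, 3, now)); // false
    }
}
```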

When AppManager got into this situation, I tried adding some new HTTP checks. Creating them worked fine, but afterwards they stayed in a state where they were still waiting for their first check result.
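
The symptom of new checks never receiving a first result is consistent with a fixed-size pool of collector threads (the dump below shows one named DataCollection-1) being occupied by calls that never return. The self-contained simulation below reproduces that effect with a single-thread ExecutorService; the pool size of one and the latch standing in for the hung WebLogic connection are assumptions, not a description of AppManager's actual scheduler.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class StarvationDemo {

    /** Returns true if the second check completed within waitMs, false if it was starved. */
    static boolean secondCheckRuns(long waitMs) throws InterruptedException {
        // One worker thread standing in for the collector pool (assumption).
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Never counted down while the "server" is hung: simulates waiting forever for a reply.
        CountDownLatch serverNeverAnswers = new CountDownLatch(1);
        try {
            pool.submit(() -> {
                serverNeverAnswers.await(); // the hung WebLogic check occupies the only worker
                return null;
            });
            Future<String> httpCheck = pool.submit(() -> "ok"); // e.g. a newly added HTTP check
            TimeUnit.MILLISECONDS.sleep(waitMs);
            return httpCheck.isDone();
        } finally {
            serverNeverAnswers.countDown(); // release the blocked task so the demo can exit
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second check ran: " + secondCheckRuns(500)); // prints "second check ran: false"
    }
}
```

Adding checks only adds work to the queue; as long as every worker is blocked, none of it runs.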

Our environment contains about 30 WebLogic managed servers, several of which are grouped into WebLogic clusters. Half of these servers run WebLogic 9.2 and the other half run WebLogic 10.2. The broken managed server runs WebLogic 10.2. Our installed AppManager version is 9.3.

I just generated a stack trace (thread dump), and this is what I believe is the blocking thread:
DataCollection-1 WAITING 5 false
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Unknown Source)
at com.sun.corba.se.impl.transport.CorbaResponseWaitingRoomImpl.waitForResponse(Unknown Source)
at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.waitForResponse(Unknown Source)
at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.waitForResponse(Unknown Source)
at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(Unknown Source)
at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(Unknown Source)
at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(Unknown Source)
at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.non_existent(Unknown Source)
at org.omg.CORBA.portable.ObjectImpl._non_existent(Unknown Source)
at weblogic.corba.j2ee.naming.ORBHelper.getORBReferenceWithRetry(ORBHelper.java:590)
at weblogic.corba.j2ee.naming.ORBHelper.getORBReference(ORBHelper.java:559)
at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:85)
at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:46)
at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
at javax.naming.InitialContext.init(Unknown Source)
at javax.naming.InitialContext.<init>(Unknown Source)
at weblogic.management.remote.common.ClientProviderBase.makeConnection(ClientProviderBase.java:169)
at weblogic.management.remote.common.ClientProviderBase.newJMXConnector(ClientProviderBase.java:81)
at javax.management.remote.JMXConnectorFactory.newJMXConnector(Unknown Source)
at javax.management.remote.JMXConnectorFactory.connect(Unknown Source)
at com.adventnet.adaptors.clients.weblogic.WebLogic90Client.lookupMBeanServer(WebLogic90Client.java:470)
at com.adventnet.adaptors.clients.weblogic.WebLogicClient.connect(WebLogicClient.java:296)
at com.adventnet.adaptors.clients.weblogic.WebLogic90Client.connect(WebLogic90Client.java:454)
at com.adventnet.appmanager.server.framework.datacollection.AMJMXConnectorFactory.getClient(AMJMXConnectorFactory.java:247)
at com.adventnet.appmanager.server.framework.datacollection.AMMBeanServerCacheHandler.lookUp(AMMBeanServerCacheHandler.java:48)
at com.adventnet.appmanager.server.framework.datacollection.AMDataCollector.lookupMBeanServerAndDeployAgent(AMDataCollector.java:1201)
at com.adventnet.appmanager.server.framework.datacollection.AMDataCollector.getCollectedData(AMDataCollector.java:303)
at com.adventnet.nms.applnfw.datacollection.server.PerformDataCollection.run(PerformDataCollection.java:88)
at com.adventnet.management.scheduler.WorkerThread.run(WorkerThread.java:68)
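
The trace shows JMXConnectorFactory.connect waiting inside the CORBA layer with no visible deadline. A common client-side workaround is to run the connect on a separate thread and stop waiting after a fixed time. The sketch below uses only standard javax.management.remote and java.util.concurrent APIs; the class name BoundedJmxConnect, the 10-second timeout and the service URL in main are hypothetical, and the WebLogic URL path and provider package should be checked against Oracle's documentation (they also require the WebLogic client classes on the classpath). Giving up on the wait does not necessarily free the blocked thread, since a thread stuck in the ORB may stay stuck; the sketch therefore uses daemon threads and only protects the caller.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class BoundedJmxConnect {

    // Daemon threads, so a connect that never returns cannot keep the JVM alive.
    private static final ExecutorService CONNECTORS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "jmx-connect");
        t.setDaemon(true);
        return t;
    });

    /** Connects, or throws TimeoutException if no connector is returned within timeoutMs. */
    static JMXConnector connect(JMXServiceURL url, Map<String, ?> env, long timeoutMs)
            throws IOException, TimeoutException, InterruptedException {
        Future<JMXConnector> pending = CONNECTORS.submit(() -> JMXConnectorFactory.connect(url, env));
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupt the worker; whether the underlying transport honours this is not guaranteed.
            pending.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: host, port and JNDI path must match the real managed server.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:iiop://wls-managed-1.example.com:7001/jndi/weblogic.management.mbeanservers.runtime");
        Map<String, Object> env = Map.of(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
        try (JMXConnector connector = connect(url, env, 10_000)) {
            System.out.println("connected: " + connector.getConnectionId());
        } catch (TimeoutException e) {
            System.out.println("no answer within 10 s; report the monitor as unavailable");
        }
    }
}
```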

If needed, I can send the complete stack trace.

Is it possible to set a timeout on the WebLogic checks so that they cannot block other checks? Or has this issue already been resolved in AppManager 10.0?