/ java.net.socketinputstream.socketread0 hangs thread ~ Java EE Support Patterns

4.28.2011

java.net.socketinputstream.socketread0 hangs thread

This article will provide you a detailed explanation on how you can identify, resolve and prevent Thread hang problems involved in a java.net.socketinputstream.socketread0 blocking IO operation.

Proper understanding of these concepts is quite important when developing and supporting a Java EE production system since lack of proper timeout and understanding will eventually bring your system to its knees via Thread depletion.

Background

Modern Java EE production systems usually rely heavily on remote service providers for its various business processes. Such service providers are referred as downstream systems. For example: you may need to pull your client account data from an Oracle database or consume a remote Web Service SOAP/XML data. Regardless of the protocol used (HTTP, HTTPS. JDBC etc.) the Java VM will eventually be requested to establish a Socket connection from your production server to the remote service provider environment and then write / read data to / from the Socket. This process is referred as a “blocking IO call”.

The Thread involved in this blocking IO call can get hang for either:


·         Socket.connect() operation (establish a new physical connection between your production server and your remote service provider such as an Oracle database listener, a Web Service URL etc.)
·         Socket.write() operation (send the data to the service provider such as a database query request / SQL, an XML request data etc.)
·         Socket.read() operation  (wait for the service provider to complete its processing and consume the response data such as results of a database SQL query or an XML response data)


The third operation is what we will cover today.

In order to better help you visualize this process, find below a high level graphical view of the different operations and interactions: 

Problem identification: JVM Thread Dump to the rescue! 

Generating a JVM Thread Dump is the best and fastest procedure to quickly pinpoint the source of the slowdown and identify which Socket operation is hanging. The example below is showing you a JBoss Thread hanging in a Socket.read() operation triggered from a Web Service HTTPS call. You can also refer to this JBoss problem post for the complete case study and root cause analysis.

The first line of the Thread Stack trace will always show you which type of Socket operation is hanging.

"http-0.0.0.0-8443-102" daemon prio=3 tid=0x022a6400 nid=0x1bd runnable [0x78efb000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
        at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:789)
        - locked <0xdd0ed968> (a java.lang.Object)
        at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:746)
        at com.sun.net.ssl.internal.ssl.AppInputStream.read(AppInputStream.java:75)
        - locked <0xdd0eda88> (a com.sun.net.ssl.internal.ssl.AppInputStream)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        - locked <0xddb1f6d0> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:652)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
        - locked <0xdd0e5698> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318)
        at org.jboss.remoting.transport.http.HTTPClientInvoker.getResponseCode(HTTPClientInvoker.java:1269)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)  
 ………………………………………………………………………………………………

Root cause and resolution

As seen in the above diagram, the most common scenario for a hanging Socket.read() is a high processing time or unhealthy state of your remote service provider. This means that you will need to communicate with the service provider support team right away in order to confirm if they are facing some slowdown condition on their system.

Your applicaton server Threads should be released once the remote service provider system problem is resolved but quite often you will need to restart your server instances (Java VM) to clear all the hanging Threads; especially if you are lacking proper timeout implementation.

Other less common causes include:

-       Huge response data causing increased elapsed time to read / consume the Socket Inputstream e.g. such as very large XML data. This can be proven easily by analysing the size of the response data
-       Network latency causing increased elapsed time in data transfer from the service provider to your Java EE production system. This can be proven by running some network sniffer between your production server and the service provider and determine any major lag/latency problem

Problem prevention: timeout implementation!

Too many Thread hanging in a IO blocking calls such as Socket.read() can lead to a rapid Thread depletion and full outage of your production enviroment, regardless of the middle ware vendor you are using (Oracle Weblogic, IBM WAS, Red Hat JBoss etc,.).

In order to prevent and reduce the impact of such instability of your service providers, the key solution is to implement proper timeout for all 3 Socket operations.  Implementation and validation of such timeout will allow you to cap the number of time you are allowing the application server Thread to wait for a particular the Socket operation. Most modern communication APIs today allow you to setup such timeout fairly easily.

I also recommend that you perform negative testing in order to simulate such service provider instability / slowdown. This will ensure that your timeouts are working properly and that your production system will be able to survive during these negative scenarios.

Conclusion

I hope this article has helped you better understand the source and root cause of these hang Threads that you see in your environment and how to analyse and take some corrective actions. My next article will provide you with more detail on timeout implementations and methods available from various communication APIs.

Please don't hesitate to add a comment or email me if you still have questions or doubts on this type of problem.

59 comments:

Thank you for your post. It was of a great help for me :-)

Thanks Anonymous.

I'm glad this post has helped you fix your hang Thread problem.

P-H

You really didn't offer a solution, can you just point out some examples for the timeout implementations?

Hi anonymous and thanks for your comments,

The intent of this first post on Socket.read() hang was to provide a diagnostic approach and common hang scenarios.

Given the interest I’m seeing with this post, my goal is to publish a part #2 in about 1 week from now that will provide you with detailed solution / timeout implementations for the most common HTTP/HTTPS API (Weblogic JAX-WS/RPC, Apache HTTP client, Apache AXIS etc.) and based on my past experience with production Java EE systems.

In the meantime, can you please provide me with more detail on your communication API causing you problem / hang Thread and I will provide you with my recommendations from a timeout perspective. You can also include of snippet of the Thread Stack Trace from your captured Thread Dump; if available.

Very well explained! Not to long and easy to understand with the view and example thread dump. Thank you!

Great article. Really helpful. It would be geat if you provide details on time out implimention.

Thank you anonymous,

Please stay tuned since my next article will cover timeout implementation detail to help prevent stuck threads for various common communication API.

P-H

Excellent article

Now I am able to understand thread dump better then earlier.

Sujit

Hello, thanks for the good article.
I have a problem where I wait for the response of the database and already know the root cause.

But one thing interests me:

All my socketRead0 Threads which are having problems are in thread state runnable ("state: R" in javacore).

All my socketRead0 Threads which also connect to DB and running fine are in thread state "Waiting on condition" ("state: CW" in javacore).

Do you know the reason for this? On which condition could "SocketInputStream.socketRead0" wait in case for the healthy threads.

Hi Darren, this is a very good question.

The problem is that in stuck Thread scenario like this (hanging IO call etc.) the JVM Thread is healthy and still seen as RUNNABLE state. However from a middleware (Weblogic, WAS, JBoss etc.) perspective this Thread is definitely stuck. This is why middleware vendors like Weblogic have their own Thread state on top of native JVM states.

Unless the Thread is in a true deadlock scenario or waiting for a lock on an Object monitor etc. you will see Threads in “normal” / R state which is totally normal.

From a Thread Dump analysis perspective, my recommendation is to always pay much more attention to the actual Java Stack Trace and problem pattern vs. too much time on native Thread state. The Stack Trace will always give you the whole story and the first / top line will give you indication of where the Thread is stuck.

Hope this helps.
P-H

One more point, in order to prove a certain Thread stuck in that socketRead0 call, simply generate a few Thread Dump snapshots, lets say 5 seconds interval and check which Thread(s) are still in that condition.

You can also use the JMX API to dynamically generate and extract Thread Dump data remotely in a short amount of time and determine the running elapsed time of each Thread and its associated Stack Trace.

P-H

Hey P-H,

thanks for the fast reply.
Well I also focused on the last Stack Line. It was just for interest why the healthy and fast replied threads to DB are in "Wait on condition" and the blocked threads by DB were in "Runnable".

The healthy "Wait on condition" also dissappeared in the next thread dumps while the runnable waited since days for db response.

I assume that the healthy "wait on condition" threads wait on some operating system event and that they are snapshot mostly in this condition because this consume the longest time in this whole db communication.

I didn't know that Weblogic or some other vendors use different thread states. When they are using this on top of native JVM I also expect I can not see this states in a "normal" thread dump.

Thanks
Darren

I am having issue on unix server. am getting following error dump files:
"Thread-23" (TID:0x30082F98, sys_thread_t:0x42ED2E70, state:R, native ID:0x1E1F) prio=5
4XESTACKTRACE at java.net.SocketInputStream.socketRead(Native Method)
4XESTACKTRACE at java.net.SocketInputStream.read(SocketInputStream.java(Compiled Code))
4XESTACKTRACE at oracle.net.ns.Packet.receive(Unknown Source)(Compiled Code)
4XESTACKTRACE at oracle.net.ns.DataPacket.receive(Unknown Source)(Inlined Compiled Code)
4XESTACKTRACE at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)(Compiled Code)
4XESTACKTRACE at oracle.net.ns.NetInputStream.read(Unknown Source)(Compiled Code)
4XESTACKTRACE at oracle.net.ns.NetInputStream.read(Unknown Source)(Inlined Compiled Code)
4XESTACKTRACE at oracle.net.ns.NetInputStream.read(Unknown Source)(Compiled Code)
4XESTACKTRACE at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java(Inlined Compiled Code))
4XESTACKTRACE at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java(Inlined Compiled Code))
4XESTACKTRACE at oracle.jdbc.ttc7.Oall7.receive(Oall7.java(Compiled Code))
4XESTACKTRACE at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1894)
4XESTACKTRACE at oracle.jdbc.ttc7.TTC7Protocol.executeFetch(TTC7Protocol.java:956)
4XESTACKTRACE at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:2138)
4XESTACKTRACE at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:2015)
4XESTACKTRACE at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2877)
4XESTACKTRACE at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:608)
4XESTACKTRACE at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:684)
4XESTACKTRACE at net.guardean.kernel.dao.KernelDao.waitForAlert(Unknown Source)
4XESTACKTRACE at net.guardean.kernel.AlertKernelStrategy.waitForStep(Unknown Source)
4XESTACKTRACE at net.guardean.kernel.Kernel.runKernel(Unknown Source)
4XESTACKTRACE at net.guardean.kernel.Kernel.mainInternal(Unknown Source)
4XESTACKTRACE at java.lang.reflect.Method.invoke(Native Method)
4XESTACKTRACE at net.guardean.kernel.KernelStarter.invokeStaticMethod(Unknown Source)
4XESTACKTRACE at net.guardean.kernel.KernelStarter.access$000(Unknown Source)
4XESTACKTRACE at net.guardean.kernel.KernelStarter$StarterThread.run(Unknown Source)


please suggest solution on it as its hampering Production site...thanx in advance...

Hi Ananymous,

This is an actual Thread Stack Trace showing a stuck Thread. This Stuck Thread is wainting in a socketRead() condition. Again, as per this article this means that it is waiting from the remote system. In this case, this is an Oracle database that we are dealing with.

It looks like that this code >> net.guardean.kernel.dao.KernelDao.waitForAlert() is attempting to insert / update some data to your Oracle database and this SQL is hanging or slow to execute (could be due to table or row level data lock etc.).

I recommend that you engage your DBA team asap. Please ask them to generate a AWR Report so they can attempt to identify this potential slow update SQL.

Also, what is your environmetn middleware software? This code appears to originates from a Net Guardean product / software e.g. are you using Net Guardian monitoring solution / agent solution? We can see this code is basically waiting for alerts to come in but hanging to internal SQL not performed properly on Oracle DB side.

Thanks.
P-H

Any suggestions or comments on fixing the below?


[java] java.net.SocketTimeoutException: Read timed out
[java] at java.net.SocketInputStream.socketRead0(Native Method)
[java] at java.net.SocketInputStream.read(SocketInputStream.java:155)
[java] at com.ibm.jsse2.a.a(a.java:177)
[java] at com.ibm.jsse2.a.a(a.java:117)
[java] at com.ibm.jsse2.jc.a(jc.java:285)
[java] at com.ibm.jsse2.jc.a(jc.java:245)
[java] at com.ibm.jsse2.e.read(e.java:1)
[java] at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
[java] at java.io.BufferedInputStream.read(BufferedInputStream.java:246)
[java] at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
[java] at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
[java] at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
[java] at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
[java] at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
[java] at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
[java] at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
[java] at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
[java] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
[java] at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)

Hi Anonymous,

Actually this error is thrown because the remote HTTP server you are connecting to is not replying fast enough as per your configured timeout value. Read timeout means that Connection was established but response is not coming fast enough from the remote server.

I recommend 2 things:

- First review the timeout value confirmed from your code to Apache HttpClient e.g. 30 seconds? 60 seconds? Make sure it is not too low

- Contact the support team of this remote Web Service provider in order to see if they are facing any problem with their environment

Please keep me posted.

Thanks.
P-H

Hi ,

We are facing a issue with waiting threads in my SOA server.
below is the output from thread dumps.


"[ACTIVE] ExecuteThread: '22' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@5d8f794f WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '21' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@79e59f7c WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '20' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@68fdf3b3 WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@55ba70e5 WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '18' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@2cccf2e0 WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '17' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@291bfe83 WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '16' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@1ee94df WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '15' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@41ffba4f WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)

"[ACTIVE] ExecuteThread: '14' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock weblogic.work.ExecuteThread@48acb798 WAITING
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:162)
weblogic.work.ExecuteThread.run(ExecuteThread.java:183)


due to this we have observed cpu usage on my server.


Thanks in advance,
Sasidhar

Hi Sasidhar and thanks for your post,

Was this Thread Dump sample taken during an actual slowdown condition? The above Threads are actuall healthty e.g. waiting for some work to do e.g. waitForRequest() is the state Weblogic Threads are when waiting for a request to comes in. Thread Dump has to be taken during slowdown condition and when stuck/hogging Threads are observed.

Can you pleaser repost an actual Thread Dump during a slowdown condition?

Thanks.
P-H

hi P-H

can you please take a look at this stacktrace.

Name: RMI TCP Connection(4)-10.102.2.111
State: RUNNABLE
Total blocked: 4 Total waited: 0

Stack trace:
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked java.io.BufferedInputStream@1ba6ccc8
java.io.FilterInputStream.read(FilterInputStream.java:66)
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:517)
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790)
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649)
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
java.lang.Thread.run(Thread.java:662)

i cannot determine wich operation is actually running as sdtack trace is somewhat not specific. Application is just frozen. we run Spring on Jetty server and seems like backend at some point just hangs. Nothing happens. Log files are clear. We are desperate to know the true root cause for this.

The application run under moderate stress testing.


Thanks,
Dennis K.

Hi Dennis,

Do you have a few Thread Dump samples captured during the incident / hang condition. The above Thread is not really a concern as it is showing some RMI related operations. I'm more interested in Threads reaching out to your Spring layer from the Jetty server.

Hang situation be explained by many factors but let's first review the health of the Jetty Threads. Do you have also data from your JVM Heap & GC process captured during the stress test?

You can either post the full Thread Dump data via the root cause forum or email to me directly.
http://javaeesupportpatterns.blogspot.ca/p/java-ee-forum_27.html
phcharbonneau@hotmail.com

Regards,
P-H

Hi P-H,

we have one production sever last couple of days we are seeing high HTC[Hogging thread] in weblogic.
[ACTIVE] ExecuteThread: '161' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x01616bd0 nid=0x6d7eb runnable [0x41dfe000..0x41dffc70]
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at weblogic.servlet.internal.ChunkOutput.writeChunkTransfer(ChunkOutput.java:525)


which is causing slowdown of application, please need your suggestion on this...

Thanks in advance
Cooper

Hi Cooper,

Unless you are dealing with a large Http response payload to send back to your clients, changes are you are dealing with a global problem affecting the entire JVM, in this scenario, Threads will slowdown to operations like socketWrite since some of your clients already terminated the client socket. Can you please post the entire Thread Dump to the RCA forum or send it to me via email?

Also, can you please provide a history of your Java Heap & GC process if you have this data.

Thanks.
P-H

Hi P-H,

When will be the timeout implementations and methods available from various communication APIs post be ready.

Regards
Aravind

Hi Aravind,

This will be my next article published, probably in 1 week or less from now.

Regards,
P-H

hello, i have a basic application server and >100 clients communicating through socket. I have an error like this "java.net.SocketException: Software caused connection abort: recv failed at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)...". I hope you would help me. I don't know how to solve this.

Hi P-H
Thanks for the detailed explanation.
I have a question on socket timeout.
My Scenario:
I have one socket.read() operation which takes more than 10 mins to complete.But all my other read operations complete less than 1 min.
I want to set so_Timeout for my all operations.
Is there a way we can ask socket not to timeout on particular operation?
P.S: I know so_Timeout will not discriminate between the socket calls.Is there a way to handle this?
Thanks,
SB

Hi SB, what communication API are you using? That segregation must be done at the API level. Please share your communication API e.g. Apache commons http client etc. since the timeout is injection via those APIs.

Thanks.
P-H

I am getting this error in the javacore and an outage happened today in the production system.Can you tell me what is the probable issue and resolution which I can try out.

3XHNATIVESTACK Native Stack
NULL ------------
3XHSTACKLINE at 0xD30DD838 in NET_Recv
3XHSTACKLINE at 0xD30EDEA4 in Java_java_net_SocketInputStream_socketRead0
3XMTHREADINFO "WebContainer : 19" (TID:0x70C03400, sys_thread_t:0x39036CA8, state:R, native ID:0x5AD4) prio=5
4XESTACKTRACE at java.net.SocketInputStream.socketRead0(Native Method)
4XESTACKTRACE at java.net.SocketInputStream.read(SocketInputStream.java(Compiled Code))
4XESTACKTRACE at com.ibm.jsse2.a.a(a.java(Compiled Code))
4XESTACKTRACE at com.ibm.jsse2.a.a(a.java(Compiled Code))
4XESTACKTRACE at com.ibm.jsse2.by.a(by.java(Compiled Code))
4XESTACKTRACE at com.ibm.jsse2.by.m(by.java(Compiled Code))
4XESTACKTRACE at com.ibm.jsse2.by.startHandshake(by.java:68)
4XESTACKTRACE at org.apache.soap.util.net.SSLUtils.buildSSLSocket(Unknown Source)
4XESTACKTRACE at sun.reflect.GeneratedMethodAccessor217.invoke(Unknown Source)
4XESTACKTRACE at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
4XESTACKTRACE at java.lang.reflect.Method.invoke(Method.java(Compiled Code))
4XESTACKTRACE at org.apache.soap.util.net.HTTPUtils.buildSocket(Unknown Source)
4XESTACKTRACE at org.apache.soap.util.net.HTTPUtils.post(Unknown Source)
4XESTACKTRACE at org.apache.soap.transport.http.SOAPHTTPConnection.send(Unknown Source)
4XESTACKTRACE at org.apache.soap.rpc.Call.invoke(Unknown Source)
4XESTACKTRACE at com.ibm.ws.management.connector.soap.SOAPConnectorClient$2.run(SOAPConnectorClient.java:266)
4XESTACKTRACE at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.management.connector.soap.SOAPConnectorClient.reconnect(SOAPConnectorClient.java:259)
4XESTACKTRACE at com.ibm.ws.management.connector.soap.SOAPConnectorClient.(SOAPConnectorClient.java:190)
4XESTACKTRACE at sun.reflect.GeneratedConstructorAccessor168.newInstance(Unknown Source)
4XESTACKTRACE at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java(Compiled Code))
4XESTACKTRACE at java.lang.reflect.Constructor.newInstance(Constructor.java(Compiled Code))
4XESTACKTRACE at com.ibm.websphere.management.AdminClientFactory.createAdminClient(AdminClientFactory.java:313)
4XESTACKTRACE at com.ibm.ws.security.token.WSCredentialTokenMapper.getOpaqueTokenFromMBean(WSCredentialTokenMapper.java:1240)
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl$2.run(distContextManagerImpl.java:1754)
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl.runAs(distContextManagerImpl.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl.runAsSystem(distContextManagerImpl.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl.getOpaqueTokenFromMBean(distContextManagerImpl.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl.getOpaqueTokenFromCacheOrOriginatingServer(distContextManagerImpl.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl.login(distContextManagerImpl.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.auth.distContextManagerImpl.login(distContextManagerImpl.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.web.WebAuthenticator.validate(WebAuthenticator.java(Compiled Code))
4XESTACKTRACE at com.ibm.ws.security.web.WebAuthenticator.validateCookie(WebAuthenticator.java(Compiled Code))

Hi Subhro,

The Thread stacktrace you see indicates inter-WAS server slowdown.

The stacktrace is showing:

- A WebService call is executed from your IBM WAS container for cookie validation
- WAS auth. layer is attempting to obtain a token
- In order to obtain this token, it has to establish a connection with the MBean server (via AdminClientFactory)
- A remote SOAP call is attempted which is hanging, taking long time

From my perspective this looks more like a symptom of the actual production problem. For example, if JVM is trashing (excessive GC), remote JMX calls will start to hang and you will see "symptoms " like this stuck thread etc.

Can you please upload or send me (phcharbonneau@hotmail.com) the javacore file so I can have a closer look.

Thanks.
P-H

Hi P-H,
I sent you javacore files to your mail ID.
Will wait for your feedback on those.

Thanks for help in advance !!

Regards,
Subhro

Hi Subhro,

I found reference from IBM WAS of a known problem matching your stuck threads. This correlats with my initial analysis that the stuck threads and due to internal WAS calls attempting to acquire token/subjects during security/auth process.

Please review the link below and see if you can implement the propose workaround in your environment.

http://www-01.ibm.com/support/docview.wss?uid=swg1PM23482

Thanks.
P-H

Hi,

I am getting this error. how to over come this issue? Any specific problems?

at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(Unknown Source)
at com.sun.net.ssl.internal.ssl.InputRecord.readV3Record(Unknown Source)
at com.sun.net.ssl.internal.ssl.InputRecord.read(Unknown Source)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(Unknown Source)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at com.sun.net.ssl.internal.ssl.AppInputStream.read(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.ChunkedInputStream.fastRead(Unknown Source)
at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at java.io.PushbackInputStream.read(Unknown Source)
at org.apache.crimson.parser.XmlReader$Utf8Reader.read(XmlReader.java:645)
at org.apache.crimson.parser.XmlReader.read(XmlReader.java:487)
at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1068)
at org.apache.crimson.parser.InputEntity.getNameChar(InputEntity.java:295)
at org.apache.crimson.parser.Parser2.nameCharString(Parser2.java:652)
at org.apache.crimson.parser.Parser2.maybeGetNameCacheEntry(Parser2.java:626)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1302)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1779)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1507)
at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:500)
at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
at org.dom4j.io.SAXReader.read(SAXReader.java:465)
at org.dom4j.io.SAXReader.read(SAXReader.java:343)

Hi anonymous,

This is a stack trace showing that your application is waiting from the remote server via HTTPS.

Do you know which HTTP/HTTPS system your applicatin is invoking?

Can you please let me know how long you see your Java threads stuck in this condition?

Thanks.
P-H

Hi , I get following error everyday same time . This time some schedule jobs get started .I have analyzed thread dump 87% threads are waiting on condition . WebSphere shows thread hang message , after sometime it shows "WebContainer : 1" (00000030) was previously reported to be hung but has completed. " During this time (5-10 minutes) application become very slow. then it gets normal . Same application is deployed on same WebSphere server with different profile for different customer but without any issue. Please provide some fix. Thanks in advance!

[4/1/13 1:17:30:517 CDT] 00000027 ThreadMonitor W WSVR0605W: Thread "WebContainer : 1" (00000030) has been active for 698558 milliseconds and may be hung. There is/are 2 thread(s) in total in the server that may be hung.
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:155)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(Unknown Source)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(Unknown Source)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(Unknown Source)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(Unknown Source)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.execute(Unknown Source)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecute(WSJdbcPreparedStatement.java:891)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.execute(WSJdbcPreparedStatement.java:596)
.......

[4/1/13 1:20:58:656 CDT] 0000002a ExceptionUtil E CNTR0020E: EJB threw an unexpected (non-declared) exception during invocation of method "assignBatchOfItemsToCycle" on bean "BeanId(certification-service#certification-cycleassignment-service-impl.jar#CycleAssignmentEjb, null)". Exception data: java.lang.NullPointerException

Regards,
Pheni

Hi Pheni,

Do you have the detail of this scheduled job? Looks like it is hammering your environment. The thread stack trace is showing that SQL's are executed against a SQLServer database and thread is waiting for the response from the DB...

What I'm suspecting is that this job is causing strain against your SQLServer database when this job is fired.

Can you please look at action items below:

- Engage your SQLServer DBA team and ask them to verify SQLServer health and impact during job execution
- Generate a few JVM Thread Dump snapshots from your WAS environment so we can look at the other threads while this job is running

Thanks.
P-H

Thanks P-H for your prompt reply . I never expected so fast reply from you. Hats off to you!. I will engage my DBA to verify SQL server health during job execution. However I am going to send you thread dump which I have taken during normal condition.
regards,
Pheni

Hi P-H,

I'm trying to upload data to a remote server using http client. Here is my threaddump.

at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.net.SocketOutputStream.write(Unknown Source)
at com.sun.net.ssl.internal.ssl.OutputRecord.writeBuffer(Unknown Source)

at com.sun.net.ssl.internal.ssl.OutputRecord.write(Unknown Source)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(Unknown Source
)
- locked <0x03612e20> (a java.lang.Object)
at com.sun.net.ssl.internal.ssl.AppOutputStream.write(Unknown Source)
- locked <0x03617000> (a com.sun.net.ssl.internal.ssl.AppOutputStream)
at java.io.BufferedOutputStream.write(Unknown Source)
- locked <0x03624f30> (a java.io.BufferedOutputStream)
at org.apache.commons.httpclient.methods.multipart.FilePart.sendData(Fil
ePart.java:223)
at org.apache.commons.httpclient.methods.multipart.Part.send(Part.java:3
12)
at org.apache.commons.httpclient.methods.multipart.Part.sendParts(Part.j
ava:385)
at org.apache.commons.httpclient.methods.multipart.MultipartRequestEntit
y.writeRequest(MultipartRequestEntity.java:164)
at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequ
estBody(EntityEnclosingMethod.java:499)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodB
ase.java:2114)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.j
ava:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Htt
pMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMe
thodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav
a:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.jav


- senthil

Hi P-H,

Thanks for your prompt reply. My replies inline.

Hi anonymous,

This is a stack trace showing that your application is waiting from the remote server via HTTPS.

Do you know which HTTP/HTTPS system your applicatin is invoking?
"Our applications invoking VMware ESX/ESXi server".

Can you please let me know how long you see your Java threads stuck in this condition?

"Java Thread was hanged until restart the service. It will be working 10-30 mins.Again stuck"

Thanks.
P-H

Thanks,
Moorthy. M

Hi,

I'm experiencing a similar problem while reading a BLOB column of the database from an application. This does not happen when user try to use the application manually but this problem comes when we run the performance and volume testing scripts run. In this case many requestors are trying to access the databse with BLOB read operation.
Because of the hung threads, requestors lock is happening and application becomes unresponsive resulting into HTTP - 500 errors.

Please advise.

Below is the stack trace generated in Webspehere Application server logs. and Data base is Microsoft SQL server 2008.

Thread[WebContainer : 4,5,main]
java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:140)
at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:851)
at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:839)
at com.ibm.jsse2.a.a(a.java:148)
at com.ibm.jsse2.a.a(a.java:96)
at com.ibm.jsse2.tc.a(tc.java:302)
at com.ibm.jsse2.tc.a(tc.java:185)
at com.ibm.jsse2.e.read(e.java:2)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1652)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3694)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:5026)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet$CursorFetchCommand.doExecute(SQLServerResultSet.java:4812)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4575)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1400)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:179)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.doServerFetch(SQLServerResultSet.java:4849)
at com.microsoft.sqlserver.jdbc.SQLServerResultSet.next(SQLServerResultSet.java:979)
at com.ibm.ws.rsadapter.jdbc.WSJdbcResultSet.next(WSJdbcResultSet.java:3120)
at com.pega.pegarules.data.internal.access.DatabaseResultSet.next(DatabaseResultSet.java:387)
at com.pega.pegarules.data.internal.access.PageDatabaseMapperImpl.createPageFromResultSet(PageDatabaseMapperImpl.java:461)
at com.pega.pegarules.data.internal.access.Opener.getResultFromResultSet(Opener.java:794)
at com.pega.pegarules.data.internal.access.Opener.openData(Opener.java:621)
at com.pega.pegarules.data.internal.access.DatabaseImpl.open(DatabaseImpl.java:3827)
at com.pega.pegarules.data.internal.access.DatabaseImpl.open(DatabaseImpl.java:3646)
at com.pega.pegarules.data.internal.access.DatabaseImpl.open(DatabaseImpl.java:3633)
at com.pegarules.generated.anzcmfw_anzcasemanagement.DidCurrentOperatorProcessCase01_16_01(anzcmfw_anzcasemanagement.java:121)
at com.pegarules.generated.anzcmfw_anzcasemanagement.DidCurrentOperatorProcessCase(anzcmfw_anzcasemanagement.java:172)
at com.pegarules.generated.when.ra_when_canoperatorverifypayment_ce4e0259d28187dec6bffed6bf26c639.evaluate(ra_when_canoperatorverifypayment_ce4e0259d28187dec6bffed6bf26c639.java:48)
at com.pega.pegarules.session.internal.mgmt.Executable.evaluateWhen(Executable.java:4178)
at com.pegarules.generated.pega_rules_expressionevaluators.when06_01_01(pega_rules_expressionevaluators.java:2550)
at com.pegarules.generated.pega_rules_expressionevaluators.when(pega_rules_expressionevaluators.java:2566)


Thanks,
Manjunatha HV

Hi Senthil,

Do you know how big is your data payload that you are trying to upload to the server? Thread is in socket.write(), the means it is still uploading to the remote server..

Regards,
P-H

Hi Moorthy,

I recommend that you investigate on the Mware ESX/ESXi server side, socket.read() condition indicates the thread is sitting there and waiting for the response from the remote server.

Also, try to figure out the side of the data payload and see if this could be a factor. However, the fact these threads are stuck, indicates the remote server side is not responding.

Regards,
P-H

Hi Manjunatha,

That is correct, the thread stack trace indicates you are waiting from the SQL Server 2008 database to reply back. The fact that you are getting this stuck thread under concurrent load indicates you are likely dealing with a concurrency problem with this particular table/BLOB.

Did you deep dive with your DBA team in to this problem?

What was the overall utilization of the SQLServer database during this load test?

How many data rows are you trying to return from this SQL >> com.pegarules.generated.anzcmfw_anzcasemanagement.DidCurrentOperatorProcessCase01_16_01 >> DatabaseImpl.open() ?

Regards,
P-H

Hi P-H,

Thanks for the quick response.

I have seen the DB logs and observed too many Printing deadlock information messages.

Earlier, We had our APP server and DB server in the same domain and we did not face this issue while performance test scripts execution.

Recently we have moved our DB server to the different domain from the App server and we encountered this problem.

DBA's says CPU utilization is normal while run of the scripts.

com.pegarules.generated.anzcmfw_anzcasemanagement.DidCurrentOperatorProcessCase01_16_01 >> DatabaseImpl.open() --> this is a java code automatically geneates the SQL query to open a BLOB column of a single row. (BLOB cloumn contains the XML data).

I have also engaged network team to look into the isuse, considering problem could with the network because of the cross domain. Still they are analysing.

Please advise how i can move on with this. It would be great if you could provide me the pointers regarding the next steps on this issue.


Thanks,
Manjunatha HV





Hi we are facing following Thread Stuck issue,
could you please provide us fix for this issue?

"[STUCK] ExecuteThread: '32' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE native

java.net.SocketInputStream.socketRead0(Native Method)

java.net.SocketInputStream.read(SocketInputStream.java:129)

com.sun.mail.util.TraceInputStream.read(TraceInputStream.java:110)

java.io.BufferedInputStream.fill(BufferedInputStream.java:218)

java.io.BufferedInputStream.read(BufferedInputStream.java:237)

com.sun.mail.util.LineInputStream.readLine(LineInputStream.java:88)

com.sun.mail.smtp.SMTPTransport.readServerResponse(SMTPTransport.java:1589)

com.sun.mail.smtp.SMTPTransport.openServer(SMTPTransport.java:1369)

com.sun.mail.smtp.SMTPTransport.protocolConnect(SMTPTransport.java:412)

javax.mail.Service.connect(Service.java:288)

javax.mail.Service.connect(Service.java:169)

javax.mail.Service.connect(Service.java:118)

javax.mail.Transport.send0(Transport.java:188)

javax.mail.Transport.send(Transport.java:118)

com.thortech.xl.dataobj.util.tcEmailNotificationUtil.sendEmail(Unknown Source)

It would be gr8 help if you can inform me about this error. Not able to understand the issue. Also when the thread is getting stuck I can see a lot of connection session which is not possible for my application.

java.sql.SQLRecoverableException: No more data to read from socket
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:263)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:271)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:445)
at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1059)
at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1016)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:589)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:194)
at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:953)
at oracle.jdbc.driver.T4CPreparedStatement.executeMaybeDescribe(T4CPreparedStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1187)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3386)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3430)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:128)

Hi anonymous,

This could be a symptom of what we call a "stale" JDBC Connection.

Can you please verify in Weblogic, look at your affected JDBC Data Source and determine if you have enabled "Test Connection on Reserve"

Regards,
P-H

"Hi we are facing following Thread Stuck issue,
could you please provide us fix for this issue?"

Hi anonymous,

This STUCK thread is clearly showing that your application was unable to physical connect with this SMTP mail host at that time. Please investigate if any planned work or network issue was happening around that time.

I also recommend to revisit the timeout values implemented with your client and ensure connect timeout is properly implemented.

Regards,
P-H

Hi,

Your article is excellent. I am having one problem related to socket.
Our CPU usage is high but having very less traffic and observed that because the following trace system usage are reaching 90-100%.

Can you suggest us what would be the issue.

The following is the trace


INFO | jvm 1 | main | 2013/07/26 13:20:25.053 | "TP-Processor14" daemon prio=10 tid=0x00000000597c0800 nid=0x3414 runnable [0x00000000458d6000]
INFO | jvm 1 | main | 2013/07/26 13:20:25.053 | java.lang.Thread.State: RUNNABLE
INFO | jvm 1 | main | 2013/07/26 13:20:25.053 | at java.net.SocketInputStream.socketRead0(Native Method)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at java.net.SocketInputStream.read(SocketInputStream.java:129)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | - locked <0x0000000621d3f538> (a java.io.BufferedInputStream)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at org.apache.jk.common.ChannelSocket.read(ChannelSocket.java:628)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at org.apache.jk.common.ChannelSocket.receive(ChannelSocket.java:566)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:693)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:898)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
INFO | jvm 1 | main | 2013/07/26 13:20:25.054 | at java.lang.Thread.run(Thread.java:662)


Regards,
Raj.

P.H.
I have a similar issue, where my IBM MQ client code hangs on java.net.SocketInputStream.socketRead0.
Is there a way to put a timeout on that using some global settings such as system property?
Would that (global) be wise?

Regards,
Erez

Hi Raj,

Actually this Thread is waiting for a request to come in. This is normal and not a problem. I'm more concerned about your high CPU utilization. Did you perform a CPU per Thread analysis? You can following the guide as per below:

http://javaeesupportpatterns.blogspot.com/2012/05/java-high-cpu-troubleshooting-guide.html

Regards,
P-H

Hi Erez,

Can you please share the stack trace showing the thread hanging from MQ client calls?

Thanks.
P-H

Hi Pierre,

I really like your article http://javaeesupportpatterns.blogspot.com/2011/04/javanetsocketinputstreamsocketread0.html
Very well informed, explained, well-articulated and correlate with the production incidents.

We had few production outage in recent past due to not setting socket read timeout. We are using JBOSS EAP 4.3 CP10, and commons-httpclient-3.0.1.jar API for http calls.
We are now planning to set socket read timeout value with something like
// Set the timeout. Http posts wait 110 seconds (industry standard is 100) before
// declaring a timeout.
HttpClientParams cp = new HttpClientParams();
cp.setSoTimeout(110000);
httpClient.setParams(cp);

Now my question is would above setting timeout the threads stuck at native socket call. I have read a post which says this will not timeout threads stuck at native socket call.
I would appreciate if you can confirm the behavior?

Thanks
Rishi Aggarwal


http://stackoverflow.com/questions/1925154/how-to-make-the-http-connection-timeout-disconnected-after-a-time-interval
There are 2 timeouts involved in HTTPClient, try to set both,
client.getHttpConnectionManager().
getParams().setConnectionTimeout(5000);
client.getHttpConnectionManager().
getParams().setSoTimeout(5000);
However, the values will be ignored if the connection is stuck in a native socket call. So you might have to run the request in a different thread so you can time it out. See my answer to this question on how to do that,

"http-0.0.0.0-8080-194" daemon prio=10 tid=0x00002b6a14f33800 nid=0x7a85 runnable [0x00002b6a227a3000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x00002aab0d357bb0> (a java.io.BufferedInputStream)
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1115)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1832)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1590)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:397)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:170)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
at com.deploy.atao.integration.http.HttpHandler.preReleaseUndeprecatedSendData(HttpHandler.java:338)
at com.deploy.atao.integration.http.HttpHandler.sendData(HttpHandler.java:191)
at com.deploy.atao.integration.http.HttpHandler.sendData(HttpHandler.java:102)

On my production env. had below issue and java core dump showing below information

Thread Name
WebContainer : 31
State
Runnable
Monitor
Owns Monitor Lock on java/lang/String@0x00000000896083E0/0x00000000896083EC
Java Stack
at java/net/SocketInputStream.socketRead0(Native Method)
at java/net/SocketInputStream.read(SocketInputStream.java:141(Compiled Code))
at com/ibm/jsse2/a.a(a.java:245(Compiled Code))
at com/ibm/jsse2/a.a(a.java:227(Compiled Code))
at com/ibm/jsse2/sc.a(sc.java:172(Compiled Code))
at com/ibm/jsse2/sc.a(sc.java:394(Compiled Code))
at com/ibm/jsse2/e.read(e.java:18(Compiled Code))
at java/io/BufferedInputStream.fill(BufferedInputStream.java:230(Compiled Code))
at java/io/BufferedInputStream.read1(BufferedInputStream.java:270(Compiled Code))
at java/io/BufferedInputStream.read(BufferedInputStream.java:329(Compiled Code))
at sun/net/www/http/HttpClient.parseHTTPHeader(HttpClient.java:688(Compiled Code))
at sun/net/www/http/HttpClient.parseHTTP(HttpClient.java:633(Compiled Code))
at sun/net/www/protocol/http/HttpURLConnection.getInputStream(HttpURLConnection.java:1067(Compiled Code))
at com/ibm/net/ssl/www2/protocol/https/b.getInputStream(b.java:67(Compiled Code))
at org/apache/fop/apps/FOURIResolver.resolve(FOURIResolver.java:286(Compiled Code))
at org/apache/fop/apps/FopFactory.resolveURI(FopFactory.java:763(Compiled Code))
at org/apache/fop/apps/FOUserAgent.resolveURI(FOUserAgent.java:415(Compiled Code))
at org/apache/fop/apps/FOUserAgent.resolveURI(FOUserAgent.java:398(Compiled Code))
at org/apache/fop/apps/FOUserAgent$1.resolveURI(FOUserAgent.java:141(Compiled Code))
at org/apache/xmlgraphics/image/loader/impl/AbstractImageSessionContext.newSource(AbstractImageSessionContext.java:78(Compiled Code))
at org/apache/xmlgraphics/image/loader/impl/AbstractImageSessionContext.needSource(AbstractImageSessionContext.java:280(Compiled Code))
at org/apache/xmlgraphics/image/loader/cache/ImageCache.needImageInfo(ImageCache.java:120(Compiled Code))
at org/apache/xmlgraphics/image/loader/ImageManager.getImageInfo(ImageManager.java:122(Compiled Code))
at org/apache/fop/fo/flow/ExternalGraphic.bind(ExternalGraphic.java:73(Compiled Code))
at org/apache/fop/fo/FObj.processNode(FObj.java:123(Compiled Code))
at org/apache/fop/fo/FOTreeBuilder$MainFOHandler.startElement(FOTreeBuilder.java:282(Compiled Code))
at org/apache/fop/fo/FOTreeBuilder.startElement(FOTreeBuilder.java:171(Compiled Code))
at org/apache/xerces/parsers/AbstractSAXParser.startElement(Bytecode PC:299(Compiled Code))
at org/apache/xerces/parsers/AbstractXMLDocumentParser.emptyElement(Bytecode PC:4(Compiled Code))
at org/apache/xerces/impl/XMLNSDocumentScannerImpl.scanStartElement(Bytecode PC:4(Compiled Code))
at org/apache/xerces/impl/XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Bytecode PC:416(Compiled Code))
at org/apache/xerces/impl/XMLDocumentFragmentScannerImpl.scanDocument(Bytecode PC:26(Compiled Code))
at org/apache/xerces/parsers/XML11Configuration.parse(Bytecode PC:26(Compiled Code))
at org/apache/xerces/parsers/XML11Configuration.parse(Bytecode PC:29(Compiled Code))
at org/apache/xerces/parsers/XMLParser.parse(Bytecode PC:11(Compiled Code))
at org/apache/xerces/parsers/AbstractSAXParser.parse(Bytecode PC:43)
at org/apache/xerces/jaxp/SAXParserImpl$JAXPSAXParser.parse(Bytecode PC:53(Compiled Code))

Thanks,
Subash

Did the 2nd part for timeout setups ever released? If so, please provide the link here.

I think this is the followup post http://javaeesupportpatterns.blogspot.com/2011/08/javanetsocketinputstreamsocketread0.html

Hi
I am facing the below problem from a long time in clustered environment, there are stuck threads coming up frequently and the server is going to warning state until GC is done. But again it is showing up within half an hour. Is there any permanent solution for this?
"[STUCK] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE native

java.net.SocketOutputStream.socketWrite0(Native Method)

java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)

java.net.SocketOutputStream.write(SocketOutputStream.java:124)

weblogic.servlet.internal.ServletOutputStreamImpl.commit(ServletOutputStreamImpl.java:465)

weblogic.servlet.internal.ServletResponseImpl.send(ServletResponseImpl.java:1495)

weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1441)

weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)

weblogic.work.ExecuteThread.run(ExecuteThread.java:173)

"[STUCK] ExecuteThread: '18' for queue: 'weblogic.kernel.Default (self-tuning)'" RUNNABLE native

java.net.SocketOutputStream.socketWrite0(Native Method)

java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)

java.net.SocketOutputStream.write(SocketOutputStream.java:124)

weblogic.servlet.internal.ServletOutputStreamImpl.commit(ServletOutputStreamImpl.java:465)

weblogic.servlet.internal.ServletResponseImpl.send(ServletResponseImpl.java:1495)

weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1441)

weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)

weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
These are the stuck threads i had taken from the thread dump

Thank you Sir. This article was great help to troubleshoot the similar issue in our production environment.
Many many thanks for sharing such a great knowledge.
RJ

Hi,

A WebLogic STUCK Thread trying to do a Socket.write() operation means Weblogic is unable to write the HTTP response payload back to the client.

This normally indicates stale sockets/connections and could be a symptom that the server/application took long time to process the request. It can also indicates the client itself is in trouble, thus reading the incoming response very slowly.

Please check the application response time first and determine if you are dealing with very large response time from time to time.

Thanks
P-H

Post a Comment