Hello,
I have spent two 10-hour days on this problem, trying countless search permutations and peering into everything from kernel source to Java source. Unfortunately, I am not coming up with useful results.
Overview:
I have a very large Java application (Apache Solr) running on a server with 12GB of memory. Without going into too much detail, at a certain point this application uses a Runtime.exec() call to fire off a bash shell script. Over time, however, that call starts failing with an IOException: "Cannot allocate memory." (More info and a stack trace below.)
That application aside, I have been able to reproduce this - with confusing results - by varying the -Xms and -Xmx JVM parameters with the simple program below:
//Use with various -Xms and -Xmx arguments to produce
//IOException on call to Runtime.exec()
//
import java.io.*;
public class DoRuntime {
    public static void main(String args[]) throws IOException {
        Runtime runtime = Runtime.getRuntime();
        long total = runtime.totalMemory();
        long max = runtime.maxMemory();
        long free = runtime.freeMemory();
        System.out.println("total: " + total);
        System.out.println("max: " + max);
        System.out.println("free: " + free);
        Process process = runtime.exec("/bin/ls");
        InputStream is = process.getInputStream();
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }
}
Here are some sample runs:
-----
RUN1
rwoodrum@util1wad:~/tmp$ java DoRuntime
total: 188743680
max: 954466304
free: 187759312
DoRuntime.class
DoRuntime.java
rwoodrum@util1wad:~/tmp$
RUN2
rwoodrum@util1wad:~/tmp$ java -Xms10g -Xmx10g DoRuntime
total: 10290069504
max: 10290069504
free: 10236381080
Exception in thread "main" java.io.IOException: java.io.IOException: Cannot allocate memory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
at java.lang.Runtime.exec(Runtime.java:591)
at java.lang.Runtime.exec(Runtime.java:429)
at java.lang.Runtime.exec(Runtime.java:326)
at DoRuntime.main(DoRuntime.java:14)
rwoodrum@util1wad:~/tmp$
RUN3
rwoodrum@util1wad:~/tmp$ java -Xms10g -Xmx11g DoRuntime
total: 10290069504
max: 10498867200
free: 10236381080
DoRuntime.class
DoRuntime.java
rwoodrum@util1wad:~/tmp$
-----
(FWIW, I get the same results replacing /bin/ls with something as lightweight as /bin/true.)
As can be seen in the output of RUN2 above, I set a fixed heap size of 10GB. The JVM seems content with that heap size, but when it goes to exec the child process it becomes rather unhappy. From this I would tentatively conclude that, inside the call to forkAndExec() in libjava.so, the underlying fork() failed with ENOMEM. (As a side note, an strace of the process never actually shows a fork(), only a call to clone() - which is what fork() boils down to on Linux - more on clone below.)
The results of RUN3, however, seem to imply that the problem is within the JVM itself. In RUN3, I set the initial heap allocation to 10GB, as in RUN2, but set the maximum at 11GB. One would think (or at least I would) that if the fork in RUN2 failed, it would certainly fail in RUN3 as well. It doesn't.
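To pin down where the threshold lies, something like the untested sketch below might help (the class name, the /bin/true target, and the 256MB step size are just arbitrary choices on my part): run it with a small -Xms and a large -Xmx, let it grow the committed heap in steps, and see at what point the exec starts failing.

// Rough sketch, not thoroughly tested: grow the heap in ~256MB steps and try
// an exec after each step to see at what committed-heap size it starts failing.
// Run with e.g.: java -Xms256m -Xmx11g ExecThreshold
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class ExecThreshold {
    public static void main(String[] args) {
        List<byte[]> padding = new ArrayList<byte[]>();
        Runtime runtime = Runtime.getRuntime();
        while (true) {
            try {
                // /bin/true exits immediately; all we care about is whether the fork works
                Process p = runtime.exec("/bin/true");
                p.waitFor();
                System.out.println("exec OK at totalMemory=" + runtime.totalMemory());
            } catch (IOException e) {
                System.out.println("exec FAILED at totalMemory=" + runtime.totalMemory()
                        + ": " + e.getMessage());
                break;
            } catch (InterruptedException e) {
                break;
            }
            try {
                padding.add(new byte[256 * 1024 * 1024]); // force the heap to grow by another ~256MB
            } catch (OutOfMemoryError oom) {
                System.out.println("hit -Xmx before exec ever failed");
                break;
            }
        }
    }
}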
The manpage for clone indicates that the child process will "share parts of its execution context with the calling process". Indeed, if I'm reading it correctly, it indicates that the child process's stack (among other things) is housed in the parent process's address space.
Could it be that by setting this very large, fixed heap size as in RUN2, there is no room for the JVM to "maneuver" and properly handle the Runtime.exec() call? The fact that RUN3 is successful is what has made me think something like this, but I am far from an expert on how the JVM would handle that sort of thing.
In our production setup of this application, I adjusted the -Xmx setting to be something larger than the -Xms setting. This worked for a period of time, but eventually produced the same results. I suspect that over time the heap simply grew toward the maximum, and again the JVM couldn't do its thing when it came time to call Runtime.exec(). The fact that this happens with the trivial Runtime.exec program above would seem to rule out a memory leak in the production application.
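If it would help, I could verify that suspicion by logging the committed heap over time from inside the application; a minimal sketch using the standard MemoryMXBean (the class name and one-minute interval are just placeholders) would be something along these lines:

// Minimal sketch: periodically log the committed heap so we can see whether
// the Runtime.exec failures line up with the heap growing toward -Xmx.
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
public class HeapLogger implements Runnable {
    public void run() {
        MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memBean.getHeapMemoryUsage();
            System.out.println("heap committed=" + heap.getCommitted()
                    + " used=" + heap.getUsed()
                    + " max=" + heap.getMax());
            try {
                Thread.sleep(60 * 1000); // arbitrary one-minute interval
            } catch (InterruptedException e) {
                return;
            }
        }
    }
    public static void main(String[] args) {
        // Standalone demo; in the real application this would just be a background thread.
        new Thread(new HeapLogger()).start();
    }
}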
Ultimately something fishy is going on with memory, but with these seemingly contradictory results I am at a loss as to where the problem lies.
If I have omitted any potentially relevant information, please let me know. Any thoughts are greatly appreciated.
Some basic system information:
-----
rwoodrum@util1wad:~/tmp$ free
             total       used       free     shared    buffers     cached
Mem:      12304936   12200376     104560          0     337276    7082508
-/+ buffers/cache:    4780592    7524344
Swap:      2097144         32    2097112
rwoodrum@util1wad:~/tmp$ uname -a
Linux util1wad 2.6.18-4-amd64 #1 SMP Mon Mar 26 11:36:53 CEST 2007 x86_64 GNU/Linux
rwoodrum@util1wad:~/tmp$ java -version
java version "1.5.0_10"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_10-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_10-b03, mixed mode)
-ryan woodrum