I wrote a piece of Java code to create 500K small files (about 40 KB each on average) on CentOS. The original code looks like this:
package MyTest;

import java.io.*;

public class SimpleWriter {
    public static void main(String[] args) {
        String dir = args[0];                      // target directory
        int fileCount = Integer.parseInt(args[1]); // number of files to create
        String content="@#$% SDBSDGSDF ASGSDFFSAGDHFSDSAWE^@$^HNFSGQW%#@&$%^J#%@#^$#UHRGSDSDNDFE$T#@$UERDFASGWQR!@%!@^$#@YEGEQW%!@%!!GSDHWET!^";

        // Build a 40 KB payload by repeating the content string.
        StringBuilder sb = new StringBuilder();
        int count = 40 * 1024 / content.length();
        int remainder = (40 * 1024) % content.length();
        for (int i = 0; i < count; i++) {
            sb.append(content);
        }
        if (remainder > 0) {
            sb.append(content.substring(0, remainder));
        }
        byte[] buf = sb.toString().getBytes();

        // Write the same payload to fileCount separate files.
        for (int j = 0; j < fileCount; j++) {
            String path = String.format("%s%sTestFile_%d.txt", dir, File.separator, j);
            try {
                BufferedOutputStream fs = new BufferedOutputStream(new FileOutputStream(path));
                fs.write(buf);
                fs.close();
            } catch (FileNotFoundException fe) {
                System.out.printf("Hit filenot found exception %s", fe.getMessage());
            } catch (IOException ie) {
                System.out.printf("Hit IO exception %s", ie.getMessage());
            }
        }
    }
}
You can run this by issuing the following command:
java -jar SimpleWriter.jar my_test_dir 500000
I thought this was simple code, but then I realized that it uses up to 14 GB of memory. I know this because when I checked with free -m, the free memory kept dropping until my 15 GB VM had only 70 MB free. I compiled the code in Eclipse, first against JDK 1.6 and then against JDK 1.7; the result was the same. The funny thing is that if I comment out fs.write() and just open and close the stream, memory use stabilizes at a certain point. Once I put fs.write() back, memory use goes wild again. 500K files of 40 KB each is about 20 GB in total. It seems that Java's stream writer never deallocates its buffer during the operation.
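One check that might narrow this down is to sample the JVM's own heap use while the files are being written, since free -m reports memory for the whole machine rather than for the Java process alone. The following is only a minimal sketch: the class name HeapProbe, the helper names, and the sampling interval of 10000 files are assumptions made for illustration, and I have no output from it to report. It uses only Runtime.totalMemory() and Runtime.freeMemory() from the standard library.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical diagnostic variant of SimpleWriter: same write loop,
// plus periodic sampling of the JVM heap.
public class HeapProbe {
    // Bytes currently in use on the JVM heap (allocated minus free).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Writes fileCount copies of buf into dir and samples heap use
    // every `every` files. Returns the largest sampled value in bytes.
    static long writeAndProbe(File dir, int fileCount, byte[] buf, int every)
            throws IOException {
        long peak = 0;
        for (int j = 0; j < fileCount; j++) {
            File f = new File(dir, "TestFile_" + j + ".txt");
            BufferedOutputStream out =
                    new BufferedOutputStream(new FileOutputStream(f));
            try {
                out.write(buf);
            } finally {
                out.close(); // close even if write() throws
            }
            if (j % every == 0) {
                long used = usedHeapBytes();
                peak = Math.max(peak, used);
                System.out.printf("files=%d usedHeap=%d MB%n",
                        j, used / (1024 * 1024));
            }
        }
        return peak;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args[0]);
        int fileCount = Integer.parseInt(args[1]);
        byte[] buf = new byte[40 * 1024]; // 40 KB payload, contents irrelevant here
        long peak = writeAndProbe(dir, fileCount, buf, 10000);
        System.out.printf("peak usedHeap=%d MB%n", peak / (1024 * 1024));
    }
}
```

If the sampled heap stays small while free -m keeps dropping, that would suggest the memory is being accounted for somewhere other than the Java heap; if the heap itself grows toward 14 GB, the stream code would be the place to look.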
At first I thought the Java GC simply did not have time to clean up. But that makes no sense, since I close the file stream for every file. I even ported the code to C# and ran it on Windows; the same logic produced 500K 40 KB files with memory stable at a certain level, not the 14 GB I saw on CentOS. C#'s behavior is at least what I expected, but I could not believe Java performs this way. I asked colleagues who are experienced in Java. They could not see anything wrong with the code, but they also could not explain why this happens. They admitted that none of them had ever tried to create 500K files in a loop without stopping.
I also searched online, and everybody says the only thing to pay attention to is closing the stream, which I did.
Can anyone help me figure out what's wrong?
Can anybody also try this and tell me what you see?
BTW, some people in the online community tried the code on Windows and it seemed to work fine. I did not try it on Windows myself; I only tried it on Linux, since I thought that is where people use Java. So it seems this issue happens on Linux.
I also tried the following to limit the JVM heap, but it had no effect:
java -Xmx2048m -jar SimpleWriter.jar my_test_dir 500000
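To confirm that the -Xmx setting is actually picked up by the JVM, a small check along these lines could be run with the same flag. This is a sketch only; the class name MaxHeapCheck is made up for illustration. Runtime.maxMemory() returns the maximum amount of memory the JVM will attempt to use for the heap, and the reported value can be somewhat below the -Xmx figure depending on the collector.

```java
// Hypothetical helper: prints the heap ceiling the JVM reports for itself.
public class MaxHeapCheck {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.printf("max heap = %d MB%n", maxHeapMb());
    }
}
```

Running it as java -Xmx2048m MaxHeapCheck should report a value close to 2048 MB if the limit is in effect. If so, and free -m still shows memory dropping well beyond that, the heap limit is presumably not what governs the memory I am seeing disappear.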