How to store a huge string list with ArrayList?
807580, Jan 13 2010 (edited Jan 13 2010)
I am running a Java program that has to read in about 250 million lines of text and hold them all in memory as strings.
Each string contains 3 characters. I did the math, and in theory the memory required to hold all these strings
is about 250M x 3 x 16 bits = 1.5 GB.
I first tried reading each line sequentially as a String and putting it in an ArrayList, but I was surprised to find
that every 10 million records consume about 450MB of heap, and that I cannot get anywhere near 50 million records.
I know that Java Strings carry quite some overhead, so I also tried writing my own "String" class (a wrapper around char[]),
but this still costs about 320MB of heap for every 10 million records.
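To put those numbers side by side, here is a small sketch of the arithmetic. It only uses the figures quoted above (450MB and 320MB of heap per 10 million records, 3 chars of 16 bits each) and assumes that MB means 1024*1024 bytes, as in the test code further down. The class and method names are made up for illustration.

```java
public class OverheadEstimate {
    static final long MB = 1024L * 1024L;

    // Raw payload in bytes for `count` strings of `chars` 16-bit chars each.
    public static long payloadBytes(long count, int chars) {
        return count * chars * 2L;
    }

    // Average heap bytes per record, given `heapMb` megabytes used by `records` records.
    public static double bytesPerRecord(long heapMb, long records) {
        return (double) (heapMb * MB) / records;
    }

    public static void main(String[] args) {
        long total = 250_000_000L;
        System.out.println("payload for all keys: " + payloadBytes(total, 3) + " bytes");
        System.out.println("payload per key: " + payloadBytes(1, 3) + " bytes");
        System.out.println("String in ArrayList: " + bytesPerRecord(450, 10_000_000L) + " bytes per record");
        System.out.println("char[] wrapper: " + bytesPerRecord(320, 10_000_000L) + " bytes per record");
    }
}
```

Under that assumption the observed cost is roughly 47 and 34 bytes per record, against 6 bytes of actual character data, so most of the heap is going to something other than the characters themselves.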
I started to suspect my use of ArrayList, but I read somewhere that ArrayList has very little overhead.
If that is true, I can't see any way of reading in such a huge amount of data.
A few days ago I managed to use a supercomputer with a total of 32GB of RAM to test the program, with a massive
java -Xmx28000m Temp.java 250000000
It was aborted at about 100 million records, and the log shows that the maximum virtual memory used was 27GB!
I may not fully understand how Java manages memory, but I was surprised by the overhead...
So here I am, asking if someone knows a way to
store the data described above within 24GB (preferably even less than 12GB, as I don't always have access to that supercomputer).
And yes, I have to hold them all in memory at once. I know very few people do that, but I need to generate a minimal perfect hash
for all the keys (Strings), so every single key has to be in memory for processing. Oh yes, no two Strings in my case are equal.
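One direction that might keep the footprint close to the 1.5 GB estimate is to drop the per-key objects entirely and store all keys back to back in a single char[]. The following is only a minimal sketch under the assumption that every key has exactly the same length (3 chars here); `PackedKeys` and its methods are hypothetical names, not an existing library class. For 250 million keys this would be one array of 750 million chars, which still fits within Java's int array index range.

```java
// Hypothetical helper: stores fixed-length keys back to back in one char[].
public class PackedKeys {
    private final int keyLength;
    // Key i occupies data[i * keyLength] .. data[i * keyLength + keyLength - 1].
    private final char[] data;

    public PackedKeys(int count, int keyLength) {
        if (count < 0 || keyLength <= 0) {
            throw new IllegalArgumentException("count must be >= 0 and keyLength > 0");
        }
        // Array lengths are ints, so the total number of chars must fit in an int.
        long total = (long) count * keyLength;
        if (total > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chars for one array: " + total);
        }
        this.keyLength = keyLength;
        this.data = new char[(int) total];
    }

    // Copies the chars of `key` into slot `index`; no per-key object is retained.
    public void set(int index, CharSequence key) {
        if (key.length() != keyLength) {
            throw new IllegalArgumentException("key must have length " + keyLength);
        }
        int base = index * keyLength;
        for (int j = 0; j < keyLength; j++) {
            data[base + j] = key.charAt(j);
        }
    }

    // Reads char j of key `index` without allocating anything.
    public char charAt(int index, int j) {
        return data[index * keyLength + j];
    }

    // Materializes key `index` as a temporary String when one is really needed.
    public String get(int index) {
        return new String(data, index * keyLength, keyLength);
    }

    public int size() {
        return data.length / keyLength;
    }

    public static void main(String[] args) {
        PackedKeys keys = new PackedKeys(3, 3);
        keys.set(0, "abc");
        keys.set(1, "xyz");
        keys.set(2, "a1b");
        System.out.println(keys.size() + " keys, key 1 = " + keys.get(1));
    }
}
```

The trade-off is that keys are addressed by index rather than held as objects, so any code that processes them (for example the hash construction) would have to read them through `charAt` or `get` instead of from a list of Strings.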
Thanks in advance!
P.S. I have attached my test code here in case it helps:
import java.util.ArrayList;

public class Temp {
    public static void main(String[] args) {
        int n = Integer.parseInt(args[0]); // number of strings to store
        int k = 100000;                    // report memory use every k insertions
        ArrayList<String> list = new ArrayList<String>(n);
        char[] array = new char[3];
        for (int i = 0; i < n; i++) {
            array[0] = 'a';
            array[1] = 'b';
            array[2] = 'c';
            list.add(new String(array)); // a new 3-character String object each time
            if (i % k == 0) {
                // heap currently in use, converted to MB
                long m1 = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
                m1 /= (1024 * 1024);
                System.out.println("memory used when " + i + " key inserted =" + m1 + " MBytes.");
            }
        }
    }
}
Edited by: W.Liu on Jan 13, 2010 4:55 AM
Edited by: W.Liu on Jan 13, 2010 4:56 AM