HashMap memory usage
807588, Sep 27 2007 (edited Feb 11 2009)
Hi,
I am implementing an indexer / compressor for plain text files (text, query log and URL files). The basic skeleton of the indexer is the Huffman codec, plus various add-ons to boost performance.
Huffman coding is applied to words (Huffword); the first operation I perform is a complete scan of the file to collect term frequencies, which I then use to build the Huffman model. Frequencies are stored in a HashMap<String, Integer>.
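A minimal sketch of the counting pass described above, for reference. The class name TermFrequencies, the method count, and the whitespace tokenization are assumptions made for illustration; the actual indexer may split terms differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TermFrequencies {
    // Counts how often each term occurs in the given lines.
    // Assumption: terms are separated by whitespace.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        for (String line : lines) {
            for (String term : line.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // blank line: split yields one empty token
                }
                Integer old = freq.get(term);
                // Each put stores a boxed Integer next to the String key.
                freq.put(term, old == null ? 1 : old + 1);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = count(Arrays.asList("a b a", "b a c"));
        System.out.println(freq);
    }
}
```

Every distinct term ends up as one String key, one Integer value and one internal map entry, so the footprint grows with the number of distinct terms rather than with the file size.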
The main problem is the size of the HashMap: I quickly run out of memory.
In a 300MB query log I collect around 1,700,000 String-Integer pairs; is it possible that I need a 512MB heap for this?
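To put those numbers in perspective, here is a back-of-the-envelope sketch of how many bytes each pair could occupy if the whole 512MB heap were spent on the map alone. That is an assumption: in practice the heap also holds read buffers and everything else the program allocates. The class name HeapBudget is hypothetical.

```java
public class HeapBudget {
    // Bytes available per pair if the entire heap went to the map (integer division).
    static long bytesPerPair(long heapBytes, long pairs) {
        return heapBytes / pairs;
    }

    public static void main(String[] args) {
        long heapBytes = 512L * 1024 * 1024; // 512MB heap
        long pairs = 1700000L;               // pairs collected from the query log
        System.out.println(bytesPerPair(heapBytes, pairs)); // prints 315
    }
}
```

So the question amounts to whether one String key, one Integer value and the map's bookkeeping can plausibly add up to about 315 bytes per term.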