Determining the max allowable buffer size
I am working on a program that handles very large amounts of data. Each job requires reading through as many as several hundred input files, each containing tens of thousands of values. The job's output is a single file holding the same values in a rearranged order: the first value from each of the several hundred input files must be written first, then the second value from each file, and so on.

At first I tried doing the rearrangement entirely in memory, but for large datasets I get OutOfMemoryErrors. I then rewrote the program to write the values to a temporary binary file in the correct order. This works, but it is much slower: while processing each input file it must seek all around a 60+ MB output file, and it repeats that for several hundred input files.
I could increase the heap size (-Xmx), but there is a limit to how much the JVM will accept. Instead, I'd like to design the program so that it allocates as large a memory buffer as it can, reads through all the input files putting as much data as fits into the buffer, writes that block to the output file in one sequential pass, and then runs through everything again for the next block. The sketch below shows roughly what I mean.
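Here is a minimal sketch of that blocked design, assuming fixed-width binary values (8 bytes each, e.g. doubles) and that every input file holds the same number of values. The class name, method signature, and rowsPerBlock parameter are all placeholders I made up; rowsPerBlock would be derived from whatever buffer size the answer to my question allows.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class BlockedInterleave {

    private static final int VALUE_SIZE = 8; // assumed width of one value in bytes

    // Writes the interleaved output in blocks of 'rowsPerBlock' rows, where
    // row r consists of value r from every input file. Each block costs one
    // sequential read per input file and one sequential write to the output.
    static void interleave(File[] inputs, File output,
                           long valuesPerFile, int rowsPerBlock) throws IOException {
        int n = inputs.length;
        ByteBuffer block = ByteBuffer.allocate(rowsPerBlock * n * VALUE_SIZE);
        byte[] value = new byte[VALUE_SIZE];

        FileChannel out = new FileOutputStream(output).getChannel();
        try {
            for (long firstRow = 0; firstRow < valuesPerFile; firstRow += rowsPerBlock) {
                int rows = (int) Math.min(rowsPerBlock, valuesPerFile - firstRow);

                for (int f = 0; f < n; f++) {
                    FileChannel in = new FileInputStream(inputs[f]).getChannel();
                    try {
                        // Read this file's slice of the block sequentially...
                        in.position(firstRow * VALUE_SIZE);
                        ByteBuffer chunk = ByteBuffer.allocate(rows * VALUE_SIZE);
                        while (chunk.hasRemaining() && in.read(chunk) >= 0) { }
                        chunk.flip();
                        // ...then scatter each value into its interleaved slot:
                        // value r of file f belongs at row r, column f.
                        for (int r = 0; r < rows; r++) {
                            chunk.get(value);
                            block.position((r * n + f) * VALUE_SIZE);
                            block.put(value);
                        }
                    } finally {
                        in.close();
                    }
                }

                // One large sequential write per block instead of constant seeking.
                block.position(0);
                block.limit(rows * n * VALUE_SIZE);
                while (block.hasRemaining()) {
                    out.write(block);
                }
                block.clear();
            }
        } finally {
            out.close();
        }
    }
}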
So my question is: how do I determine the largest byte buffer I can allocate while still leaving some memory free for the other small allocations the program will need to make?
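For what it's worth, this is the kind of estimate I've been experimenting with, using the Runtime methods. The 16 MB safety margin is just a guess on my part, and I don't know how trustworthy the numbers are mid-run versus right after a GC:

// Estimate of the bytes still allocatable: the gap between the -Xmx
// ceiling (maxMemory) and what the heap currently holds, minus a
// guessed safety margin for the program's other allocations.
Runtime rt = Runtime.getRuntime();
long used = rt.totalMemory() - rt.freeMemory();
long headroom = rt.maxMemory() - used;
long margin = 16L * 1024 * 1024; // arbitrary reserve for everything else
int bufferBytes = (int) Math.max(0, Math.min(Integer.MAX_VALUE, headroom - margin));

From bufferBytes I could then derive rowsPerBlock in the sketch above as bufferBytes / (inputs.length * VALUE_SIZE). Is this a reasonable approach, or is there a better way?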