JPA:Handling of excessive numbers of records/entities?
843830Oct 21 2009 — edited Dec 30 2009Hi
Scenario:
I am trying to insert a large number of entities (millions) into a database, preferably within the same transaction, in an EJB3 stateless session bean using JPA with Hibernate (or EclipseLink). Obviously I cannot have all of them in memory in one go. Therefore I am trying to process them in batches of say 100'000 at a time and I am making sure that I keep no references to already-processed entities in my code before processing the next batch, so the memory should be garbage-collected and reused for every new batch.
Problem:
The process is busy and hangs using the maximum memory allocated (however I do not get any out-of-memory exception), nothing happens...
Assumption:
The entity manager / persistence context is internally keeping already-processed entities in memory when I try to process subsequent batches, so the system runs out of memory, as I am simply using standard container-managed transactions, i.e. the commit only occurs after all entities have been processed.
Unsuccessful attempt to solve problem:
I have tried to use the EntityManager's flush() method after processing one batch, in the hope that any memory used by the persistence context should be freed for the next batch, but the symptoms remain as mentioned above.
BTW, the batch-size is not the problem, I can easily process a single batch.
Questions:
Not tried yet:I could annotate the method that processes a single batch with @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) and call it in a new EJB instance, or I could explicitly use bean/application-managed transactions with UserTransaction and a commit() after processing each batch. Are these the correct/best possible solutions? Why did flush() not work? How do other people handle such scenarios? I could imagine that this was quite a common scenario, so I would expect some not-too-difficult resolution...
Extra question:
In case of an error (also a logical error that I detect in the data, not only a system exception/error), I want to roll back everything, therefore I was hoping to avoid using several UserTransactions that I would have to manually undo. Currently I use setRollbackOnly() of a SessionContext to revert. Does the entity manager keep references of ALL entities of the current transaction in memory to be able to do this (that may explain why flush() did not solve my problem)? or does the database manage this?
Thanks in advance for any tips.
Alex.
Edited by: alex00 on Oct 21, 2009 9:21 AM:Added "Extra question"
Edited by: alex00 on Oct 22, 2009 12:12 AM:I now write that I have not tried @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) yet (previously I erroneously wrote that I had already tested this option)