ODI 11g socket write timeouts and hanging sessions

smomotiu, Jan 2 2019

Hi Community,

We have been experiencing an intermittent and novel issue with our ODI sessions/agents.

  • Our ODI sessions are becoming "stale": many sessions complete in the ODI_WORK repository on the DB side, but the agent loses its connection to the repository and the session status is never updated. At times a session hangs even though the parent process shows a start/end time. We believe the DB server connection may be timing out because of large batch sizes or other timeout settings on the DB side.
  • When sessions become stale, we have to run the clean stale sessions process to end them. If this is not done ASAP, the stale sessions consume memory, often leading to agents crashing with Java heap or PermGen errors. Raising the heap and PermGen settings has helped, but it only delays the inevitable if a session hangs for too long.

In other instances the ODI ES agent recognizes that it has lost connectivity when connections are interrupted (a Socket Write Error is often logged), and the agent flags the session execution as In Error.

1. We have a stand-alone agent and have made odiparams.bat changes to improve this, but why do sessions become stale in some instances while in others the agent drops its connections?

We have made as many configuration adjustments as we can on the ODI side, other than the fetch and batch array sizes, and would like to discuss options that might prevent a timeout or dropped connection on the DB side. Because our SQL Server setup does not provide high availability to the stand-alone agent, the agent cannot reconnect automatically once connectivity is lost.

Our work repository resides on a SQL Server database.

Are there any SQL Server parameters we can adjust to prevent the database from dropping JDBC connections from the remote agent (which sits on a different server)?

These issues seem to be related: raising heap and PermGen only delays the inevitable if connections keep getting dropped, and the errored or stale sessions left behind when agents crash are only recognized after we run the clean stale sessions process manually.

Is there a way in SQL Server to actually see a hung/stale process? I have noticed sleeping sessions, but I believe those are simply pooled connections?
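To make the question concrete, here is a rough sketch of the kind of check I have in mind. It joins the SQL Server views sys.dm_exec_sessions and sys.dm_exec_requests and sorts each row into a rough bucket. The class name, the Verdict labels and the rules themselves are my own assumptions rather than anything from ODI or Microsoft, and open_transaction_count is only available on newer SQL Server versions (2012 onward, as far as I know), so please treat it as a starting point and correct me where it is wrong.

```java
// Hypothetical helper, not part of ODI: decides whether a SQL Server session row
// looks like an idle pooled connection or something worth investigating.
class SleepingSessionCheck {

    // Query to run over JDBC or in SSMS. The view and column names are real
    // SQL Server names; restricting to user processes is my assumption.
    static final String SESSION_QUERY =
          "SELECT s.session_id, s.host_name, s.program_name, s.status, "
        + "s.open_transaction_count, s.last_request_end_time, "
        + "r.wait_type, r.blocking_session_id "
        + "FROM sys.dm_exec_sessions s "
        + "LEFT JOIN sys.dm_exec_requests r ON r.session_id = s.session_id "
        + "WHERE s.is_user_process = 1";

    // Illustrative buckets only.
    enum Verdict { IDLE_POOLED, OPEN_TRANSACTION, BLOCKED, ACTIVE }

    // status: s.status; openTransactions: s.open_transaction_count;
    // blockingSessionId: r.blocking_session_id (pass 0 when there is no request row).
    static Verdict classify(String status, int openTransactions, int blockingSessionId) {
        if (blockingSessionId > 0) {
            return Verdict.BLOCKED;            // waiting on another session
        }
        if ("sleeping".equalsIgnoreCase(status)) {
            // Sleeping with nothing open looks like a pooled connection at rest;
            // sleeping with an open transaction may be holding locks.
            return openTransactions > 0 ? Verdict.OPEN_TRANSACTION : Verdict.IDLE_POOLED;
        }
        return Verdict.ACTIVE;
    }

    public static void main(String[] args) {
        System.out.println(SESSION_QUERY);
        System.out.println(classify("sleeping", 0, 0));
        System.out.println(classify("sleeping", 1, 0));
        System.out.println(classify("running", 0, 57));
    }
}
```

If that reasoning holds, the sleeping sessions with no open transaction would be the pooled connections, and the ones to chase would be sleeping sessions that still hold a transaction or sessions blocked by another session.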

Our DBAs and infrastructure team have not noted any network or outage issues, so perhaps it is large batch sizes that wreak havoc and cause downstream issues, which could be controlled with some DB connection pooling or timeout settings? I have not been told of any firewall restrictions either.

Thanks Community,

Sam
