
Request Is Getting Duplicated in Target System Even When There Are No Retries Within SOA 12c

Abhinav Mittal, Jul 11 2022 (edited Jul 11 2022)

Hi All
I am facing an issue where requests are getting duplicated under heavy load, but there is no indication of a timeout or a retry within Fusion (SOA 12c) that would confirm the request was retried.
Scenario:
System A --> MQ --> SOA Requestor Service --> Common Queue --> SOA Common Service <--> Target System (creates the order and returns the Order # as a sync response)
Landscape: SOA is a 3-node cluster, version 12.2.1, with OHS configured.
The service call to the target system goes through a load balancer (NetScaler).
Issue: The audit trail doesn't indicate that the SOA Common Service invoked the SOAP HTTP(S) endpoint twice, but during heavy load we can see multiple orders created for the same System A number.
I can see Connection Reset errors in the Fusion diagnostic logs (a socket connection reset), which surface as a RemoteFault, but other than that there is nothing to indicate that the previous failed request still completed.
Request --> Target System (timeout, returns the error) --> SOA retries, 1st request still completes successfully
We checked with the NetScaler team, and they haven't configured AppQoE, which would be needed to retry on TCP/IP connection resets or any other errors.
The target system indicates that the request was received twice in some cases and more than twice in others.
Whenever SOA receives the RemoteFault, the service has a global fault policy that retries the transaction once, so the order gets triggered again. But in some cases there are no RemoteFaults, and the request still gets processed multiple times.
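The retry path described above can be sketched as a small simulation. This is only an illustration under stated assumptions: `TargetSystem`, `call_with_retry`, and this `RemoteFault` class are hypothetical stand-ins, not SOA Suite or NetScaler APIs, and the connection reset is modeled as the response being lost after the target has already committed the order.

```python
class RemoteFault(Exception):
    """Hypothetical stand-in for the fault the caller sees on a connection reset."""


class TargetSystem:
    """Toy target system that creates one order per request it receives."""

    def __init__(self, drop_first_response=False):
        self.orders = []  # list of (order_number, system_a_number)
        self.drop_first_response = drop_first_response

    def create_order(self, system_a_number):
        order_number = len(self.orders) + 1
        # The order is committed before the response is sent back.
        self.orders.append((order_number, system_a_number))
        if self.drop_first_response:
            self.drop_first_response = False
            raise RemoteFault("connection reset before the response arrived")
        return order_number


def call_with_retry(target, system_a_number, retries=1):
    """Retry on RemoteFault, as a fault policy with a single retry would."""
    for attempt in range(retries + 1):
        try:
            return target.create_order(system_a_number)
        except RemoteFault:
            if attempt == retries:
                raise


target = TargetSystem(drop_first_response=True)
order_number = call_with_retry(target, "A-1001")
print(order_number, target.orders)
```

In this sketch the caller sees a single successful response, yet the target holds two orders for the same System A number, because the first request was committed before its response was lost.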
So, I have the following 2 issues/questions:
1. When Fusion receives a RemoteFault, what happens to the request that resulted in the error? How can we prevent that request from still being processed and creating an order? Since the business requirement dictates creating the order, we have to keep a minimum of 1 retry on RemoteFaults to make sure we don't fail on the first attempt. This works in all cases except heavy load, where it usually results in duplicates.
2. When the Fusion request doesn't time out (the timeout is defined as 120 seconds), Fusion receives an order number in the response, but the target system logs show multiple requests, and this also results in duplicates.
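For the first question, one general pattern (not specific to SOA 12c, and only an assumption about what the target system could support) is to make order creation idempotent on the System A number, so that a retried or re-delivered request returns the existing order instead of creating a new one. A minimal sketch, with `IdempotentOrderStore` and its methods being hypothetical names:

```python
import threading


class IdempotentOrderStore:
    """Hypothetical target-side store that deduplicates on the System A number."""

    def __init__(self):
        self._orders = {}  # system_a_number -> order_number
        self._lock = threading.Lock()  # guards check-then-insert under concurrent requests
        self._next_order_number = 1

    def create_order(self, system_a_number):
        with self._lock:
            existing = self._orders.get(system_a_number)
            if existing is not None:
                # Retry or duplicate delivery: reuse the order already created.
                return existing
            order_number = self._next_order_number
            self._next_order_number += 1
            self._orders[system_a_number] = order_number
            return order_number

    def order_count(self):
        with self._lock:
            return len(self._orders)


store = IdempotentOrderStore()
first = store.create_order("A-1001")
retried = store.create_order("A-1001")  # same System A number, e.g. after a RemoteFault
other = store.create_order("A-1002")
print(first, retried, other, store.order_count())
```

In a real target system the in-memory dictionary and lock would more likely be a unique constraint or lookup on the System A number in the order database; the sketch only shows the check-before-create idea.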
I am not sure what to look at or where, and this is a critical PROD issue. Oracle Support is involved and they think it is a load balancer issue, but the LB team says it isn't theirs, because NetScaler has no AppQoE policy defined or enabled, which would be required for retries.
Can someone please help identify where the issue could be? Where should I look to help fix it?
