OGG to Flume: high Lag at Chkpt

TXU, Jun 15 2017 (edited Jun 19 2017)

Hi Experts,

Our setup:

Database: Oracle 11.2.0.4

GoldenGate: 12.2.0.1.160823

Linux: 2.6.32.43

JRE: 1.8.0_121

One Extract process in classic capture mode.

12 Replicat processes writing to Flume through the OGG Big Data adapter.

No remote trail is used; the Extract and adapter processes run on the same host.

Problem:

After deployment, the setup runs smoothly for a few days, and then, at random, we start seeing lag, reported both by "info all" on the OGG console and by our client program.

The lag appears random and does not correlate with time of day or workload. Sometimes it catches up within a few hours; other times it keeps growing, and the reported lag can reach 12 hours or more.
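To tell the two behaviours apart (catching up versus getting worse) from periodic readings rather than by eye, one option is to fit a trend line to successive "Lag at Chkpt" samples of a single group. The sketch below is a hypothetical helper, not a GoldenGate feature: it assumes lag values have already been converted to seconds and recorded with their wall-clock sample times, and the names `lag_slope` and `classify_lag_trend` and the 0.05 s/s tolerance are illustrative choices.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (wall-clock time in seconds, lag in seconds)


def lag_slope(samples: List[Sample]) -> float:
    """Ordinary least-squares slope of lag versus wall-clock time.

    +1.0 means lag grows one second per second (the process is not advancing),
    0.0 means it is keeping pace, and a negative value means it is catching up.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n
    num = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


def classify_lag_trend(
    samples: List[Sample], tolerance: float = 0.05
) -> Tuple[str, Optional[float]]:
    """Return (trend, estimated seconds until lag reaches zero, or None)."""
    slope = lag_slope(samples)
    latest_lag = samples[-1][1]
    if slope < -tolerance:
        # Linear extrapolation: the latest lag shrinks by |slope| seconds per second.
        return "catching up", latest_lag / -slope
    if slope > tolerance:
        return "falling behind", None
    return "steady", None


if __name__ == "__main__":
    # Hypothetical readings of one Replicat's "Lag at Chkpt", taken 10 minutes apart.
    print(classify_lag_trend([(0, 3600), (600, 3000), (1200, 2400)]))
    print(classify_lag_trend([(0, 100), (600, 700)]))
```

A straight-line fit is deliberately crude; its main use here is that a slope close to +1 distinguishes a group that has effectively stopped applying data from one that is merely slow.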

The "info all" output for the capture (Extract) and adapter (Replicat) processes looks like this:

Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXT         00:00:00      00:00:01

Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
REPLICAT    RUNNING     REP01       13:57:51      00:00:07
REPLICAT    RUNNING     REP02       13:35:07      00:00:03
REPLICAT    RUNNING     REP03       13:45:37      00:00:00
REPLICAT    RUNNING     REP04       15:00:40      00:00:04
REPLICAT    RUNNING     REP05       13:15:52      00:00:09
REPLICAT    RUNNING     REP06       14:09:47      00:00:09
REPLICAT    RUNNING     REP07       14:17:53      00:00:07
REPLICAT    RUNNING     REP08       14:41:56      00:00:09
REPLICAT    RUNNING     REP09       15:18:12      00:00:06
REPLICAT    RUNNING     REP10       14:40:01      00:00:03
REPLICAT    RUNNING     REP11       14:30:23      00:00:05
REPLICAT    RUNNING     REP12       13:30:47      00:00:09
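When lag is watched by a script rather than read off the console, the two lag columns above can be turned into numbers. The following is a minimal Python sketch, assuming the "info all" text has already been captured (however it is collected) and that rows keep the column layout shown in this post; `parse_info_all`, `lagging_groups` and the 600-second threshold are hypothetical names and values, not part of GoldenGate.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches EXTRACT/REPLICAT rows laid out as in the post:
# Program  Status  Group  Lag at Chkpt  Time Since Chkpt
# (header and MANAGER lines have no lag columns, so they are skipped).
ROW = re.compile(
    r"^(?P<program>EXTRACT|REPLICAT)\s+(?P<status>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<lag>\d+:\d{2}:\d{2})\s+(?P<since>\d+:\d{2}:\d{2})$"
)


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS lag string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class ProcessLag:
    program: str
    group: str
    status: str
    lag_at_chkpt: int      # seconds, from "Lag at Chkpt"
    time_since_chkpt: int  # seconds, from "Time Since Chkpt"


def parse_info_all(text: str) -> List[ProcessLag]:
    """Return one ProcessLag per EXTRACT/REPLICAT row of captured "info all" text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(ProcessLag(
                program=match["program"],
                group=match["group"],
                status=match["status"],
                lag_at_chkpt=hms_to_seconds(match["lag"]),
                time_since_chkpt=hms_to_seconds(match["since"]),
            ))
    return rows


def lagging_groups(rows: List[ProcessLag], threshold_seconds: int = 600) -> List[str]:
    """Groups whose checkpoint lag exceeds the (hypothetical) alert threshold."""
    return [r.group for r in rows if r.lag_at_chkpt > threshold_seconds]


if __name__ == "__main__":
    sample = """\
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXT         00:00:00      00:00:01
REPLICAT    RUNNING     REP04       15:00:40      00:00:04
REPLICAT    RUNNING     REP09       15:18:12      00:00:06
"""
    rows = parse_info_all(sample)
    for row in rows:
        print(row.group, row.lag_at_chkpt, row.time_since_chkpt)
    print(lagging_groups(rows))
```

Keeping both columns is useful for this case: in the output above every Replicat shows a Time Since Chkpt of only a few seconds, so the groups are still checkpointing, while Lag at Chkpt sits between 13 and 15 hours.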

The server running the database and OGG is powerful and mostly idle (both CPU and I/O utilization are below 5%); we cannot find any bottleneck on it.

We have several similar setups deployed. This is the only one that hits the problem; the others run fine. We also migrated this setup to another server, but the problem persisted, which at least rules out an environment problem.

We checked ggserr.log and the report files in the dirrpt directory and found no errors or warnings.

We also tried the troubleshooting methods described on My Oracle Support, including different values for parameters such as grouptransops, maxtransops and batchsql, with no luck.

This has haunted us for months. Can anyone give us a clue?

Thanks for any reply.

- Todd

This post has been answered by K.Gan on Jun 19 2017
Locked on Jul 17 2017
Added on Jun 15 2017