GoldenGate monitoring.
In our environment we have completely dumped Streams and the useless Data Guard. We have been very pleased with the GoldenGate software, and it really does work as advertised. However, to get monitoring or real-time stats on GoldenGate we would need to purchase the dashboard, which is extremely expensive. So I am wondering whether anyone has come up with a good way to monitor GoldenGate processes.
I am currently working on a Python script to feed the results to Grid Control, but as usual Grid Control is not working correctly, so maybe I will send them to Nagios instead. Anyway, I am not a Python programmer, but I like the language, so if you can improve on this script, by all means let it fly.
Requirements
*GoldenGate
*Python 2.3-2.4 (tested on 2.3 and 2.4)
*Linux (might work on Windows; never tried, never will)
Note: Python depends on indentation, and this forum may not preserve whitespace, so check the indentation after copying the script.
#!/usr/bin/python
####################
#Author: Some Guy
#Rev: Initial
#Date: June 24 2010
####################
import sys
import os
import re
import popen2

def run_ggsci(obey):
    # Run ggsci from the current directory and feed it a single command
    pipe = popen2.Popen4("./ggsci")
    pipe.tochild.write(obey + "\n")
    pipe.tochild.close() # program will get EOF on STDIN
    return pipe.fromchild.readlines()

# The script name plus three arguments are required
if len(sys.argv) < 4:
    print 'Usage ',sys.argv[0], ' lag|checkpoint|broken db_instance_name rep|ext'
    sys.exit(2)

#Set the oracle_sid env
#Check the oratab for it
oratab=open('/etc/oratab','rb')
db = re.compile(sys.argv[2],re.IGNORECASE)
found = 'N'
for instance in oratab:
    if db.match(instance):
        os.putenv('ORACLE_SID',sys.argv[2])
        found = 'Y'
oratab.close()

#if db instance not found exit. Make sure your rac instances are in your oratab
if found == 'N':
    print "Error database instance ",sys.argv[2]," not found in oratab."
    print 'Usage ',sys.argv[0], ' lag|checkpoint|broken db_instance_name rep|ext'
    sys.exit(3)

if sys.argv[3] == "rep":
    rep_ext = re.compile(r"REPLICAT")
elif sys.argv[3] == "ext":
    rep_ext = re.compile(r"EXTRACT")
else:
    print "Read the usage rep or ext not ",sys.argv[3]
    print 'Usage ',sys.argv[0], ' lag|checkpoint|broken db_instance_name rep|ext'
    sys.exit(3)

for line in run_ggsci("info all"):
    if rep_ext.match(line):
        # The regexp breaks the line into columns: program, status, group,
        # lag (HH:MM:SS) and time since checkpoint (HH:MM:SS)
        match = re.search(r'^([A-Z]+)[\t ]+([A-Z]+)[\t ]+([A-Z0-9]+)[\t ]+(..):(..):(..)[\t ]+(..):(..):(..)', line)
        if match:
            if sys.argv[1] == 'lag':
                lag_sec = int(match.group(4))*3600+int(match.group(5))*60+int(match.group(6))
                if lag_sec > 100 and lag_sec < 500:
                    print 'em_result=LAG ',match.group(3),lag_sec
                elif lag_sec >= 500:
                    print 'em_result=LAGBAD ',match.group(3),lag_sec
                    sys.exit(2)
                #print 'LAG ', match.group(3), lag_sec
            elif sys.argv[1] == 'checkpoint':
                chk_sec = int(match.group(7))*3600+int(match.group(8))*60+int(match.group(9))
                if chk_sec > 100 and chk_sec < 500:
                    print 'em_result=CHECKPOINT ',match.group(3),chk_sec
                elif chk_sec >= 500:
                    print 'em_result=CHECKPOINTBAD ',match.group(3),chk_sec
                    sys.exit(3)
                #print 'Checkpoint ', match.group(3), chk_sec
            elif sys.argv[1] == 'broken':
                if match.group(2) == 'ABENDED':
                    print 'em_result=ABENDED ',match.group(1),match.group(2)
                    sys.exit(4)
            else:
                print 'Usage ',sys.argv[0], ' lag|checkpoint|broken db_instance_name rep|ext'
                sys.exit(3)
        else:
            # The line starts with EXTRACT/REPLICAT but does not have the expected columns
            print 'Error ',line
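If you want to try the parsing part without a GoldenGate install, here is a minimal sketch of just that step, pulled out into functions so it can be tested on its own. Assumptions on my side: it is written for Python 3 rather than the 2.3-2.4 the script targets, the names parse_row and classify are made up for this example, the sample rows are invented (not captured from a real system), and the OK/WARN/CRIT labels are only an illustration of the 100 and 500 second thresholds used in the script.

```python
import re

# Column layout the script expects from "info all":
# program, status, group, lag HH:MM:SS, time since checkpoint HH:MM:SS
INFO_ALL_ROW = re.compile(
    r"^([A-Z]+)[\t ]+([A-Z]+)[\t ]+([A-Z0-9]+)[\t ]+"
    r"(\d\d):(\d\d):(\d\d)[\t ]+(\d\d):(\d\d):(\d\d)"
)

WARN_SEC = 100  # thresholds copied from the script
CRIT_SEC = 500


def parse_row(line):
    """Return (program, status, group, lag_sec, chk_sec), or None if the line does not fit."""
    match = INFO_ALL_ROW.match(line)
    if match is None:
        return None
    lag_h, lag_m, lag_s, chk_h, chk_m, chk_s = (int(x) for x in match.groups()[3:])
    lag_sec = lag_h * 3600 + lag_m * 60 + lag_s
    chk_sec = chk_h * 3600 + chk_m * 60 + chk_s
    return (match.group(1), match.group(2), match.group(3), lag_sec, chk_sec)


def classify(seconds):
    """Map a number of seconds to OK / WARN / CRIT using the script's thresholds."""
    if seconds >= CRIT_SEC:
        return "CRIT"
    if seconds > WARN_SEC:
        return "WARN"
    return "OK"


# Hypothetical sample rows, invented for illustration only
sample = [
    "EXTRACT     RUNNING     EXT1        00:00:03      00:00:07",
    "REPLICAT    RUNNING     REP1        00:02:30      00:00:09",
    "REPLICAT    ABENDED     REP2        01:00:00      00:10:00",
]

for row in sample:
    parsed = parse_row(row)
    if parsed:
        program, status, group, lag_sec, chk_sec = parsed
        print(group, status, lag_sec, classify(lag_sec))
```

Keeping the parsing separate from the ggsci call means the thresholds and the regexp can be checked against saved output before the script is wired into Grid Control or Nagios.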
Edited by: user5827076 on Jun 24, 2010 2:36 PM
Edited by: user5827076 on Jun 24, 2010 2:37 PM
Edited by: user5827076 on Jul 5, 2010 10:47 AM