Heartbeat timeouts sometimes caused by out-of-sync clocks?

Henk Vandenbergh-Oracle, Mar 26 2015 (edited Mar 26 2015)

Heartbeat timeouts are usually caused by the Java socket communication between the master and its slaves being slow (or dead).

This slowness, in turn, can be caused by an overloaded system, something that can easily happen when we're running out of memory.

Memory problems, in turn, can be caused by running against the file system cache, which can end up eating all available memory.

Sometimes the Java socket connection between the master and the remote hosts is simply slow, which also causes heartbeat timeouts.

Today I had a case where, in a multi-host setup, one of the remote hosts had its clock set five minutes behind the master's.

On the master this resulted in lots of 'Slow getMessage' warnings, reporting that more than five seconds elapsed between the time a message was sent and the time it was received. These timings are computed from both the master's and the remote host's current clock values.

In the case above, this meant that every message appeared to be five minutes late.
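To make the arithmetic concrete, here is a minimal sketch, assuming the "delay" is simply the master's receive time minus the slave's send time, with the five-second limit mentioned above. The function names, the 20 ms transit time and the dates are hypothetical; this is not Vdbench's code, only an illustration of how a clock offset gets added to every measured delay.

```python
from datetime import datetime, timedelta

# Hypothetical threshold mirroring the five-second "Slow getMessage" limit.
SLOW_THRESHOLD = timedelta(seconds=5)


def apparent_delay(sent_on_slave: datetime, received_on_master: datetime) -> timedelta:
    """Delay as seen by the master: its own receive time minus the slave's send time.

    The two timestamps come from different clocks, so any offset between the
    hosts is silently added to (or subtracted from) the real transit time.
    """
    return received_on_master - sent_on_slave


def is_slow(sent_on_slave: datetime, received_on_master: datetime) -> bool:
    # Flag the message when the apparent delay exceeds the threshold.
    return apparent_delay(sent_on_slave, received_on_master) > SLOW_THRESHOLD


# Simulate a 20 ms real transit time with the slave clock five minutes behind.
true_send = datetime(2015, 3, 26, 11, 52, 19)
true_transit = timedelta(milliseconds=20)
slave_offset = timedelta(minutes=-5)  # slave clock reads 5 minutes behind the master

sent_stamp = true_send + slave_offset      # what the slave writes into the message
received_stamp = true_send + true_transit  # what the master reads on arrival

print(apparent_delay(sent_stamp, received_stamp))  # 0:05:00.020000
print(is_slow(sent_stamp, received_stamp))         # True
```

With synchronized clocks the same message would show only the 20 ms transit time and would not be flagged.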

Then, later in the test, the remote system's clock was corrected, jumping forward by five minutes.

That was fatal for the heartbeat timer, because it also uses the system's clock values.
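The effect of such a clock step on a timer can be sketched as follows, assuming a heartbeat monitor that measures elapsed time as the difference between two wall-clock readings. The class `WallClockHeartbeat` and the three-minute timeout are hypothetical and only illustrate the failure mode; they are not Vdbench's actual heartbeat logic.

```python
from datetime import datetime, timedelta


class WallClockHeartbeat:
    """Hypothetical heartbeat monitor that measures elapsed time with the
    system (wall) clock, so clock steps look like real elapsed time."""

    def __init__(self, timeout: timedelta, now: datetime):
        self.timeout = timeout
        self.last_seen = now

    def beat(self, now: datetime) -> None:
        # Record the wall-clock time at which a heartbeat arrived.
        self.last_seen = now

    def timed_out(self, now: datetime) -> bool:
        # Elapsed time is just the difference of two wall-clock readings.
        return now - self.last_seen > self.timeout


start = datetime(2015, 3, 26, 11, 47, 0)
hb = WallClockHeartbeat(timeout=timedelta(minutes=3), now=start)

hb.beat(start + timedelta(seconds=10))

# 20 real seconds after the last beat: no timeout.
print(hb.timed_out(start + timedelta(seconds=30)))  # False

# Same moment, but the clock has been stepped forward by five minutes.
stepped_now = start + timedelta(seconds=30) + timedelta(minutes=5)
print(hb.timed_out(stepped_now))  # True
```

For comparison, Python's `time.monotonic()` is documented as not being affected by system clock updates, which is why elapsed-time checks are often based on a monotonic clock rather than the wall clock.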

So, if you run into heartbeat problems again, check your clocks.

In your 'xxxx-0.stdout.html' file you can, within reason, verify the clock settings. For instance:

11:52:19.013 11:47:17.605 task_run_all(): 395 tasks

11:57:39.004 11:57:39.000 Slow getMessage: hd2-0 1957 301404 REQUEST_SLAVE_STATISTICS

The first time value is the master's timestamp at the moment the message was received; the second is the slave's timestamp at the moment the message was sent. As you can see, there is a five-minute difference. The next message shows the clock values after the change.
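If you want to check this over a whole file rather than by eye, a small sketch like the one below can compute the master-minus-slave difference per line. It assumes each line starts with the two HH:MM:SS.mmm stamps in the order described above, and that the two stamps do not straddle midnight (only times of day are logged). The function `clock_difference` is hypothetical, not part of Vdbench.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%H:%M:%S.%f"  # e.g. 11:52:19.013 (%f accepts the 3-digit milliseconds)


def clock_difference(line: str) -> timedelta:
    """Return master time minus slave time for one stdout.html log line.

    Assumes the first token is the master's receive time and the second is
    the slave's send time, and that midnight is not crossed between them.
    """
    master_str, slave_str = line.split()[:2]
    master = datetime.strptime(master_str, TIME_FORMAT)
    slave = datetime.strptime(slave_str, TIME_FORMAT)
    return master - slave


lines = [
    "11:52:19.013 11:47:17.605 task_run_all(): 395 tasks",
    "11:57:39.004 11:57:39.000 Slow getMessage: hd2-0 1957 301404 REQUEST_SLAVE_STATISTICS",
]

for line in lines:
    # Prints 301.408 for the first line, then 0.004 after the clock change.
    print(clock_difference(line).total_seconds())
```

A difference of several minutes on lines where the real transit time should be milliseconds points at a clock offset rather than a slow network.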

The important lesson here: make sure your clocks are synchronized in a multi-host environment.

Henk.
