TCP RSTs using Directory Server 5.1
807573, Nov 28 2007 (edited Nov 28 2007)
I am occasionally seeing this error on two of our web servers:
"nscd[13100]: [ID 293258 user.error] libsldap: Status: 7 Mesg: LDAP ERROR (91): Can't connect to the LDAP server."
This causes the LDAP lookup to fail, and in this case the IP address is never resolved. The web servers connect to the LDAP servers via a load balancer.
I investigated this further and found that whenever the above error appears, a TCP reset (RST) is sent from the web server to the directory server. It appears that the reset is sent because the directory server has replied with the wrong TCP sequence number; the web server tries to reply a few times, then gives up and sends a RST.
The SYN/ACK is sent back from the LDAP server to the web server via the load balancer, and the web server immediately resets the connection. The reason is that the ACK number in the SYN/ACK from the LDAP server is not the original sequence number + 1, as expected.
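To make the check concrete, here is a minimal sketch of what I understand the client side to be doing when it receives the SYN/ACK. The function names and the sample numbers are made up for illustration; they are not taken from my packet captures or from any real TCP stack.

```python
# Sketch only: models the handshake check, not a real TCP implementation.
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def expected_ack(client_isn: int) -> int:
    """ACK number the client expects in the SYN/ACK: its own initial sequence number + 1."""
    return (client_isn + 1) % SEQ_MODULUS


def synack_is_acceptable(client_isn: int, synack_ack: int) -> bool:
    """True if the SYN/ACK acknowledges the client's SYN; otherwise the client answers with RST."""
    return synack_ack == expected_ack(client_isn)


if __name__ == "__main__":
    isn = 1000  # hypothetical initial sequence number sent in the SYN
    print(synack_is_acceptable(isn, 1001))  # matching ACK, handshake continues
    print(synack_is_acceptable(isn, 5000))  # mismatched ACK, client would send RST
```

If the ACK number in the captured SYN/ACK fails this kind of comparison against the sequence number of the SYN the web server just sent, the RST is the web server behaving correctly, and the question becomes where the mismatched SYN/ACK comes from.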
The strange thing is that if I stop and start the directory server instance, the problem goes away for a few hours. I suspect it has something to do with Directory Server or TCP tuning, but I am a little out of my depth trying to solve it.
Has anyone else encountered this, or does anyone know of a problem fitting this description?