Under what circumstances can a core dump file be corrupted?
Hello Gurus!
I am hoping someone out there has seen this problem before and knows of a fix or patch, because so far I have not been able to find any related patches or figure this one out!
Here is the summary of the challenge:
1) An unstripped daemon dumps core on a Solaris 8 64-bit host (Sun 420R, multi-CPU 450 MHz, 2 GB RAM), but the core file is only 40K and appears corrupted when we try to debug it. The system messages file shows that the core dump actually failed:
Feb 26 17:58:58 host1 genunix: [ID 457380 kern.notice] NOTICE: core_log: lim.orig[6820] core dump failed, errno=5: /tmp/core.daemon.orig.6820
and errno=5 means a physical I/O problem (see below). BUT...
2) SunSolve says this is a physical disk error:
http://sunsolve.Sun.COM/pub-cgi/retrieve.pl?doc=finfodoc%2F11371&zone_32=core%20dump%20failed%2C%20errno%3D5
I/O error
=========
Some physical Input/Output error has occurred. If the process was
writing a file, data corruption is possible.
First find out which device is experiencing the I/O error.
If the device is a hard disk drive, you might need to run fsck(1M) and
possibly even reformat the disk.
In some cases this error might occur on a call following the one to
which it actually applies.
The symbolic name for this error is EIO, errno=5.
----
But the host and disk are fine; there are no other problems with the box or the disk.
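For anyone who wants to double-check the errno mapping quoted above, here is a minimal sketch. It assumes a Python interpreter is available on the box (that is my assumption, nothing in our setup depends on it), and describe_errno is just a helper name I made up for illustration. The message text returned by the OS can differ between platforms, so only the symbolic name should be relied on.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for a numeric errno value."""
    # errno.errorcode maps numbers to names such as "EIO".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's own message text for the code.
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(5)
    print(f"errno=5 is {name}: {message}")
```

This only confirms that 5 is EIO; it says nothing about which device or layer actually returned the error.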
3) There are no core file size limits, and we configured "coreadm" to allow all core file generation. /tmp has plenty of free space and the host has no other problems. We can make this daemon, as well as other daemons, dump core (kill -11 <pid>), and those core files can be very large. If we make this daemon dump core manually, the core file is larger than 2 MB, but when there is a system problem and it dumps core on its own, the file is only 40K and corrupted (dbx cannot process the core file and says the stack memory is corrupted).
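The limit and free-space checks in point 3 can be sketched roughly as follows. This assumes Python with the standard resource module is available; core_dump_preflight and format_limit are hypothetical helper names of mine. Note that it reports the limits of the process that runs it, which are not necessarily the limits the daemon inherited when it was started.

```python
import os
import resource


def core_dump_preflight(directory="/tmp"):
    """Report the core size limits of this process and free space in directory."""
    # RLIMIT_CORE is the per-process cap on core file size (soft and hard).
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # f_bavail * f_frsize is the space available to unprivileged processes.
    stats = os.statvfs(directory)
    free_bytes = stats.f_bavail * stats.f_frsize
    return {"soft_limit": soft, "hard_limit": hard, "free_bytes": free_bytes}


def format_limit(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


if __name__ == "__main__":
    report = core_dump_preflight("/tmp")
    print("core soft limit:", format_limit(report["soft_limit"]))
    print("core hard limit:", format_limit(report["hard_limit"]))
    print("free in /tmp   :", report["free_bytes"], "bytes")
```

Running it from the same startup script or environment that launches the daemon would give a closer picture than running it from an interactive shell.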
4) I could not find any known patches that relate to system problems in generating core files. We even tried another Solaris 8 64-bit box and the same thing happened. It seems like 40960 bytes is a magic number for the core file size when the system core dumps the daemon.
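To tell a truncated core file from one with damaged contents, one option is to compare the actual file size with the size implied by the file's own headers. Solaris core files are ELF files, so the sketch below reads the ELF header and program header table and computes the furthest byte any segment claims to occupy. It assumes Python is available, core_sizes is a hypothetical helper name, and I have not run it against our 40960-byte file, so treat it as an illustration rather than a diagnosis.

```python
import os
import struct


def core_sizes(path):
    """Return (actual size, size implied by the ELF program headers)."""
    actual = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(64)
        if len(header) < 64 or header[:4] != b"\x7fELF":
            raise ValueError("not an ELF file, or header cut short")
        is_64 = header[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
        endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
        if is_64:
            (e_phoff,) = struct.unpack_from(endian + "Q", header, 32)
            e_phentsize, e_phnum = struct.unpack_from(endian + "HH", header, 54)
        else:
            (e_phoff,) = struct.unpack_from(endian + "I", header, 28)
            e_phentsize, e_phnum = struct.unpack_from(endian + "HH", header, 42)
        # The header table itself must fit inside the file.
        needed = e_phoff + e_phentsize * e_phnum
        f.seek(e_phoff)
        table = f.read(e_phentsize * e_phnum)
    # Only walk entries that are completely present in the file.
    entries = len(table) // e_phentsize if e_phentsize else 0
    for i in range(entries):
        base = i * e_phentsize
        if is_64:
            (p_offset,) = struct.unpack_from(endian + "Q", table, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", table, base + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", table, base + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", table, base + 16)
        needed = max(needed, p_offset + p_filesz)
    return actual, needed


if __name__ == "__main__":
    import sys
    actual, needed = core_sizes(sys.argv[1])
    print(f"actual={actual} bytes, headers imply at least {needed} bytes")
    if actual < needed:
        print("file is shorter than its headers claim: likely truncated")
```

If the headers imply a size well beyond 40960 bytes, that would suggest the write was cut off partway rather than the contents being scrambled, which would fit the "core dump failed" message in point 1.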
Any ideas/suggestions would be most appreciated.
Thank you for your time.
Alan