I am running a C++ program that runs fine on 2 other Sun boxes (running Solaris 10) but, naturally, crashes on the Sun box (also running Solaris 10) that will be our production server. The binaries were copied from one of the other boxes, so the program was not recompiled.
I have not been able to reproduce this error on any other box, but the box it fails on is the prod box, so I have to get this fixed.
I am not using LD_PRELOAD and the mdb output shows that it is getting malloc from libc.
Now, here is the REALLY odd part:
I have 2 variables, defined as long. When they are set to certain integers (mainly odd numbers, it seems), the malloc fails; when I set them to other values (like even numbers or very low odd numbers), the malloc works fine.
Here is a portion of the code:
typedef struct
{
char proc_Name[19];
char proc_Status[3];
char proc_exec_hst_ind[3];
long sleep_seconds;
long commit_rcd_cnt;
long max_rcd_cnt;
long min_threshold_cnt; // <--- these 2 variables control
long max_threshold_cnt; // <--- whether malloc fails or not
parmRcd *parms; // <--- malloc is performed on this field
} processRcd;
processRcd ProcRcd;
...
ProcRcd.max_threshold_cnt = 999998;
// ProcRcd.min_threshold_cnt = 999999; // VJD FAILS
// ProcRcd.min_threshold_cnt = 999998; // VJD PASSES
// ProcRcd.min_threshold_cnt = 999997; // VJD FAILS
// ProcRcd.min_threshold_cnt = 999996; // VJD PASSES
// ProcRcd.min_threshold_cnt = 50001; // VJD FAILS
// ProcRcd.min_threshold_cnt = 50000; // VJD PASSES
// ProcRcd.min_threshold_cnt = 501; // VJD FAILS
// ProcRcd.min_threshold_cnt = 500; // VJD PASSES
// ProcRcd.min_threshold_cnt = 47; // VJD FAILS
// ProcRcd.min_threshold_cnt = 42; // VJD PASSES
ProcRcd.min_threshold_cnt = 41; // VJD FAILS
// ProcRcd.min_threshold_cnt = 9; // VJD PASSES
userlog("VJD before calloc, parms addr = %x", ProcRcd.parms);
ProcRcd.parms = (parmRcd *)calloc(numberOfParms, sizeof(parmRcd));
userlog("VJD after calloc, parms addr = %x ", ProcRcd.parms);
...
I have also tried substituting malloc for calloc (same results) and moving the calloc above the block of assignments (same results).
Here is the mdb output:
::status
debugging core file of CIS_dsClient (32-bit) from nc1omzzpa05
file: /home/ncps/oracle/CIS_dsClient
initial argv: CIS_dsClient CIS_DATASEND
threading model: multi-threaded
status: process terminated by SIGSEGV (Segmentation Fault)
::stack
libc.so.1`t_splay+0x170(ffbff5b4, f4, 861, 0, fddbe3c0, ff3c52c0)
libc.so.1`realfree+0x8c(ffbff4b8, f5, e7974, 0, 0, ffbff4b0)
libc.so.1`_malloc_unlocked+0x260(ffbff3c0, 1f4, ffbff3b8, ffbff3c0, fddc1910, 0)
libc.so.1`malloc+0x4c(f8, 1, e8070, fe0c9f34, fddbe3c0, fddc85b8)
libc.so.1`calloc+0x58(3e, 3e, f8, 0, 328, 0)
getParms+0xfc(ffbff3c4, ffbff3c8, ffbff390, 0, 328, 0)
__1cIprocParmHrefresh6M_v_+0x68(ffbff38c, 228, fe371228, 0, 328, 0)
main+0x884(2, ffbff95c, ffbff968, 30800, fdff07c0, 0)
_start+0x108(0, 0, 0, 0, 0, 0)
The end of the truss output:
open64("/home/ncps/logs/ulogs/ULOG.051309", O_WRONLY|O_APPEND|O_CREAT, 0666) = 11
umask(022) = 0
write(11, " 1 5 5 8 0 4 . n c 1 o m".., 103) = 103
close(11) = 0
Incurred fault #6, FLTBOUNDS %pc = 0xFDCD7188
siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000B <--- usually I get a SEGV_ACCERR
Received signal #11, SIGSEGV [default]
siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000B
Log file entries:
155804.nc1omzzpa05!CIS_dsClient.13585.1.0: VJD at start, parms addr = ffbff390
155804.nc1omzzpa05!CIS_dsClient.13585.1.0: VJD before assgnmts parms addr = ffbff390
155804.nc1omzzpa05!CIS_dsClient.13585.1.0: VJD before calloc, parms addr = ffbff390, numP=4, sizeof=62
155854.nc1omzzpa05!BBL.18642.1.0: LIBTUX_CAT:216: WARN: Process 13585 died; removing from BB
Any assistance would be much appreciated!
Thank you
Edited by: Valerie101 on May 13, 2009 9:37 AM