I have installed CentOS 5.9 (don't ask...). My storage is a disk array of 10 SCSI disks connected via iSCSI, and I installed Grid Infrastructure 11.2.0.3 (software only).
I was configuring ASM, but when I added the candidate disks the operation never finished, so I cancelled it. After several failed attempts to make it work, I decided to restart the server.
I think I screwed it up, because before restarting the server I had uninstalled ASMLib.
The server wouldn't boot because of disk-related errors. After doing some magic, I disabled the connection to the disks and was able to boot, but now I have problems connecting to the disks:
$ /etc/init.d/iscsid start
$ chkconfig --add iscsi
$ chkconfig iscsi on
$ iscsiadm -m discovery -t sendtargets -p 10.9.254.2
10.9.254.1:3260,1 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.9:3260,2 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.2:3260,3 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.10:3260,4 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.3:3260,5 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.11:3260,6 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.4:3260,7 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
10.9.254.12:3260,8 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
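To see at a glance which portals the array advertises, the sendtargets output can be reduced to one portal per line. This is a minimal sketch, assuming the `IP:PORT,TPGT IQN` format shown above; `list_portals` is a hypothetical helper name, not part of open-iscsi:

```shell
# list_portals (hypothetical helper): read "iscsiadm -m discovery -t sendtargets"
# output on stdin and print "IP:PORT TPGT" for each advertised portal.
# Assumes IPv4 portals in the "IP:PORT,TPGT IQN" format.
list_portals() {
    awk '{
        split($1, a, ",")    # a[1] = "IP:PORT", a[2] = portal group tag
        print a[1], a[2]
    }'
}

# Usage on the server:
#   iscsiadm -m discovery -t sendtargets -p 10.9.254.2 | list_portals
```

All eight entries above share the same target IQN and differ only in portal address and portal group tag.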
Then I try to log in to portal 10.9.254.2 (I'm not sure how to determine which portal I should use, but this one worked before):
$ iscsiadm -m node -T iqn.1986-03.com.hp:storage.p2000g3.121514b3cc -l -p 10.9.254.2
But the command never returns, so I have to cancel it. It looks like it is connected and working:
$ iscsiadm -m session
tcp: [8] 10.9.254.1:3260,1 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
tcp: [9] 10.9.254.3:3260,5 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
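The session list can be compared with the portal named in the login command. This is a minimal sketch, assuming the `tcp: [SID] IP:PORT,TPGT IQN` format shown above; `session_portals` is a hypothetical helper name:

```shell
# session_portals (hypothetical helper): read "iscsiadm -m session" output on
# stdin and print "SID IP:PORT" for every TCP session.
session_portals() {
    awk '$1 == "tcp:" {
        sid = $2
        gsub(/\[|\]/, "", sid)   # "[8]" -> "8"
        split($3, p, ",")        # p[1] = "IP:PORT"
        print sid, p[1]
    }'
}

# Usage on the server:
#   iscsiadm -m session | session_portals
```

With the listing above, this reports sessions to 10.9.254.1 and 10.9.254.3, neither of which is the 10.9.254.2 portal passed to `-p`.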
However, it isn't actually working:
$ iscsiadm -m session -P3
(...)
Attached SCSI devices:
************************
Host Number: 5 State: running
scsi5 Channel 00 Id 0 Lun: 0
Attached scsi disk sda State: Unknown
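Since the `-P3` dump is long, the per-disk state can be pulled out into one line per LUN, which makes it easier to watch disks move between "Unknown" and "running". This is a minimal sketch, assuming the `Attached scsi disk NAME State: STATE` lines shown above; `disk_states` is a hypothetical helper name:

```shell
# disk_states (hypothetical helper): read "iscsiadm -m session -P3" output on
# stdin and print "DEVICE STATE" for each attached SCSI disk.
disk_states() {
    awk '/Attached scsi disk/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "disk")   dev = $(i + 1)     # word after "disk" is the device
            if ($i == "State:") state = $(i + 1)   # word after "State:" is the state
        }
        print dev, state
    }'
}

# Usage on the server:
#   iscsiadm -m session -P3 | disk_states
```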
Moreover, I can't see the disks (sda, sdb, sdc...) in /dev:
$ ls -l /dev/s*
lrwxrwxrwx 1 root root 3 feb 6 17:05 /dev/scd0 -> sr0
crw------- 1 root root 21, 0 feb 6 17:05 /dev/sg0
crw------- 1 root root 21, 1 feb 26 12:30 /dev/sg1
crw------- 1 root root 21, 10 feb 26 14:31 /dev/sg10
crw------- 1 root root 21, 2 feb 26 12:32 /dev/sg2
crw------- 1 root root 21, 3 feb 26 13:02 /dev/sg3
crw------- 1 root root 21, 4 feb 26 13:02 /dev/sg4
crw------- 1 root root 21, 5 feb 26 13:31 /dev/sg5
crw------- 1 root root 21, 6 feb 26 13:33 /dev/sg6
crw------- 1 root root 21, 7 feb 26 14:01 /dev/sg7
crw------- 1 root root 21, 8 feb 26 14:01 /dev/sg8
crw------- 1 root root 21, 9 feb 26 14:30 /dev/sg9
crw------- 1 root root 10, 231 feb 6 17:05 /dev/snapshot
brw-rw---- 1 root disk 11, 0 feb 6 17:05 /dev/sr0
lrwxrwxrwx 1 root root 15 feb 6 17:05 /dev/stderr -> /proc/self/fd/2
lrwxrwxrwx 1 root root 15 feb 6 17:05 /dev/stdin -> /proc/self/fd/0
lrwxrwxrwx 1 root root 15 feb 6 17:05 /dev/stdout -> /proc/self/fd/1
crw------- 1 root root 4, 0 feb 6 14:04 /dev/systty
The strange thing is that at first /proc/partitions lists only sda. The remaining devices appear progressively; after 1 hour, I can see all of them:
$ cat /proc/partitions
major minor #blocks name
104 0 292935982 cciss/c0d0
104 1 104391 cciss/c0d0p1
104 2 292824787 cciss/c0d0p2
253 0 286720000 dm-0
253 1 6094848 dm-1
8 0 96679680 sda
8 16 96679680 sdb
8 32 96679680 sdc
8 48 96679680 sdd
8 64 96679680 sde
8 80 96679680 sdf
8 96 96679680 sdg
8 112 96679680 sdh
8 128 96679680 sdi
8 144 96679680 sdj
8 160 96679680 sdk
8 176 96679680 sdl
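The `#blocks` column in /proc/partitions is in 1 KiB units, so the sd* entries can be converted to sizes comparable with the kernel log. This is a minimal sketch; `sd_sizes` is a hypothetical helper name:

```shell
# sd_sizes (hypothetical helper): read /proc/partitions-style input on stdin and
# print "NAME SIZE_MB" for every whole sd* disk (partitions like sda1 are skipped).
# #blocks is in 1 KiB units; the size is printed in decimal MB, as the kernel log does.
sd_sizes() {
    awk '$4 ~ /^sd[a-z]+$/ {
        mb = int($3 * 1024 / 1000000 + 0.5)
        print $4, mb
    }'
}

# Usage on the server:
#   sd_sizes < /proc/partitions
```

For the entries above this gives 99000 MB per LUN, consistent with the `193359360 512-byte hdwr sectors (99000 MB)` line in the log below (96679680 × 1024 = 193359360 × 512 bytes).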
At the end of the day, 2 devices show as "Unknown" and the rest as "running". But I can't even run fdisk on them:
$ fdisk -l /dev/sda
Nothing is returned. Trying to create a partition instead:
$ fdisk /dev/sda
Cannot open /dev/sda
I can't access any sdX device, because none of them exist in /dev.
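The mismatch between what the kernel reports and what exists in /dev can be checked directly. This is a minimal sketch; `missing_nodes` is a hypothetical helper, and it only reports which sd* disks listed in a /proc/partitions-style file have no block-device node, not why the nodes were never created:

```shell
# missing_nodes (hypothetical helper): print every whole sd* disk listed in a
# /proc/partitions-style file that has no block-device node in DEVDIR.
# Usage: missing_nodes /proc/partitions /dev
missing_nodes() {
    partitions=$1
    devdir=${2:-/dev}
    awk '$4 ~ /^sd[a-z]+$/ { print $4 }' "$partitions" |
    while read -r name; do
        # -b is true only if the path exists and is a block special file
        [ -b "$devdir/$name" ] || echo "$name"
    done
}
```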
This is the log I got when I tried to connect to the targets with iscsiadm:
Feb 6 15:16:55 bat-cvracdb02 kernel: scsi5 : iSCSI Initiator over TCP/IP
Feb 6 15:16:55 bat-cvracdb02 kernel: Vendor: HP Model: P2000 G3 iSCSI Rev: T250
Feb 6 15:16:55 bat-cvracdb02 kernel: Type: Direct-Access ANSI SCSI revision: 05
Feb 6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: 193359360 512-byte hdwr sectors (99000 MB)
Feb 6 15:16:55 bat-cvracdb02 kernel: sda: Write Protect is off
Feb 6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: drive cache: write back
Feb 6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: 193359360 512-byte hdwr sectors (99000 MB)
Feb 6 15:16:55 bat-cvracdb02 kernel: sda: Write Protect is off
Feb 6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: drive cache: write back
Feb 6 15:16:56 bat-cvracdb02 iscsid: Connection1:0 to [target: iqn.1986-03.com.hp:storage.p2000g3.121514b3cc, portal: 10.9.254.2,3260] through [iface: default] is operational now
Feb 6 15:17:05 bat-cvracdb02 kernel: sda:<3> connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295476749, last ping 4295481749, now 4295486749
Feb 6 15:17:05 bat-cvracdb02 kernel: connection1:0: detected conn error (1011)
Feb 6 15:17:06 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 6 15:17:06 bat-cvracdb02 udevd-event[3628]: wait_for_sysfs: waiting for '/sys/devices/platform/host5/session1/target5:0:0/5:0:0:0/ioerr_cnt' failed
Feb 6 15:17:09 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.
Feb 6 15:17:09 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
Feb 6 15:17:24 bat-cvracdb02 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295495002, last ping 4295500002, now 4295505002
Feb 6 15:17:24 bat-cvracdb02 kernel: connection1:0: detected conn error (1011)
Feb 6 15:17:24 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 6 15:17:27 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.
Feb 6 15:17:27 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
Feb 6 15:17:37 bat-cvracdb02 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295508253, last ping 4295513253, now 4295518253
Feb 6 15:17:37 bat-cvracdb02 kernel: connection1:0: detected conn error (1011)
Feb 6 15:17:38 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 6 15:17:40 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.
Feb 6 15:17:40 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
Feb 6 15:17:50 bat-cvracdb02 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295521254, last ping 4295526254, now 4295531254
Feb 6 15:17:50 bat-cvracdb02 kernel: connection1:0: detected conn error (1011)
Feb 6 15:17:51 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 6 15:17:53 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.
Feb 6 15:17:53 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
Feb 6 15:18:03 bat-cvracdb02 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295534255, last ping 4295539255, now 4295544255
Feb 6 15:18:03 bat-cvracdb02 kernel: connection1:0: detected conn error (1011)
Feb 6 15:18:04 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 6 15:18:06 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.
Feb 6 15:18:06 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
Feb 6 15:18:12 bat-cvracdb02 avahi-daemon[3321]: Invalid query packet.
Feb 6 15:18:16 bat-cvracdb02 last message repeated 6 times
Feb 6 15:18:16 bat-cvracdb02 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295547259, last ping 4295552259, now 4295557259
Feb 6 15:18:16 bat-cvracdb02 kernel: connection1:0: detected conn error (1011)
Feb 6 15:18:16 bat-cvracdb02 kernel: sd 5:0:0:0: Unhandled error code
Feb 6 15:18:16 bat-cvracdb02 kernel: sd 5:0:0:0: SCSI error: return code = 0x000e0000
Feb 6 15:18:16 bat-cvracdb02 kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Feb 6 15:18:16 bat-cvracdb02 kernel: Buffer I/O error on device sda, logical block 0
Feb 6 15:18:17 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 6 15:18:19 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.
Feb 6 15:18:19 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
I've also tried reinstalling ASMLib, Grid Infrastructure, and the iSCSI tools.
I'm tired of this problem; please share any ideas. Nobody on the CentOS forum could help me.