ASM & iSCSI - Disks disappeared forever

danrs2011, Feb 26 2014 (edited Apr 14 2014)

I have installed CentOS 5.9 (don't ask...). My storage is a disk array of 10 SCSI disks connected via iSCSI, and I installed Grid Infrastructure 11.2.0.3 (software only).

While configuring ASM, I added the candidate disks, but the operation never finished, so I canceled it. After several failed attempts to make it work, I decided to restart the server.

I think I broke it, because I had uninstalled ASMLib before restarting the server.

The server would not boot because of errors related to the disks. After doing some magic, I disabled the connection to the disks and was able to boot, but now I have problems when trying to connect to the disks:

  $ /etc/init.d/iscsid start

  $ chkconfig --add iscsi

  $ chkconfig iscsi on

  $ iscsiadm -m discovery -t sendtargets -p 10.9.254.2

10.9.254.1:3260,1 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.9:3260,2 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.2:3260,3 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.10:3260,4 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.3:3260,5 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.11:3260,6 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.4:3260,7 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

10.9.254.12:3260,8 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc
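
The discovery output above has one record per portal, in the form address:port,tpgt followed by the target name. As a rough aid for deciding which portal to log in to, the sketch below lists the portals sorted by portal group tag. The function name list_portals is made up for this example (it is not part of the iSCSI tools), and it only reformats the text it is given; it does not test whether a portal is reachable.

```shell
# Hypothetical helper (not part of open-iscsi): read sendtargets discovery
# output on stdin, one "address:port,tpgt target-iqn" record per line, and
# print "tpgt address port" for each record, sorted by portal group tag.
list_portals() {
    awk 'NF >= 2 {
        split($1, rec, ",")          # rec[1] = address:port, rec[2] = tpgt
        n = split(rec[1], ap, ":")   # the last ":"-separated piece is the port
        port = ap[n]
        addr = substr(rec[1], 1, length(rec[1]) - length(port) - 1)
        print rec[2], addr, port
    }' | sort -n
}

# Assumed usage, with the discovery command from this post:
#   iscsiadm -m discovery -t sendtargets -p 10.9.254.2 | list_portals
```

Each listed address can then be checked from the server (for example with ping) before it is passed to the -p option of the login command.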

Then I try to connect to portal 10.9.254.2 (I'm not quite sure how to determine which portal I should use, but this one used to work before):

  $ iscsiadm -m node -T iqn.1986-03.com.hp:storage.p2000g3.121514b3cc -l -p 10.9.254.2

But the command never returns, so I have to cancel it. The session list suggests that it is connected and working:

  $ iscsiadm -m session

tcp: [8] 10.9.254.1:3260,1 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

tcp: [9] 10.9.254.3:3260,5 iqn.1986-03.com.hp:storage.p2000g3.121514b3cc

...but it isn't working:

  $ iscsiadm -m session -P3

                (...)

                Attached SCSI devices:

                ************************

                Host Number: 5  State: running

                scsi5 Channel 00 Id 0 Lun: 0

                        Attached scsi disk sda          State: Unknown
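
With several LUNs, the -P3 output gets long. A minimal sketch for reducing it to one "disk state" pair per line follows; list_disk_states is a hypothetical name, and the parsing assumes the "Attached scsi disk NAME State: VALUE" layout shown above.

```shell
# Hypothetical helper: read `iscsiadm -m session -P3` output on stdin and
# print "<disk> <state>" for every "Attached scsi disk" line.
list_disk_states() {
    # Field 4 is the disk name; the last field is the value after "State:".
    awk '/Attached scsi disk/ { print $4, $NF }'
}

# Assumed usage:
#   iscsiadm -m session -P3 | list_disk_states
```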

Moreover, I can't see the disks! (sda, sdb, sdc...)

  $ ls -l /dev/s*

lrwxrwxrwx 1 root root       3 feb  6 17:05 /dev/scd0 -> sr0

crw------- 1 root root 21,   0 feb  6 17:05 /dev/sg0

crw------- 1 root root 21,   1 feb 26 12:30 /dev/sg1

crw------- 1 root root 21,  10 feb 26 14:31 /dev/sg10

crw------- 1 root root 21,   2 feb 26 12:32 /dev/sg2

crw------- 1 root root 21,   3 feb 26 13:02 /dev/sg3

crw------- 1 root root 21,   4 feb 26 13:02 /dev/sg4

crw------- 1 root root 21,   5 feb 26 13:31 /dev/sg5

crw------- 1 root root 21,   6 feb 26 13:33 /dev/sg6

crw------- 1 root root 21,   7 feb 26 14:01 /dev/sg7

crw------- 1 root root 21,   8 feb 26 14:01 /dev/sg8

crw------- 1 root root 21,   9 feb 26 14:30 /dev/sg9

crw------- 1 root root 10, 231 feb  6 17:05 /dev/snapshot

brw-rw---- 1 root disk 11,   0 feb  6 17:05 /dev/sr0

lrwxrwxrwx 1 root root      15 feb  6 17:05 /dev/stderr -> /proc/self/fd/2

lrwxrwxrwx 1 root root      15 feb  6 17:05 /dev/stdin -> /proc/self/fd/0

lrwxrwxrwx 1 root root      15 feb  6 17:05 /dev/stdout -> /proc/self/fd/1

crw------- 1 root root  4,   0 feb  6 14:04 /dev/systty

The strange thing is that at the beginning I can see only sda in /proc/partitions, but the remaining devices appear progressively. After 1 hour, I can see all of them:

  $ cat /proc/partitions

major minor  #blocks  name

104     0  292935982 cciss/c0d0

104     1     104391 cciss/c0d0p1

104     2  292824787 cciss/c0d0p2

253     0  286720000 dm-0

253     1    6094848 dm-1

   8     0   96679680 sda

   8    16   96679680 sdb

   8    32   96679680 sdc

   8    48   96679680 sdd

   8    64   96679680 sde

   8    80   96679680 sdf

   8    96   96679680 sdg

   8   112   96679680 sdh

   8   128   96679680 sdi

   8   144   96679680 sdj

   8   160   96679680 sdk

   8   176   96679680 sdl

By the end of the day, 2 devices are reported as "unknown" and the rest as "running". But I can't even run fdisk on them:

  $ fdisk -l /dev/sda

It returns nothing. Let's try to create a partition:

  $ fdisk /dev/sda

  Cannot open /dev/sda

I can't access any sdX device, because none of them exist in /dev.
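
The mismatch between /proc/partitions (which lists sda to sdl) and /dev (which has no sdX entries) can be checked mechanically. The sketch below is only an illustration: missing_sd_nodes is a hypothetical helper name, and it tests merely whether an entry with that name exists in the directory, not whether it is a valid block device.

```shell
# Hypothetical helper: read /proc/partitions-style text on stdin and print
# every sdX name that has no entry in the given directory (default: /dev).
missing_sd_nodes() {
    dev_dir=${1:-/dev}
    # The 4th column is the device name; keep only sdX and sdXN names.
    awk '$4 ~ /^sd[a-z]+[0-9]*$/ { print $4 }' |
    while read -r name; do
        [ -e "$dev_dir/$name" ] || echo "$name"
    done
}

# Assumed usage:
#   missing_sd_nodes < /proc/partitions
```

If this prints names, the kernel has registered those disks but no device node has been created for them.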

This is the log I got when I tried to connect to the targets by using the iscsiadm command:

Feb  6 15:16:55 bat-cvracdb02 kernel: scsi5 : iSCSI Initiator over TCP/IP

Feb  6 15:16:55 bat-cvracdb02 kernel:   Vendor: HP        Model: P2000 G3 iSCSI    Rev: T250

Feb  6 15:16:55 bat-cvracdb02 kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05

Feb  6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: 193359360 512-byte hdwr sectors (99000 MB)

Feb  6 15:16:55 bat-cvracdb02 kernel: sda: Write Protect is off

Feb  6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: drive cache: write back

Feb  6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: 193359360 512-byte hdwr sectors (99000 MB)

Feb  6 15:16:55 bat-cvracdb02 kernel: sda: Write Protect is off

Feb  6 15:16:55 bat-cvracdb02 kernel: SCSI device sda: drive cache: write back

Feb  6 15:16:56 bat-cvracdb02 iscsid: Connection1:0 to [target: iqn.1986-03.com.hp:storage.p2000g3.121514b3cc, portal: 10.9.254.2,3260] through [iface: default] is operational now

Feb  6 15:17:05 bat-cvracdb02 kernel:  sda:<3> connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295476749, last ping 4295481749, now 4295486749

Feb  6 15:17:05 bat-cvracdb02 kernel:  connection1:0: detected conn error (1011)

Feb  6 15:17:06 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)

Feb  6 15:17:06 bat-cvracdb02 udevd-event[3628]: wait_for_sysfs: waiting for '/sys/devices/platform/host5/session1/target5:0:0/5:0:0:0/ioerr_cnt' failed

Feb  6 15:17:09 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.

Feb  6 15:17:09 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)

Feb  6 15:17:24 bat-cvracdb02 kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295495002, last ping 4295500002, now 4295505002

Feb  6 15:17:24 bat-cvracdb02 kernel:  connection1:0: detected conn error (1011)

Feb  6 15:17:24 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)

Feb  6 15:17:27 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.

Feb  6 15:17:27 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)

Feb  6 15:17:37 bat-cvracdb02 kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295508253, last ping 4295513253, now 4295518253

Feb  6 15:17:37 bat-cvracdb02 kernel:  connection1:0: detected conn error (1011)

Feb  6 15:17:38 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)

Feb  6 15:17:40 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.

Feb  6 15:17:40 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)

Feb  6 15:17:50 bat-cvracdb02 kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295521254, last ping 4295526254, now 4295531254

Feb  6 15:17:50 bat-cvracdb02 kernel:  connection1:0: detected conn error (1011)

Feb  6 15:17:51 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)

Feb  6 15:17:53 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.

Feb  6 15:17:53 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)

Feb  6 15:18:03 bat-cvracdb02 kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295534255, last ping 4295539255, now 4295544255

Feb  6 15:18:03 bat-cvracdb02 kernel:  connection1:0: detected conn error (1011)

Feb  6 15:18:04 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)

Feb  6 15:18:06 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.

Feb  6 15:18:06 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)

Feb  6 15:18:12 bat-cvracdb02 avahi-daemon[3321]: Invalid query packet.

Feb  6 15:18:16 bat-cvracdb02 last message repeated 6 times

Feb  6 15:18:16 bat-cvracdb02 kernel:  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295547259, last ping 4295552259, now 4295557259

Feb  6 15:18:16 bat-cvracdb02 kernel:  connection1:0: detected conn error (1011)

Feb  6 15:18:16 bat-cvracdb02 kernel: sd 5:0:0:0: Unhandled error code

Feb  6 15:18:16 bat-cvracdb02 kernel: sd 5:0:0:0: SCSI error: return code = 0x000e0000

Feb  6 15:18:16 bat-cvracdb02 kernel: Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Feb  6 15:18:16 bat-cvracdb02 kernel: Buffer I/O error on device sda, logical block 0

Feb  6 15:18:17 bat-cvracdb02 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)

Feb  6 15:18:19 bat-cvracdb02 iscsid: Could not online LUN 0 err 2.

Feb  6 15:18:19 bat-cvracdb02 iscsid: connection1:0 is operational after recovery (1 attempts)
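
The log above repeats the same cycle every 13 to 19 seconds: a ping timeout, connection error 1011, then "operational after recovery". To see how often this happens over a longer period, the messages can be counted per connection. This is a minimal sketch; summarize_iscsi_log is a hypothetical name, the patterns are taken from the messages in this excerpt, and the log file path in the usage comment is an assumption.

```shell
# Hypothetical helper: read syslog text on stdin and print, for each iSCSI
# connection that logged a ping timeout, the number of ping timeouts and
# the number of "operational after recovery" messages.
summarize_iscsi_log() {
    awk '
        /ping timeout of [0-9]+ secs expired/ {
            if (match($0, /connection[0-9]+:[0-9]+/))
                timeouts[substr($0, RSTART, RLENGTH)]++
        }
        /is operational after recovery/ {
            if (match($0, /connection[0-9]+:[0-9]+/))
                recoveries[substr($0, RSTART, RLENGTH)]++
        }
        END {
            for (c in timeouts)
                printf "%s timeouts=%d recoveries=%d\n", c, timeouts[c], recoveries[c]
        }'
}

# Assumed usage (the log file location may differ):
#   summarize_iscsi_log < /var/log/messages
```

For the excerpt above, this counts six ping timeouts and six recoveries on connection1:0.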

I've also tried reinstalling ASMLib, Grid Infrastructure, and the iSCSI tools.

I'm tired of this problem, so any ideas would be appreciated. Nobody on the CentOS forum could help me.

This post has been answered by danrs2011 on Apr 14 2014
Locked on May 12 2014
Added on Feb 26 2014