Can't cancel/remove stuck jobs
721339 · Sep 8 2009 (edited Sep 17 2009)
This is my first venture into OSB and the first backup I've tried to do. I'm learning quickly.
I have two clusters, each with two nodes. One cluster has nodes sdorac2a and sdorac2b; the other has sdorac4a and sdorac4b. Both are configured the same way, and each has three OCFS2 cluster filesystems mounted at /usr/local, /home and /apps.
I created a dataset containing the following:
include host sdorac4b
include host sdorac2b
include path /home
include path /usr/local
include path /apps
exclude name core
exclude name *~
I have two tape drives configured in OSB on 2a and 2b, and I created a schedule to do a one-time backup starting at 18:00 last night, using that dataset, to back up to the tape drive on 2a only. At least I think I did. I might have made a mistake and told it to use the tape drive on 2b as well before changing it to use only 2a. I suspect that because of the catxcr output.
This morning both backups were still running. catxcr shows the same output for jobs 1.1 and 1.2, but on different media nodes.
# obtool catxcr 1.1
2009/09/07.18:00:59 ______________________________________________________________________
2009/09/07.18:00:59
2009/09/07.18:00:59 Transcript for job 1.1 running on sdorac2a
2009/09/07.18:00:59
Backup started on Mon Sep 07 2009 at 18:01:01
Volume label:
Volume UUID: 918979c6-7df5-102c-9a98-0024817dfd32
Volume ID: VOL000001
Volume sequence: 1
Volume set owner: root
Volume set created: Mon Sep 07 18:01:01 2009
Original UUID: 918979c6-7df5-102c-9a98-0024817dfd32
Archive label:
File number: 1
File section: 1
Owner: root
Client host: sdorac4b
Backup level: 0
S/w compression: no
Archive created: Mon Sep 07 18:01:01 2009
Encryption: off
Dumping all files in /home
Dumping all files in /usr/local
Dumping all files in /apps
Opening device /dev/sg0 failed - device is busy (OB scsi device driver)
# obtool catxcr 1.2
2009/09/07.18:01:00 ______________________________________________________________________
2009/09/07.18:01:00
2009/09/07.18:01:00 Transcript for job 1.2 running on sdorac2b
2009/09/07.18:01:00
Backup started on Mon Sep 07 2009 at 18:01:03
Volume label:
Volume UUID: 93f53858-7df5-102c-bab1-0024817e168a
Volume ID: VOL000002
Volume sequence: 1
Volume set owner: root
Volume set created: Mon Sep 07 18:01:03 2009
Original UUID: 93f53858-7df5-102c-bab1-0024817e168a
Archive label:
File number: 1
File section: 1
Owner: root
Client host: sdorac2b
Backup level: 0
S/w compression: no
Archive created: Mon Sep 07 18:01:03 2009
Encryption: off
Dumping all files in /home
Dumping all files in /usr/local
Dumping all files in /apps
Opening device /dev/sg0 failed - device is busy (OB scsi device driver)
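To compare the two transcripts side by side, here is a minimal Python sketch that pulls out the job ID, the media node, the client host and any failure lines. It is not part of OSB: the function name `summarize_transcript`, the dictionary keys and the sample excerpt are illustrative assumptions, and the parsing only relies on the line formats visible in the transcripts above.

```python
import re

# Excerpt copied from the job 1.1 transcript above (illustrative sample only).
TRANSCRIPT = """\
2009/09/07.18:00:59 Transcript for job 1.1 running on sdorac2a
Client host: sdorac4b
Dumping all files in /apps
Opening device /dev/sg0 failed - device is busy (OB scsi device driver)
"""

# Matches the transcript header line and captures job ID and media node.
HEADER = re.compile(r"Transcript for job (\S+) running on (\S+)")


def summarize_transcript(text):
    """Collect job ID, media node, client host and failure lines from a transcript."""
    summary = {"job": None, "media_node": None, "client": None, "failures": []}
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.search(line)
        if header:
            summary["job"], summary["media_node"] = header.groups()
        elif line.startswith("Client host:"):
            summary["client"] = line.split(":", 1)[1].strip()
        elif " failed " in line or line.startswith("Error:"):
            summary["failures"].append(line)
    return summary


if __name__ == "__main__":
    print(summarize_transcript(TRANSCRIPT))
```

Run against both transcripts, this should make it easy to see that the two jobs ran on different media nodes but failed on the same device path.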
I've no idea why this would happen, but I decided to cancel the jobs before looking into it further, as they had just been sitting there all night.
I tried cancelling the jobs, but this hasn't worked. They are still listed as running, with the cancellation pending.
# obtool lsjob
Job ID Sched time Contents State
---------------- ----------- ------------------------------ ---------------------------------------
1.1 09/07.18:00 backup sdorac4b running since 2009/09/07.18:00; cancellation pending
1.2 09/07.18:00 backup sdorac2b running since 2009/09/07.18:00; cancellation pending
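For anyone who wants to spot this state from a script, here is a minimal Python sketch that filters saved `obtool lsjob` output for jobs whose state includes "cancellation pending". It does not call obtool itself; the function name `stuck_jobs` and the regular expression are assumptions based only on the column layout shown above.

```python
import re

# Text copied from the `obtool lsjob` output above (illustrative sample only).
LSJOB_OUTPUT = """\
Job ID Sched time Contents State
---------------- ----------- ------------------------------ ---------------------------------------
1.1 09/07.18:00 backup sdorac4b running since 2009/09/07.18:00; cancellation pending
1.2 09/07.18:00 backup sdorac2b running since 2009/09/07.18:00; cancellation pending
"""

# A job row starts with a job ID followed by a scheduled time such as 09/07.18:00.
# The header and separator rows do not match this pattern and are skipped.
JOB_LINE = re.compile(r"^(?P<job_id>\S+)\s+\d{2}/\d{2}\.\d{2}:\d{2}\s+(?P<rest>.*)$")


def stuck_jobs(text):
    """Return the IDs of jobs whose state mentions 'cancellation pending'."""
    found = []
    for line in text.splitlines():
        match = JOB_LINE.match(line.strip())
        if match and "cancellation pending" in match.group("rest"):
            found.append(match.group("job_id"))
    return found


if __name__ == "__main__":
    print(stuck_jobs(LSJOB_OUTPUT))
```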
catxcr now shows one extra line at the end of each transcript (only the PID differs):
Error: [27325] killed
How do I get rid of them now? Is there a way to force remove them?
Rgds,
John