Charles C. Guest
Posted: Fri Aug 11, 2006 4:01 am
Post subject: Megaraid crash on linux
Hi,
We had a crash and can't decipher the logs. Any help is appreciated.
OS = Gentoo Linux, kernel 2.6.17-gentoo-r4, on a PIII.
There are two MegaRAID adapters, both U160:
Adapter 0 = MegaRAID Express 500 (single channel; both the internal and
external connectors of the channel are in use).
Adapter 1 = MegaRAID Elite 1600 (dual channel; only channel 0 is used).
There is also an Adaptec AIC-7892P built into the motherboard, enabled in the
PC's BIOS but not used.
The kernel logs show:
megaraid: aborting-7399451 cmd=2a <c=2 t=0 l=0>
megaraid abort: 7399451:9[255:0], fw owner
megaraid: aborting-7399452 cmd=2a <c=2 t=0 l=0>
megaraid abort: 7399452:57[255:0], fw owner
....
megaraid: aborting-7752089 cmd=28 <c=2 t=0 l=0>
megaraid abort: 7752089:45[255:0], fw owner
megaraid: 64 outstanding commands. Max wait 300 sec
megaraid mbox: Wait for 64 commands to complete:300
megaraid mbox: Wait for 64 commands to complete:295
megaraid mbox: Wait for 64 commands to complete:290
megaraid mbox: Wait for 64 commands to complete:285
megaraid mbox: Wait for 64 commands to complete:280
megaraid mbox: Wait for 64 commands to complete:275
megaraid mbox: reset sequence completed sucessfully
Any ideas as to what "c=2" stands for? (Channel 2, but then what is
Channel 2?).
Any other pointers appreciated.
TIA
Charles
--
Please remove _removeme_ to reply.

Michael Baeuerle Guest
Posted: Fri Aug 11, 2006 2:08 pm
Post subject: Re: Megaraid crash on linux
"Charles C." wrote:
| Quote: |
[...]
The logs reported by the kernel show....
megaraid: aborting-7399451 cmd=2a <c=2 t=0 l=0>
megaraid abort: 7399451:9[255:0], fw owner
megaraid: aborting-7399452 cmd=2a <c=2 t=0 l=0>
megaraid abort: 7399452:57[255:0], fw owner
...
megaraid: aborting-7752089 cmd=28 <c=2 t=0 l=0>
megaraid abort: 7752089:45[255:0], fw owner
megaraid: 64 outstanding commands. Max wait 300 sec
megaraid mbox: Wait for 64 commands to complete:300
megaraid mbox: Wait for 64 commands to complete:295
megaraid mbox: Wait for 64 commands to complete:290
megaraid mbox: Wait for 64 commands to complete:285
megaraid mbox: Wait for 64 commands to complete:280
megaraid mbox: Wait for 64 commands to complete:275
megaraid mbox: reset sequence completed sucessfully
Any ideas as to what "c=2" stands for? (Channel 2, but then what is
Channel 2?).
|
The megaraid driver creates this message using the function
"megaraid_abort_and_reset()":
----------------------------------------------------------------------
static int
megaraid_abort_and_reset(adapter_t *adapter, Scsi_Cmnd *cmd, int aor)
{
    struct list_head *pos, *next;
    scb_t *scb;

    printk(KERN_WARNING "megaraid: %s-%lx cmd=%x <c=%d t=%d l=%d>\n",
        (aor == SCB_ABORT)? "ABORTING":"RESET", cmd->serial_number,
        cmd->cmnd[0], cmd->device->channel,
        cmd->device->id, cmd->device->lun);
[...]
----------------------------------------------------------------------
(This is from 2.6.8, but it should work similarly in your kernel.)
As you wrote, "c=2" means channel 2.
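To make the fields of such a line easier to read, here is a minimal sketch in C that splits one abort message into its parts. The names `struct abort_msg`, `parse_abort_line` and `opcode_name` are made up for this illustration and are not part of the driver. It assumes the serial number is printed in decimal, as it appears to be in the 2.6.17 log (the 2.6.8 excerpt prints it in hex). The two opcodes in the log are standard SCSI block commands: 0x28 is READ(10) and 0x2a is WRITE(10).

```c
#include <stdio.h>

/* Fields of one "megaraid: aborting-..." kernel log line. */
struct abort_msg {
    unsigned long serial;  /* command serial number (ASSUMED decimal here) */
    unsigned int opcode;   /* first CDB byte (cmd->cmnd[0]), printed in hex */
    int channel;           /* c= : channel as seen by the kernel */
    int target;            /* t= : SCSI target ID */
    int lun;               /* l= : logical unit number */
};

/* Parse one abort line; returns 1 on success, 0 if the line does not match. */
int parse_abort_line(const char *line, struct abort_msg *out)
{
    int n = sscanf(line, "megaraid: aborting-%lu cmd=%x <c=%d t=%d l=%d>",
                   &out->serial, &out->opcode,
                   &out->channel, &out->target, &out->lun);
    return n == 5;
}

/* Name the two opcodes seen in the log (standard SCSI block commands). */
const char *opcode_name(unsigned int opcode)
{
    switch (opcode) {
    case 0x28: return "READ(10)";
    case 0x2a: return "WRITE(10)";
    default:   return "other";
    }
}
```

For the first line of the log this gives serial 7399451, opcode 0x2a (a WRITE(10)), channel 2, target 0 and LUN 0, so the aborted commands are ordinary reads and writes to target 0 on kernel channel 2.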
There can be logical and physical channels on the megaraid:
----------------------------------------------------------------------
[...]
    /*
     * The theory: If physical drive is chosen for boot, all the physical
     * devices are exported before the logical drives, otherwise physical
     * devices are pushed after logical drives, in which case - Kernel
     * sees the physical devices on virtual channel which is obviously
     * converted to actual channel on the HBA.
     */
    if( adapter->boot_pdrv_enabled )
    {
        if( islogical )
        {
            /* logical channel */
            channel = cmd->device->channel
                - adapter->product_info.nchannels;
        }
        else
        {
            /* this is physical channel */
            channel = cmd->device->channel;
            target = cmd->device->id;
[...]
----------------------------------------------------------------------
Unfortunately there is no documentation included in the kernel. My
interpretation is:
"nchannels" should be the number of channels. If the machine does not boot
from a physical drive and your controller has 2 channels, the logical
drives for the primary bus should show up on channel 0 and the
corresponding physical drives on channel 2 (likewise channels 1 and 3 for
the secondary bus).
If we assume that this is correct, the disk with ID 0 on the primary bus
(channel 0/2) of the Elite 1600 has failed (the Express 500 has only
one bus, so it should only have channels 0 and 1).
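The interpretation above can be written down as a small sketch in C. This is only an illustration of that reading, not the driver's actual logic: `struct chan_info`, `interpret_channel` and `channel_exists` are hypothetical names, and the assumption that the kernel sees exactly two blocks of `nchannels` channels each comes from the interpretation, not from the kernel source.

```c
#include <stdbool.h>

/* How one kernel-visible channel number is interpreted. */
struct chan_info {
    bool is_logical;   /* true: logical (RAID) drives, false: physical disks */
    int hba_channel;   /* bus number on the adapter, counted from 0 */
};

/* ASSUMPTION: the kernel sees two blocks of nchannels channels each. */
bool channel_exists(int kernel_channel, int nchannels)
{
    return kernel_channel >= 0 && kernel_channel < 2 * nchannels;
}

/*
 * nchannels: number of SCSI buses on the adapter (1 for the Express 500,
 *            2 for the Elite 1600).
 * boot_from_physical: mirrors adapter->boot_pdrv_enabled in the excerpt.
 *
 * ASSUMPTION: the ordering below follows the interpretation in this post,
 * not the actual driver source.
 */
struct chan_info interpret_channel(int kernel_channel, int nchannels,
                                   bool boot_from_physical)
{
    struct chan_info info;
    bool first_block = kernel_channel < nchannels;

    /* Physical devices come first only when booting from a physical drive. */
    info.is_logical = boot_from_physical ? !first_block : first_block;
    /* Channels in the second block are offset by nchannels. */
    info.hba_channel = first_block ? kernel_channel
                                   : kernel_channel - nchannels;
    return info;
}
```

Under these assumptions, kernel channel 2 on a two-channel adapter that does not boot from a physical drive is a physical channel on bus 0, while on a single-channel adapter channel 2 does not exist at all. That is why "c=2" points at the Elite 1600 rather than the Express 500.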
Micha

Charles C. Guest
Posted: Fri Aug 18, 2006 2:26 pm
Post subject: Re: Megaraid crash on linux
Hi,
Thanks for the help :-)
For anyone reading the above and having similar problems:
The problem appears to have been caused by uneven cable lengths
between the enclosures, disks, etc. The configuration was:
Adapter --> 1m external cable --> RAID tower (0.5m cable used to
interconnect the two assemblies of the tower) --> 1m external cable to PC
case --> internal twisted-pair cable, 0.5m lead + 11 connectors or so +
terminator.
8 disks were in the RAID tower and 5 disks in the external PC case (after the
tower). The 5 disks were at the start of the cable, followed by
unused connectors, followed by the terminator.
12 disks were used in a single RAID 5 array, and 1 disk (the very first in the
sequence) was used as a hot spare.
The above crashed after 4 hours of heavy use.
========
One possible solution (it has been holding for over 30 hours of heavy use):
External PC case --> 1m external cable --> (internal connector of)
adapter (external connector) --> 1m cable --> RAID tower.
Both ends are terminated, and the MegaRAID (493 model) insists on being
terminated too, otherwise it hangs (???). Perhaps it is autosensing (no info
in the docs). The tower is still bridged with a 0.5m cable.
The external PC case has 6 disks (1 more than before), all placed at
the end of the cable (nearest the terminator). It is now running with a
total of 13 (70GB) disks in the array and 1 disk as hot spare.
=========
One point to note: the documentation says that 3-8 disks may be used in a
RAID 5 array, but this array is running with 13.
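For reference, the arithmetic behind such an array can be sketched in a few lines of C. The function names are made up for this note. The capacity rule is the general RAID 5 one (one disk's worth of space goes to parity), and the 70GB figure is the nominal disk size mentioned above, so the result is nominal too.

```c
/* Usable capacity of a RAID 5 array: one disk's worth of space holds parity. */
unsigned long raid5_usable_gb(unsigned int disks, unsigned long disk_gb)
{
    if (disks < 3)
        return 0;  /* RAID 5 needs at least three disks */
    return (unsigned long)(disks - 1) * disk_gb;
}

/* Check a disk count against a documented range such as the 3-8 quoted above. */
int within_documented_range(unsigned int disks, unsigned int min,
                            unsigned int max)
{
    return disks >= min && disks <= max;
}
```

With 13 disks of 70GB, `raid5_usable_gb(13, 70)` gives a nominal 840GB, and `within_documented_range(13, 3, 8)` is false, which is the mismatch with the documentation noted above.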
Regards
Charles
--
Please remove _removeme_ to reply.

Folkert Rienstra Guest
Posted: Fri Aug 18, 2006 7:45 pm
Post subject: Re: Megaraid crash on linux
"Charles C." <c.k.christacopoulos.removeme.@dundee.ac.uk> wrote in message news:44e59f06$0$1382$da0feed9@news.zen.co.uk
| Quote: | Charles C. wrote:
[...]
One possible solution. (has been holding for over 30 hours of heavy use).
External PC case --> 1m external cable --> (internal connector of)
adapter (external connector) --> 1m cable --> Raid Tower.
Both ends are terminated, and the megaraid (493 model) insists on being
terminated too
|
You know what is wrong with that, don't you?
| Quote: | else it hangs (???),
|
Sounds like no terminator power to the other terminators.
| Quote: | perhaps it is autosensing (no info on docs). The tower is still bridged with a 0.5m cable.
External PC case has 6 disks (1 more than before) all disks are set at
the end of the cable (nearest the terminator). It is now running with a
total of 13 (70GB) disks in the array and 1 disk as hot spare.
=========
One point to note, the documentation says ... 3-8 disks may be used in a
raid 5 array.
Regards
Charles |
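The termination remarks above presumably refer to the general SCSI rule that a bus is terminated at its two physical ends and nowhere in between. A minimal sketch of that check in C follows; `struct bus_node` and `termination_ok` are hypothetical names, and the chain is simply the list of positions along the cable in physical order.

```c
#include <stdbool.h>
#include <stddef.h>

/* One position along the SCSI cable, listed in physical order. */
struct bus_node {
    const char *name;
    bool terminated;   /* true if termination is enabled at this position */
};

/*
 * A SCSI bus is expected to be terminated at its two physical ends and
 * nowhere in between. Returns true if the chain satisfies that rule.
 */
bool termination_ok(const struct bus_node *chain, size_t count)
{
    if (count < 2)
        return false;  /* a bus needs two distinct ends */

    for (size_t i = 0; i < count; i++) {
        bool is_end = (i == 0 || i == count - 1);
        if (chain[i].terminated != is_end)
            return false;  /* missing end terminator or extra mid-bus one */
    }
    return true;
}
```

In the second configuration the adapter sits in the middle of the bus, with the PC case on one side and the RAID tower on the other. A chain with terminators at both ends and termination also enabled on the adapter fails this check, while the same chain with the adapter unterminated passes.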