HTFC Forums

#1
04-19-2007, 01:42 AM
mrceolla@gmail.com
Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

Hello,

I am a newbie to RAID recovery. I did a lot of research before my
first attempt, but it failed so I am hoping for some further advice.
I know this is long, but please bear with me.

I have a 3 drive RAID 5 with Fujitsu MAN3367MP drives as my server's
system drive. When it became degraded the first time, I booted into
the Adaptec 2110S RAID controller's bios and rebuilt the array no
problem. The status returned to optimal for maybe a week.

Then I decided to try to replace the failing drive with another
identical drive that I had lying around which used to be with the other
3 drives in a RAID 0 in a previous machine.

According to the Adaptec Storage Manager application GUI (not the
bios), the failed drive was the drive closest to the controller, and
was labeled as "ID 0". The other two drives were ID 4 and ID 8. This
all confused me as I have a 4 drive cable with terminator, and the
first connector from the controller was empty, followed by the 3
drives and finally the terminator. I am not assigning any SCSI IDs on
the drive via the jumpers on the back, so I had assumed they were
assigned based on placement in the chain. But this can't be right
considering the IDs that I was seeing. Either way I figured the
Storage Manager GUI was telling me it was the first drive from the
controller.

To my surprise...THIS WAS NOT THE CASE!

I replaced the first drive from the controller and then booted into
the controller's bios again. My RAID 5 was dead, and my old RAID 0
showed up...and it showed up as optimal believe it or not. That
couldn't be possible. Regardless, I was very confused.

I decided to try to format the drive that showed in the controller's
bios as 0,0,0,0 thinking that this was the drive that I just replaced
and it must have some old RAID 0 information on it. The only thing is
that when I began the format, the light on a different drive went
solid...indicating formatting of that drive. The drive that began
formatting was the drive at the end of the cable right before the
terminator.

I still believed that this was the failing drive that I was
formatting, and that I should be able to just replace the other good
RAID 5 drive that I had just removed and I would be back to degraded
status, but with a working RAID 5.

Again...THIS WAS NOT THE CASE!

So I re-added the drive that I had mistakenly removed, booted into the
controller's bios, and the RAID5 was still dead indicating missing
components.

Please answer this...is it a given that if you remove a good drive
from a degraded RAID 5 and boot up the machine, that this RAID 5 will
be permanently broken?

I never tried to boot further than the controller's bios when swapping
drives, so I figured any disc information would be unaltered and I
could simply put in the original drives and they would continue to
operate as degraded. I am confused as to how I permanently broke this
RAID.

More importantly is that I am facing this problem again.

I have confirmed this time through the lights on the drives themselves
that the drive at the end of the chain which was formatted earlier is
failing again. Storage Manager shows the failed drive as ID
0...again. The light does not blink on this drive but continues to blink on
the other two drives.

I want to try to replace THIS drive with the drive I tried to use as a
replacement before. I have some concerns though. I expect that the
old RAID 0 will show up again and I do not know what SCSI ID this
drive will have. I believe it had an ID of 2 when I last hooked it
up. I am worried this drive's RAID 0 info will somehow trump the RAID
5 info and take away one of the 2 remaining good RAID 5 discs. Is
this possible? Is there a way to prevent this?

What if I disconnect all drives, connect the drive to be added and
then run a format. Will this remove the RAID 0 information from that
drive, and will my RAID 5 show up again when reconnecting the 2 good
drives?

I understand that if all goes well and I can still see the RAID5 as
degraded and the new drive shows up, that all I should need to do is
mark the drive as a hot spare somehow and then rebuild the RAID
5...right? Will this still work if the ID # is not 0 like the
original drive?

Any advice or suggestions will be greatly appreciated. Fortunately,
my previous experience forced me to test my backups for the first
time. All went amazingly well, but it sure takes a long time and I'd
love to avoid that happening again. After all, this is what RAID 5 is
for isn't it...easy recovery from a failed drive.

Thanks in advance,
Mike

#2
04-19-2007, 09:50 AM
Michael Baeuerle
Re: Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

mrceolla@gmail.com wrote:
>
> [I have a 3 drive RAID 5 with Fujitsu MAN3367MP drives]
>
> Then I decided to try to replace the failing drive with another
> identical drive that I had lying around which used to be with the other
> 3 drives in a RAID 0 in a previous machine.
>
> According to the Adaptec Storage Manager application GUI (not the
> bios), the failed drive was the drive closest to the controller, and
> was labeled as "ID 0". The other two drives were ID 4 and ID 8. This
> all confused me as I have a 4 drive cable with terminator, and the
> first connector from the controller was empty, followed by the 3
> drives and finally the terminator. I am not assigning any SCSI IDs on
> the drive via the jumpers on the back, so I had assumed they were
> assigned based on placement in the chain.


No. Only with 80-pin SCA drives can the ID be tied to a physical slot by
the backplane. For cabled drives like yours, the position of a drive on
the cable is independent of its ID, even if the ID is assigned by the
host via the SCAM protocol rather than with jumpers.

> But this can't be right
> considering the IDs that I was seeing. Either way I figured the
> Storage Manager GUI was telling me it was the first drive from the
> controller.
>
> To my surprise...THIS WAS NOT THE CASE!


It's not surprising.

> I replaced the first drive from the controller and then booted into
> the controller's bios again. My RAID 5 was dead, and my old RAID 0
> showed up...and it showed up as optimal believe it or not. That
> couldn't be possible. Regardless, I was very confused.


It is very dangerous to replace working drives of an array without
telling the controller first. The controller may draw wrong conclusions
about the array configuration if the new disk also contains a valid
signature (because it was used with this controller before). With
hotplug, in the worst case the controller doesn't even notice the disk
change and uses the wrong data without rebuilding.

> I decided to try to format the drive that showed in the controller's
> bios as 0,0,0,0 thinking that this was the drive that I just replaced
> and it must have some old RAID 0 information on it. The only thing is
> that when I began the format, the light on a different drive went
> solid...indicating formatting of that drive. The drive that began
> formatting was the drive at the end of the cable right before the
> terminator.


If you don't know what you are doing and which drive has which ID, it
is better to use fixed IDs configured by jumpers.
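A minimal sketch of how fixed jumper IDs map to numbers, assuming the drive has four ID jumper positions that are binary-weighted (1, 2, 4, 8), as on many wide (16-ID) SCSI drives. The weights, their order, and the function names are assumptions for illustration only; the Fujitsu drive manual is the authority on which pins to use.

```python
from __future__ import annotations

# Assumption: four ID jumper positions, binary-weighted 1, 2, 4 and 8,
# as on many wide (16-ID) SCSI drives. Check the drive manual for the
# actual pin labels and locations before moving any jumper.
JUMPER_WEIGHTS = (1, 2, 4, 8)


def scsi_id_from_jumpers(installed: tuple[bool, ...]) -> int:
    """Return the SCSI ID encoded by the installed ID jumpers."""
    if len(installed) != len(JUMPER_WEIGHTS):
        raise ValueError("expected one flag per ID jumper position")
    return sum(w for w, on in zip(JUMPER_WEIGHTS, installed) if on)


def jumpers_for_scsi_id(scsi_id: int) -> tuple[bool, ...]:
    """Return which ID jumpers to install for a wide SCSI ID (0-15)."""
    if not 0 <= scsi_id <= 15:
        raise ValueError("wide SCSI IDs range from 0 to 15")
    return tuple(bool(scsi_id & w) for w in JUMPER_WEIGHTS)


if __name__ == "__main__":
    # The three array members Storage Manager reported in this thread.
    for member_id in (0, 4, 8):
        print(member_id, jumpers_for_scsi_id(member_id))
```

Under this assumed encoding, IDs 4 and 8 each need a single jumper and ID 0 needs none, so a drive with no ID jumpers installed sits at ID 0.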

> I still believed that this was the failing drive that I was
> formatting, and that I should be able to just replace the other good
> RAID 5 drive that I had just removed and I would be back to degraded
> status, but with a working RAID 5.
>
> Again...THIS WAS NOT THE CASE!


You are playing dangerous games ... keep your backup in reach.

> So I re-added the drive that I had mistakenly removed, booted into the
> controller's bios, and the RAID5 was still dead indicating missing
> components.
>
> Please answer this...is it a given that if you remove a good drive
> from a degraded RAID 5 and boot up the machine, that this RAID 5 will
> be permanently broken?


Theoretically there is still enough data available to recover the array.
But if the controller thinks the array is dead, it may be difficult to
convince it otherwise ...
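Why the data can theoretically survive is easiest to see in a minimal sketch of RAID 5 parity, assuming a 3-drive array where each stripe holds two data blocks and one parity block equal to their XOR. Real controllers rotate parity across the members and keep their own on-disk metadata, which this ignores; the function names are illustrative. With any two blocks of a stripe unchanged, the third is their XOR; lose two and nothing is left to rebuild from. Whether the controller will actually attempt this depends on its metadata, not on the arithmetic.

```python
from __future__ import annotations


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must be the same length")
    out = bytearray(size)
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def make_stripe(d0: bytes, d1: bytes) -> list[bytes | None]:
    """Blocks written to the three members for one stripe: data, data, parity."""
    return [d0, d1, xor_blocks(d0, d1)]


def rebuild_missing(stripe: list[bytes | None]) -> bytes:
    """Recover the one missing block (None) by XOR-ing the surviving blocks."""
    missing = [i for i, b in enumerate(stripe) if b is None]
    if len(missing) != 1:
        raise ValueError("RAID 5 can only rebuild exactly one missing member")
    return xor_blocks(*(b for b in stripe if b is not None))


if __name__ == "__main__":
    stripe = make_stripe(b"DATA", b"MORE")
    lost = stripe[0]
    stripe[0] = None  # member 0 failed: the array is degraded
    assert rebuild_missing(stripe) == lost
    stripe[1] = None  # a second member pulled: nothing left to XOR against
    try:
        rebuild_missing(stripe)
    except ValueError as err:
        print("unrecoverable:", err)
```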

> [...]
> I want to try to replace THIS drive with the drive I tried to use as a
> replacement before. I have some concerns though. I expect that the
> old RAID 0 will show up again and I do not know what SCSI ID this
> drive will have. I believe it had an ID of 2 when I last hooked it
> up. I am worried this drive's RAID 0 info will somehow trump the RAID
> 5 info and take away one of the 2 remaining good RAID 5 discs. Is
> this possible? Is there a way to prevent this?


I consider the current configuration dead, so you are free to do a new
setup and restore the backup. Delete all arrays in the controller's
setup so that the controller does not search for them next time. Then
use 3 good disks and configure the SCSI IDs with the jumpers so that you
know them. Finally, create a new RAID5 array.

> What if I disconnect all drives, connect the drive to be added and
> then run a format. Will this remove the RAID 0 information from that
> drive, and will my RAID 5 show up again when reconnecting the 2 good
> drives?


Maybe Adaptec can tell you ...

> [...]
> Fortunately,
> my previous experience forced me to test my backups for the first
> time. All went amazingly well, but it sure takes a long time and I'd
> love to avoid that happening again. After all, this is what RAID 5 is
> for isn't it...easy recovery from a failed drive.


Yes, but not "easy recovery from a configuration that was played to
death". Now you have to recover from your own mistake, not from the
drive's failure.


Micha
#3
04-19-2007, 03:56 PM
mrceolla@gmail.com
Re: Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

Micha,

Thank you for taking the time to reply to my very long message.

Regarding your statement, "I consider the current configuration as
dead..."

Currently, the RAID 5 is operational and in degraded status. I did
completely botch the drive replacement once and had to rebuild the
raid and restore from backup as described in my previous message. But
right now I am hoping to retry the drive replacement on the correct
drive this time. So do you still consider the current config as dead
with the RAID still operational?

I would call Adaptec, but my support period for this product is over
and they won't even accept email support at this point.

Thanks again,
Mike

#4
04-19-2007, 04:42 PM
Ray
Re: Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

> I am a newbie to RAID recovery. I did a lot of research before my
> first attempt, but it failed so I am hoping for some further advice.


Your understanding of how your drives have their IDs assigned is flawed.
You need to read the manual for your drive, paying attention to the section
on assigning SCSI IDs and the locations (there are two locations for that
drive) where you may do so.

Your understanding of your RAID software/hardware is incomplete/flawed. The
RAID GUI presents a logical (not a physical) layout. It identifies the drive
IDs; it does not (and cannot) identify the drive position on the cable. You
need to read the manual for your RAID controller. It will answer your
questions on how to replace a failed member.

If you need help interpreting something the manuals say, I'm sure people
here can help you further. But your issues lead me to infer that you have
not read these manuals.


#5
04-19-2007, 05:10 PM
Michael Baeuerle
Re: Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

mrceolla@gmail.com wrote:
>
> Micha,
>
> Thank you for taking the time to reply to my very long message.
>
> Regarding your statement, "I consider the current configuration as
> dead..."
>
> Currently, the RAID 5 is operational and in degraded status. I did
> completely botch the drive replacement once and had to rebuild the
> raid and restore from backup as described in my previous message. But
> right now I am hoping to retry the drive replacement on the correct
> drive this time. So do you still consider the current config as dead
> with the RAID still operational?


No, my comment was about the original configuration. It seems I didn't
read your message correctly.
As I wrote, I recommend setting the IDs with jumpers. You can also try
to do this with the current configuration. The Adaptec firmware will
probably accept the changed IDs and detect the disks of your RAID by
their signatures.

Then put the array into degraded state manually (by removing one drive
from the configuration with the setup tool). If you haven't set the
IDs with jumpers, access the RAID now and the LEDs on the disks will
tell you which disk was disabled. Now you know which drive you have to
touch, and you can replace it physically. The RAID should stay in
degraded state, and you can rebuild it now.

Just try it; all you can lose is the time for restoring the backup again.
And at the end you will hopefully know how your disks are configured and
how the controller works.

> I would call Adaptec, but my support period for this product is over
> and they won't even accept email support at this point.


But a manual for the controller should be available on Adaptec's web
server. It should describe how to correctly disable a drive before
physically removing it.


Micha
#6
04-23-2007, 11:19 PM
mrceolla@gmail.com
Re: Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

I posted the following a few days ago, but I guess it never made it:

Thanks again guys for the info.

Ray, the only manual I haven't read is the HD manual, and thank you
for pointing out the 2nd location for setting SCSI IDs. I was led
astray by another poster a while back who seemed to say that the
position on cable and ID were not important. But I wasn't asking him
about replacing a failed drive. I see now how my IDs are set (they
are set manually via the jumpers under the drive) and I will be able
to properly set my spare drive before inserting it.

Also, thank you Ray for explaining why the GUI shows what it does. My
understanding IS incomplete, leading to flawed assumptions that
clearly got me in trouble once. But I'm certainly learning more than
I would've had I not made that mistake the first time. Even still,
thanks for the scolding.

I have read the controller manual and scoured Adaptec's site for the
correct procedure. One of my many confusions was the two slightly
different procedures I found on Adaptec's site. Both are for my
controller.

----
How to replace a failed array drive on the 2100s, 3200s, 3400s RAID
Controller. (my 2110s is listed as "applies to" on this page, but not
in this title)

The steps for replacing a failed RAID-1 or RAID-5 drive are as
follows:

Using SMOR (Storage Manager On ROM) Configuration Utility to replace
the failed hard disk drive.

1. Remove and replace the failed hard disk drive according to the
procedure in the computer hardware documentation. The drive must be
set to the same SCSI ID as the original failed drive. The drive must
be the same size or larger. If the drive is not, then it will not
allow the RAID array to be rebuilt.

2. Boot the system selecting the Ctrl+A to enter the SMOR
Configuration Utility option.

3. On the left side highlight the Array icon.

4. Hold down the Alt key and press the 'R' key to see vertical RAID
menu.

5. Select the Rebuild option to start the process to reconstruct the
replaced drive from the remaining members (drives) of the array.
-----

And the other one...

----
Replacing failed drive - Array includes a missing member and cannot
rebuild

Ensure that the new drive inserted in the system has a unique SCSI ID
and thus does not conflict with any of the present drives or the RAID
controller (SCSI ID 7). Even if the new drive does not appear as a
member of the array, ensure that it is detected as a single
independent drive.

In order to rebuild the array, the newly added drive must be assigned
as a hot spare.

In the controller Bios (SMOR), highlight the RAID controller listing
beneath Configuration / Local and press Enter. Locate the new hard
drive listed beneath the controller and choose "Action" - "Make
Hotspare". Select 'file' then 'set system config' to initiate a
rebuild to the newly added hot spare. Exit the utility and reboot the
system. The new drive should take the place of the Missing Component
and the array should rebuild.
----

Other instructions for the Storage Manager software is included in
these articles also. But since my RAID 5 is a system drive, I wasn't
sure if I could do any disk maintenance like this while booted into
the OS. Please correct me if that is wrong. To play it safe I just
used the SMOR.

The first article seems to say the SCSI ID of the replaced drive must
be the same, and the second mentions nothing of that...just that it
needs to be a unique ID which I think everyone knows. Fortunately,
this shouldn't matter anymore now that I understand how to set the IDs
on my drives.
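The two Adaptec articles can be read as one checklist, sketched below. The rules come from the quoted text (the replacement must be the same size or larger; its ID must not conflict with another drive or with the controller at ID 7; the first, in-place procedure also wants the failed drive's ID), plus the assumption of a wide bus with IDs 0-15. `Drive` and `check_replacement` are hypothetical names, and the capacities in the tests are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

CONTROLLER_ID = 7      # the second article reserves SCSI ID 7 for the controller
WIDE_IDS = range(16)   # assumption: wide SCSI bus, IDs 0-15


@dataclass(frozen=True)
class Drive:
    scsi_id: int
    capacity_mb: int


def check_replacement(new: Drive, failed: Drive, present_ids: set[int],
                      in_place: bool) -> list[str]:
    """Return the problems with a replacement drive (empty list if none).

    present_ids: IDs of the drives still on the bus.
    in_place=True:  first article, the new drive reuses the failed drive's ID.
    in_place=False: second article, any unused ID, then "Make Hotspare".
    """
    problems = []
    if new.scsi_id not in WIDE_IDS:
        problems.append("ID out of range 0-15")
    if new.scsi_id == CONTROLLER_ID:
        problems.append("ID 7 is used by the controller")
    if new.scsi_id in present_ids:
        problems.append("ID conflicts with a drive still on the bus")
    if new.capacity_mb < failed.capacity_mb:
        problems.append("replacement is smaller than the failed member")
    if in_place and new.scsi_id != failed.scsi_id:
        problems.append("in-place rebuild needs the failed drive's ID")
    return problems
```

With the failed ID 0 drive pulled and IDs 4 and 8 still present, a same-size drive jumpered to ID 0 passes the in-place check, and one at ID 2 passes the hot-spare check, which matches the reading that the two articles differ only in approach.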

I do not see anything in either article regarding disabling a drive
before removing it. They simply say replace the drive according to
the computer hardware documentation. Do they mean the hard drive
documentation? Micha, are you saying I need to tell the controller
that I am going to remove a drive before I do it?

I am still worried about that old RAID 0 information on this drive
that I will be using as a replacement. Is it possible that I could
lose my RAID 5 for good if the controller detects the old RAID 0? I
suppose anything is possible, but to your knowledge, is it likely? If
so, is there any good way to prevent it?

One last question in anticipation of the replacement drive failing as
well. I am currently in the process of attempting to rebuild the
array with the currently failed drive still in the array. Last time I
did this, it worked, but only for a few days. Then after I killed the
array, formatted the failed drive, and rebuilt the system, it lasted
for a few weeks. My question...I assume this drive is bad and will be
replacing it, but are there any other factors that could cause this
drive to keep failing if the drive is in fact good? I wonder this
because the rebuilds with the failed drive succeed and the drive
status returns to optimal. Is this just common behavior?

Thanks again very much to the both of you. I'm getting there.

Mike

#7
04-24-2007, 12:40 AM
Ray
Re: Please Help...RAID5 degraded again, I ******* up drive replacement the first time.

> I was led astray by another poster a while back who seemed
> to say that the position on cable and ID were not important.


In one sense, this is a true statement. The order of the drives on the cable
is not material in any way meaningful to you. The SCSI ID is not material in
any way meaningful to you as long as there are no duplications. But the SCSI
ID is how the software/hardware knows which drive is which. If you were to
change the physical order of drives on the cable, your software would not
know that anything had happened.

> I have read the controller manual and scoured Adaptec's site for the
> correct procedure. One of my many confusions was the two slightly
> different procedures I found on Adaptec's site. Both are for my
> controller.


There is no conflict between the two procedures. One procedure uses the hot
spare approach, the other uses an "in place" approach. Both get the job
done. In the hot spare approach, the raid software substitutes the hot spare
for the failed member and rebuilds the array. (My array is IDs 2, 3 and 4.
4 has failed, but 1 is available as a hot spare. So now I'll change the
array to be IDs 1, 2 and 3, and rebuild it.) In the in-place approach, the
raid software is told to rebuild the array around the drive that
"mysteriously" is now missing its data. (My array is IDs 2, 3 and 4. 4 has
failed, but now that I look again, 4 is good but has no data. So I'll
rebuild the array using the "same" drives as before.)
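Ray's two examples can be written as a hypothetical sketch of array membership tracked by SCSI ID. It assumes the rebuilt drive takes over the failed member's position in the array, which is why the hot-spare result prints as [2, 3, 1]: the same set as Ray's "1, 2 and 3". The function names are illustrative, not Adaptec terms.

```python
from __future__ import annotations


def hot_spare_rebuild(members: list[int], failed: int, spare: int) -> list[int]:
    """Hot-spare approach: the spare's ID takes the failed member's position."""
    if failed not in members:
        raise ValueError("failed ID is not an array member")
    if spare in members:
        raise ValueError("hot spare is already an array member")
    return [spare if m == failed else m for m in members]


def in_place_rebuild(members: list[int], failed: int) -> list[int]:
    """In-place approach: a blank drive at the same ID is rebuilt, IDs unchanged."""
    if failed not in members:
        raise ValueError("failed ID is not an array member")
    return list(members)


if __name__ == "__main__":
    print(hot_spare_rebuild([2, 3, 4], failed=4, spare=1))  # [2, 3, 1]
    print(in_place_rebuild([2, 3, 4], failed=4))            # [2, 3, 4]
```

Either way the controller reconstructs the same missing blocks; the approaches differ only in which ID ends up holding them.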

> I do not see anything in either article regarding disabling a drive
> before removing it. They simply say replace the drive according to
> the computer hardware documentation. Do they mean the hard drive
> documentation? Micha, are you saying I need to tell the controller
> that I am going to remove a drive before I do it?


They just mean that you follow the appropriate hardware maintenance
procedure to replace the drive. For example, you could have hot pluggable
drives, in which case you'd simply yank the bad one out and insert a good
one in its place. Or in other cases you'd need to power down the entire
system before you swapped a drive. The correct procedure is dependent on
your particular hardware.

> I am still worried about that old RAID 0 information on this drive
> that I will be using as a replacement. Is it possible that I could
> lose my RAID 5 for good if the controller detects the old RAID 0? I
> suppose anything is possible, but to your knowledge, is it likely? If
> so, is there any good way to prevent it?


If I were that paranoid, I'd install that drive by itself (leave the other
drives powered off), boot into the SMOR, and wipe out any existing array
information on the drive. You could define it as a hot spare at this point
if you wanted to take that approach, or just make sure it's seen as an
individual drive for the other approach.

> One last question in anticipation of the replacement drive failing as
> well. I am currently in the process of attempting to rebuild the
> array with the currently failed drive still in the array. Last time I
> did this, it worked, but only for a few days. Then after I killed the
> array, formatted the failed drive, and rebuilt the system, it lasted
> for a few weeks. My question...I assume this drive is bad and will be
> replacing it, but are there any other factors that could cause this
> drive to keep failing if the drive is in fact good? I wonder this
> because the rebuilds with the failed drive succeed and the drive
> status returns to optimal. Is this just common behavior?


You are tempting fate. That drive has already failed twice. Rebuild the
array with your other drive. Flaky hardware is flaky hardware. If you want
to mess with that drive further, feel free, but I wouldn't use it anywhere
where I had to depend on it. If you have all four drives installed, you
could tell the controller to make the flaky one an individual disk and mess
with it until you're satisfied one way or the other. Or if you have a
non-critical system, install it and play with it there.


All times are GMT.