HTFC Forums

H.T.F.C.

How To Fix Computers





  #1  
Old 10-27-2007, 02:48 PM
Yan Seiner
 
Posts: n/a
Default Brand new machine mystery lockup

I just built a server that seems to be possessed, or at least flaky.

It's built on an Asus M2N-SLI DELUXE mobo, with an AMD Athlon 64 X2 4600+
CPU, 2 gig RAM, and an Adaptec ASC-29320ALP SCSI adapter. The SCSI
adapter has 2 Fujitsu 36 GB 15K drives in a software RAID-1. The power
supply is a SILVERSTONE ST50EF-SC ATX12V / EPS12V 500W.

Once in a while (like every 2-5 days) the machine locks up:

Screen goes black, all fans go to full-on, and neither the power nor the
reset button will work. It takes a flip of the power switch on the PS to
restart it.

Normally I would say that it's the PS, but sometimes - only sometimes,
though - the system won't boot because mdadm can't find any of the md
devices to boot. At this point the kernel's already booted off the SCSI
drives, so I know they're spinning; just mdadm can't find them. This
typically happens on a soft-reboot; again, I have to fully power cycle
the machine to get it to boot.

Of course there are no errors anywhere at any time in any log. The
machine just stops.

Google says people have had trouble with that SCSI adapter under Windows,
but that seems to be a driver problem and it's reported to work fine with
Linux.

So, I have 3 possible culprits:

Power Supply
Mobo
SCSI adapter

Any place I can look? Any diagnostics I can do? I have about 2 weeks
left of Newegg's 30-day return timeframe, so I can do some testing....

  #2  
Old 10-27-2007, 03:17 PM
Robert M. Riches Jr.
 
Posts: n/a
Default Re: Brand new machine mystery lockup

On 2007-10-27, Yan Seiner <yan@NsOeSiPnAeMr.com> wrote:
> <snip>


Running memtest86 for several hours may be useful.

HTH

--
Robert Riches
spamtrap42@verizon.net
(Yes, that is one of my email addresses.)
  #3  
Old 10-27-2007, 09:00 PM
Tarkin
 
Posts: n/a
Default Re: Brand new machine mystery lockup

On Oct 27, 9:48 am, Yan Seiner <y...@NsOeSiPnAeMr.com> wrote:
> <snip>


Have you updated the mobo's firmware? At least a couple of years
ago, (some) mobos shipped w/ outdated BIOS - it was up to the
end-user to get updates from the OEM.

Other things to do:
-Check dumb things. Completely disassemble and subsequently
reassemble the entire system, looking for HW 'bugs' along the
way; is the CPU heatsink tight? Is there enough thermal compound
on the CPU-heatsink interface? Are boards and memory modules
inserted firmly? Are cable connectors inserted firmly? The
principle here is to rule out the obvious, dumb things that bite
people who don't check for them.

-Did you calculate total system power load? Is your power supply rated
high enough for peak load? Do you have another, higher-power, compatible
unit to swap it with?

-Read the manual on the BIOS settings, or at least go through all the
items in the menu. Do they make sense? Did you tweak any voltage,
speed, or memory access settings? If you have the inclination, return
them all to 'default' or 'normal' settings, and apply each tweak
one by one. Any memory tweaks should be followed with a decent round
of memtest86.

(Actually, BIOS update step should be here, then repeat the step
above)

-Software: I am unfamiliar with mdadm (is that a Minix or *BSD
boot manager?), but if all of the above checks out okay, that's
the next place to look for bugs. Is the software 64-bit compatible?
Are there documentation notes/extra settings/etc for 64 bit systems?
Have you run some searches on appropriate user lists/web sites/docs?
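
The power-load question in the checklist above can be roughed out in a few lines of Python. This is only a sketch: the function name power_headroom, the 0.8 derating factor, and every wattage in the example are illustrative assumptions, not datasheet or measured values for the parts in this thread. It also looks only at total wattage, so it says nothing about per-rail limits or drive spin-up surges.

```python
# Rough power-budget check. All wattages below are illustrative placeholders,
# not measured or datasheet values for the hardware in this thread.

def power_headroom(psu_watts, loads, derate=0.8):
    """Return (total_load, usable_capacity, headroom) in watts.

    derate is the fraction of the PSU's label rating treated as usable;
    0.8 is a rule-of-thumb assumption, not a specification.
    """
    total = sum(loads.values())
    usable = psu_watts * derate
    return total, usable, usable - total


if __name__ == "__main__":
    # Hypothetical peak draws; replace with figures from each part's datasheet.
    loads = {
        "cpu": 90,
        "motherboard_and_ram": 50,
        "scsi_adapter": 15,
        "drives_2x_15k": 2 * 20,
        "fans_and_misc": 20,
    }
    total, usable, headroom = power_headroom(500, loads)
    print(f"peak load {total} W, usable {usable:.0f} W, headroom {headroom:.0f} W")
```

A negative headroom would point at the power supply; a comfortably positive one only makes it less likely, it does not rule it out.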

I hope you don't get the impression I'm talking down to you - I've
learned the hard way, several times, to check obvious, 'dumb'
things first. And there is a certain amount of 'magic' to
completely disassembling and reassembling a system. But the
steps I described, taken in order, are exactly what I do when
hunting subtle bugs.

Good hunting and HTH,
Tarkin

  #4  
Old 10-29-2007, 01:57 AM
Yan Seiner
 
Posts: n/a
Default Re: Brand new machine mystery lockup

On Sat, 27 Oct 2007 20:00:04 +0000, Tarkin wrote:

> On Oct 27, 9:48 am, Yan Seiner <y...@NsOeSiPnAeMr.com> wrote:
>> I just built a server that seems to be possessed, or at least flaky.

<snip>
>> Any place I can look? Any diagnostics I can do? I have about 2 weeks
>> left of Newegg's 30 day return timeframe, so I can do some testing....

>
> Have you updated the mobo's firmware? At least a couple of years ago,
> (some) mobos shipped w/ outdated BIOS - it was up to the end-user to
> get updates from the OEM.


Good idea, I think I'll do that anyway. Read on.

>
> Other things to do:
> -Check dumb things. Completely disassemble and subsequently
> reassemble the entire system, looking for HW 'bugs' along the way; is
> the CPU heatsink tight? Is there enough thermal compound on the
> CPU-heatsink interface? Are boards and memory modules inserted firmly?
> Are cable connectors inserted firmly? The principle here is to rule out
> the obvious, dumb things that bite people who don't check for them.
>
> -Did you calculate total system power load?


Yes.

> Is your power supply rated
> high enough for peak load?


Yes. It should provide power for all 8 drives the box can hold; at the
moment it only has 2.

> Do you have another, higher-power, compatible
> unit to swap it with?


No.

>
> -Read the manual on the BIOS settings, or at least go through all the
> items in the menu. Do they make sense? Did you tweak any voltage, speed,
> or memory access settings? If you have the inclination, return them all
> to 'default' or 'normal' settings, and apply each tweak one by one. Any
> memory tweaks should be followed with a decent round of memtest86.
>
> (Actually, BIOS update step should be here, then repeat the step above)
>
> -Software: I am unfamiliar with mdadm (is that a Minix or *BSD
> boot manager?),


It's Linux's software RAID manager.

> but if all of the above checks out okay, that's the next
> place to look for bugs. Is the software 64-bit compatible? Are there
> documentation notes/extra settings/etc for 64 bit systems? Have you run
> some searches on appropriate user lists/web sites/docs?


It's pretty bulletproof - I've not had any problems with mdadm in years
of using it.

>
> I hope you don't get the impression I'm talking down to you - I've learned
> the hard way, several times, to check obvious, 'dumb' things first. And
> there is a certain amount of 'magic' to completely disassembling and
> reassembling a system. But the steps I described, taken in order, are
> exactly what I do when hunting subtle bugs.


No, not offended. That's exactly the procedure I followed - and I discovered
that the culprit is most likely a bad SCSI cable. I have /tmp on a RAID-0
partition striped across 2 drives, and the SCSI drives would just
disappear, bringing the whole system down.

I reseated the cable and found the drives wouldn't boot at all. So I've
slowed the whole SCSI bus down to a crawl and I have my system back. New
cable on order.

Fingers crossed.
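
For anyone chasing a similar disappearing-drive problem, a periodic look at /proc/mdstat can flag an md array that has lost a member. What follows is a minimal sketch, not something used in this thread: the function name degraded_arrays is made up for illustration, and it assumes the usual Linux md layout in which each "mdN :" line is followed by a status field such as [UU], with an underscore marking a missing member. Striped (RAID-0) arrays carry no such field, so they are simply skipped.

```python
import re


def degraded_arrays(mdstat_text):
    """Return {array_name: status_string} for arrays with a missing member.

    Looks for the "[UU]"-style status field that follows each "mdN :" line;
    an underscore in that field marks a missing or failed member.
    """
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded[current] = status.group(1)
            current = None
    return degraded


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            print(degraded_arrays(f.read()) or "all arrays report every member up")
    except FileNotFoundError:
        print("/proc/mdstat not found (md driver not loaded, or not Linux)")
```

If every drive on the bus vanishes at once, as described above, nothing running from those drives will get a chance to report it, so a check like this is mainly useful for catching a single member dropping out.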
  #5  
Old 10-29-2007, 02:14 AM
thunder
 
Posts: n/a
Default Re: Brand new machine mystery lockup

On Mon, 29 Oct 2007 00:57:58 +0000, Yan Seiner wrote:


>> Have you updated the mobo's firmware? At least a couple of years ago,
>> (some) mobos shipped w/ outdated BIOS - it was up to the end-user to
>> get updates from the OEM.

>
> Good idea, I think I'll do that anyway. Read on.


You might want to wait until you clear up the flakiness. If the system
hangs while you are updating the BIOS . . .
  #6  
Old 10-29-2007, 02:36 AM
General Schvantzkopf
 
Posts: n/a
Default Re: Brand new machine mystery lockup

On Sat, 27 Oct 2007 13:48:48 +0000, Yan Seiner wrote:

> <snip>


I wrote a system stress test that you can run,

http://www.polybus.com/sys_basher_web/

Sys_basher puts all of the subsystems except graphics under maximum load.
It's multithreaded so it can keep all of your cores at maximum load. It
also does a good job of stressing memory and disk subsystems. The log
file records the temperatures after each test and it writes the log to
disk between tests so that you'll have a record if the system crashes.
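
The idea of writing the log to disk between tests, so a record survives a hard lockup, can be sketched in Python as below. This is not Sys_basher's code; the names log_step and run_tests and the stand-in workload are assumptions made for illustration. The point is the flush plus os.fsync after every line, so the last entry shows which step was running when the machine stopped.

```python
import os
import time


def log_step(path, message):
    """Append one timestamped line and force it to disk before returning."""
    with open(path, "a") as f:
        f.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {message}\n")
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to the device


def run_tests(path, tests):
    """Run each (name, callable) pair, logging before and after it."""
    for name, test in tests:
        log_step(path, f"start {name}")
        result = test()
        log_step(path, f"done {name}: {result}")


if __name__ == "__main__":
    # Stand-in workload; a real stress test would go here.
    run_tests("stress.log", [("cpu_spin", lambda: sum(i * i for i in range(10**6)))])
```

If the suspect hardware is the disk path itself, it may be worth pointing the log at a different device, since a write to drives that have just dropped off the bus will not land.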

  #7  
Old 10-29-2007, 11:34 AM
The Natural Philosopher
 
Posts: n/a
Default Re: Brand new machine mystery lockup

General Schvantzkopf wrote:
> On Sat, 27 Oct 2007 13:48:48 +0000, Yan Seiner wrote:
>
>> <snip>

>
> I wrote a system stress test that you can run,
>
> http://www.polybus.com/sys_basher_web/
>
> Sys_basher puts all of the subsystems except graphics under maximum load.
> It's multithreaded so it can keep all of your cores at maximum load. It
> also does a good job of stressing memory and disk subsystems. The log
> file records the temperatures after each test and it writes the log to
> disk between tests so that you'll have a record if the system crashes.
>


I would say that some piece of hardware is faulty or, just possibly, you
have a driver that has a bug. I once wrote code that exhibited this kind of
behaviour. We left a hardware analyser on it: if a timer interrupt
happened in one, and only one, byte of the BIOS code, it went into a
'deadly embrace'.


Try seeing if any updated drivers or firmware exist for the SCSI adapter.
  #8  
Old 10-29-2007, 05:36 PM
Tarkin
 
Posts: n/a
Default Re: Brand new machine mystery lockup

On Oct 28, 7:57 pm, Yan Seiner <y...@NsOeSiPnAeMr.com> wrote:
> On Sat, 27 Oct 2007 20:00:04 +0000, Tarkin wrote:
> > On Oct 27, 9:48 am, Yan Seiner <y...@NsOeSiPnAeMr.com> wrote:
> <snip>
>
> No, not offended. That's exactly the procedure I followed - and I discovered
> that the culprit is most likely a bad SCSI cable. I have /tmp on a RAID-0
> partition striped across 2 drives, and the SCSI drives would just
> disappear, bringing the whole system down.
>
> I reseated the cable and found the drives wouldn't boot at all. So I've
> slowed the whole SCSI bus down to a crawl and I have my system back. New
> cable on order.
>
> Fingers crossed.


Right on. On the one hand, I'm sorry you're having system problems;
on the other, I can appreciate the irony that in this new wunder era
of 64 bit megamachines, silly things like a bad cable can bring them
down like a ton of bricks ;^)

Good luck and TTFN,
Tarkin

  #9  
Old 10-29-2007, 06:17 PM
CptDondo
 
Posts: n/a
Default Re: Brand new machine mystery lockup

Tarkin wrote:
> On Oct 28, 7:57 pm, Yan Seiner <y...@NsOeSiPnAeMr.com> wrote:


>> I reseated the cable and found the drives wouldn't boot at all. So I've
>> slowed the whole SCSI bus down to a crawl and I have my system back. New
>> cable on order.
>>
>> Fingers crossed.

>
> Right on. On the one hand, I'm sorry you're having system problems;
> on the other, I can appreciate the irony that in this new wunder era
> of 64 bit megamachines, silly things like a bad cable can bring them
> down like a ton of bricks ;^)


Oh, my bad. The cable is one that I had lying around; I chose not to
buy a new one with the system. So I have no one to blame but myself for
being a scrooge and trying to save $50.

--Yan
  #10  
Old 10-29-2007, 06:37 PM
Cydrome Leader
 
Posts: n/a
Default Re: Brand new machine mystery lockup

In comp.periphs.scsi Yan Seiner <yan@nsoesipnaemr.com> wrote:
> <snip>


Newegg is pretty good about returns. Just send it back and try again.




All times are GMT.


© 2004 - 2007 Web-S-Sense Pty. Ltd. Usenet and forums posts © their respective authors.