This is not really OC related, but I assume you have experience with the
topic.
Here's my problem in short: I'm pretty sure I have a defective SO-DIMM
DDR2 RAM module, but memtest86+ ran a complete pass w/o any error. The
module came with the Notebook which still has warranty on it, but I'd
like to have some kind of proof that this was really the source of the
problem.
And here it is again, but with a bit of added history: A few days ago,
my Windows XP SP2 system became unstable. At that time, I was
using two 1 GiB modules of DDR2 RAM, which the MB automatically ran in
dual-channel configuration. The modules were not exactly the same: one
of them had come with the Notebook (bought 7 months ago), the other I
had added recently. The MB used the slower of the two timings.
The problems began: Firefox would crash (which is not unusual since I
use the latest nightly builds), twice I got the same bluescreen (Page
fault in nonpaged area) and upon reboot I was notified that the registry
had had to be restored. This was all within half a day. I tried a few
things (fsck, registry cleaning etc.) but the problems persisted, so I
wiped the partition and began reinstalling the system.
While copying from CD, the installer claimed it was unable to copy some
files to the HD, but the CD was in good condition. I tried another CD,
also basically unscratched, but got read errors there as well. After
trying a few times, the files were finally copied w/o errors, but in the
next part of the installation, I got a bluescreen: Page fault in nonpaged
area. It became clear then that I had a hardware problem.
I ran memtest86+ for a full pass, and had the HD run a SMART extended
self-test, all without indication of a problem. Linux ran stable all the
time, though I didn't use it much, so it could be coincidence. I removed
the new RAM-Module and reinstalled, and this time it went cleanly. But
while installing the drivers, odd errors kept occurring again, just a
little too frequently, and then I got the messages about restoring the
registry again after reboot, and the soundcard driver kept crashing...
and Firefox too. I wiped the system again, took out the old RAM-Module
which came with the Notebook, and replaced it with the new one.
This time, the installation of the system, drivers, updates and
everything was as smooth as it could be. I haven't had a single crash or
strange error in three days now.
Sorry for the long story, but I wanted to give as much background info
as possible. Now I want to return the probably defective module, but I'd
like to have some kind of proof first that it really is at fault. If all
I can say is "Windows crashed", then they'll probably look at me as if
I'd said water was wet. Also, I'd like to know for sure so I can stop
worrying if my system really is good again. So, what do you suggest?
24-hour-memtest? Statistical crash analysis? I'm open to all your
suggestions.
Simeon Maxein wrote:
> Hi all.
>
> This is not really OC related, but I assume you have experience with the
> topic.
>
> Here's my problem in short: I'm pretty sure I have a defective SO-DIMM
> DDR2 RAM module, but memtest86+ ran a complete pass w/o any error. The
> module came with the Notebook which still has warranty on it, but I'd
> like to have some kind of proof that this was really the source of the
> problem.
>
> Simeon
Memtest86+ (and a tester that Microsoft provides) are good, in the sense
that both testers work without an OS. That means that the maximum amount
of memory gets tested.
A more strenuous test is Prime95 or Orthos. Both of them do a calculation
with a known answer, so they can check for calculation errors. An error could
be due to a bad CPU, a bad Northbridge (memory controller) or bad memory. Since
the test is a bit more strenuous than Memtest86+, stability problems can be
detected a bit better.
Prime95 (use Torture Test option - available for Linux or Windows)
Orthos (Basically multiple copies of Prime95 - designed for dual core)
It is possible that Prime95 will make it easier for your warranty
repair people to see the problem. A computer in good working order
should be able to run Prime95 for hours and hours without it detecting
an error.
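To illustrate the "calculation with a known answer" idea, here is a minimal Python sketch. It is not Prime95's code: it assumes plain integer arithmetic and a small, hand-picked table of exponents (KNOWN_ANSWERS is a name made up for this example), whereas the real tools use heavily optimized floating-point routines that load the CPU and memory far harder. It only shows the principle: run the Lucas-Lehmer test on exponents whose result is already known and flag any disagreement.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. p must be an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Hypothetical check table for this sketch: exponent -> whether 2**p - 1 is prime.
KNOWN_ANSWERS = {
    13: True, 17: True, 19: True, 31: True, 61: True, 89: True,
    107: True, 127: True, 521: True, 607: True, 1279: True,
    11: False, 23: False, 29: False, 37: False,
}


def torture_round():
    """Recompute every entry and return the exponents whose result is wrong."""
    return [p for p, expected in KNOWN_ANSWERS.items()
            if lucas_lehmer(p) != expected]


if __name__ == "__main__":
    rounds = 100
    for i in range(rounds):
        wrong = torture_round()
        if wrong:
            print("round", i, "calculation error for exponents", wrong)
            break
    else:
        print("no errors in", rounds, "rounds")
```

On healthy hardware torture_round() always returns an empty list, so any non-empty result points at the CPU, the memory controller or the memory rather than at the software.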
'Simeon Maxein' wrote, in part:
| Hi all.
|
| This is not really OC related, but I assume you have experience with the
| topic.
|
| Here's my problem in short: I'm pretty sure I have a defective SO-DIMM
| DDR2 RAM module, but memtest86+ ran a complete pass w/o any error. The
| module came with the Notebook which still has warranty on it, but I'd
| like to have some kind of proof that this was really the source of the
| problem.
_____
I agree with the post from 'Paul'. There are many problems that could cause
the symptoms you report. At the moment, you have only a coincidence,
and only a meagre one at that, if the problem did not start IMMEDIATELY after
installing the new memory. You did cause mechanical stress when installing
the new memory module, so that is another possible explanation for the
association in time, and another indication of possible motherboard
mechanical problems. Other candidates: a motherboard problem that perhaps
only appears when TWO modules are installed, or controller problems (CD read
problem, I/O errors when copying files).
You don't really have the responsibility of diagnosing the problem; your
warranty guarantor does. Your knowing the exact diagnosis merely helps you
get faster service.
Since you have multiple kinds of error, mainly associated with data
transfer, I'd suspect the motherboard - a mechanical fault in the
motherboard is far more likely than an intermittent memory problem. The
failure rate of notebook computers is several percent in the first year of
operation; the failure rate of memory modules is orders of magnitude lower.
Things you can easily do for differential diagnosis:
1. Try to recreate the problem with just the original memory module.
2. Try to recreate the problem with just the new memory module.
3. Swap the positions of the memory modules.
4. RMA the new memory module, then try to recreate the problem with the
replacement.
Use Orthos (http://sp2004.fre3.com/beta/beta2.htm), as Paul suggested, but
be sure to pick the 'Blend - stress CPU and Memory' option, otherwise very
little of the installed memory will be used. Orthos will stress the system,
but it and programs like Prime95 are not really the correct kind of test
because they make no attempt to test all of memory, and are mainly useful
for CPU stability tests.
Remove the new memory module and get warranty service on your notebook. You
could just skip to this step, as it is the likely solution.
Phil Weldon
"Simeon Maxein" <smaxein@uni-koblenz.de> wrote in message
news:f6ekuq$i4t$1@cache.uni-koblenz.de...
| Hi all.
|
| This is not really OC related, but I assume you have experience with the
| topic.
|
| Here's my problem in short: I'm pretty sure I have a defective SO-DIMM
| DDR2 RAM module, but memtest86+ ran a complete pass w/o any error. The
| module came with the Notebook which still has warranty on it, but I'd
| like to have some kind of proof that this was really the source of the
| problem.
|
| Simeon
Thanks so far, I am just now running Orthos with only the old memory
module inserted. It's been running in Blend mode for 40 minutes now,
without reporting an error. However, I had an error installing the JRE
for Firefox just now (again, could be coincidence, that's the trouble
with problems you can't reproduce).
And I've had another idea. Most errors occurred when large amounts of
data were transferred from/to the HD. The disk itself claims to be
innocent (by SMART data and self-test), but I thought recreating similar
stress should produce some result. And it worked, too: I just tried to
create QuickPar ecc-data for a large file (700 MB), and it failed with a
checksum error. This should exclude dual-channel problems, at least as the
sole source of the trouble. I'll try to repeat the test a few times
once Orthos finishes one round, both on the internal HD and on my
USB-drive (which is also a 2.5" HD). Then, I'll swap the memory modules
again and repeat the tests. 10 repetitions per configuration should
already give results which are statistically significant, and I can
probably finish this today.
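Whether 10 runs per configuration are enough depends on how often the test actually fails. A minimal sketch of that check, assuming a one-sided Fisher exact test is an acceptable way to compare two configurations (the function name and the failure counts below are made up for illustration, not measured results):

```python
from math import comb


def fisher_one_sided(fail_a, runs_a, fail_b, runs_b):
    """Probability that configuration A collects at least fail_a of all
    observed failures if failures were unrelated to the configuration."""
    total_fail = fail_a + fail_b
    total_runs = runs_a + runs_b
    denom = comb(total_runs, total_fail)
    p = 0.0
    for k in range(fail_a, min(runs_a, total_fail) + 1):
        p += comb(runs_a, k) * comb(runs_b, total_fail - k) / denom
    return p


if __name__ == "__main__":
    # Hypothetical outcomes: failures with the old module vs. the new one.
    print(fisher_one_sided(10, 10, 0, 10))  # fails every time vs. never
    print(fisher_one_sided(3, 10, 0, 10))   # fails 3 of 10 times vs. never
```

With these assumed counts, failing 10 of 10 against 0 of 10 gives a p-value of 1/184756, while failing only 3 of 10 against 0 of 10 gives about 0.105, which would not pass the usual 0.05 threshold. An intermittent fault therefore needs more than 10 runs per configuration.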
Simeon Maxein wrote:
> Hello again.
>
> Thanks so far, I am just now running Orthos with only the old memory
> module inserted. It's been running in Blend mode for 40 minutes now,
> without reporting an error. However, I had an error installing the JRE
> for Firefox just now (again, could be coincidence, that's the trouble
> with problems you can't reproduce).
>
> And I've had another idea. Most errors occurred when large amounts of
> data were transferred from/to the HD. The disk itself claims to be
> innocent (by SMART data and self-test), but I thought recreating
> similar stress should produce some result. And it worked, too: I just
> tried to create QuickPar ecc-data for a large file (700 MB), and it
> failed with a checksum error. This should exclude
> dual-channel problems, at least as the sole source of the trouble.
> I'll try to repeat the test a few times once Orthos finishes one
> round, both on the internal HD and on my USB-drive (which is also a
> 2.5" HD). Then, I'll swap the memory modules again and repeat the
> tests. 10 repetitions per configuration should already give results
> which are statistically significant, and I can probably finish this
> today.
>
> Simeon
HD seems to be the culprit. The Abit NF7-S was notorious for corrupting
data during SATA RAID, but not PATA RAID. Timings on the HD controller had
to be increased to 1ms to prevent corruption. Man that was annoying. Maybe
your mobo has the same problem.
Phil schrieb:
> HD seems to be the culprit. The Abit NF7-S was notorious for corrupting
> data during SATA RAID, but not PATA RAID. Timings on the HD controller had
> to be increased to 1ms to prevent corruption. Man that was annoying. Maybe
> your mobo has the same problem.
I've just excluded that. My external USB-drive gave the same problem. I
was able to recreate this error ten times in a row, five times on my
internal and five times on my external HD. The verification failed at
different points through the test each time.
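A repeated read-and-verify test of this kind can also be scripted. The following is only a sketch, not what QuickPar does internally: it assumes SHA-256 hashing as a simpler stand-in for QuickPar's verification, and the function names and the test file path are made up. It reads the same file several times and reports every pass whose checksum differs from the most common one.

```python
import hashlib
from collections import Counter


def file_digest(path, chunk_size=1 << 20):
    """Read the file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeat_read_check(path, rounds=10):
    """Hash the file `rounds` times; return the most common digest and
    the indices of the passes that disagree with it."""
    digests = [file_digest(path) for _ in range(rounds)]
    reference = Counter(digests).most_common(1)[0][0]
    mismatches = [i for i, d in enumerate(digests) if d != reference]
    return reference, mismatches


if __name__ == "__main__":
    # "testfile.bin" is a placeholder; use a large file on the drive under test.
    reference, mismatches = repeat_read_check("testfile.bin", rounds=10)
    print("reference digest:", reference)
    print("passes with a different digest:", mismatches)
```

One caveat: the operating system may serve repeated reads from its file cache, so the file should be larger than the installed RAM if the drive and controller are meant to be exercised on every pass. Mismatches that still appear on cached reads would point at memory or CPU rather than at the disk.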
However, after that the problem didn't show up anymore. I'm still
testing, and have a new suspect (need more testing before I tell you
something misleading), but the HD is quite safely marked OK now.
I've meanwhile excluded my new potential culprit (CPU voltage), but was
unable to recreate the problem after the QuickPar test stopped failing.
I'm running on the new memory module again (which is stable, assuming a
single point of failure), and should the problem occur again, I know
it's either the mainboard or the CPU. In fact, I agree with Phil now
that the motherboard is most likely the defective part (Southbridge?),
because several devices were making trouble.
I think QuickPar already got wrong data from the HD, because Orthos,
running at the same time I did some failing QuickPar tests, never showed
a problem at all. Also, my WLAN card failed once when I booted up.
I've already called Toshiba today, and when the problem next occurs, I
will send the device in for warranty service. If it doesn't happen
anymore, I'll send it in anyway after my exams, with the old memory
installed. Something IS wrong with it, after all.
"Simeon Maxein" <smaxein@uni-koblenz.de> wrote in message
news:f6jjcl$k43$1@cache.uni-koblenz.de...
> And hello again.
>
> I've meanwhile excluded my new potential culprit (CPU voltage), but was
> unable to recreate the problem after the QuickPar test stopped failing.
> I'm running on the new memory module again (which is stable, assuming a
> single point of failure), and should the problem occur again, I know
> it's either the mainboard or the CPU. In fact, I agree with Phil now
> that the motherboard is most likely the defective part (Southbridge?),
> because several devices were making trouble.
>
> I think QuickPar already got wrong data from the HD, because Orthos,
> running at the same time I did some failing QuickPar tests, never showed
> a problem at all. Also, my WLAN card failed once when I booted up.
>
> I've already called Toshiba today, and when the problem next occurs, I
> will send the device in for warranty service. If it doesn't happen
> anymore, I'll send it in anyway after my exams, with the old memory
> installed. Something IS wrong with it, after all.
>
> Simeon