Bad PC Tech

I've never been much of a hardware geek. I prefer the software side of computers. For me, what makes a computer fun is programming it to do things. Fun things, useful things, things that make me money, it doesn't really matter. I like to hack software. This did not stop me from building my current web server from parts back in 1999. Yes, it is an old box by modern standards. But it is still going strong with just a CPU fan replacement so far.

The experience of building my own machine out of parts from various vendors in search of the cheapest supplier of the various pieces taught me a valuable lesson. I don't ever want to build a PC from scratch again. So when the time came that I wanted a more powerful system, I decided to shop around online for a bare bones system builder. The vendor I found that seemed to have the right stuff was Monarch Computer Systems.

ASUS A7M266-D Mother Board Manual Page

Monarch built me a fly tower system based on the ASUS A7M266-D Mother Board enshrined in a brushed aluminum Cooler Master case. Plugged into the CPU sockets are two Athlon MP 1800+ CPUs which were top of the line when this machine was assembled. To give the CPUs some breathing room, I also specified 1GB of memory in the form of two Corsair 512MB ECC Registered 184-pin SDRAM DIMMs (PC-2100, 266 DDR) and an 80GB Maxtor drive. Finishing off the package was a top end NVidia GeForce 4 AGP graphics card and Plextor CD burner. To make the system go, I installed Debian Linux. This was the machine that calculated the frames for my first fractal movie. I still have the invoice for this system. It's dated April of 2002. Where does time go?

Bad Things Happen

It was around March of 2003 when I purchased my Apple 12" G4 PowerBook. While not nearly as powerful, I find the laptop to be a much more convenient machine to work on. The powerful Athlon system was turned into a server, and it was playing that role when it was used to compute the frames for Fractal Zoom. During a thunderstorm, I unplugged it so that there would be no chance of it getting fried. The Athlon remained unplugged for a couple of weeks.

I finally decided that I would like to use the Athlon again. SBCL had gotten thread support and I thought it would be cool to try it out on a dual CPU system. So I powered up the box. Since it had been a while since I had done any software updates, I ran Debian's apt-get update and apt-get dist-upgrade. Things started to go wrong. At first I wasn't too alarmed. I was running the unstable branch of Debian and sometimes it takes a few goes to get everything right. But with each go, things got worse. The machine was pitching some kind of fit. I had to power down a few times to keep running the dist-upgrade (I can be stubborn). It wasn't long before the file system was complaining that things were just not right. I was fscked.

The problem was, fsck was not able to fix things. In fact, things were getting much, much worse. I was losing inodes. The system was getting less and less stable. Then, after a final power cycle, all I got out of the machine was a constant beeping. Beep beep beep beep beep beep beep. No video. Just the damn beeping. I gave up on vega, the host name I had given to the box in honor of Carl Sagan's Contact.

Time To Wake Up Vega

If you've read the story behind my second fractal movie, you know that I used an under-powered web server to calculate the frames. This took a long time. Too long really. Now I have more ambitious plans. I need vega online again. The problem is, when plugged in it just gives me the beep beep beep beep. It was time to get serious about diagnosing and fixing the problem.

The first thing I did was a Google search for the mother board manual. Well, I did it in a sort of roundabout fashion. My search terms included the board, beep, and codes. In due course I found a PDF copy of the ASUS A7M266-D User's Manual. The beep code I was getting was described as indicating that the memory was not being detected. That should have been my answer right there.

Well, I kind of lied about the Google search being the first thing I did. The first thing I really did was remove all the boards from the computer so I could try and get the machine to boot. I sometimes have to do things the hard way. With all the boards removed, I was able to see the board model name and then do the Google search because I was having no luck.

Once I found out that the beep code meant that the memory wasn't being detected, I removed and reinserted the memory. Maybe it had come loose when the computer was moved or something. Yeah, right. Well, that seemed to do the trick. The computer booted right up. Well, not quite. The disk was still messed up and messed up badly. At this point, I was resigned to losing all my data on the drive. That includes the C program that was used for the first fractal movie. It includes a lot of data I would like to have kept.

I downloaded and burned an ISO of the Debian Sarge net install CD. Then I booted up vega and tried to install the system. I wasn't out of the woods. I was unable to make a file system on the disk. The system kept getting stuck. I concluded that I needed to replace the drive. That must have been the problem all along. Again, I gave up for the day. Just a day this time. I also decided I wanted to have a closer look at the disk so I got the Knoppix Live CD.

Beep Beep Beep Beep Beep Beep

When I powered vega up again, you'll never guess what happened. Oh yeah. The shrill sound of me screaming over the beep code. What was happening? Didn't I fix this? Was I cursed? I had no clue, no, and yes. Right! Back to ripping the guts out to see what's what. In the grand tradition of Cargo Cult Science, I figured all I had to do was repeat the sequence of seating the memory again. This time it didn't work. I was puzzled. Why would vega be beeping at me like this? It was time to try something drastic. It was time to be methodical. The beep code meant the memory wasn't being detected. So that was the place to look.

At this point I would like to take an aside and mention a rather minor annoyance that by this time was really aggravating me. Take a look at the picture of the mother board. Component number 20 is the AGP slot for the video board. The two memory modules go into the left two slots of component number six. Well, the video board is actually quite large. It is so large in fact that it goes right under the left two slots for the memory. The plastic retaining tabs collide with the video board when they are opened. Someone needs to be smacked in the head for that.

Back to the memory. By testing all permutations of the two memory modules, I discovered that whenever one particular module is installed, I get the beep code. So half my memory doesn't work. With only the good module installed, I put things back together again and booted with the Knoppix CD. And guess what happened?

Vega booted!

I proceeded to use the run level two mode of Knoppix. Thanks to such utilities as mke2fs -c -c -j and memtest I was able to confirm that the disk was OK after all. It looks like failing memory poisoned the disk cache and corrupted the file system. So much for ECC.

So what is the state of vega now?

Vega X11 Applications on Mac

I've got Debian Sarge installed with the appropriate SMP kernel. Vega is up and running on only 512MB of RAM until I can get a warranty replacement either from Monarch or Corsair. So far I have not heard back from Monarch about my RMA inquiry. More to come as the situation develops.

In the meantime, I've added a new Seagate 120GB drive. I did this on September 22, 2005. It was a fun install. All that nice tying up of the power cables that Monarch's assembly guy did, whoever he was, meant that I had to cut some of the cable ties. They only zip one way. Also, for a power plug to reach the new drive, I rearranged some of the power cabling by unplugging things and plugging them into other things. I really don't know anything about power supplies. I was just hoping that all the color codes on the wire harnesses and such were consistent and that it didn't matter what was plugged into what so long as the plugs were the same shape. Anyway, the system didn't explode when I powered it up with the Seagate installed.

I was also lucky that I didn't have to fiddle with any jumper blocks. It looks like the Seagate was already configured to be a slave drive on the primary controller. All I had to do was enable it in the BIOS. But the drive did not ship with any mounting hardware. I had to raid an old PC to get four screws to hold the drive in place. One good thing: after all these years, screws haven't changed. The PC I stole the four screws from was an old Gateway Pentium 66 system from the mid-1990s. It has one of those CPUs with the FDIV bug.

Once the drive was installed and recognized by the BIOS, I was on slightly more familiar territory. I had already gone through the process of partitioning and formatting the Maxtor just a few days before. This time though, I decided to put a little trust in Seagate and use just one -c for the mke2fs command. I didn't want the machine to spend 24 hours or more doing read-write tests of all the blocks. Once the file system was created, it was a simple matter of editing /etc/fstab to assign the Seagate to /opt.
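
For anyone following along, the format-and-mount step might look roughly like the sketch below. It is a dry run that only prints the command and the fstab entry rather than touching a disk. The device name /dev/hdb1 is an assumption (a primary-slave IDE drive with a single partition), as are the mount options; neither comes from the actual setup on vega.

```shell
#!/bin/sh
# Dry-run sketch: prints what would be run instead of running it.
# ASSUMPTION: the new drive shows up as /dev/hdb (primary slave under the
# old IDE naming) with one partition, hdb1. Adjust for the real layout.
DEV=/dev/hdb1
MNT=/opt

# A single -c makes mke2fs do a read-only bad-block scan before creating
# the file system. Giving -c twice selects the much slower read-write test.
# -j adds a journal, so the result is ext3.
MKFS_CMD="mke2fs -c -j $DEV"

# fstab fields: device, mount point, type, options, dump flag, fsck pass.
# ASSUMPTION: default mount options, fsck pass 2 (checked after the root fs).
FSTAB_LINE="$DEV $MNT ext3 defaults 0 2"

echo "$MKFS_CMD"
echo "$FSTAB_LINE"
```

Running the printed mke2fs command as root destroys whatever is on that partition, so the device name is worth checking twice before doing it for real.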

Now I'm just waiting on a resolution for the memory situation.

Monarch Responds

From support-email@monarchcomputer.com Thu Sep 22 15:55:10 2005
Received: from psmtp.com [64.18.0.184] by email.steuber.com
  (SMTPD32-8.05) id A33E25500A0; Thu, 22 Sep 2005 11:52:30 -0400
Received: from source ([63.243.20.29]) by exprod5mx29.postini.com ([64.18.4.10]) with SMTP;
	Thu, 22 Sep 2005 08:52:19 PDT
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Subject: RE: Request for tech support and returns
X-MimeOLE: Produced By Microsoft Exchange V6.5.7226.0
Date: Thu, 22 Sep 2005 11:53:30 -0400
Message-ID: <A737050EC3E0344488602D65A02D9CCB325F78@exserver.atl.monarchcomputer.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Request for tech support and returns
Thread-Index: AcW8/yuBdzeOvxhBQ2u51LY98BKZGwB5SvpQACpagLA=
From: <support-email@monarchcomputer.com>
Sender: <charles@monarchcomputer.com>
To: <david@david-steuber.com>
X-pstn-levels:     (S:99.90000/99.90000 R:95.9108 P:95.9108 M:97.0232 C:98.7678 )
X-pstn-settings: 5 (2.0000:2.0000) s gt3 gt2 gt1 r p m c 
X-pstn-addresses: from <support-email@monarchcomputer.com> [202/11] 
X-RCPT-TO: <david@david-steuber.com>
Status: U
X-UIDL: 388432803

David,

Corsair will take this memory back directly and I would recommend going
this route.  This memory has been discontinued for sale for some time so
we don't have any locally.  This means that we would have to receive
your module, send it to Corsair in California wait for them to replace
it, receive it back into Monarch and then ship it back to you.  It could
take a long time to complete this transaction.  It would be much faster
to return the memory direct to Corsair.  You can get service on this
memory here:

http://www.corsair.com/corsair/rma_request.html


Thank you,

Charles H.
Monarch Computer Systems

I jumped through Corsair's hoops and filed an RMA request with them on Thursday, September 22, 2005. I have this nagging fear that I will need to send back both memory modules and wait on a return. This would mean that vega will be down while the memory is sorted out. I asked Corsair to send me replacement modules first so that I could simply return the modules I have in their packaging, as I don't have any anti-static bags or other proper shipping materials for memory modules. I suppose I could wrap them in aluminum foil.