St. Olaf Beowulf Blog

Saturday, April 14, 2007

Temperature Sensor

A recurring problem with the air conditioning system that cools the helios cluster is that it occasionally just shuts off. We do not own the unit, so we cannot fix the problem ourselves. Until it is fixed, we need some way of monitoring the temperature of the room. If it gets too hot, we want to shut down all of the machines to prevent damage. While we could use the temperature sensors on the IPMI cards, problems with that approach made it infeasible.

In my quest to find a USB temperature sensor that worked under Linux, I came across a tutorial for making a Linux USB driver for the Go!Temp USB device. The device itself can be purchased at the manufacturer's website.

We purchased this thermometer and, using the driver created in that tutorial, connected it to our admin machine. With all of the nodes OFF, it is 85°F in the room at the moment. In other words, it's way too dang hot in there.

Eventually, we'll have an automated system that emails us once the temperature reaches a certain threshold and shuts off the machines at a second, higher threshold. The sensor will also probably be moved to the gateway machine; there's no point in putting extra load on the admin machine.
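A minimal sketch of the two-threshold idea, assuming the reading is already available as a number in degrees Fahrenheit. The threshold values, the function name classify_temp, the read_temp placeholder, and the email address are all hypothetical; how the driver exposes the reading is not covered here.

```shell
# Hypothetical thresholds in degrees Fahrenheit (not values we have settled on).
WARN_F=80
SHUTDOWN_F=90

# classify_temp READING [WARN] [SHUTDOWN]
# Prints "ok", "warn", or "shutdown". awk does the comparison so that
# fractional readings such as 85.4 work.
classify_temp() {
    awk -v t="$1" -v w="${2:-$WARN_F}" -v s="${3:-$SHUTDOWN_F}" 'BEGIN {
        if (t + 0 >= s + 0)      print "shutdown"
        else if (t + 0 >= w + 0) print "warn"
        else                     print "ok"
    }'
}

# Possible wiring from a cron job (left commented out; read_temp is a
# placeholder for whatever command prints the current reading):
# case "$(classify_temp "$(read_temp)")" in
#     warn)     echo "Server room is getting hot" | mail -s "helios temperature" admin@example.com ;;
#     shutdown) echo "power off the nodes here" ;;
# esac
```

Keeping the decision in one small function means the thresholds can be tested without the sensor attached.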

I think the ldusb driver can be used (instead of the one created using that tutorial), but I personally do not know how to use it, so this works for our needs for now.

Friday, October 20, 2006

R Installation Procedure

The following procedure will lead to a working R installation (complete with snow and Rmpi):

0) Wherever you see the word 'version' in the following instructions, replace it with the applicable version number.
1) Download the latest R from here.
2) Uncompress it and change into the resulting directory.
3) Type:
$ ./configure --prefix=/usr/local/R-version CFLAGS=-O3 FFLAGS=-O3 && make && sudo make install
4) R should now be successfully installed in /usr/local/R-version.
5) Download the latest snow and Rmpi from here.
6) Install snow like this:
$ sudo /usr/local/R-version/bin/R CMD INSTALL snow_version.tar.gz
7) Now install Rmpi:
$ sudo /usr/local/R-version/bin/R CMD INSTALL Rmpi_version.tar.gz --configure-args="--with-mpi=/usr/local/lam-version CC=mpicc"
8) All done; you should be good to go.
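
To avoid typos when substituting 'version' by hand, the steps above can be generated with the numbers filled in. This is only a sketch: the function name r_install_plan is made up, the version numbers in the usage comment are placeholders, and the Rmpi tarball is assumed to be named Rmpi_version.tar.gz. It prints the commands without running anything.

```shell
# r_install_plan R_VER SNOW_VER RMPI_VER LAM_VER
# Prints the commands from steps 3, 6 and 7 with the versions substituted.
# Nothing is executed; review the output and paste it in yourself.
r_install_plan() {
    local prefix="/usr/local/R-$1"
    cat <<EOF
./configure --prefix=$prefix CFLAGS=-O3 FFLAGS=-O3 && make && sudo make install
sudo $prefix/bin/R CMD INSTALL snow_$2.tar.gz
sudo $prefix/bin/R CMD INSTALL Rmpi_$3.tar.gz --configure-args="--with-mpi=/usr/local/lam-$4 CC=mpicc"
EOF
}

# Usage (placeholder version numbers):
# r_install_plan 2.4.0 0.2-2 0.5-3 7.1.2
```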

Long time no update . . .

We haven't updated the blog in a few months, but I assure you, we have been quite busy. Hopefully we can post our updates here (if we remember them!).

On the new cluster, I recompiled the Linux kernel and excluded the Ethernet driver for the Ethernet port used by the management card. We decided to use both Ethernet ports since we have the management card installed. It is possible to have the management card and Linux share the same Ethernet port, but it requires a software solution that would probably break quite easily.

It was necessary to exclude the Ethernet driver because merely disabling that Ethernet card within Linux worked only for that one machine; as soon as we imaged the other machines, it was re-enabled. Sometimes, as well, eth0 would show up as eth1, and eth1 would show up as eth0. Headaches all around.

With that change, the 'new' cluster is completely ready to go (outside of software being installed). And so is the old cluster, which our senior capstone course is actively working on. One group is working on things related to BLAST, and one group is working on R.

One issue we are running into at the moment is that we cannot seem to compile WWWBLAST. It does not get built when we run ncbi/make/makedis.csh, as the instructions say it should, so we're tracking that down. There are precompiled binaries around, but they are for every platform except ppc-linux (there's ppc64 linux though . . . ).

Rmpi and whatnot are just getting up and running, so we'll be able to propagate this new software soon.

Thursday, August 03, 2006

More Lam MPI goodness

I forgot to mention how to create the wolf-lam account in my last post. Simply copy the line in /etc/passwd for the wolf account, then change the user name to wolf-lam and the home directory to /home/wolf/.lam. Then edit /etc/shadow to include a password for wolf-lam (copy the wolf line and change the name to wolf-lam).
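
The /etc/passwd edit can be sketched as a small filter. This assumes the standard seven-field passwd layout (name:password:uid:gid:gecos:home:shell); the function name make_lam_passwd_line and the uid/gid in the usage comment are made up for illustration. It only prints the new line and leaves /etc/passwd alone.

```shell
# make_lam_passwd_line LINE
# Takes one /etc/passwd line and prints the matching "-lam" line:
# field 1 (user name) gets "-lam" appended, field 6 (home) gets "/.lam".
# The uid and gid are left alone, so both names map to the same user.
make_lam_passwd_line() {
    printf '%s\n' "$1" |
        awk -F: 'BEGIN { OFS = ":" } { $1 = $1 "-lam"; $6 = $6 "/.lam"; print }'
}

# Usage (sample line, not our real entry):
# make_lam_passwd_line 'wolf:x:500:500:Wolf:/home/wolf:/bin/bash'
```

The /etc/shadow line only needs the name in the first field changed, so it is easy enough to do by hand.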

For Rmpi to run, the shared library "/lib/libutil.so.1" has to be loaded, and perhaps /usr/local/lam-7.1.2/lib/liblam.so.0 as well, so I added:
LD_PRELOAD=/usr/local/lam-7.1.2/lib/liblam.so.0:/lib/libutil.so.1
to the switch-lammpi command.

Tuesday, August 01, 2006

Multi MPI installation

Oh geez, what a headache. Here are the important things to know:

Each MPI installation is installed in its own directory under /usr/local, meaning that the binaries for LAM MPI are found in "/usr/local/lam-7.1.2/bin" and the libraries in "/usr/local/lam-7.1.2/lib".

The default environment is altered in the system-wide script "/etc/bashrc" to put Open MPI's binaries in the PATH and its libraries in the LD_LIBRARY_PATH.

For lamboot to work, the PATH and LD_LIBRARY_PATH have to include LAM so that it can boot in the first place, but since our default is Open MPI, this causes some problems. Our workaround is to create a second account with the same uid whose home directory is the subdirectory ~/.lam of the first account; that directory includes a bashrc that puts LAM in the default environment.

Also, LAMRSH needs to be set to "ssh -l $USER{-wolf}" so that when lamboot issues the ssh command, it sshes in as the wolf user.
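
A minimal sketch of what the second account's bashrc could contain to put LAM first in the environment. The helper name prepend_dir is made up, and this is not a copy of our actual file; the LAMRSH setting is left out.

```shell
# Install prefix for LAM as described above.
LAM_PREFIX=/usr/local/lam-7.1.2

# prepend_dir DIR LIST
# Prints LIST with DIR in front, without leaving a stray ":" when LIST is empty.
prepend_dir() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Put LAM ahead of whatever /etc/bashrc set up for Open MPI.
PATH="$(prepend_dir "$LAM_PREFIX/bin" "${PATH:-}")"
LD_LIBRARY_PATH="$(prepend_dir "$LAM_PREFIX/lib" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Because the LAM directories come first, lamboot and the LAM libraries are found before the Open MPI ones set up in /etc/bashrc.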

Future areas of exploration include seeing whether putting the LAM binaries after the Open MPI libraries still allows it to boot (and then running MPI with a script that changes the appropriate environment variables before running the LAM executable), or seeing whether the ssh command used by lamboot can be run artificially to simulate lamboot's behavior.

-William Voorhees

Friday, July 21, 2006

ATi Rage 128 on Powermac G4s

We're using some old Powermac G4s as the gateway machines for both our old and new clusters. The issue with these machines is that they have an ATi Rage 128 card in them, which is notoriously finicky in Linux. I found this site helped tremendously:
http://www.jonh.net/lppcfom-serve/cache/1043.html

Our final yaboot start line:
linux video=aty128fb:vmode:20,cmode:32
Failure to do this will result in the monitor shutting off shortly after boot.

Thursday, July 13, 2006

Repairs

A Sun technician came in the other day to fix some of our x2100s with issues, and he'll be coming back 'soon' as well (hopefully today). One of the units was fixed by replacing the power supply, while the others need some more work (hence the two visits). This and the upcoming Beowulf conference next week (which my research partner, professor, and I are putting on) are sucking up all of my time . . .

Wednesday, July 12, 2006

Useful tool for converting an IP to hex

via the command line:
$ gethostip whateveripyouwant
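
If gethostip isn't installed, the conversion itself is simple enough to do in the shell. A small sketch, assuming a plain dotted-quad IPv4 address with no leading zeros in the octets; the function name ip_to_hex is made up and, unlike gethostip, it does not look up host names.

```shell
# ip_to_hex ADDRESS
# Prints a dotted-quad IPv4 address as eight uppercase hex digits.
ip_to_hex() {
    local IFS=.
    # With IFS set to ".", the unquoted $1 splits into its four octets.
    set -- $1
    [ "$#" -eq 4 ] || return 1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Usage:
# ip_to_hex 192.168.1.10
```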