Status
- www.everlab.org
- Running: evgsicsstg.sics.se
TODO
- Make openafs rpms part of the standard install package
- Fix cache setup for openafs (/etc/openafs/cacheinfo and /etc/init.d/afs)
- Create code to automatically mount /afs in each slice via mount --bind
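A minimal sketch of the last TODO item, not tested on a real node. It assumes slice root filesystems live as directories under /vservers (an assumption, pass another directory if yours differ); the function name and the DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: bind-mount the host's /afs into every slice root.
# Assumption: each slice root is a directory under $1 (default /vservers).
# Set DRY_RUN=1 to print the commands instead of running them.

bind_afs_into_slices() {
    slice_root="${1:-/vservers}"
    for slice in "$slice_root"/*/; do
        [ -d "$slice" ] || continue          # glob matched nothing
        target="${slice%/}/afs"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "mkdir -p $target"
            echo "mount --bind /afs $target"
        else
            mkdir -p "$target"
            # Skip slices where something is already mounted on the target
            grep -q " $target " /proc/mounts || mount --bind /afs "$target"
        fi
    done
}
```

Run it first as "DRY_RUN=1 bind_afs_into_slices" to see what would be mounted, then as root without DRY_RUN.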
Steps to Make EverLab work
- TAU
- Done - Trial installation at TAU (B10)
- Done - Get TAU servers moved to outside of firewall
- Test AFS as a distributed filesystem under PlanetLab
- Done - Build and install on FC4_4.
- Done - Use mount --bind to put /afs on each slice
- Done - Install Server on SICS storage node (18/12/05, node ready)
- Done - test client on planet9
- Add to root distribution
- Implement PXE Boot for Clusters
- Done - Build initrd
- Done - Trial installation on evgsics13
- FIXED - Cannot write to /tmp (it is read-only because it comes from the initrd). Need to add "rw" to the end of the INITRD field of the PXE Boot config file.
- Tested on local machine (complex-1). It works correctly if you use DHCP to find the network.
- Done - Document process for use everywhere
- Done - Implement CPU quotas
- Done - Send mail to devel@planet-lab.org,
Use the variable general_prop_share, which can also be called nm_cpu_share. "It lets you assign a weight to each slice, which is then used to assign CPU cycles in a weighted fair queue scheduler i.e., slices get CPU in proportion to their weight divided by the total active weights."
- Done - Move Server to CS machine room
- We will use complex-1 (2-CPU Athlon, 1.7 GHz) as everlab. The current everlab will be used as a backup.
- Done - Clean up complex-1
- Done - Move to Internet and reload as FC4 everlab
- Done - Debug the Ping of Death
- The file /usr/local/planetlab/bin/pl-reboot which is called by plc_www/db/control/index.php is missing.
- Check out plc_scripts. It needs lots of tweaking for locations and passwords. /usr/local/planetlab/bin/pl_poddoit must be written by hand as a wrapper for rebooot.py.
- Done - Implement Ganglia on EverLab nodes (in root partition?)
- Put the ganglia-gmond-3.0.2-1.i386.rpm in /var/www/html/install-rpms/planetlab-v3
- Run yum-arch in that directory
- Edit the yumgroups.xml file in that directory and add a package entry for 'ganglia-gmond' within the PlanetLab group.
- Create a gmond.conf file and put it in /var/www/html/PlanetLabConf. Make each gmond deaf and have it send its results over UDP to the ganglia server.
- Add the gmond.conf file to planetlab configuration files via the web interface. Have the update command be '/etc/rc.d/init.d/gmond restart'
- Install gmond on your server and have it listen to the port listed in the client configuration
- Check by telnetting to the server at port 8649
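A sketch of the client gmond.conf described in the steps above, written as a small generator so the server name is not hard-coded. It uses only the deaf/mute globals and a unicast udp_send_channel from the gmond 3.0 configuration syntax; the function name and the host ganglia.example.org are placeholders, and a real file will probably want more sections (cluster name, metrics).

```shell
#!/bin/sh
# Sketch: print a minimal client-side gmond.conf.
#   $1 = ganglia server host (placeholder in the example below)
#   $2 = UDP port, default 8649
# deaf = yes : this gmond does not listen for other nodes' data
# mute = no  : it still sends its own metrics

write_gmond_client_conf() {
    server="$1"
    port="${2:-8649}"
    cat <<EOF
globals {
  deaf = yes
  mute = no
}

udp_send_channel {
  host = "$server"
  port = $port
}
EOF
}
```

For example: write_gmond_client_conf ganglia.example.org > /var/www/html/PlanetLabConf/gmond.conf. The gmond on the server then needs a matching receive channel on the same port.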
- How to remotely power-down/up the blades
- Test Condor as a workload management system in a PlanetLab Slice
- PlanetLab Central is working on this for a March deliverable.
- Install master on www.everlab.org -- DONE
- Create initscript -- DONE
- Install el_condor_execute slice on each node. Leave one node on each cluster as a submission node -- DONE
- Install el_condor, huji_condor, tau_condor, sics_condor and ucl_condor as submission slices on each cluster -- DONE
- The mailing list options do not work (/planetlab/plc/scripts/*)
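To make the general_prop_share description quoted under the CPU quota item concrete, here is the arithmetic only, as a sketch. It does not talk to the node manager; the function name is made up, and the weights are example values.

```shell
#!/bin/sh
# Sketch: percentage of CPU a slice gets under weighted fair sharing.
#   $1      = weight of the slice we are asking about
#   $2...   = weights of ALL currently active slices (including that slice)
# Result is an integer percentage (integer division, rounds down).

cpu_share_percent() {
    weight="$1"
    shift
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    if [ "$total" -le 0 ]; then
        echo 0
        return
    fi
    echo $((100 * weight / total))
}
```

With three active slices of weight 1, 1 and 2, "cpu_share_percent 2 1 1 2" gives 50 and "cpu_share_percent 1 1 1 2" gives 25. Only active slices count, so an idle slice's weight drops out of the total.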
Notes
To add a new site, use the following command:
plcmdline.py -u -r admin -p -c 'AdmAddSite("Hebrew University of Jerusalem", "HUJI", "huji")'
Building a PLC from scratch
- Install FC4 using the server option. Add the development tools and PostgreSQL.
- Enable ssh, http, and https using the system-config-network script.
- Update the system using "yum update"
- Turn off SELinux by editing /etc/selinux/config and setting SELINUX=disabled
- Follow the instructions from PlanetLab
- When editing the config file, the package_source variable should be the default planet-lab site. This will be copied to the local host during the rpm_repo step.
- Run /root/pl_box/update_rpm_repo.sh --createrepo
- After a failure in bootmanager, comment out the curl line in /planetlab/bootmanager/support-scripts/buildnode.sh and edit the yum.conf file, replacing boot.planet-lab.org with your new machine name. Then restart setup_plc.sh
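The SELinux step in the list above can be scripted instead of edited by hand. A minimal sketch, assuming the usual SELINUX=... line in the config file; the function name is made up, and the change only takes effect after a reboot.

```shell
#!/bin/sh
# Sketch: set SELINUX=disabled in an SELinux config file.
#   $1 = config file, default /etc/selinux/config
# A copy of the original is kept as <file>.bak.
# The pattern requires "=" right after SELINUX, so SELINUXTYPE= is untouched.

disable_selinux_config() {
    conf="${1:-/etc/selinux/config}"
    cp "$conf" "$conf.bak"
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf.bak" > "$conf"
}
```

Run "disable_selinux_config" as root, then check the result with grep SELINUX /etc/selinux/config.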
OpenAFS Server notes
- Use the file /usr/afs/local/NetInfo to restrict a multi-homed node to a subset of its available interfaces.
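NetInfo is a plain list of IP addresses, one per line. A small sketch for writing it; the function name is made up and the addresses in the example are documentation placeholders, not real EverLab addresses.

```shell
#!/bin/sh
# Sketch: write a NetInfo file listing the interfaces the server may use.
#   $1    = path of the NetInfo file (e.g. /usr/afs/local/NetInfo)
#   $2... = IP addresses, written one per line
# Any existing file at that path is overwritten.

write_netinfo() {
    netinfo="$1"
    shift
    : > "$netinfo"                       # truncate or create
    for ip in "$@"; do
        printf '%s\n' "$ip" >> "$netinfo"
    done
}
```

For example: write_netinfo /usr/afs/local/NetInfo 192.0.2.10. The server processes should be restarted afterwards so they pick up the new list.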
Building an AFS client on FC4
I needed to patch rx_vnet.c to include linux/vs_cvirt.h on LINUX26 in order to define vx_rmap_pid.
I also needed to modify src/afsd/afs.rc.linux to match the locations of the kernel libraries and to correctly identify the processor type.
- Download an afs build
curl 'http://dl.openafs.org/dl/openafs/candidate/1.4.1-rc8/openafs-1.4.1-rc8-src.tar.gz' > openafs-1.4.1-rc8-src.tar.gz
- Use the correct openafs.spec
- yum install rpm-build kernel-devel kernel-smp-devel pam-devel ncurses-devel byacc flex gcc krb5-devel
- cp openafs-1.4.1-rc8-src.tar.gz /usr/src/redhat/SOURCES
- Create the files
- /usr/src/redhat/SOURCES/openafs-CellServDB
- /usr/src/redhat/SOURCES/openafs-ThisCell
- /usr/src/redhat/SOURCES/openafs-CellAlias
These files will be installed in the source RPMs. A simple default option
is to get a CellServDB from grand.central.org and use "openafs.org" for
ThisCell.
- rpmbuild -ba openafs.spec
- Install the openafs and openafs-client rpms from /usr/src/redhat/RPMS/i386
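The file-staging steps above (before rpmbuild) can be collected into one function. This is a sketch under assumptions: you have already downloaded the tarball and a CellServDB, "openafs.org" is the simple default ThisCell mentioned above, and an empty CellAlias is acceptable if you have no aliases. The function name is made up.

```shell
#!/bin/sh
# Sketch: stage the files that openafs.spec expects in the SOURCES directory.
#   $1 = path to the source tarball (e.g. openafs-1.4.1-rc8-src.tar.gz)
#   $2 = path to a CellServDB file you downloaded
#   $3 = SOURCES directory, default /usr/src/redhat/SOURCES

stage_openafs_sources() {
    tarball="$1"
    cellservdb="$2"
    sources="${3:-/usr/src/redhat/SOURCES}"
    cp "$tarball" "$sources/"
    cp "$cellservdb" "$sources/openafs-CellServDB"
    # Simple default cell name from the notes above
    echo "openafs.org" > "$sources/openafs-ThisCell"
    # Assumption: an empty CellAlias is fine when no aliases are needed
    [ -f "$sources/openafs-CellAlias" ] || : > "$sources/openafs-CellAlias"
}
```

After staging, run "rpmbuild -ba openafs.spec" as in the steps above.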