The question you will ask straight away is: what is a master server? Most Beowulf systems have only one server, which also acts as the gateway to the world outside the cluster, but some have multiple servers for redundancy and reliability. In a large disk-less client cluster you might want to use multiple NFS servers to serve system files to the client nodes, and in a more distributed environment all nodes can act as both clients and servers. If you are going to use only one server node you can simply drop the word 'master', and think of the master server simply as the server.
The master server will be the most important node in your Beowulf system. It will serve file systems to the client nodes via NFS, it will be used for compiling source code and starting parallel jobs, and it will be your access point from the outside world. The following are the steps for installing and configuring the master server.
An important part of the installation process is choosing the partition sizes. It is very important to choose partition sizes which are correct for your needs, because it might be very difficult to change them at a later stage, when your cluster is running production code.
My recommended partition sizes for the disk-less client configuration using Red Hat Linux 5.2 are as follows:
/ - 150 MB. This / partition will contain the /bin, /boot, /dev, /etc, /lib, /root, /sbin, /var and /tftpboot directories and their contents. In most cases you can include /tmp in / as well. It is very important for the disk-less client configuration that /tftpboot is on the same partition as /. If these two directories were mounted on separate partitions, we would not be able to create some of the hard links which are needed for the NFS root configuration described here to work.
/usr - 1 GB. This might seem like overkill, but remember that most additional RPMs will install into /usr and not into /usr/local. If you are planning to install large packages, you should make the /usr partition even larger. There is nothing worse than running out of disk space on a production system running a very large job.
/usr/local - from 500 MB to 2 GB. The exact size will really depend on how much additional software (not included in the distribution) you have to install.
swap - Swapping is really bad for the performance of your system. Unfortunately there might be a time when the server is computing a very large job and you just don't have enough memory. You should probably make the total swap space no more than twice the size of physical RAM. For example, we have 384 MB of RAM and four 128 MB swap partitions on node1 in our Topcat system.
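As a sketch, the partition scheme above might end up looking like this in /etc/fstab (the device names /dev/sda* and /dev/sdb1 are assumptions for illustration; your disk layout will differ):

```shell
# /etc/fstab sketch for the master server -- device names are assumptions
/dev/sda1   /            ext2   defaults   1 1   # 150 MB: /, /boot, /tftpboot, ...
/dev/sda2   /usr         ext2   defaults   1 2   # 1 GB for the distribution RPMs
/dev/sda3   /usr/local   ext2   defaults   1 2   # 500 MB - 2 GB for extra software
/dev/sda5   swap         swap   defaults   0 0   # swap can be split across
/dev/sdb1   swap         swap   defaults   0 0   # several disks for performance
```

Splitting swap across several physical disks, as on Topcat, lets the kernel interleave swap I/O between them.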
I will not go into the details of the Red Hat Linux 5.2 installation, as these are well described in the Red Hat Linux Installation Manual http://www.redhat.com/support/docs/rhl/. I recommend installing the full Red Hat 5.2 distribution, to save time both now and later, when you look for a package you need but did not install.
In most cases, nodes in a Beowulf cluster use private IP addresses. The only node which has a "real" IP address, visible from the outside world, is the server node. All other nodes (clients) can only see nodes within the Beowulf cluster. An example of a five-node Beowulf cluster is shown below. As you can see, node1 has two network interfaces, one for the cluster and one for the outside world. I use the 10.0.0.0/8 private IP range, but others can also be used (please see RFC 1918 http://www.alternic.net/rfcs/1900/rfc1918.txt.html).
      eth0 123.45.67.89
----------------[node1]
                   | eth1 10.0.0.1
                   |
10.0.0.2        ------        10.0.0.5
[node2]--------|SWITCH|--------[node5]
                ------
                |    |
     10.0.0.3   |    |   10.0.0.4
      [node3]----    ----[node4]
If you haven't already done so, you should now configure both of your Ethernet cards. One of your cards should have a "real" IP address allocated to you by your network administrator (most probably you :), and the other a private IP (e.g. 10.0.0.1) visible only to the nodes within the cluster. You can configure your network interfaces either by using the GUI tools shipped with Red Hat Linux, or by simply creating or editing the /etc/sysconfig/network-scripts/ifcfg-eth* files. A simple Beowulf system might use the 10.0.0.0/8 private IP address range, with 10.0.0.1 being the server and 10.0.0.2 up to 10.0.0.254 being the IP addresses of the client nodes. If you decide to use this IP range you will probably want to use the 255.255.255.0 netmask and the 10.0.0.255 broadcast address. On Topcat, eth0 is the interface connecting the cluster to the outside world, and eth1 connects to the internal cluster network. The routing table looks like this:
[jacek@topcat jacek]$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        *               255.255.255.0   U     0      0        9 eth1
139.x.x.0       *               255.255.248.0   U     0      0        7 eth0
127.0.0.0       *               255.0.0.0       U     0      0        2 lo
default         139.x.x.1       0.0.0.0         UG    0      0       18 eth0
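If you edit the interface files by hand, the cluster-side interface on the server might look like the following sketch (the values are taken from the 10.0.0.0/8 example above; adjust them to your own addressing):

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth1 -- internal cluster interface
DEVICE=eth1
IPADDR=10.0.0.1
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255
ONBOOT=yes
```

After creating or editing the file, bring the interface up with /sbin/ifup eth1 (or restart networking) for the change to take effect.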
I no longer run DNS on Topcat (our Beowulf cluster). Originally I thought that having a dedicated DNS domain and server for your Beowulf cluster simplified administration, but I have since configured Topcat without DNS, and it seems to work well. It is up to you to choose your configuration. I have left this section on DNS for reference purposes, but will no longer maintain it. I believe that my DNS configuration files will not work with the latest version of named.
Setting up DNS is very straightforward. Your server (node1) will be the DNS server. It will resolve the names and IP addresses for the whole Beowulf cluster. The DNS configuration files can be downloaded from ftp://ftp.sci.usq.edu.au/pub/jacek/beowulf-utils. The configuration files listed are the ones I used on our Topcat system, but you can include them in your system if you don't mind using the same names for your nodes as I do. As you can see, I use the private IP address range 10.0.0.0/8, with the local subnet mask set to 255.255.255.0. Our domain will not be visible from outside (unless someone uses our node1 as their name server), so we can call it whatever we want. I chose beowulf.usq.edu.au for my domain name.
There are a few configuration files which you will have to modify for your DNS to work, and you can find them at ftp://ftp.sci.usq.edu.au/pub/jacek/beowulf-utils. After installing the configuration files, restart the named daemon by executing /etc/rc.d/init.d/named restart.
Test your DNS server:
[root@node1 /root]# nslookup node2
Server:  node1.beowulf.usq.edu.au
Address:  10.0.0.1

Name:    node2.beowulf.usq.edu.au
Address:  10.0.0.2

[root@node1 /root]# nslookup 10.0.0.5
Server:  node1.beowulf.usq.edu.au
Address:  10.0.0.1

Name:    node5.beowulf.usq.edu.au
Address:  10.0.0.5
/etc/hosts
If you decide not to use a DNS server, then you will have to enter all of the nodes and their corresponding IP addresses in the /etc/hosts file. If you use the disk-less client configuration, the setup_template and adcn scripts will create hard links to this file, so it will be used by all nodes. An example /etc/hosts file from Topcat is shown below.
127.0.0.1    localhost localhost.localdomain
139.x.x.x    topcat.x.x.x. topcat
10.0.0.1     node1.beowulf.usq.edu.au node1
10.0.0.2     node2.beowulf.usq.edu.au node2
10.0.0.3     node3.beowulf.usq.edu.au node3
10.0.0.4     node4.beowulf.usq.edu.au node4
10.0.0.5     node5.beowulf.usq.edu.au node5
10.0.0.6     node6.beowulf.usq.edu.au node6
10.0.0.7     node7.beowulf.usq.edu.au node7
10.0.0.8     node8.beowulf.usq.edu.au node8
10.0.0.9     node9.beowulf.usq.edu.au node9
10.0.0.10    node10.beowulf.usq.edu.au node10
10.0.0.11    node11.beowulf.usq.edu.au node11
10.0.0.12    node12.beowulf.usq.edu.au node12
10.0.0.13    node13.beowulf.usq.edu.au node13
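Rather than typing thirteen nearly identical entries by hand, a small shell loop can generate them. This is just a convenience sketch, using the node count, addresses and domain name from the Topcat example:

```shell
# Print /etc/hosts entries for client nodes 1..13 (counts and domain
# are taken from the Topcat example; adjust them for your cluster).
# Append the output to /etc/hosts after the localhost and server lines.
for i in $(seq 1 13); do
    printf '10.0.0.%s\tnode%s.beowulf.usq.edu.au node%s\n' "$i" "$i" "$i"
done
```

Redirect the output to a file, check it, and then paste it into /etc/hosts.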
/etc/resolv.conf
If you have a DNS server running on the master server, then your resolv.conf file should point to the local name server first. This is the /etc/resolv.conf I had when I ran DNS on Topcat:
search beowulf.usq.edu.au eng.usq.edu.au sci.usq.edu.au usq.edu.au
nameserver 127.0.0.1
nameserver 139.x.x.2
nameserver 139.x.x.3
This is my current /etc/resolv.conf file:
search eng.usq.edu.au sci.usq.edu.au usq.edu.au
nameserver 139.x.x.2
nameserver 139.x.x.3
/etc/hosts.equiv
In order to allow remote shells (rsh) from any node to any other node in the cluster, for all users, you should list all hosts in /etc/hosts.equiv.
node1.beowulf.usq.edu.au
node2.beowulf.usq.edu.au
node3.beowulf.usq.edu.au
node4.beowulf.usq.edu.au
node5.beowulf.usq.edu.au
node6.beowulf.usq.edu.au
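Since /etc/hosts.equiv is simply one hostname per line, it can also be generated with a loop (a sketch; six nodes as in the listing above):

```shell
# Print one fully-qualified node name per line; redirect the
# output into /etc/hosts.equiv (six nodes, as in the example).
for i in $(seq 1 6); do
    echo "node$i.beowulf.usq.edu.au"
done
```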
The general security policy for Beowulf clusters should be that all the nodes within the cluster fully trust each other. You can relax the security inside the cluster because none of the client nodes are directly connected to the outside world, and all nodes are basically the same. If someone hacks into the master node, they will not get any more information from any of the client nodes, so you don't have to worry about security at this level. It is practically impossible for anyone to access any of your client nodes without actually sitting at the console, or going via the server node first. The main advantages of relaxing security within the cluster are flexibility and ease of use and administration.

The server node, on the other hand, should trust its client nodes but not the outside world. There are a few things you can do to relax the security within the cluster and to protect yourself from the outside.
The tcpd daemon, commonly known as the TCP wrapper, is the first line of defense, and is the simplest way of limiting access to your machine and therefore increasing security. It comes as part of the Red Hat installation and is simple to configure. There are three configuration files: /etc/hosts.allow, which is checked for hosts which are allowed connections; /etc/hosts.deny, which is read if the host was not found in /etc/hosts.allow and lists hosts which are to be refused connections; and /etc/inetd.conf, which you should not have to modify to configure tcpd.
/etc/hosts.allow
The hosts_access(5) man page provides a good source of information on the syntax of these two files.
#
# hosts.allow   This file describes the names of the hosts which are
#               allowed to use the local INET services, as decided
#               by the '/usr/sbin/tcpd' server.
#

# we fully trust ourself and all the other nodes within the cluster
ALL : localhost, 10.0.0., 10.0.1., 10.0.2.
/etc/hosts.deny
The /etc/hosts.deny file is checked for matches when no match was found in /etc/hosts.allow. The best way of using the TCP wrappers is to deny everything that has not been allowed or matched by /etc/hosts.allow. In our case we not only match, and therefore deny, everything, but we also send an e-mail to the administrator for every denied connection.
ALL: ALL: spawn ( \
        echo -e "\n\
        TCP Wrappers\: Connection Refused\n\
        By\: $(uname -n)\n\
        Process\: %d (pid %p)\n\
        User\: %u\n\
        Host\: %c\n\
        Date\: $(date)\n\
        " | /bin/mail -s "From tcpd@$(uname -n). %u@%h -> %d." root)
If a connection is attempted from a host not listed in /etc/hosts.allow, the match will occur in /etc/hosts.deny, so the connection will be closed and I will receive an e-mail notification. An example of such an e-mail is shown below.
Date: Sat, 15 Aug 1998 15:31:08 +1000
From: Administrator <root@topcat.eng.usq.edu.au>
Message-Id: <199808150531.PAA20980@topcat.eng.usq.edu.au>
To: jacek@usq.edu.au
Subject: From tcpd@topcat.eng.usq.edu.au
X-Mozilla-Status: 0001
Content-Length: 197

On Sat Aug 15 15:31:08 EST 1998
user jacek from host agatka.usq.edu.au
attempted an unauthorised connection to topcat.eng.usq.edu.au.
Attempted connection was to process in.rlogind (pid 20972)
/etc/inetd.conf
A very simple but effective way of improving your security is to disable unwanted services. The rule of thumb is to disable everything you don't need. Most daemons are started by the inetd super server and can be turned off by commenting out the corresponding lines in inetd.conf. The example below shows part of an inetd.conf with login, exec, talk, and ntalk disabled.
shell   stream  tcp     nowait  root    /usr/sbin/tcpd  in.rshd
#login  stream  tcp     nowait  root    /usr/sbin/tcpd  in.rlogind
#exec   stream  tcp     nowait  root    /usr/sbin/tcpd  in.rexecd
#talk   dgram   udp     wait    root    /usr/sbin/tcpd  in.talkd
#ntalk  dgram   udp     wait    root    /usr/sbin/tcpd  in.ntalkd
After modifying inetd.conf you must restart the inetd daemon. The simplest way to do this on Linux is to send a hang-up signal to the daemon, which will force it to re-read its configuration file.
[root@topcat root]# killall -HUP inetd

Do not try this on other Unix systems without reading the killall man page first! You can check which daemons are running by getting a list of all listening ports. You can easily get this list by running:
[root@topcat root]# netstat -a | grep LISTEN | grep -v unix
Servers like httpd are started by rc scripts. Each of these should normally be disabled by deleting the corresponding links in the /etc/rc.d/rc*.d directories.
ipfwadm
The ipfwadm program allows you to block packets from specific IP addresses to specific ports, and is the most flexible way of controlling security. The example firewall rc script should be started automatically at boot time.
[root@topcat init.d]# cp /home/jacek/firewall /etc/rc.d/init.d
[root@topcat init.d]# chmod u+rx firewall
[root@topcat init.d]# ln -s /etc/rc.d/init.d/firewall /etc/rc.d/rc3.d/S05firewall
[root@topcat init.d]# ln -s /etc/rc.d/init.d/firewall /etc/rc.d/rc5.d/S05firewall
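The contents of the firewall script itself are not listed here; as a rough sketch (these ipfwadm rules are my own illustration, not the actual Topcat script), such a script typically sets a default deny policy on input and then explicitly accepts traffic from the trusted cluster network:

```shell
#!/bin/sh
# Sketch of a minimal /etc/rc.d/init.d/firewall using ipfwadm.
# These rules are illustrative assumptions, not the actual Topcat script.

/sbin/ipfwadm -I -f                          # flush any existing input rules
/sbin/ipfwadm -I -p deny                     # default input policy: deny
/sbin/ipfwadm -I -a accept -W lo             # accept loopback traffic
/sbin/ipfwadm -I -a accept -S 10.0.0.0/24    # fully trust the cluster network
/sbin/ipfwadm -I -a accept -P tcp -D 0.0.0.0/0 22   # allow inbound ssh (port 22)
```

Whatever rules you choose, make sure the private cluster network is accepted before the default deny takes effect, or the client nodes will lose access to the server.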