Need a machine that can perform trillions of floating-point calculations per second? Or do you just want a cool story about how your personal supercomputer knocked out the lights in your village? Building a supercomputer is an interesting challenge if you're a wealthy genius with some spare time. Technically, a multiprocessor supercomputer is a network of computers that work together to solve a problem. This article briefly covers each stage of building one, focusing on hardware and software.
Steps
Step 1. First, find out what hardware components you will need
You will need one main node, at least a dozen identical compute nodes, an Ethernet switch, a power distribution unit (PDU), and a server rack. Also work out the electricity, cooling, and space requirements. Decide on the IP address range for the private network, the node names, the software packages you want to install, and the technology you will use to make the nodes work together on parallel computations (more on that below).
- Although the hardware you will need is expensive, the software in this guide is all free, and most of it is open source.
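- To estimate how fast your supercomputer will be in theory, multiply the number of nodes by the cores per node, the clock rate, and the floating-point operations each core can perform per cycle.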
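The theoretical figure is usually estimated as nodes × cores per node × clock rate × floating-point operations per cycle per core. The Python sketch below works through that arithmetic. The cluster it describes (12 nodes, 8 cores each, 2.5 GHz, 8 operations per cycle) is purely hypothetical, so substitute the values from your own CPU's data sheet.

```python
def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak: every core completes flops_per_cycle operations on every clock tick."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


# Hypothetical cluster; all four values are assumptions for illustration.
NODES = 12
CORES_PER_NODE = 8
CLOCK_HZ = 2.5e9
FLOPS_PER_CYCLE = 8  # depends on the CPU's vector units

rpeak = peak_flops(NODES, CORES_PER_NODE, CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"Theoretical peak: {rpeak / 1e12:.2f} TFLOPS")
```

Real programs never reach this number; it is only an upper bound to compare against the benchmark result you measure later.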
Step 2. Build compute nodes
You can assemble the compute nodes yourself or buy pre-built servers.
- Choose a server chassis that maximizes space, cooling, and power efficiency.
- Or you can use roughly a dozen retired servers. Working together they are far more useful than they are individually, and you can save quite a bit of money. All processors, network adapters, and motherboards should be identical so that the system runs smoothly. Of course, don't forget the RAM and storage for each node, and at least one optical drive for the main node.
Step 3. Mount the servers you have built in the server rack
Start at the bottom so that the rack does not become top-heavy. Ask a friend to help, because densely packed servers can be so heavy that they are difficult to slide onto the rails.
Step 4. Mount the Ethernet switch at the top of the server rack
Take this opportunity to configure it: enable jumbo frames of 9000 bytes, set the IP address to the static address you chose in step 1, and turn off unnecessary protocols such as IGMP snooping.
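To see what the larger frame size buys, the following Python sketch compares how much of each frame carries application data at the standard 1500-byte MTU and at 9000 bytes. It assumes TCP over IPv4 with no header options and untagged Ethernet frames; other protocols or VLAN tags would change the constants slightly.

```python
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20             # IPv4 and TCP headers without options (bytes)


def payload_efficiency(mtu):
    """Fraction of the bytes on the wire that carry application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload")
```

The gain in wire efficiency is modest. Jumbo frames also cut the number of packets each node has to process for the same amount of data, which matters when nodes exchange large messages.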
Step 5. Install the power distribution unit
You may need a 220-volt supply for high-performance computing, depending on how much current the nodes draw at maximum load.
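The relationship behind this is simply current = power / voltage. The Python sketch below applies it to a hypothetical cluster of 12 nodes drawing 350 W each at full load, and adds an assumed 80% safety margin on the circuit rating. All three figures are placeholders; check your hardware's data sheets and local electrical rules.

```python
def circuit_current(total_watts, volts):
    """Current drawn in amperes: I = P / V."""
    return total_watts / volts


NODES = 12            # assumed node count
WATTS_PER_NODE = 350  # assumed draw per node at maximum load
DERATING = 0.8        # assumed safety margin for continuous load

total_watts = NODES * WATTS_PER_NODE
for volts in (110, 220):
    amps = circuit_current(total_watts, volts)
    print(f"{volts} V: {amps:.1f} A, needs a circuit rated for at least {amps / DERATING:.1f} A")
```

Doubling the voltage halves the current for the same load, which is why the higher-voltage supply can be the practical choice for a full rack.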
Step 6. Once everything is installed, you can start the configuration process
Linux is the de facto operating system for high-performance computing clusters: besides being an ideal environment for scientific computing, it costs nothing to install. With hundreds or even thousands of nodes, licensing Windows for every one of them would be very expensive!
- Start by installing the latest version of the motherboard BIOS and firmware. The installed version must be the same on all nodes.
- Install the Linux distribution of your choice on each node, with a graphical interface on the main node. Popular choices are CentOS, openSUSE, Scientific Linux, Red Hat, and SLES.
- The author strongly recommends the Rocks Cluster Distribution. Besides installing all the programs your supercomputer needs to function, Rocks uses a nifty method to copy itself to every node by means of PXE boot and Red Hat's Kickstart procedure.
Step 7. Install the message-passing interface, resource manager, and other essential software libraries
If you didn't install Rocks in the previous step, you'll have to prepare the software needed to power the parallel computing mechanism yourself.
- First, you'll need a portable batch system such as the Torque Resource Manager, which distributes jobs among the machines.
- Pair Torque with the Maui Cluster Scheduler to complete the setup.
- Next, you need to install a message-passing interface (MPI), which lets the processes on separate compute nodes share the same data. Open MPI is an obvious choice.
- Don't forget the multithreaded math libraries and the compilers you need to build parallel programs. Or just install Rocks to make it even easier.
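Once the resource manager and MPI are in place, work is usually handed to the cluster as a job script. The Python sketch below only generates the text of such a script for Torque. It assumes an MPI implementation that provides `mpirun` (such as Open MPI), and the job name, node counts, and the `./hello_mpi` program are hypothetical.

```python
def render_pbs_script(job_name, nodes, cores_per_node, walltime, command):
    """Return the text of a Torque/PBS job script that runs `command` under mpirun."""
    total_ranks = nodes * cores_per_node
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                          # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # nodes and processes per node
        f"#PBS -l walltime={walltime}",                 # maximum run time
        "cd $PBS_O_WORKDIR",                            # directory the job was submitted from
        f"mpirun -np {total_ranks} -machinefile $PBS_NODEFILE {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 nodes with 8 processes each.
script = render_pbs_script("hello_mpi", nodes=4, cores_per_node=8,
                           walltime="00:10:00", command="./hello_mpi")
print(script)
```

Saved to a file, a script like this would be submitted with `qsub`, after which the scheduler decides when and where it runs.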
Step 8. Combine all compute nodes into a network
The main node sends computational tasks to the compute nodes, which must send back the results while also exchanging messages with each other. The faster this happens, the better.
- Use a private Ethernet network to connect all the nodes in your supercomputer cluster.
- The main node can also act as the NFS, PXE, DHCP, TFTP, and NTP server on this Ethernet network.
- You must separate this network from the public network so that the packets the cluster sends do not interfere with other traffic on your local network.
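The address plan from step 1 can be written down as name-to-address pairs, for example as lines for an `/etc/hosts` file. The Python sketch below is one way to generate them. The 10.1.1.0/24 range and the `head`/`node01` naming scheme are hypothetical, so replace them with your own plan.

```python
import ipaddress


def hosts_entries(network, head_name, node_prefix, node_count):
    """Give the first usable address to the main node and the following ones to compute nodes."""
    hosts = iter(ipaddress.ip_network(network).hosts())
    entries = [(str(next(hosts)), head_name)]
    for index in range(1, node_count + 1):
        entries.append((str(next(hosts)), f"{node_prefix}{index:02d}"))
    return entries


# Hypothetical private network and names; adjust to the plan from step 1.
for address, name in hosts_entries("10.1.1.0/24", "head", "node", 12):
    print(f"{address}\t{name}")
```

Keeping the plan in one generated list makes it easier to give the hosts file and the DHCP server the same view of the cluster.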
Step 9. Test the supercomputer you have created
Before letting others use your supercomputer, test its performance. HPL (High Performance Linpack) is a popular benchmark for measuring the computing speed of supercomputers. You will need to compile it from source with all the optimization options your compiler offers for the architecture you have chosen.
- For example, if you are using AMD CPUs, compile HPL from source using Open64 with the optimization level -Ofast.
- Check your test results against the list on TOP500.org to see how your supercomputer compares with the 500 fastest supercomputers in the world!
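HPL solves a dense N x N system of double-precision numbers, so the problem size has to fit in the cluster's memory. A common rule of thumb, treated here as an assumption, is to let the matrix fill about 80% of total RAM. The Python sketch below estimates N on that basis and expresses a measured result as a fraction of the theoretical peak. The node count, memory size, and block size of 192 are hypothetical.

```python
import math


def hpl_problem_size(nodes, mem_per_node_gib, mem_fraction=0.8, block_size=192):
    """Largest N (a multiple of block_size) whose N x N matrix of doubles fits in the budget."""
    budget_bytes = nodes * mem_per_node_gib * 2**30 * mem_fraction
    n = math.isqrt(int(budget_bytes // 8))  # 8 bytes per double-precision value
    return n - n % block_size


def efficiency(rmax_flops, rpeak_flops):
    """Measured HPL result as a fraction of the theoretical peak."""
    return rmax_flops / rpeak_flops


# Hypothetical cluster: 12 nodes with 32 GiB of RAM each.
print("Suggested N:", hpl_problem_size(12, 32))
```

The resulting N is only a starting point for the HPL input file; the best value and block size are normally found by experiment.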
Tips
- For high network speeds, take a look at the InfiniBand network interface. Of course, you have to be prepared to pay a premium price.
- IPMI can simplify the administration of large supercomputer clusters by providing KVM-over-IP, remote power cycle control, and other features.
- Use Ganglia to monitor compute load on nodes.