Thursday, April 2, 2015

Pi Cluster Week 1: Doopliss, a Hadoop-based Raspberry Pi cluster

Conveniently, just as the Prolog Enigma Machine project is coming to a close in Programming Languages, my Networking professor has set up a project for his course! I consider myself a total chump when it comes to the OSI model beyond IP, so instead of going for something application layer-y (such as expanding my PHP-based site), I chose to turn back the clock and pick an application that doesn't even require the web, except for downloading packages and the like: a Raspberry Pi "supercomputer" built from several Raspberry Pis.

This is very much an "exploratory" expedition in that I'm covering a lot of new ground here, including networking with the Raspberry Pi, configuring and running Hadoop programs, and networking multiple Raspberry Pis. My end goal is to reach a point where the cluster is "expandable", meaning I can readily add nodes.

Today we had time in class to work on the projects, so my first objective was to get to a point where I could ssh into my Pi, so I wouldn't have to use the only (gigantic) HD monitor, which was drawing the attention of my classmates. -_-;

Tutorial: SSH connection to Raspberry Pi

(The following instructions are for a PC running Ubuntu 12.04 and a Pi, both connected by ethernet.)

This was as easy as adding to the Pi's

/etc/network/interfaces

a few lines specifying an IP you'd like to use.
But first, you'll want to check the network your Linux machine is using by entering

ifconfig

and paying attention to the inet address.
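
If you're not sure what to look for, here's a rough sketch that pulls the address and netmask out of ifconfig output. The sample line and its 10.40.48.23 address are made up for illustration, and the pattern assumes the older net-tools format that Ubuntu 12.04 prints (inet addr:... Mask:...); other systems may format this differently.

```shell
# Print "<address> <netmask>" for each IPv4 line of net-tools style
# ifconfig output read from stdin.
inet_info() {
    sed -n 's/.*inet addr:\([0-9.]*\).*Mask:\([0-9.]*\).*/\1 \2/p'
}

# Hypothetical sample line, standing in for real `ifconfig eth0` output.
printf '          inet addr:10.40.48.23  Bcast:10.40.48.255  Mask:255.255.255.0\n' | inet_info

# On the PC itself you would run:
#   ifconfig eth0 | inet_info
```

The first value is the PC's own address and the second is the netmask that the Pi's configuration should match.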




Now, in the interfaces file, add a few lines to specify an IP from the same network. Using the readout above as an example, let's do this:

iface eth0 inet static
address 10.40.48.75
netmask 255.255.255.0
gateway 10.40.48.1


It's important that the netmask matches the one on the PC, but you'll need to change the address to one of your preference. I chose 75 for the last number. The gateway needs to be the address of your router, which is often (but not always) the first valid IP in the network. If you're unfamiliar with these words, I encourage you to learn more about this kind of networking (particularly IPs), as it may prove useful to you in other applications.
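
To make "from the same network" concrete, here is a small sketch of the check in plain shell arithmetic: AND each address with the netmask and compare the results. The function names are my own, not standard tools, and the sketch assumes well-formed dotted-quad input with no leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print the network part of an address (as an integer) for a given netmask.
network_of() {
    echo $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# Succeed if two addresses share a network under the given netmask.
# Usage: same_network ADDRESS_A ADDRESS_B NETMASK
same_network() {
    [ "$(network_of "$1" "$3")" -eq "$(network_of "$2" "$3")" ]
}

# Values from the interfaces example: the Pi's address, gateway, and netmask.
if same_network 10.40.48.75 10.40.48.1 255.255.255.0; then
    echo "address and gateway share a network"
fi
```

With a 255.255.255.0 netmask this boils down to "the first three numbers must match", which is why only the last number of the address is yours to pick.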


Next, make sure you enable ssh from

sudo raspi-config

under advanced options. At this point you can go ahead and reboot using the command

sudo shutdown -r now

The most annoying part about configuring a Raspberry Pi is that you'll find yourself rebooting... a lot. So make sure to find a proper diversion. Once we can successfully ssh in, we can distract ourselves with the other crap on our PC.

With the Pi rebooted, enter

hostname -I

to verify that the Pi is using the IP you asked for. If the IP you entered was taken, you'll likely end up with another one... I found that I was attempting to use another student's in the classroom, so it gave me a slightly different one.
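
If you'd rather not eyeball it, the check can be scripted. This is only a sketch: has_address is a helper name I made up, and it relies on hostname -I printing the addresses as a space-separated list.

```shell
# Succeed if the wanted address appears in a space-separated address list,
# such as the output of `hostname -I`.
has_address() {
    want=$1
    for addr in $2; do   # unquoted on purpose: split the list on whitespace
        if [ "$addr" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

# 10.40.48.75 is the address from the interfaces example above.
if has_address 10.40.48.75 "$(hostname -I)"; then
    echo "Pi is using the requested address"
else
    echo "Pi came up with a different address: $(hostname -I)"
fi
```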

Now, on the PC, use the command

ssh pi@<IP_Address>

inserting the IP address of the Raspberry Pi. The first time you connect, you should be asked to confirm (yes/no) that you trust the host, and then for the password of your Pi, but then you'll be in. You may now stow your monitor for most/all intents and purposes.

Configuring the first node for the cluster

From here I'll speak more generally about what I did for the rest of the class, which was configuring my Raspberry Pi as the first node of a Hadoop cluster. In few words, Hadoop is a Java-based framework for storing and processing large datasets across a cluster of machines. I followed this great tutorial up to running my first Hadoop program, which counted the words in the license file. Quite a bit more interesting than your average HelloWorld!
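
To give a feel for what the word count job computes, here is a single-machine analogy using an ordinary shell pipeline. This is not Hadoop and not the tutorial's code, just a sketch of the same map, shuffle, and reduce idea; word_count is my own name for it. Like Hadoop's bundled example, it splits on whitespace, is case-sensitive, and prints each word followed by a tab and its count.

```shell
# Count how often each whitespace-separated word appears on stdin.
word_count() {
    tr -s '[:space:]' '\n' |          # "map": emit one word per line
        grep -v '^$' |                # drop the empty line left by leading whitespace
        LC_ALL=C sort |               # "shuffle": bring identical words together
        uniq -c |                     # "reduce": count each group of identical words
        awk '{ print $2 "\t" $1 }'    # print as word<TAB>count
}

# A tiny made-up input in place of the license file.
printf 'to be or not to be\n' | word_count

# On a real file you would run something like:
#   word_count < LICENSE.txt
```

The difference with Hadoop is that the map and reduce steps are spread across the nodes of the cluster instead of running as one pipeline on one machine, which only starts to pay off once there is more than one node and a lot more data.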


During this time I had to add a new user to the Pi for interfacing with Hadoop, at which point I figured it was time to give the project a name of sorts. I decided to go with Doopliss, following the convention of taking one syllable from the software and then adding something useless at the end. It also happens to be the name of one of the many charming characters from Paper Mario: The Thousand-Year Door, a particularly annoying one who steals Mario's identity and refuses to give it back unless Mario can spell his name.

 My least favorite chapter, but one of my favorite boss themes.

An annoying character for what I imagine will evolve into an annoying project! I say that, but I am really interested in the potential of stacking several multi-core processors to analyze large datasets. Next week will be about working with hardware to add more physical nodes. I currently have three Pis in my possession, two of which belong to the school. I'm happy to say that I was able to run a program on my own (single node) supercomputer today, though!

All in two hours!
