Friday, September 23, 2011

Oneiric server, Deploy Server fleets p2

Welcome to the second installment of this article series. In the first part we installed an Ubuntu server instance and turned it into an Orchestra installation server. If this is new to you, Orchestra is a new Oneiric server feature that lets admins very easily deploy fleets of Ubuntu servers. Let's pick up where the first article stopped.

First, let's check where we are. When you install the Orchestra server, it automatically downloads and imports various Ubuntu server ISOs and creates all the needed structure (distros, profiles, etc.) in the underlying cobbler system. Let's see what we have:

$ sudo cobbler list
distros:
   hardy-i386
   hardy-x86_64  
   lucid-i386
   lucid-x86_64  
   maverick-i386 
   maverick-x86_64
   natty-i386
   natty-x86_64  
   oneiric-i386  
   oneiric-x86_64

profiles:
   hardy-i386
   hardy-i386-juju
   hardy-x86_64
   hardy-x86_64-juju
   lucid-i386
   lucid-i386-juju
   lucid-x86_64
   lucid-x86_64-juju
   maverick-i386
   maverick-i386-juju
   maverick-x86_64
   maverick-x86_64-juju
   natty-i386
   natty-i386-juju
   natty-x86_64
   natty-x86_64-juju
   oneiric-i386
   oneiric-i386-juju
   oneiric-x86_64
   oneiric-x86_64-juju

systems:

repos:
   hardy-i386
   hardy-i386-security
   hardy-x86_64
   hardy-x86_64-security
   lucid-i386
   lucid-i386-security
   lucid-x86_64
   lucid-x86_64-security
   maverick-i386
   maverick-i386-security
   maverick-x86_64
   maverick-x86_64-security
   natty-i386
   natty-i386-security
   natty-x86_64
   natty-x86_64-security
   oneiric-i386
   oneiric-i386-security
   oneiric-x86_64
   oneiric-x86_64-security

images:

mgmtclasses:
   orchestra-juju-acquired
   orchestra-juju-available
Whoa! That sure makes my life easier. If you're interested in seeing where the ISOs were downloaded (like I was), here you are:
ls /var/lib/cobbler/isos/
hardy-i386-mini.iso    lucid-i386-mini.iso    maverick-i386-mini.iso    natty-i386-mini.iso    oneiric-i386-mini.iso
hardy-x86_64-mini.iso  lucid-x86_64-mini.iso  maverick-x86_64-mini.iso  natty-x86_64-mini.iso  oneiric-x86_64-mini.iso

Let's create a new VirtualBox VM to serve as our new "server" that needs to be installed. Here's how it looks for me:
12-oneiric01-vboxsettings

One thing worth noting, however, is that the NIC is placed on the "intnet" network, which has the IP range 192.168.77.0/24 that we configured in the first part of this article.
13-vbox-natty01-netsettings

Now the only "real" thing you have to do is add a system entry on the Orchestra server for your new bare server. The entry binds its MAC address to a name and an installation profile (think OS to install, kickstart, etc.):

sudo cobbler system add --name="oneiric01.ubuntu.lan" --mac-address="08:00:27:B7:76:2A" --ip-address="192.168.77.33" --dns-name="oneiric01.ubuntu.lan" --hostname="oneiric01.ubuntu.lan" --profile="oneiric-x86_64-juju" --mgmt-classes="orchestra-juju-available" --kopts=" DEBCONF_DEBUG=developer netcfg/dhcp_timeout=120 netcfg/choose_interface=eth0"
Boot the server and choose PXE (for VirtualBox that's F12, then "l", that's a lowercase L).

14-natty-PXEbooting

Watch the installer fly by (look ma, hands free!)

15-installer-running

and your box is ready!
16-Oneiric01-ready

That's how easy it is to install a fresh server off your Orchestra box! So basically the only thing you need to do per server is attach it to a profile, and that's it. Boot it and it installs whatever you provisioned for it. Of course any good admin has already done this manually before, but it took effort and it wasn't standardized. Now you can count on Ubuntu server covering your back when you're tasked with installing a hundred servers.
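
With more than a handful of machines, typing that `cobbler system add` line per server gets old quickly. Here is a minimal Python sketch that builds the same command for a list of hosts. It reuses only the flags from the command shown earlier; the second host in `FLEET` and the helper name `cobbler_system_add` are made up for illustration. It only prints the commands, it does not run them.

```python
import ipaddress
import shlex

# The install network configured in part 1 of this series.
INSTALL_NET = ipaddress.ip_network("192.168.77.0/24")

# Kernel options copied from the command shown earlier in this post.
KOPTS = "DEBCONF_DEBUG=developer netcfg/dhcp_timeout=120 netcfg/choose_interface=eth0"

# Hypothetical fleet: (fqdn, MAC address, IP address). Replace with your own.
FLEET = [
    ("oneiric01.ubuntu.lan", "08:00:27:B7:76:2A", "192.168.77.33"),
    ("oneiric02.ubuntu.lan", "08:00:27:B7:76:2B", "192.168.77.34"),
]


def cobbler_system_add(fqdn, mac, ip, profile="oneiric-x86_64-juju"):
    """Return the argument list for one 'cobbler system add' call."""
    if ipaddress.ip_address(ip) not in INSTALL_NET:
        raise ValueError(f"{ip} is outside {INSTALL_NET}")
    return [
        "sudo", "cobbler", "system", "add",
        f"--name={fqdn}",
        f"--mac-address={mac}",
        f"--ip-address={ip}",
        f"--dns-name={fqdn}",
        f"--hostname={fqdn}",
        f"--profile={profile}",
        "--mgmt-classes=orchestra-juju-available",
        f"--kopts={KOPTS}",
    ]


for host in FLEET:
    # shlex.join quotes the arguments so each line can be pasted into a shell.
    print(shlex.join(cobbler_system_add(*host)))
```

The address check is just a guard against typos: a machine with an IP outside the install network would never be reachable from the Orchestra box anyway.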

How cool was that! Got thoughts, comments or rotten tomatoes? Shoot me a comment.

Wednesday, September 21, 2011

Oneiric server, Deploy Server fleets p1

I'm gonna be posting a series of articles on new features and cool technology bits that are landing in Ubuntu Oneiric (11.10) server. Why? I like servers, I like cloud, I like Ubuntu, it all mixes well, what's not to like :)

During this first article, I'll be demoing (in a graphically intensive way :) what it takes (hint: not much!) to deploy a server fleet with Oneiric server. Orchestra is the name of a wonderful piece of technology landing in Oneiric, built on top of the open-source cobbler project. Orchestra is super easy to install and get started with, and enables you to very rapidly deploy tens or hundreds of physical servers. I'll be using VirtualBox to build a small test "lab" on my laptop for the purposes of this article. I did actually try KVM first, but had trouble getting PXE booting to work reliably, so I opted for VirtualBox, which worked flawlessly (kudos vbox guys, you rock!)

Let's get started. I created a VM to represent the very first "head node", the one that will install all the other nodes. Here is a summary of its configuration:
1-orchestra
Pop in the virtual CD, boot it, press F6, add "priority=critical locale=en_US url=http://bit.ly/uquick" (Thanks Dustin!) so it looks like
2-orchestra-bootoptions
The uquick profile answers all the installer questions, such that the installation is fully automatic. Since the VM contains two NICs however, we'll need to select a primary one (eth0 in my case)
3-orchestra-whicheth
The installation runs like a champ, fully automated, give it a few minutes till it finishes everything and reboots into the server OS (oh that was easy!)
4-orchestra-login
Now I configure eth1 with a static IP address of 192.168.77.1/24 (I just made that address up). Here is a snapshot of /etc/network/interfaces, after which I brought eth1 up using ifup:
5-orchestra-eth1up
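
In case the screenshot doesn't show up for you, here is a small Python sketch that prints what a static stanza for that address typically looks like in /etc/network/interfaces. This is an assumption about the file's contents, not a copy of the screenshot, and the helper name `static_stanza` is made up for illustration.

```python
import ipaddress


def static_stanza(iface, cidr):
    """Render an ifupdown 'static' stanza for /etc/network/interfaces."""
    addr = ipaddress.ip_interface(cidr)
    return "\n".join([
        f"auto {iface}",
        f"iface {iface} inet static",
        f"    address {addr.ip}",
        f"    netmask {addr.netmask}",
    ])


# The interface and address used in this article.
print(static_stanza("eth1", "192.168.77.1/24"))
```

There is no gateway line on purpose: eth1 only faces the internal install network.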
At this stage, I rebooted the server but you definitely don't have to. Let's start actually installing Orchestra

sudo apt-get update
sudo apt-get install ubuntu-orchestra-server -y

Everything proceeds automatically, for any question you get during package installation, I'll provide a picture with the answer :)
6-cobbler-password
7-nextserver
8-enable-dns-dhcp
9-dhcp-range
10-dhcp-gw
11-domain-name

That's it! You've just installed and configured your first Ubuntu Orchestra server, and you're now ready to install a fleet of Ubuntu servers the easy way! In part 2 of this article, I'll go through creating a second server, PXE booting and installing it from the orchestra server. (Extra credit: If you can't wait, try PXE booting a fresh server right now. Note that after installation, orchestra actually downloads and auto-imports a few Ubuntu mini ISOs, thus will need a few minutes depending on your internet connection speed)

So, what do you think of this coolness? Is this easier than the last time you tried building yourself an automated network installer? Shoot me a comment, let me know what you think

Monday, September 12, 2011

Torrent download Cloud appliance

The Why



A friend of mine, who's a Linux systems geek as well, was tasked with building a library of Linux distro ISOs. This involved downloading tens of ISOs, many of which were only offered in torrent form. Even if that were not the case, it would still be good practice to download such large binaries over torrent to avoid loading any one mirror too much. Anyway, we were chatting about it, and since bandwidth (especially upload) is a scarce resource where I live, he was considering paying some service to download the torrents he needed and convert them to HTTP!

I mentioned I could build something to do just that in about an hour! It wouldn't even be complex. Armed with Ensemble I can very simply launch an EC2 instance, install rtorrent (my fav cli torrent client) and rtgui (rtorrent Web UI) and have it ready to crunch on any of your torrenting needs. We both became interested in seeing how well that would work and so here we go...

The How


I'll assume you already know how to get started with Ensemble. Let's see what it takes to deploy my torrent appliance

$ bzr branch lp:~kim0/+junk/rtgui
$ ensemble bootstrap
# Wait for ec2 to catch up (2~5 mins)
$ ensemble deploy --repository . rtgui
$ ensemble expose rtgui

That is basically all you need to "use" this appliance! Give another few minutes for the rtgui appliance to boot, install and configure itself. You can check status with

$ ensemble status 
2011-09-12 13:23:53,868 INFO Connecting to environment.
machines:
  0: {dns-name: ec2-50-19-19-234.compute-1.amazonaws.com, instance-id: i-2c37ee4c}
  1: {dns-name: ec2-107-20-96-125.compute-1.amazonaws.com, instance-id: i-1c28f17c}
services:
  rtgui:
    exposed: true
    formula: local:rtgui-9
    relations: {}
    units:
      rtgui/0:
        machine: 1
        open-ports: [80/tcp, 55556/tcp, 55557/tcp, 55558/tcp, 55559/tcp, 55560/tcp,
          6881/udp]
        relations: {}
        state: started
2011-09-12 13:24:01,334 INFO 'status' command finished successfully

The important bit to watch for is "state: started". If it's something else, that means the EC2 instance is still being configured. It's nice to note which ports have been opened: 55556-55560 since rtorrent is configured to use those ports, port 80 for the Web UI, and port 6881 UDP for the DHT network. I am in no way a torrent expert, so this could be completely unoptimized, but hey, it seems to work.
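
If you'd rather not eyeball the output, a few lines of Python can pull the unit states out of it. This is a rough sketch that works on the text format shown above with plain regular expressions; the function names are made up for illustration, and it is not part of Ensemble.

```python
import re

# A trimmed copy of the 'ensemble status' output shown above.
SAMPLE_STATUS = """\
services:
  rtgui:
    exposed: true
    units:
      rtgui/0:
        machine: 1
        relations: {}
        state: started
"""


def unit_states(status_text):
    """Map each unit name (e.g. 'rtgui/0') to its 'state:' value."""
    states = {}
    unit = None
    for line in status_text.splitlines():
        unit_match = re.match(r"\s+([\w-]+/\d+):\s*$", line)
        state_match = re.match(r"\s+state:\s*(\S+)", line)
        if unit_match:
            unit = unit_match.group(1)
        elif state_match and unit is not None:
            states[unit] = state_match.group(1)
    return states


def all_started(status_text):
    """True only if at least one unit exists and every unit is 'started'."""
    states = unit_states(status_text)
    return bool(states) and all(s == "started" for s in states.values())


print(unit_states(SAMPLE_STATUS))
print(all_started(SAMPLE_STATUS))
```

You could feed it the real output by capturing `ensemble status` in a file or through a pipe, then poll until `all_started` returns true.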

Ready to test? Machine 1 runs rtgui, so go ahead and visit it in a browser. For me that's http://ec2-107-20-96-125.compute-1.amazonaws.com/rtgui (replace that DNS name with the right one for your instance, and don't forget the trailing /rtgui like I always do). Click "Add torrent" and pass it the URL to a torrent file; I'm gonna be testing with the Ubuntu 11.10 beta1 amd64 torrent file. Once the torrent is added, click the green play button to start it. Since EC2 instances have quite some bandwidth available to them, this Ubuntu torrent downloaded in just a few seconds. I am shipping a default configuration with rtorrent that limits upload speed to 100KB (since you're paying for bandwidth), but you can change that from the web UI. Here's how the whole thing looks


Once a torrent has finished downloading, you can fetch its files over HTTP from http://ec2-107-20-96-125.compute-1.amazonaws.com/complete (again, replace the machine DNS name with the correct name in your case).
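
If you want to script that last step, here is a minimal Python sketch. It assumes finished files sit directly under /complete with their original names, which I have not verified beyond what is described above; the function names and the sample file name are made up for illustration.

```python
import shutil
import urllib.parse
import urllib.request


def complete_url(dns_name, filename):
    """Build the URL of a finished download under /complete."""
    return f"http://{dns_name}/complete/{urllib.parse.quote(filename)}"


def fetch(dns_name, filename):
    """Stream one finished file from the appliance into the current directory."""
    with urllib.request.urlopen(complete_url(dns_name, filename)) as resp, \
            open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)


# Hypothetical file name; use whatever shows up under /complete for you.
print(complete_url("ec2-107-20-96-125.compute-1.amazonaws.com", "some distro.iso"))
```

Quoting the file name matters because torrent contents often have spaces or brackets in their names.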

A single torrent appliance is of course not limited to a single torrent! You can keep adding as many as you want; however, eventually you're going to hit some limit (disk IO, network IO, disk space, etc.). As such (probably only if you're after downloading a really large number of torrents) you may need to "scale up" this torrent download appliance (well, it's a cloud for God's sake!). If that's what you wish for, you only need to
$ ensemble add-unit rtgui

Simple as that, as with everything Ensemble! So now you know how you can download your 11.10 copy without loading Ubuntu's servers. Actually, you'd be helping them and the millions of Ubuntu users if you use this method on release day. Once you're done playing with the appliance, you need to destroy it (to stop paying Amazon for the machines)

$ ensemble destroy-environment
WARNING: this command will destroy the 'sample' environment (type: ec2).
This includes all machines, services, data, and other resources. Continue [y/N]y
2011-09-12 13:53:33,018 INFO Destroying environment 'sample' (type: ec2)...
2011-09-12 13:53:36,641 INFO Waiting on 2 EC2 instances to transition to terminated state, this may take a while
2011-09-12 13:54:18,617 INFO 'destroy_environment' command finished successfully

Want to improve it?


Things I wish I had time to improve:
  • Once a file is downloaded, upload it to S3. You can then terminate the appliance, and still download the files at your own pace
  • Parameterize rtorrent rc configuration file, such that you can pass it parameters from Ensemble (such as upload rate...etc)
  • Integrate notification upon download completion (SMS me, email me, IM me ...etc)
  • Add an auto-redirect to /rtgui :)
  • Figure out a way to download completed files from within the rtgui web UI

If you're interested in improving that appliance, drop by in #ubuntu-ensemble on IRC/Freenode and ping me (kim0) or any of the friendly folks around.

What kind of skills do you need to hack on that project? Just bash shell scripting foo! The feature I love most about Ensemble as a cloud orchestration tool, is that it doesn't twist you into using some abstracted syntax. You get to write in whatever language you feel like using, for me that's bash. You can find the script that does all of the above right here.

Interested in learning more about Ensemble and automating Ubuntu server deployments in the cloud or on physical servers?
Want to hack on this torrent appliance, or do something similar?
Have comments or a better idea?

Let me know about it, just drop me a comment right here! You can also grab me (kim0) over Freenode irc

Sunday, August 21, 2011

Battling Hunger in the Horn of Africa

Hungry children in the Horn of Africa

A humanitarian crisis has slowly unfolded in the Horn of Africa. Drought, conflict, and rising food prices have affected more than 13 million people in the region. On 20 July, famine conditions were declared in several southern regions of Somalia. The Food Security and Nutrition Analysis Unit (FSNAU) forecasts that famine conditions will spread if humanitarian assistance does not increase. In response, WFP is planning to feed over 11.5 million people, including 3.7 million people in Somalia, 3.7 million in Ethiopia, and 2.7 million in Kenya.

Restricted aid access

Humanitarian aid organizations have restricted access to some vital areas. The hatched area on the map shows areas in which some aid organizations are unable to work, including the places where people are most in need of assistance.

Operational efficiency

The figure of USD $0.50 per person per day is based on the average combined daily costs of World Food Programme's operations within Somalia, Ethiopia, & Kenya, as well as the number of people reached by those efforts.


If you cannot see the embedded map above, click here: http://horn.wfp.org/main.html

Save a child today!

How can Govs help FOSS businesses


Dear Lazyweb,

A group of FOSS ambassadors (of which I am one) have been invited to visit government officials who are "interested" in open-source. The goal will be to pitch open-source and why adopting it would have various benefits on a national level.

A more specific point that should be discussed is how the government can help local FOSS businesses grow, in order to support and grow a FOSS-oriented ecosystem (developers, support professionals, VARs...).

If you were in that meeting, what points would you make?

Thanks!

Monday, August 8, 2011

Ensemble meets Hadoop on the cloud

Hadoop

So you wanted to play with hadoop to crunch on some big-data problems, except that, well, getting a hadoop cluster up and running is not exactly a one minute thing! Let me show you how to make it "a one minute thing" using Ensemble! Since Ensemble now has formulas for creating hadoop master and slave nodes, spinning up a hadoop cluster could not be easier! Check this video out


If you can't see the embedded video, here's a direct link http://youtu.be/e8IKkWJj7bA

Yep that's how simple it is! If you want to scale-out the cluster, you only need to ask Ensemble to do it for you:
$ ensemble add-unit hadoop-slave

So is this easier than configuring a hadoop cluster manually? Leave me a comment, let me know your thoughts! Also let me know what you'd like to see deployed next with Ensemble

Friday, July 29, 2011

Ubuntu takes UFOs to the cloud


As a kid I always believed in UFOs, and while I've never seen one (yet?) I am still more on the believer side! I was interested to stumble upon a database of UFO sightings at http://www.infochimps.com/tags/ufo# (a shout-out to infochimps, you guys are great!). Downloading the sightings DB (around 80MB), I found a listing of 60,000 documented sightings, hmm interesting! I started thinking I could crunch on this data in some useful and fun way. What about finding the most commonly spotted UFO shape?! Sounds like I could use hadoop for that, just for the coolness factor really. The data is not that large anyway, but hey, why not! I had no idea how to get started with hadoop though, and wasn't really interested in learning all the gory details!

Well Ensemble to the rescue, hadoop master and slave formulas exist, which means someone else packaged the knowledge needed to setup and run a hadoop cluster for me. All I needed to do was ask Ensemble to deploy me a couple of cloud instances and start playing. Let's see how you can do that for yourself

I won't repeat the instructions to get started with Ensemble, since the documentation is a good place for that (and it's so easy anyway!). If you feel you need more help there, this little video should be helpful. If you're still stuck, you can always drop by on irc/freenode at #ubuntu-ensemble and ask your questions

Hadoop node, with an extra slave please


So, let's start ensembling
bzr branch lp:~negronjl/+junk/hadoop-master
bzr branch lp:~negronjl/+junk/hadoop-slave
ensemble bootstrap
wait a minute or two for EC2 to spin up the instance, then
ensemble status
which’ll give you output like
$ ensemble status
2011-07-12 15:20:54,978 INFO Connecting to environment.
The authenticity of host 'ec2-50-17-28-19.compute-1.amazonaws.com (50.17.28.19)' can't be established.
RSA key fingerprint is c5:21:62:f0:ac:bd:9c:0f:99:59:12:ec:4d:41:48:c8.
Are you sure you want to continue connecting (yes/no)? yes
machines:
  0: {dns-name: ec2-50-17-28-19.compute-1.amazonaws.com, instance-id: i-8bc034ea}
services: {}
2011-07-12 15:21:01,205 INFO 'status' command finished successfully

Now let's deploy a two node hadoop cluster
ensemble deploy --repository . hadoop-master
ensemble deploy --repository . hadoop-slave
ensemble add-relation hadoop-master hadoop-slave

Yeah it's that easy! Ensemble formulas manage all the kung-fu for you. The hadoop cluster is ready, let's ssh into the master node and switch to user hdfs
ensemble ssh hadoop-master/0
sudo -su hdfs

Downloading UFOs


Download the infochimps sightings database here, unzip it and locate the TSV (tab separated values) file. Note that you can download the file from infochimps without registering on their website (didn't I say these guys were great :)

Upload the TSV DB to hadoop's distributed filesystem
hadoop dfs -copyFromLocal ufo_awesome.tsv ufo_awesome.tsv

Almost ready, the corpus has been uploaded. Now we need to write some map/reduce jobs to do the actual crunching. Not being a pro developer, the thought of writing that in java was like (oh no ew), so python to the rescue! Thanks to the great instructions at Michael Noll's blog, I was able to massage some of that code to get it to do what I wanted. I pushed my code to launchpad, so that you can grab it directly from the hadoop master node

cd /tmp
bzr branch lp:~kim0/+junk/ufo-ensemble-cruncher
cd ufo-ensemble-cruncher
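
The branch above has the real scripts. To give an idea of what a streaming mapper and reducer for this kind of job can look like, here is a minimal sketch; it is not the code from that branch. It assumes the shape is the 4th tab-separated column of ufo_awesome.tsv, and it packs both steps into one file for brevity, picking the step from a command-line argument.

```python
import sys
from itertools import groupby

# ASSUMPTION: the shape is the 4th tab-separated column of ufo_awesome.tsv.
SHAPE_COLUMN = 3


def mapper(lines):
    """Emit 'shape<TAB>1' for every record that has a non-empty shape."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > SHAPE_COLUMN:
            shape = fields[SHAPE_COLUMN].strip().lower()
            if shape:
                yield f"{shape}\t1"


def reducer(lines):
    """Sum the counts per shape; the input must already be sorted by shape."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for shape, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{shape}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: script.py map < records   or   script.py reduce < sorted_pairs
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Hadoop streaming sorts the mapper output by key before handing it to the reducer, which is why `groupby` is enough here. To try the logic without a cluster, `reducer(sorted(mapper(lines)))` on a few sample lines does the same thing in memory.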

Unleashing the elephant


Now for the big moment, let's launch the elephant
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-*.jar -file ./mapper.py -mapper mapper.py -file ./reducer.py -reducer reducer.py -input ufo_awesome.tsv -output ufo-output
packageJobJar: [./mapper.py, ./reducer.py, /tmp/hadoop-hdfs/hadoop-unjar1418682529553378062/] [] /tmp/streamjob5701745574334998473.jar tmpDir=null
11/07/29 12:27:52 INFO mapred.FileInputFormat: Total input paths to process : 1
11/07/29 12:27:53 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hdfs/mapred/local]
11/07/29 12:27:53 INFO streaming.StreamJob: Running job: job_201107290935_0010
11/07/29 12:27:53 INFO streaming.StreamJob: To kill this job, run:
11/07/29 12:27:53 INFO streaming.StreamJob: /usr/lib/hadoop-0.20/bin/hadoop job  -Dmapred.job.tracker=domU-12-31-39-10-81-8E.compute-1.internal:8021 -kill job_201107290935_0010
11/07/29 12:27:53 INFO streaming.StreamJob: Tracking URL: http://domU-12-31-39-10-81-8E.compute-1.internal:50030/jobdetails.jsp?jobid=job_201107290935_0010
11/07/29 12:27:54 INFO streaming.StreamJob:  map 0%  reduce 0%
11/07/29 12:28:11 INFO streaming.StreamJob:  map 10%  reduce 0%
11/07/29 12:28:12 INFO streaming.StreamJob:  map 19%  reduce 0%
11/07/29 12:28:14 INFO streaming.StreamJob:  map 72%  reduce 0%
11/07/29 12:28:16 INFO streaming.StreamJob:  map 100%  reduce 0%
11/07/29 12:28:33 INFO streaming.StreamJob:  map 100%  reduce 100%
11/07/29 12:28:37 INFO streaming.StreamJob: Job complete: job_201107290935_0010
11/07/29 12:28:37 INFO streaming.StreamJob: Output: ufo-output

Woohoo, success! Now let's grab the results, sorting them so it's easy to see the most popular sighting shape

Is the answer really 42


hadoop dfs -cat ufo-output/part-00000 | sort -k 2,2 -nr
light   12202
triangle        6082
circle  5271
disk    4825
other   4593
unknown 4490
sphere  3637
fireball        3452
oval    2869
formation       1788
cigar   1782
changing        1546
flash   990
cylinder        982
rectangle       966
diamond 915
chevron 760
egg     664
teardrop        595
cone    265
cross   177
delta   8
round   2
crescent        2
pyramid 1
hexagon 1
flare   1
dome    1
changed 1

The answer is "light" then! Wow, that was a blast! I had fun doing this exercise. Now I am no hadoop expert in any way (so direct those hadoopy questions to someone who can actually answer them), however I was quite pleased Ensemble could help me get up and running that fast. The Ensemble community is doing a great job wrapping lots of free software in formulas, such that you can get up and running with any app you need in seconds rather than days (months?). You too can write Ensemble formulas for your favorite (server?) application. Hop on to #ubuntu-ensemble and grab me (kim0) or any of the dev team and ask any questions on your mind! We're a happy community

So was that fun? Can you think of something cooler you want to see done? Leave me a comment, let me know about it