Friday, November 23, 2012

High-Availability Nginx using Heartbeat on Amazon EC2

Background: I have mentioned earlier that we were baffled by Amazon's Elastic Load Balancer (ELB). We tested Nginx as a reverse proxy, and it was hands down better than ELB. (I will cover Nginx performance in another post.) To scale this setup horizontally, we ran multiple Nginx servers behind a Weighted Round Robin (WRR) DNS configuration, to take some load off each Nginx server.

We later decided to run redundant Nginx servers in an N+1 setup, so that if one server goes down, the standby takes over. I stumbled upon Heartbeat. Unfortunately, there was no good tutorial on how to do this on Amazon EC2.

Warning: I will keep this tutorial brief and limited to Heartbeat configuration. As the Linux-HA wiki mentions:
In order to be useful to users, the Heartbeat daemon needs to be combined with a cluster resource manager (CRM) which has the task of starting and stopping the services (IP addresses, web servers, etc.) that the cluster will make highly available. Pacemaker is the preferred cluster resource manager for clusters based on Heartbeat.
Goal: We need to make sure the auxiliary node takes over as soon as the main node dies. This is what should happen:
  1. Initially, we assign an Elastic IP (EIP) to the main node that runs Nginx.
  2. When the main node goes down, we want
    1. the aux node to get assigned the EIP (see the /etc/init.d/updateEIP script below), and
    2. the aux node's Nginx to start up (see the /etc/init.d/nginx script below).
  3. When the main node comes back up,
    1. the aux node relinquishes the EIP,
    2. the aux node shuts down Nginx,
    3. the main node acquires the EIP,
    4. the main node starts Nginx. 
Preparation: I assume you have launched two instances of your favorite Unix flavor.
  1. Get an Elastic IP. We will use this to switch over.
    ec2-allocate-address
    ADDRESS 50.17.213.152 
  2. Install Nginx on both machines. I will install from source, downloaded from http://nginx.org/en/download.html:
    wget http://nginx.org/download/nginx-1.2.5.tar.gz
    tar xvzf nginx-1.2.5.tar.gz 
    cd nginx-1.2.5
    ./configure 
    make
    make install
  3. Configure Nginx
    cd /usr/local/nginx/conf
    mv nginx.conf nginx.conf.orig
    vi nginx.conf

    My nginx.conf looks like this.
    worker_processes  2;
    worker_rlimit_nofile 100000;
    
    events {
        worker_connections  1024;
    }
    
    http {
    
      access_log off;
    
      upstream myapp {
        server ip-10-10-225-146.ec2.internal weight=10 max_fails=3 fail_timeout=10s;  # Reverse proxy to app01
        server ip-10-226-95-45.ec2.internal weight=10 max_fails=3 fail_timeout=10s;   # Reverse proxy to app02
        server ip-10-244-145-130.ec2.internal weight=10 max_fails=3 fail_timeout=10s; # Reverse proxy to app03
      }
    
      server {
        listen 80;
        add_header X-Whom node_1;
        location / {
          proxy_pass http://myapp;
        }
      }
    }
    This configuration is the same on both Nginx servers except for one thing: the add_header X-Whom node_1; part. This identifies which Nginx load balancer is serving, which will help us debug later. On the second Nginx, we have add_header X-Whom node_2;. This directive tells Nginx to inject an X-Whom header into each response.
  4. Install Heartbeat:
    #RHEL, CentOS
    yum install heartbeat
    
    #Debian, Ubuntu
    sudo apt-get install heartbeat-2

Configure Heartbeat: There are three files you need to configure to get things working with Heartbeat.
  1. ha.cf: /etc/ha.d/ha.cf is the main configuration file. It lists the nodes and the features to be enabled. Note that its directives are order sensitive. 
  2. authkeys: This maintains the security of the cluster. It provides a means to authenticate the nodes in a cluster. 
  3. haresources: The list of resources to be managed by Heartbeat. A line looks like preferredHost service1 service2, where preferredHost is the hostname on which you prefer the subsequent services to run. service1 and service2 are init scripts that live in /etc/init.d/ and start or stop the service when called as /etc/init.d/service1 start|stop.

    When a node comes up, the services are started from left to right: service1 first, then service2. When a node is brought down, service2 is stopped first, then service1.
---------
For the sake of simplicity, let's call one node the main node and the other the aux node.

On main node,
uname -n
ip-10-144-75-85

On aux node,
uname -n
ip-10-36-5-11

----
/etc/ha.d/ha.cf
On main node,
logfile /tmp/ha-log
debugfile /tmp/ha-debug
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth0 10.36.5.11
ucast eth0 10.144.75.85
auto_failback on
node ip-10-36-5-11
node ip-10-144-75-85

On the aux node, exactly the same as on the main node:
logfile /tmp/ha-log
debugfile /tmp/ha-debug
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth0 10.36.5.11
ucast eth0 10.144.75.85
auto_failback on
node ip-10-36-5-11
node ip-10-144-75-85

where,
  deadtime: seconds after which a host is considered dead if not responding.
  warntime: seconds after which the late-heartbeat warning is issued.
  initdead: seconds to wait for the other host after starting Heartbeat, before it is considered dead.
  udpport: port number used for bcast/ucast communication. The default is 694.
  bcast/ucast: interface on which to broadcast/unicast.
  auto_failback: if 'on', resources are automatically failed back to their primary node.
  node: a node in the HA set-up, identified by uname -n.
----

/etc/ha.d/authkeys must be the same on both machines. You may generate a random secret key using date | md5sum.

authkeys on both the machines:
auth 1
1 sha1 1e8d28a4627ed7f83faf1d57f5b11645
----

/etc/ha.d/haresources

haresources on both the machines:
ip-10-144-75-85 updateEIP nginx
updateEIP and nginx are shell scripts that I wrote and placed under /etc/init.d/.

updateEIP
#!/bin/bash

# description: this script associates $eip to this instance.
# it assumes you have JDK and AWS API-tools installed.
# location of these are given as $JAVA_HOME and $EC2_HOME
# author: Nishant Neeraj

export EC2_HOME=/home/ec2
export JAVA_HOME=/usr/java/default

eip="50.17.213.152"
pk="/mnt/pk-2ITPGLG6XXXXXXXXXXXXQUEK2PUOVFJB.pem"
cert="/mnt/cert-2ITPGLG6XXXXXXXXXXXXQUEK2PUOVFJB.pem"

function updateEIP(){
        instance="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"
        echo "Instace ID is: ${instance}"

        echo "Assingning $eip to ${instance}..."
        /home/ec2/bin/ec2-associate-address -K ${pk} -C ${cert} -i ${instance} ${eip}

        echo "done!"
}

param=$1

if [ "start" == "$param" ] ; then
  echo "Starting..."
  updateEIP
  exit 0
elif [ "stop" == "$param" ] ; then
  echo "stopping..."
  exit 0;
else
  echo "no such command $param"
  exit 1
fi
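Incidentally, the same association can be done without shelling out to the EC2 API tools. Here is a minimal sketch using the AWS SDK for Java -- this is my own illustration, not what the script above uses; it assumes the aws-java-sdk jar is on the classpath and uses placeholder keys:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.ec2.AmazonEC2Client;
import com.amazonaws.services.ec2.model.AssociateAddressRequest;

public class UpdateEip {
    public static void main(String[] args) {
        // placeholder credentials -- replace with your own keys
        AmazonEC2Client ec2 = new AmazonEC2Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        // associate the EIP with this instance; the shell script above reads
        // the instance id from the metadata service, hard-coded here for brevity
        ec2.associateAddress(
                new AssociateAddressRequest("i-6ff07d10", "50.17.213.152"));
    }
}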

nginx
#!/bin/bash

function start(){
  echo "starting nginx..."
  /usr/local/nginx/sbin/nginx 
}

function stop(){
  echo "stoping nginx..."
  /usr/local/nginx/sbin/nginx -s stop
}

param=$1

if [ "start" == "$param" ] ; then
  start
  exit 0
elif [ "stop" == "$param" ] ; then
  stop
  exit 0
else
  echo "no such command: $param"
  exit 1
fi
----

Start services:
on main,
service heartbeat start
/etc/init.d/nginx start

on aux,
service heartbeat start

You can tail -f /tmp/ha-debug on these nodes to watch things rolling.

Test: Time to test. Note that we are only watching nodes, not the Nginx service itself. So, we need to make the heartbeat service on 'aux' believe that the 'main' server is down.

Keep tailing the debug file on the 'aux' machine. It will show you the transition. Now, it's time to kill the 'main' node. On main, do this:
service heartbeat stop
/etc/init.d/nginx stop

The debug file on 'aux' will show that 'aux' is taking over from 'main'. Now, to see how the service switches back to the main node when it comes up, start just the heartbeat service on 'main'. You will see: a. aux: the EIP is given up (not really detached -- the stop script does nothing, and 'main' simply re-associates the EIP), b. aux: Nginx stops, c. main: the EIP is assigned, d. main: Nginx is started.

Note: while the EIP flips, your SSH session may get disconnected. Keep an eye on the AWS web console to get the newly assigned public DNS of the node the EIP was revoked from, and use the EIP to connect to the node that acquired it.

Debug Logs: Here is what the log on the auxiliary machine looks like:
heartbeat[9133]: 2012/11/21_15:12:44 info: Received shutdown notice from 'ip-10-144-75-85'.
heartbeat[9133]: 2012/11/21_15:12:44 info: Resources being acquired from ip-10-144-75-85.
heartbeat[9133]: 2012/11/21_15:12:44 debug: StartNextRemoteRscReq(): child count 1
heartbeat[10055]: 2012/11/21_15:12:44 info: acquire local HA resources (standby).
heartbeat[10055]: 2012/11/21_15:12:44 info: local HA resource acquisition completed (standby).
heartbeat[9133]: 2012/11/21_15:12:44 info: Standby resource acquisition done [foreign].
heartbeat[9133]: 2012/11/21_15:12:44 debug: StartNextRemoteRscReq(): child count 1
heartbeat[10056]: 2012/11/21_15:12:44 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys ip-10-36-5-11] to acquire.
heartbeat[10081]: 2012/11/21_15:12:44 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10081]:    2012/11/21_15:12:44 info: Running /etc/ha.d/rc.d/status status
mach_down[10097]:    2012/11/21_15:12:44 info: Taking over resource group updateEIP
ResourceManager[10123]:    2012/11/21_15:12:45 info: Acquiring resource group: ip-10-144-75-85 updateEIP nginx
ResourceManager[10123]:    2012/11/21_15:12:45 info: Running /etc/init.d/updateEIP  start
ResourceManager[10123]:    2012/11/21_15:12:45 debug: Starting /etc/init.d/updateEIP  start
Starting...
Instance ID is: i-6ff07d10
Assigning 50.17.213.152 to i-6ff07d10...
ADDRESS    50.17.213.152    i-6ff07d10
done!
ResourceManager[10123]:    2012/11/21_15:12:58 debug: /etc/init.d/updateEIP  start done. RC=0
ResourceManager[10123]:    2012/11/21_15:12:58 info: Running /etc/init.d/nginx  start
ResourceManager[10123]:    2012/11/21_15:12:58 debug: Starting /etc/init.d/nginx  start
starting nginx...
ResourceManager[10123]:    2012/11/21_15:12:58 debug: /etc/init.d/nginx  start done. RC=0
mach_down[10097]:    2012/11/21_15:12:58 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[10097]:    2012/11/21_15:12:58 info: mach_down takeover complete for node ip-10-144-75-85.
heartbeat[9133]: 2012/11/21_15:12:58 info: mach_down takeover complete.
heartbeat[9133]: 2012/11/21_15:13:16 WARN: node ip-10-144-75-85: is dead
heartbeat[9133]: 2012/11/21_15:13:16 info: Dead node ip-10-144-75-85 gave up resources.
heartbeat[9133]: 2012/11/21_15:13:16 info: Link ip-10-144-75-85:eth0 dead.
...
heartbeat[9133]: 2012/11/21_15:19:59 info: Heartbeat restart on node ip-10-144-75-85
heartbeat[9133]: 2012/11/21_15:19:59 info: Link ip-10-144-75-85:eth0 up.
heartbeat[9133]: 2012/11/21_15:19:59 info: Status update for node ip-10-144-75-85: status init
heartbeat[9133]: 2012/11/21_15:19:59 info: Status update for node ip-10-144-75-85: status up
heartbeat[9133]: 2012/11/21_15:19:59 debug: StartNextRemoteRscReq(): child count 1
heartbeat[9133]: 2012/11/21_15:19:59 debug: get_delnodelist: delnodelist=
heartbeat[10262]: 2012/11/21_15:19:59 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10262]:    2012/11/21_15:19:59 info: Running /etc/ha.d/rc.d/status status
heartbeat[10278]: 2012/11/21_15:19:59 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10278]:    2012/11/21_15:19:59 info: Running /etc/ha.d/rc.d/status status
heartbeat[9133]: 2012/11/21_15:20:00 info: Status update for node ip-10-144-75-85: status active
heartbeat[10294]: 2012/11/21_15:20:00 debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc[10294]:    2012/11/21_15:20:00 info: Running /etc/ha.d/rc.d/status status
heartbeat[9133]: 2012/11/21_15:20:00 info: remote resource transition completed.
heartbeat[9133]: 2012/11/21_15:20:00 info: ip-10-36-5-11 wants to go standby [foreign]
heartbeat[9133]: 2012/11/21_15:20:01 info: standby: ip-10-144-75-85 can take our foreign resources
heartbeat[10310]: 2012/11/21_15:20:01 info: give up foreign HA resources (standby).
ResourceManager[10323]:    2012/11/21_15:20:01 info: Releasing resource group: ip-10-144-75-85 updateEIP nginx
ResourceManager[10323]:    2012/11/21_15:20:01 info: Running /etc/init.d/nginx  stop
ResourceManager[10323]:    2012/11/21_15:20:01 debug: Starting /etc/init.d/nginx  stop
stopping nginx...
ResourceManager[10323]:    2012/11/21_15:20:01 debug: /etc/init.d/nginx  stop done. RC=0
ResourceManager[10323]:    2012/11/21_15:20:01 info: Running /etc/init.d/updateEIP  stop
ResourceManager[10323]:    2012/11/21_15:20:01 debug: Starting /etc/init.d/updateEIP  stop
stopping...
ResourceManager[10323]:    2012/11/21_15:20:01 debug: /etc/init.d/updateEIP  stop done. RC=0
heartbeat[10310]: 2012/11/21_15:20:01 info: foreign HA resource release completed (standby).
heartbeat[9133]: 2012/11/21_15:20:01 info: Local standby process completed [foreign].
heartbeat[9133]: 2012/11/21_15:20:18 WARN: 4 lost packet(s) for [ip-10-144-75-85] [17:22]
heartbeat[9133]: 2012/11/21_15:20:18 info: Other node completed standby takeover of foreign resources.
heartbeat[9133]: 2012/11/21_15:20:18 info: remote resource transition completed.
heartbeat[9133]: 2012/11/21_15:20:18 info: No pkts missing from ip-10-144-75-85!
heartbeat[9133]: 2012/11/21_15:20:20 WARN: 1 lost packet(s) for [ip-10-144-75-85] [22:24]
heartbeat[9133]: 2012/11/21_15:20:20 info: No pkts missing from ip-10-144-75-85!
heartbeat[9133]: 2012/11/21_15:20:28 WARN: 3 lost packet(s) for [ip-10-144-75-85] [24:28]
heartbeat[9133]: 2012/11/21_15:20:28 info: No pkts missing from ip-10-144-75-85! 

You may want to use curl -I to verify that the switch actually occurs. Here is what I see when I kill the main node, the aux node takes over, and finally the main node comes back:
~$ curl -I 50.17.213.152
HTTP/1.1 200 OK
Server: nginx/1.2.5
Date: Wed, 21 Nov 2012 15:10:39 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: JSESSIONID=j1qov13qkfca1pxm6usm3ulyi;Path=/
Expires: Thu, 01 Jan 1970 00:00:00 GMT
X-Whom: node_1

~$ curl -I 50.17.213.152
HTTP/1.1 200 OK
Server: nginx/1.2.5
Date: Wed, 21 Nov 2012 15:13:38 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: JSESSIONID=1ujszsyow5rc0ws4v2rir06sk;Path=/
Expires: Thu, 01 Jan 1970 00:00:00 GMT
X-Whom: node_2

~$ curl -I 50.17.213.152
HTTP/1.1 200 OK
Server: nginx/1.2.5
Date: Wed, 21 Nov 2012 15:22:09 GMT
Content-Type: text/html
Content-Length: 0
Connection: keep-alive
Set-Cookie: JSESSIONID=khbf8ovc6nr1rwr4o3hzmflt;Path=/
Expires: Thu, 01 Jan 1970 00:00:00 GMT
X-Whom: node_1

Conclusion: So we have the basic HA configuration ready. To monitor the services themselves and react to their failures, you need to add a cluster resource manager (CRM) layer, such as Pacemaker, on top of this setup.

Wednesday, October 24, 2012

Adding First Slave to Running MySQL

Problem: We had a MySQL server running for some time, and we wanted to add a slave to it.

Steps: It's really simple.
  1. Configure server-id and binary logging: vi /etc/my.cnf. If the file does not exist, create it.
    [mysqld]
     
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    user=mysql
     
    # master id
    server-id=1
     
    # binary logging for replication
    log-bin=mysql-bin
     
     
    # Default to using old password format for compatibility with mysql 3.x
    # clients (those using the mysqlclient10 compatibility package).
    old_passwords=1
     
    # Disabling symbolic-links is recommended to prevent assorted security risks;
    # to do so, uncomment this line:
    # symbolic-links=0
     
    [mysqld_safe]
    log-error=/var/log/mysql/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    Restart the MySQL service: service mysql restart
  2. Create a replicator user in master:
    GRANT REPLICATION SLAVE ON *.* TO 'replicator' IDENTIFIED BY 'replicatorPassword';
    FLUSH PRIVILEGES;
  3. Get master binary log position:
    mysql> SHOW MASTER STATUS;
    +------------------+----------+--------------+------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
    +------------------+----------+--------------+------------------+
    | mysql-bin.000013 |      107 |              |                  |
    +------------------+----------+--------------+------------------+
    1 row in set (0.00 sec)

  4. Take a MySQL dump:
    mysqldump --user USERNAME --password=PASSWORD --add-locks --flush-privileges \ 
    --add-drop-table --complete-insert --extended-insert --single-transaction --databases DBNAME \
    -h MASTER_HOST_IP_DNS > backup.sql
    Stop the master MySQL at this point to avoid data alteration.
  5. Configure slave: Keep the configuration the same as the master's in step 1, except give it a different server-id. You may omit log-bin from the slave configuration, but it is a good idea to keep it; it will be useful if you ever want to promote this slave to master.

    Restart the slave and log in to it. Drop the database you dumped, if it exists, and recreate it. Then load the dump:
    mysql -u USERNAME -pPASSWORD -h SLAVE_IP_DNS DBNAME < backup.sql
  6. Set up replication: Log in to the slave MySQL and configure replication:
    CHANGE MASTER TO
       MASTER_HOST='MASTER_IP_OR_DNS',
       MASTER_USER='replicator',
       MASTER_PASSWORD='replicatorPassword',
       MASTER_LOG_FILE='mysql-bin.000013',
       MASTER_LOG_POS=107; 
        
    START SLAVE;


    And start the master.
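    Before calling it done, it's worth verifying that the slave threads are running and the lag is shrinking. A minimal sketch in Java over JDBC -- my own addition, not part of the original steps; it assumes mysql-connector-java on the classpath, with placeholder host and credentials:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class SlaveCheck {
        public static void main(String[] args) throws Exception {
            // placeholder connection details -- point this at the slave
            Connection c = DriverManager.getConnection(
                    "jdbc:mysql://SLAVE_IP_DNS:3306/", "USERNAME", "PASSWORD");
            ResultSet rs = c.createStatement().executeQuery("SHOW SLAVE STATUS");
            if (rs.next()) {
                // both *_Running columns should print "Yes" once replication is healthy
                System.out.println("Slave_IO_Running: " + rs.getString("Slave_IO_Running"));
                System.out.println("Slave_SQL_Running: " + rs.getString("Slave_SQL_Running"));
                System.out.println("Seconds_Behind_Master: " + rs.getString("Seconds_Behind_Master"));
            }
            c.close();
        }
    }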
Done!

Tuesday, October 23, 2012

Key Learning from AWS Outage

"A service outage for Amazon’s Web Service (AWS) affected websites like Reddit, Flipboard, FastCompany, Heroku, Airbnb, Imgur, Pinterest and others late last night. It also seems that the outage affected some of Amazon’s storage units on the US East coast, and thus these websites which are hosted by AWS went down." - FirstPost

It was the exciting night of the US presidential debate, October 22, 2012. We had all our infrastructure and code-base made of solid steel. At least that's what we thought, until we realized our infrastructure wasn't as solid as Amazon claimed. About six hours before the debate, around 3PM EDT, we started to see smaller glitches, and we realized that we couldn't reboot one of our EBS-backed servers. We quickly launched another, but then ELB (Elastic Load Balancer) was paralyzed. Then the complete infrastructure started to fall like dominoes.

What went down? Some of AWS's most marketed services, claimed to be the best alternatives available, failed.
  1. ELB: The Elastic Load Balancers were hit hard. This is the gateway to our 20+ servers, and it was hosed. It was unpredictably flipping instances in-service/out-of-service, with complete blackouts in between. We launched new ELBs, to no avail.

    On one of the ELBs, we had to continuously re-save the instances to get them active once the ELB marked them as unhealthy and threw them out. (They were healthy.)

    If you think the last paragraph was bad, prepare for worse. Another load balancer just went inactive, showing "registering instances". Forever. Nothing worked. Then we launched another ELB, in the healthy zone us-east-1a; same problem for the next 4 hours.
  2. EBS-Backed Instances: They were complete zombies. A couple of EBS-backed instances would just not do anything. SSH access, Nagios' NRPE checks: all gone.
  3. EBS Volumes: Talk about tall claims, talk about EBS. Amazon markets it as a better alternative to instance-store-backed storage. And this is not the first time EBS gave us a pain in the neck. It's slow for the very fast I/O we do on Cassandra. It's expensive. And opposite to what Amazon says, it's not rock solid yet. The only guarantee you get is that your EBS volume will not be terminated unexpectedly, so your data is safe. But, as we found, it may be unavailable or inaccessible for long enough to spoil the game. In fact, one of the EBS volumes attached to our MySQL could not be detached until now (about 10 hours later), and I do not know what's going to happen. We terminated the instance, tried force-detach. No luck.
How did we survive? We were a little overcautious.
  1. Copy of a copy of a copy: We just needed one MySQL server, and things were cached efficiently in memcached. So we initially thought that having one instance-store-backed AMI with the MySQL data directory on an external EBS volume was safe enough, since AWS says EBS won't lose data. Later, our CTO suggested adding a slave, created in much the same way, just as an extra layer of security.

    It turned out that we could not use the MySQL master. No access, pending queries; the EBS volume containing the data was inaccessible. CloudWatch was showing no read/write activity while we were putting decent load on MySQL. After 90 minutes of exercise, we realized the shit had hit the fan and turned our slave into the new master. One front fixed.

    We were lucky that the slave was not screwed the same way the master was. It was quite likely the slave would fail too: it had the same configuration and lived in the same availability zone.
  2. Never break a protocol: One very crucial server had an EBS root device (not an instance-store with an extra EBS volume like the last case). This machine went into a coma. It was a complete WordPress setup: all the WordPress tweaks, plug-ins, custom code, Apache mod-rewrite, mod-proxy et al., and MySQL.

    The survival trick lies in one consistent practice: whenever you make any change to an instance, create an AMI of it. So we relaunched the server from the latest AMI. But the data? Our daily backup script was ready for exactly this situation. We launched a new instance, loaded the latest MySQL dump file, updated the DNS records, and were back up.
  3. ELB Outage: The ELB outage made us helpless; there is not much one can do about ELB. We were lucky that the ELB in front of our main application was not completely dead. It was just going crazy and marking instances out of service -- randomly. So I sat at the ELB screen and refreshed it regularly. Whenever instances went out of service, I would re-save them, and the ELB would come back up.

    You must be wondering whether our instances were sick or the health check was failing. Neither. These were instance-store machines, unaffected by the outage, and direct access to them was fast and correct. There are two parts to it: 1. In the past, I have observed that ELB does not scale with a surge of traffic; it ramps up slowly. I observed that during load tests, along with high connection latency via ELB compared to direct access. 2. ELBs were sick during the outage, as per Amazon's status page.
Key Takeaways:
  1. EBS ain't that great. Stick to instance store and fault-tolerant design. 
  2. Back up. Everything that matters should be backed up in such a way that it can be reloaded really fast when required.
  3. Spread out. If you can, scatter instances across different availability zones. The best design would be one where, even if an availability zone goes completely berserk, the user wouldn't feel the impact.
  4. ELB can be a SPOF. If ELB is the entry point to your application, it's a single point of failure. Even with no downtime, we have seen poor scaling and slow ramp-up on surge traffic. I am not sure I have an alternative, but I am actively looking.

Monday, October 22, 2012

Dutch National Flag Problem

Here is the problematic flag:


The flag certainly does not say what the problem is. It's actually a three-way partitioning problem.

Definition: An array A[1..n] needs to be rearranged in such a way that moving from left to right, we see three subgroups in this order: (-∞, low), [low, high], (high, ∞).

Algorithm: I came to this solution while working on a StackOverflow question about rearranging an array such that all elements less than zero are to the left of zero, and all elements greater than zero are to its right. My initial intuition was that I could do it by improvising on the partition step of Quicksort. It turns out that partitioning works for a single pivot element; if you have duplicates of the pivot, it may not work.

So, here is algorithm for three way partitioning:
A[1..n]

def three-way-part (A, low, high):
  left = 0
  right = n+1
  i = 1
  
  while i < right :
    if A[i] < low :
      swap(i, ++left)
      i++
    else if A[i] > high :
      swap(i, --right)
    else :
      i++
Complexity: Time complexity: O(n), space complexity: O(1)
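Here is the same routine as a runnable Java sketch (0-indexed, int values; my translation of the pseudocode above, not the GitHub version):

public class DutchFlag {

    public static void threeWayPartition(int[] a, int low, int high) {
        int left = -1;          // last index of the (< low) zone
        int right = a.length;   // first index of the (> high) zone
        int i = 0;

        while (i < right) {
            if (a[i] < low) {
                swap(a, i++, ++left);   // grow the left zone
            } else if (a[i] > high) {
                swap(a, i, --right);    // grow the right zone; re-check a[i]
            } else {
                i++;                    // middle element, leave it in place
            }
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {4, 9, 1, 7, 3, 8, 2, 5};
        threeWayPartition(a, 3, 6);   // zones: < 3, [3..6], > 6
        System.out.println(java.util.Arrays.toString(a));  // [1, 2, 4, 5, 3, 8, 7, 9]
    }
}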

Footnote:
[1] View longer discussion at Wikipedia
[2] View/download Java code from GitHub

Monday, October 15, 2012

Red Black Tree: Insertion

If this is your first attempt to understand red-black trees, please watch this video lecture first; it will give you a solid foundation. Next, before you start this tutorial, make sure:
  1. You are not in a hurry. (In that case, just use the code from the link in footnote)
  2. You do not get impatient.
  3. You do not bother about why it works. It works. Once you get the algorithm and code it, it will be much simpler to visualize how it works. But for the 'why' -- you may need to research.
So, here we go

One Aim: To create an approximately balanced tree, in which, from root to leaf, no (simple) path is more than twice as long as any other.

Two Rules: (CLRS says five, but others are trivial)
  1. Color Constraint: A red node must have both of its children black. A black node can have children of either color.
  2. Number Constraint: From any given node, every simple path down to a leaf contains the same number of black nodes.
Three Situations: A couple of points to note before we see the situations:
  1. Wherever I refer to Uncle, it means (same as in common life) the sibling of the parent node: the other child of the node's grandparent.
  2. We assume the NIL children of all the leaves are black.
  3. We always insert a red node. So we never break #2, the Number Constraint, by adding a black node; we may, however, break constraint #1, the Color Constraint.
  4. Keep the root node black. You will see, it makes life easier, and it does not break any of the constraints.
Situations:
  1. A red uncle is a good uncle: If the newly inserted node has a red uncle, we can fix the situation just by fixing colors. Invert the parent's, grandparent's, and uncle's colors. Then we need to check whether the grandparent still satisfies the Color Constraint.

  2. A black uncle on the opposite side wants one rotation: If the uncle is black, we have to rotate. The easier case is when the newly inserted node is the left child of its parent and the uncle is the right child of its parent, or vice versa. In that case, you make a single left-rotation or right-rotation (rotations are discussed later in this post), and then fix colors.

  3. A black uncle on the same side is doubly dangerous: When the newly inserted node is a right child and the black uncle is also a right child (or both are left children), you need a left-rotation if the inserted node is a right child, or a right-rotation if it is a left child. This reduces it to situation 2.

The good thing about finding a black uncle is that your quest ends there. No need to look further up the tree.

Conceptually, you are done here. All that is left are minor plumbing details of rotation and a couple of fine prints in the algorithm: a rotation may sometimes cause root reassignment, NIL nodes are treated as black, and so on.

Rotation: A rotation basically swaps a parent and a child such that the BST property still holds. So why do we call it rotation? Well, the process mutates the tree in such a way that it looks, to a casual observer, as if the tree just did a Toe Loop Jump. (Well, this is the best I can explain.)

People say there are two types of rotations: right, when a left child swaps with its parent (i.e., moves to the right); and left, when a right child swaps with its parent. Basically, if you start with a tree, right-rotate a child, and then left-rotate the original parent, the tree stays unchanged. (Figure out how and why the nodes are snipped and pasted the way they are.)



The algorithm: (Please note the below is not valid Python code; I have used Python-esque pseudocode so that the code highlighter shows the key areas.)
root = NIL

def insert (node)
  if root = NIL
    root = node
    root.color = BLACK
    return
    
  parent = root
  current = root
  
  while current != NIL
    parent = current
    if node.value < current.value 
      current = current.left
    else
      current = current.right
      
  if node.value < parent.value
    parent.left = node
  else
    parent.right = node
    
  node.parent = parent
  node.color = RED
  fixTree(node)
  
def fixTree (node)
  
  if node.parent != NIL and node.parent.color = RED
    uncle = getUncle(node)
    
    #case 1: Uncle Red
    if uncle != NIL and uncle.color = RED
      node.parent.color = BLACK
      node.parent.parent.color = RED
      uncle.color = BLACK
      fixTree(node.parent.parent)
    
    #Case 2, 3: Sirius Black.
    else if parentInLeftSubTree(node)
      #we are in the left subtree, uncle is a right child
      #case 3: node is a right child; rotate it up to reduce to case 2
      if node.parent.right = node
        rotateLeft(node)
      else
        node = node.parent
        
      #case 2: fix colors and rotate the red node above the grandparent
      node.color = BLACK
      node.parent.color = RED
      rotateRight(node)
        
    else
      #we are in the right subtree, uncle is a left child
      #case 3: node is a left child; rotate it up to reduce to case 2
      if node.parent.left = node
        rotateRight(node)
      else
        node = node.parent
        
      #case 2: fix colors and rotate
      node.color = BLACK
      node.parent.color = RED
      rotateLeft(node)
      
  #the root must always stay black
  root.color = BLACK
  
def getUncle (node)
  if node = NIL or node.parent = NIL
    return NIL
    
  parent = node.parent
  grandParent = parent.parent
  if grandParent = NIL
    return NIL
    
  if parent = grandParent.left
    return grandParent.right
  else
    return grandParent.left
    
def parentInLeftSubTree (node)
    if node.parent = node.parent.parent.left    
      return true
    else
      return false

"""
 *         g                               g
 *        /                               /
 *       x        Left Rotation(y)       y
 *      / \      ---------------->      / \
 *  alpha  y     <----------------     x   gamma
 *        / \    Right Rotation(x)    / \
 *     beta gamma                 alpha  beta
"""
 
def rotateLeft(y)
   x = y.parent
   g = x.parent
   alpha = x.left
   beta = y.left
   gamma = y.right
   
   y.parent = g
   if g != NIL
     if g.left = x
       g.left = y
     else
       g.right = y
   else
     root = y
     
   x.parent = y
   y.left = x 
   
   x.right = beta
   if beta != NIL
     beta.parent = x
   
def rotateRight(x)
  y = x.parent
  g = y.parent
  alpha = x.left
  beta = x.right
  gamma = y.right
  
  x.parent = g
  if g != NIL
    if g.left = y
      g.left = x
    else
      g.right = x
  else
    root = x
    
  x.right = y
  y.parent = x
  
  y.left = beta
  if beta != NIL
    beta.parent = y
Footnotes:

[1] View a foundation video lecture. It is worth watching.
[2] Simulate red-black tree creation on your own input -- here
[3] View/download Java code for red-black tree from GitHub
[4] I haven't referred much from this chapter, but it may be worth reading: Introduction to Algorithms: CLRS, chapter 13: Red-Black Trees
[5] The execution of the code for a sorted input a = {8, 11, 14, 18, 22, 23, 31, 37, 41, 47, 60, 74, 84, 87, 88, 97, 99}, generates the following tree


Saturday, October 13, 2012

Why Merge Operation in MergeSort is O(n)?

I have been asked this a couple of times by people learning or ramping up on sorting algorithms. Why does the merge operation in merge sort take O(n)? Most of the time, it turns out people think it's two sorted arrays, just append 'em. It's intuitive -- wrong intuition.

The point is that we have two sorted arrays, not two arrays where one has its smallest element larger than the other's largest. That is, it's not [1, 3, 9, 12] and [13, 17, 19]; it's more like [3, 12, 13, 17] and [1, 9, 19].

You see, in the latter case you can't just append them. So, how do you combine them? You follow this algorithm (see the sketch below):
  1. Let's say the first array has m elements, and the second has n. Make an auxiliary array of size m+n. This will store the final combined and sorted array.
  2. Assign two cursors -- i and j, pointing to the heads of the two arrays.
  3. Compare the elements at i and j, and choose the smaller one. Copy its value to the auxiliary array, and move the smaller element's cursor ahead.
  4. Repeat #3 until one of the arrays is exhausted; then copy the rest of the elements from the other array to the auxiliary array.
So, in our case, you compare (the [] denotes the element selected and copied to the aux array): (3, [1]); ([3], 9); (12, [9]); ([12], 19); ([13], 19); ([17], 19); (--, [19]). You see? Every element is copied exactly once and each copy costs at most one comparison, so assuming each operation is O(1), merging into an array of length (m+n) is O(m+n).
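Here is the merge step as a runnable Java sketch (my own illustration; assumes plain int arrays):

import java.util.Arrays;

public class Merge {

    static int[] merge(int[] a, int[] b) {
        int[] aux = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;

        // pick the smaller head element until one array is exhausted
        while (i < a.length && j < b.length)
            aux[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];

        // copy whatever remains of the unexhausted array
        while (i < a.length) aux[k++] = a[i++];
        while (j < b.length) aux[k++] = b[j++];

        return aux;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                merge(new int[]{3, 12, 13, 17}, new int[]{1, 9, 19})));
        // prints [1, 3, 9, 12, 13, 17, 19]
    }
}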

In the best case, the input is already sorted, so the two subarrays to be merged would just require joining. But applying the generic algorithm above, you will still make about min(m,n) comparisons and then copy the remaining max(m,n) elements to the auxiliary array. Either way, you end up with (m+n) total operations.

Thursday, October 11, 2012

Search in Array for A[i] == i

Problem: We have a sorted array of distinct integers, A. Find an occurrence of an element where A[i] == i.

Discussion: It is pretty much the same problem as binary search, once you realize how to divide. We know:
  1. The array is sorted with distinct integers, so moving one index to the right increases the value by at least one: for each j, A.length >= j > i, we have A[j] >= A[i] + (j - i); similarly, for each j, 1 <= j < i, we have A[j] <= A[i] - (i - j).
  2. If we find A[i] > i, then for all j > i we have A[j] >= A[i] + (j - i) > j (from the previous point). So there is no point searching to the right in this case.

    Similarly, if A[i] < i, there is no gain in searching to the left of i.
So, here is the algorithm
searchForAiI (A, low, high)
  if low > high
      return -1
      
  mid = low + (high - low)/2
  if A[mid] == mid
    return mid
  else if A[mid] > mid 
    //all the elements to the right are greater than their index, so search left
    return searchForAiI (A, low, mid-1)
  else
    //all the elements to the left are smaller than their index, so search right
    return searchForAiI (A, mid+1, high)     
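The same thing as a runnable Java sketch (0-indexed; my translation, assuming distinct sorted integers):

public class FixedPoint {

    static int searchForAiI(int[] a, int low, int high) {
        if (low > high)
            return -1;

        int mid = low + (high - low) / 2;
        if (a[mid] == mid)
            return mid;
        else if (a[mid] > mid)
            return searchForAiI(a, low, mid - 1);   // right side can't match
        else
            return searchForAiI(a, mid + 1, high);  // left side can't match
    }

    public static void main(String[] args) {
        int[] a = {-3, -1, 2, 5, 7};
        System.out.println(searchForAiI(a, 0, a.length - 1)); // prints 2, since a[2] == 2
    }
}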

Footnotes:
[1] View/download Java code from GitHub and get sample run here.

Binary Search: First Occurrence of an Element

Problem: Find the first occurrence of an element in a sorted array.

Discussion: Binary search dictates that in order to find an element in a sorted array, we test the middle[1] element (call it the pivot) for equality. If the key is smaller than the pivot, we search the subarray before the pivot. If the key is bigger, we search the subarray after the pivot. If they are equal, we have found the element. Divide and conquer.

This is a slight variant of it. We need to keep searching left until we ensure that it's the first occurrence.

Strategy: The first idea that comes to mind is to use regular binary search to find the element; it may not necessarily be the first occurrence. Then we walk left along the sorted array until we find an element less than the search key, or we hit the array boundary. The complexity of this mechanism is O(log(n)) + O(k), where n is the array length and k is the number of repetitions of the key.

The other idea is to modify binary search slightly: if the pivot equals the key, and the element just before the pivot is also the key (and we have not hit the array boundary), keep looking for the key in the subarray preceding the pivot. This is O(log n). Here is the algorithm:
binary-first (A, key, low, high)
  if low > high
    return -1
    
  mid = low + (high - low)/2
  if key = A[mid]
    if mid > 1 and A[mid-1] == key
      return binary-first(A, key, low, mid-1)
    else
      return mid
  else if key < A[mid]
    return binary-first(A, key, low, mid-1)
  else
    return binary-first(A, key, mid+1, high)  
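And a runnable Java sketch of the same (0-indexed; my translation of the pseudocode):

public class BinaryFirst {

    static int binaryFirst(int[] a, int key, int low, int high) {
        if (low > high)
            return -1;

        int mid = low + (high - low) / 2;
        if (key == a[mid]) {
            // an equal element sits just before mid: keep searching left
            if (mid > 0 && a[mid - 1] == key)
                return binaryFirst(a, key, low, mid - 1);
            return mid;
        } else if (key < a[mid]) {
            return binaryFirst(a, key, low, mid - 1);
        } else {
            return binaryFirst(a, key, mid + 1, high);
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 2, 5, 7};
        System.out.println(binaryFirst(a, 2, 0, a.length - 1)); // prints 1
    }
}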
Footnotes:
[1] The pivot does not have to be the middle element. You may wish to use random(low, high) as your pivot; however, the middle makes it easy to reason about.

[2] View/download Java code at GitHub and sample execution here.

Wednesday, October 10, 2012

BST: Binary Search Tree

Definition: A binary tree is... umm... well, let's ditch the formal definition (refer to Wikipedia for that). A binary tree is a data structure like the one in this image:

The elements are called nodes. Each node can have at most two children. The top element is called the root node, or just the root. A binary search tree (BST) is one in which the elements in a node's left subtree are less than the node, and the elements in its right subtree are greater than it[1]. If you remember quicksort from earlier, any node in a BST is basically a pivot. The expected height[2] of a randomly built binary search tree with n elements is O(lg(n)), where lg is log2.

Most operations on a BST are simple and recursive. I will list the algorithms here.
Node
  value
  left
  right
  parent
  
insert(root, node)

  // use two pointers: current walks down to the place where the new
  // node will be inserted, and parent trails one step behind it
  val = node.value
  current = root
  parent = NIL
  
  // move down the tree following the BST rule, searching for the place
  // where this node should go. The loop walks past a leaf, leaving the
  // future parent of the node in 'parent'
  while current != NIL
    parent = current
    if val < current.value 
      current = current.left
    else
      current = current.right
  
  // root was NIL; no tree pre-existed, so the node becomes the root
  if parent = NIL
    return node
    
  // plug the node into the correct side of the parent
  if val < parent.value
    parent.left = node
  else
    parent.right = node
    
  //assign the parent
  node.parent = parent
  
  return root
  
//iterative
search(root, value)
  current = root
  found = false
  result = NIL    
  
  while found = false and current != NIL
    if current.value = value
      result = current
      found = true
    else if value < current.value
      current = current.left
    else
      current = current.right
      
  return result

//recursive
search(root, value)
  if root = NIL or root.value = value
    return root
    
  if value < root.value
    return search(root.left, value)
  else
    return search(root.right, value)

//min is left most node, right? Think about it.
findMin(root)
  if root = NIL
    error("Nil BST?")
  min = root
  while min.left != NIL
    min = min.left
    
  return min
  
//max, same as min but in right subtree
findMax(root)
  if root = NIL
    error("Nil BST?")
  max = root
  while max.right != NIL
    max = max.right
    
  return max
  
//in order traversal, prints sorted node values
inOrderTraversal(root)
  if root = NIL
    return
  
  inOrderTraversal(root.left)
  print root.value
  inOrderTraversal(root.right)
So, these were the easier ones; with a little effort one can understand them. A compact Java version of these operations follows, and then I will discuss some of the harder ones.
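Here they are as a runnable Java sketch (my own condensation of the pseudocode above; the GitHub version in the footnotes is more complete):

public class Bst {

    static class Node {
        int value;
        Node left, right, parent;
        Node(int value) { this.value = value; }
    }

    static Node insert(Node root, Node node) {
        Node parent = null, current = root;
        while (current != null) {              // walk down to the insertion point
            parent = current;
            current = (node.value < current.value) ? current.left : current.right;
        }
        if (parent == null) return node;       // empty tree: the node becomes the root
        if (node.value < parent.value) parent.left = node;
        else parent.right = node;
        node.parent = parent;
        return root;
    }

    static Node search(Node root, int value) {
        if (root == null || root.value == value) return root;
        return (value < root.value) ? search(root.left, value)
                                    : search(root.right, value);
    }

    static void inOrderTraversal(Node root) {  // prints values in sorted order
        if (root == null) return;
        inOrderTraversal(root.left);
        System.out.print(root.value + " ");
        inOrderTraversal(root.right);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int v : new int[]{5, 3, 9, 1, 7, 6})
            root = insert(root, new Node(v));
        inOrderTraversal(root);                // 1 3 5 6 7 9
        System.out.println("\nfound 7? " + (search(root, 7) != null));
    }
}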

Successor and Predecessor: The successor of a node is the node that comes next if we sort the tree's values in increasing order. The predecessor of a node is the node that comes immediately before it in that order. That is it. In a tree containing the values 1 3 5 6 7 9, the successor and predecessor of 5 are 6 and 3, respectively.

To find the successor of a node, we need to find the immediately larger node. So, here is the algorithm:
  1. The right subtree of the node contains the values larger than the node (right?). If we find the lowest node in the right subtree, we have the immediately larger node, and we get our successor.
  2. What if the node does not have a right subtree? We have to go up the tree. See the image below.
    1. If the node is a left child, its parent is the immediate next.
    2. If the node is a right child, we keep moving up until we reach a node whose left child is one of the node's ancestors; that node is the successor.

Once you get the idea behind the successor, the predecessor is easy.
  1. The predecessor is the highest node in the left subtree.
  2. If there is no left subtree, then:
    1. If the node is a right child, its immediate parent is the predecessor.
    2. If the node is a left child, move up until you reach a node whose right child is one of the node's ancestors.

Deletion: You must be wondering why I am covering delete so late. That's because it's so damn hard without understanding the concepts above. So, here is the algorithm to delete a node:
  1. If the node has no children, well, just knock it off.
  2. If it has one child, yank the node out and connect the child to the node's parent.
  3. If it has two children, we need a replacement for the node. What would be a good replacement? One that maintains the BST property after the swap. (At this moment I want you to pause and think: if you had to replace this node, who would be the best candidate?) You can choose either the successor or the predecessor; the common practice is to use the successor (the discussion below assumes we replace with the successor). Then delete the successor from the subtree.

Now, if you read the successor code properly, you will observe that a successor chosen from the right subtree has at most one child (it cannot have a left child, or findMin would have gone further down). So deletion is NOT going to recurse as in "delete a node with two children, then delete the successor with two children, and so on". NO. Just place the successor's value in the node, then clip the successor out using either step 1 or step 2.
successor(node)
  //if exists a right subtree
  if node.right != NIL
    return findMin(node.right)
    
  parent = node.parent
    
  // if the node is a left child, this loop won't execute;
  // else move up until we arrive from a left child -- that parent is the successor
  while parent != NIL and parent.right = node
    node = parent
    parent = parent.parent
    
  return parent
  
predecessor(node)
  //if exists a left subtree
  if node.left != NIL
    return findMax(node.left)
    
  parent = node.parent
  
  // if the node is a right child, this loop won't execute;
  // else move up until we arrive from a right child -- that parent is the predecessor
  while parent != NIL and parent.left = node
    node = parent
    parent = parent.parent
    
  return parent
  
delete(node)
  // leaf node: just knock it off
  if node.left = NIL and node.right = NIL
    if node = node.parent.left
      node.parent.left = NIL
    else
      node.parent.right = NIL
      
  // half node: connect the only child to the parent
  else if node.left = NIL
    node.right.parent = node.parent
    if node.parent.left = node
      node.parent.left = node.right
    else
      node.parent.right = node.right
  else if node.right = NIL
    node.left.parent = node.parent
    if node.parent.left = node
      node.parent.left = node.left
    else
      node.parent.right = node.left
      
  // complete node: copy the successor's value, then clip the successor
  else
    successor = successor(node)
    node.value = successor.value
    delete(successor) 
Footnotes:
[1] The basic idea is to have the elements less than the parent on one side and those greater than the parent on the other. Whatever criterion you choose, it must apply to all the nodes. The most common is to keep the smaller elements on the left, but if your rebel inner self does not want to conform to the standard, go ahead and make a tree with the higher elements on the left.

[2] Refer Introduction to Algorithms - CLRS, Chapter 12 Binary Search Trees: 12.4 Randomly built binary search trees

[3] View/download Java code from GitHub, You can see sample output here.

Wednesday, October 3, 2012

Counting Sort

New Generation of Sorting Approach? The general idea behind sorting is to compare elements with one another and move them into their correct relative positions. This covers the O(n^2) algorithms like Bubble Sort, Selection Sort, and Insertion Sort, and the O(n*log(n)) procedures like Heapsort, Merge Sort, and Quicksort. Counting sort is a linear-time sort, and it does NOT depend on how elements compare to each other; rather, it depends on their absolute values.

Can you explain it to my 5-year-old brother? Say you have a bag of coins with values 1 to 5, and your mom asks you to arrange the coins in increasing order in a queue. One mechanism is to pick a coin, search for its right location in the sorted queue, and insert it there. (Do you see insertion sort here?) Another idea is to take five empty buckets, marked 1, 2, ... 5. Now pick coins from the bag: a coin with value 1 goes into the bucket marked 1, a coin with value 2 into bucket number 2, and so on.

Buckets in Counting Sort: start adding coins :)
Once we are done with this, we empty bucket 1 and make a queue of the coins of value 1. Then we empty bucket 2, append its coins to the queue, and continue until all the buckets are empty. You now have a sorted queue. This is the idea behind counting sort, and with a little variation the same concept applies to Bucket sort and Radix sort.

Wow! I will use it everywhere! So why do we not always use linear sorting? The caveat is that it needs extra space: an array at least of length (Amax - Amin). Just to give you an idea, if your array contains the number 134,217,728 (about 134 million; say you are sorting rich people by their wealth), you will have to preallocate an int array of length 134217728. The math of it: 134217728*4 bytes = 536870912 bytes = 512MB of space!! So you can't just apply it to any input. It is good as long as you know the bounds of the values, and the auxiliary array can be accommodated in your machine's RAM without twitching it.

Back to business: Here is the algorithm:
countingSort(A)
  max = maxOf(A)
  //create buckets
  buckets = INT[max]
  
  //initialize buckets
  for i = 1 to buckets.length
    buckets[i] = 0
    
  //add values to appropriate buckets
  for i = 1 to A.length
    currentVal = A[i]
    buckets[currentVal] = buckets[currentVal] + 1
  
  //prefix sums: buckets[i] now holds the number of elements <= i,
  //i.e. the size of the queue ahead of each value
  for i = 2 to buckets.length
    buckets[i] = buckets[i-1] + buckets[i]
   
  //put sorted values into a new array, walking backwards for stability
  B = INT[A.length]
  for i = A.length to 1
    B[buckets[A[i]]] = A[i]
    buckets[A[i]] = buckets[A[i]] - 1
  
  return B
  
maxOf(A)
  max = INT_MIN
  for i = 1 to A.length
    if A[i] > max
      max = A[i]
  return max
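A runnable Java sketch of the same (0-indexed, my translation; assumes non-negative int keys):

import java.util.Arrays;

public class CountingSort {

    static int[] countingSort(int[] a) {
        int max = maxOf(a);
        int[] buckets = new int[max + 1];           // one bucket per possible value
        for (int v : a)
            buckets[v]++;                           // count occurrences
        for (int i = 1; i <= max; i++)
            buckets[i] += buckets[i - 1];           // prefix sums: # of elements <= i

        int[] b = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--)     // walk backwards for stability
            b[--buckets[a[i]]] = a[i];
        return b;
    }

    static int maxOf(int[] a) {
        int max = Integer.MIN_VALUE;
        for (int v : a) if (v > max) max = v;
        return max;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                countingSort(new int[]{4, 1, 3, 4, 2, 1})));  // [1, 1, 2, 3, 4, 4]
    }
}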

Complexity: All passes are linear, so the total is O(n + k), where k is the maximum value in the input.

Refer:
[1] Introduction to Algorithms - CLRS, chapter 8: Sorting in linear time.
[2] Download/view Java code from GitHub

Tuesday, October 2, 2012

Quicksort


Quicksort is a divide-and-conquer method. The way it works is:
  1. Choose a pivot. The pivot can be any element of the array -- your wish. You may choose the first, last, or middle element, or any other.
  2. Rearrange the array such that elements less than the pivot lie to its left, and the rest to its right. This places the pivot in the location it will occupy after the sorting is done.
  3. Repeat #1 and #2 for the left part and the right part.
Assume we have array A[1..N] to be sorted.
quickSort(A, start, end)
  if start < end
    pivot = partition(A, start, end)
    quickSort(A, start, pivot-1)
    quickSort(A, pivot+1, end)
    
partition(A, start, end)
  pivotVal = A[end]
  leftIndex = start - 1 //leftIndex is the index of the last element in left chunk
  for i = start to end-1
    if A[i] < pivotVal
      leftIndex = leftIndex + 1
      swap(i, leftIndex)
  swap(leftIndex+1, end)
  return leftIndex + 1
The quickSort routine is pretty clear. Let me explain how partition works (a Java sketch follows this list):
  1. We select the pivot as the last element.
  2. We declare a cursor, leftIndex. This cursor is our handle on where the chunk of elements smaller than the pivot value ends; basically, it points to the last element of the left sub-array. It starts just before the start.
  3. We walk the sub-array from left to right. If we encounter an element greater than or equal to the pivot, we keep moving. If we find an element smaller than the pivot, we increase leftIndex (now leftIndex points to an element that should be in the right sub-array) and swap the current element with the one at leftIndex. Either way, leftIndex keeps pointing to the last element of the left sub-array.
  4. When the loop ends, leftIndex points to the last element of the left part of the sub-array. We place our pivot just after it.
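A runnable Java sketch of the same scheme (0-indexed, last element as pivot; my translation of the pseudocode):

import java.util.Arrays;

public class QuickSort {

    static void quickSort(int[] a, int start, int end) {
        if (start < end) {
            int pivot = partition(a, start, end);
            quickSort(a, start, pivot - 1);
            quickSort(a, pivot + 1, end);
        }
    }

    static int partition(int[] a, int start, int end) {
        int pivotVal = a[end];
        int leftIndex = start - 1;            // last element of the "less than" chunk
        for (int i = start; i < end; i++)
            if (a[i] < pivotVal)
                swap(a, i, ++leftIndex);      // grow the left chunk
        swap(a, leftIndex + 1, end);          // drop the pivot into its final place
        return leftIndex + 1;
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 8, 2};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));  // [1, 2, 3, 7, 8, 9]
    }
}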
Complexity: The worst-case complexity of quicksort is O(n^2). However, in most practical cases it shows O(n*lg(n)) behavior. Let's see how:
WORST CASE
Assume our array is sorted in decreasing order: A[1..n] for all 1 <= i < j <= n, A[i] > A[j]
Our partition will split this into left sub-array of size 0, and right sub-array of size n-1, so,

T(n) = T(n-1) + T(0) + Θ(n)

which is the same as picking the lowest element from the remaining array, repeated (n-1) times -- much like selection sort. 
The above equation yields: O(n^2)
---------
TAKE A RANDOM CASE
Let's assume our pivot divides the array in a 9:1 ratio; we have

T(n) = T(9*n/10) + T(n/10) + Θ(n)

If you draw the recursion tree, its height is determined by the subtree with the larger share of elements. So,
The height of the recursion tree will be log(10/9)(n) [log n base 10/9].
At each level we walk the sub-arrays, which costs O(n); let's assume c*n is the exact cost.
So, we perform c*n operations at most log(10/9)(n) times.
So, the order: O(n*log(n))

The order stays O(n*log(n)) even if the partition of a long array is 99:1.
The best case would be the one where pivot divides the array into exact halves. 
Refer:
[1] Introduction to Algorithms -CLRS, Chapter 7 Quicksort
[2] Download Java code from my GitHub repository

Heap and Heap Sort

Heap: A heap is a nearly complete binary tree, meaning the tree is filled at all levels except possibly the last one. The interesting thing about heaps is that, with a small set of rules for determining the children and parent of a node, they can easily be stored in an array. There are two types of heaps: max-heap and min-heap. Any node of a max-heap is larger than its children; thus the largest element stays on top. A min-heap is one whose nodes are smaller than their children, so it has the lowest value at the top. In all discussions here, heap implies max-heap.


So, with the knowledge that (a) a max-heap has the larger values toward the parent nodes, and (b) it's a nearly complete binary tree, we can visualize a tree that looks like the image above.

Heap Representation: If we index the nodes from top to bottom and left to right, we get numbers (the blue digits in the image) that can work as indexes into an array. The only problem is: how would one know the children of a node, or its parent?

Look closely, and you'd see the following relationships:
  1. Left child of node with index i, has index 2*i 
  2. Right child of node with index i, has index 2*i + 1 
  3. Parent of any node with index i, has index floor(i/2)
Well, now we can just forget the tree and all the baggage (pointer maintenance, recursive traversal, and extra space) associated with it. We'd use an array to represent the heap, with the following ground rules.
Assume the heap is represented by array A[1..n]
LeftChild(i) = 2*i
RightChild(i) = 2*i + 1
Parent(i) = floor(i/2)
Creating a Heap: Building a heap is a two-step process: append the new element to the array, then check whether all of its ancestors (parent, grandparent, great-grandparent, ...) satisfy the heap property; if not, fix it. Here is the process:
  1. Append the element to the array.
  2. Check whether its parent satisfies the heap property (that each of its children is smaller than the parent); if not, swap them. Now check whether the newly swapped location satisfies the heap property. Continue until you either reach a parent that is fine with it, or reach the top. This process is called Heap-Increase-Key, or Bubble Up, or Percolate Up.
Here is basic algorithm for a max-heap
A[1..N]
heapSize

parent(i)
  if i <= 1
    return NIL
  return floor(i/2)
  
left(i)
  if 2*i > heapSize
    return NIL
  return 2*i
   
right(i)
  if 2*i + 1 > heapSize
    return NIL
  return 2*i + 1

bubbleUp(i)
  parent = parent(i)
  
  while parent != NIL and A[parent] < A[i]
    swap(parent, i)
    i = parent
    parent = parent(i)
    
insert(key)
  if heapSize >= N
    return error("Not enough space")
    
  heapSize = heapSize + 1
  A[heapSize] = key
  bubbleUp(heapSize)
  
Complexity: parent, left, and right are simple O(1) operations. Bubble up is an O(lg(n)) operation, because in the worst case we have T(n) = T(floor(n/2)) + O(1) for n >= 2, the O(1) being the swap.

Heapify an Array: To heapify means to toss and turn the array to make sure all nodes satisfy the heap property. If you have an array A[1..N], we need to make sure every parent has children smaller than itself. We should heapify from the last parent up to the root. So, who is the last parent? It's the parent of the last child: A[floor(N/2)]. We visit all parents from floor(N/2) down to 1, and make sure they satisfy the heap property. The process is similar to bubbleUp, except here we are adjusting parents by fixing the tree under them. This process of moving a node to a lower position is called Heapify, or Bubble Down, or Percolate Down.

bubbleDown(A, i)
  maxIndex = i
  left = left(i)
  right = right(i)
  
  if left != NIL and A[left] > A[maxIndex]
    maxIndex = left
    
  if right != NIL and A[right] > A[maxIndex]
    maxIndex = right
    
  if maxIndex != i
    swap(i, maxIndex)
    bubbleDown(A, maxIndex)
    
heapify(A, N)
  
  heapSize = N
  for i = floor(N/2) to 1
    bubbleDown(A, i)

Complexity: Bubble down is similar to bubble up. In the worst case you hit the recursive call all the way down to a leaf. A child's subtree has at most 2*n/3 elements (the case where the bottom level of the tree is exactly half full), so we have T(n) <= T(2*n/3) + O(1). This gives O(lg(n)).

Heap Sort: Now that we know how to convert an array into a heap, sorting is very simple.
  1. Heapify the array, making it a max-heap. This puts the largest element at index 1.
  2. Swap the first element with the element at heapSize, and decrease heapSize by 1. Effectively, this sends the top element to its correct final location, outside the heap.
  3. Fix the heap by bubbling down the element swapped into index 1.
  4. Repeat 2 and 3 until only one element remains.
heapSort(A)
  heapify(A, A.length)
  
  for i = A.length to 2
    swap(1, i)
    heapSize = heapSize - 1
    bubbleDown(A, 1)
Complexity: Heapsort is O(n*log(n)). Building the heap costs O(n) overall, and each bubbleDown from the root is O(log(n)). We heapify once and bubbleDown (n-1) times, which makes the total complexity of the algorithm O(n*log(n)).
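The whole thing as a runnable Java sketch. Note one assumption of mine: Java arrays are 0-indexed, so the index math shifts to left = 2*i + 1, right = 2*i + 2, parent = floor((i - 1)/2):

import java.util.Arrays;

public class HeapSort {

    static void heapSort(int[] a) {
        int heapSize = a.length;
        for (int i = heapSize / 2 - 1; i >= 0; i--)  // heapify: last parent down to root
            bubbleDown(a, i, heapSize);
        for (int i = a.length - 1; i >= 1; i--) {
            swap(a, 0, i);                           // move the max to its final slot
            bubbleDown(a, 0, --heapSize);            // fix the shrunken heap
        }
    }

    static void bubbleDown(int[] a, int i, int heapSize) {
        int max = i, left = 2 * i + 1, right = 2 * i + 2;
        if (left < heapSize && a[left] > a[max]) max = left;
        if (right < heapSize && a[right] > a[max]) max = right;
        if (max != i) {
            swap(a, i, max);
            bubbleDown(a, max, heapSize);
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        int[] a = {4, 10, 3, 5, 1};
        heapSort(a);
        System.out.println(Arrays.toString(a));  // [1, 3, 4, 5, 10]
    }
}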

Refer:
[1] Introduction to Algorithms - CLRS, chapter 6 - Heap Sort
[2] The Algorithm Design Manual - Skiena, chapter 4.3 - Heap
[3] Download Java code from gitHub

Sunday, September 30, 2012

Maximum Sum Sub-array

Definition: Suppose you are given an array A[1..n]. You need to find p and q such that A[q] - A[p] is maximum, where 1 <= p < q <= n.

Discussion: A couple of common approaches that do not really work are:
  1. Look for the global min (A[p]) and the global max (A[q]). This may not satisfy the p < q constraint.
  2. Find the global min index (a) and the global max index (b). Walk from the right end toward a for the max difference (X), and from the left end toward b for the max difference (Y). The higher of the two gives our p and q.
Both of these theories looked promising initially, until I came to a counterexample:
12   10   11   20   2   6   19   1   3
               ^    ^       ^    ^
              max   p       q   min
So, what failed us? The underlying assumption that the global max or global min must appear in the answer is fundamentally flawed. What we are looking for is the maximum difference between two elements when going left to right. With only this knowledge, we are left in the bad zone of brute force: compare each element with all elements to its right, and keep the max. For an array of length n, the number of comparisons is (n-1) + (n-2) + ... + 1 = n*(n-1)/2, i.e., O(n^2).

Divide and Conquer: O(n^2) is as bad as it gets. We can improve performance by not looking at all possible combinations, but only the ones that matter. We divide the problem into smaller identical subproblems and take the winners of the subproblems.

Another thing we observe is that it's the sum of the differences that matters. Let me explain. Take our case:
Original:   12   10   11   20   2   6   19   1   3
Difference:  0   -2   1    9  -18   4   13 -18   2
All we need to do is find a subarray of the difference array with the maximum sum. Divide and conquer suggests splitting the array into two halves, then taking the higher of the maximum subarray of the left half and the maximum subarray of the right half. But wait: these two cases assume the maximum subarray lies completely in either the left or the right half. What if a subarray crosses the midpoint? We should take that into consideration too. So, here is what we come up with:
maxSubarray ( A, low, high )
  
  if ( low == high )
    return ( low, high, A[low] )
  
  mid = low + (high - low)/2
  maxLeft = maxSubarray ( A, low, mid )
  maxRight = maxSubarray ( A, mid+1, high )
  maxCross = maxCrossOver ( A, low, mid, high )
  return max ( maxLeft, maxRight, maxCross )
  
maxCrossOver ( A, low, mid, high )
  
  leftSum = INT_MIN
  tempSum = 0
  leftIndex = mid
  
  for i = mid downto low
    tempSum = tempSum + A[i]
    if tempSum > leftSum
      leftSum = tempSum
      leftIndex = i
      
  rightSum = INT_MIN
  rightIndex = mid + 1
  tempSum = 0
  for i = mid + 1 to high
    tempSum = tempSum + A[i]
    if tempSum > rightSum
        rightSum = tempSum
        rightIndex = i

  return ( leftIndex, rightIndex, leftSum + rightSum )
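For completeness, here is a compact Java sketch of this divide and conquer. It is 0-indexed and, to keep it short, returns only the maximum sum rather than the (leftIndex, rightIndex, sum) triple of the pseudocode; the GitHub code linked below tracks the indices as well.

// Max-subarray by divide and conquer. Call maxSubarray(diff, 0, diff.length - 1)
// on the difference array to get the maximum A[q] - A[p].
static int maxSubarray(int[] a, int low, int high) {
    if (low == high) return a[low];           // base case: single element
    int mid = low + (high - low) / 2;
    int maxLeft = maxSubarray(a, low, mid);
    int maxRight = maxSubarray(a, mid + 1, high);
    int maxCross = maxCrossOver(a, low, mid, high);
    return Math.max(Math.max(maxLeft, maxRight), maxCross);
}

static int maxCrossOver(int[] a, int low, int mid, int high) {
    int leftSum = Integer.MIN_VALUE, tempSum = 0;
    for (int i = mid; i >= low; i--) {        // best sum ending at mid, extending left
        tempSum += a[i];
        if (tempSum > leftSum) leftSum = tempSum;
    }
    int rightSum = Integer.MIN_VALUE;
    tempSum = 0;
    for (int i = mid + 1; i <= high; i++) {   // best sum starting at mid+1, extending right
        tempSum += a[i];
        if (tempSum > rightSum) rightSum = tempSum;
    }
    return leftSum + rightSum;                // a crossing subarray must use both sides
}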
Refer:

[1] Java code for this algorithm, view at GitHub
[2] Introduction to Algorithms - CLRS, Chapter 4 Divide-and-Conquer, 4.1 The maximum-subarray problem

Wednesday, September 19, 2012

How to minify JS and CSS files using Maven

Motivation: Minifying JS and CSS files is a common best practice when developing web applications. It is done to save bandwidth and to keep a single unified JS and CSS file which, once loaded, is cached and reused for subsequent requests.

Prerequisites: Introductory knowledge of Maven.

Steps: I will use the Samaxes minify plugin. Read details about this plugin here. I will walk through an example, assuming your directory structure is similar to this:
myApp
 |- pom.xml
 |
 `- src
      `-- main
            |-- java
            |     `-- [more dirs omitted]
            |-- resources
            `-- webapp
                  |-- META-INF
                  |-- static
                  |       |-- css
                  |       |     |-- style1
                  |       |     |      |-- one.css
                  |       |     |      `-- two.css
                  |       |     |-- style2
                  |       |            `-- three.css
                  |       `-- js
                  |            `-- js1
                  |                   |-- a.js
                  |                   `-- b.js
                  `-- WEB-INF

In your pom.xml, you need to add a plugin tag under the build tag. For the above structure it looks like this:
<plugin>
    <groupId>com.samaxes.maven</groupId>
    <artifactId>maven-minify-plugin</artifactId>
    <version>1.3.5</version>
    <executions>
        <execution>
            <id>min-js</id>
            <phase>package</phase>
            <configuration>
                <cssSourceDir>static/css</cssSourceDir>
                <cssSourceFiles>
                    <cssSourceFile>style1/one.css</cssSourceFile>
                    <cssSourceFile>style1/two.css</cssSourceFile>
                    <cssSourceFile>style2/three.css</cssSourceFile>
                </cssSourceFiles>
                <cssTargetDir>css</cssTargetDir>
                <cssFinalFile>minified.css</cssFinalFile>

                <jsSourceDir>static/js</jsSourceDir>
                <jsSourceFiles>
                    <jsSourceFile>js1/a.js</jsSourceFile>
                    <jsSourceFile>js1/b.js</jsSourceFile>
                </jsSourceFiles>

                <jsTargetDir>js</jsTargetDir>
                <jsFinalFile>final.js</jsFinalFile>
            </configuration>
            <goals>
                <goal>minify</goal>
            </goals>
        </execution>
        <!-- more execution statements -->
    </executions>                              
</plugin>
Here is a brief explanation:
  1. Line #10 (cssSourceDir) is the path relative to the webapp directory.
  2. Lines #12, 13, 14 (cssSourceFile) are file paths relative to cssSourceDir.
  3. Line #16 (cssTargetDir) is the directory, relative to the webapp root of the build output, where the minified file goes.
  4. Line #17 (cssFinalFile) is the base name that the minified (and concatenated) file will have. The minified file will end with .min.css or .min.js.
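To produce the minified files, run the package phase:

mvn package

With the configuration above, the concatenated and minified output should end up as minified.min.css under the css directory and final.min.js under the js directory of the packaged webapp. (The exact target location follows the plugin's defaults as I understand them; check the plugin documentation linked above if your layout differs.)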

Monday, August 27, 2012

Recover Eclipse from Crash on Ubuntu

Motivation: Sometimes, Eclipse on Ubuntu (specifically, 11.10) hangs beyond recovery. It eventually crashes, or I have to kill -9 it. When I start it again, the splash screen gets stuck at about 70%.

Until now, the only solution was to create a new workspace, which leads to reconfiguring the plugins and downloading code from the repository again -- a complete waste of time.

Requisites: Access to corrupted workspace directory.

Steps: Since changing the workspace fixed the problem, I knew that something had become corrupted in the workspace. With a little poking around in $WORKSPACE/.metadata, I came to realize that at least one plugin's metadata (a plugin with .ui in its name) had gone bad. So I resorted to trial and error: moving files from $WORKSPACE/.metadata/.plugins to another location and starting Eclipse. (Starting Eclipse recreates the moved plugin metadata files.)

Long story short,
  1. look into Eclipse log file in the workspace, find out which plugin is doing the mischief.
  2. Delete/move the corresponding file from $WORKSPACE/.metadata/.plugins directory.
Most likely you will hit the same plugins as I did, so you will probably just have to delete org.eclipse.ui.workbench and org.eclipse.ui.ide from $WORKSPACE/.metadata/.plugins, and start Eclipse.
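As a concrete sketch, assuming your workspace lives at ~/workspace (moving rather than deleting, so you can restore the files if this does not help):

mv ~/workspace/.metadata/.plugins/org.eclipse.ui.workbench /tmp/
mv ~/workspace/.metadata/.plugins/org.eclipse.ui.ide /tmp/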

Hope this helps. 

Wednesday, August 15, 2012

Remote Debug Your Application in Jetty on Remote Machine

Motivation: Something weird happened today: I was asked to set up a remote debug mechanism on our test server, so that developers can step through code on their machines while execution takes place in a test environment sitting in another city.

I had never imagined needing remote debug to actually debug code on a remote machine.

Prerequisite: 
  1. Standalone Jetty installation
  2. Eclipse (or your favorite Java IDE that has remote debug facility)
Steps: $JETTY_HOME denotes the directory where Jetty is installed/copied/extracted.

Edit the ini file: Open the $JETTY_HOME/start.ini file in your favorite text editor. Uncomment --exec and, after that, add the debug setup. Here is a snippet of that file:
--exec
-Xmx1194m
-Xmn256m
-Xdebug
-Xrunjdwp:transport=dt_socket,address=4000,server=y,suspend=n
# -verbose:gc
Uncommenting --exec basically makes the contents of this file get passed as JVM parameters when Jetty is launched. There is nothing special about it -- in its bare minimum form, this is what gets executed:
java -Xdebug -Xrunjdwp:transport=dt_socket,address=4000,server=y,suspend=n -jar start.jar

Start Jetty as usual: start Jetty by $JETTY_HOME/bin/jetty.sh start

How do I confirm if these parameters are passed? If you are on Unix, just run ps -ef | grep  -v grep | grep jetty, you will see long description with the parameters above in it:
root     30320 30308  0 17:53 pts/0    00:00:07 /usr/java/jdk1.6.0_30/jre/bin/java -Xmx1194m -Xmn256m \
-Xdebug -Xrunjdwp:transport=dt_socket,address=4000,server=y,suspend=n \
 -Djetty.home=/opt/apps/jetty-hightide-7.5.0.v20110901 -cp \
/opt/apps/jetty-hightide-7.5.0.v20110901/lib/jetty-xml-7.5.0.v20110901.jar: \
/opt/apps/jetty-hightide-7.5.0.v20110901/lib/servlet-api-2.5.jar: \
/opt/apps/jetty-hightide-7.5.0.v20110901/lib/jetty-http-7.5.0.v20110901.jar:
...
Connecting the Remote Debugger: In Eclipse, go to the Run > Debug Configurations... menu. Double-click Remote Java Application, give it an appropriate name, link it to the relevant project, provide the host name, and mention the port that you started your remote debugger on. From the start.ini snippet above, in my case this is 4000.

Apply and start debugging.

A word about firewalls: Since the debugger makes a TCP connection to your Jetty process, make sure the debug port -- 4000 here -- is open on the host machine. In AWS EC2, you can allow port 4000 in your security group.
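For instance, with the classic ec2-api-tools, and assuming your instance is in a security group named default (opened to the world here for brevity; a narrower source CIDR is safer):

ec2-authorize default -P tcp -p 4000 -s 0.0.0.0/0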

Friday, August 3, 2012

Running Shell Script on Remote Machine

Motivation: Most of the time, running a command or a group of commands on a remote machine is a thoughtless process -- ssh username@remoteIP.or.dns "command1 param1 param2; command2 param3;". What throws you off is when you have a 40-line convoluted script to be run on 37 machines! You won't use this approach. You would probably scp the script to the remote machine, then call it using ssh as mentioned earlier, and then probably delete the file. It's a three-step process. If you haven't automated it, it's a pain.

A better way is to automate this process and do it in a single connection.

Prerequisites:
  1. SSH access to the remote machine, with permission to execute scripts.
  2. Some knowledge of shell script.
Steps: It's just a one-line command. Here is the breakdown:
  1. Keep the shell script that you want to execute remotely on your local machine. cat this file.
  2. The cat-ed file is piped to the ssh command, which...
  3. writes this file to a location of your choice on the remote machine, say /tmp/remote.sh, and...
  4. changes its mode (chmod) to executable, then...
  5. calls this script with parameters, if any, and finally...
  6. deletes this file from the remote location.
The most interesting part is that all this is done in a single connection. Here is the code. The \ continues the command on the next line; you may want to keep the whole command on a single line.

cat runremote.sh | ssh -i /home/naishe/.keys/remotekey.pem \
root@server1.naishe.in "cat > /tmp/remote.sh ; chmod 755 /tmp/remote.sh ; \
/tmp/remote.sh param1 param2 ; rm -f /tmp/remote.sh " 

You see, it's pretty simple.
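As an aside, if the script does not need to exist on the remote disk at all, bash can read it straight from stdin. A sketch under the same assumptions as above:

ssh -i /home/naishe/.keys/remotekey.pem root@server1.naishe.in \
'bash -s -- param1 param2' < runremote.sh

Here bash -s runs the script piped over stdin, and everything after -- becomes its positional parameters, so nothing needs to be written to, or cleaned up from, the remote machine.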

Tuesday, July 24, 2012

Backup your MySQL to Amazon S3

Motivation: If the database ever gets corrupted, or MySQL goes boink, the data loss can be minimized.
Prerequisites: 
  1. Familiarity with shell script
  2. S3 bucket and credentials
  3. Any S3 tool that you're familiar with. Here I have used s3cmd. You may use the weapon of your choice, maybe JetS3t or anything else.
Steps:
  1. Create a clean directory where all the operations will take place. Call it staging_dir
  2. Take a mysqldump of the desired database, or of all databases. Learn what you want here: mysqldump usage document
    mysqldump --user MYSQL_USER --password=MYSQL_PASSWORD --add-locks --flush-privileges \
    --add-drop-table --complete-insert --extended-insert --single-transaction \
    --databases DB_TO_BACKUP > FILENAME_FOR_SQL_STATEMENTS

    where
    • --add-locks: Surround each table dump with LOCK TABLES and UNLOCK TABLES statements. This results in faster inserts when the dump file is reloaded.
    • --flush-privileges: Send a FLUSH PRIVILEGES statement to the server after dumping the mysql database. This option should be used any time the dump contains the mysql database and any other database that depends on the data in the mysql database for proper restoration.
    • --complete-insert: Use complete INSERT statements that include column names. 
    • --extended-insert: Use multiple-row INSERT syntax that includes several VALUES lists. This results in a smaller dump file and speeds up inserts when the file is reloaded.

      you may also want to use -h HOSTNAME, if your cron job is going to run on a remote machine.
  3.  Make a tarball of this file.
    tar cf TAR_FILENAME --directory=STAGING_DIR DUMPED_SQL_FILE_NAME
    
    
  4. Upload to S3, (I assume you have configured your S3 tool whatever you use). With s3cmd, it's simple. s3cmd put FILENAME s3://BUCKETNAME/SUB/DIRECTORY/
  5. The tricky part is: how would you know what file was uploaded X days ago? With richer (scripting) languages like Python or Groovy, it's a lot easier to enumerate the files in the backup folder and, if there are more than X, delete the excess files. To me, it's tricky to do this in shell script, so I ended up having smart filenames instead.

    I name the files www.mywebsite.com.YYYY-MM-DD.sql.tar. So, to delete the file that was created X days ago, all I have to do is generate the date of X days ago, place it into the same filename structure, and then call s3cmd del s3://BUCKETNAME/SUB/DIRECTORY/OLD_FILENAME
Here is the complete script:
#!/bin/bash
set -e
is_quiet="false"

mysql_user="naishe"
mysql_pass="mysqlp@ssword"
mysql_db="appdb"

staging_dir="/tmp/mysql_bkp"
logfile="/var/log/mysql.bkp"

s3_bucket="backupbucket/www/mysql"
aws_access_key=""
aws_secret_key=""

keep_for_days=30
today=$( date '+%Y-%m-%d' )
remove_bkp_on=$( date -d "${today} -${keep_for_days} days" '+%Y-%m-%d' )
filename="www.naishe.in"
write_to="${filename}.${today}.sql"

delete_file="${filename}.${remove_bkp_on}.sql.tar"
fullpath_sqlfile="${staging_dir}/${write_to}"
fullpath_tarfile="${fullpath_sqlfile}.tar"

# conditional printing
function echome(){
    if [ "${is_quiet}" != "true" ] ; then 
        echo -e "\033[0;34m\xE2\x9C\x93 $1\033[0m"
    fi
}

function start_bkp(){
    echome "--- MySQL Backup on $( date '+%b %d, %Y' ) ---"
    #delete staging dir if any
    echome "Cleaning staging directory: ${staging_dir}"
    rm -rf ${staging_dir}
    mkdir ${staging_dir}

    #take a MySQL dump
    echome "Starting MySQL backup..."
    echome "mysqldump --user ${mysql_user} --password=${mysql_pass} --add-locks --flush-privileges --add-drop-table --complete-insert --extended-insert --single-transaction --database ${mysql_db} > ${fullpath_sqlfile}"

    mysqldump --user ${mysql_user} --password=${mysql_pass} --add-locks --flush-privileges --add-drop-table --complete-insert --extended-insert --single-transaction --database ${mysql_db} > ${fullpath_sqlfile}

    #tar
    echome "Creating tar file..."
    echome "tar cf ${fullpath_tarfile} --directory=${staging_dir} ${write_to}"
    tar cf ${fullpath_tarfile} --directory=${staging_dir} ${write_to}

    #upload to S3
    echome "Uploading to S3..."
    echome "s3cmd put ${fullpath_tarfile} \"s3://${s3_bucket}/\""
    s3cmd put ${fullpath_tarfile} "s3://${s3_bucket}/"

    echome "Deleting the file uploaded ${keep_for_days} days ago..."
    echome "s3cmd del \"s3://${s3_bucket}/${delete_file}\""
    s3cmd del "s3://${s3_bucket}/${delete_file}"

    #delete staging folder
    echome "Removing staging directory: ${staging_dir}"
    rm -rf ${staging_dir}
    
    echome "--- MySQL Backup Completed ---"
    echome "   "
}

start_bkp >> ${logfile}


Setting up the Cron job: Place this script somewhere safe and give it appropriate permissions. I kept it under /opt/scripts/mysql_backup.sh, and appended this line to crontab -e

0 4 * * * /opt/scripts/mysql_backup.sh
This runs the script daily at 4 AM.
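Since the script embeds the MySQL password, it is wise to keep it readable and executable by its owner only, for example:

chmod 700 /opt/scripts/mysql_backup.sh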

Thursday, July 19, 2012

Updating Version in Maven pom.xml

Sure, you can do it manually. And it worked flawlessly until now. Wait until you have a multi-module project and still dare to do it manually. Sooner or later you will see you have made a typo -- you renamed the version to 1.2.3-SHAPSHOT.

Anyway, the most common Google search result is mvn release:update-versions, which requires the pom.xml to be at a SNAPSHOT version and is part of the release process -- and you may not want to perform a release.

A better (and correct) alternative is to use the Maven Versions plugin. It updates the versions of submodules too. Here is how to use it:
mvn versions:set -DnewVersion=1.2.3-SNAPSHOT
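As far as I recall, the plugin leaves pom.xml.versionsBackup files behind so the change can be undone. Once you are happy with the result:

mvn versions:commit

or, to roll back:

mvn versions:revert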

Saturday, July 7, 2012

Change Nagios' Default Home Page

My last install of Nagios on CentOS went all fine, except that the Nagios home page was not loading the welcome page into the main container -- it was blank. I had to change the default home page to something meaningful, so I changed it to the "tactical overview" page.

In /usr/local/nagios/share/index.php or /usr/share/nagios/htdocs/index.php file, change this line
$corewindow="main.php";
to:
$corewindow="cgi-bin/tac.cgi";
If you do not find your index.php in the above two locations, try this command to locate it on your disk:
locate index.php | grep nagios

Here are a couple of paths that you may be interested in:
"Tactical Summary": cgi-bin/tac.cgi
"Map"             : cgi-bin/statusmap.cgi?host=all
"Hosts"           : cgi-bin/status.cgi?hostgroup=all&style=hostdetail
"Services"        : cgi-bin/status.cgi?host=all
"Summary"         : cgi-bin/status.cgi?hostgroup=all&style=summary
"Problems"        : cgi-bin/status.cgi?host=all&servicestatustypes=28
"Availability"    : cgi-bin/avail.cgi
"Trends"          : cgi-bin/trends.cgi
"Summary"         : cgi-bin/summary.cgi