Sunday, September 30, 2012

Maximum Sum Sub-array

Definition: Suppose you have given an array A[1.. n]. You need to find out a p and a q, such that A[q] - A[p] is maximum where 1 <= p < q <= n.

Discussion: Couple of common approaches that does not really work are:
  1. look for global min (A[p]) and a global min(A[q]). This may not necessarily satisfy p < q constrain.
  2. find global min index (a) and global max index (b). Walk from right till a for max difference (X), and left till b for max difference (Y). The higher of the two will give our p and q.
Both of these theories looked promising intially until I came to a counter example:
12   10   11   20   2   6   19   1   3
               ^    ^       ^    ^
              max   p       q   min
So, what failed us? The underlying thought that we need to have global max or global min in our equation to be able to find max chnage is fundamentally flawed. What we are looking for is maximum difference between two elements when going left to right. So, with this knowledge, we are left in a bad zone of brute-force. We will compare each element with all elements that are right to it, and store the max. For an n length array, the number of comparisons would be (n-1) + (n-2) + ... + 1 = n*(n-1)/2 i.e. O(n2)

Divide and Conquer: O(n2) is as bad as it could be. We can increase performance by not looking into all possible combinations but just the ones that matter. We can divide the problem into smaller identical subproblems and take the winners of the smaller subproblems.

Another thing we observe is it's the sum of the differences that matters. Let me explain this. Take our case
Original:   12   10   11   20   2   6   19   1   3
Difference:  0   -2   1    9  -18   4   13 -18  -2
All we need to do is to find out a subarray that has maximum sum. Divide and conquer suggests to divide the array into two parts, then get the higher of maximum subarray from left subarray qand maximum subarray from right subarray. But wait, these two conditions assume that the maximum subarray is completely in either left or right subarray. What if there exists a subarray that crosses over the mid point. We should take that into consideration too. So, here is what we come up with:
maxSubarray ( A, low, high )
  if ( low == high )
    return ( low, high, A[low] )
  mid = low + (high - low)/2
  maxLeft = maxSubarray ( A, low, mid )
  maxRight = maxSubarray ( A, mid+1, right )
  maxCrossOver = maxCrossOver ( A, low, mid, high )
  return max ( maxLeft, maxRight, maxCrossOver )
maxCrossOver ( A, low, mid, high )
  leftSum = INT_MIN
  tempSum = 0
  leftIndex = mid
  for i = mid to low
    tempSum = tempSum + A[i]
    if tempSum > leftSum
      leftSum = tempSum
      leftIndex = i
  rightSum = INT_MIN
  rightIndex = mid + 1
  tempSum = 0
  for i = mid + 1 to high
    tempSum = tempSum + A[i]
    if tempSum > rightSum
        rightSum = tempSum
        rightIndex = i

  return ( leftIndex, rightIndex, leftSum + rightSum )

[1] Java code for this algorithm, view at GitHub
[2] Inroduction to Algorithms: CLRS, Chapter 4 Divide-and-Conquer 4.1 The maximum-subarray problem

1 comment:

  1. There is a minor mistake in the difference table above. sumArray[8] should be 2 and not -2.