LC 295 - Find Median from Data Stream

Question

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.
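
To make the definition concrete, here is a minimal sketch that computes the median of a complete list by sorting it first. The helper name `median_of` is hypothetical and is not part of the problem; it only illustrates the definition and is not the streaming solution.

```python
def median_of(values):
    """Median of a non-empty list, following the definition above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd size: the single middle value
        return float(ordered[mid])
    # even size: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 3, 4]))  # 3.0
print(median_of([2, 3]))     # 2.5
```

Re-sorting on every query costs $O(n \log n)$ each time, which is what a streaming data structure tries to avoid.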

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within $10^{-5}$ of the actual answer will be accepted.

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0

Constraints:

  • $-10^5 \le num \le 10^5$
  • There will be at least one element in the data structure before calling findMedian.
  • At most $5 \times 10^4$ calls will be made to addNum and findMedian.

Follow up:

  • If all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
  • If 99% of all integer numbers from the stream are in the range [0, 100], how would you optimize your solution?
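
One possible answer to the first follow-up is to replace the heaps with a fixed array of 101 counters, one per value. The sketch below assumes every input is in [0, 100]; the class name `BucketMedianFinder` and the helper `_kth` are hypothetical names chosen for illustration.

```python
class BucketMedianFinder:
    """Hypothetical variant for the first follow-up: all values lie in [0, 100]."""

    def __init__(self):
        self.counts = [0] * 101  # counts[v] = how many times v was added
        self.total = 0           # number of values added so far

    def addNum(self, num: int) -> None:
        # assumption: 0 <= num <= 100
        self.counts[num] += 1
        self.total += 1

    def _kth(self, k: int) -> int:
        """Return the k-th smallest value (0-indexed) by scanning the buckets."""
        seen = 0
        for value, count in enumerate(self.counts):
            seen += count
            if seen > k:
                return value
        raise IndexError("k is out of range")

    def findMedian(self) -> float:
        mid = self.total // 2
        if self.total % 2 == 1:
            return float(self._kth(mid))
        return (self._kth(mid - 1) + self._kth(mid)) / 2


finder = BucketMedianFinder()
finder.addNum(1)
finder.addNum(2)
print(finder.findMedian())  # 1.5
finder.addNum(3)
print(finder.findMedian())  # 2.0
```

Both operations then take constant time, since a scan touches at most 101 buckets. For the second follow-up, a commonly suggested extension is to also count how many values fall below 0 and above 100, on the assumption that the median itself stays inside the bucketed range; that part is not shown here.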


Solution

concept

The key idea is to keep two heaps of roughly the same size:

  1. the first heap holds the smaller half of the numbers (this is a max heap)
  2. the second heap holds the larger half of the numbers (this is a min heap)

With this setup, finding the median is easy, because we only need the tops of the two heaps. If the total count is odd, the median is the top of whichever heap holds one more element; if it is even, the median is the average of the two tops.

When we add a number, we push it onto the max heap (i.e. the smaller half) and then take care of two things:

  1. order check: make sure every number in the small heap is no larger than every number in the large heap; comparing the two tops is enough
  2. size check: make sure the sizes differ by at most one; this is important for the median computation.
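
Python's `heapq` module only provides a min heap, so a common way to get a max heap is to store negated values. The sketch below is a small illustration with hand-picked numbers, not the full solution: it builds the two halves directly and checks both invariants.

```python
import heapq

small = []  # max heap of the smaller half, stored as negated values
large = []  # min heap of the larger half

for num in (1, 2):
    heapq.heappush(small, -num)  # negate on the way in
for num in (3, 4):
    heapq.heappush(large, num)

top_small = -small[0]  # largest value of the smaller half
top_large = large[0]   # smallest value of the larger half

order_ok = top_small <= top_large                 # order check
size_ok = abs(len(small) - len(large)) <= 1       # size check
median = (top_small + top_large) / 2              # sizes are equal here

print(top_small, top_large, order_ok, size_ok, median)  # 2 3 True True 2.5
```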

code

import heapq


class MedianFinder:

    def __init__(self):
        self.small = []  # max heap (values stored negated)
        self.large = []  # min heap

    def addNum(self, num: int) -> None:
        heapq.heappush(self.small, -num)

        # check order: the largest small value must not exceed the smallest large value
        if self.small and self.large and -self.small[0] > self.large[0]:
            tmp = heapq.heappop(self.small)
            heapq.heappush(self.large, -tmp)

        # check uneven size: the heaps may differ by at most one element
        if len(self.small) > len(self.large) + 1:
            tmp = heapq.heappop(self.small)
            heapq.heappush(self.large, -tmp)
        if len(self.large) > len(self.small) + 1:
            tmp = heapq.heappop(self.large)
            heapq.heappush(self.small, -tmp)

    def findMedian(self) -> float:
        # odd total number: the bigger heap holds the median on top
        if len(self.small) > len(self.large):
            return -self.small[0]
        elif len(self.small) < len(self.large):
            return self.large[0]
        else:  # even total number
            return (-self.small[0] + self.large[0]) / 2

# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()

Complexity

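time: $O(\log n)$ per addNum and $O(1)$ per findMedian, so $O(m \log n)$ for $m$ calls in total, where $n$ is the number of elements stored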
space: $O(n)$

This post is licensed under CC BY 4.0 by the author.