Find Median from Data Stream
Last updated
Last updated
Design a data structure that supports the following two operations
void addNum(int num) - Add a integer number from the data stream to the data structure
double findMedian() - Return the median of all elements so far
The median is defined as the middle element of an ordered list. If the list is even, the median is the mean of the two middle values.
Insertion sort is good choice to start with
addNum is O(n)
findMedian O(1)
findMedian()
is already optimal. We can figure out ways to optimize addNum(num)
.
A valuable insight is that we don't have to keep the entire list sorted as long as we know where the median value is.
A classic approach is to use 2 heaps. One min heap to keep all the bigger elements. One max heap to keep the smaller elements. As long as we can guarantee, at all times, the smallest element in the min heap is bigger than the biggest element in the max heap and the sizes are about the same (no more than 1), then we can compute the median in O(1).
Both pop()
and push()
for heaps are O(logn). This will bring the time complexity down to
addNum is O(logn)
findMedian O(1)
A visual of two heaps will help with the understanding.
An alternative data structure to use is a balanced binary search tree with time complexity
addNum is O(logn)
findMedian O(logn)
If all integer numbers from the stream are between 0 and 100, how would you optimize it?
Use a hashmap to the bucket sort thing
Then a total count to see where the median falls
Iterating over a constant count is O(1)
If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
I think the exceptions are what making this interesting
We can do a ranged count strategy (i.e. (-inf, -1], [0, 100], [101, inf)
)
When numbers are large, the median will be between 0 and 100 anyway