To calculate the median exactly, we will need to keep all the elements in the list. While we could try to keep all of them in order (which wouldn’t actually be so bad), that’s more than we really need. All we really need is to keep track of which elements are above the median, and which are below.
We can keep track of the elements in two heaps. The values below the median are stored with a negative sign, so that the largest element sits at the top of the heap - heapq
always gives us the smallest value when we pop, and we actually want the largest. Each element is pushed on to the appropriate heap, and then the heaps are balanced. When both heaps have the same number of elements, the median is just the average of the tops of the two heaps. We choose to require the small heap to be as big or bigger than the large heap, so when there are an odd number of elements, the median will be on the top of the small heap.
Note that a heap does not preserve the actual sorted order, it implements the heap as a special ordering on a list. It is best when you need to return the first n
largest (or smallest) values. Since it is just a list underneath, we can get the sorted order if we need it using the usual list functions.
def heapsort(heap):
return sorted(heap)
Also, the output format they are expecting is slightly different than what’s asked for in the instructions. They want a function that takes the ordered list of inputs and returns a list of the running medians, as floats. They take care of the actual printing of results.
Here is the reference solution:
import heapq
def runningMedian(a):
low = []
high = []
median = 0
running_medians = []
for number in a:
if number < median:
heapq.heappush(low, -number)
else:
heapq.heappush(high, number)
# Balance the lists - we only need to adjust one element
# since we are maintaining the balance every iteration
if len(low) > len(high) + 1:
heapq.heappush(high, -heapq.heappop(low))
if len(high) > len(low):
heapq.heappush(low, -heapq.heappop(high))
if len(low) == len(high):
median = (high[0] - low[0])/2
else:
median = -low[0]*1.0
running_medians.append(median)
return running_medians