Deriving Average Without Totals

September 21, 2020

Even an old dog like me learns something new every day.

Let’s say you need a program that receives a sequence of numbers and output the current average. How do you do this efficiently? The naive solution is to to keep track of two variables:

total   # add up all numbers seen so far
count   # a count of how many numbers seen so far

This way, at any step, you can calculate the current average using total/count.

It turns out there is another way to do this that allows you to produce the average. You will track

current_average    # the value of `average` that was last calculated
count              # a count of how many numbers seen so far

With each new number that you see, you can recalculate current_average using the formula

current_average_new = current_average_previous + (( new_number - current_average_previous ) / count)

In terms of space complexity, the storage is the same: two variables. However, if you need to show the average repeatedly, the second approach only performs calculation once. Nice!

A good step-by-step explanation is stolen from Henry @ Math StackExchange $Screenshot$

And inspired by a snippet from Anson Wong’s multi-armed bandit code.

math