Mitchell J. Wagner

Mitch Wagner

I recently attempted to derive the function that yields Excel's column numbering, which adheres to the following pattern.

A, B, ..., Z, AA, AB, ..., ZZ, AAA, AAB...

"Pssh, base 26, easy." I thought to myself.

import string

def num_in_base(n, b):
    if n == 0:
        return [0]
    digits = []
    while n:
        digits.append(n % b)
        n //= b
    return digits[::-1]

m = dict(zip(range(0, 26), string.ascii_uppercase))
def base26_num_to_excel(digits):
    return "".join(m[d] for d in digits)
A, B, ..., Z, BA, BB, ...

… whoops, got cocky. I flailed around a bit trying to get a handle on the real behavior here… "It's base 26, of course, and 'A' is '1'… but then what's '0'? Also 'A'?…"

Life's short, so I cheated, and found the following:

import string

def num_to_excel(n):
    c = ""
    while n >= 0:
        c = string.ascii_uppercase[n % 26] + c
        n = n // 26 - 1
    return c

Frustratingly, that seems to do it.

Bringing it back to base 10 makes it obvious what's going on here. Excel's column numbering is the decimal equivalent of counting like

0, 1, ..., 9, 00, 01, ... 09, 10, 11, ...

So you snag the least-significant digit like you're counting in base 26. Then, you start counting off with a second digit that starts at 0. The n // 26 lops of a digit at a time, but you need to compensate for those extra 26 values ("BA" should be "AA"), so you subtract one from the next digit.

Each non-ones-place Excel digit is indeed exactly one unit behind its corresponding base 26 digit:

  • Two-digit base 26 numbers start at "BA". Two-digit base Excel numbers start at "AA". Thus, when encoding the same value, two-or-more-digit base Excel numbers are always at least 26 units behind their counterpart. Dividing the encoded value by 26 and subtracting one accounts for this difference.
  • Three-digit base 26 digits start at "BAA". Three-digit base Excel numbers start start at "AAA". Thus, when encoding the same value, three-or-more-digit base Excel numbers are always at least 702 units behind their counterpart. Assuming we've accounted for the 26-unit difference from the second digit, we can find the third digit of the base Excel value by simplying subtracting out a further 262.

And so on. The loop is an exercise in iteratively removing accumlated difference.

Conclusion

Don't think too hard or you'll spend an evening or more of your life pondering Excel columns.