Performance matters, don't use string manipulation, use math wherever possible

June 28th 2023 | ~ 4 minute read

Preface

Ah, performance. Always a hotly debated topic among programmers. Does it even matter when computers are so fast these days? I'd argue that it does, and that you're doing everyone a disservice if you don't take ample care to ensure smooth sailing for everyone involved.

The problem

Here's an issue I recently came across while writing some python code. I had to do some work on an integer representing a bank account number that is structured in the following fashion:

AAA-BBBBBBBBBBBBB-CC

AAA -> Bank identifier (3 digits)
BBBBBBBBBBBBB -> Account number (13 digits)
CC -> checksum (2 digits)

Notice the checksum at the end? It's used to validate the integrity of the whole number. It's calculated using the following formula:

98 - bank_identifier-account_number * 100 % 97

You take the first 16 digits of the whole thing and then subtract the modulus of that x 100 with 97 from the number 98. This is your checksum.

It's then simple to verify that the resulting checksum is the same as is written at the end of the bank account number.

The solution

Let's take a moment to think about how we might implement this in python. The first thing that comes to mind to a good chunk of people is something like this:

def validate_account_no_str(account_no: int) -> bool:
    account_no_str = str(account_no)
    csum = int(account_no_str[-2:])
    account_no_without_csum = int(account_no_str[0:16])
    calculated_csum = 98 - account_no_without_csum * 100 % 97

    return csum == calculated_csum

Cast the number to a string and then using string manipulation split it into parts we need and then cast those bits back into integers to do the calculation.

While this works, it's wasteful, you end up doing all sorts of unnecessary allocations and type conversions, all of which is slowing the code down significantly.

A better way to do this is to think about the problem in a different way. We're dealing with integers here. Is there a simple method to do all of this with some math without resorting to type conversion? Yes, there is.

def validate_account_no(account_no: int) -> bool:
    csum = account_no % 100
    account_no_without_csum = account_no // 100
    calculated_csum = 98 - account_no_without_csum * 100 % 97

    return csum == calculated_csum

To get the last two digits representing the checksum, just do a simple modulus operation with 100. Likewise, to get the first 16 digits, just do an integer division with 100. Then, just apply the same calculation as before.

The performance difference

Let's pit these two solutions against each other with a little test. We'll use the timeit python module to run them each 10 million times, and then we'll compare how long each one took to complete. Here's the complete code:

#!/usr/bin/env python3

import timeit

def validate_account_no_str(account_no: int) -> bool:
    account_no_str = str(account_no)
    csum = int(account_no_str[-2:])
    account_no_without_csum = int(account_no_str[0:16])
    calculated_csum = 98 - account_no_without_csum * 100 % 97

    return csum == calculated_csum

def validate_account_no(account_no: int) -> bool:
    csum = account_no % 100
    account_no_without_csum = account_no // 100
    calculated_csum = 98 - account_no_without_csum * 100 % 97

    return csum == calculated_csum

def main():
    print(
        timeit.timeit(
            lambda: validate_account_no(200000000012345600), 
            number=10_000_000,
        )
    )

    print(
        timeit.timeit(
            lambda: validate_account_no_str(200000000012345600),
            number=10_000_000,
        )
    )

if __name__ == '__main__':
    main()

And, now for the results:

1.7433777059995919
4.486554025999794

The integer version took a little over 1.7s to complete, while the string version took a staggering 4.5s! The integer version is almost 2.6 times faster with 10 million iterations.

Conclusion

The key takeaway is that sometimes, the "obvious" solution just isn't as optimal as what you can get if you put just a little more thought into the solution.

In a world where dwindling attention spans are a given, it's that much more imperative to write faster, more performant code instead of relying on hardware improvements that many are still unable to access. Our users deserve better!