# A Discovery About the Collatz Sequence

By calculating, and storing, Collatz values in sequence, the sequence calculations can be simplified by using lookups.

For example, the Collatz value of 4 is 2. So, as we calculate the Collatz value of 5, we get:

5 – 16 – 8 – 4

which is 3 iterations. Once we have a value less than the starting value, we can do a lookup and add the current iteration count to the lookup result. Since the Collatz value of 4 is 2, that means that the Collatz value of 5 is 5. Or, more explicitly:

5 – 16(

) – 8(1iter) – 4(2iter) – (3iter)2 more iterations after Collatz[4], totalling 5 iterations

On my own system, storing Collatz values into an array yielded an incredible improvement. Calculating the first 10,000,000 Collatz values, using a very optimized assembly routine but *no* caching array, takes 3.34 seconds on my system. Using a caching array, even with a generically-optimized C routine, the same set of calculations takes 0.23 seconds, giving a 93% improvement.

The drawback to a plain Standard C approach, is that the cache allocation can't be dynamically re-allocated using the malloc() family, beyond available core (memory + swap). Yes, the Posix mmap() function can access large files beyond available core, but they have two drawbacks: they have no way to make those files grow as needed, and they can't be larger than the largest linear-memory hole.

This approach calls for database storage. A database grows as needed, and can hold arbitrary data. In this case, the data consists of two integers, the number and its Collatz value. For a small list of Collatz values, this approach won't be efficient, but as the list grows beyond two billion, a database on disk will prove much more scalable.

I know, the mathematicians already know all this. I'm not a mathematician; I'm merely a hobbyist programmer, enjoying these independent discoveries.

## Leave a Reply