Anthony Scopatz

I think, therefore I amino acid.

Col Me Sometime (or Why NumPy is the Best Lover)

Whether you know it or not (or care), most database software has hard-coded limits on the number of columns that you can store in a single table.

Often, if you hit this limit, it indicates that ‘you are doing it wrong.’  However, this is not always the case.

For instance, say you want to store some information that is a function of isotope.  Well, there exist some 3000+ known nuclides.  If you want to store this data in a single table and your database allows 4096 columns per table, you are in luck!  However, if your database goes down by a single power of two, to only 2048 columns per table, you are in trouble.
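To make that concrete, here is a minimal sketch of what happens when every nuclide gets its own column, assuming a stock SQLite build (whose default cap is 2,000 columns) and a made-up nuclide_data table:

import sqlite3

# One REAL column per nuclide -- roughly 3000 of them.
cols = ", ".join("nuc_{0} REAL".format(i) for i in range(3000))

conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE TABLE nuclide_data ({0})".format(cols))
except sqlite3.OperationalError as e:
    # Stock builds cap tables at 2000 columns, so this prints
    # something like: too many columns on nuclide_data
    print(e)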

Below is a list of different table structures and their implicit column limits (approximate, and dependent on version and build options):

    SQLite                    2,000 by default (32,767 if recompiled)
    MS SQL Server             1,024 for standard tables
    PostgreSQL                250 - 1,600, depending on column types
    MySQL                     4,096 hard limit (storage engines may allow fewer)
    NumPy structured array    as many as you can fit into memory

That’s right; NumPy structured (or record) arrays seem to allow as many columns as you can fit into memory!  Below is some code that demonstrates this.  I tested it up to a million columns, but I am certain that it could handle a few more orders of magnitude.

This situation is peculiar: traditional databases are limited by disk size while NumPy is limited by memory, and HDDs are, of course, much larger than RAM.  Yet it is the in-memory NumPy arrays that impose no practical column limit.

Go to the Source

import uuid
from random import sample

import numpy as np

# Candidate column types: integers, floats, and byte strings.
population = [int, float, 'S']

def rand_dtype(n):
    """Build a structured dtype with n uniquely named, randomly typed fields."""
    dt = []
    for i in range(n):
        # uuid1() guarantees a unique name for each field (column).
        dt.extend([(str(uuid.uuid1()), np.dtype(d)) for d in sample(population, 1)])
    dt = np.dtype(dt)
    return dt

if __name__ == "__main__":
    # A million columns, ten rows -- far beyond any database's column limit.
    dt = rand_dtype(1000000)
    z = np.zeros(10, dtype=dt)
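For completeness, a quick sanity check (assuming the script above has just run) shows that these columns behave like any other structured-array fields:

# Pull out the auto-generated name of the first field and read that column.
first_col = dt.names[0]
print(z[first_col])    # ten zeros of whatever random type was drawn
print(len(dt.names))   # 1000000 -- one name per column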

