Friday, April 26, 2013

Working with numpy's Structured Array and numpy.dtype

In my previous post, I showed how to quickly get Access data (2007/2010) into a file geodatabase using pyODBC, without creating an ODBC connection in ArcCatalog/ArcMap.  You might have noticed that I used numpy to create a table pretty easily, but you might be wondering what the dtypes are.

numpy.dtype objects are data type objects that describe how the bytes in a fixed-size block of memory should be interpreted: is the data a string, a number, and so on?  A dtype describes the following aspects of the data:

  1. Type of data (integer, float, object, ...)
  2. Size of data (number of bytes)
  3. Describes an array of data
  4. Names of fields in the record
  5. If the data is a sub-array, its shape

Getting the dtype format for common Python data types is fairly easy. numpy.dtype() will return the proper format for almost any Python data type:

>>> import numpy
>>> print numpy.dtype(str)
|S0
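The aspects listed above can be read straight off a dtype object. The following is a minimal sketch, written with the print() function so that it runs on Python 3 (the snippets in this post use the Python 2 print statement). The variable names and the field names 'x' and 'y' are only illustrative.

```python
import numpy

# Build dtypes from built-in Python types and inspect what each one describes.
int_dt = numpy.dtype(int)
float_dt = numpy.dtype(float)
bool_dt = numpy.dtype(bool)

for dt in (int_dt, float_dt, bool_dt):
    # kind is a one-character type code, itemsize is the size in bytes
    print(dt.name, dt.kind, dt.itemsize)

# A sub-array dtype carries a shape; a structured dtype carries field names.
sub_dt = numpy.dtype((numpy.float64, (2, 3)))
rec_dt = numpy.dtype([('x', numpy.float64), ('y', numpy.float64)])
print(sub_dt.shape)
print(rec_dt.names)
```

The size of the integer dtype depends on the platform and numpy version, so it is printed rather than assumed.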

For array protocol type strings, there are various data types supported:

'b'        Boolean
'i'        (signed) integer
'u'        unsigned integer
'f'        floating-point
'c'        complex floating-point
'S', 'a'   string
'U'        unicode
'V'        anything (void)
(source: numpy help documents)
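As a rough illustration of the table, a type string is one of these codes followed by a number. The sketch below (Python 3 syntax, with an arbitrary selection of sizes) turns a few such strings into dtypes and prints what numpy makes of them.

```python
import numpy

# Each type string is a kind code followed by the item size in bytes
# (for 'U' the number is a character count, not a byte count).
codes = ('b1', 'i4', 'u2', 'f8', 'c16', 'S10', 'U10', 'V8')

kinds = {}
for code in codes:
    dt = numpy.dtype(code)
    kinds[code] = dt.kind
    print(code, '->', dt.name, dt.kind, dt.itemsize)
```

The kind attribute of each resulting dtype matches the letter used in the type string.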

This allows you to specify things like string length.
Example:
>>> dt = numpy.dtype('a25')  # 25-character string
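One consequence of a fixed string length is worth seeing once: values longer than the declared length are cut off when they are stored. A small sketch in Python 3 syntax, using the 'S' spelling from the table above and made-up sample values:

```python
import numpy

# 'S5' holds at most 5 bytes per element; longer values are truncated.
names = numpy.array([b'geodatabase', b'table'], dtype='S5')

print(names[0])              # only the first 5 bytes survive
print(names[1])              # fits exactly, stored unchanged
print(names.dtype.itemsize)  # bytes reserved per element
```

So when loading real data, pick a length that fits the longest value you expect in that field.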

After you know what your data types are, you will want to associate these types with the fields in the records.  I prefer to use the optional dictionary method where there are two keys: 'names' and 'formats'.  You would then pass this information to create your 'structured array'.

Example:

>>> dt = numpy.dtype({'names': ['r','g','b','a'],
...      'formats': [numpy.uint8, numpy.uint8, numpy.uint8, numpy.uint8]})
>>> colors = numpy.zeros(5, dtype = dt)
>>> print colors
[(0, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0) (0, 0, 0, 0)]
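Once the structured array exists, the field names act like column names. The sketch below (Python 3 syntax, same 'r', 'g', 'b', 'a' dtype as above, with arbitrary sample values) shows how you might fill in and read back records and fields.

```python
import numpy

dt = numpy.dtype({'names': ['r', 'g', 'b', 'a'],
                  'formats': [numpy.uint8] * 4})
colors = numpy.zeros(5, dtype=dt)

colors[0] = (255, 0, 0, 255)   # assign a whole record as a tuple
colors['a'] = 255              # assign one field across every record

print(colors['r'])             # the 'r' field of every record, as a uint8 array
print(colors[0])               # a single record
print(colors.dtype.names)      # the field names, in order
```

Indexing by field name returns a regular array for that field, while indexing by position returns one record.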

There are many ways to declare dtypes for your data, and you can read about them all in the numpy reference documentation on data type objects.
More on structured arrays can be found in the numpy documentation on structured arrays.

Enjoy