Wednesday, July 27, 2016

Reading Spatial Data Into a Pandas Dataframe

At ArcGIS 10.4.x, the SciPy stack (which includes pandas) is included in your basic Python install, which is great!

Working with a pandas DataFrame can make life easy, especially if you need to summarize your data quickly.

import arcpy
import pandas as pd
import sys

def trace():
    """
    trace finds the line, the filename
    and error message and returns it
    to the user
    """
    import traceback
    tb = sys.exc_info()[2]
    tbinfo = traceback.format_tb(tb)[0]
    # script name + line number
    line = tbinfo.split(", ")[1]
    # last line of the formatted traceback (the error message)
    synerror = traceback.format_exc().splitlines()[-1]
    return line, __file__, synerror

try:
    with arcpy.da.SearchCursor(r"d:\temp\scratch.gdb\INCIDENTS_points",
                               ["OBJECTID", "SHAPE@X", "SHAPE@Y"]) as rows:
        # name the DataFrame columns after the cursor's fields
        df = pd.DataFrame.from_records(data=rows,
                                       columns=list(rows.fields))
        # column names of the X and Y fields
        print((df.columns[1], df.columns[2]))
        # mean X and mean Y, i.e. the mean location of the points
        print((df[df.columns[1]].mean(), df[df.columns[2]].mean()))
except Exception:
    print(trace())

As usual, you create an arcpy.da cursor, then pass that cursor (an iterator of row tuples) into the DataFrame's from_records(). Once the data is loaded, as in my example, you can perform operations on the frame itself. For example, let's say you needed the mean location of the points. You can get it quickly by loading the XY columns (SHAPE@X and SHAPE@Y) and calling mean() on each column.
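If you want to try the pattern without an ArcGIS install, here is a minimal sketch that swaps the cursor for a plain Python generator. The fake_cursor function, the FIELDS list, and the coordinate values are all made up for illustration; only from_records() and mean() are taken from the example above.

```python
import pandas as pd

# Hypothetical stand-in for the field names a search cursor would report.
FIELDS = ["OBJECTID", "SHAPE@X", "SHAPE@Y"]

def fake_cursor():
    """Yield (OBJECTID, x, y) tuples, one per row, like a search cursor."""
    yield (1, 10.0, 20.0)
    yield (2, 20.0, 40.0)
    yield (3, 30.0, 60.0)

# from_records consumes the iterator and builds one row per tuple.
df = pd.DataFrame.from_records(data=fake_cursor(), columns=FIELDS)

# Mean location = mean of the X column and mean of the Y column.
mean_x = df["SHAPE@X"].mean()
mean_y = df["SHAPE@Y"].mean()
print((mean_x, mean_y))
```

Selecting the columns by name instead of by position keeps the intent readable if you later add more fields to the cursor.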

With this method every row from the cursor is read into memory at once, because from_records() has no chunksize option, so watch your memory when working with large feature classes.
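One possible workaround, sketched here under my own assumptions rather than as anything built into pandas or arcpy, is to slice the row iterator yourself with itertools.islice and build one small DataFrame per slice. The helper names iter_frames and mean_location, the chunk size, and the sample rows are hypothetical; in practice you would pass the search cursor in place of sample_rows.

```python
import itertools
import pandas as pd

FIELDS = ["OBJECTID", "SHAPE@X", "SHAPE@Y"]

def iter_frames(rows, columns, chunk_size):
    """Yield DataFrames of at most chunk_size rows from any row iterator."""
    rows = iter(rows)
    while True:
        chunk = list(itertools.islice(rows, chunk_size))
        if not chunk:
            break
        yield pd.DataFrame.from_records(chunk, columns=columns)

def mean_location(rows, columns, chunk_size=1000):
    """Sum X and Y chunk by chunk so only one chunk is held at a time."""
    total_x = total_y = 0.0
    count = 0
    for frame in iter_frames(rows, columns, chunk_size):
        total_x += frame[columns[1]].sum()
        total_y += frame[columns[2]].sum()
        count += len(frame)
    if count == 0:
        return None  # no rows, so no mean location
    return total_x / count, total_y / count

# Made-up rows standing in for a cursor: (OBJECTID, x, y).
sample_rows = ((i, float(i), float(2 * i)) for i in range(1, 6))
result = mean_location(sample_rows, FIELDS, chunk_size=2)
print(result)
```

This only helps for statistics you can accumulate across chunks, such as sums and counts; anything that needs all rows at once still has to fit in memory.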
