Introduction
This toolset and its scripts allow you to determine several variants of the Euclidean minimum
spanning tree (EMST) for point data within ArcMap, or from within Python programs. The
scripts are:
EMST_helper.py
This module contains the base code for producing the EMST. Prim’s algorithm is used; the code
was translated from Avenue code produced by William Huber (Quantitative Decisions).
Other functions allow for assembling the EMST in different ways or for obtaining other
information from it. For example, a treed version of the EMST can be produced whereby
branches of it emanate from a user-provided start node.
EMST_treed.py
A variant of the EMST script which allows multiple EMSTs to be created for a space-delimited
list of origin node IDs. Alternatively, points can be spatially allocated to the origin nodes,
and desire lines connecting the input points to their origin node are created. The data
are radially sorted.
EMST.py
This module is specific to ArcMap and provides the necessary code to read point shapefiles
and produce the EMST as a polyline shapefile. This script imports EMST_helper.py,
arcpy_helper.py and groupPnts_helper.py as well as other base Python modules.
arcpy_helper.py
Provides utilities for working with the geoprocessor object.
groupPnts_helper.py
Allows for the grouping of points based upon an attribute. This enables the user to perform
a task on subsets, returning an output for each set. In this case, a point dataset could be
subdivided based upon some class and the EMST determined for each one.
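The grouping idea can be sketched without ArcMap. The example below is only an illustration, not the code in groupPnts_helper.py: it assumes records in the [X, Y, ID, GroupedBy] form used elsewhere in this toolset, and the function name group_points is hypothetical.

```python
from collections import defaultdict

def group_points(records, key_index=3):
    """Group [X, Y, ID, GroupedBy] records by their grouping value.

    Returns a dict mapping each group value to a list of unique (x, y)
    tuples, so that a spanning tree could be built for each group.
    """
    groups = defaultdict(list)
    for rec in records:
        xy = (rec[0], rec[1])
        bucket = groups[rec[key_index]]
        if xy not in bucket:          # skip duplicate coordinates in a group
            bucket.append(xy)
    return dict(groups)

# Hypothetical sample data: two classes, one duplicated coordinate.
records = [[0, 0, 0, "a"], [1, 0, 1, "b"], [0, 1, 2, "a"], [0, 0, 3, "a"]]
grouped = group_points(records)
```

Each value of the returned dictionary can then be processed as an independent point set.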
The toolbox enables one to select the point data and choose the type of EMST needed.
Examples follow.
A layer of input points and an output polyline shapefile are required. The Grouping field
option allows one to produce individual EMSTs for groups of points based upon a text or
numeric attribute field. The Start node option allows the user to produce a treed version of
the EMST, in which branches emanate outward from the origin or trunk node. The input to
this option is the feature ID number (FID) from the input shapefile. This option cannot be
used in conjunction with the Grouping field option.
The last option is a checkbox which allows you to produce a multipart and a singlepart version
of the EMST. The various options and their outputs are shown in the following sections.
Basic EMST
In this implementation, the input points are sorted lexicographically prior to being used in the
EMST tool. There is no need to do this, but you can use it to your advantage if you want to see
the migration of the EMST from left to right and bottom to top. In the above example, you will
see that the first line (FID = 0) has a FromID of 8 and a ToID of 9. One would expect the EMST
to indicate the reverse and produce FromID - ToID pairs that are also sorted accordingly. This
isn’t the case, but a version exists that allows for line, and hence point, sorting.
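As a rough, ArcMap-free illustration of the basic EMST, the sketch below sorts the points lexicographically and then grows the tree with Prim's algorithm. It is a minimal sketch and not the toolset's own code: the function name emst_prim is hypothetical, point IDs are assumed to be positions in the sorted list, and each edge is written from the tree vertex to the newly added vertex, which is a choice made here for readability.

```python
import math

def emst_prim(points):
    """Return (sorted_points, edges) for the Euclidean minimum spanning tree.

    points : list of (x, y) tuples
    edges  : list of (from_id, to_id, length), IDs index sorted_points
    """
    pts = sorted(points)                 # lexicographic: by x, then by y
    n = len(pts)
    if n < 2:
        return pts, []
    in_tree = [False] * n
    best_dist = [float("inf")] * n       # shortest known link into the tree
    best_from = [None] * n               # tree vertex providing that link
    best_dist[0] = 0.0                   # grow the tree from the first point
    edges = []
    for _ in range(n):
        # choose the vertex outside the tree that is closest to it
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] is not None:
            edges.append((best_from[u], u, best_dist[u]))
        # update the links of the remaining vertices against the new vertex
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(pts[u][0] - pts[v][0], pts[u][1] - pts[v][1])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return pts, edges

sorted_pts, tree_edges = emst_prim([(5, 5), (1, 1), (0, 0), (1, 0)])
```

This brute-force form does O(n^2) work, which is usually acceptable for the modest point counts typical of a shapefile layer.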
Treed EMST
In this version, there are two branches, as indicated in the GroupedBy column.
If point 5 is chosen as the principal node, you end up with the same EMST, but the number of
branches and their directions are different, as shown in the two figures below.
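The effect of the start node can be illustrated with a small sketch. It assumes the EMST is available as undirected (from_id, to_id) pairs and treats each neighbour of the start node as the root of one branch; that definition of a branch, and the function name orient_from, are assumptions made for this example and may differ from what EMST_helper.py does.

```python
from collections import defaultdict, deque

def orient_from(start, edges):
    """Direct tree edges away from `start` and label each node's branch.

    edges : list of undirected (a, b) ID pairs forming a tree
    Returns (directed_edges, branch_of), where branch_of maps every node
    except the start node to the index of the branch that contains it.
    """
    nbrs = defaultdict(list)
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    if start not in nbrs:
        raise ValueError("start node is not in the tree")
    directed, branch_of = [], {}
    seen = {start}
    queue = deque()
    # every neighbour of the start node begins a new branch
    for k, node in enumerate(sorted(nbrs[start])):
        directed.append((start, node))
        branch_of[node] = k
        seen.add(node)
        queue.append(node)
    # breadth-first walk outward, inheriting the branch label
    while queue:
        u = queue.popleft()
        for v in sorted(nbrs[u]):
            if v not in seen:
                seen.add(v)
                branch_of[v] = branch_of[u]
                directed.append((u, v))
                queue.append(v)
    return directed, branch_of

tree = [(0, 1), (1, 2), (2, 3), (2, 4)]
from_0, branches_0 = orient_from(0, tree)   # start at an end node
from_2, branches_2 = orient_from(2, tree)   # start at an interior node
```

The undirected edge set is the same in both calls; only the edge directions and the number of branches change with the start node.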
Two other optional versions can be created: one is a multipart shape (produced by a
dissolve) and the other breaks up the multipart shape into its singlepart segments. The
latter can contain interconnected points between the terminal nodes, but it will generally lack
the directionality of the treed version. The tables for those are shown below. Note that the
FromID and ToID information is not present in either version.
Other Connections
The Connections tool allows the user to produce multiple EMSTs given a space-delimited list
of origin nodes. The origin nodes are used to partition the input data based upon minimum
Euclidean distance to one of the origins. Once the points are allocated to an origin, a treed
version of the EMST is derived. An example is shown to the right for four origin nodes
(points 16, 27, 3 and 57).
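The allocation step can be sketched as a nearest-origin assignment. The example below is an illustration under stated assumptions rather than the code behind EMST_helper.allocate or EMST_helper.sort_radial: points are assumed to be held in a dictionary of ID to (x, y), ties go to the first origin listed, and the radial order is taken as increasing math.atan2 angle about the origin.

```python
import math

def allocate(points, origin_ids):
    """Assign every non-origin point to its nearest origin node.

    points     : dict mapping point ID -> (x, y)
    origin_ids : list of IDs (keys of `points`) used as origins
    Returns a dict mapping origin ID -> list of allocated point IDs.
    """
    groups = {o: [] for o in origin_ids}
    for pid, (x, y) in points.items():
        if pid in groups:                # origins are not allocated
            continue
        nearest = min(origin_ids,
                      key=lambda o: math.hypot(x - points[o][0],
                                               y - points[o][1]))
        groups[nearest].append(pid)
    return groups

def radial_sort(points, origin_id, member_ids):
    """Order member points by angle about the origin (assumed convention)."""
    ox, oy = points[origin_id]
    return sorted(member_ids,
                  key=lambda p: math.atan2(points[p][1] - oy,
                                           points[p][0] - ox))

# Hypothetical sample data with two origins, IDs 0 and 1.
pts = {0: (0, 0), 1: (10, 0), 2: (1, 1), 3: (9, 1), 4: (1, -1)}
alloc = allocate(pts, [0, 1])
ordered = radial_sort(pts, 0, alloc[0])
desire_lines = [(0, pid) for pid in ordered]   # origin-to-point pairs
```

Each allocated group could then be passed to a spanning-tree routine, or used directly to draw desire lines from the origin.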
'''
Purpose
Computes the minimum spanning tree of a set of points
using Prim's algorithm
Author
Dan Patterson
Dept of Geography and Environmental Studies
Carleton University, Ottawa, Canada
Dan_Patterson@carleton.ca
converted from Avenue code for ArcView 3.x
References
Derrick Wood, "Data Structures, Algorithms, & Performance"
(Addison-Wesley, 1993), Section 13.4.4. See also
Preparata & Shamos, "Computational Geometry" (Springer-Verlag,
1985), Section 6.1.
'''
#-----------------------------------------------------------------------------
def createPolyline(pnts):
    '''Creates a polyline from a list of points, assigning the X, Y and ID values.
    Points must be in the form x,y,id
    '''
    polyArray = arcpy.CreateObject("Array")
    aPnt = arcpy.CreateObject("Point")
    for aPair in pnts:
        if isinstance(aPair, (list, tuple)):
            aPnt.X = aPair[0]
            aPnt.Y = aPair[1]
            aPnt.ID = aPair[2]
            #arcpy.AddMessage(str(aPnt.X) + " " + str(aPnt.Y) + " " + str(aPnt.id))
            polyArray.add(aPnt)
    return polyArray

    try:
        arcpy.CreateFeatureclass_management(out_folder, outFName, outType, "#",
                                            "Disabled", "Disabled", SR)
        arcpy.AddMessage("\n" + "Creating " + str(outFName) + "\n")
    except:
        arcpy.AddMessage("Failed to create feature class" + arcpy.GetMessages())
        sys.exit()
    for aField in fieldsToAdd:
        arcpy.AddMessage("Adding field " + str(aField))
        try:
            arcpy.AddField_management(outFullName, aField[0], aField[1], aField[2],
                                      aField[3], aField[4])
        except:
            arcpy.AddMessage("Failed to add field: " + str(aField))
            #sys.exit()
    #
    try:
        cur = arcpy.InsertCursor(out_FC)
    except:
        arcpy.AddMessage("failed to create cursor")
        sys.exit()
    for i in range(0,len(polylines)):
        outShape = polylines[i][0]   #the first is the polyline
        aClass = polylines[i][1]     #this is the grouping class
        num_pnts = outShape.count
        pnt1ID = outShape.getObject(0).ID              #get the first id
        pnt2ID = outShape.getObject(num_pnts - 1).ID   #get the last id
        feat = cur.newRow()
        feat.shape = outShape
        feat.FromID = pnt1ID
        feat.ToID = pnt2ID
        feat.GroupedBy = aClass
        cur.insertRow(feat)
    del cur
#-----------------------------------------------------------------------------
#import the modules and create arcpy
#
import sys, os, math
import arcpy
import arcpy_helper
import groupPnts_helper
import EMST_helper
arcpy.env.overwriteOutput = True
desc = arcpy.Describe
in_FC = sys.argv[1]
group_field = sys.argv[2]
out_FC = sys.argv[3].replace("\\","/")
if sys.argv[4] != "#":
    origin_node = int(sys.argv[4])
    treed_version = True
else:
    origin_node = 0
    treed_version = False
if str(sys.argv[5]) == "true":
    other_versions = True
else:
    other_versions = False
full_name = os.path.split(out_FC)
out_folder = full_name[0].replace("\\", "/")
shape_Type= desc(in_FC).shapeType
shape_field = desc(in_FC).shapeFieldName
OIDField = desc(in_FC).OIDFieldName
fc = desc(in_FC).CatalogPath.replace("\\","/")
SR = desc(in_FC).spatialReference
if shape_Type != "Point":
    arcpy.AddMessage("\nRequires a point file...bailing" + "\n")
    sys.exit()
else:
    arcpy.AddMessage("\nProcessing " + in_FC + " " + shape_Type)
pntList = []
rows = arcpy.SearchCursor(in_FC)
if (shape_Type == "Point") and (group_field == "#"):   #collect points
    pnts=[]
    for row in rows:
        aShape = row.getValue(shape_field)   #get the shape from the shape field
        pnt = aShape.getPart()
        XY = [pnt.X, pnt.Y]                  #get the X and Y
        if XY not in pnts:
            pnts.append(XY)
    pntList.append(pnts)
if treed_version:
    pnt_dict = EMST_helper.pnts_to_dict(pntList)
    if origin_node not in pnt_dict:   #replaces has_key(), which is Python 2 only
        arcpy.AddMessage("Origin node is not in the list of FID numbers")
        sys.exit()
    from_to_list = EMST_helper.branches(primsResult)
    tree = EMST_helper.form_tree(origin_node, from_to_list)
    tree_branches = EMST_helper.connect_branches(origin_node, tree)
    outList = []
    for i in range(len(tree_branches)):
        polyline_list = EMST_helper.tree_pnts(tree_branches[i], pnt_dict)
        outList.insert( len(outList), [polyline_list, str(i)] )
if len(polylines) > 0:
    # try:
    if aType in ["Text", "String", "None", "AllPoints"]:
        GroupBy = ["GroupedBy", "TEXT", aPrec, aScale, aLeng]
    elif aType == "Integer":
        GroupBy = ["GroupedBy", "LONG", "16", aScale, str(aLeng)]
    fieldsToAdd = [["FromID", "LONG", "9", "#", "#"],
                   ["ToID", "LONG", "9", "#", "#"],
                   GroupBy ]
    arcpy.AddMessage("Fields to add: " + str(fieldsToAdd))
    polyFromLines(out_FC, "Polyline", SR, polylines, fieldsToAdd)
    # except:
    #     arcpy.AddMessage("Failed to create EMST: " + arcpy.GetMessages())
#
#Attempt to correct for geometry errors, there are some errors that aren't
#trapped however.
try:
    arcpy.RepairGeometry_management(out_FC)
    arcpy.AddMessage("\n" + "Attempting to repair any geometry errors" + "\n")
except:
    arcpy.AddMessage("Unable to repair geometry " + arcpy.GetMessages())
    sys.exit()
#
#produce dissolved and singlepart versions
if other_versions:
    try:
        outFName = full_name[1].replace(".shp", "") + "_mp.shp"
        outFName2 = full_name[1].replace(".shp", "") + "_sp.shp"
        dissolved = out_folder + "/" + outFName
        dissolved2 = out_folder + "/" + outFName2
        arcpy.AddMessage("Creating a dissolved EMST: " + dissolved)
        arcpy.MakeFeatureLayer_management(out_FC,"EMST_dissolved")
        if group_field != "#":
            arcpy.Dissolve_management("EMST_dissolved", dissolved, "GroupedBy" )
        else:
            arcpy.Dissolve_management("EMST_dissolved", dissolved)
        arcpy.RepairGeometry_management(dissolved)
        arcpy.AddMessage("Creating a singlepart EMST: " + dissolved2)
        arcpy.MultipartToSinglepart_management(dissolved,dissolved2)
        arcpy.RepairGeometry_management(dissolved2)
        arcpy.AddMessage("\nNOTE:\n" + \
                         "Two other EMST versions have also been created: \n\t" + \
                         dissolved + "\n\t" + dissolved2 + \
                         "\nYou will have to add these manually\n")
    except:
        arcpy.AddMessage("\n" + "Unable to create dissolved versions" + "\n")
        arcpy.AddMessage(arcpy.GetMessages())
'''
Purpose
Computes a spanning tree of a set of points using Prim's algorithm
for points that are connected
Author
Dan Patterson
Dept of Geography and Environmental Studies
Carleton University, Ottawa, Canada
Dan_Patterson@carleton.ca
converted from Avenue code for ArcView 3.x
References
Derrick Wood, "Data Structures, Algorithms, & Performance"
(Addison-Wesley, 1993), Section 13.4.4. See also
Preparata & Shamos, "Computational Geometry" (Springer-Verlag,
1985), Section 6.1.
'''
#-----------------------------------------------------------------------------
def createPolyline(pnts):
    '''Creates a polyline from a list of points, assigning the X, Y and ID values.
    Points must be in the form x,y,id
    '''
    polyArray = arcpy.CreateObject("Array")
    aPnt = arcpy.CreateObject("Point")
    for aPair in pnts:
        if isinstance(aPair, (list, tuple)):
            aPnt.X = aPair[0]
            aPnt.Y = aPair[1]
            aPnt.ID = aPair[2]
            #arcpy.AddMessage(str(aPnt.x) + " " + str(aPnt.y) + " " + str(aPnt.id))
            polyArray.add(aPnt)
    return polyArray

    #anID=0
    for i in range(0,len(polylines)):
        outShape = polylines[i][0]   #the first is the polyline
        aClass = polylines[i][1]     #this is the grouping class
        num_pnts = outShape.count
        pnt1ID = outShape.getObject(0).ID              #get the first id
        pnt2ID = outShape.getObject(num_pnts - 1).ID   #get the last id
        feat = cur.newRow()
        feat.shape = outShape
        feat.FromID = pnt1ID
        feat.ToID = pnt2ID
        feat.GroupedBy = aClass
        cur.insertRow(feat)
    del cur
#-----------------------------------------------------------------------------
#import the modules and create gp
#
import sys, os, math
import arcpy
import arcpy_helper
import groupPnts_helper
import EMST_helper
arcpy.env.overwriteOutput = True
desc = arcpy.Describe
in_FC = sys.argv[1]
out_FC = sys.argv[2].replace("\\","/")
input_nodes = sys.argv[3].split()   #space-delimited list of origin node IDs
output_type = sys.argv[4]
treed_version = True
main_nodes = [int(i) for i in input_nodes]
full_name = os.path.split(out_FC)
out_folder = full_name[0].replace("\\", "/")
shape_Type = desc(in_FC).ShapeType
shape_field = desc(in_FC).shapeFieldName
OIDField = desc(in_FC).OIDFieldName
fc = desc(in_FC).CatalogPath.replace("\\","/")
SR = desc(in_FC).SpatialReference
if shape_Type != "Point":
    arcpy.AddMessage("\n" + "Requires a point file...bailing" + "\n")
    sys.exit()
else:
    arcpy.AddMessage("\n" + "Processing " + in_FC + " " + shape_Type)
valueList = []
arcpy.MakeFeatureLayer_management(in_FC, "TempLayer")
rows = arcpy.SearchCursor("TempLayer")
aVal = "None"
pntList = []
node_list = []
for row in rows:
    aShape = row.getValue(shape_field)
    pnt = aShape.getPart()
    pnt_vals = [pnt.X, pnt.Y, row.getValue(OIDField), aVal]
    if pnt_vals[2] in main_nodes:
        node_list.append(pnt_vals)
    if pnt_vals not in pntList:
        pntList.append(pnt_vals)
    else:
        arcpy.AddMessage("Duplicate point ignored.")
pnt_dict = EMST_helper.pnts_to_dict(pntList)
alloc_dict = EMST_helper.allocate(pntList, node_list)
counter = 0
        for i in range(len(tree_branches)):
            polyline_list = EMST_helper.tree_pnts(tree_branches[i], pnt_dict)
            outList.insert( len(outList), [polyline_list, str(origin_node)] )
        counter += 1
else:
    for node_data in node_list:
        origin_node = node_data[2]
        key_pnts = alloc_dict.get(origin_node)
        arcpy.AddMessage("Radial lines for point " + str(origin_node))
        spanResult = EMST_helper.sort_radial(key_pnts, node_data)
        sorted_pnts = spanResult[0]
        lines_to_key =[]
        for i in sorted_pnts:
            lines_to_key.append( [node_data, i] )
        outList.append( [lines_to_key, origin_node] )
if len(polylines) > 0:
    #arcpy.AddMessage("In type " + str(inType))
    try:
        GroupBy = ["GroupedBy", "LONG", "16", "#", "#"]
        fieldsToAdd = [["FromID", "LONG", "9", "#", "#"],
                       ["ToID", "LONG", "9", "#", "#"],
                       GroupBy ]
        arcpy.AddMessage("Fields to add: " + str(fieldsToAdd))
        polyFromLines(out_FC, "Polyline", SR, polylines, fieldsToAdd)
    except:
        arcpy.AddMessage("Failed to create EMST: " + arcpy.GetMessages())
        sys.exit()
else:
    arcpy.AddMessage("No shapes to create")
#Attempt to correct for geometry errors, there are some errors that aren't
#trapped however.
try:
    arcpy.RepairGeometry_management(out_FC)
    arcpy.AddMessage("\n" + "Attempting to repair any geometry errors" + "\n")
'''
Author
Dan Patterson
Dept of Geography and Environmental Studies
Carleton University, Ottawa, Canada
Dan_Patterson@carleton.ca
converted from Avenue code for ArcView 3.x
See EMST.py for further comments and the original Prim's algorithm implementation
Purpose:
Input forms
points pnts = [[ X, Y, ID, GroupedBy], ... [ X, Y, ID, GroupedBy]]
Not all of the attributes are required, but X,Y values are needed as a minimum;
other attributes are simply ignored, but can be used in other applications
'''
import sys
import math
def prims(pnts):
    '''Input pnts = [[ X, Y, ID, GroupedBy], ... [ X, Y, ID, GroupedBy]]
         x,y        coordinates
         ID         point ID
         GroupedBy  a value (numeric or text), or None,
                    which allows you to group point sets
    Prim's algorithm
      Find the vertex pntmin of V with minimum distance d to the points of edges
      in emst_edges. The first step is just a glorified search for the minimum
      distance in closest_list.
    Initialization:
      closest_list represents the vertices not yet included in the minimum
      spanning tree (MST). Each element of the list is of the form [[v,u],d]
      where v is a point not in the tree, u is, and d = the distance from v to u.
      emst_edges is the set of edges [v,u] comprising the MST.
    '''
    #Constants.
    iEdge = 0
    iEdgeLength = 1
    null_pnt = None
    null_number = None
    emst_edges = []      # The edges, as they are added
    closest_list = []    # The vertices, waiting to be added
    for i in pnts:
        pnt = [i[0],i[1],i[2],i[3]]   # X, Y, ID and GroupedBy
        closest_list.append([[pnt, null_pnt], null_number])
    while (len(closest_list) > 0):
        min_dist = null_number
        min_id = 0
        for i in range(0,len(closest_list)):   # Test each remaining vertex
            uv_dist = closest_list[i][iEdgeLength]
            if ((min_dist is None) or
                ((uv_dist is not None) and (min_dist > uv_dist))):
                min_dist = uv_dist
                min_id = i
        vx = closest_list[min_id]   # Adjoin edge i to the list of edges.
        lPtEdge = vx[iEdge]
        lPtEdge.append(min_dist)    # add the distance information (June 2010)
        pntmin = lPtEdge[0]
        ptUMin = lPtEdge[1]
        if (ptUMin is not None):
            emst_edges.append(lPtEdge)
        closest_list.pop(min_id)    # Remove lPtEdge from V.
                                    # This guarantees loop termination.
        # Update the distances in closest_list.
        # Vertices of closest_list must be compared to the new node, pntmin.
        for vx in closest_list:
            lPtEdge = vx[iEdge]
            uv_dist = vx[iEdgeLength]   # Distance from v to u
            pnt = lPtEdge[0]
            ptU = lPtEdge[1]
            #
            #Check the distance conditions Original,
            xDistance = math.hypot(pntmin[0] - pnt[0], pntmin[1] - pnt[1])

    iEdge = 0
    iEdgeLength = 1
    null_pnt = None
    null_number = None
    emst_edges = []      # The edges, as they are added
    closest_list = []    # The vertices, waiting to be added
    for i in pnts:
        pnt = [i[0],i[1],i[2],i[3]]   # X, Y, ID and GroupedBy
        closest_list.append([[pnt, null_pnt], null_number])
    while (len(closest_list) > 0):
        min_dist = null_number
        min_id = 0
        for i in range(0,len(closest_list)):   # Test each remaining vertex
            uv_dist = closest_list[i][iEdgeLength]
            if ((min_dist == null_number) or
                ((uv_dist != None) and (min_dist > uv_dist))):
                min_dist = uv_dist
                min_id = i
        vx = closest_list[min_id]   # Adjoin edge i to the list of edges.
        lPtEdge = vx[iEdge]
        lPtEdge.append(min_dist)    # add the distance information (June 2010)
        pntmin = lPtEdge[0]
        ptUMin = lPtEdge[1]
        if (ptUMin != None):
            emst_edges.append(lPtEdge)
        closest_list.pop(min_id)    # Remove lPtEdge from V.
                                    # This guarantees loop termination.
        # Update the distances in closest_list.
        # Vertices of closest_list must be compared to the new node, pntmin.
        for vx in closest_list:
            lPtEdge = vx[iEdge]
            uv_dist = vx[iEdgeLength]   # Distance from v to u
            pnt = lPtEdge[0]
            ptU = lPtEdge[1]
            #
            #Check the distance conditions Added July 2010
            if [pntmin, pnt] in must_connect:
                xDistance = -1
            else:
                xDistance = math.hypot(pntmin[0] - pnt[0], pntmin[1] - pnt[1])
            #

def pnts_to_dict(pnts):
    '''create a dictionary, keyed by point ID, from pnt data in the form
    pnts = [[ X, Y, ID, GroupedBy], ... [ X, Y, ID, GroupedBy]]
    '''
    pnt_dict = {}   # renamed from dict to avoid shadowing the built-in
    for pnt in pnts:
        pnt_dict[pnt[2]] = pnt
    return pnt_dict

def branches(edges):
    '''form the branches yielded from prims and return a from_to_list'''
    from_to_list = []
    for a_pair in edges:
        from_to_list.append( [a_pair[0][2], a_pair[1][2]] )   #a_pair[-1][2]] )
    return from_to_list

def node_types(from_to_list):
    '''Requires: a list of from-to IDs
    Returns: a list of intersection nodes (connects to at least 2 points)
             or terminal nodes (only one node connects to it)
    Uses sets to find the types, hence requires Python 2.4+
    '''
    from_id = []; to_id = []
    from_to_list.sort()
    for aPair in from_to_list:
        from_id.append(aPair[0])
        to_id.append(aPair[1])
    nodes = []   #node list
    from_set = set(from_id)
    to_set = set(to_id)
    inter_set = from_set.intersection(to_set)
    sym_diff_set = from_set.symmetric_difference(to_set)
    return from_set, to_set, inter_set, sym_diff_set
'''
Purpose:
Helper functions, that use the geoprocessor created by arcpy
'''
#---------------------------------------------------------------
#required modules
import os, sys
def shapeToPoints(a_shape,theType,arcpy):
    '''
    pnts = shapeToPoints(a_shape, shape type, geoprocessor)
    Purpose: Converts a shape to points, the shape and its type
             are passed by the calling script
    Requires: def pntXY(pnt)
    '''
    outList=[]
    part_num = 0
    part_count = a_shape.partCount
    if theType == "Multipoint":   #Multipoints
        while part_num < part_count:
            pnt = a_shape.getPart(part_num)
            XY = pntXY(pnt)
            if XY not in outList:
                outList.append(XY)
            part_num += 1
    else:   #Poly* features
Author:
Dan Patterson
Dept of Geography and Environmental Studies
Carleton University, Ottawa, Canada
Dan_Patterson@carleton.ca
def getPnts(rows):
    '''Get unique points'''
    pnts=[]
    for row in rows:
        aShape = row.shape
        pnt = aShape.getPart()
        XY = [pnt.X, pnt.Y]   #get the X and Y
        if XY not in pnts:
            pnts.append(XY)
    return pnts
#-----------------------------------------------------------------
def groupPoints(inField, inFC, arcpy):
    '''group points based upon attributes in a field'''
    groupedPoints = []   # list to hold grouped points
    theFields = arcpy.ListFields(inFC)
    inType = ""
    desc = arcpy.Describe
    OIDField = desc(inFC).OIDFieldName
    OKFields = [OIDField]
    #
    if inField not in OKFields:
        arcpy.AddMessage("The field " + inField + " is not an appropriate" + \
                         " field type. Terminating operation." + "\n")
        del arcpy
        sys.exit()
    #
    #Determine unique values in the selected field
    arcpy.AddMessage(inField + " is being queried for unique values." + "\n")
    valueList = []
    rows = arcpy.SearchCursor(inFC)
    aString = ""
    aLen = 0; aFac = 1
    for row in rows:
        aVal = row.getValue(inField)
        if aVal not in valueList:
            valueList.append(aVal)
            aLen = len(aString)
            if aLen > 50 * aFac:
                aString = aString + "\n"
                aFac = aFac + 1
            aString = aString + " " + str(aVal)
    arcpy.AddMessage("Unique values: " + "\n" + aString)
    #
    #Do the actual work of producing the unique
    aMax = 1
    outVals = []   # a list to append valid output values
    for aVal in valueList:
        aMax = max(aMax,len(str(aVal)))
    for aVal in valueList:
        if (str(aVal).isdigit()) and (not inType == "String"):
            fs = '"'+"%"+str(aMax)+"."+str(aMax)+'i"'
            aSuffix = fs % aVal
            aVal = str(aVal)
        elif inType == "Double" and inScale == 0:
            aSuffix = str(aVal).replace(".0","")   ######
            aVal = str(aVal).replace(".0","")
        else:
            aSuffix = str(aVal)
            aVal = str(aVal)
        try:
            #Create a query and produce the file
            if (not aVal.isdigit()) or (inType == "String"):
                aVal = "'"+aVal+"'"
            whereClause = "%s = %s" % (inField, aVal)
            TempLayer = "TempLayer"
            arcpy.MakeFeatureLayer_management(inFC, TempLayer, whereClause)
            #
            rowsNew = arcpy.SearchCursor("TempLayer")
            pnts = getPnts(rowsNew)
            if len(pnts) >= 3:   #need 3 valid points for a convex hull
                groupedPoints.append(pnts)
                outVals.append(aVal)