Error, Accuracy, and Precision

Original text: Error, Accuracy, and Precision, by Kenneth E. Foote and Donald J. Huebner, Department of Geography, University of Texas at Austin, 1995. The Geographer's Craft Project, Department of Geography, The University of Colorado at Boulder. (Contact: k.foote@colorado.edu)
Spanish translation: José Antonio Rodríguez Esteban, Professor of Geography, Departamento de Geografía, Laboratorio de Geografía Aplicada, Proyecto GEOteca, Universidad Autónoma de Madrid (2002).
Table of Contents
1. The importance of error, accuracy, and precision
2. Some basic definitions
3. Types of error
   3.1. Positional accuracy and precision
   3.2. Attribute accuracy and precision
   3.3. Conceptual accuracy and precision
   3.4. Logical accuracy and precision
4. Sources of inaccuracy and imprecision
   4.1. Obvious sources of error
      4.1.1. Age of data
      4.1.2. Areal cover
      4.1.3. Map scale
      4.1.4. Density of observations
      4.1.5. Relevance
      4.1.6. Format
      4.1.7. Accessibility
      4.1.8. Cost
   4.2. Errors resulting from natural variation in the original data
      4.2.1. Positional accuracy
      4.2.2. Accuracy of content
      4.2.3. Sources of variation in data
   4.3. Errors arising through processing
      4.3.1. Numerical errors
      4.3.2. Errors in topological analysis
      4.3.3. Classification and generalization problems
      4.3.4. Digitizing and geocoding errors
5. The problems of propagation and cascading
   5.1. Propagation
   5.2. Cascading
6. Beware of false precision and false accuracy
7. The dangers of undocumented data
   7.1. Investigate when you borrow or purchase data
   7.2. Prepare a data quality report for the data you will use
   7.3. Ask questions about undocumented data
8. Principles of managing error (in English)
9. Bibliography used
10. References and additional bibliography
11. Examination and Study Questions (in English)

1. The importance of error, accuracy, and precision
Until quite recently, users and developers of Geographic Information Systems (GIS) paid little attention to the problems caused by error, inaccuracy, and imprecision in spatial datasets. There was certainly an awareness that all data contain some inaccuracy and imprecision, but the effect on GIS problems and solutions was not considered in great detail. Major introductions to GIS, such as C. Dana Tomlin's Geographic Information Systems and Cartographic Modeling (1990), Jeffrey Star and John Estes's Geographic Information Systems: An Introduction (1990), and Keith Clarke's Analytical and Computer Cartography (1990), barely address the issue.
This situation has changed substantially in recent years. It is now generally recognized that error, inaccuracy, and imprecision can "break" some types of GIS project. That is, undetected errors can make the results of a GIS analysis worthless.
The irony is that the problem of error is inherent in one of the great strengths of GIS. Much of what GIS can do is possible because they collate and cross-reference many types of data by location. They are particularly useful because they can integrate many discrete datasets within a single system. Unfortunately, every time a new dataset is imported, the GIS also inherits its errors. These errors can mix and combine with those already present in unpredictable ways.
One of the first thorough discussions of the problem and sources of error appears in P. A. Burrough's Principles of Geographical Information Systems for Land Resources Assessment (1986). The issue is now addressed in several introductions to GIS, such as Geographic Information Systems: A Guide to the Technology (1991) by John Antenucci, Kay Brown, Peter Croswell, Michael Kevany, and Hugh Archer.
The key point is that although error can disrupt GIS analyses, there are ways to keep it to a minimum through careful planning, and there are methods for estimating its effects on GIS solutions. Awareness of the problem of error has also had the useful benefit of making GIS users more sensitive to the limits GIS face in reaching accurate and precise solutions.
2. Some basic definitions
It is important to distinguish from the start between accuracy and precision:
1. Accuracy is the degree to which information on a map or in a digital database matches true or accepted values. Accuracy pertains to the quality of data and the number of errors contained in a dataset or map. In discussing a GIS database, it is possible to consider horizontal and vertical accuracy with respect to geographic position, as well as attribute, conceptual, and logical accuracy.
   - The level of accuracy required varies greatly from one application to another.
   - Highly accurate data can be very difficult and costly to produce and compile.
2. Precision refers to the level of measurement and exactness of description in a GIS database. Precise attribute information may specify the characteristics of features in great detail. It is important to realize, however, that precise data, no matter how carefully measured, may be inaccurate. Surveyors may make mistakes, or data may be entered into the database incorrectly.
   - The level of precision required varies greatly from one application to another. Engineering projects such as road and utility construction require very precise measurement, from millimeters to tens of centimeters. Demographic analyses of electoral trends can often make do with less, for example the nearest postal code or precinct.
   - Highly precise data can be very difficult and costly to collect. Carefully surveying locations requires specialized companies to collect the information.
High precision does not indicate high accuracy, nor does high accuracy imply high precision. But high accuracy and high precision are both expensive.
GIS users are not always consistent in their use of these terms. Sometimes the two are used interchangeably, which is counterproductive.
Two additional terms are also used:
1. Data quality refers to the relative accuracy and precision of a particular GIS database. These facts are often documented in data quality reports.
2. Error encompasses both the imprecision of data and its inaccuracy.
3. Types of error
Positional error is the type that most often concerns GIS users, but error can affect many different characteristics of the information stored in a database.
3.1. Positional accuracy and precision
This applies to both horizontal and vertical position.
Accuracy and precision are a function of the scale at which a map (paper or digital) was created. The mapping standards employed by the United States Geological Survey (USGS) specify that:
"horizontal accuracy requires that 90 percent of all points tested be within 1/30 of an inch (about 0.85 mm) on the map for scales larger than 1:20,000, and within 1/50 of an inch (about 0.51 mm) for scales smaller than 1:20,000"

Accuracy standards for selected map scales
Scale         Feet       Meters
1:1,200         3.33      1.015
1:2,400         6.67      2.033
1:4,800        13.33      4.063
1:10,000       27.78      8.467
1:12,000       33.33     10.159
1:24,000       40.00     12.192
1:63,360      105.60     32.187
1:100,000     166.67     50.80
(Note: 1 foot = 30.48 cm = 0.3048 m)
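The table values can be reproduced with a short calculation. The sketch below assumes the allowed error on the map sheet is 1/30 inch for scales larger than 1:20,000 and 1/50 inch for 1:20,000 and smaller, which is what the tabulated figures imply. The function names are hypothetical.

```python
def nmas_tolerance_feet(scale_denominator: int) -> float:
    """Ground distance, in feet, matching the allowed error on the map sheet."""
    # Assumption: 1/30 inch for scales larger than 1:20,000 (denominator
    # below 20,000), otherwise 1/50 inch.
    map_error_inches = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    ground_error_inches = map_error_inches * scale_denominator
    return ground_error_inches / 12  # 12 inches per foot


def feet_to_meters(feet: float) -> float:
    return feet * 0.3048


for denominator in (1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    feet = nmas_tolerance_feet(denominator)
    print(f"1:{denominator:<9,} {feet:8.2f} ft {feet_to_meters(feet):9.3f} m")
```

Multiplying the map-sheet tolerance by the scale denominator is all the table does, which is why the ground tolerance grows in direct proportion as the scale gets smaller.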
This means that when we see a point on a map, we have only its probable location within a certain area. The same applies to lines.
Beware also of the dangers of false accuracy and false precision, that is, reading locational information from maps to levels of accuracy and precision beyond those at which they were created. This is a real danger in computer systems, which allow users to zoom in and out to an unlimited number of scales. Accuracy and precision are tied to the original map scale and do not change when the user zooms in or out. Zooming can even mislead the user into believing, falsely, that accuracy and precision have improved.
3.2. Attribute accuracy and precision
The non-spatial data linked to a location may also be inaccurate or imprecise. Inaccuracies may result from mistakes of many kinds. Non-spatial data can also vary greatly in precision. Precise attribute information describes phenomena in great detail. For example, a precise description of a person living at a particular address might include gender, age, income, occupation, level of education, and many other characteristics. An imprecise description might include only income or only gender.
3.3. Conceptual accuracy and precision
GIS depend above all on the abstraction and classification of real-world phenomena. Users determine how much information is used and how it is classified into appropriate categories. Sometimes users apply inappropriate categories or misclassify information. For example, classifying cities by voting behavior would be a poor way to study fertility patterns, and failing to classify power lines by voltage would limit the effectiveness of a GIS designed to manage electrical infrastructure. Even when the correct categories are employed, data may be misclassified. A study of drainage systems may need to classify streams and rivers by "order", that is, by where a particular channel fits within the overall tributary network. Individual channels may be misclassified if tributaries are mislocated. Yet some studies may not require such a precise categorization of stream order at all. All they may need is the location and names of the streams and rivers, regardless of order.
3.4. Logical accuracy and precision
Information stored in a database can be used illogically. For example, permission might be granted to build a residential subdivision on a floodplain unless the proposal is compared with the floodplain map. Then again, building may be possible on some parts of the floodplain, but this will not be known unless variations in flood potential have also been recorded and are used in the comparison. The point is that information stored in a GIS database must be used and compared carefully if it is to yield useful results. GIS are typically unable to warn users when an inappropriate comparison is being made or when data are being used incorrectly. Some rules of use can be built into the design of a GIS, as in "expert systems", but developers still need to make sure that the rules employed match the real-world phenomena they model.
Finally, it would be a mistake to believe that highly accurate and highly precise information is needed for every GIS application. The need for accuracy and precision varies radically depending on the type of information coded and the level of measurement needed for a particular application. Users must determine the scope of their own work. Excessive accuracy and precision is not only costly but can also be a considerable burden.
4. Sources of inaccuracy and imprecision
Many sources of error can affect the quality of a GIS dataset. Some are quite obvious, while others can be difficult to discern. Only some of them will be identified automatically by the GIS itself, and preventing them is the user's responsibility. Particular care should be devoted to checking for errors, because GIS are quite capable of lulling the user into a false sense of accuracy and precision that the data do not warrant. For example, smooth boundary lines, contour lines, and the stepped class changes of choropleth maps are "elegant misrepresentations" of reality. In fact, these features are often "vague, gradual, or fuzzy" (Burrough 1986). There is an inherent imprecision in cartography that begins with the projection process and its necessary distortion of some of the data (Koeln and others 1994), an imprecision that may continue throughout GIS processing. GIS users must recognize not only that error exists but also what level of error is tolerable and affordable.
Burrough (1986) divides sources of error into three main categories:
1. Obvious sources of error.
2. Errors resulting from natural variation in the original measurements.
3. Errors arising through processing.
Errors of the first two types are generally easier to detect than those of the third, because errors arising through processing can remain hidden and be difficult to identify. Burrough further divides these main groups into several subcategories, discussed below.
4.1. Obvious sources of error
4.1.1. Age of data
Data sources may simply be too old to be useful in a GIS project. Past collection standards may be unknown, nonexistent, or outdated. For example, John Wesley Powell's nineteenth-century survey data of the Grand Canyon lack the precision needed for use today. In addition, part of the base information may have changed since then through erosion, deposition, or other geomorphic processes. Despite the power of GIS, reliance on old data may skew, bias, or negate results.
4.1.2. Areal cover
Data for a given area may be completely lacking, or only some levels of information may be usable in a GIS project. For example, vegetation or soil maps may be incomplete in transition zones or may fail to portray them accurately. Another example is the lack of remote sensing data for certain parts of the world that are almost permanently covered by cloud. Uniform, accurate coverage may not be available, and the user must decide what level of generalization is necessary or whether new data collection is required.
4.1.3. Map scale
The ability of a map to show detail is determined by its scale. A map at a scale of 1:1,000 can show much finer detail than one at 1:25,000. Scale determines the type, quality, and quantity of data (Star and Estes 1990). The scale chosen should always match the level of detail required in the project. Enlarging a small-scale map does not increase its level of detail or precision.
4.1.4. Density of observations
The number of observations made within an area is a guide to the reliability of the map and should be known to the map user. An insufficient number of observations may not provide the resolution required to perform spatial analysis and meet the objectives of the GIS project. As a case in point, if the contour interval on a map is 120 cm, resolution finer than that interval is not possible. Lines on a map are a generalization based on the interval of recorded data, so the closer the sampling interval, the more accurate the data portrayed.
4.1.5. Relevance
Quite often the desired data for a site or area cannot be obtained, and "surrogate" data must be used instead. A valid relationship must exist between the surrogate and the phenomenon under study, but even then error may creep in because the phenomenon is not being measured directly. A local example of the use of surrogate data comes from habitat studies of the warbler in the Hill Country. It is very costly (and disturbing to the birds) to inventory this habitat through direct observation. However, the warblers prefer to live in stands of old cedar, Juniperus ashei. These stands can be located from aerial photographs, so the density of Juniperus ashei can be used to infer the density of warbler habitat. Of course, some cedar stands will be uninhabited or, on the contrary, inhabited at very high density. These differences may go undetected when aerial photography is used to tabulate habitat.

Another example of surrogate data is the electronic signal used in satellite imagery to estimate vegetation cover, soil types, erosion susceptibility, and many other characteristics. The data are obtained by indirect methods. Satellite sensors do not "see" trees, only certain digital signatures typical of trees and vegetation. Sometimes these signatures are recorded even though trees and vegetation are not present (false positives), or are not recorded when trees and vegetation are present (false negatives). Because gathering data on site is expensive, surrogate data are often substituted, and the user must understand that such variations occur and decide whether to accept them, depending on the accuracy the project requires.
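The two kinds of mismatch can be tallied by comparing a classified image against field checks. The following sketch uses made-up values for six pixels; the lists and the function name are hypothetical and serve only to show the bookkeeping.

```python
def tally_errors(classified, ground_truth):
    """Count false positives and false negatives for a vegetation mask.

    Both arguments are equal-length sequences of booleans. True means
    "vegetation" according to the sensor (classified) or according to the
    field check (ground_truth).
    """
    if len(classified) != len(ground_truth):
        raise ValueError("both sequences must have the same length")
    pairs = list(zip(classified, ground_truth))
    false_positives = sum(1 for c, g in pairs if c and not g)
    false_negatives = sum(1 for c, g in pairs if g and not c)
    return false_positives, false_negatives


# Hypothetical sample of six pixels that were also checked in the field.
sensor = [True, True, False, True, False, False]
field = [True, False, False, True, True, False]

fp, fn = tally_errors(sensor, field)
print(f"false positives: {fp}, false negatives: {fn}")
```

Even a small field sample of this kind gives the user a basis for deciding whether the surrogate is accurate enough for the project.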
4.1.6. Format
The methods used to transmit, store, and process information in digital form may introduce error into the data. Conversions of scale and projection, changes from raster to vector format, and pixel size and depth are examples of possible sources of format error. Data often have to be transmitted to and used by multiple GIS, so they must be reformatted to a "lowest common denominator". Multiple conversions from one format to another can have an effect similar to making copies of copies on a photocopier. In addition, international standards for the transmission, storage, and retrieval of cartographic data are not yet fully implemented.
4.1.7. Accessibility
Access to data is not equal. What is available in one country may not be available in others. Before the breakup of the Soviet Union, many maps were treated as classified documents and were therefore unobtainable by most people. Military restrictions, inter-agency rivalry, privacy laws, and economic factors may restrict the availability of data or their level of accuracy.
4.1.8. Cost
Extensive and reliable data are often too expensive to obtain or convert. Starting a new data collection may cost more than the benefits gained in a particular GIS project, and project designers must balance their desire for accuracy against the cost of the information. True accuracy is expensive and may be unaffordable.
4.2. Errors resulting from natural variation in the original data
Although these sources of error may not be as obvious, careful checking will reveal their influence on the project.
4.2.1. Positional accuracy
Positional accuracy is a measure of the discrepancy between map features and the true position of the attribute (Antenucci and others 1991, p. 102). It depends on the type of data being used or observed. Map makers can accurately place well-defined objects such as roads, buildings, boundary lines, and discrete topographic units on maps and in digital systems, whereas less discrete boundaries, such as those between vegetation or soil types, may reflect the cartographer's estimates. Climate, biomes, relief, soil type, drainage, and other features that lack sharp boundaries in nature are subject to interpretation.
Faulty or biased field work, map digitizing and conversion errors, and scanning errors can all produce inaccurate maps in a GIS project.
4.2.2. Accuracy of content
Maps must be correct and free from bias. Qualitative accuracy refers to the correct labeling and presence of specific features. For example, a pine forest may be incorrectly labeled as a spruce forest, introducing an error that the user of the data or map may neither know about nor suspect. Certain features may be omitted from the map or from the spatial database through oversight or by design.
Other errors, in quantitative accuracy, may arise from faulty calibration of the instruments used to measure specific features such as altitude, soil or water pH, or atmospheric gases. Mistakes made in the field or in the laboratory may be undetectable in a GIS project unless the user checks or corroborates the validity of the information.
4.2.3. Sources of variation in data
Variations in data may be due to measurement error introduced during observation, to observer bias, or to poorly calibrated equipment. For example, one cannot expect sub-meter accuracy from a hand-held GPS receiver without differential correction. Likewise, an incorrectly calibrated dissolved oxygen meter will produce incorrect values of oxygen concentration in a stream.
There may also be natural variation in the data being collected. For example, salinity in Texas bays and estuaries varies during the year, depending on freshwater inflow and evaporation. If one were not aware of this natural variation, incorrect assumptions and decisions could be made, introducing significant error into the GIS project. In any case, if the error does not lead to unexpected results, detecting it may be extremely difficult.
4.3. Errors arising through processing
Processing errors are the most difficult for GIS users to detect. They must be specifically looked for, which requires knowledge of both the information and the systems used to process it. These are subtle errors that occur in several ways, and they are potentially more insidious because they can occur in multiple datasets as they are manipulated in a GIS project.
4.3.1. Numerical errors
Different computers may not have the same capability to perform complex mathematical operations and may produce significantly different results for the same problem. Burrough (1990) cites an example of squaring a number that produced a difference of 1,200%. Computational errors occur in rounding operations and are inherent in the number of digits the processor can handle. Another source of error is a faulty processor, as happened with the mathematical flaw identified in Intel's Pentium(tm) chip. In certain calculations the chip returned wrong answers.
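The effect of a limited number of digits is easy to reproduce. The sketch below is an illustration only, not Burrough's example: it stores a hypothetical projected coordinate of about 4.5 million meters in 32-bit floating point, where adjacent representable values are 0.5 m apart at that magnitude, and compares it with the usual 64-bit value.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


northing = 4_500_000.123  # hypothetical coordinate, in meters
stored = as_float32(northing)

print(f"64-bit value : {northing:.3f} m")
print(f"32-bit value : {stored:.3f} m")
print(f"rounding loss: {abs(northing - stored):.3f} m")
```

The millimeter digits typed in by the operator are simply gone after storage, and nothing in the software signals the loss.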
A major challenge is the accurate conversion of existing maps to digital form (Muehrcke 1986). Because computers manipulate data in digital form, numerical errors in processing can lead to inaccurate results. In any case, numerical processing errors are extremely difficult to detect, and doing so may require a level of sophistication that most GIS users and project managers do not have.
4.3.2. Errors in topological analysis
Logic errors may cause incorrect manipulation of data and incorrect topological analyses. One must recognize that data are not uniform and are subject to variation. Overlaying multiple map layers can produce problems such as slivers, overshoots, and dangles. Variations in accuracy between map layers may be obscured during processing, leading to the creation of "virtual data which may be difficult to detect from real data" (Sample 1994).
4.3.3. Classification and generalization problems
For the human mind to comprehend vast amounts of data, the data must be classified and, in some cases, generalized. According to Burrough (1986, p. 137), about seven divisions of data is ideal, since that many can easily be retained in memory. Defining the class intervals is another problem. For example, the causes of death among men aged 18-25 will differ significantly from those among men aged 18-40. Data are most accurately displayed and manipulated in small multiples. Defining a reasonable multiple and asking the question "compared to what?" is essential (Tufte 1990, pp. 67-79). The classification and generalization of attributes used in a GIS are subject to interpolation error and may introduce irregularities in the data that are difficult to detect.
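How much the choice of interval matters can be seen by classing the same observations two ways. The ages below are invented purely for illustration, and `count_in_interval` is a hypothetical helper.

```python
def count_in_interval(values, low, high):
    """Count the values that fall in the closed interval [low, high]."""
    return sum(1 for v in values if low <= v <= high)


# Invented ages at death for one cause, used only to illustrate the point.
ages = [19, 22, 24, 27, 31, 33, 36, 38, 39, 40]

narrow = count_in_interval(ages, 18, 25)
wide = count_in_interval(ages, 18, 40)

print(f"18-25: {narrow} cases ({narrow / len(ages):.0%} of the sample)")
print(f"18-40: {wide} cases ({wide / len(ages):.0%} of the sample)")
```

The same records support very different statements depending on where the class boundary is drawn, so the interval definition belongs in the documentation of any classified layer.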
4.3.4. Digitizing and geocoding errors
Processing errors also occur during other phases of data manipulation, such as digitizing and geocoding, overlay and boundary intersection, and the rasterizing of a vector map. Physiological errors by the operator, caused by involuntary muscle contractions, may result in spikes, switchbacks (zigzags), polygonal knots, and loops. Errors associated with damaged source maps, operator error while digitizing, and bias can be checked by comparing the original maps with the digitized versions. Other errors are more elusive.
5. The problems of propagation and cascading
The discussion so far has focused on errors that may be present in individual datasets. GIS usually depend on comparisons of many datasets. The schematic diagram in the original module shows how a variety of discrete datasets may have to be combined and compared to solve a resource analysis problem. It is unlikely that the information contained in each layer is of equal accuracy and precision. Errors may also have been made in compiling the information. If so, the solution to the GIS problem may itself be inaccurate, imprecise, or erroneous.
The point is that inaccuracy, imprecision, and error may be compounded in GIS that employ many data sources. There are two ways in which this can occur.
5.1. Propagation
Propagation occurs when one error leads to another. For example, if a map registration point has been mis-digitized in one coverage and is then used to register a second coverage, the second coverage will propagate the first mistake. In this way a single error may lead to others and spread until it corrupts data throughout the entire GIS project. To avoid this problem, use the largest-scale map available to register your points.
Propagation often occurs in an additive fashion, as when maps of differing accuracy are collated.
5.2. Cascading
Cascading means that erroneous, imprecise, and inaccurate information will skew a GIS solution when information is combined selectively into new layers and coverages. In a sense, cascading occurs when errors are allowed to propagate unchecked from layer to layer.
The effects of cascading can be very difficult to predict. They may be additive or multiplicative, and they vary from situation to situation depending on how the information is combined. Because cascading can have such unpredictable effects, it is important to test for its influence on a given GIS solution. This is done by calibrating a GIS database using techniques such as sensitivity analysis. Sensitivity analysis allows users to gauge how, and how much, errors will affect solutions. Calibration and sensitivity analysis are discussed in the module on managing error.
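One simple form of sensitivity analysis is to perturb the inputs by their expected error and watch how far the result moves. The sketch below is a minimal Monte Carlo illustration, not the procedure described in the error management module: the parcel, the assumed 5 m positional error, and the helper names are all hypothetical.

```python
import random
import statistics


def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def perturbed_areas(vertices, sigma, runs, seed=0):
    """Recompute the area after adding Gaussian noise to every vertex."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in vertices]
        areas.append(polygon_area(noisy))
    return areas


# Hypothetical 100 m x 100 m parcel with an assumed 5 m positional error.
parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]
areas = perturbed_areas(parcel, sigma=5.0, runs=1000)

print(f"nominal area : {polygon_area(parcel):.0f} m2")
print(f"mean of runs : {statistics.mean(areas):.0f} m2")
print(f"std deviation: {statistics.stdev(areas):.0f} m2")
```

The spread of the recomputed areas shows how strongly a derived quantity responds to positional error in its inputs. The same loop can wrap an overlay or any other multi-layer operation to see how error cascades through it.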
It is also important to note that propagation and cascading may affect horizontal, vertical, attribute, conceptual, and logical accuracy and precision.
6. Beware of false precision and false accuracy!
GIS users are not always aware of the difficult problems caused by error, inaccuracy, and imprecision. They often fall prey to false precision and false accuracy, that is, they report their results to a level of precision or accuracy that is impossible to achieve with their source materials. If locations on a GIS coverage are measured only to within a hundred feet of their true position, it makes no sense to report predicted locations in a solution to a tenth of a foot. In other words, just because computers can store numeric figures to many decimal places does not mean that all those decimal places are "significant". It is important that GIS solutions be reported honestly and only to the level of accuracy and precision they can support.
In practice this means that GIS solutions are often best reported as ranges or rankings, or presented within statistical confidence intervals. These issues are addressed in the module on managing error.
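In code, this amounts to rounding a result to the resolution the source data can support and stating the uncertainty alongside it. A minimal sketch, assuming a hypothetical computed coordinate and a source accuracy of 100 feet; the function name is illustrative.

```python
def report_location(value_ft: float, accuracy_ft: float) -> str:
    """Format a coordinate no finer than the accuracy of its source data."""
    # Round to the nearest multiple of the stated accuracy.
    rounded = round(value_ft / accuracy_ft) * accuracy_ft
    # Simple illustrative range, not a statistical confidence interval.
    low, high = rounded - accuracy_ft, rounded + accuracy_ft
    return f"{rounded:,.0f} ft (range {low:,.0f} to {high:,.0f} ft)"


computed_easting = 2_417_386.4  # hypothetical model output, in feet

print(f"raw output : {computed_easting:.1f} ft")
print(f"reportable : {report_location(computed_easting, 100)}")
```

The raw output suggests knowledge to a tenth of a foot, while the reportable form shows only what 100-foot source data can justify.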
7. The dangers of undocumented data
Given all this, it is easy to see the danger of using undocumented data in a GIS project. Unless the user has a clear idea of the accuracy and precision of a dataset, mixing it into a GIS can be very risky. Data you have prepared carefully may be disrupted by mistakes someone else has made. This raises three important points.
7.1. Investigate when you borrow or purchase data.
Many major governmental and commercial data producers work to well-established standards of accuracy and precision that are publicly available in printed or digital form. These documents explain exactly how the maps and datasets were compiled, and such reports should be studied carefully. Data quality reports are usually provided by local and state agencies or by private suppliers.
7.2. Prepare a data quality report for the data you will use.
Your data will have no value unless you also prepare a data quality report. Even if you do not plan to share your data with others, you should prepare a report in case the dataset is used again in the future. If you do not document a dataset when you create it, you may end up wasting time later checking it a second time. Use existing data quality reports as models for documenting your own datasets.
7.3. In the absence of a data quality report, ask questions about
undocumented data before using it.
- What is the age of the data?
- Where did it come from?
- In what medium was it originally produced?
- What is the areal coverage of the data?
- At what map scale was the data digitized?
- What projection, coordinate system, and datum were used in the
maps?
- What was the density of observations used for its compilation?
- How accurate are the positional and attribute features?
- Does the data seem logical and consistent?
- Do the cartographic representations look "clean"?
- Is the data relevant to the project at hand?
- In what format is the data kept?
- When was the data checked?
- Why was the data compiled?
- How reliable is the provider?
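One way to keep track of the answers to these questions is a small record that flags whatever is still undocumented. The following is a minimal sketch in Python, not part of the original notes; the class name, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataQualityRecord:
    """Answers to the screening questions; None means 'not documented'."""
    age: Optional[str] = None
    source: Optional[str] = None
    original_medium: Optional[str] = None
    areal_coverage: Optional[str] = None
    source_map_scale: Optional[str] = None
    projection_and_datum: Optional[str] = None
    observation_density: Optional[str] = None
    positional_accuracy: Optional[str] = None
    attribute_accuracy: Optional[str] = None
    data_format: Optional[str] = None
    date_checked: Optional[str] = None
    purpose: Optional[str] = None
    provider_reliability: Optional[str] = None


def undocumented_items(record: DataQualityRecord) -> List[str]:
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical dataset for which only three questions have been answered.
record = DataQualityRecord(
    age="1994",
    source_map_scale="1:24,000",
    projection_and_datum="UTM zone 14N, NAD27",
)
missing = undocumented_items(record)
print(f"{len(missing)} of {len(fields(record))} items undocumented: {missing}")
```

A dataset with a long list of undocumented items is a candidate for further questions to the provider before it is mixed into a project.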
8. Managing Error (in English)
Methods for controlling, measuring, and managing error are the subject of the
next module.
9. References used
Antenucci, J.C., Brown, K., Croswell, P.L., Kevany, M. and Archer, H. 1991.
Geographic Information Systems: a guide to the technology. Chapman and Hall. New
York.
Burrough, P.A. 1990. Principles of Geographical Information Systems for Land
Resource Assessment. Clarendon Press. Oxford.
10. References and supplemental reading
Antenucci, J.C., Brown, K., Croswell, P.L., Kevany, M. and Archer, H. 1991.
Geographic Information Systems: a guide to the technology. Chapman and Hall. New
York.
Burrough, P.A. 1990. Principles of Geographical Information Systems for Land
Resource Assessment. Clarendon Press. Oxford.
Koeln, G.T., Cowardin, L.M., and Strong, L.L. 1994. "Geographic Information
Systems". P. 540 in T.A. Bookhout ed. Research and Management Techniques for
Wildlife and Habitat. The Wildlife Society. Bethesda.
Muehrcke, P.C. 1986. Map Use: Reading, Analysis, and Interpretation. 2nd ed. JP
Publications, Madison.
Sample, V.A. (Ed). 1994. Remote Sensing and GIS in Ecosystem Management.
Island Press. Washington, D.C.
Star, J. and Estes, J. 1990. Geographic Information Systems: an Introduction.
Prentice Hall. Englewood Cliffs.
Tufte, E.R. 1990. Envisioning Information. Graphics Press, Cheshire, Conn.
11. Examination and study questions (in English)
Created on 31 May 2002. GEOteca article, Universidad
Autónoma de Madrid, Spain.
error
Examination and Study Questions for
Error, Accuracy, and Precision
1. Essay Questions
2. Short Answer
3. Multiple-choice
1. Essay Questions
Limit each essay to two double-spaced typewritten pages plus
references.
1. How can sensitivity analysis be used to judge the level of precision and
accuracy required to meet the needs of a GIS application?
2. Is it fair to say that an undocumented dataset is a worthless dataset? What
documentation makes a dataset valuable?
3. Why must standards for data accuracy address both procedures and products?
2. Short Answer
Limit your answers to no more than 100 words.
http://www.colorado.edu/geography/gcraft/exam/error/error.html (1 de 7)09/08/2006 07:56:42 p.m.
1. What are three sources of error in a GIS database?
1. Obvious error
2. Errors arising from natural variation or original measurement
3. Processing error
2. What is the difference between precision and accuracy?
- Accuracy is how well information or data matches true values.
Precision refers to the level of measurement and exactness of description.
3. What is "false precision"? How can the results of GIS analysis be reported to
avoid this pitfall?
- False precision is reporting GIS analyses or findings to a level of
precision or accuracy impossible to achieve from the source materials.
GIS solutions are best reported as ranges or rankings, or presented
within statistical confidence intervals.
4. Why are mistakes caused by "false scale" so common when using automated
mapping systems?
- Because it is so easy to change scale, i.e. to "zoom in and zoom
out", while the automated map remains only as accurate as its source scale.
5. How are accuracy and precision related to the scale of USGS map products?
- USGS employs accuracy standards that require "horizontal accuracy
as 90 per cent of all measurable points must be within 1/30th of an
inch for maps at a scale of 1:20,000 or larger, and 1/50th of an inch
for maps at scales smaller than 1:20,000." Different scale maps have
different levels of accuracy, e.g. a 1:1,200 map is accurate to ±3.33 ft,
whereas a 1:24,000 map is accurate to ±40.00 ft.
6. What is meant by a "pedigree" for data and why is it important to
understanding the quality of spatial databases?
- Data should be documented as to accuracy and precision in reports
that tell you exactly how the maps and datasets were compiled.
Unreliable, unknown, or poor quality data will produce uncertain or
spurious GIS solutions.
7. What is the difference between "propagation of error" and "cascading of
error"?
- Propagation occurs when one error leads to another. Propagation is
often additive in nature, as when maps of different accuracy are
collated. Cascading of error occurs when erroneous, imprecise, or
inaccurate information is combined selectively into new layers and
coverages; that is, errors are allowed to propagate unchecked from
layer to layer.
8. According to P.A. Burrough (1986) in Chapter 6 Principles of Geographic
Information Systems for Land Resources Assessment, what are four types of
error arising from natural variation or original measurements that affect the
quality of GIS datasets?
1. Positional accuracy
2. Accuracy of content
3. Measurement error
4. Natural variation in data collected
9. List two sources of error that may be "hidden" from the GIS user, that is error
resulting from natural variation, original measurement, or processing.
1. Numerical errors
2. Uncalibrated measuring equipment
3. Errors in topological analysis
4. Classification and generalization errors
5. Original observations are faulty
6. Digitizing and geocoding errors
10. What is the difference between positional accuracy, conceptual accuracy, and
attribute accuracy?
- Positional accuracy is dependent on the accuracy standards of a map,
that is, the probable location defined by some parameter such
as ±10 feet. Conceptual accuracy is based on how one classifies
information into appropriate categories. Attribute accuracy is the level
of non-spatial data accuracy linked to location. It may also include the
level of detail assigned to a location.
11. List at least five problems that arise when "paper" maps are converted to
"digital" maps.
1. digitizing errors
2. geocoding errors
3. Physiological errors of operator
4. Damaged source maps
5. Rasterizing a vector map
6. Propagation errors
7. Numerical processing errors
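The two ground-accuracy figures quoted in the answer to question 5 can be reproduced with a short calculation. The following is a minimal sketch in Python, not part of the original notes. It assumes map tolerances of 1/30 inch and 1/50 inch, which are the values that yield 3.33 ft at 1:1,200 and 40 ft at 1:24,000, and it places the 1:20,000 boundary as the answer words it; the function name ground_tolerance_ft is hypothetical.

```python
INCHES_PER_FOOT = 12.0


def ground_tolerance_ft(scale_denominator: float) -> float:
    """Ground distance in feet equal to the allowed map error at scale 1:scale_denominator."""
    # Assumption: 1/30 inch on the map at 1:20,000 or larger, 1/50 inch at smaller scales.
    if scale_denominator <= 20_000:
        map_tolerance_in = 1.0 / 30.0
    else:
        map_tolerance_in = 1.0 / 50.0
    # Map inches times the scale denominator gives ground inches.
    return map_tolerance_in * scale_denominator / INCHES_PER_FOOT


for denominator in (1_200, 24_000):
    print(f"1:{denominator:,} -> +/-{ground_tolerance_ft(denominator):.2f} ft")
```

Zooming in on a digital copy changes the display scale but not the source scale, so the tolerance that applies is always the one computed from the scale of the source map.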
3. Multiple-choice questions
Choose the best or most appropriate answer(s) to the question.
1. When an error in a dataset leads to the commission of another error this is
called:
1. false precision
2. propagation
3. spawning
4. horizontal error
5. cascading
6. breeding
2. Which of the following statements about accuracy and precision are true:
1. Conceptual accuracy means employing the correct database model
to represent a real-world feature or event
2. False precision applies only to positional accuracy
3. Measures of attribute accuracy vary depending upon the level of
measurement of the attribute
4. Cohen's Kappa is one measure of positional accuracy
3. Which of the following statements is true?
1. a precise measurement may be inaccurate
2. an accurate measurement may be imprecise
3. accuracy applies only to attribute data whereas precision applies to
both attribute and geographic data
4. high accuracy and high precision are both expensive to acquire
4. The degree to which information on a map or in a digital database matches
true or accepted values is referred to as:
1. precision and accuracy
2. precision
3. accuracy
4. data quality
5. attribute information
6. None of the above
5. The term "precision" refers to:
1. highly accurate data
2. measurements that are within ±1 mm
3. logical accuracy
4. the level of measurement and exactness of description in a GIS
database
5. All of the above
6. Examples of non-obvious sources of error are:
1. areal cover
2. map scale
3. numerical errors
4. format
5. density of observations
7. Propagation occurs when:
1. different scale maps are digitized
2. data quality reports are missing
3. one error leads to another
4. data cascades
5. none of the above
8. GIS solutions are best reported:
1. using calibration sensitive analysis
2. as ranges or rankings
3. with surrogate data analysis
4. with statistical confidence intervals
5. all of the above
6. 2 and 4 only
9. Modern GIS and CAD packages allow easy changes to map scales. Enlarging
a small scale map:
1. provides higher levels of observation
2. increases the density of observations
3. does not increase its accuracy or level of detail
4. allows better use of surrogate data
10. Many agencies now provide data quality reports for GIS data. A data quality
report:
1. ensures GIS solutions will be correct
2. requires sensitivity calibration
3. ensures precision and accuracy in digital data
4. provides information on how maps and data sets were compiled
5. none of the above
6. 2 and 4
Created on 22 Dec 95. Revised on 5 February 2000. LNC
[Figure files referenced (images not reproduced): scale.gif, hump3.gif, line3.gif, soil.gif, digitz.gif,
sliver.gif, overun.gif, dangle.gif, prop2.gif, cascade.gif. Source directory:
http://www.colorado.edu/geography/gcraft/notes/error/gif/]
Managing Error

These materials were developed by Kenneth E. Foote and Donald J. Huebner, Department of Geography,
University of Texas at Austin, 1996. These materials may be used for study, research, and education in not-
for-profit applications. If you link to or cite these materials, please credit the authors, Kenneth E. Foote
and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of
Colorado at Boulder. These materials may not be copied to or issued from another Web server without the
authors' express permission. Copyright 2000 All commercial rights are reserved. If you have comments
or suggestions, please contact the author or Kenneth E. Foote at k.foote@colorado.edu .
1. The Problems of Error, Accuracy and Precision
Managing error in GIS datasets is now recognized as a substantial problem that needs to be addressed in the
design and use of such systems. Failure to control and manage error can limit severely or invalidate the
results of a GIS analysis. Please see the module, Error, Accuracy, and Precision for an overview of the key
issues.
2. Setting Standards for Procedures and Products
No matter what the project, standards should be set from the start. Standards should be established for both
spatial and non-spatial data to be added to the dataset. Issues to be resolved include the accuracy and
precision to be invoked as information is placed in the dataset, conventions for naming geographic features,
criteria for classifying data, and so forth. Such standards should be set both for the procedures used to
create the dataset and for the final products. Setting standards involves three steps.
2.1. Establishing Criteria that Meet the Specific Demands of a Project
Standards are not arbitrary; they should suit the demands of accuracy, precision, and completeness
determined to meet the demands of a project. The Federal and many state governments have
established standards to meet the needs of a wide range of mapping and GIS projects in their domain.
Other users may follow these standards if they apply, but often the designer must carefully establish
standards for particular projects. Picking arbitrarily high levels of precision, accuracy, and
completeness simply adds time and expense. Picking standards that are too low means the project
may not be able to reach its analytical goals once the database is compiled. Indeed, it is perhaps best
to consider standards in the light of ultimate project goals. That is, how accurate, precise, and
http://www.colorado.edu/geography/gcraft/notes/manerror/manerror.html (1 de 5)09/08/2006 07:59:08 p.m.
complete will a solution need to be? The designer can then work backward to establish standards for
the collection and input of raw data. Sensitivity analysis (discussed below) applied to a prototype can
also help to establish standards for a project.
2.2. Training People Involved to Meet Standards, Including Practice
The people who will be compiling and entering data must learn how to apply the standards to their
work. This includes practice with the standards so that they learn to apply them as a natural part of
their work. People working on the project should be given a clear idea of why the standards are being
employed. If standards are enforced as a set of laws or rules without explanation, they may be
resisted or subverted. If the people working on a project know why the standards have been set, they
are often more willing to follow them and to suggest procedures that will improve data quality.
2.3. Testing That the Standards Are Being Employed Throughout a Project and Are
Reached by the Final Products
Regular checks and tests should be employed through a project to make sure that standards are being
followed. This may include the regular testing of all data added to the dataset or may involve spot
checks of the materials. This allows the designer to pinpoint difficulties at an early stage and correct
them.
Examples of data standards:
- USGS, National Mapping Program Standards, http://nationalmap.gov/gio/standards/
- Information on the Spatial Data Transfer Standard, http://mcmcweb.er.usgs.gov/sdts/
- USGS Map Accuracy Standards, http://rockyweb.cr.usgs.gov/nmpstds/nmas.html
3. Documenting Procedures and Products: Data Quality Reports
Standards for procedures and products should always be documented in writing or in the dataset itself. Data
documentation should include information about how data was collected and from what sources, how it was
preprocessed and geocoded, how it was entered in the dataset, and how it is classified and encoded. On
larger projects, one person or a team should be assigned responsibility for data documentation.
Documentation is vitally important to the value and future use of a dataset. The saying is that an
undocumented dataset is a worthless dataset. By and large, this is true. Without clear documentation a dataset
cannot be expanded and cannot be used by other people or organizations now or in the future.
Documentation is of critical importance in large GIS projects because the dataset will almost certainly
outlive the people who created it. That is, GIS for municipal, state, and AM/FM applications are usually
designed to last 50-100 years. The staff who enter the data may have long retired when a question arises
about the characteristics of their work. Written documentation is essential. Some projects actually place
information about data quality and quality control directly in a GIS dataset as independent layers. An
example of a data quality report is:
- Digital Elevation Model Standards, http://rockyweb.cr.usgs.gov/nmpstds/demstds.html
4. Measuring and Testing Products
GIS datasets should be checked regularly against reality. For spatial data, this involves checking maps and
positions in the field or, at least, against sources of high quality. A sample of positions can be resurveyed to
check their accuracy and precision. The USGS employs a testing procedure to check on the quality of its
digital and paper maps, as does the Ordnance Survey. Indeed, the Ordnance Survey continues periodically to
test maps and digital datasets long after they have first been compiled. If too many errors crop up, or if the
mapped area has changed greatly, the work is updated and corrected.
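A resurvey check of this kind can be summarized with a few simple statistics. The following is a minimal sketch in Python, not part of the original notes and not the USGS or Ordnance Survey procedure. The coordinates, the 10 m tolerance, the 90 per cent pass threshold, and the function names are all hypothetical.

```python
import math


def positional_errors(recorded, resurveyed):
    """Straight-line distance between each recorded point and its resurveyed position."""
    return [math.hypot(rx - sx, ry - sy)
            for (rx, ry), (sx, sy) in zip(recorded, resurveyed)]


def share_within(errors, tolerance):
    """Fraction of sampled points whose error does not exceed the tolerance."""
    return sum(e <= tolerance for e in errors) / len(errors)


def rmse(errors):
    """Root-mean-square of the positional errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical sample of five positions (metres): dataset values and field resurvey.
recorded = [(100.0, 200.0), (150.0, 250.0), (300.0, 400.0), (500.0, 100.0), (50.0, 75.0)]
resurveyed = [(103.0, 204.0), (150.0, 256.0), (300.0, 400.0), (512.0, 105.0), (50.0, 78.0)]

TOLERANCE_M = 10.0      # assumed project standard
REQUIRED_SHARE = 0.90   # assumed share of points that must fall within tolerance

errors = positional_errors(recorded, resurveyed)
share = share_within(errors, TOLERANCE_M)
print(f"RMSE = {rmse(errors):.2f} m, share within tolerance = {share:.0%}")
print("sample meets the standard" if share >= REQUIRED_SHARE else "sample fails the standard")
```

In this made-up sample one point in five exceeds the tolerance, so the sample fails the assumed standard and the sheet would be flagged for updating.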
Non-spatial attribute data should also be checked either against reality or a source of equal or greater
quality. The particular tests employed will, of course, vary with the type of data used and its level of
measurement. Indeed, many different tests have been developed to test the quality of interval, ordinal, and
nominal data. Both parametric and nonparametric statistical tests can be employed to compare true values
(those observed "on the ground") and those recorded in the dataset.
Cohen's Kappa provides just one example of the types of test employed, this one for nominal data. The
following example shows how data on land cover stored in a database can be tested against reality.
See Attribute Accuracy and Calculating Cohen's Kappa
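For readers without access to the linked example, Cohen's Kappa compares the observed agreement between two sets of nominal labels with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch in Python, not part of the original notes; the land-cover sample and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(observed, recorded):
    """Cohen's Kappa for two equal-length sequences of nominal labels."""
    n = len(observed)
    if n == 0 or n != len(recorded):
        raise ValueError("both sequences must be non-empty and of equal length")
    # Observed agreement: share of sites where field and database labels match.
    p_o = sum(o == r for o, r in zip(observed, recorded)) / n
    # Chance agreement: product of the marginal proportions, summed over classes.
    obs_counts = Counter(observed)
    rec_counts = Counter(recorded)
    p_e = sum(obs_counts[c] * rec_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one class is present in both sets")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical check of ten sites: class seen on the ground vs. class stored in the database.
on_the_ground = ["forest"] * 4 + ["water"] * 3 + ["urban"] * 3
in_database = ["forest", "forest", "forest", "water",
               "water", "water", "urban",
               "urban", "urban", "forest"]

kappa = cohens_kappa(on_the_ground, in_database)
print(f"kappa = {kappa:.3f}")
```

Here seven of ten sites agree (p_o = 0.70), chance agreement is 0.34, and kappa is about 0.545, noticeably lower than the raw 70 per cent agreement suggests.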
5. Calibrating a Dataset to Ascertain How Error Influences Solutions
Solutions reached by GIS analysis should be checked or calibrated against reality. The best way to do this is
to check the results of a GIS analysis against the findings produced from completely independent calculations.
If the two agree, then the user has some confidence that the data and modeling procedure are valid.
This process of checking and calibrating a GIS is often referred to as Sensitivity Analysis. Sensitivity
analysis allows the user to test how variations in data and modeling procedure influence a GIS solution.
What the user does is vary the inputs of a GIS model, or the procedure itself, to see how each change alters
the solution. In this way, the user can judge quite precisely how data quality and error will influence
subsequent modeling.
This is quite straightforward with interval/ratio input data. The user tests to see how an incremental change
in an input variable changes the output of the system. From this, the user can derive "marginal sensitivity" to
an input and establish "marginal weights" to compensate for error.
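The incremental test described above amounts to a finite-difference estimate: change one input by a small step, rerun the model, and divide the change in output by the step. The following is a minimal sketch in Python, not part of the original notes. The runoff model, its coefficients, the input values, and the function names are hypothetical stand-ins for a real GIS model.

```python
def marginal_sensitivity(model, inputs, name, step):
    """Change in model output per unit change in one input, holding the others fixed."""
    baseline = model(**inputs)
    bumped = dict(inputs)
    bumped[name] += step
    return (model(**bumped) - baseline) / step


def runoff(rainfall_mm, slope_pct, imperviousness):
    """Hypothetical toy model standing in for a GIS calculation."""
    return rainfall_mm * (0.2 + 0.6 * imperviousness) + 1.5 * slope_pct


inputs = {"rainfall_mm": 50.0, "slope_pct": 4.0, "imperviousness": 0.3}
steps = {"rainfall_mm": 1.0, "slope_pct": 1.0, "imperviousness": 0.01}

sensitivities = {name: marginal_sensitivity(runoff, inputs, name, steps[name])
                 for name in inputs}
for name, value in sensitivities.items():
    print(f"{name}: {value:.2f} output units per input unit")
```

Inputs with a large marginal sensitivity are the ones whose measurement error matters most, and so the ones that deserve the tightest standards.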
But sensitivity analysis can also be applied to nominal (categorical) and ordinal (ranked) input data. In these
cases, data may be purposefully misclassified or misranked to see how such errors will change a solution.
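Deliberate misclassification can be simulated by reassigning a chosen share of records to a different class and recomputing the solution. The following is a minimal sketch in Python, not part of the original notes. The land-cover classes, the error rates, the fixed random seed, and the "forest share" solution are hypothetical.

```python
import random


def misclassify(labels, classes, rate, seed=0):
    """Return a copy of labels in which roughly `rate` of the entries get a different class."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    perturbed = []
    for label in labels:
        if rng.random() < rate:
            alternatives = [c for c in classes if c != label]
            perturbed.append(rng.choice(alternatives))
        else:
            perturbed.append(label)
    return perturbed


def forest_share(labels):
    """Hypothetical 'solution': the share of cells classified as forest."""
    return labels.count("forest") / len(labels)


classes = ["forest", "water", "urban"]
cells = ["forest"] * 60 + ["water"] * 25 + ["urban"] * 15

print(f"baseline forest share: {forest_share(cells):.2f}")
for rate in (0.05, 0.10, 0.20):
    perturbed = misclassify(cells, classes, rate)
    print(f"error rate {rate:.0%}: forest share {forest_share(perturbed):.2f}")
```

Repeating the run with several seeds gives a spread of solutions for each error rate, which shows how much classification error the analysis can tolerate.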
Sensitivity analysis can also be used during system design and development to test the levels of precision
and accuracy required to meet system goals. That is, users can experiment with data of differing levels of
precision and accuracy to see how they perform. If a test solution is not accurate or precise enough in one
pass, the levels can be refined and tested again. Such testing of accuracy and precision is very important in
large GIS projects that will generate large quantities of data. It is of little use (and tremendous cost) to
gather and store data to levels of accuracy and precision beyond what is needed to reach a particular
modeling need.
Sensitivity can also be useful at the design stage in testing the theoretical parameters of a GIS model. It is
sometimes the case that a factor, though of seemingly great theoretical importance to a solution, proves to be
of little value in solving a particular problem. For example, soil type is certainly important in predicting crop
yields but, if soil type varies little in a particular region, it is a waste of time entering it into a dataset designed
for this purpose. Users can check on such situations by selectively removing certain data layers from the
modeling process. If they make no difference to the solutions, then no further data entry needs to be made.
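The layer-removal test can be sketched as follows in Python. This is a minimal illustration, not part of the original notes: the three layers, their scores, the equal weights, and the weighted-sum ranking are hypothetical, and a real overlay model would replace rank_sites.

```python
def rank_sites(layers, weights):
    """Rank sites from best to worst by the weighted sum of their layer scores."""
    sites = next(iter(layers.values())).keys()
    scores = {site: sum(weights[name] * layer[site] for name, layer in layers.items())
              for site in sites}
    return sorted(scores, key=scores.get, reverse=True)


def layers_that_matter(layers, weights):
    """For each layer, report whether removing it changes the ranking of sites."""
    baseline = rank_sites(layers, weights)
    changes = {}
    for name in layers:
        reduced = {k: v for k, v in layers.items() if k != name}
        changes[name] = rank_sites(reduced, weights) != baseline
    return changes


# Hypothetical scores for three candidate sites; soil type is uniform across the region.
layers = {
    "soil":     {"A": 0.5, "B": 0.5, "C": 0.5},
    "rainfall": {"A": 0.9, "B": 0.4, "C": 0.6},
    "slope":    {"A": 0.2, "B": 0.8, "C": 0.3},
}
weights = {"soil": 1.0, "rainfall": 1.0, "slope": 1.0}

result = layers_that_matter(layers, weights)
print(result)
```

In this made-up case, dropping the uniform soil layer leaves the ranking unchanged, while dropping rainfall or slope reorders the sites, so only the last two layers would justify further data entry.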
To see how sensitivity analysis might be applied to a problem concerned with upgrading a municipal water
system, go to the following section on Sensitivity Analysis.
In closing this example, it is useful to note that the results were reported in terms of ranking. No single
solution was optimal in all cases. Picking a single, best solution might be misleading. Instead, the sites are
simply ranked by the number of situations in which each comes out ahead.
6. Report Results in Terms of the Uncertainties of the Data
Too often GIS projects fall prey to the problem of False Precision, that is, reporting results to a level of
accuracy and precision unsupported by the intrinsic quality of the underlying data. Just because a system can
store numeric solutions down to four, six, or eight decimal places, does not mean that all of these are
significant. Common practice allows users to round down one decimal place below the level of
measurement. Below one decimal place the remaining digits are meaningless.
As examples of what this means, consider:
Population figures are reported in whole numbers (5,421, 10,238, etc.) meaning that calculations can be
carried down 1 decimal place (density of 21.5, mortality rate of 10.3).
If forest coverage is measured to the closest 10 meters, then calculations can be rounded to the closest 1
meter.
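The rounding rule in these two examples can be written as a small helper. The following is a minimal sketch in Python, not part of the original notes; it assumes the level of measurement is a power of ten (10, 1, 0.1, and so on), and the function names and sample values are hypothetical.

```python
import math


def reporting_decimals(measurement_unit: float) -> int:
    """Decimal places to report: one place finer than the unit of measurement."""
    # Assumption: measurement_unit is a power of ten, e.g. 10, 1 or 0.1.
    return 1 - round(math.log10(measurement_unit))


def round_for_reporting(value: float, measurement_unit: float) -> float:
    """Round a calculated value to one decimal place below the level of measurement."""
    return round(value, reporting_decimals(measurement_unit))


# Population counted in whole persons: a derived density keeps one decimal place.
density = round_for_reporting(21.4837, 1)
# Forest extent measured to the closest 10 m: a derived length is rounded to 1 m.
forest_length_m = round_for_reporting(1234.567, 10)
print(density, forest_length_m)
```

Digits beyond what this helper returns are the "meaningless" ones referred to above and are better dropped from reports.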
A second problem is False Certainty, that is reporting results with a degree of certitude unsupported by the
natural variability of the underlying data. Most GIS solutions involve employing a wide range of data layers,
each with its own natural dynamics and variability. Combining these layers can exacerbate the problem of
arriving at a single, precise solution. Sensitivity analysis (discussed above) helps to indicate how much
variation in one data layer will affect a solution. But GIS users should carry this lesson all the way to final
solutions. These solutions are best reported in terms of ranges, confidence intervals, or rankings. In
some cases, this involves preparing high, low, and mid-range estimates of a solution based upon maximum,
minimum, and average values of the data used in a calculation.
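High, low, and mid-range estimates can be produced by running the same calculation on the minimum, average, and maximum of each input. The following is a minimal sketch in Python, not part of the original notes. It assumes the result increases with every input, so that the minimum inputs give the low estimate; the crop model, the sample values, and the function names are hypothetical.

```python
def estimate_range(model, layer_values):
    """Low, mid, and high estimates from the min, mean, and max of each input.

    Assumes the model output rises with every input; otherwise min and max
    inputs do not bound the output.
    """
    low_inputs = {name: min(values) for name, values in layer_values.items()}
    mid_inputs = {name: sum(values) / len(values) for name, values in layer_values.items()}
    high_inputs = {name: max(values) for name, values in layer_values.items()}
    return {
        "low": model(**low_inputs),
        "mid": model(**mid_inputs),
        "high": model(**high_inputs),
    }


def crop_total(area_ha, yield_t_per_ha):
    """Hypothetical calculation: total harvest in tonnes."""
    return area_ha * yield_t_per_ha


estimates = estimate_range(crop_total, {
    "area_ha": [95.0, 100.0, 105.0],       # repeated area measurements
    "yield_t_per_ha": [2.0, 2.5, 3.0],     # observed yields
})
print(estimates)
```

Reporting the harvest as "between 190 and 315 tonnes, most likely about 250" conveys the variability of the inputs in a way a single figure cannot.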
You will notice that the case considered above pertaining to an optimal site selection problem reported its
results in terms of rankings. Each site was optimal in certain confined situations, but only a couple proved
optimal in more than one situation. The results rank the number of times each site came out ahead in terms
of total cost.
In situations where statistical analysis is possible, the use of confidence intervals is recommended.
Confidence intervals establish the probability of a solution falling within a certain range (e.g., a 95%
probability that a solution falls between 100m and 150m).
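A confidence interval for a mean can be computed with the Python standard library. The following is a minimal sketch, not part of the original notes. It uses the normal approximation, which is an assumption that suits large samples; for a small sample such as this one, a Student's t interval would be somewhat wider. The sample values and the function name are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of the samples."""
    centre = mean(samples)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return centre - half_width, centre + half_width


# Hypothetical repeated solutions (metres) from runs with perturbed input data.
solutions_m = [112.0, 131.0, 124.0, 118.0, 140.0, 127.0, 121.0, 135.0]

low, high = mean_confidence_interval(solutions_m)
print(f"95% confidence interval: {low:.0f} m to {high:.0f} m")
```

Stating the result as an interval rather than as the bare mean makes the uncertainty of the underlying data part of the reported solution.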

7. References and Supplemental Reading
Chapter 14 in Bolstad, Paul. 2005. GIS Fundamentals: A First Text on Geographic Information Systems,
2nd. ed. White Bear Lake, MN: Eider Press.
Burrough, P.A. 1990. Principles of Geographical Information Systems for Land Resource Assessment.
Clarendon Press. Oxford.
Chapter 8 in Chang, Kang-tsung. 2006. Introduction to Geographic Information Systems, 3rd. ed. Boston:
McGraw Hill.

Chapter 4 in Lo, C.P. and Albert K.W. Yeung. 2002. Concepts and Techniques of Geographic Information
Systems. Upper Saddle River, NJ: Prentice Hall.

Chapter 6 in Longley, Paul A., Michael F. Goodchild, David J. Maguire, and David W. Rhind. 2005.
Geographic Information Systems and Science, 2nd ed. Hoboken, NJ: Wiley.

USGS, National Mapping Program Standards, http://nationalmap.gov/gio/standards/
8. Examination and Study Questions.
Last revised on 2006.3.15. k.foote@colorado.edu

Managing Error

Managing Error
Full Table of Contents
Top of Module

1 The Problems of Error,
Accuracy and Precision

2 Setting Standards for
Procedures and Products

3 Documenting Procedures
and Products: Data Quality
Reports

4 Measuring and Testing
Products

5 Calibrating a Dataset to
Ascertain How Error
Influences Solutions

6 Report Results in Terms of
the Uncertainties of the Data

7 References and
Supplemental Reading

8 Examination and Study
Questions

These materials were developed by Kenneth E. Foote and Donald J. Huebner, Department of Geography,
University of Texas at Austin, 1996. These materials may be used for study, research, and education in not-
for-profit applications. If you link to or cite these materials, please credit the authors, Kenneth E. Foote
and Donald J. Huebner, The Geographer's Craft Project, Department of Geography, The University of
Colorado at Boulder. These materials may not be copied to or issued from another Web server without the
authors' express permission. Copyright 2000 All commercial rights are reserved. If you have comments
or suggestions, please contact the author or Kenneth E. Foote at k.foote@colorado.edu .
This page is also available in a framed version . For convenience we have provided a full Table of
Contents .
1. The Problems of Error, Accuracy and Precision
Managing error in GIS datasets is now recognized as a substantial problem that needs to be addressed in the
design and use of such systems. Failure to control and manage error can limit severely or invalidate the
results of a GIS analysis. Please see the module, Error, Accuracy, and Precision for an overview of the key
issues.
2. Setting Standards for Procedures and Products
No matter what the project, standards should be set from the start. Standards should be established for both
spatial and non-spatial data to be added to the dataset. Issues to be resolved include the accuracy and
precision to be invoked as information is placed in the dataset, conventions for naming geographic features,
criteria for classifying data, and so forth. Such standards should be set both for the procedures used to
create the dataset and for the final products. Setting standards involves three steps.
2.1. Establishing Criteria that Meet the Specific Demands of a Project
Standards are not arbitrary; they should suit the demands of accuracy, precision, and completeness
determined to meet the demands of a project. The Federal and many state governments have
established standards meet the needs of a wide range of mapping and GIS projects in their domain.
Other users may follow these standards if they apply, but often the designer must carefully establish
standards for particular projects. Picking arbitrarily high levels of precision, accuracy, and
completeness simply adds time and expense. Picking standards that are too low means the project
may not be able to reach its analytical goals once the database is compiled. Indeed, it is perhaps best
to consider standards in the light of ultimate project goals. That is, how accurate, precise, and
complete will a solution need to be? The designer can then work backward to establish standards for
the collection and input of raw data. Sensitivity analysis (discussed below) applied to a prototype can
also help to establish standards for a project.
2.2 Training People Involved to Meet Standards, Including Practice
The people who will be compiling and entering data must learn how to apply the standards to their
work. This includes practice with the standards so that they learn to apply them as a natural part of
http://www.colorado.edu/geography/gcraft/notes/manerror/manerror_f.html (1 de 5)09/08/2006 07:59:31 p.m.
Managing Error
their work. People working on the project should be given a clear idea of why the standards are being
employed. If standards are enforced as a set of laws or rules without explanation, they may be
resisted or subverted. If the people working on a project know why the standards have been set, they
are often more willing to follow them and to suggest procedures that will improve data quality.
2.3. Testing That the Standards Are Being Employed Throughout a Project and Are
Reached by the Final Products
Regular checks and tests should be employed through a project to make sure that standards are being
followed. This may include the regular testing of all data added to the dataset or may involve spot
checks of the materials. This allows to designer to pinpoint difficulties at an early stage and correct
them.
Examples of data standards:
G USGS, National Mapping Program Standards, http://nationalmap.gov/gio/standards/
G Information on the Spatial Data Transfer Standard, http://mcmcweb.er.usgs.gov/sdts/
G USGS Map Accuracy Standards, http://rockyweb.cr.usgs.gov/nmpstds/nmas.html
3. Documenting Procedures and Products: Data Quality Reports
Standards for procedures and products should always be documented in writing or in the dataset itself. Data
documentation should include information about how data was collected and from what sources, how it was
preprocessed and geocoded, how it was entered in the dataset, and how it is classified and encoded. On
larger projects, one person or a team should be assigned responsibility for data documentation.
Documentation is vitally important to the value and future use of a dataset. The saying is that an
undocumented dataset is a worthless dataset. By in large, this is true. Without clear documentation a dataset
can not be expanded and cannot be used by other people or organizations now or in the future.
Documentation is of critical importance in large GIS projects because the dataset will almost certainly
outlive the people who created it. That is, GIS for municipal, state, and AM/FM applications are usually
designed to last 50-100 years. The staff who enters the data may have long retired when a question arises
about the characteristics of their work. Written documentation is essential. Some projects actually place
information about data quality and quality control directly in a GIS dataset as independent layers. An
example of data quality reports is:
G Digital Elevation Model Standards, http://rockyweb.cr.usgs.gov/nmpstds/demstds.html
4. Measuring and Testing Products
GIS datasets should be checked regularly against reality. For spatial data, this involves checking maps and
positions in the field or, at least, against sources of high quality. A sample of positions can be resurveyed to
check their accuracy and precision. The USGS employs a testing procedure to check on the quality of its
digital and paper maps, as does the Ordnance Survey. Indeed, the Ordnance Survey continues periodically to
test maps and digital datasets long after they have first been compiled. If too many errors crop up, or if the
mapped area has changed greatly, the work is updated and corrected.
Non-spatial attribute data should also be checked either against reality or a source of equal or greater
quality. The particular tests employed will, of course, vary with the type of data used and its level of
measurement. Indeed, many different tests have been developed to test the quality of interval, ordinal, and
nominal data. Both parametric and nonparametric statistical tests can be employed to compare true values
(those observed "on the ground") and those recorded in the dataset.
Cohen's Kappa provides just one example of the types of test employed, this one for nominal data. The
following example shows how data on land cover stored in a database can be tested against reality.
See Attribute Accuracy and Calculating Cohen's Kappa
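As a rough illustration of the kind of calculation involved, the sketch below computes Cohen's Kappa for a sample of sites whose land cover was checked in the field. The function name and the sample labels are hypothetical and are not taken from the linked example.

```python
from collections import Counter


def cohens_kappa(observed, recorded):
    """Cohen's Kappa for two equal-length sequences of nominal labels."""
    if len(observed) != len(recorded) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    # Observed agreement: share of sites where field and database labels match.
    p_o = sum(o == r for o, r in zip(observed, recorded)) / n
    # Agreement expected by chance, from the class frequencies in each source.
    obs_counts = Counter(observed)
    rec_counts = Counter(recorded)
    p_e = sum(obs_counts[c] * rec_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("Kappa is undefined when only one class is present")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical sample: land cover seen in the field vs. stored in the database.
field = ["forest"] * 4 + ["urban"] * 3 + ["water"] * 3
database = ["forest", "forest", "forest", "urban",
            "urban", "urban", "forest",
            "water", "water", "water"]

kappa = cohens_kappa(field, database)
print(f"kappa = {kappa:.2f}")
```

In this made-up sample the two sources agree at 8 of 10 sites, while an agreement of 0.34 would be expected by chance, so Kappa comes out at about 0.70.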
5. Calibrating a Dataset to Ascertain How Error Influences Solutions
Solutions reached by GIS analysis should be checked or calibrated against reality. The best way to do this is
to check the results of a GIS analysis against the findings produced from completely independent calculations.
If the two agree, then the user has some confidence that the data and modeling procedure are valid.
This process of checking and calibrating a GIS is often referred to as Sensitivity Analysis. Sensitivity
analysis allows the user to test how variations in data and modeling procedure influence a GIS solution.
What the user does is vary the inputs of a GIS model, or the procedure itself, to see how each change alters
the solution. In this way, the user can judge quite precisely how data quality and error will influence
subsequent modeling.
This is quite straightforward with interval/ratio input data. The user tests to see how an incremental change
in an input variable changes the output of the system. From this, the user can derive "marginal sensitivity" to
an input and establish "marginal weights" to compensate for error.
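One way to picture marginal sensitivity is as a finite-difference estimate: nudge one input up and down by a small step and see how much the output moves per unit of input. The sketch below assumes a hypothetical pipeline cost model with made-up coefficients; neither the model nor the function names come from the original text.

```python
def marginal_sensitivity(model, inputs, name, step):
    """Change in model output per unit change in one input (central difference)."""
    up = dict(inputs)
    down = dict(inputs)
    up[name] += step
    down[name] -= step
    return (model(**up) - model(**down)) / (2 * step)


# Hypothetical model: cost of a pipeline as a function of two measured inputs.
def pipeline_cost(length_m, elevation_gain_m):
    return 2.0 * length_m + 15.0 * elevation_gain_m


baseline = {"length_m": 1200.0, "elevation_gain_m": 40.0}
sensitivities = {
    name: marginal_sensitivity(pipeline_cost, baseline, name, step=1.0)
    for name in baseline
}
# An input error of e changes the output by roughly sensitivity * e.
cost_error_from_5m_elevation_error = sensitivities["elevation_gain_m"] * 5.0
print(sensitivities, cost_error_from_5m_elevation_error)
```

Inputs with large sensitivities are the ones whose measurement error matters most, which suggests where extra data quality is worth paying for.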
But sensitivity analysis can also be applied to nominal (categorical) and ordinal (ranked) input data. In these
cases, data may be purposefully misclassified or misranked to see how such errors will change a solution.
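A minimal sketch of deliberate misclassification follows. It assumes a toy "solution" (the total area of one land cover class) and hypothetical cell labels; the helper names are illustrative only.

```python
import random


def misclassify(labels, classes, fraction, seed=0):
    """Copy of labels with a given fraction deliberately moved to another class."""
    rng = random.Random(seed)
    perturbed = list(labels)
    n_changes = round(fraction * len(perturbed))
    for i in rng.sample(range(len(perturbed)), n_changes):
        perturbed[i] = rng.choice([c for c in classes if c != perturbed[i]])
    return perturbed


def suitable_area_ha(labels, cell_area_ha=1.0, suitable="pasture"):
    """Toy 'solution': total area of cells in the suitable class."""
    return cell_area_ha * sum(label == suitable for label in labels)


classes = ["pasture", "forest", "urban"]
land_cover = ["pasture"] * 12 + ["forest"] * 6 + ["urban"] * 2  # hypothetical cells

baseline_area = suitable_area_ha(land_cover)
# Repeat the deliberate misclassification with different seeds to see the spread.
trial_areas = [
    suitable_area_ha(misclassify(land_cover, classes, fraction=0.2, seed=s))
    for s in range(100)
]
print(baseline_area, min(trial_areas), max(trial_areas))
```

The spread of the trial results around the baseline gives a rough sense of how much a 20% classification error could move this particular solution.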
Sensitivity analysis can also be used during system design and development to test the levels of precision
and accuracy required to meet system goals. That is, users can experiment with data of differing levels of
precision and accuracy to see how they perform. If a test solution is not accurate or precise enough in one
pass, the levels can be refined and tested again. Such testing of accuracy and precision is very important in
large GIS projects that will generate large quantities of data. It is of little use (and tremendous cost) to
gather and store data to levels of accuracy and precision beyond what is needed to reach a particular
modeling need.
Sensitivity analysis can also be useful at the design stage in testing the theoretical parameters of a GIS model. It is
sometimes the case that a factor, though of seemingly great theoretical importance to a solution, proves to be
of little value in solving a particular problem. For example, soil type is certainly important in predicting crop
yields but, if soil type varies little in a particular region, it is a waste of time entering it into a dataset designed
for this purpose. Users can check on such situations by selectively removing certain data layers from the
modeling process. If they make no difference to the solutions, then no further data entry needs to be made.
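The layer-removal check can be sketched as follows, assuming a simple additive overlay in which each candidate site receives a score from each layer. The layer names, site ids, and scores are all hypothetical.

```python
def rank_sites(layers, skip=None):
    """Rank site ids by their summed score over every layer except `skip`."""
    used = [name for name in layers if name != skip]
    sites = layers[used[0]].keys()
    totals = {site: sum(layers[name][site] for name in used) for site in sites}
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical suitability scores (0 to 1) for three candidate fields.
layers = {
    "rainfall": {"A": 0.9, "B": 0.5, "C": 0.2},
    "slope": {"A": 0.6, "B": 0.7, "C": 0.3},
    "soil_type": {"A": 0.5, "B": 0.5, "C": 0.5},  # varies little in this region
}

full_ranking = rank_sites(layers)
for name in layers:
    changed = rank_sites(layers, skip=name) != full_ranking
    print(f"dropping {name}: ranking {'changes' if changed else 'is unchanged'}")
```

Here the soil type layer is constant across the sites, so dropping it leaves the ranking untouched, whereas dropping rainfall reorders the sites.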
To see how sensitivity analysis might be applied to a problem concerned with upgrading a municipal water
system, go to the following section on Sensitivity Analysis.
In closing this example, it is useful to note that the results were reported in terms of ranking. No single
solution was optimal in all cases. Picking a single, best solution might be misleading. Instead, the sites are
simply ranked by the number of situations in which each comes out ahead.
6. Report Results in Terms of the Uncertainties of the Data
Too often GIS projects fall prey to the problem of False Precision, that is, reporting results to a level of
accuracy and precision unsupported by the intrinsic quality of the underlying data. Just because a system can
store numeric solutions down to four, six, or eight decimal places, does not mean that all of these are
significant. Common practice allows users to carry results one decimal place beyond the level of
measurement. Digits beyond that one extra decimal place are meaningless.
As examples of what this means, consider:
Population figures are reported in whole numbers (5,421, 10,238, etc.) meaning that calculations can be
carried down 1 decimal place (density of 21.5, mortality rate of 10.3).
If forest coverage is measured to the closest 10 meters, then calculations can be rounded to the closest 1
meter.
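The rule of thumb in these two examples can be written as a small helper. This is a sketch that assumes the measurement resolution is a power of ten (1, 10, 0.1, and so on); the function name and the sample figures are hypothetical.

```python
import math


def round_for_report(value, measurement_resolution):
    """Round to one decimal place finer than the measurement resolution.

    Assumes the resolution is a power of ten: 1 for whole numbers,
    10 for "nearest 10 meters", 0.1 for tenths, and so on.
    """
    decimals = 1 - round(math.log10(measurement_resolution))
    return round(value, decimals)


# Population counted in whole numbers: carry one decimal place.
density = round_for_report(5421 / 252, measurement_resolution=1)

# Forest extent measured to the nearest 10 meters: report to the nearest 1 meter.
extent_m = round_for_report(1234.567, measurement_resolution=10)

print(density, extent_m)
```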
A second problem is False Certainty, that is, reporting results with a degree of certitude unsupported by the
natural variability of the underlying data. Most GIS solutions involve employing a wide range of data layers,
each with its own natural dynamics and variability. Combining these layers can exacerbate the problem of
arriving at a single, precise solution. Sensitivity analysis (discussed above) helps to indicate how much
variation in one data layer will affect a solution. But GIS users should carry this lesson all the way to final
solutions. These solutions are best reported in terms of ranges, confidence intervals, or rankings. In
some cases, this involves preparing high, low, and mid-range estimates of a solution based upon maximum,
minimum, and average values of the data used in a calculation.
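A minimal sketch of high, low, and mid-range estimates is given below. It assumes a hypothetical demand model whose output rises with every input, so that evaluating it at all minimum values and all maximum values brackets the result; for models that are not monotonic in this way the approach would need more care.

```python
def scenario_estimates(model, ranges):
    """Evaluate a model at the minimum, average, and maximum of every input.

    `ranges` maps each input name to a (minimum, average, maximum) tuple.
    This brackets the result only if the model increases with each input.
    """
    low = model(**{name: r[0] for name, r in ranges.items()})
    mid = model(**{name: r[1] for name, r in ranges.items()})
    high = model(**{name: r[2] for name, r in ranges.items()})
    return low, mid, high


# Hypothetical model and input ranges for daily water demand.
def daily_demand_litres(population, litres_per_person):
    return population * litres_per_person


ranges = {
    "population": (9800, 10238, 10700),
    "litres_per_person": (250, 300, 350),
}
low, mid, high = scenario_estimates(daily_demand_litres, ranges)
print(f"demand: {low:,} to {high:,} litres/day (mid-range {mid:,})")
```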
You will notice that the case considered above pertaining to an optimal site selection problem reported its
results in terms of rankings. Each site was optimal in certain confined situations, but only a couple proved
optimal in more than one situation. The results rank the number of times each site came out ahead in terms
of total cost.
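Counting how often each site comes out ahead can be sketched in a few lines. The scenario names and cost figures below are invented for illustration and are not the values from the site selection case.

```python
from collections import Counter


def rank_by_wins(costs_by_scenario):
    """Rank sites by the number of scenarios in which each has the lowest cost."""
    sites = sorted({site for costs in costs_by_scenario.values() for site in costs})
    wins = Counter(min(costs, key=costs.get) for costs in costs_by_scenario.values())
    # Sites that never win still appear, with a count of zero.
    return sorted(((site, wins[site]) for site in sites),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical total cost of each candidate site under four scenarios.
costs_by_scenario = {
    "low_growth": {"A": 120, "B": 100, "C": 140},
    "mid_growth": {"A": 130, "B": 125, "C": 135},
    "high_growth": {"A": 150, "B": 170, "C": 145},
    "drought": {"A": 110, "B": 160, "C": 155},
}

ranking = rank_by_wins(costs_by_scenario)
print(ranking)
```

In this invented data site B is cheapest in two scenarios and sites A and C in one each, so no single site can honestly be reported as the best in all cases.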
In situations where statistical analysis is possible, the use of confidence intervals is recommended.
Confidence intervals establish the probability of a solution falling within a certain range (e.g., a 95%
probability that a solution falls between 100 m and 150 m).
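As a rough sketch, a confidence interval for the mean of repeated estimates can be computed with Python's standard library. The example assumes a normal approximation (for small samples a t-distribution would give a somewhat wider interval), and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Normal-approximation confidence interval for the mean of `samples`."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    centre = mean(samples)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return centre - half_width, centre + half_width


# Hypothetical repeated estimates of a distance, in metres.
estimates_m = [118.0, 131.0, 124.0, 127.0, 122.0, 135.0, 120.0, 129.0, 126.0, 123.0]
lower, upper = confidence_interval(estimates_m)
print(f"95% confidence interval: {lower:.1f} m to {upper:.1f} m")
```

Reporting the interval alongside the mean makes the uncertainty of the underlying data visible in the final result.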

7. References and Supplemental Reading
Chapter 14 in Bolstad, Paul. 2005. GIS Fundamentals: A First Text on Geographic Information Systems,
2nd. ed. White Bear Lake, MN: Eider Press.
Burrough, P.A. 1990. Principles of Geographical Information Systems for Land Resource Assessment.
Oxford: Clarendon Press.
Chapter 8 in Chang, Kang-tsung. 2006. Introduction to Geographic Information Systems, 3rd. ed. Boston:
McGraw Hill.

Chapter 4 in Lo, C.P. and Albert K.W. Yeung. 2002. Concepts and Techniques of Geographic Information
Systems. Upper Saddle River, NJ: Prentice Hall.

Chapter 6 in Longley, Paul A., Michael F. Goodchild, David J. Maguire, and David W. Rhind. 2005.
Geographic Information Systems and Science, 2nd ed. Hoboken, NJ: Wiley.

USGS, National Mapping Program Standards, http://nationalmap.gov/gio/standards/
8. Examination and Study Questions.
Last revised on 2006.3.15. k.foote@colorado.edu

