CHAPTER 1

Data Management
Daniel J. McCarthy
CONTENTS
1.1 Introduction
1.2 Database Overview
1.3 Geographic Information Systems Overview
1.4 Understanding Data Management Needs
1.5 Data Categorization
1.5.1 Spatial Data
1.6 Temporal Data
1.7 Data Validation and Verification
1.8 Data Reporting
1.8.1 Querying Data
1.8.2 Reporting Data
1.9 Metadata
1.10 Conclusions
1.1 INTRODUCTION
Data management encompasses many tasks, priorities,
and decisions. Underlying these activities is the need
for accurate data from sampling and monitoring
programs designed to measure the effects of operational
activities. To understand the value of good data
management, it is helpful to understand how this
information is generated and used to support
management decisions.
So what is data? The American Heritage Dictionary
defines data as factual information, especially information
organized for analysis or used to reason or make
decisions, or values derived from scientific experiments.
Scientific professionals generate huge quantities of data
every day. It is estimated that scientists spend 80% of
their time managing data and 20% analyzing and
interpreting them. By establishing sound data management
practices, more time can be spent on data analysis and
interpretation.
Throughout this chapter, we will provide examples of
data management practices within the context of an
investigation of contaminated groundwater and surface
water. These practices are directly applicable to
managing other types of data, such as those found in
this book.
1.2 DATABASE OVERVIEW
A discussion of data management would be incomplete
without a general discussion of databases.
A very general definition of a database might be "a
collection of related items of information contained on
various media organized in a way that allows easy
search and retrieval of subsets of the items of
information." Note that, strictly speaking, a database
does not have to be electronic: boxes containing recipes,
telephone books, and paper address books are all
databases. A database used for environmental purposes
might combine paper copies of information with items of
information in electronic form, with perhaps some sort of
paper or electronic index to, or inventory list of, all of
the data.
© 2006 by Taylor & Francis Group, LLC
Electronic data are nearly always organized into
tables. Consider the example shown in Table 1.1.
Each row of this table represents a single data point;
in this case the first row provides data about location
SW-01, and only SW-01. Each column of this table
represents a type of data that is stored for each row. In
our example, the Area of Concern column identifies the
spatial group that each location belongs to.
The rows in a database table are typically called
records, while the columns are called fields. Thus, a
useful definition of an electronic database is "a
collection of related items of data organized into one or
more tables." Each field is constrained to a single data
type. Table 1.2 lists the most common data types.
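The record and field vocabulary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name LocationRecord and its attribute names are hypothetical, and the values are two rows taken from Table 1.1. Note that a Python dataclass documents the data type of each field but, unlike a database, does not enforce it.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LocationRecord:
    """One record (row) of Table 1.1; each attribute is one field (column)."""
    location_id: str       # character data
    area_of_concern: str   # character data
    x: float               # floating point coordinate
    y: float               # floating point coordinate
    z: float               # floating point elevation


# Two records copied from Table 1.1.
records = [
    LocationRecord("SW-01", "Upstream", 175470.994, 1635550.124, 100.203),
    LocationRecord("MW-01", "Background area", 174345.077, 1632431.087, 96.597),
]

# The field names are shared by every record in the table.
field_names = [f.name for f in fields(LocationRecord)]
print(field_names)
print(records[0].area_of_concern)
```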
Electronic databases are typically either flat-file
databases, in which the entire database resides in a
single table, or relational databases, in which the data
are distributed into more than one table, with the tables
linked together by a common key field. The tables will be
related to one another according to a one-to-one
relationship (each record in one table has a single
matching record in the other table) or a one-to-many
relationship (each record in one table may have one or
more matching records in the other table, but not the
reverse).
One significant advantage of relational databases over
flat-file databases is the ability to query the data in
different ways. A query is defined as a statement to
retrieve database records that match certain criteria. By
structuring the query statement a certain way, different
information can be returned from the data set.
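As a rough sketch of how the structure of a query statement changes what is returned, the example below loads the Table 1.1 locations into an in-memory SQLite database and asks two different questions of the same data. The choice of SQLite (through Python's standard sqlite3 module), the table name location, the column names, and the elevation threshold of 98.0 are all assumptions made for illustration; the chapter does not prescribe a particular database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE location ("
    " location_id TEXT PRIMARY KEY, area_of_concern TEXT,"
    " x REAL, y REAL, z REAL)"
)
# Rows copied from Table 1.1.
conn.executemany(
    "INSERT INTO location VALUES (?, ?, ?, ?, ?)",
    [
        ("SW-01", "Upstream", 175470.994, 1635550.124, 100.203),
        ("SW-02", "Upstream", 177126.487, 1635925.814, 100.102),
        ("SW-03", "Outfall 1", 177047.029, 1635676.853, 100.00),
        ("SW-04", "Downstream", 176871.093, 1635674.137, 98.97),
        ("SW-05", "Downstream", 175790.418, 1635597.208, 97.96),
        ("MW-01", "Background area", 174345.077, 1632431.087, 96.597),
        ("MW-02", "Oil storage area", 174251.127, 1632466.059, 97.384),
        ("MW-03", "Oil storage area", 174690.942, 1631435.707, 97.384),
    ],
)

# Query 1: criterion on the Area of Concern field.
upstream = [row[0] for row in conn.execute(
    "SELECT location_id FROM location"
    " WHERE area_of_concern = ? ORDER BY location_id", ("Upstream",))]

# Query 2: same table, different criterion (z below an arbitrary threshold).
low_lying = [row[0] for row in conn.execute(
    "SELECT location_id FROM location WHERE z < ? ORDER BY location_id", (98.0,))]

print(upstream)
print(low_lying)
```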
The most popular general-purpose software for
managing data is an electronic spreadsheet program
such as Microsoft Excel. Electronic spreadsheets are
excellent tools for managing electronic data that fit in a
single table. However, spreadsheets are cumbersome or
inadequate tools for managing relational data, where
more than one table is required. For managing
relational data, other more powerful data management
programs should be used. There are many popular
relational database management systems available,
including Microsoft Access. It should be noted that,
in contrast to flat-file databases, which typically can be
managed by the casual computer user, large relational
databases require management by trained individuals
and will usually be beyond the capabilities of the
casual user.
The field of database management is continually in
flux, and will have changed by the time this book is
published. Thus, it is impossible to cover all the facets of
data management and database theory here. However, it
can be said that the relational database model
overwhelmingly dominates large-scale data management and
database theory. For further information about relational
databases, the reader is directed to any of the numerous
references on this subject. A particularly helpful book
designed for the casual database user is Database Design
for Mere Mortals: A Hands-On Guide to Relational Database
Design, by Michael J. Hernandez (2003), Addison-Wesley
Developers Press.
Table 1.1 Groundwater and Surface Water Location Data
Location ID  Area of Concern   x            y              z
SW-01        Upstream          175,470.994  1,635,550.124  100.203
SW-02        Upstream          177,126.487  1,635,925.814  100.102
SW-03        Outfall 1         177,047.029  1,635,676.853  100.00
SW-04        Downstream        176,871.093  1,635,674.137  98.97
SW-05        Downstream        175,790.418  1,635,597.208  97.96
MW-01        Background area   174,345.077  1,632,431.087  96.597
MW-02        Oil storage area  174,251.127  1,632,466.059  97.384
MW-03        Oil storage area  174,690.942  1,631,435.707  97.384
Table 1.2 Common Data Types
Data Type               Description or Example
Integer                 Typically stores numbers that relate to counts, quantities, or ID numbers
Decimal                 Numbers with fractional parts such as percentages or rates
Floating point          Numbers with a scientific notation that can be calculated approximately, such as distance or weight
Fixed length character  Names, descriptions, addresses
Date/Time               Storage of date and/or time or intervals of dates or times
Boolean                 Explicit constraints (Yes/No, True/False) or logical constraints (AND/OR)
Unstructured data       Images, video, audio
THE WATER ENCYCLOPEDIA: HYDROLOGIC DATA AND INTERNET RESOURCES 1-2
1.3 GEOGRAPHIC INFORMATION SYSTEMS OVERVIEW
Data are often presented in a tabular format. Sometimes,
a visual representation is helpful in drawing
conclusions, particularly if the data have a spatial
component. A Geographic Information System (GIS)
is a way to display information with a spatial
component. A GIS can be defined as a software package
that manages and displays information in a database
composed of data that are associated with spatial
information. That is to say, there will be both tabular
and spatial information in the database, it will be
possible to query the database for specific data, and
the user will be able to display the data spatially, as
a map.
The software package comprising the GIS may be a
single program, a set of programs from a single vendor,
a combination of programs that together constitute the
GIS, a custom-programmed software package, or any
combination thereof.
In the groundwater arena, the tabular data could
typically be depth to water data, water table elevation
data, water chemistry data, and water quality data. In
addition, the tabular data will have some spatial
component in either two or three dimensions, as xyz
coordinates. Spatial data, in addition to the xyz
coordinates mentioned above, often will include digitally
processed air or satellite photographs, computer-aided
design (CAD) drawings, or other electronic spatial
entities. Clearly, depth to water data and water chemistry
data, which can be displayed in both plan view (two-
dimensional) and side view (three-dimensional), are well
suited to management using a GIS. Table 1.3 is an
example of data from a GIS system representing depth
to groundwater.
When generated in a GIS system in plan view, the
view can look like Figure 1.1.
The primary attraction of a GIS is the ability to
manage, query, and display a large amount of data
spatially, in real time. Using most GISs, the user
can view the data on a map, query for a subset of the
data while viewing the map, and then see the distribution
of the subset data when the map is refreshed.
This process can be repeated as many times as necessary
to answer a question. For example, the temporary
wells in the previous figure were not measured
during the August sampling event. By querying the
GIS only for locations that were measured
during the August event, we return a subset of the data,
shown in Table 1.4.
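The subset query described above can be approximated outside a GIS with a simple filter. The sketch below is a stand-in, not the GIS query itself: it holds a few rows from Table 1.3 as Python tuples and assumes that the "NM" entries are stored as None (no measurement) rather than as text.

```python
# (well_id, loc_type, groundwater elevation or None when not measured)
# Rows copied from Table 1.3; "NM" is represented as None (an assumption).
wells = [
    ("MW131", "Monitor well", 19.21),
    ("MW132", "Monitor well", 20.38),
    ("MW133", "Monitor well", 19.23),
    ("MW134", "Monitor well", 18.13),
    ("MW135", "Monitor well", 18.78),
    ("MW136", "Monitor well", 16.71),
    ("MW137", "Monitor well", 21.40),
    ("TW-161", "Temp monitor well", None),
    ("TW-162", "Temp monitor well", None),
    ("TW-163", "Temp monitor well", None),
]

# Keep only the locations that have a measured elevation.
measured = [well for well in wells if well[2] is not None]
measured_ids = [well[0] for well in measured]
print(measured_ids)
```

Storing a missing measurement as an empty value rather than as the text "NM" keeps the elevation field constrained to a single numeric data type.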
From this set of data, our map would look like
Figure 1.2.
The literature on GISs is voluminous. Because
GISs have a wide applicability throughout many
disciplines, the reader is directed to the internet,
where search engines associated with any popular
internet portal (Yahoo! or America Online, for
example) may be used to find literally thousands of
references on the subject.
Table 1.3 Depth to Groundwater Data from GIS System
Well_Id  Loc_Type           X_Coord          Y_Coord        Date       Gwelev
MW131    Monitor well       2,548,766.73000  319,202.76570  8/17/2004  19.21000
MW132    Monitor well       2,548,671.72700  319,228.04840  8/17/2004  20.38000
MW133    Monitor well       2,548,677.58100  319,285.61010  8/17/2004  19.23000
MW134    Monitor well       2,548,740.41300  319,287.73680  8/17/2004  18.13000
MW135    Monitor well       2,548,668.82800  319,335.81440  8/17/2004  18.78000
MW136    Monitor well       2,548,747.90500  319,346.29610  8/17/2004  16.71000
MW137    Monitor well       2,548,704.97300  319,107.44400  8/17/2004  21.40000
TW-161   Temp monitor well  2,548,677.70000  318,512.69000  8/17/2004  NM
TW-162   Temp monitor well  2,549,393.00000  317,485.00000  8/17/2004  NM
TW-163   Temp monitor well  2,549,412.00000  317,501.00000  8/17/2004  NM
TW-164   Temp monitor well  2,548,775.80000  319,271.63000  8/17/2004  NM
TW-165   Temp monitor well  2,548,788.62000  319,196.71000  8/17/2004  NM
TW-166   Temp monitor well  2,548,656.95000  319,215.44000  8/17/2004  NM
TW-167   Temp monitor well  2,548,690.37000  319,153.66000  8/17/2004  NM
TW-168   Temp monitor well  2,548,618.73000  319,285.76000  8/17/2004  NM
TW-169   Temp monitor well  2,548,672.29000  319,291.02000  8/17/2004  NM
TW-170   Temp monitor well  2,548,722.90000  319,286.09000  8/17/2004  NM
TW-171   Temp monitor well  2,548,745.57000  319,224.97000  8/17/2004  NM
TW-172   Temp monitor well  2,548,758.72000  319,164.83000  8/17/2004  NM
TW-173   Temp monitor well  2,548,304.43000  319,248.26000  8/17/2004  NM
TW-174   Temp monitor well  2,548,314.10000  319,441.75000  8/17/2004  NM
TW-175   Temp monitor well  2,548,247.26000  319,666.89000  8/17/2004  NM
NM = not measured.
1.4 UNDERSTANDING DATA MANAGEMENT NEEDS
Initially designing a data management program for an
investigation or experiment requires significant scientific
expertise. However, after design and implementation,
the processes generally follow a well-defined and
straightforward cycle. In our groundwater contamination
example, samples are routinely collected and sent to
laboratories, where they are analyzed and the results are
reported in the form of a hard-copy analytical results
report. From this point on, the data are put to multiple
uses to meet a variety of needs. Some portion of the data
is collected and reported under the requirements of
environmental permits, while additional data are
generated voluntarily to further the objectives of sound
environmental management. Accountability for effectively
managing the collection and utilization of this
information according to well-defined processes within
standardized software environments is essential to
effective environmental management.
Figure 1.1 Example of groundwater elevation data from GIS System. [Plan-view map, not reproduced: monitoring wells and temporary monitoring wells with posted groundwater elevations (ft MSL) and groundwater elevation contours, dashed where inferred; scale in feet.]
Table 1.4 Subset of Table 1.3 Data for August Events Only
Well_Id  Loc_Type      X_Coord          Y_Coord        Date       Gwelev
MW131    Monitor well  2,548,766.73000  319,202.76570  8/17/2004  19.21000
MW132    Monitor well  2,548,671.72700  319,228.04840  8/17/2004  20.38000
MW133    Monitor well  2,548,677.58100  319,285.61010  8/17/2004  19.23000
MW134    Monitor well  2,548,740.41300  319,287.73680  8/17/2004  18.13000
MW135    Monitor well  2,548,668.82800  319,335.81440  8/17/2004  18.78000
MW136    Monitor well  2,548,747.90500  319,346.29610  8/17/2004  16.71000
MW137    Monitor well  2,548,704.97300  319,107.44400  8/17/2004  21.40000
Problems with data collection can be mitigated with a
data management plan. This plan will typically specify
how data are to be labeled and categorized, the format
that the data are to be stored in, how to handle data
collected over a period of time or over a significant
geographic area, and procedures to account for changes
in the investigation or experiment. The advantage of
having a data management plan is that it allows the users
to collect, label, and record data in a consistent manner.
By doing so, retrieval of the data does not have to take
into account bias on the part of the individual collecting
the data. Consistent terms, units, methods, and
procedures will allow any user to retrieve data
accurately and quickly.
1.5 DATA CATEGORIZATION
1.5.1 Spatial Data
Data found in this book are often of two distinct types:
data with a spatial representation and data with a temporal
representation. Water quality, for example, can be
represented as changing over an area based on land use,
and can also be represented as changing over time due to
urban development changing the drainage pathways.
It is often helpful to have a defined nomenclature
when collecting and categorizing data. This nomenclature
makes it easy to glean basic information from the
raw data and expedites queries from the data
management system. As an example, consider the
following spatial data, shown in Table 1.5.
Figure 1.2 Groundwater elevation data from August only. [Plan-view map, not reproduced: monitoring wells MW131 through MW137 with posted groundwater elevations (ft MSL) and groundwater elevation contours, dashed where inferred; scale in feet.]
Consistent nomenclature for identifying spatial
locations is crucial to maintaining data integrity,
particularly where there is sensitivity to geographic,
political, and physical boundaries. In addition, each
location ID should be unique in order to maintain
referential integrity within the data management system.
The unique value in this table is referred to as a primary
key. The location IDs are coded in such a way as to let the
reader know at a glance what type of location each one is.
Acceptable location types in this example are monitoring
wells (MW) and surface water locations (SW).
The Area of Concern column provides a general
categorization of where the location occurs within the
context of the immediate surroundings. This is useful for
queries that would ask for all data found in a particular
site-specific area.
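Returning to the Location ID coding convention, the idea that the ID itself reveals the location type can be sketched as a small lookup. The function name location_type and the dictionary LOCATION_TYPES are hypothetical; the two identifiers (MW and SW) are the ones used in this example data set, and the sketch assumes the identifier is always the text before the first dash.

```python
# Location type identifiers used in this chapter's example data.
LOCATION_TYPES = {
    "MW": "Monitoring well",
    "SW": "Surface water",
}


def location_type(location_id: str) -> str:
    """Return the location type encoded in the prefix of a Location ID."""
    prefix = location_id.split("-", 1)[0]
    if prefix not in LOCATION_TYPES:
        # Rejecting unknown prefixes keeps the nomenclature consistent.
        raise ValueError(f"Unrecognized location type identifier: {prefix!r}")
    return LOCATION_TYPES[prefix]


print(location_type("MW-01"))
print(location_type("SW-05"))
```

Rejecting an unknown prefix, rather than silently accepting it, is one way a data management plan's nomenclature can be enforced at the point of data entry.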
The example table also presents spatial data in the
form of x, y, and z coordinates. Spatial data for each
location should be collected in a form that is consistent
with use at the site, but that can also reference features
that are located nearby, such as surface water bodies,
wetland areas, or other physiographic features. Use of a
consistent coordinate system will make sure that all the
data can be compared to each other.
Examples of coordinate systems include latitude,
longitude, and height; Universal Transverse Mercator
(UTM); Earth Centered, Earth Fixed Cartesian (ECEF);
and State Plane coordinates. In our example, the
coordinates are State Plane coordinates. In the United
States, the State Plane System was developed in the
1930s and was based on the North American Datum of 1927
(NAD27), with coordinates expressed in feet. A more recent
variation is based on the North American Datum of 1983
(NAD83) and uses the meter.
The State Plane System was developed to provide local
references tied to a national datum. Most USGS 7.5
Minute Quadrangles show several coordinate system grids,
including latitude and longitude, UTM kilometer tic
marks, and applicable State Plane coordinates.
1.6 TEMPORAL DATA
With the spatial locations established, data collected
over time can be recorded and referenced against them.
Consider Table 1.6, which summarizes data from
samples collected from our groundwater and surface
water locations.
Consistent nomenclature for sample identifications is
crucial to maintaining data integrity, and sample
identifications should be unique in order to maintain
referential integrity. Character limits are often in place
in database management systems; therefore, care should
be taken to minimize spaces, dashes, and parentheses. In
our example, the date of collection is captured in
parentheses in the sample ID, which allows each sample to
be unique.
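The sample ID convention used in this example (Location ID followed by the collection date as MMDDYY in parentheses) can be sketched as a small helper. The function name make_sample_id is hypothetical. Note one limitation of the convention as sketched: two samples collected at the same location on the same day would receive the same ID, so a real scheme would need an additional distinguishing element.

```python
from datetime import date


def make_sample_id(location_id: str, collected: date) -> str:
    """Build a sample ID as LocationID(MMDDYY), following Table 1.6."""
    return f"{location_id}({collected:%m%d%y})"


print(make_sample_id("MW-01", date(2001, 11, 13)))  # MW-01(111301)
print(make_sample_id("SW-1", date(2002, 8, 16)))    # SW-1(081602)
```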
By including the Location ID in this table, we
establish a relationship to the previous table. This
relationship allows us to query the data in different
ways. Because the relationship is of one location to many
samples, this is referred to as a one-to-many relationship.
The Location ID in this table is referred to as a foreign
key because it matches primary key values in our spatial
data table presented earlier. Together the primary and
foreign keys create a parent/child relationship, which is
at the heart of relational database systems.
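The parent/child relationship can be sketched with SQLite through Python's standard sqlite3 module. This is an illustrative sketch only: the table and column names are assumptions, the MW-01 rows come from Tables 1.5 and 1.6, and the location MW-99 is a made-up ID used to show what happens when a sample refers to a location that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: one record per location, keyed by location_id.
conn.execute(
    "CREATE TABLE location ("
    " location_id TEXT PRIMARY KEY, area_of_concern TEXT)"
)
# Child table: many samples may point at one location.
conn.execute(
    "CREATE TABLE sample ("
    " sample_id TEXT PRIMARY KEY,"
    " location_id TEXT NOT NULL REFERENCES location(location_id),"
    " sample_date TEXT)"
)

conn.execute("INSERT INTO location VALUES ('MW-01', 'Background area')")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        ("MW-01(111301)", "MW-01", "2001-11-13"),
        ("MW-01(111401)", "MW-01", "2001-11-14"),
        ("MW-01(111501)", "MW-01", "2001-11-15"),
    ],
)

# One location, many samples.
child_count = conn.execute(
    "SELECT COUNT(*) FROM sample WHERE location_id = 'MW-01'"
).fetchone()[0]

# A sample whose location is unknown violates referential integrity.
try:
    conn.execute(
        "INSERT INTO sample VALUES ('MW-99(111301)', 'MW-99', '2001-11-13')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(child_count, orphan_rejected)
```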
The Sample Type column provides an identifier to
discriminate individual samples from each other based
on quality assurance needs. In this example, all of the
samples have a normal (N) type. If duplicate samples
were required for quality assurance purposes to check on
the validity of the data set, one could identify the sample
type as D for duplicate.
Location Type Identifier  Location Type Description
MW                        Monitoring well
SW                        Surface water
Table 1.5 Spatial Data for Groundwater and Surface Water Locations
Location ID  Area of Concern   x            y              z
SW-01        Upstream          175,470.994  1,635,550.124  100.203
SW-02        Upstream          177,126.487  1,635,925.814  100.102
SW-03        Outfall 1         177,047.029  1,635,676.853  100.00
SW-04        Downstream        176,871.093  1,635,674.137  98.97
SW-05        Downstream        175,790.418  1,635,597.208  97.96
MW-01        Background area   174,345.077  1,632,431.087  96.597
MW-02        Oil storage area  174,251.127  1,632,466.059  97.384
MW-03        Oil storage area  174,690.942  1,631,435.707  97.384
Table 1.6 Sample Information from Groundwater and Surface Water Locations
Sample ID      Location ID  Sample Type  Sample Matrix  Sample Date  Sample Time
SW-1(081602)   SW-1         N            WS             8/16/2002    13:00
SW-2(081602)   SW-2         N            WS             8/16/2002    13:10
SW-3(081602)   SW-3         N            WS             8/16/2002    13:20
SW-4(081602)   SW-4         N            WS             8/16/2002    13:30
SW-5(081602)   SW-5         N            WS             8/16/2002    14:31
MW-01(111301)  MW-01        N            WG             11/13/2001   15:22
MW-01(111401)  MW-01        N            WG             11/14/2001   16:31
MW-01(111501)  MW-01        N            WG             11/15/2001   11:00
The Sample Matrix column identifies the medium collected
for each sample through an abbreviated two-character
code. Examples of these matrix codes include WG (ground
water) and WS (surface water); the full list appears in the
sample matrix table later in this chapter.
The sample date and time columns allow the data to
be sorted from most recent to oldest and can provide a
context for how the data change over time.
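Sorting by collection date and time is only reliable when the values are treated as dates rather than as text; sorted as text, "11/13/2001" would come before "8/16/2002" simply because "1" precedes "8". The sketch below parses a few rows from Table 1.6 with Python's standard datetime module and orders them from most recent to oldest. The helper name collected_at is hypothetical.

```python
from datetime import datetime

# (sample_id, sample_date, sample_time) copied from Table 1.6.
samples = [
    ("SW-1(081602)", "8/16/2002", "13:00"),
    ("MW-01(111301)", "11/13/2001", "15:22"),
    ("SW-5(081602)", "8/16/2002", "14:31"),
    ("MW-01(111501)", "11/15/2001", "11:00"),
]


def collected_at(sample):
    """Combine the date and time fields into one comparable datetime."""
    _sample_id, sample_date, sample_time = sample
    return datetime.strptime(f"{sample_date} {sample_time}", "%m/%d/%Y %H:%M")


# Most recent sample first.
newest_first = sorted(samples, key=collected_at, reverse=True)
for sample_id, _d, _t in newest_first:
    print(sample_id)
```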
Continuing with our groundwater contamination
example, consider Table 1.7, which summarizes the data
obtained from an analytical laboratory.
The Sample ID, the primary key of the sample table, is
present again to establish the relationship back to the
other tables. The other columns hold information about
the analyses performed on the samples and the results.
Each of these records represents a concentration of a
given chemical at a given location for a specific point in
time. As chemical concentrations, detection limits, and
detections change, those data points would be represented
as new records in the database. This can be illustrated in
the following query result, shown in Table 1.8.
1.7 DATA VALIDATION AND VERIFICATION
Once data are collected into the database, steps must be
taken to ensure that they were collected accurately and are
representative of the source. Data verification checks the
compliance of the collected data against known
requirements of the experiment or investigation.
For example, using the data set shown above, we
would verify with the analytical laboratory that the
groundwater samples were analyzed according to
specific methods, such as the use of calibration
samples for laboratory equipment. If a verification check
fails, then the data may be considered suspect.
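Part of a verification check can be automated once the requirements are written down. The sketch below is a minimal illustration, not a substitute for confirming laboratory procedures such as calibration: the requirements (lab method SW8260, unit mg/L, and a reported result whenever a detection is flagged) are assumptions chosen to match Table 1.7, and the function name verify and the "suspect" record are hypothetical.

```python
REQUIRED_METHOD = "SW8260"   # assumed project requirement (matches Table 1.7)
ALLOWED_UNITS = {"mg/L"}     # assumed reporting unit


def verify(record: dict) -> list:
    """Return the list of problems found in one analytical record."""
    problems = []
    if record["lab_method"] != REQUIRED_METHOD:
        problems.append("unexpected lab method")
    if record["unit"] not in ALLOWED_UNITS:
        problems.append("unexpected unit")
    if record["detect"] == "Y" and record["result"] is None:
        problems.append("detection flagged but no result reported")
    return problems


# A record taken from Table 1.7.
good = {
    "sample_id": "SW-1(081602)", "lab_method": "SW8260",
    "chemical": "Carbon tetrachloride", "result": 670,
    "detect": "Y", "unit": "mg/L",
}
# A made-up record that breaks two of the assumed requirements.
suspect = dict(good, lab_method="UNKNOWN", result=None)

print(verify(good))
print(verify(suspect))
```

A record that returns a non-empty list would be treated as suspect and followed up with the laboratory.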
Data validation, by contrast, addresses the suitability
of the data. It must take into account how
the data were collected, how the data were analyzed, and
finally, based on the results of the review of the
collection and analysis processes, how the data should
be used. If the collection process is found to be flawed,
the data might be discarded or used for qualitative
purposes only. The United States Environmental Protection
Agency has provided guidance documents on
validating data, which can be found at www.epa.gov/
superfund/programs/clp/guidance.htm.
1.8 DATA REPORTING
Once the data have been collected into the data
management system and validated, it is time to retrieve
the data and draw some conclusions about their meaning.
Data retrieval usually takes the form of querying the data
and then reporting the data.
1.8.1 Querying Data
A query is a programmatic statement that asks a question
about the data. Each query usually specifies a criterion,
which is a condition or test that must be met in order for a
given record to be selected. Queries produce subsets of
data, which are the records that match the conditions of
the query. For example, a query might request the
locations where a certain chemical, e.g., carbon
tetrachloride, exceeds a certain concentration, e.g., 500 mg/L.
This query would produce a result like the one presented
in Table 1.9.
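A query of this kind can be sketched in SQL, here run against an in-memory SQLite database from Python. The table name result and its column names are assumptions, the rows are taken from Table 1.7, and the blank Result entries for non-detected chemicals are assumed to be stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result ("
    " sample_id TEXT, chemical TEXT, result REAL,"
    " rdl REAL, detect TEXT, unit TEXT)"
)
# Rows from Table 1.7; None stands for a blank Result (not detected).
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("SW-1(081602)", "Carbon disulfide", None, 2.5, "N", "mg/L"),
        ("SW-1(081602)", "Xylene (total)", None, 2.5, "N", "mg/L"),
        ("SW-1(081602)", "Ethylbenzene", None, 0.2, "N", "mg/L"),
        ("SW-1(081602)", "Carbon tetrachloride", 670, 25, "Y", "mg/L"),
    ],
)

# Criteria: a given chemical AND a concentration above a threshold.
selected = conn.execute(
    "SELECT sample_id, chemical, result FROM result"
    " WHERE chemical = ? AND result > ?",
    ("Carbon tetrachloride", 500),
).fetchall()

print(selected)
```

Passing the chemical name and threshold as parameters, rather than writing them into the statement, lets the same query be reused for other chemicals and limits.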
Sample Matrix  Matrix Description
WB             Water collected from borehole or during geoprobe investigation
WE             Estuary water
WG             Ground water
WL             Leachate
WO             Ocean water
WP             Drinking water
WQ             Water quality control matrix
WS             Surface water
WW             Waste water
Table 1.7 Analytical Data Summary
Sample ID     Matrix  SDG     Lab Method  Chemical              Result  RDL  Detect  Unit
SW-1(081602)  WG      884825  SW8260      Carbon disulfide              2.5  N       mg/L
SW-1(081602)  WG      884825  SW8260      Xylene (total)                2.5  N       mg/L
SW-1(081602)  WG      884825  SW8260      Ethylbenzene                  0.2  N       mg/L
SW-1(081602)  WG      884825  SW8260      Carbon tetrachloride  670     25   Y       mg/L
Table 1.8 Additional Analytical Data
Sample ID     Matrix  SDG     Sample Date  Lab Method  Chemical              Result  RDL  Detect  Unit
SW-1(081602)  WG      884825  8/16/2002    SW8260      Carbon tetrachloride  670     25   Y       mg/L
SW-1(111602)  WG      884825  11/16/2002   SW8260      Carbon tetrachloride  340     25   Y       mg/L
SW-1(011603)  WG      884825  01/16/2003   SW8260      Carbon tetrachloride  100     5    Y       mg/L
SW-1(031603)  WG      884825  03/16/2003   SW8260      Carbon tetrachloride          5    N       mg/L
This type of query is referred to as a selection query; it
returns the selection based on the criteria. Other types
of queries, available in Microsoft Access for example,
include action queries and crosstab queries. Action
queries change record information by specifying
criteria and changing the values in given fields based
on those criteria. An example of an action query would
be the application of data qualifiers following a
rigorous validation of the collected data. For example,
in our surface water example, the data validation
process might uncover that the analytical laboratory
initially gave the reported detection limit (RDL) of carbon
tetrachloride as 25 mg/L when in fact it should have been
100 mg/L. An action query could be used to specify the
criteria (Sample SW-1(081602) and carbon tetrachloride)
and correct the RDL
from 25 to 100.
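In SQL terms, the action query described above corresponds to an UPDATE statement. The sketch below uses an in-memory SQLite database; the table name result and its column names are assumptions, and only two rows from Table 1.7 are loaded to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result (sample_id TEXT, chemical TEXT, result REAL, rdl REAL)"
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [
        ("SW-1(081602)", "Carbon tetrachloride", 670, 25),
        ("SW-1(081602)", "Ethylbenzene", None, 0.2),
    ],
)

# Action query: the WHERE clause holds the criteria, SET holds the change.
cursor = conn.execute(
    "UPDATE result SET rdl = ? WHERE sample_id = ? AND chemical = ?",
    (100, "SW-1(081602)", "Carbon tetrachloride"),
)
conn.commit()
rows_changed = cursor.rowcount

corrected_rdl = conn.execute(
    "SELECT rdl FROM result WHERE chemical = 'Carbon tetrachloride'"
).fetchone()[0]
untouched_rdl = conn.execute(
    "SELECT rdl FROM result WHERE chemical = 'Ethylbenzene'"
).fetchone()[0]

print(rows_changed, corrected_rdl, untouched_rdl)
```

Checking the number of rows changed is a cheap safeguard: an action query whose criteria are too loose will silently alter more records than intended.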
Crosstab queries perform aggregate calculations on
the value of a field, using one or more other fields as rows
and one field's data as columns. For example, consider
the data set shown in Table 1.10.
An analysis of these data might help refine a remedial
course of action for impacted groundwater. Useful
calculations to perform would be the average and the
maximum values of each of the constituents. A crosstab
query of these data would generate the
results shown in Table 1.11.
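The aggregation behind a crosstab query can be sketched in plain Python: group the Table 1.10 records by Area of Concern, then compute the average and maximum of each constituent within each group. The variable names are hypothetical; the input values are those of Table 1.10, and the computed figures can be compared against Table 1.11.

```python
from collections import defaultdict
from statistics import mean

ANALYTES = ("nitrate", "iron", "sulfate")

# (loc_id, area_of_concern, nitrate, iron, sulfate) from Table 1.10.
rows = [
    ("SW-01", "Upstream", 200, 599, 100.203),
    ("SW-02", "Upstream", 250, 342, 100.102),
    ("SW-03", "Background area", 300, 105, 100.0),
    ("SW-04", "Downstream", 100, 20, 98.97),
    ("SW-05", "Downstream", 50, 40, 97.96),
    ("MW-01", "Background area", 75, 50, 96.597),
    ("MW-02", "Oil storage area", 60, 65, 97.384),
    ("MW-03", "Oil storage area", 35, 10, 97.384),
]

# Group the measurements by Area of Concern (the crosstab columns).
by_area = defaultdict(list)
for _loc_id, area, *values in rows:
    by_area[area].append(values)

# For each area, aggregate each constituent (the crosstab rows).
crosstab = {}
for area, measurements in by_area.items():
    columns = dict(zip(ANALYTES, zip(*measurements)))
    crosstab[area] = {
        name: {"average": mean(col), "max": max(col)}
        for name, col in columns.items()
    }

for area in sorted(crosstab):
    print(area, crosstab[area]["nitrate"], crosstab[area]["iron"])
```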
1.8.2 Reporting Data
Once data have been retrieved through a query process,
they can be reported in tabular format, like the tables
found in this chapter and elsewhere in this book;
graphically, in the form of charts and graphs; or, if there
is a spatial representation to the data, as a figure in a GIS.
1.9 METADATA
The formal definition of metadata is simply "data about
data." Metadata is the information about a data source.
For example, a book contains information, but there is
also information about that book, such as the author and
publisher; this is the metadata.
Metadata in the context of this book can be used to
describe how a particular data table was assembled, who
collected the data, what method was used to collect and
aggregate the data, and the sources of the data. In our
groundwater contamination example, metadata might
include who did the surveying for the locations, the date
of that survey, and the coordinate system specification.
For the sample data, the metadata might consist of the
following components, shown in Table 1.12.
Table 1.10 Data Summary for Crosstab Query
Loc ID  AoC               Nitrate  Iron  Sulfate Concentration
SW-01   Upstream          200      599   100.203
SW-02   Upstream          250      342   100.102
SW-03   Background area   300      105   100
SW-04   Downstream        100      20    98.97
SW-05   Downstream        50       40    97.96
MW-01   Background area   75       50    96.597
MW-02   Oil storage area  60       65    97.384
MW-03   Oil storage area  35       10    97.384
Table 1.11 Crosstab Query Results
                                  Area of Concern
Data                              Background Area  Downstream  Oil Storage Area  Upstream
Average of nitrate                187.5            75          47.5              225
Average of iron                   77.5             30          37.5              470.5
Average of sulfate concentration  98.2985          98.465      97.384            100.1525
Max of nitrate                    300              100         60                250
Max of iron                       105              40          65                599
Max of sulfate concentration      100              98.97       97.384            100.203
Table 1.9 Query Results for Carbon Tetrachloride Concentrations over 500 mg/L
Sample ID     Matrix  SDG     Lab Method  Chemical              Result  RDL  Detect  Unit
SW-1(081602)  WG      884825  SW8260      Carbon tetrachloride  670     25   Y       mg/L
Table 1.12 Examples of Metadata Components
Field Name           Field Description
COC num              Chain of custody identifier
Sent_to_lab_date     Date sample was sent to lab
Sample_receipt_date  Date that sample was received at laboratory
Sampler              Name or initials of sampler
Sampling_company     Name or initials of sampling company
Sampling_reason      Reason for sampling
Sampling_technique   Sampling technique
Task_code            Code used to identify the task under which sample was retrieved
Collection_quarter   Quarter of the year sample was collected (e.g., "1Q96")
Composite_yn         Boolean field used to indicate whether a sample is a composite sample
Composite_desc       Description of composite sample (if composite_yn is YES)
Within the context of GISs, metadata almost always
refers to data about digital geospatial data. Throughout
this discussion, the term metadata is used under this
restricted definition to refer to the content, quality,
condition, and other characteristics of digital geospatial
data.
Metadata for electronic images should include, as a
minimum:
• How the image was created
• Who created it originally
• What has been done to enhance the image
• Coordinate system to which the image has been rectified
• Projection system to which the image has been rectified
• Other information unique to these particular images
With regard to electronic spatial data, in the United
States, efforts are being made to develop a
universal standard format for metadata for GISs.
This would make accessing and using metadata much
easier than it is today. Toward this end, the Federal
Geographic Data Committee (FGDC) has approved a
standard for metadata. Development of the standard was
a part of the development of the National Spatial
Data Infrastructure.
The standard is known as the Content Standard
for Digital Geospatial Metadata (Version 2). It may
be downloaded as a PDF file from the FGDC web
site (www.fgdc.gov/standards/documents/standards/
metadata/v2_0698.pdf).
While this standard is intended to facilitate the use of
metadata and associated data (particularly images), it
appears not to have been universally accepted outside of
U.S. government bodies. Accordingly, within this
definition metadata may take many forms, ranging
from simple (on the paper map, the map legend is the
metadata) to separate electronic text files, often
multipage, associated with a specific electronic aerial or
satellite photo image.
The primary use of metadata is to correlate spatial
image information, particularly aerial photos, satellite
photos, and computer-aided design (CAD) images, in
three-dimensional space, with tabular data obtained in
the field. There are, thus, two aspects to this process.
The rst aspect is the creation of an electronic base
map image on which tabular data may be electronically
posted. Typically, the user will need to overlay images
together (a CAD image superimposed on an aerial photo,
for example) or combine smaller images together to
create larger maps, or some combination of both. In this
process, it is often possible to obtain access to additional
tabular data already associated with (i.e. linked to, or
posted on) images.
The second aspect is creating or obtaining the
tabular data and orienting them in two- or
three-dimensional space. Tabular data are usually oriented
in three-dimensional space according to surveys taken
by hand on the ground, using either traditional
surveying methods (compass, transit, etc.) or Global
Positioning System (GPS) data. Because of the availability
of extremely inexpensive receivers, GPS is rapidly
becoming the dominant tool to obtain two- and
three-dimensional positioning data in the field.
Typically the orientation metadata associated with
the spatial data must be used to convert (reproject or
recoordinate) the spatial data to conform to the
orientation data associated with the tabular data, or
vice versa, or both. This process of correlation is called
rectification. Metadata are thus typically used to rectify
the spatial image data to the tabular data.
1.10 CONCLUSIONS
The data management process is intended to reduce the
amount of time spent on manipulating data and increase
the level of utility of the data to the end users. The
goals of the data management process should be as
follows:
• Understand your data needs and how you are collecting your data;
• Have a plan to accurately categorize all facets of your data;
• Promote accuracy of data through validation and verification;
• Promote consistency of data querying and reporting; and
• Understand the role of metadata in GIS.
The data presented in this book are categorized in a
way to facilitate effective data management and, through
querying and reporting, to provide a verifiable and
validated source of data for experimentation and
investigation purposes.