
Time Is On My Side

If you do much data analysis it won’t be long before you work with data measured over a range
of times. When you do see time-series data, you’ll find that time scales and time units have some
very quirky properties.
Time after Time
You might think that time is measured on a ratio scale given its ever finer divisions (i.e., hours,
minutes, seconds). Yet it doesn’t make sense to refer to a ratio of two times any more than the
ratio of two location coordinates. The starting point is also arbitrary. So time clearly isn’t
measured on a ratio scale, but it can be measured on interval or ordinal scales. Time units are also
used for durations. Unlike points in time, however, durations can be measured on a ratio scale: they
can be used in ratios and they have a true starting point of zero.
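The difference between times and durations shows up directly in software. Here is a minimal sketch using Python's standard datetime module; the dates are arbitrary examples, not data from this post. Dividing one duration by another works, while dividing one point in time by another is simply not defined.

```python
from datetime import datetime, timedelta

# Three points in time (interval scale); the dates are arbitrary examples.
start = datetime(1953, 1, 1)
middle = datetime(1953, 1, 11)
end = datetime(1953, 1, 31)

# Subtracting two points in time gives a duration (ratio scale).
shorter = middle - start   # 10 days
longer = end - start       # 30 days

# A ratio of two durations is meaningful: timedelta / timedelta is a float.
duration_ratio = longer / shorter

# A ratio of two points in time is not: datetime defines no division.
try:
    start / end
    times_have_ratio = True
except TypeError:
    times_have_ratio = False

# Durations also have a true zero.
zero_duration = middle - middle

print(duration_ratio, times_have_ratio, zero_duration == timedelta(0))
```

The thirty-day duration is three times the ten-day one, but there is no sensible answer to "how many times later is January 31 than January 1?"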
Time measurements can be linear or cyclic. Year is linear, and
can be measured on either an interval scale or an ordinal scale.
For example, the year 1953 can be expressed as an integer
(ordinal scale) or a decimal (interval scale). Furthermore, all
values of linear time are unique. The year 1953 happened once
and will never recur. Linear time is like a river. You start at
some point and go with the flow. You can’t get back to your
starting point, but it still exists somewhere in time.
“Katmandu, I’ll soon be seeing you
and your strange bewildering time
will keep me home.” (Cat Stevens)

Some time scales repeat. If day one is a Monday, then so is day eight. Likewise, month one is the
same as month thirteen. So time can also be treated as being measured on a repeating
ordinal scale. Durations don’t repeat; one day isn’t the same as eight days.
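A repeating scale is just modular arithmetic. The sketch below assumes, as in the example above, that day one is a Monday and month one is January; the function names are made up for illustration.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def weekday_name(day_number):
    """Map a running day count to a weekday, assuming day 1 is a Monday."""
    return DAYS[(day_number - 1) % 7]

def month_name(month_number):
    """Map a running month count to a month, assuming month 1 is January."""
    return MONTHS[(month_number - 1) % 12]

print(weekday_name(1), weekday_name(8))   # same position in the weekly cycle
print(month_name(1), month_name(13))      # same position in the yearly cycle
```

The running counts (1, 8, 13) are linear and never repeat; only their positions within the cycle do.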
Does Anybody Really Know What Time It Is?
Most measurement scales are based on factors of ten. With time, though, there are 60 seconds
per minute, 60 minutes per hour, and 24 hours per day. Blame the Babylonians for starting this
craziness and every civilization for the next 4,000 years for being content with the status quo. In
contrast, calendars have evolved from the Hellenic calendar (~850 BC), the Roman calendar
(~750 BC), the Julian calendar (46 BC), to the Gregorian calendar (1582).
Everybody knows about seconds, minutes, hours, days, months, years, and even decades,
centuries, and millennia, but there are many other units used for time. A jiffy is either one tick of
a computer’s system clock (about 0.01 second) or the time required for light to travel one
centimeter (about 33.3564 picoseconds). A New York second is the time between when a traffic
signal turns from red to green and when the driver behind you honks his horn, about a second
and a half. An inna minute is the time between when you ask a teenager to do something and the
time he or she complies, usually about ten to thirty minutes. A warhol is being famous for fifteen
minutes; a kilowarhol is being famous for approximately ten days. A moment is a medieval unit
of time equal to about a minute and a half. A fortnight is two weeks. A platonic year is an
astronomical unit measuring the time required for planets to align (about 26,000 calendar years).
There have been several systems in which time units were based on factors of ten, most notably
by the Chinese (before the 17th century) and in France (during the 18th century). Decimal time
divided a day (i.e., one rotation of the earth) into 10 metric hours, each hour into 100 metric
minutes, and each minute into 100 metric seconds, sometimes termed a blink. A blink is 0.864
standard seconds, which is about twice the time it takes for you to blink your eye (from
www.neatorama.com/2009/01/30/fun-and-unusual-units-of-measurements/).
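The arithmetic behind the blink is easy to check. This is a rough sketch of the conversion using only the 10/100/100 division described above; the function name is hypothetical, and fractions of a decimal second are simply truncated.

```python
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400 standard seconds
DECIMAL_SECONDS_PER_DAY = 10 * 100 * 100   # 100,000 decimal seconds (blinks)

# Length of one decimal second, in standard seconds.
blink = SECONDS_PER_DAY / DECIMAL_SECONDS_PER_DAY

def to_decimal_time(hour, minute, second):
    """Convert a standard clock time to (decimal hour, minute, second)."""
    standard_seconds = hour * 3600 + minute * 60 + second
    # Integer arithmetic; any fraction of a decimal second is dropped.
    decimal_seconds = standard_seconds * DECIMAL_SECONDS_PER_DAY // SECONDS_PER_DAY
    d_hour, remainder = divmod(decimal_seconds, 100 * 100)
    d_minute, d_second = divmod(remainder, 100)
    return d_hour, d_minute, d_second

print(blink)
print(to_decimal_time(12, 0, 0))   # noon
print(to_decimal_time(18, 0, 0))   # 6 p.m.
```

Noon falls at 5 decimal hours, and 6 p.m. at 7 decimal hours and 50 decimal minutes, three quarters of the way through the day.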
Then there’s geologic time, which is subdivided into eons, eras, periods, epochs, and ages. The
divisions are based on the rocks that were formed at the time and the fossils that occur within
them. Consequently, the divisions aren’t all the same length and there aren’t the same number of
subdivisions in each division. For example, the Paleozoic era is about one and a half times as long
as the Mesozoic era, and more than four times as long as the Cenozoic era (which admittedly is still
in progress). Likewise, some periods are four times longer than others. Moreover, the lengths of the
divisions can change as more is learned about the history of the Earth. The units of the scale are
also different in different parts of the world. Geologic time is an ordinal scale devised because
measurements on the interval scale on which it is based (i.e., years) lack accuracy and precision.
Astronomical time is confusing, relatively, and it’s different if you’re on board the Enterprise or
the Galactica. So the point is this—measuring time is complicated, not to mention time-
consuming. But there’s even more to it than that.

Time Of The Season


Selecting an appropriate time scale is especially important because the scale can dictate the
resolution and types of analyses that can be done. Resolution is an important matter. Select an
interval that is too small and your database may become unmanageably large. Select an interval
that is too large and you may not have enough resolution to investigate the time unit you are
interested in. A good rule-of-thumb is to select an interval that is at least one time unit smaller
than your unit of interest. For example, if you are interested in yearly trends, collect
measurements every month. If you only collect measurements yearly, you won’t be able to assess
the variability that occurs within a year. If you collect measurements more often than daily, you
may have to roll up the data (for example, into daily averages) to make it manageable.
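Rolling up is nothing more than grouping measurements by a coarser time unit and summarizing each group. Here is a minimal sketch with made-up hourly readings averaged by calendar day; the function name and the data are hypothetical, and a mean is only one of several reasonable summaries.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

def roll_up_daily(readings):
    """Average (timestamp, value) readings by calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}

# Made-up hourly readings for two days: each value is just the hour of day.
start = datetime(2010, 8, 15)
hourly = [(start + timedelta(hours=h), float(h % 24)) for h in range(48)]

daily = roll_up_daily(hourly)
for day, average in daily.items():
    print(day, average)
```

Forty-eight hourly values become two daily values. The database is smaller, but the within-day variability is gone, which is exactly the tradeoff described above.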

Take Your Time


Time formats can be difficult to deal with. Most data analysis software offers a dozen or more
different formats for what you see. Behind the spreadsheet format, though, the database stores a
number, which is the distance of the time from an arbitrary starting point, in an arbitrary unit of
time, almost always days. Convert a date-time format to a number format, and you’ll see what I
mean. The software formatting allows you to recognize values as times while the numbers allow
the software to calculate statistics. This quirk of time formatting also presents a potential for
disaster if you use more than one piece of software and the applications use different starting
points or time units. Always check that the formatted dates are the same between applications.
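To see how much the starting point matters, here is a small sketch that counts days to the same date from two different origins. The two origins are assumptions about common conventions (many spreadsheets effectively count from December 30, 1899 for modern dates, and Unix-style systems count from January 1, 1970); check the documentation for whatever software you actually use.

```python
from datetime import date

# Assumed starting points; verify against your own software's documentation.
SPREADSHEET_ORIGIN = date(1899, 12, 30)   # common spreadsheet convention
UNIX_ORIGIN = date(1970, 1, 1)            # Unix-style convention

def days_since(origin, day):
    """Number of whole days from the origin to the given date."""
    return (day - origin).days

some_day = date(2010, 8, 15)
spreadsheet_number = days_since(SPREADSHEET_ORIGIN, some_day)
unix_number = days_since(UNIX_ORIGIN, some_day)
offset = spreadsheet_number - unix_number

print(spreadsheet_number, unix_number, offset)
```

The same day is 40405 under one convention and 14836 under the other, a gap of 25,569 days, or about seventy years. Paste raw numbers from one application into another without checking, and every date in your dataset shifts by that much.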

Time Will Tell


Time-series data are probably the most difficult type of data to analyze. Measurements involving
time are usually autocorrelated, so using conventional statistical procedures can produce biased
results. Besides their scale of measurement, there are several other aspects of temporal variables
that add to the confusion.
Ch-Ch-Ch-Ch-Changes—Time-series data can exhibit a variety of patterns, including
step changes, linear and nonlinear trends, and cyclic fluctuations. The effects may be
superimposed on each other within a given time period or spread over many different
time periods. For example, a change in the discharge of a river may be attributable to
abrupt and ephemeral causes such as failure of a dam or a sudden downpour (shocks),
abrupt and long-term causes such as natural changes in a drainage way or a man-made
diversion (step changes), long-term causes such as drought or changes in water
consumption (trends), repetitive changes such as seasonal cycles related to rainfall or
irrigation (cyclic fluctuations) as well as random variations. Confounded effects are often
impossible to separate, especially if the data record is short or the sampled intervals are
irregular or too large.
One Day at a Time—Time-series measurements may not all be collected at a single
instant in time. Some measurements are composites over time. For example, a flow
measurement (e.g., stream, air) may be an instantaneous discharge or a total discharge
over a selected time period. A sample may be collected at one time or be a composite of
several samples collected at discrete time intervals and combined into a single sample
container. The period over which each measurement is averaged is called the support.
Obviously, you can’t evaluate a given time interval if your support is the same or larger
than the interval.
For the Times They Are a Changing—There is a dilemma involving time-series that are
measured over many years. It goes like this. As knowledge and technology improve,
sampling and analysis procedures tend to improve as well,
which reduces the overall variability of more recent measurements. That leads to
violations of one of the fundamental assumptions of parametric statistical procedures,
equality of variances (also called homoscedasticity). Sometimes, you just can’t win.
In the Year 2525 … —With most types of analysis, both statistical and deterministic, data
analysts collect data over the entire range of the area of interest. If you want to analyze a
chemical reaction at 100 degrees, you might analyze the reaction at temperatures between
80 degrees and 120 degrees. You wouldn’t, however, test the reaction at 40 to 80 degrees
and extrapolate to what might happen at 100 degrees. In fact, scientists are taught never
to extrapolate outside the range of their data. With time-series data, though, you have to
extrapolate because you almost always want to know what will happen in the future. If
you wait to see what actually happens, then it’s no longer interesting because it’s the past.
And in the ultimate of ironies, you often can extrapolate time-series data because they are
… autocorrelated. So the same property that makes time-series data difficult to analyze is
what allows them to be extrapolated to future times, a process called forecasting. Mother
Nature has a wicked sense of humor.
Time Keeps on Slipping into the Future—With other types of data, even autocorrelated
spatial data, you can verify predictions whenever the need arises. With predictions for a
time series (forecasts), you have to wait until the time in question arrives. Then you have
just one chance. You can’t go back if something goes wrong and you miss collecting the
verification data. Hence, you can’t control verification.
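Autocorrelation, the property that causes so much trouble in the points above, can be sketched in a few lines. The example below estimates the lag-1 autocorrelation of a made-up series and uses it for a very simple first-order autoregressive forecast. The data, the function names, and the choice of model are all assumptions for illustration; a real analysis would use a proper time-series package and would deal with trends and cycles first.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Correlation between each value and the value one step before it."""
    m = mean(series)
    numerator = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    denominator = sum((x - m) ** 2 for x in series)
    return numerator / denominator

def ar1_forecast(series, steps):
    """Simple AR(1)-style forecast: the last deviation from the mean
    shrinks by the lag-1 autocorrelation at every step ahead."""
    m = mean(series)
    r = lag1_autocorrelation(series)
    last_deviation = series[-1] - m
    return [m + last_deviation * r ** k for k in range(1, steps + 1)]

# Made-up measurements in which neighboring values resemble each other.
series = [10.0, 10.5, 11.2, 11.0, 11.8, 12.5, 12.1, 12.9, 13.4, 13.0]

r = lag1_autocorrelation(series)
forecast = ar1_forecast(series, 3)
print(round(r, 3), [round(value, 2) for value in forecast])
```

Because each value carries information about the next one, the last measurement says something about the near future. The forecasts drift back toward the mean as that information runs out, and the only way to verify them is to wait.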
So those are a few points about how time is measured and analyzed. There’s much more to it
than that, but I’ll save those thoughts for another time.

Join the Stats with Cats group on Facebook.

http://statswithcats.wordpress.com/2010/08/15/time-is-on-my-side/
