Statistical Techniques for Transportation Engineering
About this ebook
Statistical Techniques for Transportation Engineering is written with a systematic approach in mind and covers a full range of data analysis topics, from the introductory level (basic probability, measures of dispersion, random variables, discrete and continuous distributions), through commonly used techniques (common statistical distributions, hypothesis testing), to advanced analysis and statistical modeling techniques (regression, ANOVA, and time series). The book also provides worked-out examples and solved problems for a wide variety of transportation engineering challenges.
- Demonstrates how to effectively interpret, summarize, and report transportation data using appropriate statistical descriptors
- Teaches how to identify and apply appropriate analysis methods for transportation data
- Explains how to evaluate transportation proposals and schemes with statistical rigor
Kumar Molugaram
Professor and Head of Civil Engineering and Director of Infrastructure, Osmania University, Hyderabad, Telangana State. He has published over 55 research papers in various national and international journals and conference proceedings.
Statistical Techniques for Transportation Engineering
Kumar Molugaram
Professor, Department of Civil Engineering, University College of Engineering, Osmania University, Hyderabad, Telangana, India
G. Shanker Rao
Formerly Head, Department of Mathematics, Govt. Girraj P.G. College, and Asst. Professor (C), University College of Engineering (A), Osmania University, Hyderabad
Table of Contents
Cover image
Title page
Copyright
Preface
Chapter 1. An Overview of Statistical Applications
Abstract
1.1 Introduction
1.2 Probability Functions and Statistics
1.3 Applications of Normal Distribution
1.4 Confidence Bounds
1.5 Determination of Sample Size
1.6 Random Variables Summation
1.7 The Binomial Distributions
1.8 The Poisson Distribution
1.9 Testing of Hypothesis
1.10 Summary
Chapter 2. Preliminaries
Abstract
2.1 Introduction
2.2 Basic Concepts
2.3 Tabulation of Data
2.4 Frequency Distribution
2.5 Cumulative Frequency Table
2.6 Measures of Central Tendency
2.7 Arithmetic Mean
2.8 Median
2.9 Mode
2.10 Geometric Mean
2.11 Harmonic Mean
2.12 Partition Values (Quartiles, Deciles, and Percentiles)
2.13 Measures of Dispersion
2.14 Range
2.15 Interquartile Range
2.16 Quartile Deviation
2.17 Mean Deviation
2.18 Standard Deviation
Chapter 3. Probability
Abstract
3.1 Introduction
3.2 Classical Probability
3.3 Relative Frequency Approach of Probability
3.4 Symbolic Notation
3.5 Axiomatic Theory of Probability
3.6 Independent and Dependent Events
3.7 Conditional Probability
3.8 Multiplication Theorem on Probability
3.9 Bayes’ Theorem
Chapter 4. Random Variables
Abstract
4.1 Introduction
4.2 Discrete Random Variable
4.3 Probability Distribution for a Discrete Random Variable
4.4 Mean and Variance of a Discrete Distribution
4.5 Continuous Random Variable
4.6 Probability Density Function
4.7 Cumulative Distribution Function
4.8 Mean and Variance of a Continuous Random Variable
4.9 Joint Distributions
4.10 Conditional Probability Distribution
4.11 Independent Random Variables
4.12 Joint Probability Function of Continuous Random Variables
4.13 Joint Probability Distribution Function of Continuous Random Variables
4.14 Marginal Distribution Function
4.15 Conditional Probability Density Functions
4.16 Mathematical Expectation and Moments
4.17 Moments
4.18 Moment Generating Function
4.19 Properties of Moment Generating Function
4.20 Discrete Probability Distributions
4.21 Poisson Distribution
4.22 Discrete Uniform Distribution
4.23 The Negative Binomial and Geometric Distribution
4.24 Geometric Distribution
4.25 Continuous Probability Distributions
4.26 Normal Distribution
4.27 Characteristic Function
4.28 Gamma Distribution
4.29 Beta Distribution of First Kind
4.30 Weibull Distribution
Chapter 5. Curve Fitting
Abstract
5.1 Introduction
5.2 The Method of Least Squares
5.3 The Least-Squares Line
5.4 Fitting a Parabola by the Method of Least Squares
5.5 Fitting the Exponential Curve of the Form y = a e^(bx)
Chapter 6. Correlation and Regression
Abstract
6.1 Introduction
6.2 Correlation
6.3 Coefficient of Correlation
6.4 Methods of Finding Coefficient of Correlation
6.5 Scatter Diagram
6.6 Direct Method
6.7 Spearman’s Rank Correlation Coefficient
6.8 Calculation of r (Correlation Coefficient) (Karl Pearson’s Formula)
6.9 Regression
6.10 Regression Equation
6.11 Curve of Regression
6.12 Types of Regression
6.13 Regression Equations (Linear Fit)
6.14 Angle between Two Lines of Regression
6.15 Coefficient of Determination
6.16 Coefficient of Nondetermination
6.17 Coefficient of Alienation
6.18 Multilinear Regression
6.19 Uses of Regression Analysis
Chapter 7. Sampling
Abstract
7.1 Introduction
7.2 Population
7.3 Sample
7.4 Sampling
7.5 Random Sampling
7.6 Simple Random Sampling
7.7 Stratified Sampling
7.8 Systematic Sampling
7.9 Sample Size Determination
7.10 Sampling Distribution
Chapter 8. Hypothesis Testing
Abstract
8.1 Introduction
8.2 Hypothesis
8.3 Hypothesis Testing
8.4 Types of Hypothesis
8.5 Computation of Test Statistic
8.6 Level of Significance
8.7 Critical Region
8.8 One-Tailed Test and Two-Tailed Test
8.9 Errors
8.10 Procedure for Hypothesis Testing
8.11 Important Tests of Hypothesis
8.12 Critical Values
8.13 Test of Significance—Large Samples
8.14 Test of Significance for Single Proportion
8.15 Testing of Significance for Difference of Proportions
Chapter 9. Chi-Square Distribution
Abstract
9.1 Introduction
9.2 Contingency Table
9.3 Calculation of Expected Frequencies
9.4 Chi-Square Distribution
9.5 Mean and Variance of Chi-Square
9.6 Additive Property of Independent Chi-Square Variate
9.7 Degrees of Freedom
9.8 Conditions for Using Chi-Square Test
9.9 Uses of Chi-Square Test
Chapter 10. Test of Significance—Small Samples
Abstract
10.1 Introduction
10.2 Moments About Mean
10.3 Properties of Probability Curve
10.4 Assumptions for t-Test
10.5 Uses of t-Distribution
10.6 Interval Estimate of Population Mean
10.7 Types of t-Test
10.8 Significant Values of t
10.9 Test of Significance of a Single Mean
10.10 Student’s t-Test for Difference of Means
10.11 Paired t-Test
10.12 F-Distribution
Chapter 11. ANOVA (Analysis of Variance)
Abstract
11.1 Introduction
11.2 Assumptions
11.3 One-Way ANOVA
11.4 Working Rule
Chapter 12. Analysis of Time Series
Abstract
12.1 Introduction
12.2 Purpose of Time Series Study
12.3 Editing of Data
12.4 Components of Time Series
12.5 Mathematical Model for a Time Series
12.6 Methods of Measuring Trend
Chapter 13. Index Numbers
Abstract
13.1 Introduction
13.2 Definitions and Characteristics
13.3 Types of Index Numbers
13.4 Problems in the Construction of Index Numbers
13.5 Method of Constructing Index Numbers
13.6 Tests for Consistency of Index Numbers
13.7 Quantity Index Numbers
13.8 Consumer Price Index Number
13.9 Chain Base Method
13.10 Base Conversion
13.11 Splicing
13.12 Deflation
Index
Copyright
Butterworth-Heinemann is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2017 BSP Books Pvt Ltd. Published by Elsevier Inc. All rights reserved.
Distributed in India, Pakistan, Bangladesh, and Sri Lanka by BS Publications.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-12-811555-8
For Information on all Butterworth-Heinemann publications visit our website at https://www.elsevier.com/books-and-journals
Publisher: Joe Hayton
Acquisition Editor: Ken McCombs
Editorial Project Manager: Peter Jardim
Production Project Manager: Anusha Sambamoorthy
Designer: Miles Hitchen
Typeset by MPS Limited, Chennai, India
Preface
This book, Statistical Techniques for Transportation Engineering, is mainly designed to meet the requirements of engineering students of various Indian universities. It is intended to be used as a text in basic statistics for transportation engineering, as applications of probability and statistics to traffic engineering problems have markedly increased during the past two decades. These techniques are helpful in the study of vehicular traffic. The present text discusses in detail procedures that have been found useful in the treatment of several types of problems encountered by traffic engineers, and it forms the basis of a first course in statistics. It lays the foundation necessary to understand the basics of statistical applications in transportation engineering and other branches of civil engineering.
The topics covered in this book are measures of central tendency, measures of dispersion, probability, curve fitting, correlation and regression, sampling, hypothesis testing, tests of significance, index numbers, and time series. All concepts are presented clearly, and numerical examples are worked out throughout the book to illustrate the concepts and explain the techniques of solving problems. We earnestly hope that the comprehensive treatment of the various topics will be welcomed by both teachers and students.
Authors
Chapter 1
An Overview of Statistical Applications
Abstract
This chapter presents an overview of statistical applications in transportation engineering. It gives an introductory idea of various aspects of statistics such as discrete and continuous functions, distributions describing randomness, and data organization. It also covers common statistical estimators such as measures of central tendency and measures of dispersion. Further, it provides insight into applications of the normal distribution, confidence bounds, sample size determination, random variable summation, the binomial distribution, the Poisson distribution, and hypothesis testing. The chapter covers the statistical techniques most frequently employed by transportation engineers.
Keywords
Discrete functions; measures of central tendency; measures of dispersion; normal distribution; confidence bounds; sample size determination; central limit theorem; binomial distribution; hypothesis testing
1.1 Introduction
Civil engineering is considered one of the oldest engineering disciplines. It deals with the planning, analysis, design, construction, and maintenance of the physical and naturally built environment. The subject is grouped into several specialty areas, namely structural engineering, construction engineering and management, transportation engineering, water resources engineering, surveying, environmental engineering, and geotechnical engineering. Transportation engineering involves the collection of huge amounts of data for performing all types of traffic and transportation studies, and the analysis is carried out based on the collected and observed data. Statistics is an important element of transportation engineering, and specifically of traffic engineering. It helps determine how much data will be required, and what inferences can confidently be made from the observed and collected data. Generally, statistics is required whenever it is not possible to directly measure all of the values of interest. If the traffic engineer needs to know the average speed of all vehicles on a particular section of roadway, not all vehicles can be observed. Even if the speeds of all vehicles could be measured over a specified time period, the speeds of vehicles arriving before or after the study period, or on a day other than the sample day, would be unknown. In effect, no matter how many speeds are measured, there are always more that are not known. For all practical and statistical purposes, the number of vehicles using a particular section of roadway over time is infinite. Therefore the traffic engineer often observes and measures the characteristics of a finite sample of vehicles in a population that is effectively infinite. The mathematics of statistics is used to estimate characteristics that cannot be established with absolute certainty and to assess the degree of certainty that exists.
1.2 Probability Functions and Statistics
Before exploring some of the more complex statistical applications in traffic engineering, some basic principles of probability and statistics that are relevant to transportation and traffic engineering subject are reviewed.
1.2.1 Discrete Versus Continuous Functions
Discrete functions are made up of discrete variables; that is, they can assume only specific whole values and not any value in between. Continuous functions, made up of continuous variables, can assume any value between two given values. For example, let N = the number of cars in a family. N can equal 1, 2, 3, etc., but not 1.5, 1.6, or 2.3; therefore it is a discrete variable. Let H = the height of an individual car. H can equal 1.5, 1.75, 2.25, 2.50, or 2.75 m, etc., and is therefore a continuous variable. Examples of discrete probability functions are the Bernoulli, binomial, and Poisson distributions, which will be discussed in Chapter 4, Random Variables. Examples of continuous distributions are the normal, exponential, and chi-square distributions.
1.2.2 Distributions Describing Randomness
Some events are very predictable, or should be predictable. If you add mass to a spring or a force to a beam, you can expect it to deflect a predictable amount. If you depress the gas pedal a certain amount on level terrain, you expect to be able to predict the speed of the vehicle. On the other hand, some events may be totally random: the emission of the next particle from a radioactive sample is said to be completely random. Some events have very complex mechanisms and appear random for all practical purposes. In some cases the underlying mechanism cannot be perceived, while in others we cannot afford the time or money necessary for investigation. Consider the question of who turns north and who turns south after crossing a bridge. Most of the time, we simply say there is a probability, p, that a vehicle will turn north, and we treat the outcome as a random event. However, if we studied who was driving each car and where each driver worked, we might be able to predict the turn of each and every car. In fact, if we kept a record of license plates and past decisions, we could make very accurate predictions; the events are then, to a large extent, not random. Obviously, it is not worth the trouble, because the random assumption serves us adequately well. A number of things are modeled as random for all practical purposes, given the investment we can afford. Most of the time these judgments are fine and very reasonable, but, as with every engineering judgment, they can sometimes cause errors.
1.2.3 Data Organization
In the data collection process, data are collected for use in traffic studies. The raw data can be examined as individual observations or grouped into classes for easier comprehension. Most data will fit a common distribution; some of the common distributions found in traffic engineering are the normal, exponential, chi-square, Bernoulli, binomial, and Poisson distributions. As part of the process of determining which distribution fits the data, one often summarizes the raw data into classes and creates a frequency distribution table, which makes the data more easily read and understood. One could also put the data into an array, listing the data points in order from lowest to highest or highest to lowest. This gives some feeling for the character of the data, but it is still not very helpful, particularly with a large number of data points.
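As a sketch of this grouping step, the following Python snippet bins raw observations into 5 km/h classes and prints the resulting frequency table. The speed data and the class width are made up for illustration; they are not values from the text.

```python
# A minimal frequency-distribution sketch: hypothetical spot speeds (km/h)
# grouped into 5 km/h classes, as described in the text above.
from collections import Counter

speeds = [43, 47, 52, 48, 55, 49, 51, 46, 58, 44, 50, 53, 47, 49, 61]

width = 5
# Map each observation to the lower bound of its class, then count.
classes = Counter((v // width) * width for v in speeds)

for lower in sorted(classes):
    print(f"{lower}-{lower + width - 1} km/h: {classes[lower]} observations")
```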
1.2.4 Common Statistical Estimators
In dealing with a distribution, there are two key characteristics that are of interest. These are discussed in the following subsections.
1.2.4.1 Measures of Central Tendency
Measures of central tendency are measures that describe the center of data in one of several different ways. The Arithmetic mean is the average of all observed data. The true underlying mean of the population, μ, is an exact number that we do not know, but can estimate as:
x̄ = (Σ xi)/N (1.1)

where x̄ = arithmetic average, or mean, of the observed values; xi = ith individual value of the statistic; and N = sample size (number of values xi).
For grouped data, the average value of all observations in a given group is considered to be the midpoint value of the group. The overall average of the entire sample may then be found as:
x̄ = (Σ fj mj)/N (1.2)

where fj = number of observations in group j, mj = middle value of the variable in group j, and N = total sample size (number of observations).
The median is the middle value of all data when arranged in an array (ascending or descending order). The median divides a distribution in half: half of all observed values are higher than the median, and half are lower. For nongrouped data it is the middle value; e.g., for the set of numbers (3, 4, 5, 5, 6, 7, 7, 7, 8), the median is 6, the fifth value in an array of 9 numbers. For grouped data, the easiest way to get the median is to read the 50th percentile point off a cumulative frequency distribution curve.
The mode is the value that occurs most frequently, i.e., the most common single value. For example, in nongrouped data, for the set of numbers (3, 4, 5, 5, 6, 7, 7, 7, 8), the mode is 7. For the set of numbers (3, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9), both 5 and 8 are modes, and the data are said to be bimodal. For grouped data, the mode is estimated as the peak of the frequency distribution curve. For a perfectly symmetrical distribution, the mean, median, and mode are the same.
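These three measures can be checked with Python's standard-library statistics module, using the number sets quoted in the text:

```python
# Central-tendency measures applied to the number sets from the text.
import statistics

data = [3, 4, 5, 5, 6, 7, 7, 7, 8]
print(statistics.mean(data))     # arithmetic mean of the observations
print(statistics.median(data))   # middle value of the array: 6
print(statistics.mode(data))     # most frequent value: 7

# The bimodal set from the text: multimode returns every mode.
bimodal = [3, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9]
print(statistics.multimode(bimodal))   # [5, 8]
```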
1.2.4.2 Measures of Dispersion
Measures of dispersion are measures that describe how far the data spread from the center. The variance and standard deviation are statistical values that describe the magnitude of variation around the mean, with the variance defined as:
S² = Σ(xi − x̄)²/(N − 1) (1.3)

where S² = variance of the data and N = sample size (number of observations); all other variables are as previously defined.
The standard deviation is the square root of the variance. It can be seen from the equation that what you are measuring is the distance of each data point from the mean. This equation can also be rewritten as:
S² = [Σ xi² − (Σ xi)²/N]/(N − 1) (1.4)
For grouped data, the standard deviation is found from:
s = √[Σ fj(mj − x̄)²/(N − 1)] (1.5)
where all variables are as previously defined. The standard deviation (STD) may also be estimated as:
s ≈ (P85 − P15)/2 (1.6)
where P85=85th percentile value of the distribution (i.e., 85% of all data is at this value or less), P15=15th percentile value of the distribution (i.e., 15% of all data is at this value or less).
The xth percentile is defined as that value below which x% of the outcomes fall. P85 is the 85th percentile, often used in traffic speed studies; it is the speed that encompasses 85% of vehicles. P50 is the 50th percentile speed or the median.
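A minimal sketch of these dispersion measures, using a small set of hypothetical spot speeds (the data are invented for illustration, not taken from the text):

```python
# Sample variance, standard deviation, and the percentile-based estimate
# s ≈ (P85 - P15)/2, applied to hypothetical spot speeds (km/h).
import statistics

speeds = [42, 45, 46, 47, 48, 48, 49, 50, 51, 53]

s2 = statistics.variance(speeds)   # sample variance, divides by N - 1
s = statistics.stdev(speeds)       # standard deviation = sqrt(variance)

# Percentile-based estimate of the standard deviation, Eq. (1.6).
cuts = statistics.quantiles(speeds, n=100)   # 99 percentile cut points
p85, p15 = cuts[84], cuts[14]
estimate = (p85 - p15) / 2

print(s2, s, estimate)
```

With only ten observations the percentile estimate is rough; it becomes useful for the large samples typical of spot speed studies.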
1.3 Applications of Normal Distribution
One of the most common statistical distributions is the normal distribution, known by its characteristic bell-shaped curve. The normal distribution is a continuous distribution; probability is indicated by the area under the probability density function f(x) between specified values, such as P(40 < x < 50).
The equation for the normal distribution function is:
f(x) = [1/(σ√(2π))] e^(−(x − μ)²/(2σ²)) (1.7)
where x = normally distributed statistic, μ = true mean of the distribution, σ = true standard deviation of the distribution, and π ≈ 3.1416.
The probability of any occurrence between values x1 and x2 is given by the area under the distribution function between the two values. The area may be found by integration between the two limits. Likewise, the mean, μ, and the variance, σ², can be found through integration. The normal distribution is the most common distribution because any process that is the sum of many parts tends to be normally distributed. Speed, travel time, and delay are all commonly described using the normal distribution. The function is completely defined by two parameters: the mean and the variance. All other values in Eq. (1.7), including π, are constants. The notation for a normal distribution is x: N[μ, σ²], which means that the variable x is normally distributed with mean μ and variance σ².
1.3.1 The Standard Normal Distribution
For the normal distribution, the integration cannot be done in closed form due to the complexity of the equation for f(x); thus, tables for a standard normal distribution, with zero mean (μ=0) and unit variance (σ²=1), are constructed. The standard normal is denoted z: N[0,1]. Any value of x on any normal distribution, denoted x: N[μ,σ²], can be converted to an equivalent value of z on the standard normal distribution, and this can also be done in reverse when needed. The translation of an arbitrary normal distribution of values of x to equivalent values of z on the standard normal distribution is accomplished as:
z = (x − μ)/σ (1.8)

where z = equivalent statistic on the standard normal distribution, z: N[0,1], and x = statistic on any arbitrary normal distribution, x: N[μ,σ²]; other variables are as previously defined.
1.3.2 Characteristics of the Normal Distribution Function
The foregoing discussion allows one to compute relevant areas under the normal curve. Some numbers occur so frequently in practice that it is useful to have them in mind. For instance, what is the probability that the next observation will be within one standard deviation of the mean, given that the distribution is normal? That is, what is the probability that x is in the range (μ ± 1.00σ)? From the standard normal table, this probability is found to be 68.3%.
The following ranges have frequent use in statistical analysis involving normal distributions:
• 68.3% of the observations are within μ±1.00σ
• 95.0% of the observations are within μ±1.96σ
• 95.5% of the observations are within μ±2.00σ
• 99.7% of the observations are within μ±3.00σ
The total probability under the normal curve is 1.00, and the normal curve is symmetric around the mean. It is also useful to note that the normal distribution is asymptotic to the x-axis. These critical characteristics will prove useful throughout the text.
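The percentages listed above can be checked numerically. For a normal variable, P(|x − μ| ≤ kσ) = erf(k/√2), a standard identity relating the normal CDF to the error function:

```python
# Verifying the standard-normal coverage figures from the text.
import math

def prob_within(k):
    """Probability that a normal observation falls within k standard
    deviations of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1.00, 1.96, 2.00, 3.00):
    print(f"mu +/- {k:.2f} sigma covers {100 * prob_within(k):.1f}% of observations")
```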
1.4 Confidence Bounds
What would happen if we asked everyone in a class of 70 people to collect 50 samples of speed data and to compute their own estimate of the mean? How many estimates would there be, and what distribution would they have? There would be 70 estimates, and the histogram of these 70 means would look normally distributed. Thus the estimate of the mean is itself a random variable that is normally distributed.
Usually we compute only one estimate of the mean (or any other quantity), but in this class exercise we are confronted with the reality that there is a range of outcomes. We may therefore ask how good our estimate of the mean is. How confident are we that our estimate is correct? Consider that:
1. The estimate of the mean quickly tends to be normally distributed.
2. The expected value (the true mean) of this distribution is the unknown fixed mean of the original distribution.
3. The standard deviation of this new distribution of means is the standard deviation of the original distribution divided by the square root of the number of samples, N. (This assumes independent samples and an infinite population.)
The standard deviation of this distribution of the means is called the standard error of the mean (E):
E = s/√N (1.9)
where the sample standard deviation, s, is used to estimate σ. The sample mean, x̄, then brackets the true population mean, μ, as follows:

μ lies within x̄ ± E, with 68.3% confidence

μ lies within x̄ ± 1.96E, with 95% confidence

μ lies within x̄ ± 3.00E, with 99.7% confidence
The ± term (E, 1.96E, or 3.00E, depending upon the confidence level) in the above expressions is also called the tolerance and is given the symbol e.
Consider the following: 54 speeds are observed, and the mean is computed as 47.8 km/h, with a standard deviation of 7.80 km/h. What are the 95% confidence bounds? Here E = 7.80/√54 = 1.06 km/h, so the bounds are 47.8 ± 1.96(1.06), or 47.8 ± 2.1 km/h.

Thus it is said that there is a 95% chance that the true mean lies between 45.7 and 49.9 km/h. Further, while not proven here, any random variable consisting of sample means tends to be normally distributed for reasonably large N, regardless of the original distribution of the individual values.
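The worked example can be reproduced in a few lines, using the figures given above:

```python
# Confidence bounds for the example: 54 observed speeds, sample mean
# 47.8 km/h, sample standard deviation 7.80 km/h.
import math

n = 54
x_bar = 47.8   # km/h, sample mean
s = 7.80       # km/h, sample standard deviation

E = s / math.sqrt(n)            # standard error of the mean, Eq. (1.9)
tol = 1.96 * E                  # tolerance at 95% confidence
lower, upper = x_bar - tol, x_bar + tol

print(f"95% bounds: {lower:.1f} to {upper:.1f} km/h")   # 45.7 to 49.9 km/h
```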
1.5 Determination of Sample Size
We can rewrite the equation for confidence bounds to solve for N, given a specified tolerance and confidence. Solving the 95% confidence bound equation for N gives:
N = (1.96²s²)/e² (1.10)
where 1.96² is used only for 95% confidence. If 99.7% confidence is desired, then the 1.96² would be replaced by 3².
Consider another example: With 99.7% and 95% confidence, estimate the true mean of the speed on a highway, ±1 mph. We know from previous work that the standard deviation is 7.2 km/h. How many samples do we need to collect?
N = (3² × 7.2²)/1² = 466.6, or 467 samples for 99.7% confidence, and
N = (1.96² × 7.2²)/1² = 199.1, or 200 samples for 95% confidence
Consider further that a spot speed study is needed at a location with unknown speed characteristics. A tolerance of ±0.2 km/h and a confidence of 95% are desired. What sample size is required? Since the speed characteristics are unknown, a standard deviation of 5 km/h (a common result in speed studies) is assumed. Then, for 95% confidence, N = (1.96² × 5²)/0.2² = 2401 samples. This number is unreasonably high; it would be too expensive to collect such a large amount of data. Thus the choices are either to reduce the confidence or to increase the tolerance. A 95% confidence level is considered the minimum acceptable, so in this case the tolerance is increased. With a tolerance of 0.5 km/h, N = (1.96² × 5²)/0.5² = 384.2, or about 384 samples.
Thus an increase of just 0.3 km/h in tolerance reduced the required sample size by roughly 2017 samples. Note that the required sample size depends on s, which was assumed at the outset. After the study is completed and the mean and standard deviation are computed, N should be rechecked. If the recomputed N is greater (i.e., the actual s is greater than the assumed s), then more samples may need to be taken.
Another example: An arterial is to be studied, and it is desired to estimate the mean travel time to a tolerance of ±5 seconds with 95% confidence. Based on prior knowledge and experience, it is estimated that the standard deviation of the travel times is about 15 seconds. How many samples are required?
Based on an application of Eq. (1.10), N = 1.96²(15²)/(5²) = 34.6, which is rounded up to 35 samples.
As the data are collected, the computed s turns out to be 22 seconds, not 15 seconds. If the sample size is kept at 35, the confidence bounds widen to ±1.96(22)/√35, or about ±7.3 seconds. If the confidence bounds must be kept at ±5 seconds, then the sample size must be increased so that N ≥ 1.96²(22²)/(5²) = 74.4, or 75 samples. Additional data will have to be collected to meet the desired tolerance and confidence level.
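Eq. (1.10) and the recheck step above can be wrapped in a small helper; sample_size is a hypothetical name, and it rounds up because a sample count must be a whole number.

```python
import math

def sample_size(s, tolerance, z=1.96):
    """Samples needed to estimate the mean within +/- tolerance, per Eq. (1.10).

    N = z**2 * s**2 / e**2, rounded up to the next whole sample.
    """
    return math.ceil((z ** 2) * (s ** 2) / (tolerance ** 2))

# Arterial travel-time study: planning estimate, then recheck with observed s
print(sample_size(15, 5))  # 35 samples assuming s = 15 s
print(sample_size(22, 5))  # 75 samples once s turns out to be 22 s
```

The same call with z = 3.00 would give the sample size for a 99.7% confidence requirement.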
1.6 Random Variables Summation
One of the most common occurrences in probability and statistics is the summation of random variables, often in the form Y=a1X1+a2X2 or in the more general form:
Y = Σ aiXi   (1.11)
where the summation is over i, usually from 1 to n.
It is relatively straightforward to prove that the expected value (or mean) μY of the random variable Y is given by:
μY = Σ aiμi   (1.12)
and that, if the random variables Xi are independent, the variance σY² of the random variable Y is given by:
σY² = Σ ai²σi²   (1.13)
The fact that the coefficients, ai, are squared in the variance equation has great practical significance for us in all our statistical work.
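Eqs. (1.12) and (1.13) translate directly into code; combo_mean_var is an illustrative name, and independence of the Xi is assumed for the variance line.

```python
def combo_mean_var(coeffs, means, variances):
    """Mean and variance of Y = sum(a_i * X_i) for independent X_i.

    mu_Y  = sum(a_i * mu_i)       -- Eq. (1.12)
    var_Y = sum(a_i**2 * var_i)   -- Eq. (1.13); note the squared coefficients
    """
    mu = sum(a * m for a, m in zip(coeffs, means))
    var = sum(a * a * v for a, v in zip(coeffs, variances))
    return mu, var

# Y = 2*X1 + 3*X2, with means 10 and 20 and variances 4 and 9
print(combo_mean_var([2, 3], [10, 20], [4, 9]))  # (80, 97)
```

The squaring of the coefficients is visible in the variance line: the a = 3 term contributes 9 times its variance, not 3 times.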
1.6.1 The Central Limit Theorem
One of the most impressive and useful theorems in probability is that the sum of n similarly distributed random variables tends to the normal distribution, no matter what the initial, underlying distribution is. That is, the random variable Y = ΣXi, where the Xi have the same distribution, tends to the normal distribution.
The words "tends to" can be read as "tends to look like" the normal distribution. In mathematical terms, the actual distribution of the random variable Y approaches the normal distribution asymptotically.
1.6.1.1 Sum of Travel Times
Consider a trip made up of 15 components, all with the same underlying distribution, each with a mean of 10 minutes and standard deviation of 3.5 minutes. The underlying distribution is unknown. What can you say about the total travel time?
While there might be an odd situation to contradict this, n = 15 should be quite sufficient to say that the distribution of total travel times tends to look normal. From Eq. (1.12), the mean of the distribution of total travel times is found by adding 15 terms (aiμi), where
ai=1 and μi=10 minutes, or
μy=15×(1×10)=150 minutes
The variance of the distribution of total travel times is found from Eq. (1.13), where ai is again 1 and σi is 3.5 minutes. Then:
σy² = 15 × (1² × 3.5²) = 183.75 minutes²
The standard deviation, σy, is therefore √183.75 = 13.6 minutes.
If the total travel times are taken to be normally distributed, 95% of all observations of total travel time will lie between the mean (150 minutes) ±1.96 standard deviations (13.6 minutes), or:
Xy = 150 ± 1.96(13.6) = 150 ± 26.7 minutes
Thus 95% of all total travel times would be expected to fall within the range of 123–177 minutes (values rounded to the nearest minute).
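A simulation makes the central limit claim concrete. The sketch below assumes a uniform underlying distribution (chosen arbitrarily, since the text leaves it unknown), scaled to a mean of 10 minutes and a standard deviation of about 3.5 minutes.

```python
import random
import statistics

random.seed(42)  # reproducible illustration

# Uniform on [10 - a, 10 + a] has standard deviation a / sqrt(3);
# a = 3.5 * sqrt(3) gives sd = 3.5 minutes, matching the example.
a = 3.5 * 3 ** 0.5

# 10,000 simulated trips, each the sum of 15 component travel times
trips = [sum(random.uniform(10 - a, 10 + a) for _ in range(15))
         for _ in range(10_000)]

print(round(statistics.mean(trips), 1))   # close to 150 minutes
print(round(statistics.stdev(trips), 1))  # close to 13.6 minutes
```

A histogram of trips would look bell-shaped even though each component is flat (uniform), which is exactly what the theorem predicts.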
1.6.1.2 Hourly Volumes
Five-minute counts are taken, and they tend to look rather smoothly distributed but with some skewness (asymmetry). Based on many observations, the mean tends to be 45 vehicles in the 5-minute count, with a standard deviation of seven vehicles. What can be said of the hourly volume?
The hourly volume is the sum of 12 five-minute counts, whose distributions should logically be essentially the same if traffic levels are stable. Thus the hourly volume will tend to look normal, with a mean computed using Eq. (1.12), with ai = 1, μi = 45 vehicles (veh), and n = 12, or 12 × (1 × 45) = 540 veh/h. The variance is computed using Eq. (1.13), with ai = 1, σi = 7, and n = 12, or 12 × (1² × 7²) = 588 (veh/h)². The standard deviation is 24.2 veh/h. Based on the assumption of normality, 95% of hourly volumes would fall between 540 ± 1.96(24.2) = 540 ± 47 veh/h (rounded to the nearest whole vehicle).
Note that the summation has had an interesting effect. The σ/μ ratio for the 5-minute count distribution was 7/45 = 0.156, but for the hourly volumes it was 47/540 = 0.087. This is due to the summation, which tends to remove extremes by canceling highs with lows, and thereby introduces stability. The mean of the sum grows in proportion to n, but the standard deviation grows only in proportion to the square root of n.
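The stabilizing effect of summation can be checked numerically; the short sketch below reproduces the example's arithmetic.

```python
import math

n, mu, sigma = 12, 45, 7           # 12 five-minute counts: mean 45 veh, sd 7 veh

mu_hour = n * mu                   # mean grows in proportion to n
sigma_hour = math.sqrt(n) * sigma  # sd grows only with sqrt(n)

print(mu_hour)                     # 540 veh/h
print(round(sigma_hour, 1))        # 24.2 veh/h
print(round(sigma / mu, 3))        # 0.156 for the 5-minute counts
print(round(1.96 * sigma_hour / mu_hour, 3))  # ~0.088 (the text's 0.087 rounds
                                              # 1.96 * sigma_hour to 47 first)
```

Had the count interval been summed over a longer period, the ratio would shrink further, always by the factor sqrt(n)/n = 1/sqrt(n).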
1.6.1.3 Sum of Normal Distributions
Although not proven here, it is true that the sum of any two normal distributions is itself normally distributed. By extension, if one normal is formed by n1 summations of one underlying distribution and another normal is formed by n2 summations of another underlying distribution, their sum also tends to the normal.
Thus in the foregoing travel time example, not all of the elements had to have exactly the same distribution as long as subgroupings each tended to the normal.
1.7 The Binomial Distributions
1.7.1 Bernoulli and the Binomial Distribution
The Bernoulli distribution is the simplest discrete distribution, consisting of only two possible outcomes: yes or no, heads or tails, one or zero, etc. The first occurs with probability p, and therefore the second occurs with probability q = 1 − p. This is modeled as:
P(X = 1) = p and P(X = 0) = 1 − p = q
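The two-outcome model is easy to simulate; bernoulli here is an illustrative helper, and the long-run fraction of ones should settle near p.

```python
import random

def bernoulli(p):
    """One Bernoulli trial: 1 with probability p, 0 with probability q = 1 - p."""
    return 1 if random.random() < p else 0

random.seed(7)
# e.g. p = 0.3: each of 100,000 travelers independently chooses transit
trials = [bernoulli(0.3) for _ in range(100_000)]
print(round(sum(trials) / len(trials), 2))  # close to 0.3
```

Summing many such trials leads directly to the binomial distribution discussed next.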
In traffic engineering, it represents any basic choice—to park or not to park; to take this route or that; to take auto or transit (for one individual). It is obviously more useful to look at more than one individual, however, which leads us to the binomial distribution. The binomial distribution can be thought of in two common