
Statistical Techniques for Transportation Engineering
Ebook, 953 pages


About this ebook

Statistical Techniques for Transportation Engineering is written with a systematic approach in mind and covers a full range of data analysis topics, from the introductory level (basic probability, measures of dispersion, random variables, discrete and continuous distributions) through more generally used techniques (common statistical distributions, hypothesis testing), to advanced analysis and statistical modeling techniques (regression, ANOVA, and time series). The book also provides worked-out examples and solved problems for a wide variety of transportation engineering challenges.

  • Demonstrates how to effectively interpret, summarize, and report transportation data using appropriate statistical descriptors
  • Teaches how to identify and apply appropriate analysis methods for transportation data
  • Explains how to evaluate transportation proposals and schemes with statistical rigor
Language: English
Release date: Mar 3, 2017
ISBN: 9780128116425
Author

Kumar Molugaram

Professor, Head of Civil Engineering and Director of Infrastructure, Osmania University, Hyderabad, Telangana State. He has published over 55 research papers in various international and national journals and conferences.


    Book preview

    Statistical Techniques for Transportation Engineering - Kumar Molugaram

    Statistical Techniques for Transportation Engineering

    Kumar Molugaram

    Professor, Department of Civil Engineering, University College of Engineering, Osmania University, Hyderabad, Telangana, India

    G. Shanker Rao

    Formerly Head of the Department, Department of Mathematics, Govt. Girraj P.G College, and Asst. Professor(c), University College of Engineering(A), Osmania University, Hyderabad

    Table of Contents

    Cover image

    Title page

    Copyright

    Preface

    Chapter 1. An Overview of Statistical Applications

    Abstract

    1.1 Introduction

    1.2 Probability Functions and Statistics

    1.3 Applications of Normal Distribution

    1.4 Confidence Bounds

    1.5 Determination of Sample Size

    1.6 Random Variables Summation

    1.7 The Binomial Distributions

    1.8 The Poisson Distribution

    1.9 Testing of Hypothesis

    1.10 Summary

    Chapter 2. Preliminaries

    Abstract

    2.1 Introduction

    2.2 Basic Concepts

    2.3 Tabulation of Data

    2.4 Frequency Distribution

    2.5 Cumulative Frequency Table

    2.6 Measures of Central Tendency

    2.7 Arithmetic Mean

    2.8 Median

    2.9 Mode

    2.10 Geometric Mean

    2.11 Harmonic Mean

    2.12 Partition Values (Quartiles, Deciles, and Percentiles)

    2.13 Measures of Dispersion

    2.14 Range

    2.15 Interquartile Range

    2.16 Quartile Deviation

    2.17 Mean Deviation

    2.18 Standard Deviation

    Chapter 3. Probability

    Abstract

    3.1 Introduction

    3.2 Classical Probability

    3.3 Relative Frequency Approach of Probability

    3.4 Symbolic Notation

    3.5 Axiomatic Theory of Probability

    3.6 Independent and Dependent Events

    3.7 Conditional Probability

    3.8 Multiplication Theorem on Probability

    3.9 Bayes’ Theorem

    Chapter 4. Random Variables

    Abstract

    4.1 Introduction

    4.2 Discrete Random Variable

    4.3 Probability Distribution for a Discrete Random Variable

    4.4 Mean and Variance of a Discrete Distribution

    4.5 Continuous Random Variable

    4.6 Probability Density Function

    4.7 Cumulative Distribution Function

    4.8 Mean and Variance of a Continuous Random Variable

    4.9 Joint Distributions

    4.10 Conditional Probability Distribution

    4.11 Independent Random Variables

    4.12 Joint Probability Function of Continuous Random Variables

    4.13 Joint Probability Distribution Function of Continuous Random Variables

    4.14 Marginal Distribution Function

    4.15 Conditional Probability Density Functions

    4.16 Mathematical Expectation and Moments

    4.17 Moments

    4.18 Moment Generating Function

    4.19 Properties of Moment Generating Function

    4.20 Discrete Probability Distributions

    4.21 Poisson Distribution

    4.22 Discrete Uniform Distribution

    4.23 The Negative Binomial and Geometric Distribution

    4.24 Geometric Distribution

    4.25 Continuous Probability Distributions

    4.26 Normal Distribution

    4.27 Characteristic Function

    4.28 Gamma Distribution

    4.29 Beta Distribution of First Kind

    4.30 Weibull Distribution

    Chapter 5. Curve Fitting

    Abstract

    5.1 Introduction

    5.2 The Method of Least Squares

    5.3 The Least-Squares Line

    5.4 Fitting a Parabola by the Method of Least Squares

    5.5 Fitting the Exponential Curve of the Form y = a·e^(bx)

    Chapter 6. Correlation and Regression

    Abstract

    6.1 Introduction

    6.2 Correlation

    6.3 Coefficient of Correlation

    6.4 Methods of Finding Coefficient of Correlation

    6.5 Scatter Diagram

    6.6 Direct Method

    6.7 Spearman’s Rank Correlation Coefficient

    6.8 Calculation of r (Correlation Coefficient) (Karl Pearson’s Formula)

    6.9 Regression

    6.10 Regression Equation

    6.11 Curve of Regression

    6.12 Types of Regression

    6.13 Regression Equations (Linear Fit)

    6.14 Angle between Two Lines of Regression

    6.15 Coefficient of Determination

    6.16 Coefficient of Nondetermination

    6.17 Coefficient of Alienation

    6.18 Multilinear Regression

    6.19 Uses of Regression Analysis

    Chapter 7. Sampling

    Abstract

    7.1 Introduction

    7.2 Population

    7.3 Sample

    7.4 Sampling

    7.5 Random Sampling

    7.6 Simple Random Sampling

    7.7 Stratified Sampling

    7.8 Systematic Sampling

    7.9 Sample Size Determination

    7.10 Sampling Distribution

    Chapter 8. Hypothesis Testing

    Abstract

    8.1 Introduction

    8.2 Hypothesis

    8.3 Hypothesis Testing

    8.4 Types of Hypothesis

    8.5 Computation of Test Statistic

    8.6 Level of Significance

    8.7 Critical Region

    8.8 One-Tailed Test and Two-Tailed Test

    8.9 Errors

    8.10 Procedure for Hypothesis Testing

    8.11 Important Tests of Hypothesis

    8.12 Critical Values

    8.13 Test of Significance—Large Samples

    8.14 Test of Significance for Single Proportion

    8.15 Testing of Significance for Difference of Proportions

    Chapter 9. Chi-Square Distribution

    Abstract

    9.1 Introduction

    9.2 Contingency Table

    9.3 Calculation of Expected Frequencies

    9.4 Chi-Square Distribution

    9.5 Mean and Variance of Chi-Square

    9.6 Additive Property of Independent Chi-Square Variate

    9.7 Degrees of Freedom

    9.8 Conditions for Using Chi-Square Test

    9.9 Uses of Chi-Square Test

    Chapter 10. Test of Significance—Small Samples

    Abstract

    10.1 Introduction

    10.2 Moments About Mean

    10.3 Properties of Probability Curve

    10.4 Assumptions for t-Test

    10.5 Uses of t-Distribution

    10.6 Interval Estimate of Population Mean

    10.7 Types of t-Test

    10.8 Significant Values of t

    10.9 Test of Significance of a Single Mean

    10.10 Student’s t-Test for Difference of Means

    10.11 Paired t-Test

    10.12 F-Distribution

    Chapter 11. ANOVA (Analysis of Variance)

    Abstract

    11.1 Introduction

    11.2 Assumptions

    11.3 One-Way ANOVA

    11.4 Working Rule

    Chapter 12. Analysis of Time Series

    Abstract

    12.1 Introduction

    12.2 Purpose of Time Series Study

    12.3 Editing of Data

    12.4 Components of Time Series

    12.5 Mathematical Model for a Time Series

    12.6 Methods of Measuring Trend

    Chapter 13. Index Numbers

    Abstract

    13.1 Introduction

    13.2 Definitions and Characteristics

    13.3 Types of Index Numbers

    13.4 Problems in the Construction of Index Numbers

    13.5 Method of Constructing Index Numbers

    13.6 Tests for Consistency of Index Numbers

    13.7 Quantity Index Numbers

    13.8 Consumer Price Index Number

    13.9 Chain Base Method

    13.10 Base Conversion

    13.11 Splicing

    13.12 Deflation

    Index

    Copyright

    Butterworth-Heinemann is an imprint of Elsevier

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

    50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

    Copyright © 2017 BSP Books Pvt Ltd. Published by Elsevier Inc. All rights reserved.

    Distributed in India, Pakistan, Bangladesh, and Sri Lanka by BS Publications.

    No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

    This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    British Library Cataloguing-in-Publication Data

    A catalogue record for this book is available from the British Library

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    ISBN: 978-0-12-811555-8

    For Information on all Butterworth-Heinemann publications visit our website at https://www.elsevier.com/books-and-journals

    Publisher: Joe Hayton

    Acquisition Editor: Ken McCombs

    Editorial Project Manager: Peter Jardim

    Production Project Manager: Anusha Sambamoorthy

    Designer: Miles Hitchen

    Typeset by MPS Limited, Chennai, India

    Preface

    This book, Statistical Techniques for Transportation Engineering, is mainly designed to meet the requirements of engineering students of various Indian universities. It is intended to be used as a text in basic statistics for transportation engineering; applications of probability and statistics to traffic engineering problems have markedly increased during the past two decades, and these techniques are helpful in the study of vehicular traffic. The present text discusses in detail the procedures that have been found useful in the treatment of several types of problems encountered by traffic engineers, and it forms the basis of a first course in statistics. It lays the foundation necessary to understand the basics of statistical applications in transportation engineering and other branches of civil engineering.

    The topics covered in this book are measures of central tendency, measures of dispersion, probability, curve fitting, correlation and regression, sampling, hypothesis testing, tests of significance, index numbers, and time series. All the concepts have been well presented and the explanations are clear. Numerical examples have been worked out throughout the book to illustrate the concepts and to explain the techniques of solving problems. We earnestly hope that the comprehensive treatment of the various topics covered in the book will be welcomed by both teachers and students.

    Authors

    Chapter 1

    An Overview of Statistical Applications

    Abstract

    This chapter presents an overview of statistical applications in transportation engineering. It gives an introductory idea of various aspects of statistics such as discrete and continuous functions, distributions describing randomness, and data organization. It also covers common statistical estimators such as measures of central tendency and measures of dispersion. Further, it gives an insight into the application of the normal distribution, confidence bounds, sample size determination, random variable summation, the binomial distribution, the Poisson distribution, testing of hypotheses, and other aspects. In short, this chapter covers the statistical techniques most frequently employed by transportation engineers.

    Keywords

    Discrete functions; measure of central tendency; measure of dispersion; normal distribution; confidence bounds; sample size determination; central limit theorem; binomial distributions and testing of hypothesis

    1.1 Introduction

    Civil engineering is considered to be one of the oldest engineering disciplines. It deals with the planning, analysis, design, construction, and maintenance of the physical and naturally built environment. The subject is grouped into several specialty areas, namely Structural Engineering, Construction Engineering and Management, Transportation Engineering, Water Resource Engineering, Surveying, Environmental Engineering, and Geotechnical Engineering. Transportation engineering involves the collection of huge amounts of data for performing all types of traffic and transportation studies, and the analysis is carried out based on the collected and observed data. Statistical methods are thus an important element in transportation engineering, specifically traffic engineering. Statistics helps determine how much data will be required, as well as what meaningful inferences can confidently be made based on that observed and collected data. Generally, statistics is required whenever it is not possible to directly measure all of the values required. If the traffic engineer needs to know the average speed of all vehicles on a particular section of roadway, not all vehicles can be observed. Even if all vehicle speeds could be measured over a specified time period, the speeds of vehicles arriving before or after the study period, or on a different day than the sample day, would be unknown. In effect, no matter how many speeds are measured, there are always more that are not known. For all practical and statistical purposes, the number of vehicles using a particular section of roadway over time is infinite. Therefore the traffic engineer often observes and measures the characteristics of a finite sample of vehicles in a population that is effectively infinite. The mathematics of statistics is used to estimate characteristics that cannot be established with absolute certainty and to assess the degree of certainty that exists.

    1.2 Probability Functions and Statistics

    Before exploring some of the more complex statistical applications in traffic engineering, some basic principles of probability and statistics that are relevant to transportation and traffic engineering subject are reviewed.

    1.2.1 Discrete Versus Continuous Functions

    Discrete functions are made up of discrete variables, i.e., they can assume only specific whole values and not any value in between. Continuous functions, made up of continuous variables, on the other hand, can assume any value between two given values. For example, let N = the number of cars in a family. N can equal 1, 2, 3, etc., but not 1.5, 1.6, or 2.3; therefore it is a discrete variable. Let H = the height of an individual car. H can equal 1.5, 1.75, 2.25, 2.50, or 2.75 m, etc., and is therefore a continuous variable. Examples of discrete probability functions are the Bernoulli, binomial, and Poisson distributions, which will be discussed in Chapter 4, Random Variables. Some examples of continuous distributions are the normal, exponential, and chi-square distributions.

    1.2.2 Distributions Describing Randomness

    Some events are very predictable, or should be predictable. If you add mass to a spring or a force to a beam, you can expect it to deflect a predictable amount. If you depress the gas pedal a certain amount and you are on level terrain, you expect to be able to predict the speed of the vehicle. On the other hand, some events may be totally random. The emission of the next particle from a radioactive sample is said to be completely random. Some events may have very complex mechanisms and appear to be random for all practical purposes. In some cases the underlying mechanism cannot be perceived, while in other cases we cannot afford the time or money necessary for the investigation. Consider the question of who turns north and who turns south after crossing a bridge. Most of the time, we simply say there is a probability, p, that a vehicle will turn north, and we treat the outcome as a random event. However, if we studied who was driving each car and where each driver worked, we might be able to make the outcome a very predictable event for each and every car. In fact, if we kept a record of their license plates and their past decisions, we could make very reliable predictions: the events are, to a large extent, not random. Obviously, it is not worth that trouble, because the random assumption serves us adequately. In fact, a number of things are modeled as random for all practical purposes, given the investment we can afford. Most of the time these judgments are just fine and very reasonable, but, as with every engineering judgment, they can sometimes cause errors.

    1.2.3 Data Organization

    In the data collection process, data are collected for use in traffic studies. The raw data can be looked at as individual pieces of data or grouped into classes of data for easier comprehension. Most of the data will fit into a common distribution. Some of the common distributions found in traffic engineering are the normal distribution, the exponential distribution, the chi-square distribution, the Bernoulli distribution, the binomial distribution, and the Poisson distribution. As part of the process of determining which distribution fits the data, one often summarizes the raw data into classes and creates a frequency distribution table. This makes the data more easily readable and understood. You could put the data into an array format, which means listing the data points in order from either lowest to highest or highest to lowest. This will give you some feeling for the character of the data, but it is still not very helpful, particularly when you have a large number of data points.
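As a sketch of the grouping described above, the following Python snippet builds a simple frequency distribution table. The speed values and the 5 km/h class width are illustrative assumptions, not data from the text:

```python
from collections import Counter

# Hypothetical spot speeds (km/h) grouped into 5 km/h classes:
speeds = [48, 52, 55, 47, 51, 58, 49, 53, 56, 50, 54, 61, 46, 52, 57]
classes = Counter(5 * (s // 5) for s in speeds)  # key = class lower bound

# Print the frequency distribution table, lowest class first:
for lower in sorted(classes):
    print(f"{lower}-{lower + 4} km/h: {classes[lower]} observations")
```

Summarizing the 15 raw observations into four classes makes the shape of the data much easier to see than the unsorted list.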

    1.2.4 Common Statistical Estimators

    In dealing with a distribution, there are two key characteristics that are of interest. These are discussed in the following subsections.

    1.2.4.1 Measures of Central Tendency

    Measures of central tendency are measures that describe the center of data in one of several different ways. The Arithmetic mean is the average of all observed data. The true underlying mean of the population, μ, is an exact number that we do not know, but can estimate as:

    x̄ = (Σ xi)/N (1.1)

    where x̄ = arithmetic average or mean of observed values, xi = ith individual value of statistic, N = sample size, number of values xi.

    For grouped data, the average value of all observations in a given group is considered to be the midpoint value of the group. The overall average of the entire sample may then be found as:

    x̄ = (Σ fj mj)/N (1.2)

    where fj = number of observations in group j, mj = middle value of variable in group j, N = total sample size or number of observations.
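The grouped-data mean of Eq. (1.2) can be sketched in a few lines of Python; the class midpoints and frequencies below are hypothetical:

```python
# Hypothetical grouped speed data: (class midpoint m_j, frequency f_j)
groups = [(47.5, 4), (52.5, 6), (57.5, 4), (62.5, 1)]

n = sum(f for _, f in groups)                      # total sample size N
grouped_mean = sum(f * m for m, f in groups) / n   # Eq. (1.2)
print(round(grouped_mean, 2))  # 53.17
```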

    The median is the middle value of all data when arranged in an array (ascending or descending order). The median divides a distribution in half: half of all observed values are higher than the median, half are lower than the median. For nongrouped data, it is the middle value; e.g., for the set of numbers (3, 4, 5, 5, 6, 7, 7, 7, 8), the median is 6. It is the fifth value (in ascending or descending order) in an array of 9 numbers. For grouped data, the easiest way to get the median is to read the 50 percentile point off a cumulative frequency distribution curve.

    The mode is the value that occurs most frequently, i.e., the most common single value. For example, in nongrouped data, for the set of numbers (3, 4, 5, 5, 6, 7, 7, 7, 8) the mode is 7. For the set of numbers (3, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9), both 5 and 8 are modes, and the data is said to be bimodal. For grouped data, the mode is estimated as the peak of the frequency distribution curve. For a perfectly symmetrical distribution, the mean, median, and mode will be the same.
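The three measures of central tendency can be checked with Python's standard statistics module, using the same number sets as the text:

```python
import statistics

data = [3, 4, 5, 5, 6, 7, 7, 7, 8]   # the set used in the text

print(statistics.mean(data))      # arithmetic mean of the nine values
print(statistics.median(data))    # middle value of the array: 6
print(statistics.mode(data))      # most frequent single value: 7

bimodal = [3, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 9]
print(statistics.multimode(bimodal))  # both modes: [5, 8]
```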

    1.2.4.2 Measures of Dispersion

    Measures of dispersion are measures that describe how far the data spread from the center. The variance and standard deviation are statistical values that describe the magnitude of variation around the mean, with the variance defined as:

    S² = [Σ(xi − x̄)²]/(N − 1) (1.3)

    where S²=variance of the data, N=sample size, number of observations. All other variables are previously defined.

    The standard deviation is the square root of the variance. It can be seen from the equation that what you are measuring is the distance of each data point from the mean. This equation can also be rewritten as:

    S² = [Σxi² − (Σxi)²/N]/(N − 1) (1.4)

    For grouped data, the standard deviation is found from:

    s = √{[Σ fj mj² − N x̄²]/(N − 1)} (1.5)

    where all variables are as previously defined. The standard deviation (STD) may also be estimated as:

    STD ≈ (P85 − P15)/2 (1.6)

    where P85=85th percentile value of the distribution (i.e., 85% of all data is at this value or less), P15=15th percentile value of the distribution (i.e., 15% of all data is at this value or less).

    The xth percentile is defined as that value below which x% of the outcomes fall. P85 is the 85th percentile, often used in traffic speed studies; it is the speed that encompasses 85% of vehicles. P50 is the 50th percentile speed or the median.
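A minimal Python sketch of these dispersion measures, using hypothetical speed data: statistics.variance uses the N−1 divisor of Eq. (1.3), and the percentile-based estimate follows Eq. (1.6):

```python
import math
import statistics

# Hypothetical spot speeds (km/h):
speeds = [52.0, 55.5, 48.0, 60.0, 49.5, 57.0, 53.5, 51.0]

var = statistics.variance(speeds)   # S^2 with an N-1 divisor, as in Eq. (1.3)
std = statistics.stdev(speeds)      # standard deviation = sqrt(variance)
assert math.isclose(std, math.sqrt(var))

# Percentile-based estimate of the standard deviation, Eq. (1.6):
q = statistics.quantiles(speeds, n=100)   # 99 cut points; q[84] ~ P85, q[14] ~ P15
std_estimate = (q[84] - q[14]) / 2
print(round(std, 2), round(std_estimate, 2))
```

With so few data points the percentile-based estimate is rough; it is most useful for field data such as speed studies where P85 and P15 are read directly off a cumulative frequency curve.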

    1.3 Applications of Normal Distribution

    One of the most common statistical distributions is the normal distribution, known by its characteristic bell-shaped curve. The normal distribution is a continuous distribution. Probability is indicated by the area under the probability density function f(x) between specified values, such as P(40<x<50).

    The equation for the normal distribution function is:

    f(x) = [1/(σ√(2π))] e^[−(x−μ)²/(2σ²)] (1.7)

    where x=normally distributed statistic, μ=true mean of the distribution, σ=true standard deviation of the distribution, π=3.14.

    The probability of any occurrence between values x1 and x2 is given by the area under the distribution function between the two values. The area may be found by integration between the two limits. Likewise, the mean, μ, and the variance, σ², can be found through integration. The normal distribution is the most common distribution, because any process that is the sum of many parts tends to be normally distributed. Speed, travel time, and delay are all commonly described using the normal distribution. The function is completely defined by two parameters: the mean and the variance. All other values in Eq. (1.7), including π, are constants. The notation for a normal distribution is x: N[μ, σ²], which means that the variable x is normally distributed with a mean of μ and a variance of σ².

    1.3.1 The Standard Normal Distribution

    For the normal distribution, the integration cannot be done in closed form due to the complexity of the equation for f(x); thus, tables for a standard normal distribution, with zero mean (μ=0) and unit variance (σ²=1), are constructed. The standard normal is denoted z: N[0,1]. Any value of x on any normal distribution, denoted x: N[μ,σ²], can be converted to an equivalent value of z on the standard normal distribution. This can also be done in reverse when needed. The translation of an arbitrary normal distribution of values of x to equivalent values of z on the standard normal distribution is accomplished as:

    z = (x − μ)/σ (1.8)

    where z=equivalent statistic on the standard normal distribution, z: N[0,1]; x=statistic on any arbitrary normal distribution, x: N[μ,σ²] other variables as previously defined.
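The conversion of Eq. (1.8) can be sketched in Python; the standard normal cumulative probability is computed with the math library's error function rather than a printed table, and the mean and standard deviation below are hypothetical:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability of the standard normal z: N[0,1]."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_prob(x1, x2, mu, sigma):
    """P(x1 < x < x2) for x: N[mu, sigma^2], converting x to z via Eq. (1.8)."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return phi(z2) - phi(z1)

# Hypothetical speeds with mean 45 km/h and standard deviation 5 km/h;
# the interval (40, 50) is one standard deviation each side of the mean:
print(round(normal_prob(40, 50, mu=45, sigma=5), 3))  # 0.683
```

The printed value reproduces the 68.3% figure discussed in the next subsection.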

    1.3.2 Characteristics of the Normal Distribution Function

    The foregoing discussion allows one to compute relevant areas under the normal curve. Some numbers occur frequently in practice, and it is useful to have them in mind. For instance, what is the probability that the next observation will be within one standard deviation of the mean, given that the distribution is normal? That is, what is the probability that x is in the range (μ±1.00σ)? Using the standard normal table, one finds that this probability is 68.3%.

    The following ranges have frequent use in statistical analysis involving normal distributions:

    • 68.3% of the observations are within μ±1.00σ

    • 95.0% of the observations are within μ±1.96σ

    • 95.5% of the observations are within μ±2.00σ

    • 99.7% of the observations are within μ±3.00σ

    The total probability under the normal curve is 1.00, and the normal curve is symmetric around the mean. It is also useful to note that the normal distribution is asymptotic to the x-axis. These critical characteristics will prove to be useful throughout the text.

    1.4 Confidence Bounds

    What would happen if we asked everyone in class (70 people) to collect 50 samples of speed data and to compute their own estimate of the mean. How many estimates would there be? What distribution would they have? There would be 70 estimates and the histogram of these 70 means would look normally distributed. Thus the estimate of the mean is itself a random variable that is normally distributed.

    Usually we compute only one estimate of the mean (or any other quantity), but in this class exercise we are confronted with the reality that there is a range of outcomes. We may, therefore, ask how good is our estimate of the mean. How confident are we that our estimate is correct? Consider that:

    1. The estimate of the mean quickly tends to be normally distributed.

    2. The expected value (the true mean) of this distribution is the unknown fixed mean of the original distribution.

    3. The standard deviation of this new distribution of means is the standard deviation of the original distribution divided by the square root of the number of samples, N. (This assumes independent samples and infinite population.)

    The standard deviation of this distribution of the means is called the standard error of the mean (E):

    E = s/√N (1.9)

    where the sample standard deviation, s, is used to estimate σ. The sample mean x̄ then approximates the true population mean, μ, within the following bounds:

    x̄ ± E, with 68.3% confidence

    x̄ ± 1.96E, with 95% confidence

    x̄ ± 3.00E, with 99.7% confidence

    The ±term (E, 1.96E, or 3.00E, depending upon the confidence level) in the above equation is also called the tolerance and is given the symbol e.

    Consider the following: 54 speeds are observed, and the mean is computed as 47.8 km/h, with a standard deviation of 7.80 km/h. What are the 95% confidence bounds?

    The standard error is E = 7.80/√54 = 1.06 km/h, so the 95% confidence bounds are 47.8 ± 1.96(1.06) = 47.8 ± 2.1 km/h. Thus it is said that there is a 95% chance that the true mean lies between 45.7 and 49.9 km/h. Further, while not proven here, any random variable consisting of sample means tends to be normally distributed for reasonably large n, regardless of the original distribution of individual values.
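The bounds computation can be sketched in Python using Eq. (1.9) and the worked example's numbers:

```python
from math import sqrt

def confidence_bounds(mean, s, n, z=1.96):
    """Interval mean +/- z*E, where E = s/sqrt(n) is the standard error
    of the mean, Eq. (1.9); z = 1.96 gives 95% confidence."""
    e = z * s / sqrt(n)
    return mean - e, mean + e

# Worked example from the text: 54 speeds, mean 47.8 km/h, std 7.80 km/h
lo, hi = confidence_bounds(47.8, 7.80, 54)
print(round(lo, 1), round(hi, 1))  # 45.7 49.9
```

Passing z=3.00 instead would give the wider 99.7% bounds.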

    1.5 Determination of Sample Size

    We can rewrite the equation for confidence bounds to solve for N, given that we want to achieve a specified tolerance and confidence. Resolving the 95% confidence bound equation for N gives:

    N = (1.96² × s²)/e² (1.10)

    where 1.96² is used only for 95% confidence. If 99.7% confidence is desired, then the 1.96² would be replaced by 3².

    Consider another example: With 99.7% and 95% confidence, estimate the true mean of the speed on a highway, ±1 mph. We know from previous work that the standard deviation is 7.2 km/h. How many samples do we need to collect?

    N = (3² × 7.2²)/1² = 466.6, or 467 samples for 99.7% confidence, and

    N = (1.96² × 7.2²)/1² = 199.1, or 200 samples for 95% confidence

    Consider further that a spot speed study is needed at a location with unknown speed characteristics. A tolerance of ±0.2 km/h and a confidence of 95% is desired. What sample size is required? Since the speed characteristics are unknown, a standard deviation of 5 km/h (a most common result in speed studies) is assumed. Then for 95% confidence, N=(1.96²×5²)/0.2²=2401 samples. This number is unreasonably high. It would be too expensive to collect such a large amount of data. Thus the choices are to either reduce the confidence or increase the tolerance. A 95% confidence level is considered the minimum that is acceptable: thus in this case, the tolerance would be increased. With a tolerance of 0.5 mph:

    N = (1.96² × 5²)/0.5² = 384.2, or about 384 samples. Thus the increase of just 0.3 mph in tolerance resulted in a decrease of 2017 samples required. Note that the sample size required is dependent on s, which was assumed at the beginning. After the study is completed and the mean and standard deviation are computed, N should be rechecked. If N is greater (i.e., the actual s is greater than the assumed s) then more samples may need to be taken.

    Another example: An arterial is to be studied, and it is desired to estimate the mean travel time to a tolerance of ±5 seconds with 95%, confidence. Based on prior knowledge and experience, it is estimated that the standard deviation of the travel times is about 15 seconds. How many samples are required?

    Based on an application of Eq. (1.10), N = 1.96²(15²)/(5²) = 34.6, which is rounded to 35 samples.

    As the data is collected, the s computed is 22 seconds, not 15 seconds. If the sample size is kept at 35, the tolerance becomes e = 1.96(22/√35), or about ±7.3 seconds. If the confidence bounds must be kept at ±5 seconds, then the sample size must be increased so that N ≥ 1.96²(22²)/(5²) = 74.4, or 75 samples. Additional data will have to be collected to meet the desired tolerance and confidence level.
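The sample-size formula of Eq. (1.10) is easy to wrap in a small helper; the calls below reproduce the numbers worked in the examples above, rounding up to the next whole sample:

```python
from math import ceil

def sample_size(s, e, z=1.96):
    """Minimum sample size from Eq. (1.10): N = z^2 * s^2 / e^2, rounded up."""
    return ceil((z * s / e) ** 2)

# Examples from the text:
print(sample_size(5, 0.2))   # 2401: spot speed study with +/-0.2 tolerance
print(sample_size(15, 5))    # 35: arterial travel-time study, assumed s = 15 s
print(sample_size(22, 5))    # 75: recomputed after s turns out to be 22 s
```

Using z=3.00 in place of the default gives the 99.7% confidence sample sizes.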

    1.6 Random Variables Summation

    One of the most common occurrences in probability and statistics is the summation of random variables, often in the form Y=a1X1+a2X2 or in the more general form:

    Y = Σ ai Xi (1.11)

    where the summation is over i, usually from 1 to n.

    It is relatively straightforward to prove that the expected value (or mean) μY of the random variable Y is given by:

    μY = Σ ai μi (1.12)

    and that, if the random variables Xi are independent, the variance of the random variable Y is given by:

    σY² = Σ ai² σi² (1.13)

    The fact that the coefficients, ai, are squared in the variance equation has great practical significance for us in all our statistical work.

    1.6.1 The Central Limit Theorem

    One of the most impressive and useful theorems in probability is that the sum of n similarly distributed random variables tends to the normal distribution, no matter what the initial, underlying distribution is. That is, the random variable Y=ΣXi, where the Xi have the same distribution, tends to the normal distribution.

    The words "tends to" can be read as "tends to look like" the normal distribution. In mathematical terms, the actual distribution of the random variable Y approaches the normal distribution asymptotically.

    1.6.1.1 Sum of Travel Times

    Consider a trip made up of 15 components, all with the same underlying distribution, each with a mean of 10 minutes and standard deviation of 3.5 minutes. The underlying distribution is unknown. What can you say about the total travel time?

    While there might be an odd situation to contradict this, n=15 should be quite sufficient to say that the distribution of total travel times tends to look normal. From Eq. (1.12), the mean of the distribution of total travel times is found by adding 15 terms (aiμi) where

    ai=1 and μi=10 minutes, or

    μy=15×(1×10)=150 minutes

    The variance of the distribution of total travel times is found from Eq. (1.13), where ai is again 1, and σi is 3.5 minutes. Then:

    σy²=15×(1²×3.5²)=183.75 minutes²

    The standard deviation, σy, is therefore 13.6 minutes.

    If the total travel times are taken to be normally distributed, 95% of all observations of total travel time will lie between the mean (150 minutes) ±1.96 standard deviations (13.6 minutes), or:

    Xy=150±1.96 (13.6)

    Thus 95% of all total travel times would be expected to fall within the range of 123–177 minutes (values rounded to the nearest minute).
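    The travel-time calculation above can be checked in a few lines (variable names are illustrative):

```python
import math

n, mean_i, sd_i = 15, 10.0, 3.5       # 15 trip components
mu_y = n * mean_i                      # Eq. (1.12): 150 minutes
var_y = n * sd_i ** 2                  # Eq. (1.13): 183.75 min^2
sd_y = math.sqrt(var_y)                # about 13.6 minutes

# 95% range: mean +/- 1.96 standard deviations
lo, hi = mu_y - 1.96 * sd_y, mu_y + 1.96 * sd_y
print(round(mu_y), round(sd_y, 1))     # 150 13.6
print(round(lo), round(hi))            # 123 177
```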

    1.6.1.2 Hourly Volumes

    Five-minute counts are taken, and they tend to look rather smoothly distributed but with some skewness (asymmetry). Based on many observations, the mean tends to be 45 vehicles in the 5-minute count, with a standard deviation of seven vehicles. What can be said of the hourly volume?

    The hourly volume is the sum of 12 five-minute distributions, which should logically be basically the same if traffic levels are stable. Thus the hourly volume will tend to look normal, and will have a mean computed using Eq. (1.12), with ai=1, μi=45 vehicles (veh), and n=12, or 12×(1×45)=540 veh/h. The variance is computed using Eq. (1.13), with ai=1, σi=7, and n=12, or 12×(1²×7²)=588 (veh/h)². The standard deviation is 24.2 veh/h. Based on the assumption of normality, 95% of hourly volumes would be between 540±1.96(24.2)=540±47 veh/h (rounded to the nearest whole vehicle).

    Note that the summation has had an interesting effect. The σ/μ ratio for the 5-minute count distribution was 7/45=0.156, but for the hourly volumes it is 24.2/540=0.045. This is due to the summation, which tends to remove extremes by canceling highs with lows and thereby introduces stability. The mean of the sum grows in proportion to n, but the standard deviation grows only in proportion to the square root of n.
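    The hourly-volume arithmetic, including the shrinking σ/μ ratio, can be verified directly (variable names are illustrative):

```python
import math

n, mean_5min, sd_5min = 12, 45, 7
mu_hour = n * mean_5min                    # Eq. (1.12): 540 veh/h
sd_hour = math.sqrt(n * sd_5min ** 2)      # Eq. (1.13): sqrt(588) ~ 24.2 veh/h

# 95% range: mean +/- 1.96 standard deviations (about +/- 47 veh/h)
lo, hi = mu_hour - 1.96 * sd_hour, mu_hour + 1.96 * sd_hour
print(mu_hour, round(sd_hour, 1))          # 540 24.2
print(round(lo), round(hi))                # 492 588

# Summation stabilizes: sigma/mu falls by a factor of sqrt(12)
print(round(sd_5min / mean_5min, 3), round(sd_hour / mu_hour, 3))  # 0.156 0.045
```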

    1.6.1.3 Sum of Normal Distributions

    Although not proven here, it is true that the sum of any two normal distributions is itself normally distributed. By extension, if one normal is formed by n1 summations of one underlying distribution and another normal is formed by n2 summations of another underlying distribution, the sum of the total also tends to the normal.

    Thus in the foregoing travel time example, not all of the elements had to have exactly the same distribution as long as subgroupings each tended to the normal.

    1.7 The Binomial Distributions

    1.7.1 Bernoulli and the Binomial Distribution

    The Bernoulli distribution is the simplest discrete distribution, consisting of only two possible outcomes: yes or no, heads or tails, one or zero, etc. The first occurs with probability p, and therefore the second occurs with probability q=1−p. This is modeled as:

    P(X=1)=p
    P(X=0)=1−p=q
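    A Bernoulli trial is easy to simulate; over many trials the share of "yes" outcomes converges to p. The function name, the choice probability of 0.3, and the seed below are illustrative assumptions, not from the text:

```python
import random

random.seed(7)

def bernoulli(p):
    """One Bernoulli trial: 1 ('yes') with probability p, else 0 ('no')."""
    return 1 if random.random() < p else 0

# e.g., suppose each driver independently chooses transit with p = 0.3
trials = [bernoulli(0.3) for _ in range(10000)]
share = sum(trials) / len(trials)
print(round(share, 2))   # close to 0.3
```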

    In traffic engineering, it represents any basic choice—to park or not to park; to take this route or that; to take auto or transit (for one individual). It is obviously more useful to look at more than one individual, however, which leads us to the binomial distribution. The binomial distribution can be thought of in two common
