
# Introduction to Statistical Thinking for Decision Making

This site builds up the basic ideas of business statistics systematically and correctly. It is a combination of lectures and computer-based practice, joining theory firmly with practice. It introduces techniques for summarizing and presenting data, estimation, confidence intervals and hypothesis testing. The presentation focuses more on understanding of key concepts and statistical thinking, and less on formulas and calculations, which can now be done on small computers through user-friendly Statistical JavaScript A, etc. A Spanish version of this site is available at Razonamiento Estadístico para la Toma de Decisiones Gerenciales, together with its collection of JavaScript.

Today's good decisions are driven by data. In all aspects of our lives, and importantly in the business context, an amazing diversity of data is available for inspection and analytical insight. Business managers and professionals are increasingly required to justify decisions on the basis of data. They need statistical model-based decision support systems. Statistical skills enable them to intelligently collect, analyze and interpret data relevant to their decision-making. Statistical concepts and statistical thinking enable them to:

- solve problems in a diversity of contexts.
- add substance to decisions.
- reduce guesswork.

This Web site is a course in statistics appreciation; i.e., acquiring a feel for the statistical way of thinking. It hopes to make sound statistical thinking understandable in business terms. An introductory course in statistics, it is designed to provide you with the basic concepts and methods of statistical analysis for processes and products. Materials in this Web site are tailored to help you make better decisions and to get you thinking statistically. A cardinal objective for this Web site is to embed statistical thinking into managers, who must often decide with little information.

In a competitive environment, business managers must design quality into products, and into the processes of making the products. They must facilitate a process of never-ending improvement at all stages of manufacturing and service. This is a strategy that employs statistical methods, particularly statistically designed experiments, and produces processes that provide high yield and products that seldom fail. Moreover, it facilitates development of robust products that are insensitive to changes in the environment and internal component variation. Carefully planned statistical studies remove hindrances to high quality and productivity at every stage of production. This saves time and money. It is well recognized that quality must be engineered into products as early as possible in the design process. One must know how to use carefully planned, cost-effective statistical experiments to improve, optimize and make robust products and processes.

Business Statistics is a science that assists you in making business decisions under uncertainty, based on numerical and measurable scales. Decision-making processes must be based on data, not on personal opinion or belief.

The Devil is in the Deviations: Variation is inevitable in life! Every process, every measurement, every sample has variation. Managers need to understand variation for two key reasons: first, so that they can lead others to apply statistical thinking in day-to-day activities, and second, so that they can apply the concept for the purpose of continuous improvement. This course will provide you with hands-on experience to promote the use of statistical thinking and techniques, so that you can make educated decisions whenever you encounter variation in business data. You will learn techniques to intelligently assess and manage the risks inherent in decision-making. Therefore, remember that, just as with the weather, if you cannot control something, you should learn how to measure and analyze it in order to predict it effectively.

Computer-assisted learning: Computer-assisted learning provides you with a "hands-on" experience which will enhance your understanding of the concepts and techniques covered in this site. JavaScript, once an esoteric programming language for animating Web pages, is now a full-fledged platform for building e-lab learning objects with useful applications. Just as you used to do experiments in physics labs to learn physics, computer-assisted learning enables you to use any online interactive tool available on the Internet to perform experiments. The purpose is the same; i.e., to understand statistical concepts by using statistical applets which are entertaining and educating.

The appearance of computer software, JavaScript, statistical demonstration applets, and online computation is the most important event in the process of teaching and learning concepts in model-based, statistical decision-making courses. These e-lab technologies allow you to construct numerical examples to understand the concepts, and to find their significance for yourself.

Unfortunately, most classroom courses are not learning systems. The way the instructors attempt to help their students acquire skills and knowledge has absolutely nothing to do with the way students actually learn. Many instructors rely on lectures, tests, and memorization. All too often, they rely on "telling." No one remembers much that's taught by telling, and what's told doesn't translate into usable skills. Certainly, we learn by doing, failing, and practicing until we do it right. Computer-assisted learning serves this purpose.

A course in appreciation of statistical thinking gives business professionals an edge. Professionals with strong quantitative skills are in demand. This phenomenon will grow as the impetus for data-based decisions strengthens and the amount and availability of data increase. The statistical toolkit can be developed and enhanced at all stages of a career.
The decision-making process under uncertainty is largely based on the application of statistics for probability assessment of uncontrollable events (or factors), as well as risk assessment of your decision. For the foundation of decision making, visit the Operations/Operational Research site. For more statistics-based Web sites with decision-making applications, visit the Decision Science Resources and the Modeling and Simulation Resources sites.

The main objective for this course is to learn statistical thinking; to place more emphasis on concepts and less on theory and recipes; and finally to foster active learning using useful and interesting Web sites. It is already a known fact that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." So, let's be ahead of our time.

Further Readings:

- Chernoff H., A Conversation With Herman Chernoff, Statistical Science, Vol. 11, No. 4, 335-350, 1996.
- Churchman C., The Design of Inquiring Systems, Basic Books, New York, 1971. Early in the book he stated that knowledge could be considered as a collection of information, or as an activity, or as a potential. He also noted that knowledge resides in the user and not in the collection.
- Rustagi M., et al. (eds.), Recent Advances in Statistics: Papers in Honor of Herman Chernoff on His Sixtieth Birthday, Academic Press, 1983.

## The Birth of Probability and Statistics

The original idea of "statistics" was the collection of information about and for the "state". The word statistics derives directly, not from any classical Greek or Latin roots, but from the Italian word for state.

The birth of statistics occurred in the mid-17th century. A commoner named John Graunt, a native of London, began reviewing a weekly church publication issued by the local parish clerk that listed the number of births, christenings, and deaths in each parish. These so-called Bills of Mortality also listed the causes of death. Graunt, who was a shopkeeper, organized this data in the form we call descriptive statistics, which was published as Natural and Political Observations Made upon the Bills of Mortality. Shortly thereafter he was elected as a member of the Royal Society. Thus, statistics has to borrow some concepts from sociology, such as the concept of Population. It has been argued that since statistics usually involves the study of human behavior, it cannot claim the precision of the physical sciences.

Probability has a much longer history. Probability is derived from the verb to probe, meaning to "find out" what is not too easily accessible or understandable. The word "proof" has the same origin: it provides the necessary details to understand what is claimed to be true. Probability originated from the study of games of chance and gambling during the 16th century. Probability theory was a branch of mathematics studied by Blaise Pascal and Pierre de Fermat in the seventeenth century. Currently, in the 21st century, probabilistic modeling is used to control the flow of traffic through a highway system, a telephone interchange, or a computer processor; to find the genetic makeup of individuals or populations; and in quality control, insurance, investment, and other sectors of business and industry.

New and ever-growing diverse fields of human activity are using statistics; however, it seems that this field itself remains obscure to the public. Professor Bradley Efron expressed this fact nicely: "During the 20th Century statistical thinking and methodology have become the scientific framework for literally dozens of fields including education, agriculture, economics, biology, and medicine, and with increasing influence recently on the hard sciences such as astronomy, geology, and physics. In other words, we have grown from a small obscure field into a big obscure field."

## Statistical Modeling for Decision-Making under Uncertainties: From Data to the Instrumental Knowledge

In this diverse world of ours, no two things are exactly the same. A statistician is interested in both the differences and the similarities; i.e., both departures and patterns. The actuarial tables published by insurance companies reflect their statistical analysis of the average life expectancy of men and women at any given age. From these numbers, the insurance companies then calculate the appropriate premiums for a particular individual to purchase a given amount of insurance.

Exploratory analysis of data makes use of numerical and graphical techniques to study patterns and departures from patterns. The widely used descriptive statistical techniques are: frequency distributions; histograms; boxplots; scattergrams and error bar plots; and diagnostic plots. In examining the distribution of data, you should be able to detect important characteristics, such as shape, location, variability, and unusual values.

From careful observations of patterns in data, you can generate conjectures about relationships among variables. The notion of how one variable may be associated with another permeates almost all of statistics, from simple comparisons of proportions through linear regression. The difference between association and causation must accompany this conceptual development.

Data must be collected according to a well-developed plan if valid information on a conjecture is to be obtained. The plan must identify important variables related to the conjecture, and specify how they are to be measured. From the data collection plan, a statistical model can be formulated from which inferences can be drawn.
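As a rough illustration of examining a distribution for location, variability, and unusual values, here is a minimal Python sketch. The function name `describe`, the `weekly_sales` figures, and the choice of the 1.5 x IQR fence rule for flagging unusual values are all assumptions made for this example; they are not taken from this site.

```python
import statistics


def describe(values):
    """Summarize location, variability, and unusual values of a data set."""
    data = sorted(values)
    # Quartiles using the statistics module's default ("exclusive") method.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Assumed rule of thumb: flag points beyond 1.5 * IQR from the quartiles.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "n": len(data),
        "mean": statistics.mean(data),      # location, sensitive to outliers
        "median": median,                   # location, resistant to outliers
        "stdev": statistics.stdev(data),    # variability (sample std. dev.)
        "iqr": iqr,                         # variability, resistant to outliers
        "unusual": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical weekly sales figures with one unusually large week.
weekly_sales = [12, 15, 14, 13, 16, 15, 14, 48, 13, 15]
summary = describe(weekly_sales)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean well above the median, as in this made-up data, hints at a right-skewed shape driven by the single large value, which is the kind of departure from pattern that exploratory analysis is meant to surface.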

As an example of statistical modeling with managerial implications, such as "what-if" analysis, consider regression analysis. Regression analysis is a powerful technique for studying the relationship between dependent variables (i.e., output, performance measure) and independent variables (i.e., inputs, factors, decision variables). Summarizing relationships among the variables by the most appropriate equation (i.e., modeling) allows us to predict or identify the most influential factors and study their impacts on the output for any changes in their current values.

Frequently, for example, marketing managers are faced with the question, What Sample Size Do I Need? This is an important and common statistical decision, which should be given due consideration, since an inadequate sample size invariably leads to wasted resources. The sample size determination section provides a practical solution to this risky decision.

Statistical models are currently used in various fields of business and science. However, the terminology differs from field to field. For example, the fitting of models to data is variously called calibration, history matching, and data assimilation, all of which are synonymous with parameter estimation.

Your organization's database contains a wealth of information, yet the decision technology group members tap only a fraction of it. Employees waste time scouring multiple sources for a database. The decision-makers are frustrated because they cannot get business-critical data exactly when they need it. Therefore, too many decisions are based on guesswork, not facts. Many opportunities are also missed, if they are even noticed at all.

Knowledge is what we know well. Information is the communication of knowledge. In every knowledge exchange, there is a sender and a receiver. The sender makes common what is private; the sender does the informing, the communicating. Information can be classified into explicit and tacit forms. Explicit information can be explained in structured form, while tacit information is inconsistent and fuzzy to explain. Know that data are only crude information and not knowledge by themselves.

The sequence from data to knowledge is: from Data to Information, from Information to Facts, and finally, from Facts to Knowledge. Data become information when they become relevant to your decision problem. Information becomes fact when the data can support it. Facts are what the data reveal. However, the decisive instrumental (i.e., applied) knowledge is expressed together with some statistical degree of confidence. Fact becomes knowledge when it is used in the successful completion of a decision process. Once you have a massive amount of facts integrated as knowledge, then your mind will be superhuman in the same sense that mankind with writing is superhuman compared to mankind before writing. The following figure illustrates the statistical thinking process based on data in constructing statistical models for decision making under uncertainties.

Figure: The Path from Statistical Data to Managerial Knowledge
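The regression-based "what-if" analysis mentioned in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses ordinary least squares with a single input, and the names `fit_line` and `predict` as well as the advertising and sales figures are hypothetical, not data from this site.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Answer a what-if question: expected output at a new input level."""
    return intercept + slope * x_new


# Hypothetical data: advertising spend (input) and sales (output).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(ad_spend, sales)
what_if = predict(intercept, slope, 6.0)  # what if spend rises to 6.0?
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"predicted sales at ad_spend = 6.0: {what_if:.2f}")
```

The fitted slope estimates how much the output changes per unit change in the input, which is what makes the equation usable for what-if questions. Predictions far outside the observed range of the input should be treated with caution, and an association found this way is not by itself evidence of causation.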

## Statistical Decision-Making Process

Unlike deterministic decision-making processes, such as linear optimization by solving systems of equations and parametric systems of equations, in decision making under pure uncertainty the variables are often more numerous and more difficult to measure and control. However, the steps are the same. They are:

1. Simplification
2. Building a decision model
3. Testing the model
4. Using the model to find the solution:
   - It is a simplified representation of the actual situation.
   - It need not be complete or exact in all respects.
   - It concentrates on the most essential relationships and ignores the less essential ones.
   - It is more easily understood than the empirical (i.e., observed) situation, and hence permits the problem to be solved more readily with minimum time and effort.
5. It can be used again and again for similar problems or can be modified.

Fortunately, the probabilistic and statistical methods for analysis and decision making under uncertainty are more numerous and powerful today than ever before. The computer makes possible many practical applications. A few examples of business applications are the following:

- An auditor can use random sampling techniques to audit the accounts receivable for clients.
- A plant manager can use statistical quality control techniques to assure the quality of production with a minimum of testing or inspection.
- A financial analyst may use regression and correlation to help understand the relationship of a financial ratio to a set of other variables in business.
- A market researcher may use tests of significance to accept or reject hypotheses about a group of buyers to which the firm wishes to sell a particular product.
- A sales manager may use statistical techniques to forecast sales for the coming year.

## Questions Concerning the Statistical Decision-Making Process

1. Objectives or Hypotheses: What are the objectives of the study or the questions to be answered? What is the population to which the investigators intend to refer their findings?
2. Statistical Design: Is the study a planned experiment (i.e., primary data), or an analysis of records (i.e., secondary data)? How is the sample to be selected? Are there possible sources of selection bias, which would make the sample atypical or non-representative? If so, what provision is to be made to deal with this bias? What is the nature of the control group, standard of comparison, or cost? Remember that statistical modeling means reflection before action.
3. Observations: Are there clear definitions of variables, including classifications, measurements (and/or counting), and the outcomes? Is the method of classification or of measurement consistent for all the subjects and relevant to Item No. 1? Are there possible biases in measurement (and/or counting) and, if so, what provisions must be made to deal with them? Are the observations reliable and replicable (to defend your findings)?
4. Analysis: Are the data sufficient and worthy of statistical analysis? If so, are the necessary conditions of the methods of statistical analysis appropriate to the source and nature of the data? The analysis must be correctly performed and interpreted.
5. Conclusions: Which conclusions are justifiable by the findings? Which are not? Are the conclusions relevant to the questions posed in Item No. 1?
6. Representation of Findings: The findings must be represented clearly, objectively, and in sufficient but non-technical terms and detail to enable the decision-maker (e.g., a manager) to understand and judge them for himself or herself. Are the findings internally consistent; i.e., do the numbers add up properly? Can the different representations be reconciled?
7. Managerial Summary: When your findings and recommendations are not clearly put, or not framed in a manner understandable by the decision maker, the decision maker will not feel convinced of the findings and therefore will not implement any of the recommendations. You will have wasted time and money for nothing.

## What is Business Statistics?

The main objective of Business Statistics is to make inferences (e.g., prediction, making decisions) about certain characteristics of a population based on information contained in a random sample from the entire population. The condition of randomness is essential to make sure the sample is representative of the population.
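The idea of inferring population characteristics from a random sample can be illustrated with a short simulation. This is a minimal sketch, not a method prescribed by this site: the population is simulated rather than real, and its size (10,000), mean (100), standard deviation (15), the sample size (200), and the seed are arbitrary assumptions chosen for the example.

```python
import random
import statistics

# Fixed seed so the illustration is reproducible (an arbitrary choice).
rng = random.Random(42)

# Hypothetical finite population, e.g. 10,000 customer order values.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameters: computed from the entire population (rarely possible in practice).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Simple random sample without replacement: every unit is equally likely.
sample = rng.sample(population, 200)

# Statistics: computed from the sample only, used to estimate the parameters.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"population mean {mu:.2f}  vs  sample mean {x_bar:.2f}")
print(f"population s.d. {sigma:.2f}  vs  sample s.d. {s:.2f}")
```

Because the sample is drawn at random, the sample mean and standard deviation tend to land close to the population values, and the remaining gap is the sampling error that inferential statements quantify. A non-random selection, such as taking only the first 200 records of a sorted file, would carry no such assurance.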

Business Statistics is the science of "good" decision making in the face of uncertainty and is used in many disciplines, such as financial analysis, econometrics, auditing, production and operations, and marketing research. It provides knowledge and skills to interpret and use statistical techniques in a variety of business applications. A typical Business Statistics course is intended for business majors, and covers statistical study, descriptive statistics (collection, description, analysis, and summary of data), probability, the binomial and normal distributions, tests of hypotheses and confidence intervals, linear regression, and correlation.

Statistics is a science of making decisions with respect to the characteristics of a group of persons or objects on the basis of numerical information obtained from a randomly selected sample of the group. Statisticians refer to this numerical observation as a realization of a random sample. However, notice that one cannot see a random sample. A random sample is only a sample of a finite number of outcomes of a random process.

At the planning stage of a statistical investigation, the question of sample size (n) is critical. For example, the sample size for sampling from a finite population of size N is set at $\sqrt{N} + 1$, rounded up to the nearest integer. Clearly, a larger sample provides more relevant information, and as a result a more accurate estimation and better statistical judgement regarding tests of hypotheses.

Under-lit Streets and the Crime Rate: It is a fact that if residential city streets are under-lit, then major crimes take place therein. Suppose you work in the Mayor's office and have been put in charge of helping him or her decide which manufacturers to buy light bulbs from in order to reduce the crime rate by at least a certain amount, given a limited budget.

Figure: Activities Associated with the General Statistical Thinking and Its Applications

The above figure illustrates the idea of statistical inference from a random sample about the population. It also provides estimation for the population's parameters, namely the expected value $\mu_x$, the standard deviation $\sigma_x$, and the cumulative distribution function (cdf) $F_x$, and their corresponding sample statistics: the mean $\bar{x}$, the sample standard deviation $S_x$, and the empirical (i.e., observed) cumulative distribution function (cdf), respectively.

The major task of Statistics is the scientific methodology for collecting, analyzing, and interpreting a random sample in order to draw inference about some particular characteristic of a specific Homogeneous Population. For two major reasons, it is often impossible to study an entire population:

- The process would be too expensive or too time-consuming.
- The process would be destructive.

In either case, we would resort to looking at a sample chosen from the population and trying to infer information about the entire population by only examining the smaller sample.

Very often the numbers which interest us most about the population are the mean $\mu$ and standard deviation $\sigma$. Any number, like the mean or standard deviation, which is calculated from an entire population is called a Parameter. If the very same numbers are derived only from the data of a sample, then the resulting numbers are called Statistics. Frequently, Greek letters represent parameters and Latin letters represent statistics (as shown in the above Figure).

The uncertainties in extending and generalizing sampling results to the population are measured and expressed by probabilistic statements called Inferential