
Why Is It Important To Select The Best Fitting Distribution?

Probability distributions can be viewed as a tool for dealing with uncertainty: you use distributions to perform specific calculations, and apply the results to make well-grounded business decisions. However, if you use the wrong tool, you will get wrong results. If you select and apply an inappropriate distribution (one that does not fit your data well), your subsequent calculations will be incorrect, and that is likely to result in wrong decisions. In many industries, the use of incorrect models can have serious consequences, such as the inability to complete tasks or projects on time, leading to substantial losses of time and money, or a wrong engineering design resulting in damage to expensive equipment. In some areas, such as hydrology, using appropriate distributions can be even more critical. Distribution fitting allows you to develop valid models of the random processes you deal with, protecting you from the time and money losses that can arise from invalid model selection, and enabling you to make better business decisions.

Which Distribution Should I Choose? (Overview)


Over the last several centuries, numerous probability distributions have been developed to address the data analysis needs of various industries, and a number of statistical methods exist to assist you in selecting the best fitting distribution. In most cases, you need to fit two or more distributions, compare the results, and select the most valid model. The "candidate" distributions you fit should be chosen depending on the nature of your data. For example, if you need to analyze the time between failures of technical devices, you should fit nonnegative distributions such as Exponential or Weibull, since the failure time cannot be negative. You can also apply other identification methods based on the properties of your data. For example, you can build a histogram, determine whether the data are symmetric, left-skewed, or right-skewed, and use distributions which have the same shape. To actually fit the "candidate" distributions you selected, you need to employ statistical methods that estimate the distribution parameters from your sample data. Solving this problem involves algorithms implemented in specialized software. After the distributions are fitted, it is necessary to determine how well they fit your data. This can be done using specific goodness of fit tests, or visually by comparing the empirical (based on sample data) and theoretical (fitted) distribution graphs. As a result, you will select the most valid model describing your data.
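
The shape check described above can be sketched in a few lines of Python. This is only an illustration: the sample values, the function names, and the 0.5 skewness tolerance are assumptions made for the example, not part of any particular software package.

```python
import statistics

def sample_skewness(data):
    """Skewness as the average cubed z-score of the sample."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - mean) / sd) ** 3 for x in data) / len(data)

def describe_shape(data, tolerance=0.5):
    """Classify the sample shape; the 0.5 tolerance is an arbitrary choice."""
    skew = sample_skewness(data)
    if skew > tolerance:
        return "right-skewed"
    if skew < -tolerance:
        return "left-skewed"
    return "roughly symmetric"

# Hypothetical times between failures, in hours
times_between_failures = [12, 15, 18, 22, 25, 31, 40, 55, 90, 160]

shape = describe_shape(times_between_failures)
nonnegative = min(times_between_failures) >= 0
print(shape, nonnegative)
```

A right-skewed, nonnegative sample like this one would point toward candidates such as Exponential or Weibull rather than a symmetric model.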

How Can I Apply The Selected Distribution?


What kind of information can you obtain using the distribution you selected, and how can you apply that information to make business decisions?

Calculate Probabilities
Calculating probabilities is the most common way of using distributions. Once you calculate the probability, you can use it to make informed decisions: for instance, if the probability of a good outcome is high enough, then the decision you are about to make is probably correct. Some of the typical answers you can get using probability distributions in various industries:
- investment: the probability of the stock price being less than $50
- actuarial science: the probability of a claim size higher than $10,000
- market research: the probability of a purchase between $500 and $750
- customer support: the probability that a customer will be served in less than 5 minutes
- project management: the probability that the project will be completed in 6 months
- reliability engineering: the probability that the device will not fail during 5 years
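
As a minimal sketch of the market research question above, suppose (purely for illustration) that purchase amounts were fitted with a Normal distribution with a mean of $600 and a standard deviation of $100. These parameters are assumptions, not values from real data. Each probability is a difference of CDF values:

```python
from statistics import NormalDist

# Hypothetical fitted model of purchase amounts, in dollars
purchase = NormalDist(mu=600, sigma=100)

p_under_500 = purchase.cdf(500)                      # P(X < 500)
p_between = purchase.cdf(750) - purchase.cdf(500)    # P(500 < X < 750)
p_over_750 = 1.0 - purchase.cdf(750)                 # P(X > 750)

print(f"P(purchase between $500 and $750) = {p_between:.3f}")
```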

Make Estimates And Projections


Making estimates or projections is an inverse problem requiring you to set a fixed, desired probability value. For example, in project management, assuming that a project should be finished on time and on budget with 95% probability, you can obtain a realistic time estimate which takes into account good or bad weather, timely supply of materials, oil prices, strikes, and other factors affecting your business. Another example: you are an engineer, and need to determine an appropriate warranty term for the device you are designing. You would like to ensure that the device will not fail during the warranty term with 99% probability (i.e. on average, 1 out of 100 devices will fail during that term). Based on the fixed probability, you can calculate how many hours the device can work properly, and make your design decisions using this estimate.
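
The warranty example can be sketched with a two-parameter Weibull model, whose inverse CDF has a closed form. The shape and scale values below are hypothetical, chosen only to make the example concrete. A 99% probability of no failure corresponds to a failure probability of 0.01 at the end of the warranty term.

```python
import math

def weibull_cdf(t, shape, scale):
    """Probability that the device fails before time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_inverse_cdf(p, shape, scale):
    """Time by which a fraction p of devices is expected to have failed."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Hypothetical fitted parameters (scale in hours)
shape, scale = 1.5, 20000.0

# Warranty term during which only 1% of devices are expected to fail
warranty_hours = weibull_inverse_cdf(0.01, shape, scale)
print(f"Warranty term: about {warranty_hours:.0f} hours")
```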

Calculate Statistics
The most frequently used statistic is the distribution mean (the expected value), representing the average amount you can expect as the outcome if a large number of observations is considered. You can use the mean value to take a quick look at your data; however, you should not base your decisions on this statistic alone. The standard deviation indicates the spread of your data about the mean, and one of the most obvious applications of this statistic is in finance and investment, where it is used to determine the volatility, as well as to quantify the risk associated with a given security or a portfolio of securities. Another useful statistic is the mode, which indicates the most likely outcome. For instance, in project management, this statistic is quite frequently used to determine the most likely amount of time required for successful project completion.
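
To illustrate how these statistics differ, here is a small sketch using a Triangular distribution, a model often used for project durations. The choice of distribution and the three-point estimate (4, 6, and 11 months) are assumptions made for this example.

```python
import math

def triangular_summary(low, mode, high):
    """Mean, mode, and standard deviation of a Triangular(low, mode, high) model."""
    mean = (low + mode + high) / 3.0
    variance = (low**2 + mode**2 + high**2
                - low * mode - low * high - mode * high) / 18.0
    return {"mean": mean, "mode": mode, "std_dev": math.sqrt(variance)}

# Hypothetical project duration estimate, in months
summary = triangular_summary(low=4, mode=6, high=11)
print(summary)
```

Here the most likely duration (the mode) is shorter than the mean, because the long right tail pulls the average up.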

Specific Applications
There are a large number of applications of probability distributions in specific industries, to name just a few:
- queueing systems: service time, waiting time calculation
- transportation and logistics: MDBF (Mean Distance Between Failures) calculation
- survival analysis: survival probability, hazard rate estimation
- telecommunication: signal fading modeling
- reliability engineering: MTBF (Mean Time Between Failures), MTTF (Mean Time To Failure), failure probability, and failure rate estimation
- mining: concentration analysis
- hydrology: exceedance probability and return period calculation
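
As one example from the list, the hydrology calculation can be sketched as follows. The return period is the reciprocal of the annual exceedance probability. The Gumbel model for annual maximum flows and its parameters are assumptions chosen for illustration.

```python
import math

def gumbel_cdf(x, loc, scale):
    """Probability that the annual maximum does not exceed x."""
    return math.exp(-math.exp(-(x - loc) / scale))

def return_period(x, loc, scale):
    """Average number of years between exceedances of level x."""
    exceedance_probability = 1.0 - gumbel_cdf(x, loc, scale)
    return 1.0 / exceedance_probability

def level_for_return_period(years, loc, scale):
    """Level exceeded on average once every `years` years."""
    non_exceedance = 1.0 - 1.0 / years
    return loc - scale * math.log(-math.log(non_exceedance))

# Hypothetical fitted parameters for annual maximum flow (cubic meters per second)
loc, scale = 500.0, 120.0
flow_100_year = level_for_return_period(100, loc, scale)
print(f"100-year flow: {flow_100_year:.0f}")
```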

The Distribution Fitting Process


Once you have selected the candidate distributions which can supposedly provide a good fit (see the article above), you are ready to actually fit these distributions to your data. The process of fitting distributions involves statistical methods that estimate the distribution parameters from the sample data. This is where distribution fitting software can be very useful: it implements the parameter estimation methods for the most commonly used distributions, so you can save time and focus on the data analysis itself. If you are fitting several different distributions, which is usually the case, you need to estimate the parameters of each distribution separately.

The input of distribution fitting software usually includes:
- Your data in one of the accepted formats
- Distributions you want to fit
- Distribution fitting options

The distribution fitting results include the following elements:
- Graphs of your input data
- Parameters of the fitted distributions
- Graphs of the fitted distributions
- Additional graphs and tables helping you select the best fitting distribution
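
The text does not say which estimation method a given package uses. As one possibility, the sketch below estimates Weibull parameters by maximum likelihood, solving the likelihood equation for the shape by bisection. The function name, the search interval, and the simulated sample are all assumptions made for the example.

```python
import math
import random

def fit_weibull_mle(data, lo=0.01, hi=100.0, iterations=200):
    """Maximum likelihood estimates (shape, scale) for positive data."""
    biggest = max(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(logs)

    def score(shape):
        # Likelihood equation for the shape; its root is the estimate.
        # Dividing by the largest value avoids overflow in the powers.
        weights = [(x / biggest) ** shape for x in data]
        weighted_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted_log - 1.0 / shape - mean_log

    # score() increases with shape, so bisection finds the root
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    mean_power = sum((x / biggest) ** shape for x in data) / len(data)
    scale = biggest * mean_power ** (1.0 / shape)
    return shape, scale

# Simulated sample standing in for real data: true shape 1.5, true scale 1000
random.seed(42)
sample = [random.weibullvariate(1000.0, 1.5) for _ in range(500)]

shape, scale = fit_weibull_mle(sample)
print(f"shape = {shape:.3f}, scale = {scale:.1f}")
```

Even this simple case needs an iterative solver, which is one reason parameter estimation is usually left to software.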

Manual Distribution Fitting


The process of fitting probability distributions to data is usually computationally intensive, and it is not feasible to perform this task using manual methods. However, sometimes you might already know the underlying distribution. For example, if you are analyzing the distribution of the customer service time, you might want to narrow your choice to the Exponential distribution which is quite frequently used for this kind of analysis. You would only need to estimate the distribution parameters based on the sample data, which can be easily done using distribution fitting software. In addition, you might already know not just the distribution model, but also the approximate values of the parameters of this model, based on the nature of your data. In this case, the goal of your analysis might be to verify whether your assumption regarding the probability distribution is correct.
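
Verifying an assumed model can be sketched with the Kolmogorov-Smirnov statistic. The example below assumes an Exponential service time with a known mean of 3 minutes and compares the statistic with the usual large-sample 5% critical value, 1.358 divided by the square root of n. That critical value applies when the parameters are specified in advance rather than estimated from the same sample. The data are simulated and all names are illustrative.

```python
import math
import random

def exponential_cdf(x, mean):
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and the assumed CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Simulated service times (minutes) standing in for observed data
random.seed(1)
service_times = [random.expovariate(1 / 3.0) for _ in range(200)]

critical = 1.358 / math.sqrt(len(service_times))   # approximate 5% level

d_assumed = ks_statistic(service_times, lambda x: exponential_cdf(x, 3.0))
d_wrong = ks_statistic(service_times, lambda x: exponential_cdf(x, 30.0))

print(f"assumed mean 3:  D = {d_assumed:.3f}, reject = {d_assumed > critical}")
print(f"assumed mean 30: D = {d_wrong:.3f}, reject = {d_wrong > critical}")
```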

Automated Distribution Fitting


One of the benefits of using distribution fitting software for probability data analysis is the ability to automatically fit a large number of distributions to your data in a batch. This is the preferred mode of operation if you have little or no information about the underlying probability distribution you are trying to determine.
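
A batch fit can be sketched as a loop over candidate distributions that skips any candidate whose domain does not match the data. The three candidates, the closed-form maximum likelihood estimators, and the use of log-likelihood as a provisional ordering are assumptions made for this example. Real software typically supports many more distributions and ranks them with goodness of fit tests.

```python
import math
import statistics

def fit_exponential(data):
    rate = 1.0 / statistics.fmean(data)
    loglik = sum(math.log(rate) - rate * x for x in data)
    return {"rate": rate}, loglik

def fit_normal(data):
    mu, sigma = statistics.fmean(data), statistics.pstdev(data)
    loglik = sum(-math.log(sigma * math.sqrt(2 * math.pi))
                 - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)
    return {"mu": mu, "sigma": sigma}, loglik

def fit_lognormal(data):
    logs = [math.log(x) for x in data]
    params, loglik_of_logs = fit_normal(logs)
    # Change of variables from ln(x) back to x
    return params, loglik_of_logs - sum(logs)

# (name, fitting function, check that the data lie in the distribution's domain)
CANDIDATES = [
    ("Exponential", fit_exponential, lambda d: min(d) >= 0),
    ("Normal", fit_normal, lambda d: True),
    ("Lognormal", fit_lognormal, lambda d: min(d) > 0),
]

def fit_batch(data):
    """Fit every applicable candidate; highest log-likelihood first."""
    results = []
    for name, fit, supports in CANDIDATES:
        if not supports(data):
            continue
        params, loglik = fit(data)
        results.append((name, params, loglik))
    return sorted(results, key=lambda r: r[2], reverse=True)

# Hypothetical sample
data = [0.4, 0.9, 1.3, 2.1, 2.8, 3.5, 4.9, 6.2, 8.0, 11.5]
results = fit_batch(data)
for name, params, loglik in results:
    print(name, params, round(loglik, 2))
```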

Selecting The Best Fitting Distribution


After the distributions are fitted, you can compare them and select the best fitting model. There are a number of statistical methods and tools available which can help you perform this task. These tools are usually implemented in distribution fitting software in the form of various graphs and tables displayed along with the estimated distribution parameters.

Distribution Graphs
The distribution graphs enable you to:
- Visually assess the goodness of fit of a certain distribution
- Compare several fitted models

Some of the graphs display both your input data (e.g. the histogram) and fitted distributions at the same time:
- Probability Density Function Graph
- Cumulative Distribution Function Graph

The following graphs display the fitted distributions only:
- P-P Plot
- Q-Q Plot
- Probability Difference Graph

Each graph has its own meaning and interpretation. Typically, distribution fitting software will display these graphs for one or several fitted distributions, depending on your choice. In manual fitting mode, the graphs update automatically while you modify the distribution parameters, making the process of fitting more interactive.
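
The points behind the P-P plot, Q-Q plot, and probability difference graph can be computed without any plotting library. The sketch below does this for an Exponential fit. The sample, the plotting position (i - 0.5)/n, and the axis order are assumptions, and conventions vary between packages.

```python
import math
import statistics

def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)

def exponential_inverse_cdf(p, rate):
    return -math.log(1.0 - p) / rate

def plot_points(data, rate):
    """Coordinates for P-P, Q-Q, and probability difference graphs."""
    xs = sorted(data)
    n = len(xs)
    rows = []
    for i, x in enumerate(xs, start=1):
        empirical = (i - 0.5) / n                 # empirical probability
        theoretical = exponential_cdf(x, rate)    # fitted probability
        rows.append({
            "pp": (empirical, theoretical),
            "qq": (exponential_inverse_cdf(empirical, rate), x),
            "difference": empirical - theoretical,
        })
    return rows

# Hypothetical service times in minutes; rate estimated as 1 / mean
service_times = [0.4, 0.9, 1.3, 2.1, 2.8, 3.5, 4.9, 6.2, 8.0, 11.5]
rate = 1.0 / statistics.fmean(service_times)

rows = plot_points(service_times, rate)
for row in rows:
    print(row)
```

For a good fit, the P-P and Q-Q points lie close to the diagonal and the differences stay close to zero.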

Goodness of Fit Tests


As the name suggests, goodness of fit tests can be used to determine whether a certain distribution is a good fit. Calculating the goodness of fit statistics also enables you to order the fitted distributions according to how well they fit your data. This feature is very helpful for comparing the fitted models. The most commonly used goodness of fit tests are Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared. From the viewpoint of a user, the logic of applying these tests is the same; however, they differ in how they are performed (implemented). The Kolmogorov-Smirnov test can be considered the most widely used goodness of fit test.
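
The three statistics can be computed directly from a fitted CDF, and the fitted models can then be ordered by each criterion (smaller means a closer fit). The sketch below is illustrative only: the simulated data, the two candidate models, the bin edges for the Chi-Squared statistic, and the clamping of probabilities are assumptions, and no p-values are computed.

```python
import math
import random
import statistics
from statistics import NormalDist

def clamp(p, eps=1e-12):
    """Keep a probability strictly inside (0, 1) so that log() is defined."""
    return min(max(p, eps), 1.0 - eps)

def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov statistic for sorted data xs."""
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))

def anderson_darling_statistic(xs, cdf):
    """Anderson-Darling statistic for sorted data xs."""
    n = len(xs)
    total = sum((2 * i - 1) * (math.log(clamp(cdf(xs[i - 1])))
                               + math.log(clamp(1.0 - cdf(xs[n - i]))))
                for i in range(1, n + 1))
    return -n - total / n

def chi_squared_statistic(xs, cdf, edges):
    """Pearson statistic over the bins delimited by the given edges."""
    n = len(xs)
    bounds = [-math.inf] + list(edges) + [math.inf]
    statistic = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        observed = sum(1 for x in xs if lo < x <= hi)
        p_lo = 0.0 if lo == -math.inf else cdf(lo)
        p_hi = 1.0 if hi == math.inf else cdf(hi)
        expected = n * (p_hi - p_lo)
        statistic += (observed - expected) ** 2 / expected
    return statistic

# Simulated sample standing in for real data
random.seed(7)
data = sorted(random.expovariate(1 / 3.0) for _ in range(300))
mean = statistics.fmean(data)

def fitted_exponential_cdf(x, rate=1.0 / mean):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

fitted = {
    "Exponential": fitted_exponential_cdf,
    "Normal": NormalDist(mean, statistics.pstdev(data)).cdf,
}

edges = [1, 2, 3, 5, 8]   # arbitrary bin edges for the Chi-Squared statistic
scores = {
    name: {
        "KS": ks_statistic(data, cdf),
        "AD": anderson_darling_statistic(data, cdf),
        "Chi2": chi_squared_statistic(data, cdf, edges),
    }
    for name, cdf in fitted.items()
}

# Order the fitted models by each criterion, best first
rankings = {
    criterion: sorted(scores, key=lambda name: scores[name][criterion])
    for criterion in ("KS", "AD", "Chi2")
}
print(rankings)
```

Comparing the rankings under different criteria shows whether the choice of the best model depends on the test used.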

Applying The Selected Distribution


The ultimate goal of your analysis is to obtain the information which will help you make informed decisions under uncertainty. The information you need can be derived using the best fitting distribution, which is the model of the real-world random process you are dealing with.

Typical Applications
Some of the typical applications of probability distributions include:
- Calculating probabilities
- Making estimates
- Calculating statistics

The calculations can be done using the corresponding functions of the distribution you have selected, including the Cumulative Distribution Function (CDF), Inverse CDF, Hazard Function etc.

Calculating probabilities is one of the most popular applications: in a typical data analysis, you would define a good (desired) outcome, and calculate the probability of that outcome. If the probability is high enough, then the decision that will result in the desired outcome is worth making. On the other hand, if the probability is too low, then you should make the opposite decision. For example, if you are analyzing the distribution of the customer service time, the outcomes might look like:
- Good outcome: A customer can be served in 5 minutes or less
- Bad outcome: It takes more than 5 minutes to serve a customer

The corresponding decisions are:
- Decision A: Do not hire extra staff
- Decision B: Hire additional staff

The probabilities can be easily calculated using the Cumulative Distribution Function (CDF) of the selected distribution: for instance, CDF(5) represents the probability of the good outcome. If this value is below a certain fixed level (e.g. 95%), you might consider hiring additional staff to reduce the service time and improve the customer experience. A probability of 90% would mean that 10% of your customers have to wait longer and might be unhappy with the customer service your company offers.

Sometimes you might want to define more than two outcomes:
- Outcome A: A customer can be served in 5 minutes or less
- Outcome B: A customer can be served in 5 to 6 minutes
- Outcome C: It takes more than 6 minutes to serve a customer

The probabilities can be calculated in a similar way, and might look like:
- Probability(Outcome A) = 90%
- Probability(Outcome B) = 7%
- Probability(Outcome C) = 3%

In this case, your decision might be not to hire additional staff, because only 3% of your customers are served in more than 6 minutes.

Making estimates is an inverse problem requiring you to specify a fixed probability value. For example, you would like to estimate how long it takes to serve 95% of the customers. To make the estimate, you can use the Inverse Cumulative Distribution Function (ICDF) of the distribution you have selected: ICDF(0.95) = 5.5 minutes. The interpretation is that even though only 90% of the customers are served in 5 minutes or less (see the example above), another 5% wait at most 0.5 minutes (30 seconds) more, which is quite acceptable.

Calculating statistics can be useful to take a quick look at your data (note that it is not correct to base your decisions on the statistics alone). The most useful statistics include:
- Mean (the average value)
- Mode (the most likely value)

For example, you might find out that a customer is most likely to be served in 2 minutes, but there are many customers who require more time, so the average service time is 3 minutes.
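
The customer service calculations can be sketched end to end. The model here is an assumption: an Erlang distribution (a Gamma distribution with integer shape) with shape 3 and scale 1 minute, chosen because its mode is 2 minutes and its mean is 3 minutes. It is not fitted to any data, so the probabilities it produces differ from the illustrative figures quoted in the text.

```python
import math

SHAPE, SCALE = 3, 1.0   # hypothetical Erlang parameters (scale in minutes)

def cdf(x):
    """Probability that a customer is served within x minutes."""
    if x <= 0:
        return 0.0
    z = x / SCALE
    return 1.0 - math.exp(-z) * sum(z ** n / math.factorial(n) for n in range(SHAPE))

def inverse_cdf(p, lo=0.0, hi=1000.0, iterations=200):
    """Service time not exceeded with probability p, found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

p_a = cdf(5)              # Outcome A: served in 5 minutes or less
p_b = cdf(6) - cdf(5)     # Outcome B: served in 5 to 6 minutes
p_c = 1.0 - cdf(6)        # Outcome C: more than 6 minutes

time_95 = inverse_cdf(0.95)          # time needed to serve 95% of customers
mean = SHAPE * SCALE                 # average service time
mode = (SHAPE - 1) * SCALE           # most likely service time

hire_extra_staff = p_a < 0.95        # decision rule with a 95% target
print(f"A={p_a:.3f} B={p_b:.3f} C={p_c:.3f} ICDF(0.95)={time_95:.2f} min")
print(f"mean={mean} mode={mode} hire extra staff: {hire_extra_staff}")
```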

Specific Applications
Even though probability distributions can be applied in any industry dealing with random data, there are additional applications arising in specific industries (actuarial science, finance, reliability engineering, hydrology etc.), enabling business analysts, engineers and scientists to make informed decisions under uncertainty.

See: http://www.mathwave.com/articles/distribution-fitting-types.html

http://www.mathwave.com/articles/distribution-fitting-goodness-of-fit.html

http://www.mathwave.com/articles/distribution-fitting-graphs.html

Several topics are explained here: http://www.mathwave.com/articles/distribution_fitting.html

Burr (Singh-Maddala) Distribution



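Parameters
$k$ - shape parameter ($k > 0$)
$\alpha$ - shape parameter ($\alpha > 0$)
$\beta$ - scale parameter ($\beta > 0$)
$\gamma$ - location parameter ($\gamma \equiv 0$ yields the three-parameter Burr distribution)

Domain
$\gamma \le x < +\infty$

Probability Density Function (PDF)
$$f(x) = \frac{\alpha k \left(\frac{x-\gamma}{\beta}\right)^{\alpha-1}}{\beta \left(1 + \left(\frac{x-\gamma}{\beta}\right)^{\alpha}\right)^{k+1}}$$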

Choose the best fitting distribution. Compare it with the 2nd and 3rd best. Vary the criterion used to choose the best: Kolmogorov-Smirnov, Anderson-Darling, or Chi-Squared.

Explain what each of them means, etc.
