• Training set
• Validation set
• Test set

--------------------------------------------------------------------

• Confusion matrix:
  • True positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
  • True negatives (TN): We predicted no, and they don't have the disease.
  • False positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
  • False negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
• Accuracy: Overall, how often is the classifier correct?
  • (TP+TN)/total
• Misclassification Rate: Overall, how often is it wrong?
  • (FP+FN)/total
  • equivalent to 1 minus Accuracy
  • also known as "Error Rate"
• True Positive Rate: When it's actually yes, how often does it predict yes?
  • TP/actual yes
  • also known as "Sensitivity" or "Recall"
• False Positive Rate: When it's actually no, how often does it predict yes?
  • FP/actual no
• True Negative Rate: When it's actually no, how often does it predict no?
  • TN/actual no
  • equivalent to 1 minus False Positive Rate
  • also known as "Specificity"
• Precision: When it predicts yes, how often is it correct?
  • TP/predicted yes
• Exercise: Receiver Operating Characteristic (ROC) curve

Statistical decision making
• Parametric / Nonparametric
• Supervised / Unsupervised

Parametric decision making:
• Refers to the situation in which we know, or are willing to assume, the probability distribution function or density function for each class
• Before this function can be used, its parameters have to be estimated.
• Examples:
  • Poisson distribution
  • Normal distribution

Poisson distribution
• The Poisson distribution can be used to calculate the probabilities of various numbers of "successes" based on the mean number of successes.
• In order to apply the Poisson distribution, the various events must be independent.
• E.g.:
  • Suppose you knew that the mean number of calls to a fire station on a weekday is 8. What is the probability that on a given weekday there would be 11 calls?
    • With mean λ, P(k) = λ^k e^(-λ) / k!, so for λ = 8 and k = 11: P(11) = 8^11 e^(-8) / 11! ≈ 0.072
  • The number of photons emitted from an X-ray source during a given time interval

Normal distribution
If a feature is normally distributed for each class:
• Parameter estimation has to be done (the mean and standard deviation of each class)

Classification task:
To estimate the probabilities that a pattern belongs to various classes, based on a set of features.
E.g.: To estimate the probabilities that a patient has various diseases, given some symptoms or lab tests.
• From past experience, the probabilities of occurrence of these symptoms and test results are known.
• The probabilities of occurrence of these diseases in the population from which the patient came are also known.
This information can be processed mathematically to reach a decision.

Bayes' theorem – Bayesian decision making
• Bayesian decision making refers to choosing the most likely class, given the feature value
• The feature value is denoted by x
• The class of interest is C
• P(x) – probability distribution for feature x in the entire population
• P(C) – prior probability that a random sample is a member of class C
• P(x|C) – conditional probability of obtaining feature value x, given that the sample is from class C
• P(C|x) – probability that a sample belongs to class C, given that it has feature value x
• P(C|x) = (P(C) P(x|C))/ P(x)
Scenario: What is the probability that a person has a cold, given that he or she has a fever?
• Classes: has a cold (C) / does not have a cold
• Feature: x = has a fever
• Prior probability of a person having a cold: P(C) = 0.01
• Probability of having a fever, given that the person has a cold: P(x|C) = 0.4
• Probability of fever in the entire population: P(x) = 0.02
Probability that a person has a cold, given that he or she has a fever: P(C|x) = ?
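
The scenario can be worked through by plugging the three given numbers into Bayes' theorem. The following is a minimal sketch in Python. The function name `posterior` and the variable names are illustrative assumptions and do not come from the slides.

```python
def posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Bayes' theorem: P(C|x) = P(C) * P(x|C) / P(x).

    `posterior` is a hypothetical helper name used only for this example.
    """
    if evidence <= 0:
        raise ValueError("P(x) must be positive")
    return prior * likelihood / evidence


# Values taken from the scenario
p_cold = 0.01              # P(C): prior probability of having a cold
p_fever_given_cold = 0.4   # P(x|C): probability of a fever given a cold
p_fever = 0.02             # P(x): probability of a fever in the population

p_cold_given_fever = posterior(p_cold, p_fever_given_cold, p_fever)
print(f"P(cold | fever) = {p_cold_given_fever:.2f}")  # P(cold | fever) = 0.20
```

Observing a fever raises the probability of a cold from the prior of 0.01 to 0.2. Since 0.2 is below 0.5, the Bayesian decision is still "no cold", because that remains the more likely class.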