srmist

© All Rights Reserved


Overall objective:

The student is able to apply the basic concepts of statistics and the principles of scientific enquiry in planning and evaluating the results of dental practice; to participate in and conduct descriptive, exploratory and survey studies in dentistry; and to evaluate and apply the results of research studies in health, dental medicine and related fields to the practice of dentistry.

Behavioral objective: The student is able to

Design a study, identifying a population and methods of selection of the

sample required

Present data in appropriate tables, graphs and diagrams

Calculate averages, variation, linear correlation and regression.

Calculate confidence intervals and simple tests of significance using the normal, t, F and χ² distributions.

Compute commonly used vital and health statistics and estimate population using arithmetic progression methods.

Construct instruments for eliciting data through questioning observation

and measurement methods and techniques.

Quantify, analyze, describe and interpret data.

Critique dental studies.

Select and write a clear statement of a researchable problem.

Search and analyze the literature for facts and theory relating to the

problem.

Identify and state relevant assumption methods of selection of the sample

required.

further research

Prepare and write a scientific report of the study.

Methods of Teaching: Lectures and discussion with power point presentations

Seminars and practical with power point presentations

Methods of evaluation:

Regular attendance, Seminars, written test and dissertation

Suggested practical:

Each student will select and present a critique of a dental study.

Survey and assess selected studies in dentistry with particular reference to the research process. Individually selected problems are presented at each step of the research process for independent evaluation and discussion.

QUESTION PATTERN

Time: 1 hour

Short Answer           5 x 2    10 Marks
Short Note             5 x 6    30 Marks
Internal Assessment             10 Marks

UNIT    DESCRIPTION

I       1.1 Introduction and overview of Biostatistics
        1.2 Scope of Biostatistics
        1.3 Biostatistics in Dentistry
        1.4 Applying study result to patient care

II      (Central tendency, dispersion, plotting)
        2.2 Correlation and regression

III     3.2 Statistical inference with mean, proportion and normal deviate
        3.3 Sampling distributions (t, F, χ²)

IV      4.2 Non-Parametric tests
            a). Sign test
            b). Wilcoxon Signed Rank test
            c). Mann-Whitney U test
            d). Wald-Wolfowitz Run test
            e). Kruskal-Wallis test

V       5.2 Principle and various methods of research process
        5.3 Utilization of research, the results section of a research report and conclusions
        5.4 The checklist for reading the literature

STATISTICS

Different authors have defined statistics differently from time to time. A definition, however, must aim to lay down the meaning and scope of the subject. The word statistics is used in two senses, viz., singular and plural. In the plural sense it denotes numerical facts (data), whereas in the singular sense it denotes statistical methods.

Among these authors, F. E. Croxton and D. J. Cowden give a precise definition of statistics, and Prof. Horace Secrist gives the best definition.

According to Croxton and Cowden, the branch of mathematics that deals with the collection, classification, analysis and interpretation of numerical data is called statistics.

From this definition, the main divisions of statistics are:

i. Collection of data,

ii. Classification of data,

iii. Analysis of data,

iv. Interpretation of data.

Statistics is a field of study concerned with

(1) The collection, organization, summarization, and analysis of

data, and

(2) The drawing of inferences about a body of data when only a

part of the data is observed.

Simply put, we may say that data are numbers, numbers contain

information, and the purpose of statistics is to investigate and evaluate the

nature and meaning of this information.

Statistics is the science of compiling, classifying, and tabulating numerical

data and expressing the results in a mathematical or graphical form.

According to Prof. Horace Secrist, an aggregate of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other is called statistics.

This definition gives the characteristics of the statistics. The characteristics

of statistics are,

It is an aggregate of facts.

It is affected to a marked extent by a multiplicity of causes.

It is numerically expressed.

It should be enumerated or estimated.

It should be collected in a systematic manner for a predetermined purpose.

It should be collected with a reasonable standard of accuracy.

It should be placed in relation to each other.

BIOSTATISTICS

The tools of statistics are employed in many fields (business, education,

psychology, agriculture, and economics, to mention only a few). When the

data analyzed are derived from the biological sciences and medicine, we

use the term biostatistics to distinguish this particular application of

statistical tools and concepts.

Biostatistics is that branch of statistics concerned with mathematical facts

and data relating to the biological events. Medical statistics is a further

specialty of Biostatistics, when the mathematical facts and data are related

to health, preventive medicine and disease.

The essential features of statistics are evident from various definitions of

statistics:

a) Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds:

i. Observational data and quantitative data

ii. Data that have been obtained by a repetitive operation

iii. Data affected to a marked degree by a multiplicity of causes

b) The science and art of dealing with variation in such a way as

to obtain reliable results.

c) Controlled objective methods whereby group trends are

abstracted from observations on many separate individuals.

d) The science of experimentation which may be regarded as

mathematics applied to experimental data.

The objective of dental science is primarily to improve the oral health of

an individual and hence relevant knowledge has to be obtained by

observation of groups of individuals. The treatment of a patient with best

course of action depends on the overall oral hygiene or health status.

Fundamental processes involved in the organization of oral health

care services are:

Acquisition of information i.e., monitoring data, from independent

study and systematic enquiry (scientific research)

Dissemination of information e.g., by teaching, demonstrating,

writing, publishing.

related services such as environmental control (e.g., fluoride adjustment, regulation of harmful substances, etc.) and

Judgment or evaluation by the application of professional ethics, laws, regulations, policies, guidelines, criteria and standards.

Administration i.e., the management of personnel, facilities,

materials, funds and other resources to facilitate four processes

outlined above.

Uses of Biostatistics:

1) To define normalcy

2) To test whether the difference between two populations, regarding a

particular attribute is a real or a chance occurrence.

3) To study the correlation or association between two or more

attributes in the same population.

4) To evaluate the efficacy of vaccines, sera etc. by control studies.

5) To locate, define and measure the extent of morbidity and mortality

in the community.

6) To evaluate the achievements of public health programs.

7) To fix priority in public health programs.

Uses of Biostatistics in dental science:

1) To assess the state of oral health in the community and to determine

the availability and utilization of dental care facilities.

2) To indicate the basic factors underlying the state of oral health by diagnosing the community, and to find solutions to such problems.

3) To determine the success or failure of specific oral health programs or to evaluate the program action.

4) To promote health legislation and in creating administrative

standards for oral health.

research:

To maintain patient records.

To record a patient's previous treatment and the next or further treatment procedure.

Records kept over a long period show the previous treatment procedures and also help in planning the current treatment.

When a new drug is launched in the market, biostatistical analysis indicates whether the drug is more effective than other drugs.

Statistical analysis indicates which drug is most commonly used, or used on average, for a particular treatment or for all treatments.

To estimate the number of patients visiting in future (weekly, monthly or yearly).

To know which age group or sex has more dental problems.

Dental problems vary by area, culture, habits and water supply, and also by village, district, city and country.

Dental complaints vary by age, sex, area, culture, habits, etc.

To compare two or more treatments, drugs, surgeons or times taken for the same complaint, and to decide which is better or whether there is no difference.

When any one of several drugs may be used for a treatment, to test whether their effects are the same or not.

To compare and estimate treatment time, cure level, etc.

To compare and estimate students' intelligence.

To estimate when a person will develop a dental complaint, or when a treated patient will be cured.

very poor / poor / average / good / very good knowledge

The patient record also gives the patient's family history: past, present and future.

To do basic calculations: total number of patients visiting, and average number of patient visits by age, sex, treatment, complaint, finished and ongoing treatment, etc.

It shows whether the patient details are sufficient or need to be improved.

To test whether the difference before and after treatment is significant.

The number of patient visits varies by department; if it varies, why? To analyze and find out the inference.

Applications of Biostatistics in patient care / applying study results

in patient care:

A patient record gives an overview of the patient's treatment and the further steps.

To choose a suitable treatment or method to apply to the patient.

To know the maximum, minimum and average value of any patient characteristic.

To know whether a characteristic varies from patient to patient and, if it varies, for what reason.

Previous analyses show which disease attacks which type of population (age, sex, area, culture, etc.). These analyses are very helpful in giving instructions to prevent or take care of the disease.

When many drugs are available in the market, we select the drug that best satisfies patient co-operation, cost, time, or any one or all of these.

If a treatment's cure level is very low, medical research is advised to develop the treatment; statistical analysis is then used to test whether the newly developed treatment is effective.

To estimate patient cure time, next visit, number of visits for a particular treatment, etc.

To estimate the number of patients in future.

Statistical analysis may infer that a particular disease causes a major problem or most affects regular life. In that situation, further steps are taken to prevent or control the disease.

Why do we need Statistics?

The objectives of this paper are twofold:

(1) To teach the student to organize and summarize data, and

(2) To teach the student how to reach decisions about a large body of data by examining only a small part of the data.

The concepts and methods necessary for achieving the first objective are

presented under the heading of descriptive statistics and the second

objective is reached through the study of what is called inferential

statistics.

Need for quantifying the data: As per the definition of STATISTICS (i.e., a branch of mathematics that deals with the collection, classification, analysis and interpretation of numerical data), it mainly deals with numerical data. Hence statistics can be applied only when we have numerical data. But in many situations the researcher cannot get purely numerical data (i.e., the data will be a mixture of numerical and qualitative characteristics). In such cases it is essential to convert the qualitative information into quantitative form by giving ranks or scale values.
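The idea of giving ranks or scale values to qualitative information can be sketched in a few lines of Python. This is only an illustration: the category labels are borrowed from the OHI score example used later in these notes, while the numeric ranks 1 to 3, the function name `quantify` and the sample responses are assumptions made for the example.

```python
# Hypothetical scale: the labels follow the OHI example in these notes,
# but the numeric ranks 1-3 are an assumption made for illustration.
OHI_RANKS = {"Poor": 1, "Fair": 2, "Good": 3}


def quantify(responses, scale):
    """Replace each qualitative response with its rank on the given scale."""
    return [scale[response] for response in responses]


# Made-up observations for five patients.
observed = ["Good", "Poor", "Fair", "Good", "Fair"]
ranks = quantify(observed, OHI_RANKS)
print(ranks)
```

Once the responses are ranks, order-based summaries such as the median can be computed on them.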

While conducting an oral health examination, the investigator makes

observations according to his judgment of the situation. This depends on

his skill, knowledge, experience and temperament.

Grading of plaque scores or malocclusion or the quality of diet of an

individual are situations, which are influenced by the particular investigator

who makes the observations. If the same observer repeats the observation

on the same case after some time lapse, he may or may not agree with his

previous assessment. Similarly, if more than one investigator observes the

same individual, all of them may not agree in their assessment. The

variability in measurement can be handled using statistics.
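One simple way to put a number on observer variability is the percentage of cases on which two assessments agree. The notes do not name a specific measure, so percent agreement is an assumption chosen here for its simplicity; the function name and the gradings below are made up for illustration.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of cases on which two sets of ratings agree."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both observers must rate the same cases")
    if not ratings_a:
        raise ValueError("at least one case is required")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical plaque gradings of the same five patients by two observers.
observer_1 = ["Poor", "Fair", "Good", "Good", "Fair"]
observer_2 = ["Poor", "Good", "Good", "Good", "Poor"]
agreement = percent_agreement(observer_1, observer_2)
print(agreement)
```

The same function can compare one observer's first and repeated assessments of the same cases.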

Epidemiology and biostatistics are sister sciences or disciplines. The former collects facts relating to groups of population in places, times and situations, while the latter converts all facts into figures and at the end translates them back into facts, interpreting the significance of the results. Facts are qualitative in nature and do not admit several kinds of statistical treatment, and hence have to be converted into figures for statistical analysis.

Both epidemiology and biostatistics deal with facts-figures-facts, which is termed quantitative methodology.

In community dentistry, the approach is primarily through intensive studies, by collecting facts, which are qualitative, and later expressing them as figures, which are quantitative.

Example:

have good oral hygiene or otherwise, the circumstances when it takes

place and also the age at which various upsets take place, whether it is

equally distributed among the sexes, which group is at risk of developing

diseases leading to mortality, and which areas of town, rural or urban, are more

or less affected by the diseases. As most of these events are counted, they

are the foundations of dentistry. And because these numbers come in with

variation between people or from place to place or from time to time,

statistics finds its role in dentistry.

Data:

The raw material of statistics is data. For our purposes we may define data as numbers. The two kinds of numbers that we use in statistics are numbers that result from the taking, in the usual sense of the term, of a measurement, and those that result from the process of counting.

Example: When a nurse weighs a patient or takes a patient's temperature, a measurement consisting of a number, such as 150 pounds or 100 degrees Fahrenheit, is obtained.

Quite a different type of number is obtained when a hospital administrator counts the number of patients, perhaps 20, discharged from the hospital on a given day. Each of the three numbers is a datum, and the three taken together are data.

Variable

If, as we observe a characteristic, we find that it takes on different values in

different persons, places, or things, we label the characteristic a variable.

Example: Diastolic blood pressure, heart rate, heights of adult males

Random variable

The result of measuring a variable is frequently referred to as a value of the respective variable. When

the values obtained arise as a result of chance factors, so that they cannot

be exactly predicted in advance, the variable is called a random variable.

Example: Adult height-when a child is born, we cannot predict exactly his

or her height at maturity. Attained adult height is the result of numerous

genetic and environmental factors. Values resulting from measurement

procedures are often referred to as observations or measurements.

Population

The average person thinks of a population as a collection of entities,

usually people. A population or collection of entities may, however, consist

of animals, machines, places, or cells.

For our purposes, we define a population of entities as the largest

collection of entities for which we have an interest at a particular time. If we

take a measurement of some variable on each of the entities in a

population, we generate a population of values of that variable. We may,

therefore, define a population of values as the largest collection of values

of a random variable for which we have an interest at a particular time.

Populations may be finite or infinite. If a population of values consists of a

fixed number of these values, the population is said to be finite. If, on the

other hand, a population consists of an endless succession of values, the

population is an infinite one.

Example: If we are interested in the weights of all the children enrolled in a certain county elementary school system, our population consists of all these weights. If our interest lies only in the weights of first grade students in the system, we have a different population: the weights of first grade students enrolled in the school system. Hence populations are determined or defined by our sphere of interest.

Sample

A sample may be defined simply as a part of a population. Suppose our

population consists of the weights of all the elementary school children

enrolled in a certain county school system. If we collect for analysis the

weights of only a fraction of these children, we have only a part of our

population of weights, that is, we have a sample.
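The relation between a population of values and a sample can be illustrated with a short Python sketch. The weights below are invented, and drawing the sample at random without replacement is an assumption of this example; the notes only require that a sample be a part of the population.

```python
import random


def draw_sample(population, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a population."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: weights (kg) of all ten children in a small school.
population_weights = [18.5, 20.1, 19.4, 22.0, 21.3, 17.8, 23.6, 20.9, 19.9, 18.2]
sample_weights = draw_sample(population_weights, 4, seed=1)

population_mean = sum(population_weights) / len(population_weights)
sample_mean = sum(sample_weights) / len(sample_weights)
print(sample_weights, population_mean, sample_mean)
```

The sample mean will generally differ somewhat from the population mean, which is the kind of variation that inferential statistics deals with.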

TYPES OF VARIABLE

(1). Quantitative variable

A quantitative variable is one that can be measured in the usual sense.

Measurements made on quantitative variables convey information

regarding amount.

Example: Weights of preschool children, age of the patients.

(2). Qualitative variable

Some characteristics are not capable of being measured in the sense that

height, weight, and age are measured. Many characteristics can be

categorized only.

Example: When an ill person is given a medical diagnosis, the person is said to possess or not to possess some characteristic of interest. In such cases measuring consists of categorizing.

(3). Discrete random variable

Variables may be characterized further as to whether they are discrete or

continuous.

A discrete random variable is characterized by gaps or interruptions in the

values that it can assume. These gaps or interruptions indicate the

absence of values between particular values that the variable can assume.

Example: The number of daily admissions to a general hospital is a discrete random variable, since the number of admissions each day must be a whole number. The number of admissions on a given day cannot be a number such as 1.5, 2.432, or 3.9009.

The number of decayed, missing, or filled teeth per child in an elementary

school is another example of discrete random variable.

(4). Continuous random variable

A continuous random variable does not possess the gaps or interruptions

characteristic of a discrete random variable. A continuous random variable

can assume any value within a specified relevant interval of values

assumed by the variable.

Example: Height, weight, age, water fluoride of individual

It is necessary to express the data measurements clearly, either in units or

as categories. Each level of measurement forms a scale of measurement,

which is defined by the degree of accuracy and sophistication of the

measuring device.

Measurement: This may be defined as the assignment of numbers to

objects or events according to a set of rules. The various measurement

scales result from the fact that measurement may be carried out under

different set of rules.

The following scales are commonly used:

i). Nominal scale: (By name, label, and tag)

The lowest measurement scale is the nominal scale. As the name implies, it consists of naming observations or classifying them into various mutually exclusive and collectively exhaustive categories.

Example: Includes such dichotomies as

Outcome of cancer: Dead, Alive

ii). Ordinal scale:

Whenever observations are not only different from category to category but can be ranked according to some criterion, they are said to be measured on an ordinal scale.

Example: OHI score: Poor, Fair, Good

Students' intelligence: Above Average, Average, Below Average

iii). Interval Scale: (Number between characters)

The interval scale is a more sophisticated scale than the nominal and ordinal scales in that, with this scale, it is not only possible to order measurements, but the distance between any two measurements is also known. The interval scale, unlike the nominal and ordinal scales, is a truly quantitative scale. We know, say, that the difference between a measurement of 20 and a measurement of 30 is equal to the difference between measurements of 30 and 40. The ability to do this implies the use of a unit distance and a zero point, both of which are arbitrary.

Example: Age of the patient, BP, Water fluoride level.

iv). Ratio scale: (Relative Magnitude)

The highest level of measurement is the ratio scale. This scale is

characterized by the fact that equality of ratios as well as equality of

intervals may be determined. Fundamental to the ratio scale is a true zero

point.

Example: Gingival bleeding per 1000 people, Height by weight
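The practical consequence of the four scales is that they support different summaries. The sketch below uses a common textbook convention that is not stated in these notes and is therefore an assumption: the mode for nominal data, the median for ordinal data, and the mean for interval or ratio data. The function name and all values are hypothetical.

```python
import statistics


def typical_value(values, scale):
    """Return a summary suited to the measurement scale (assumed convention)."""
    if scale == "nominal":
        return statistics.mode(values)        # most frequent category
    if scale == "ordinal":
        return statistics.median_low(values)  # middle rank, always an observed value
    if scale in ("interval", "ratio"):
        return statistics.mean(values)        # arithmetic mean
    raise ValueError(f"unknown scale: {scale!r}")


# Made-up data for each kind of scale.
outcome = typical_value(["Dead", "Alive", "Alive"], "nominal")
ohi_rank = typical_value([1, 2, 2, 3, 3], "ordinal")        # 1=Poor, 2=Fair, 3=Good
fluoride = typical_value([0.5, 0.7, 0.9], "interval")       # water fluoride, ppm
print(outcome, ohi_rank, fluoride)
```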

RELIABILITY OF DATA

If the agency has used proper methods to collect the data, the statistics

may be relied upon.

Reliability indicates consistent results in repeated observation. Many factors determine the reliability of data. The major factors are:

Inherent variation, like unused reagents used after a long lapse of time, or a weighing machine not set to the zero mark, etc.

Observer's variation, like the same person doing repeated recording, etc.

Variable fluctuations, like replies by respondents according to their capability of understanding questions and replying.

Inter-observer variations, like many people or many instruments doing the recording.

VALIDITY OF DATA

Data obtained by measurement should measure what it is supposed to

measure. Concept of validity relies upon the specific situations at data

collection.

Example:

Infertility of no issues is not valid

Fever in non malaria area is not valid

Sensitivity is the true positive observation correctly identified by a test. Specificity is the true negative observation correctly identified by a test.

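Notation for test validation of measurements of data

                                     True picture (e.g. Disease)
Test Result (e.g. Screening Test)    Positive    Negative    Total
Positive                             a           b           (a+b)
Negative                             c           d           (c+d)
Total                                (a+c)       (b+d)       (a+b+c+d)

Sensitivity = a / (a + c)

Sensitivity = Number of Positive value of test result and true picture / total number of Positive value of true picture

Specificity = d / (b + d)

Specificity = Number of Negative value of test result and true picture / total number of Negative value of true picture

Positive predictive value = a / (a + b)

Positive predictive value = Number of Positive value of test result and true picture / total number of Positive value of test result

Negative predictive value = d / (c + d)

Negative predictive value = Number of Negative value of test result and true picture / total number of Negative value of test result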
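The four validity measures can be checked with a small worked example in Python. Here a, b, c and d are the cells of the 2 x 2 table: a = test positive and truly positive, b = test positive but truly negative, c = test negative but truly positive, d = test negative and truly negative. The function name and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def screening_metrics(a, b, c, d):
    """Validity measures from the 2 x 2 table of test result vs. true picture.

    a: test positive, truly positive    b: test positive, truly negative
    c: test negative, truly positive    d: test negative, truly negative
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


# Made-up screening results for 200 people.
metrics = screening_metrics(a=40, b=10, c=10, d=140)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the test detects 40 of the 50 truly positive people, and 140 of the 150 truly negative people are correctly reported as negative.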

SOURCES OF DATA

The performance of statistical activities is motivated by the need to answer

a question. For example, clinicians may want answers to questions

regarding the relative merits of competing treatment procedures.

Administrators may want answers to questions regarding such areas of

concern as employee morale or facility utilization. When we determine that

the appropriate approach to seeking an answer to a question will require

the use of statistics, we begin to search for suitable data to serve as the

raw material for our investigation.

Before data collection, the type of data should be decided, that is, primary data or secondary data. The choice of data depends on:

Nature and scope of the study,

Availability of finance and time,

The degree of accuracy needed,

Nature of investigation (individual or government study).

Generally, primary data is preferable for most surveys.

The main sources of data are

1). Routinely kept records

2). Surveys

3). Experiments

a). Primary source

1. Routinely kept records

It is difficult to imagine any type of organization that does not keep records of the day-to-day transactions of its activities. OP medical records, for example, contain information on patient habits, while other records contain data on the facility's business activities. When the need for data arises, we should look for them first among routinely kept records.

2. Surveys

If the data needed to answer a question are not available from routinely

kept records, the logical source may be a survey. Suppose, for example,

that the administrator of a clinic wishes to obtain information regarding the

mode of transportation used by patients to visit the clinic. If admission records do not contain this information, the administrator may conduct a survey among patients to obtain it.

3. Experiments

Frequently the data needed to answer a question are available only as the

result of an experiment. A nurse may wish to know which of several

strategies is best for maximizing patient compliance. The nurse might

conduct an experiment in which the different strategies of motivating

compliance are tried with different patients. Subsequent evaluation of the

responses to the different strategies might enable the nurse to decide

which is most effective.

The first hand information that is collected for the first time by the

investigator for the purpose of his study is called primary data.

This is first hand information.

This data is original in character.

The primary data collection methods: To collect primary data, six methods are commonly used. They are:

1. Direct personal investigation

2. Oral health examination

3. Indirect oral investigation

4. Questionnaire method

5. Local correspondents method

6. Enumeration method

(1). Direct personal investigation:

In this method the investigator personally meets the informants and collects the information by asking them questions. The person from whom the information is collected is called the informant. The investigator must be a keen observer, and tactful and courteous in behavior.

Suitability:

This method can be employed, when

High accuracy is needed.

The coverage area is small.

The confidential data is needed.

The intensive study is needed. And

Sufficient time is available.

Merits:

Original (first hand) data is collected.

The collected data are highly reliable.

The high degree of accuracy can be achieved.

Due to the personal approach, the response rate is higher.

Correct information can be extracted from the informant.

Cross-examination is possible.

Misinterpretation on the informant's part can be avoided.

Demerits:

This method is not advisable when the coverage area is large and time and finance are limited.

Possibility of bias is greater.

An untrained investigator cannot bring good results.

It is expensive and time consuming.

(2). Oral health examination:

When information is needed on the oral diseases, this method provides

more valid information than health interview. It is conducted by dentists,

technicians, and the trained investigator. This method cannot be

has to consider the treatment to people suffering from certain diseases.

(3). Indirect oral investigation:

If the informant is unwilling (reluctant) to provide information, this method can be used. In this method the investigator does not meet the actual informant. Instead, the investigator meets witnesses, third parties or friends who are in touch with the informant. The investigator interviews the people who are directly or indirectly connected to the informant and collects the information.

For example:

In an enquiry about smoking habits, the informants will not provide information; they may not even respond to the study. In such situations the investigator has to approach friends, neighbors, etc., of the actual informant to collect the information. Usually the police department adopts this method.

Example: Police department, riots, alliance, etc.

Merits:

It is simple and convenient method.

It is suitable when the investigation area is large.

It saves time, money and labor factors.

The information is unbiased.

Adequate information can be collected.

Demerits:

The result depends on the prejudices of third parties.

To get adequate information, a large number of persons may have to be interviewed.

Interviewing an unsuitable person will spoil the result.

Bad information will spoil the result.

(4). Questionnaire method:

In this method, a separate questionnaire consisting of a list of questions for the enquiry is prepared. There are two ways to collect information through this method:

(i). Mailed questionnaire

The questionnaire is sent to the informants, requesting them to extend their co-operation by filling in the questionnaire and returning it with correct replies. To get a quick and better response, the postal expense is borne by the investigator. After the questionnaires are received back, the analysis is carried out. The research workers of state and central governments adopt this method.

(ii). Direct questionnaire method

The investigator directly meets the informants and collects the information by asking questions from the questionnaire.

Suitability:

This method is advisable if

The coverage area is wide.

There is a legal compulsion to supply information, so that non-response risk is eliminated.

Merits:

This method is the most economical compared with other methods.

This method of data collection covers a wide area and reduces money, time and labor.

Bias is less since the data is collected directly from the

respondents.

Demerits:

There is no direct contact between the investigator and

respondent.

The accuracy and reliability are less.

This method is suitable among literate people only.

There is the possibility of delay in receiving questionnaire.

The people may furnish wrong information.

Asking supplementary questions is not possible.

Framing questionnaire:

In the mailed questionnaire method, the success of the investigation is based on the questionnaire. So the questionnaire must be designed with adequate skill, efficiency and experience.

Characteristics of Good questionnaire:

The number of questions should be kept to a minimum.

Questions should be short and simple to understand.

Questions should be arranged in logical order.

Questions may have multiple-choice answers.

Personal questions are to be avoided.

Questions that require calculations are to be avoided.

Questions of a sensitive and personal type should be avoided.

The wording of the questionnaire should not hurt the feelings of respondents.

Questionnaire information must be given.

Questionnaire should look attractive.

Pre-test: The process of refining and validating the questionnaire, by collecting information with the framed questionnaire from a small number of related respondents in order to overcome its shortcomings, is called the pre-test. If any shortcoming is found in the questionnaire, the necessary correction is incorporated in the questionnaire. After the required changes are incorporated, a pilot study is employed.

Pilot study: Whenever the investigator has to deal with a large survey, he should not plunge into it directly. After the pre-test is over, a pilot study is carried out to overcome the shortcomings of the analysis. This is a small-scale survey with a small number of persons. The data collected through the pilot study are analyzed. If any technical difficulty is found in the analysis, the questionnaire is altered. The main survey is undertaken if the pilot study does not reveal any analytical difficulties. (See Figure 1.)

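Figure 1. Flowchart of questionnaire testing: Questionnaire -> Pre-test -> Is there an error? (Yes: revise the questionnaire; No: Pilot survey) -> Is there an analytical difficulty? (Yes: revise the questionnaire; No: Main Survey).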
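The pre-test and pilot-study procedure can also be written as a small loop. This is only a sketch under stated assumptions: the function names are hypothetical, the two checks are passed in as functions that return a list of problems found, and repeating the pre-test after a pilot-stage revision is an assumption of this example, not something the notes state.

```python
def prepare_for_main_survey(questionnaire, pre_test, pilot_survey, revise, max_rounds=10):
    """Revise a questionnaire until both pre-test and pilot survey find no problems."""
    for _ in range(max_rounds):
        errors = pre_test(questionnaire)
        if errors:
            questionnaire = revise(questionnaire, errors)
            continue
        difficulties = pilot_survey(questionnaire)
        if difficulties:
            questionnaire = revise(questionnaire, difficulties)
            continue
        return questionnaire  # ready for the main survey
    raise RuntimeError("questionnaire still has problems after max_rounds revisions")


# Toy checks, made up for illustration only.
def toy_pre_test(questions):
    return [q for q in questions if len(q.split()) > 8]   # too long to be simple


def toy_pilot_survey(questions):
    return [q for q in questions if not q.endswith("?")]  # not phrased as a question


def toy_revise(questions, problems):
    return [q for q in questions if q not in problems]    # drop problem items


draft = [
    "How often do you brush your teeth?",
    "Age",
    "Please describe in full detail every dental treatment you have ever received?",
]
final = prepare_for_main_survey(draft, toy_pre_test, toy_pilot_survey, toy_revise)
print(final)
```

In practice a revision would reword a problem question instead of dropping it; dropping keeps the toy example short.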

(5). Local correspondents method:

In this method, instead of the researcher collecting the information, local agents are appointed to collect it. They collect the information from the informants and send the collected data to the actual researcher or investigator. The data collection is done according to the local correspondent's taste. Newspaper agencies, magazines, etc. adopt this method.

Suitability:

If the data is required regularly from a wide area, this method can
be used.

Merits:

Extensive information is collected.

This is the cheapest and most economical method.

Information will be collected regularly.

Demerits:

Information may be biased.

A high degree of accuracy cannot be maintained.

Data may be of duplicate nature.

(6). Enumerator method:

In this method, a number of enumerators are selected and trained to

collect the data. They are provided the questionnaires and trained to fill up

the questionnaire. They meet the informant along with the questionnaire

and collect the data by filling up the questionnaire. The enumerator
explains the object and purpose of the study to the informant.

Merits:

Intensive information is collected.

This method yields reliable and accurate results.

This method is helpful even if the informants are illiterate,
because the enumerator records the information.

Due to personal contact, the non-response is less.

Demerits

This method requires more money and time.

Personal bias of the enumerator may lead to wrong conclusions.

Second-hand information, that is, information collected from already
existing sources for the study, is called secondary data. In other words,
the researcher gets the required information from data that has already
been collected by someone else for his own purpose. The sources of
secondary data are:

Published sources:

The data published by various governments and by local and
international agencies are published data.

International publications:

IMF, IBRD, ICAFE, UNO, etc. publish data at regular time
intervals.

Central and state governments:

Departments of the union and state governments regularly publish
data. Other sources include the RBI Bulletin, the Census of India, the
Indian Trade Journal, etc.

Semi-official publications:

corporation etc, publish the statistical data.

Research institutions publication:

Research institutions such as the Indian Statistical Institute (ISI), the
Indian Agricultural Statistics Research Institute (IASRI), etc. publish
data.

Journals and newspapers:

Some journals, like Indian Finance, Commerce, etc., publish
current and important material on statistics and socio-economic problems.

Unpublished sources:

There are various unpublished data sources, maintained by various
government and private offices. They also include data from studies
carried out by researchers in universities or research institutions.

Secondary data may not be reliable, and data collected long ago may be
inadequate. So before using secondary data in the analysis, some
precautions must be taken.

The precautions are:

Suitability of data:

The available data should be suitable for the study. This
characteristic is to be examined by the investigator himself. The data
should be consistent with the scope of the present analysis.

Adequacy of data:

After the suitability is tested, the data must be adequate for the
study. That is, adequate data must be extracted from the source to carry
out the analysis.

Reliability of data:

If the agency has used proper methods to collect the data, the statistics

may be relied upon.

COLLECTION OF DATA

The first and foremost step of the research process is data collection.

Before the statistical investigation, the researcher has to know the nature,

objective and scope of investigation, time and type of investigation and the

desired degree of study.

The two types of investigation are

Census/complete enumeration method.

Sampling method.

Census Method

A data collection method that collects information from each and every
unit of the population is called the census method. That is, in this
method the data is collected from all the population units. E.g., to study
the average height of the students of a particular college, the
investigator has to measure the height of every student in that college.

Population: The collection of individual items with which the
investigation is concerned is called the population.

Merits:

The data is collected from all the items of study. Hence, bias is
minimized and the data is more accurate and reliable.

The highest accuracy can be maintained.

Results drawn from the data collected through this method are
more representative and true.

Demerits:

When the coverage area is wide, this method is not suitable,
because it takes more money, time and energy.

The cost is high; hence only an organization that possesses
large finances and manpower can adopt this method.

If the population size is infinite, this method is not suitable.

If the study involves a destructive type of product, this method is not
suitable.

Destructive type product: A product that cannot be used after its initial
use is called a destructive type product.

Type of population: The two types of population are,

Hypothetical

Existent population.

The collection of concrete objects or persons under investigation
constitutes the existent population. The existent population may be finite
or infinite. An existent population that consists of a countable number of
individuals or objects is called a finite population. An existent
population that consists of an uncountable number of individuals or
objects is called an infinite population.

E.g., in a study of the economic level of the students of a particular
college, all the students of that college constitute the population, and
their number is finite. Hence it is a finite population. E.g., in a study
of the characteristic pattern of the stars in the sky, all the stars in
the sky constitute the population, and they are treated as infinite in
number. Hence it is an infinite population.

The collection of non-concrete objects, which exists only in imagination
and is uncountable, constitutes the hypothetical or theoretical
population. E.g., in a study of the pattern of results of the coin-tossing
experiment, the researcher cannot get a concrete set of results. He can
only imagine each result as head or tail. Hence it is a hypothetical
population.

Sampling Method

The method or technique that is adopted to select the sample from the
population is called the sampling method.

Sample: A finite subset or small part of the population that reflects the
characteristics of the population, and is used to make valid inferences
about the entire population, is called a sample.

Objectives:

To get more information about the population with minimum effort

time and cost.

To estimate the population parameters through its statistic.

To obtain the degree of precision of the drawn result through its

statistic.

To draw valid conclusion about the population.

To give desired result with required precision with the given

minimum cost.

To identify the true representative of the population.

Merits:

It is more economical. (i.e.,) it saves time, money and energy

because of limited number of investigation units.

It helps to achieve high degree of accuracy.

It helps to get reliable results for the population.

It serves as the alternative method of census.

It makes the survey easier to organize and administer.

be used.

Demerits:

Careful planning must be followed otherwise the result will be

incorrect and biased.

The result is based on the investigator. The attitude of personnel

will affect the result.

There is possibility of large errors.

Hence,

The sample must be truly representative of the population.

Experienced personnel have to be employed for the fieldwork.

The sample size must be adequate.

The coverage area should be small.

The two types of sampling methods are,

Probability sampling

Non-probability sampling.

Probability sampling: The sampling method that follows some standard

procedure and selects the units with pre-defined probability is called

probability sampling.

The five types of probability sampling method are,

1). Simple (Equal) Random (chance) Sampling.

2). Stratified Random Sampling.

3). Systematic Random Sampling.

4). Cluster Sampling.

5). Multistage Sampling.

(1). Simple random sampling: A sampling procedure that selects the sample
from the population in such a way that each population unit has an equal
chance of being selected is called simple random sampling.

This is the simplest method of selecting a sample. It is applicable when
the population is homogeneous in nature. A simple random sample can be
selected in two ways.

(i). Lottery method:

In this method, all the population units are numbered or named. Then the
numbers or names are written on different slips or cards of the same size
and shape, so that one card cannot be distinguished from the others.

These cards are placed in a box and shuffled well so that no particular
card gets any preference in selection. From that box the sample is drawn
one card at a time, until the desired number of units is selected.

If the population is very large, this method is not suitable.
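
The lottery method described above can be imitated on a computer. The following is only a minimal sketch: the function name lottery_draw and the unit labels are hypothetical, and Python's random module is assumed to stand in for the physical shuffling of cards.

```python
import random

def lottery_draw(unit_names, sample_size, rng):
    """Draw sample_size slips one by one from a well-shuffled box."""
    if sample_size > len(unit_names):
        raise ValueError("sample size cannot exceed the population size")
    box = list(unit_names)          # one slip per population unit
    rng.shuffle(box)                # shuffle so no slip gets preference
    drawn = []
    for _ in range(sample_size):
        drawn.append(box.pop())     # draw one slip; it is not put back
    return drawn

# Hypothetical population of 50 numbered units.
population = [f"unit-{i}" for i in range(1, 51)]
sample = lottery_draw(population, 5, random.Random(7))
print(sample)
```

Because each slip is removed once drawn, no unit can appear twice in the sample.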

(ii). Random number table method:

In this method, the sample is selected from the population by making
use of a random number table. A table that contains random digits
arranged in row and column format is known as a random number table.

Selection process:

A random number table is an arrangement of five-digit numbers in row
and column format.

The selection process may proceed row-wise or column-wise.

Assign numbers to the population units.

Decide the sample size.

Count the number of digits in the population size, say k.

Read a number with k digits from the random number table.

If the number read is greater than the population size, ignore it and
select the next number.

If the number read is less than or equal to the population size, include
the corresponding population unit in the sample.

Continue this process until the required number of sample units is
selected.
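
The selection process above can be sketched in code. This is an illustration under stated assumptions, not a prescribed procedure: a pseudo-random generator stands in for the printed table, the name select_by_random_digits is hypothetical, and repeated numbers (and the number zero) are skipped so that each unit is chosen at most once.

```python
import random

def select_by_random_digits(population_size, sample_size, rng):
    """Pick distinct unit numbers in 1..population_size from k-digit random numbers."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed the population size")
    k = len(str(population_size))         # number of digits in the population size
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(10 ** k)   # stands in for reading k digits from the table
        if number == 0 or number > population_size:
            continue                      # no such unit: ignore and read the next number
        if number in chosen:
            continue                      # assumption: a unit already selected is skipped
        chosen.append(number)
    return chosen

# Hypothetical case: 10 units out of a population of 250 (so k = 3).
sample = select_by_random_digits(250, 10, random.Random(1))
print(sample)
```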

Several standard random number tables are available. Some of them are:

L.H.C. Tippett's random number table: 10,400 four-digit numbers.

Fisher and Yates' random number table: 15,000 digits arranged in twos.

Kendall and B.B. Smith's random number table: 25,000 four-digit numbers.

Rand Corporation's random number table: 2,00,000 five-digit numbers.

Merits:

There is less chance for personal bias.

As the sample size increases; the selected sample will be more

representative one.

Sampling errors can be measured.

This method saves money, time and labor.

Demerits:

This method requires a complete list of the population. But in many
enquiries this is not possible.

As the sample size decreases, the sample will become less representative
of the population.

If the population units are heterogeneous in nature, this method
cannot be employed.

(2). Stratified random sampling: A sampling method that selects the sample
from a heterogeneous population by dividing the population into
homogeneous groups (strata) is called stratified random sampling.

Since the population is heterogeneous in nature, it is divided into
strata that are homogeneous in nature. From each stratum, a number of
units is selected, and together these units constitute the sample.

The two types of stratified random sampling method are,

(i). Proportional method: If the sample is selected from the stratum

proportionate to its size, then the sample is selected by proportional

method.

(ii). Optimum method: If the sample is selected from the stratum by

considering the cost, then the sample is selected by optimum allocation

method. That is, based on the cost, the sample is selected.
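
As a worked illustration of the proportional method, the number of units taken from a stratum of size N_h can be computed as n x N_h / N, where n is the total sample size and N the population size. The sketch below assumes this standard formula; the function name, the strata and their sizes are hypothetical.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    return {name: sample_size * size / total
            for name, size in stratum_sizes.items()}

# Hypothetical strata: students of a college grouped by year of study.
strata = {"first year": 500, "second year": 300, "third year": 200}
allocation = proportional_allocation(strata, 100)
print(allocation)
```

When the computed values are not whole numbers, they have to be rounded so that the stratum samples still add up to the required total.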

Merits:

The sample selected by this method is more representative of

population.

It ensures greater accuracy.

For the heterogeneous population this method is more reliable.

Demerits:

The process of dividing the population into strata requires more

time money and experience.

If the stratification is not proper, then the sampling bias will prevail

in the sample.

(3). Systematic sampling: A probability sampling method that selects the
sample by making use of an up-to-date, complete list of population units
is called systematic sampling. In this method only the first sampling unit
is selected at random, so it is also known as quasi-random sampling. Once
the first unit is selected, the remaining units of the sample are
determined automatically by the random start and the sampling interval.

then this method can be used.

Selection procedure:

Assume that we have to select n units from N population units.

Arrange the items in numerical, alphabetical, geographical
or any other order.

Find the sampling interval k = N / n such that nk = N.

Select the random start i such that 1 <= i <= k.

Select the i-th, (i+k)-th, (i+2k)-th, ..., (i+(n-1)k)-th
units to constitute the systematic sample.

Hence the random start determines the whole sample.
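
The selection procedure above can be sketched as follows. The function name systematic_sample is hypothetical, and the sketch assumes, as the procedure does, that N is an exact multiple of n.

```python
import random

def systematic_sample(units, n, rng):
    """Select every k-th unit after a random start i, where k = N / n."""
    N = len(units)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = N // n                      # sampling interval
    i = rng.randint(1, k)           # random start, 1 <= i <= k
    # 1-based positions i, i+k, ..., i+(n-1)k become 0-based list indexes.
    return [units[i - 1 + j * k] for j in range(n)]

# Hypothetical ordered list of 100 units, from which 10 are selected.
units = list(range(1, 101))
sample = systematic_sample(units, 10, random.Random(0))
print(sample)
```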

Merit:

This method is simple and operationally more convenient.

Time and work involved in selection procedure is less.

Demerit:

This sample may not represent the population.

If the population size is not a multiple of the sample size, one cannot
get the required number of sampling units.

(4). Cluster sampling: A probability sampling method that groups the
population units into groups called clusters, based on the similarity of
objects, and selects the sampling units through the selection of clusters
is known as cluster sampling.

Cluster sampling resembles stratified random sampling. The only
difference is that in the former, all the units of the selected clusters
constitute the sample, whereas in the latter, sampling units are selected
from within each stratum.

Merits:

It is suitable in large-scale survey, where the list preparation is

difficult.

Demerits:

It is less accurate than other methods.

(5). Multistage Sampling: When the available resources allow us to
concentrate on only a limited number of units for study, multistage
sampling helps a lot. Multistage sampling is used in the National Sample
Survey. For a total health care programme, the questions of which village,
which house and which person are answered by this type of sampling.

I stage: Village selection

II stage: Household selection

III stage: Person selection

selected samples will be advantageous. Sampling error enhancement is

expected, since variation between the final units will be lesser (within the

group than between groups). Unequal size at different stages may pose

analytical difficulties.

Another Example:

I stage: Urine sugar positive cases are selected by screening tests.

II stage: All positive cases from stage I are subjected to PPBS, and those
who have PPBS above the critical level are selected.

III stage: Among those with PPBS above the critical level, retinoscopy for
diabetic retinopathy is done and positive retinopathy cases are selected.

Non-probability sampling: The sampling method that does not follow any
standard procedure and selects the units with unknown probability is called
non-probability sampling method.

The three types of non-probability sampling methods are,

1. Judgment or purposive sampling.

2. Convenience sampling.

3. Quota sampling.

Judgment/purposive sampling: The sampling method that selects the
sample units to achieve a specific purpose is called judgment or
purposive sampling. In this method the sampler's choice plays the
major role in selecting the sampling units.

E.g., to study the cultural activity of the students in a particular
college, the sampler has to select the students who are interested in
cultural activity. Only then does the study lead to a valid conclusion.
Otherwise the sample does not reflect the population characteristic under
study, namely the cultural skill of the college. Hence the investigator
has to find the students who are involved in that activity and collect
the information from them.

Merits:

It is simple method

The sample collected is more representative.

This method can be adopted for public policy, to make decision,

etc.,

Demerits:

Due to the sampler's own interest, the sample may not be truly
representative of the population.

Difficult to correct sampling errors.

The estimates will not be accurate.

Quota sampling: In this method the population is divided into various
quotas and then the sample is selected from each quota. The sample size
per quota is decided by personal judgment. This is also known as
stratified purposive sampling.

Merits:

This method reduces money and time.

Demerits:

The result depends on the investigator.

Personal bias is possible.

Since sample selection is not based on random sampling, sampling
errors cannot be estimated.

Convenience sampling: The sampling method that selects the sample
units based on the convenience of the investigator is called convenience
sampling. If

The universe is not clearly defined.

Sample unit is not clear.

Complete list is not available.

Then this method can be used.

Demerits:

This sample is not true representative of population

The results are biased.

But this method can be used for pilot study.

Applications of Sampling Designs

perpetuating factors which influence health and disease.

2. Evaluation of health programmes.

3. Impact studies.

4. Coverage surveys.

5. Planning, administration and implementation of activities.

6. Forecasting the future.

7. Environmental studies.

8. Evaluation of health status.

PRESENTATION OF DATA

After the data collection is over, the researcher has raw data (i.e., the
information prior to proper arrangement). Raw data are voluminous and not
conducive to analysis. As such, the researcher cannot carry out analysis
on them, and they do not furnish any useful information. So, to condense
the data and present them in a compact manner, we go for presentation of
data. There are three main types of presentation. They are,

1. Classification,

2. Tabulation, and

3. Graphical representation.

Classification: The process of arranging the data into groups according to
their common characteristics and separating them into different but
related parts is called classification.

Objects:

The raw data are classified,

To condense the mass of data.

To present the data in simpler form.

To facilitate comparison and statistical treatment.

To bring out relation.

To facilitate further analysis.

To eliminate the unnecessary data.

Rules for classification:

The classes should be rigidly defined, i.e., there should not be any
ambiguity in their rules.

The classes should not overlap, i.e., each item of data must have its
place in only one class.

The classification must be flexible enough to adjust to new situations.

The items included in the totals and sub-totals of classes and
sub-classes must be the same.

Types of classification:

Geographical classification: Classifying the data based on the

area of its occurrence such as states, districts, Taluks etc., is called

as geographical classification.

Chronological classification: Classifying the data based on the

time of its occurrence such as decades, Years, Months, etc., is

called as chronological classification.

Quantitative classification: Classifying the data based on some

characteristics that is capable of quantitative measurement like age,

price, weight etc., is called as quantitative classification.

Qualitative classification: Classifying the data based on the

qualitative characteristics such as sex, honesty, literacy, etc., is

called as qualitative classification.

in this type of classification.

Tabulation: The process of arranging the data in rows and columns in
accordance with some characteristics is called tabulation.

Objects:

To simplify complex data.

To clarify characteristics of data.

To facilitate comparison.

To detect errors and omissions in the data.

To facilitate statistical processing.

The parts of table are:

1. Table number,

2. Title,

3. Head note,

4. Caption,

5. Stubs,

6. Body of table,

7. Foot-note,

8. Source-note.

The table number is used to identify the table and to refer to it in
future. For reference and explanation, the columns may also be numbered.

Each table has to be given a suitable title, that is, one that
describes the content of the table.

The head note is a statement about the table that is placed below the
table title within brackets. Usually it gives the units of measurement
used in the table, such as in millions, in crores, etc.

The headings of the columns are called captions. They must be
brief and self-explanatory. A caption may have sub-headings.

The row headings are called stubs.

The most important part of the table, which contains the numerical
information, is called the body of the table. A footnote is used to
provide any explanation about the items in the table.

Types of tabulation:

1. One-way tabulation,

2. Two way tabulation, and

3. Manifold tabulation.

One-way Table: The table that displays information on a single variable is

called as one-way table or univariate table. The variable may be discrete or

categorical.

Two-way Table: The table that displays information on categories of a

single variable over the categories of another variable is known as two-way

table or bi-variate table.

Manifold table: The table that shows information on the categories of
more than two variables is known as manifold table.

Frequency Distribution: A tabulation type that summarizes the raw data

in the form of table along with variable values or variable class intervals

and their corresponding frequencies is known as Frequency table. It may

be one-way or two-way or manifold type.

Moreover, Frequency table

1) Organizes the data into compact manner without loss of

essential information.

classes or discrete points.

There are three types of frequency tables. They are,

1. Discrete frequency table.

2. Continuous frequency table.

3. Relative frequency table.

Discrete Frequency table: A Frequency table that shows the distribution

of frequencies at different distinct values of variable is known as discrete

frequency table.

Procedure to form discrete frequency table:

1. Draw a table with three columns, namely variable, tally marks and
frequency.

2. Take the first observation.

3. Write down the observation in the variable column and put a tally
mark (|) against the written observation in the tally mark column.

4. Take the next observation.

5. Check whether the observation is already entered in the variable
column or not.

6. If it is entered, put another tally mark against the written
observation. Else, go to step 3.

Repeat steps 4 to 6 until all the observations are entered in the table.

7. Count the number of tally marks for each variable value and put the
totals in the frequency column.

8. The resultant table is called the discrete frequency table.

9. If any variable row already has four tally marks, then the next
occurrence of that value is marked by putting a cross mark
over the four bars. This makes counting easier.
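
The tallying procedure above can be sketched in code. This is an illustration only: the observations are made-up values, tally_marks is a hypothetical helper, and the crossed group of five is shown as "||||/".

```python
from collections import Counter

def tally_marks(count):
    """Return tally marks, writing every complete group of five as ||||/."""
    groups = ["||||/"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)

# Hypothetical observations (for illustration only).
observations = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3]
frequency = Counter(observations)   # counts each distinct value

print("Variable  Tally marks  Frequency")
for value in sorted(frequency):
    print(f"{value:<8}  {tally_marks(frequency[value]):<11}  {frequency[value]}")
```

The frequencies add up to the number of observations, which is a quick check that no observation was missed.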

Continuous Frequency table: A frequency table that shows the
distribution of frequencies over different class intervals of values is
known as continuous frequency table.

Procedure to form Continuous frequency table:

1. Draw a table with three columns, namely variable, tally marks and
frequency.

2. Find the smallest and largest observations in the data set.

3. Decide the class interval.

4. Write down the class limits with equal class intervals under the
heading variable.

5. Take the first observation.

6. Decide in which class it falls.

7. Put a tally mark (|) against that class in the tally mark
column.

8. Take the next observation.

9. Repeat steps 6 to 8 until all the observations are entered in the
table.

10. Count the number of tally marks for each class and put the totals in
the frequency column.

11. The resultant table is called the continuous frequency table.
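
The procedure above can be sketched in code. The sketch rests on assumptions not stated in the text: classes of equal width, with the lower limit included and the upper limit excluded, so that an observation equal to a class boundary goes to the higher class. The function name and the data are hypothetical.

```python
def continuous_frequency_table(data, lower, width, classes):
    """Count observations in equal classes [lower, lower + width), and so on."""
    counts = [0] * classes
    for x in data:
        index = int((x - lower) // width)     # which class the observation falls in
        if not 0 <= index < classes:
            raise ValueError(f"{x} falls outside the class limits")
        counts[index] += 1
    return [(lower + j * width, lower + (j + 1) * width, counts[j])
            for j in range(classes)]

# Hypothetical heights in cm (for illustration only).
heights = [151, 158, 162, 167, 149, 173, 165, 160, 155, 169]
table = continuous_frequency_table(heights, lower=145, width=10, classes=3)
for low, high, count in table:
    print(f"{low}-{high}: {count}")
```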

Relative Frequency Table: A frequency table in which each frequency is
expressed as a fraction or percentage of the total number of
observations is known as relative frequency distribution.

Note that the sum of the relative frequencies is equal to one when
the frequencies are expressed as fractions, and the total is 100 when the
frequencies are expressed as percentages.
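
A short sketch of the computation, using hypothetical class frequencies: each frequency is divided by the total, and the two sums noted above are then checked.

```python
import math

# Hypothetical class frequencies (for illustration only).
frequencies = {"145-155": 2, "155-165": 4, "165-175": 4}
total = sum(frequencies.values())

relative = {c: f / total for c, f in frequencies.items()}        # as fractions
percent = {c: 100 * f / total for c, f in frequencies.items()}   # as percentages

for c in frequencies:
    print(f"{c}: {frequencies[c]}  {relative[c]:.2f}  {percent[c]:.0f}%")

# Fractions add up to one and percentages to 100 (up to rounding error).
print(math.isclose(sum(relative.values()), 1.0))
print(math.isclose(sum(percent.values()), 100.0))
```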

Graphical representation:

neat, concise, systematic and understandable manner. But when a large
amount of information extends over a large number of columns, it is
difficult to understand the significance of the data. Hence, statisticians
found it necessary to introduce diagrams and graphs.

Classification is the process of grouping of data into homogenous

groups or categories. Tabulation is the process of presenting the classified

data in tabular form.

The process of highlighting the salient features of a study through
graphs and charts is called graphical representation. This type of
presentation makes the data easy to understand. Moreover, attractive
graphs and charts can be understood at a glance even by a layman.

Merits:

Diagrams are attractive and create interest in the mind of readers.

Diagrams are easily understandable even to the layman.

In interpretation, a diagram saves much time.

i.e., people may not like to go through numerical figures, but they
may like to go through diagrams.

Diagrams make data simple.

i.e., diagrams are remembered at a glance, and readers can
easily understand the pattern of the data.

A diagram facilitates comparison of two or more sets of data.

Diagrams reveal more information than data in a table.

Limitations:

Diagrams cannot be used for further analysis.

Diagrams show approximate values only.

They expose only limited facts,

i.e., all details cannot be presented in the form of diagrams.

They supplement tabulation; they are not an alternative to it.

Rules for making diagrams:

Every diagram must be given a suitable title in bold letters.

The title conveys the main fact depicted by the diagram.

Sub-headings may also be given.

The title should be brief and self-explanatory.

To permit comparison, the diagram must be drawn accurately and
neatly.

Each diagram should be numbered for further reference.

The type of diagram should be selected according to the nature
of the data.

When many items are shown in the diagram through different
patterns such as dots, crossing, etc., an index must be given.

The diagram must be simple enough to be understood by the layman.

There are two types of graphical representation. They are,

1. Graphs,

a. Frequency curves,

b. Frequency polygon, and

c. Ogives.

i. Less than ogives, and

ii. More than Ogives.

2. Charts/ Diagrams.

a. Bar chart,

i. Simple bar chart,

ii. Multiple bar chart,

iii. Stacked bar chart, and

b. Pie- chart, and

c. Histogram.

One-dimensional diagram: A diagram that is drawn for a single set of
data is called a one-dimensional diagram. The bar and pie diagrams
belong to this type.

Bar chart: The visual representation of (qualitative, categorical or
discrete numerical) data by bars is called a bar chart. The heights of the
bars are proportional to the frequencies. The bars may be horizontal or
vertical. The distances between the bars are kept uniform. Bar charts are
drawn only for single discrete quantitative or categorical variables.

The types of bar diagrams are

Simple bar chart.

Multiple bar chart,

Stacked bar chart.

Simple bar chart: The bar diagram that is drawn for a single set of
categorical or numerical data is called a simple bar diagram.

Multiple bar chart: The bar diagram that is drawn for a single variable
with more than one phenomenon is called a multiple bar diagram. This
facilitates comparison. The categories of a single variable are drawn
side by side. The differentiation is shown by different colors or patterns
such as lines, dots, etc.

Stacked bar chart: A type of bar diagram that is drawn for a single
variable with any number of (categorical or numerical) categories is
called a stacked bar diagram. In this diagram the categories of the
categorical variable are placed on the bar by dividing the bar into
portions.

Percentage bar chart: A kind of stacked bar chart drawn for the
percentage frequencies of categorical variables, with bars of equal
height, is called a percentage bar diagram. The bars are divided among the
categories according to their percentages. In this case all the bars are
of equal height, representing 100%, whereas in the stacked bar diagram the
heights of the bars are unequal, that is, proportional to the frequencies
of the base variable's categories.

Pie diagram: The graphical representation of a single variable's
categories in the form of a circle is called a pie diagram. In this graph
the circle is divided into various slices based on the frequencies. This
type of diagram can be understood at a glance. Each slice is obtained by
taking the whole data as equal to 360 degrees.
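
As a worked illustration, the angle of each slice can be computed as (frequency / total) x 360 degrees. The sketch below uses hypothetical categories and frequencies, and pie_angles is an illustrative name.

```python
def pie_angles(frequencies):
    """Return the central angle in degrees for each category."""
    total = sum(frequencies.values())
    return {category: 360 * count / total
            for category, count in frequencies.items()}

# Hypothetical frequencies (for illustration only).
angles = pie_angles({"A": 10, "B": 20, "C": 30, "D": 40})
print(angles)
```

The angles of all the slices together add up to 360 degrees.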

Relative Frequency Histogram:

relative frequency histogram.

Histogram: A bar diagram in which the bars are constructed continuously
(without leaving space between bars) on the class intervals, in such a way
that the heights of the bars are proportional to the frequencies of the
respective classes, is known as Histogram.

Frequency polygon: The graph formed by plotting the frequencies

against the mid points of continuous frequency distribution and joining the

points by straight lines is known as Frequency polygon.

This can also be obtained from the histogram by joining the top mid

points of bars with straight lines.

Frequency Curve:

The graph that is formed by plotting the frequencies against the mid
points of a continuous frequency distribution and joining the points by a
freehand curve is known as Frequency curve.

This can also be obtained from the histogram by joining the top mid
points of the bars with a freehand curve.

Ogives:

The graph obtained by plotting the cumulative frequencies
against the class limits of a continuous frequency distribution is known
as an Ogive.

The two types of Ogives are,

1. Less than Ogive.

2. More than Ogive.

Less than Ogive:

The graph obtained by plotting the less than cumulative
frequencies against the upper class limits of a continuous frequency
distribution and joining the points by a smooth curve is known as less
than Ogive.

More than Ogive:

The graph obtained by plotting the more than cumulative
frequencies against the lower class limits of a continuous frequency
distribution and joining the points by a smooth curve is known as more
than Ogive.
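
The points to be plotted for the two ogives can be computed as in the sketch below. The class limits and frequencies are hypothetical; only the plotting coordinates are computed, and drawing the smooth curve is left to the reader.

```python
from itertools import accumulate

# Hypothetical continuous frequency distribution (for illustration only).
class_limits = [(145, 155), (155, 165), (165, 175)]
frequencies = [2, 4, 4]

# Less than cumulative frequency: running total from the first class.
less_than = list(accumulate(frequencies))
# More than cumulative frequency: running total from the last class.
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Less than ogive uses upper class limits; more than ogive uses lower limits.
less_than_points = [(upper, cf) for (_, upper), cf in zip(class_limits, less_than)]
more_than_points = [(lower, cf) for (lower, _), cf in zip(class_limits, more_than)]

print(less_than_points)
print(more_than_points)
```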

DATA ANALYSIS

The process of drawing or obtaining a representative measure
from the raw mass of data is called data analysis. Statistical methods are
used to carry out the analysis. Hence it is called statistical data
analysis.

The three types of data analysis are

Univariate data analysis.

Bivariate data analysis.

Multivariate data analysis.

Analyzing or obtaining the representative measure for a single variable
is called univariate data analysis. That is, the characteristics of a
single data set are studied. The four types of Univariate Data Analysis
Tools are,

1. Measures of Central Tendency,

2. Measures of Dispersion,

3. Skewness, and

4. Kurtosis.

Analyzing or obtaining the representative measure for two sets of
variables by considering both the variables simultaneously is called
bivariate data analysis. The variables may be quantitative or
qualitative.

The two types of bivariate measures are,

Associative measure and

Functional measure

Associative measure: The measure that is used to measure the
interrelationship between the two variables is called associative
measure.

The two types of associative measures are,

Correlation and

Chi-square association

Chi-square association: The bivariate method that is used to measure
the relationship between two qualitative variables is called chi-square
association. It examines whether the variables are dependent or
independent.

Functional measure: The process of finding relationship between the two

sets of variables in the form of equation is called functional measure. In

this case, variables can be classified as dependent and independent.

The statistical method that finds the functional relation of two sets of

variables is known as regression analysis.

Multivariate analysis:

The simultaneous study of several related and equally important
random variables is called multivariate data analysis. That is,
multivariate tools are used to deal with a larger number of variables
under study.

The multivariate analysis is classified into

Dependence analysis and

Interdependence analysis

Dependence analysis:

The method of studying the association between two sets viz.

dependent and independent variables is called dependence analysis. That

is, the relationship between the dependent set and independent set is

analyzed by this dependence analysis.

The five dependence analysis methods are,

Multiple regression,

Discriminant analysis,

Logit analysis,

Multivariate analysis of variance and

Canonical correlation.

Interdependence analysis:

The method of studying the mutual relationships among a set of variables is called interdependence analysis. In this study no distinction is made between dependent and independent variables.

The five interdependence methods are,

Principal component analysis,

Factor analysis

Cluster analysis

Log linear models and

Multidimensional scaling

Factor analysis: A data reduction technique that studies the interrelationship among a set of variables by introducing a new set of variables, fewer in number than the original set, is called factor analysis.

Profile analysis: The graphical method of comparing a number of ordinal variables across different groups is called profile analysis. That is, the nature of the common opinion about the ordinal variables is studied.

Friedman test: A non-parametric statistical method that is applied to ranking data to find the common agreement between the respondents in ranking the various factors is called the Friedman test.

Kendall's W test: This procedure is similar to the Friedman test. The merit of this method is that it provides Kendall's concordance value, which represents the amount of common agreement between the respondents.

Logistic regression: The statistical method used to study a dichotomous response variable that is explained by a number of explanatory variables is called logistic regression. (The data may be ordinal, interval or ranking data.)

The assumption for logistic regression is:

The model relating the response and the explanatory variables is log-linear.

DESCRIPTIVE STATISTICS

Measures of Central Tendency:

A single representative measure that describes the characteristics of the entire mass of data.

There are three types of measures of central tendency. They are:

Mean (Arithmetic Mean, Weighted Mean, Geometric Mean, Harmonic Mean),

Median,

Mode.

The characteristics of good average are:

It should be precisely (rigidly) defined.

It should be

Easy to understand.

Easy (Simple) to compute.

Based on all observations.

Capable of further analysis.

Its definition should be in the form of mathematical formula.

It should not be influenced by extreme values.

It should have sampling stability. (Least affected by sampling

fluctuations)

Merits of averages:

It facilitates quick understanding of complex data:

The purpose of an average is to represent a group of values in a simple and concise manner. That is, an average condenses the mass of data into a single figure.

It facilitates comparison.

It helps us to learn about the universe from a sample.

It helps in decision-making.

It establishes mathematical relationship.

Mean: A single representative figure for a mass of data, obtained by adding together all the values and dividing the sum by the total number of observations, is called the mean. That is, if the series $x_1, x_2, x_3, \ldots, x_n$ has $n$ observations, then the mean value of this series is given by:

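(i) For ungrouped data:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

(ii) For discrete frequency distribution:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \qquad \text{where } N = \sum_{i=1}^{n} f_i \text{ is the total frequency.}$$

(iii) For continuous frequency distribution:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \qquad \text{where } N = \sum_{i=1}^{n} f_i \text{ is the total frequency.}$$

The $x_i$'s are the mid points of the class intervals.

(iv) Deviation formula:

$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N}, \qquad \text{where } N = \sum_{i=1}^{n} f_i \text{ is the total frequency.}$$

The $x_i$'s are the mid points of the class intervals.

$A$ is the assumed mean from within the series.

The $d_i$'s are the deviation values, i.e., $d_i = x_i - A$.

(v) Step deviation formula:

$$\bar{X} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times h, \qquad \text{where } N = \sum_{i=1}^{n} f_i \text{ is the total frequency.}$$

The $d_i$'s are the deviation values, i.e., $d_i = (x_i - A)/h$.

The $x_i$'s are the mid points of the class intervals.

$A$ is the assumed mean from within the series.

$h$ is the width of the class interval.

(vi) For combined series:

$$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$

where $n_i$ and $\bar{x}_i$ are the number of observations and the mean of the $i$-th of the $k$ series.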
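The grouped-data formulas for the mean can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical frequency table (the midpoints and frequencies are invented for illustration, and the function names are not from these notes). The step-deviation version takes $d_i = (x_i - A)/h$.

```python
def grouped_mean(midpoints, freqs):
    """Direct formula: mean = sum(f_i * x_i) / N."""
    total = sum(freqs)  # N, the total frequency
    return sum(f * x for f, x in zip(freqs, midpoints)) / total


def grouped_mean_step(midpoints, freqs, assumed_mean, width):
    """Step-deviation formula: mean = A + h * sum(f_i * d_i) / N."""
    total = sum(freqs)
    # d_i = (x_i - A) / h: deviation measured in units of the class width
    devs = [(x - assumed_mean) / width for x in midpoints]
    return assumed_mean + width * sum(f * d for f, d in zip(freqs, devs)) / total


def combined_mean(sizes, means):
    """Combined-series formula: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Hypothetical table with classes 0-10, 10-20, 20-30, 30-40, 40-50
midpoints = [5, 15, 25, 35, 45]
freqs = [3, 7, 10, 6, 4]

direct = grouped_mean(midpoints, freqs)
step = grouped_mean_step(midpoints, freqs, assumed_mean=25, width=10)
print(direct, step)  # both methods should agree
```

The two methods give the same mean; the step-deviation form only reduces the size of the numbers handled in manual calculation.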

Properties:

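1. The sum of the deviations taken from the arithmetic mean is zero, i.e., $\sum_{i=1}^{n} (x_i - \bar{X}) = 0$.

2. The sum of the squared deviations taken from the arithmetic mean is minimum; that is, $\sum_{i=1}^{n} (x_i - \bar{X})^2 \le \sum_{i=1}^{n} (x_i - a)^2$ for any value $a$ other than $\bar{X}$, where the $x_i$ are the observations.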

Merits:


It is used in further calculations.

It is based on all the items.

It provides a good basis for comparison.

It is a more stable measure.

It is considered a good or ideal average.

Demerits:

Mean is unduly affected by extreme values.

It is unrealistic.

It may lead to wrong conclusion.

It is not useful for studying the qualitative characters.

It is not suitable measure in case of highly skewed distribution.

It gives greater importance for bigger values and smaller

importance for the smaller values in the series.

It cannot be calculated for a frequency distribution with an open-end class.

Median: A measure of location that divides the ordered series into two equal parts is called the median. That is, one part of the data set contains the items less than the median and the other part contains the items greater than the median, and the number of observations on both sides is equal.

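1). For ungrouped data:

a. Arrange the observations in either ascending or descending order of magnitude.

b. Find the number of observations in the data set (i.e., n).

c. If n is odd, then the median of the data set is

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{observation.}$$

d. If n is even, then the median of the data set is

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{observation}}{2}$$

2). For grouped data: (Discrete frequency distribution)

1. Form the cumulative frequencies.

2. Find $\frac{1}{2}\sum_{i=1}^{n} f_i$, where $\sum_{i=1}^{n} f_i$ is the sum of frequencies.

3. Find the cumulative frequency just greater than $\frac{1}{2}\sum_{i=1}^{n} f_i$.

4. The value of x corresponding to this cumulative frequency is the median of the set of observations.

3). For grouped data: (Continuous frequency distribution)

1. Form the cumulative frequencies.

2. Find $\frac{1}{2}\sum_{i=1}^{n} f_i$, where $\sum_{i=1}^{n} f_i$ is the sum of frequencies.

3. Find the cumulative frequency just greater than $\frac{1}{2}\sum_{i=1}^{n} f_i$. The corresponding class is the median class.

4. Find the median by using the formula

$$\text{Median} = L + \frac{\frac{1}{2}\sum_{i=1}^{n} f_i - m}{f} \times c$$

Where 'L' is the lower limit of the median class.

'm' is the cumulative frequency of the class preceding the median class.

'f' is the frequency of the median class.

'c' is the width of the class interval.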
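The steps for the continuous case can be sketched in Python as follows. This is an illustrative sketch only: the frequency table is hypothetical, the function name is invented, and m is taken as the cumulative frequency of the class preceding the median class (the usual convention for this interpolation formula).

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((N/2 - m) / f) * c for a continuous frequency table."""
    half = sum(freqs) / 2  # N/2
    cumulative = 0         # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        # The median class is the first class whose cumulative
        # frequency is just greater than N/2.
        if cumulative + f > half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty or has zero total frequency")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

median = grouped_median(lower_limits, freqs, width=10)
print(median)
```

Here N/2 = 15, the cumulative frequencies are 3, 10, 20, 26, 30, so the median class is 20-30 with m = 10 and f = 10.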

Merits:

It is easy to understand and compute.

It is quite rigidly defined.

It eliminates the effect of extreme items.

It is amenable to further process.

Median can be calculated for even qualitative phenomenon.

Its value generally lies in the distribution.

It can be calculated for frequency distribution with open-end

class interval.

This can be located graphically.

Demerits:

If the series is of irregular nature, median cannot be

computed.

It ignores the extreme values.

In the continuous case, and with an even number of observations, the median is estimated rather than calculated exactly.

It is not based on all observations.

It is not amenable to algebraic treatments.

It is affected by the fluctuations of sampling.

exclusive type class interval. To calculate the median the class

interval has to be converted into inclusive type class interval

by adding the value to both the limits (Upper And Lower).

Mode: The single value that appears more frequently than any other observation in the data set is called the mode.

1). For ungrouped data:

i). Count the frequency of each observation.

ii). The observation that has occurred the greatest number of times is the mode of that data set.

2). For Grouped data: (Discrete frequency Distribution)

i). from the frequency distribution identify the highest

frequency.

ii). The observation corresponding to the highest frequency is

the mode of distribution.

3). For Grouped data: (continuous frequency Distribution)

i). From the frequency distribution identify the highest

frequency.

ii). The class interval corresponding to the highest frequency

is the modal class.

iii). Find the mode by using the formula,

$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times c$$

'L' is the lower limit of the modal class.

'$f_0$' is the frequency of the class preceding the modal class.

'$f_1$' is the frequency of the modal class.

'$f_2$' is the frequency of the class succeeding the modal class.

'c' is the width of the class interval.
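A short Python sketch of the modal-class formula follows. The table is hypothetical and the function name is invented; when the highest frequency occurs in the first or last class, the missing neighbouring frequency is assumed to be zero, which is a choice made for this sketch and not something stated in the notes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * c."""
    # Index of the modal class (first class with the highest frequency)
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode is not defined: neighbouring frequencies equal f1")
    return lower_limits[k] + (f1 - f0) / denominator * width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_limits = [0, 10, 20, 30, 40]
freqs = [3, 7, 10, 6, 4]

mode = grouped_mode(lower_limits, freqs, width=10)
print(round(mode, 2))
```

With f0 = 7, f1 = 10 and f2 = 6 the mode is 20 + (3/7) x 10, a little above 24.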

Merits:

and precise.

This value can be determined for a distribution with open-end class intervals.

Demerits:

It is ill-defined: if two observations occur an equal number of times, the mode cannot be determined uniquely (bi-modal distribution).

It is not amenable to further mathematical treatment.

It is not based on all observations.

It is difficult to compute, when there are both positive and

negative data in the series.

It is stable only when the sample size is large.

or any one or more observation is zero, we cant find the mode

of distribution.

Characteristics                    Mean             Median           Mode

Precise definition                 Given            Given            Not given

Understanding of procedure         Easy             Easy             Easy

Calculation                        Easy             Easy             Easy

Observations utilized              All obsn:s       Not all obsn:s   Not all obsn:s

Further treatment                  Amenable         Not amenable     Not amenable

Sampling fluctuations              Least affected   Much affected    Much affected

Effect of extreme values           Much affected    Not affected     Not affected

It may be noted that, among these tools, the mean holds most of the characteristics of an ideal average. Hence, the mean is considered a good or ideal average.

Measures of dispersion:

The statistical tool that measures the variation or scatteredness of the values about their representative (central) value is called a measure of dispersion.

Properties of good measure of variation are,

It should be easy to calculate and understand.

It should be rigorously defined.

It should be based on all observations and amenable to further

treatment.

It must have sampling stability.

It should not be affected by extreme values.

The types of measures of dispersion are:

Range,

Mean deviation,

Standard deviation.

Range: The simplest measure of dispersion, calculated by subtracting the minimum value from the maximum value of the data set, is called the range.

i.e., Range = maximum value - minimum value.

Standard deviation: The measure of dispersion that is defined as the positive square root of the arithmetic mean of the squared deviations from the arithmetic mean is called standard deviation. Standard deviation is denoted by $\sigma$.

The deviations are squared so that the negative and positive deviations do not cancel each other.

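The formula for calculating the standard deviation is,

$$\text{Population standard deviation: } \sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}}$$

If we have a sample, then the sample standard deviation (s) is,

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$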
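The difference between the two divisors (N and n - 1) can be seen in a short Python sketch. The data values are hypothetical; the standard-library functions `statistics.pstdev` and `statistics.stdev` are used only to cross-check the hand-written formulas.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

sigma = math.sqrt(sum_sq / len(data))        # population SD, divisor N
s = math.sqrt(sum_sq / (len(data) - 1))      # sample SD, divisor n - 1

print(sigma, s)
print(statistics.pstdev(data), statistics.stdev(data))  # library cross-check
```

The sample value is always slightly larger than the population value computed from the same numbers, because of the smaller divisor.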

Merits:

It is rigorously defined.

Its value is always definite.

It is based on all observation of data.

It is amenable for further analysis.

It is less affected by sampling fluctuations.

It serves as the basis for measuring the coefficient of correlation, and for sampling and statistical inference.

It is the most appropriate measure of the variability of a distribution.

It possesses most of the characteristics of an ideal measure of dispersion.

Demerits:

It is not easy to understand and calculate.

It gives more weight to extreme values by squaring them.

It cannot be used for comparison

Co-efficient of variation or relative measure: This is a measure of relative variation rather than absolute variation. It expresses the standard deviation as a percentage of the mean. The formula is given by

$$\text{C.V.} = \frac{s}{\bar{X}} \times 100$$

(where s is the standard deviation and $\bar{X}$ is the mean).

In order to decide which of two distributions is more variable, we compare their coefficients of variation. The data set with the greater coefficient of variation has more variability (that is, it is less precise, less consistent and less homogeneous).

The coefficient of variation is needed for the following reasons.

(i). The standard deviation is useful as a measure of variation within a

given set of data. When one desires to compare the dispersion in two

sets of data, however, comparing the two standard deviations may lead

to fallacious results.

(ii). It is used to compare the dispersion of two variables that are measured in different units.

Example

We may wish to know, for a certain population, whether serum

cholesterol levels, measured in milligrams per 100ml, are more variable

than body weight, measured in pounds.

(iii). Even when the same unit of measurement is used, the two means may be quite different.

Example

If we compare the standard deviation of weights of first grade

children with the standard deviation of weights of high school freshmen,

we may find that the latter standard deviation is numerically larger than

the former, because the weights themselves are larger, not because the

dispersion is greater.
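The point of the weight example can be illustrated numerically. The following Python sketch uses invented weights for two groups (they are not data from any study); it shows a case where the second standard deviation is numerically larger while the relative variation, measured by the coefficient of variation, is the same.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = (s / mean) * 100, using the sample standard deviation s."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical weights, same unit of measurement in both groups
first_grade = [20, 22, 24, 26, 28]
freshmen = [50, 55, 60, 65, 70]

sd_first = statistics.stdev(first_grade)
sd_fresh = statistics.stdev(freshmen)
cv_first = coefficient_of_variation(first_grade)
cv_fresh = coefficient_of_variation(freshmen)

print(sd_first, sd_fresh)  # the second SD is larger
print(cv_first, cv_fresh)  # the C.V. values coincide
```

Comparing the two standard deviations alone would suggest that the older group is more variable; the coefficients of variation show that, relative to the mean, it is not.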

PROBABILITY DISTRIBUTIONS

The relationship between the values of a random variable and the

probabilities of their occurrence may be summarized by means of a

device called a probability distribution. A probability distribution may be

expressed in the form of a table, a graph, or a formula. Knowledge of the

probability distribution of a random variable provides the clinician and

researcher with a powerful tool for summarizing and describing a set of

data and for reaching conclusions about a population of data on the basis

of a sample of data drawn from the population.

There are two types of probability distribution:

(1) Discrete

(2) Continuous

The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of the random variable along with their respective probabilities.

The following are two essential properties of a probability distribution of a discrete variable:

(1) $0 \le P(X = x) \le 1$

(2) $\sum P(X = x) = 1$

Two discrete probability distributions are considered here:

1. Binomial

2. Poisson

The binomial distribution is one of the most widely encountered

probability distributions in applied statistics. The distribution is derived

from a process known as a Bernoulli trial, named in honor of the Swiss

mathematician James Bernoulli (1654-1705), who made significant

contributions in the field of probability, including, in particular, the binomial

distribution. When a random process or experiment, called a trial, can

result in only one of two mutually exclusive outcomes, such as dead or

alive, sick or well, male or female, the trial is called a Bernoulli trial.

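The Bernoulli process: A sequence of Bernoulli trials forms a Bernoulli process under the following conditions.

1. Each trial results in one of two possible, mutually exclusive outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure.

2. The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1 - p, is denoted by q.

3. The trials are independent; that is, the outcome of any particular trial is not affected by the outcome of any other trial.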

Example 1:

We are interested in being able to compute the probability of x successes in n Bernoulli trials. For example, suppose that in a certain population 52% of all recorded births are males. We interpret this to mean that the probability of a recorded male birth is 0.52. If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for male births?

Solution: Suppose the five birth records selected result in this sequence of sexes:

MFMMF

In coded form (1 for a male birth, 0 for a female birth) we would write this as

10110

The probability of a success is denoted by p = 0.52,

and the probability of a failure is denoted by q = 1 - p = 1 - 0.52 = 0.48.

The probability of the above sequence of outcomes is found by means of the multiplication rule to be

$P(1, 0, 1, 1, 0) = pqppq = q^2p^3$

Three successes and two failures could occur in any of the following

additional sequences as well

Number    Sequence    Probability

1         10110       pqppq = $q^2p^3$

2         11100       pppqq = $q^2p^3$

3         10011       pqqpp = $q^2p^3$

4         11010       ppqpq = $q^2p^3$

5         11001       ppqqp = $q^2p^3$

6         10101       pqpqp = $q^2p^3$

7         01110       qpppq = $q^2p^3$

8         00111       qqppp = $q^2p^3$

9         01011       qpqpp = $q^2p^3$

10        01101       qppqp = $q^2p^3$

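Each of the ten sequences has the same probability, $q^2p^3$, and the sequences are mutually exclusive, so their probabilities may be added. We can now answer the original question: what is the probability, in a random sample of size 5 drawn from the specified population, of observing three successes (record of a male birth) and two failures (record of a female birth)?

The answer to the question is

$10(0.48)^2(0.52)^3 = 10(0.2304)(0.140608) = 0.32$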
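The counting argument in this example can be verified by brute force. The following Python sketch enumerates all sequences of five coded outcomes, keeps those with exactly three successes, and adds their probabilities; only p = 0.52 comes from the example, the variable names are illustrative.

```python
from itertools import product

p = 0.52   # probability of a success (male birth record)
q = 1 - p  # probability of a failure

# All 0/1 sequences of length 5 with exactly three successes
sequences = [seq for seq in product((0, 1), repeat=5) if sum(seq) == 3]

# Multiplication rule within a sequence, addition rule across sequences
total = sum(p ** sum(seq) * q ** (5 - sum(seq)) for seq in sequences)

print(len(sequences))   # number of qualifying sequences
print(round(total, 2))  # probability of exactly three male births
```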

General formula:

$$f(x) = \binom{n}{x} p^x q^{\,n-x}, \qquad x = 0, 1, 2, \ldots, n$$

and $f(x) = 0$ elsewhere.

Where, f(x) = P(X = x)

n = number of trials

x = the random variable (number of successes)

p = probability of a success

q = probability of a failure = 1 - p

This distribution satisfies the properties of a discrete probability distribution:

1. $f(x) \ge 0$ for all real values of x. This follows from the fact that n and p are both nonnegative and, hence, $\binom{n}{x}$, $p^x$ and $q^{\,n-x} = (1-p)^{n-x}$ are all nonnegative, and therefore so is their product.

2. $\sum_{x} \binom{n}{x} p^x q^{\,n-x} = (q + p)^n = 1^n$ is equal to 1.

Example 2:

Suppose that it is known that 30% of a certain population is immune to some disease. If a random sample of size 10 is selected from this population, what is the probability that it will contain exactly four immune persons?

Solution:

The probability of an immune person is 0.3, i.e., p = 0.3 and q = 1 - p = 1 - 0.3 = 0.7.

$$f(4) = \binom{10}{4} (0.3)^4 (0.7)^{10-4} = \frac{10!}{4!\,6!} (0.0081)(0.117649) = 0.2001$$

The binomial distribution has two parameters, n and p. They are parameters in the sense that they are sufficient to specify a binomial distribution. The binomial distribution is really a family of distributions, with each possible value of n and p designating a different member of the family. The mean and variance of the binomial distribution are $\mu = np$ and $\sigma^2 = np(1-p)$, respectively.
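The general formula, and the statements about the sum, mean and variance, can be checked with a few lines of Python. This is a sketch using only the standard library (`math.comb`); the function name is illustrative, and the numbers n = 10 and p = 0.3 are those of Example 2.

```python
from math import comb


def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x), and 0 elsewhere."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                                         # should be 1
mean = sum(x * f for x, f in enumerate(probs))             # should be n*p
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))  # n*p*(1-p)

print(round(binomial_pmf(4, n, p), 4))
print(total, mean, variance)
```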

Strictly speaking, the binomial distribution is applicable in situations

where sampling is from an infinite population or from a finite population

with replacement. Since in actual practice samples are usually drawn

without replacement from finite populations, the question arises as to the

appropriateness of the binomial distribution under these circumstances.

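Whether or not the binomial is appropriate depends on how drastic the effect of these conditions is on the constancy of p from trial to trial. It is generally agreed that when the sample size n is small relative to the population size N, the binomial model is appropriate.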

POISSON DISTRIBUTION

The next discrete distribution that we consider is the Poisson distribution, named for the French mathematician Simeon Denis Poisson (1781-1840), who is generally credited with publishing its derivation in 1837. This distribution has been used extensively as a probability model in biology and medicine.

If x is the number of occurrences of some random event in an interval of time or space (or some volume of matter), the probability that x will occur is given by

$$f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \quad \text{and } \lambda > 0$$

The Greek letter $\lambda$ (lambda) is called the parameter of the distribution and is the average number of occurrences of the random event in the interval (or volume).

The symbol e is the constant 2.7183 (to four decimal places).

It can be shown that

1. $f(x) \ge 0$ for every x, and

2. $\sum_{x} f(x) = 1$,

so that the distribution satisfies the requirements for a probability distribution.

The Poisson Process

We have seen that the binomial distribution results from a set of

assumptions about an underlying process yielding a set of numerical

observations. Such, also is the case with the Poisson distribution. The

following statements describe what is known as the Poisson process.

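1. The occurrences of the events are independent. The occurrence of an event in an interval of space or time has no effect on the probability of a second occurrence of the event in the same, or any other, interval.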

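2. Theoretically, an infinite number of occurrences of the event must be possible in the interval.

3. The probability of a single occurrence of the event in a given interval is proportional to the length of the interval.

4. In any infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible.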

An interesting feature of the Poisson distribution is the fact that the mean

and variance are equal.

When to Use the Poisson Model

The Poisson distribution is employed as a model when counts are made

of events or entities that are distributed at random in space or time. One

may suspect that a certain process obeys the Poisson law, and under this

assumption probabilities of the occurrence of events or entities within

some unit of space or time may be calculated.

Examples:

Number of failure of surgery for experienced doctors.

Number of rain days in summer from 1947 to last year.

Number of unexpected holidays declared by a hospital.

Number of major accidents in a street road.

Example for calculation:

Suppose that the monthly number of adolescent suicides in India between 1977 and 1987 closely followed a Poisson distribution with parameter $\lambda = 2.75$. Find the probability that a randomly selected month will be one in which three adolescent suicides occurred.

Solution: The Poisson distribution is given by

$$f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \quad \text{and } \lambda > 0$$

$$f(3) = P(X = 3) = \frac{e^{-2.75} (2.75)^3}{3!} = \frac{(0.063928)(20.796875)}{6} = 0.221583$$
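The calculation can be reproduced in Python. The sketch below uses only the standard library; the function name is illustrative, and the sums over x = 0 to 60 are a numerical truncation chosen for this sketch (the omitted terms are negligible for $\lambda = 2.75$).

```python
import math


def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


lam = 2.75
p_three = poisson_pmf(3, lam)

# Truncated sums to check that the mean and the variance both equal lam
xs = range(61)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(round(p_three, 6))
print(mean, variance)
```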

CORRELATION

Correlation: The statistical method that measures the amount and the direction of the relationship between two quantitative variables is called correlation. The correlation gives the nature and the extent of the relationship. For example, if the correlation between A and B is -0.48, then the negative sign expresses that the relationship is negative and the value 0.48 expresses the amount of relation between the variables A and B.

The value of the correlation always lies in the interval from -1 to +1 (i.e., $-1 \le r \le +1$).

Assumptions:

a.

b.

c.

variable

The nature of correlation:

Positive correlation: The correlation is said to be positive if its value lies in the interval between 0 and +1. Two variables are positively correlated if an increase in the value of one variable is accompanied by an increase in the value of the other variable, or a decrease in the value of one variable is accompanied by a decrease in the value of the other variable. That is, the two variables change in the same direction.

Examples:

Age and weight of the patient

Weight and blood pressure

Negative correlation: The correlation is said to be negative if its value lies in the interval between 0 and -1. Two variables are negatively correlated if an increase in the value of one variable is accompanied by a decrease in the value of the other variable; that is, the two variables change in opposite directions.

Examples:

Number of patient visiting and number of clinics

Zero correlation: Two variables are uncorrelated if a change in the value of one variable has no connection with the change in the value of the other variable.

Example: We should expect Zero correlation between

Age and tooth color of a person

Simple and multiple correlation

The correlation between two variables is called simple correlation. The

correlation in the case of more than two variables is called multiple

correlation.

Scatter Diagram

Let us consider a set of paired values of the variables x and y. Along the horizontal axis we represent the values of x and along the vertical axis the values of y. Plot the values (x, y) on graph paper. We get a collection of dots. The figure so obtained is called a scatter diagram. From the scatter diagram we can obtain a rough idea of the correlation between the two variables x and y.

If all these dots cluster around a line, the correlation is called linear correlation. If the dots cluster around a curve, the correlation is called non-linear or curvilinear correlation. We can also get an idea of whether the correlation is positive or negative from the scatter diagram.

They are illustrated in the following diagrams

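Formula:

The coefficient of correlation r is given by any of the following formulas.

Formula (1):

$$r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2 \, \sum_{i=1}^{n} y_i^2}}$$

where, in this formula, $x_i$ and $y_i$ denote the deviations of the observations from their respective means.

Formula (2):

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$

Formula (3):

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} = \frac{\text{Covariance of x and y}}{(\text{SD of x})(\text{SD of y})}$$

In words, that is

r = (sum of the products of the deviations of the x and y pairs from their respective means) / square root of [(sum of squares of the deviations of x) x (sum of squares of the deviations of y)]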
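The raw-sum form and the covariance form of r can be compared in a short Python sketch. The paired values are hypothetical and the function names are invented for illustration; both functions should return the same coefficient.

```python
import math


def pearson_r_raw(xs, ys):
    """r from raw sums: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def pearson_r_cov(xs, ys):
    """r = covariance(x, y) / (SD of x * SD of y), all with divisor n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_raw = pearson_r_raw(xs, ys)
r_cov = pearson_r_cov(xs, ys)
print(round(r_raw, 4), round(r_cov, 4))
```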

The coefficient of correlation r lies between -1 and +1 inclusive of those

values.

1)

together.

2)

variables x and y.

3)

4)

variables x and y .

5)

Uses of $r^2$:

Statisticians customarily note that r is of even more use in its squared form. $r^2$ is also called the coefficient of determination; it measures the proportion of the total variance in y that is associated with, or can be explained by, the variance in x.

The proportion $r^2$ is also converted to a percentage by multiplying by 100.

For example,

Correlation (r) of x and y is 0.9982

Then

$r^2 = 0.99640324$

$r^2 \times 100 = 99.640324\%$

That is, 99.64% of the variance of y is associated with the variance of x. The remaining variance, i.e., 100 - 99.64 = 0.36%, represents the variance due to otherwise unexplained deviations of the points around the regression line.

The value of r should be interpreted with care in two stages.

Stage 1: r may be statistically significant yet very weak and of no practical consequence. This is the question of an important difference.

Stage 2: r may be strong (close to +1 or -1), but with a small sample size it may prove to be not significant. This is the question of a significant difference.

Karl Pearson's formula for calculating r is developed on the assumption that the values of the variables are exactly measurable. In some situations it may not be possible to give precise values for the variables. In such cases we can use another measure of correlation called the rank correlation coefficient. We rank the observations in ascending or descending order using the numbers 1, 2, 3, ..., n and measure the degree of relationship between the ranks instead of the actual numerical values. The rank correlation coefficient when there are n ranks in each variable is given by the formula

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the ranks of the i-th pair of x and y,

and n = number of observations.

Note: Tied ranks: When the values of the variables x and y are given, we can rank the values of each variable and determine Spearman's rank correlation coefficient. If two or more observations have the same rank, we assign them the mean rank. In this case there is a correction factor in the formula for $\rho$. The formula for $\rho$ is given by

$$\rho = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$$

where m is the number of times a rank is repeated, and one correction term is added for each group of tied ranks.

For example, if a rank is repeated 2 times in the x-series and 3 times in the y-series, the correction factor is

$$\frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12}$$
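The ranking procedure and the tie correction can be sketched in Python as follows. The code is an illustration under stated assumptions: the function names and data are invented, tied values receive the mean of the ranks they occupy, and the coefficient is computed with the corrected formula $1 - 6[\sum d^2 + \sum m(m^2-1)/12] / [n(n^2-1)]$.

```python
from collections import Counter


def mean_ranks(values):
    """Rank values in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())


def spearman_rho(xs, ys):
    n = len(xs)
    rx, ry = mean_ranks(xs), mean_ranks(ys)
    d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d_sq + correction) / (n * (n * n - 1))


# A value repeated twice in x and a value repeated three times in y
print(tie_correction([5, 5, 1, 2, 3]) + tie_correction([7, 7, 7, 1, 2]))
# Perfectly opposite rankings without ties
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```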

1)

2)

+1 or

-1 r +1)

Limitations

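1) The correlation coefficient assumes that the relationship between the variables is linear.

2) Correlation measures only the degree of association between the variables. It does not suggest that the variations in y are caused by variations in x or vice versa. A high correlation between variables x and y may describe any one of the following situations.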

a) Variation in y is caused by variation in x.

b) Variation in x is caused by variation in y.

c) The x and y are jointly dependent.

d) The correlation between x and y may be due to chance.

Correlations are sometimes observed between variables that could not conceivably be causally related.

For example:

If a high correlation is found between the number of births and the number of murders in a country, it does not prove that the number of births is determined by the number of murders. This type of correlation is called spurious correlation or chance correlation, and it does not establish any causal relationship between the variables involved.

REGRESSION

Regression analysis is helpful in ascertaining the probable form of the relationship between variables. The ultimate objective when this method of analysis is employed usually is to predict or estimate the value of one variable corresponding to a given value of another variable.

The Regression Model

In the typical regression problem, as in most problems in applied

statistics, researchers have available for analysis a sample of

observations from some real or hypothetical population. Based on the

results of their analysis of the sample data, they are interested in

reaching decisions about the population from which the sample is

presumed to have been drawn. It is important, therefore, that the

researchers understand the nature of the population in which they are

interested. They should know enough about the population to be able

either to construct a mathematical model for its representation or to

determine if it reasonably fits some established model. There are two types of regression model:

(1). Simple linear regression model

(2). Multiple linear regression model

(1). Simple Linear Regression Model

$$y = a + bx \qquad (I)$$

where

y = dependent or response variable

x = independent or explanatory variable

a = intercept

b = slope (regression coefficient)

Assumptions:

In simple linear regression two variables, x and y, are of interest. The variable x is usually referred to as the independent variable, since frequently it is controlled by the investigator; that is, values of x may be selected by the investigator, and corresponding to each preselected value of x, one or more values of y are obtained. The other variable, y, accordingly is called the dependent variable, and we speak of the regression of y on x.

The following are the assumptions underlying the simple linear

regression model.

i.

ii.

iii.

For each value of x there is a set of y values which have common variance $\sigma^2_{y.x}$, and their means lie on the true regression line.

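(i) The values of a and b are obtained from the normal equations:

$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i \qquad (1)$$

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \qquad (2)$$

Solve equations (1) and (2) for a and b, and substitute these values into equation (I).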

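Formulae for finding the regression lines of y on x and x on y:

y on x:

$$y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) \qquad (3)$$

x on y:

$$x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}) \qquad (4)$$

In terms of the regression coefficients $b_{y.x}$ and $b_{x.y}$, these lines are $y - \bar{y} = b_{y.x}(x - \bar{x})$ and $x - \bar{x} = b_{x.y}(y - \bar{y})$.

Methods for finding $b_{y.x}$ and $b_{x.y}$: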

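Method (1): When means and correlation coefficient are not known

$$b_{y.x} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} \qquad \text{and} \qquad b_{x.y} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2}$$

These expressions hold when $x_i$ and $y_i$ are measured as deviations from their respective means.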

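Method (2): When means are known and correlation coefficient is not known

$$b_{y.x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \text{and} \qquad b_{x.y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

Method (3): When the correlation coefficient and the standard deviations are known

$$b_{y.x} = r\,\frac{\sigma_y}{\sigma_x} \qquad \text{and} \qquad b_{x.y} = r\,\frac{\sigma_x}{\sigma_y}$$

Multiplying the two regression coefficients gives

$$r^2 = b_{y.x}\, b_{x.y}, \qquad r = \pm\sqrt{b_{y.x}\, b_{x.y}}$$

r is positive if $b_{y.x}$ and $b_{x.y}$ are positive, and r is negative if the regression coefficients are negative. In no real situation do we get one regression coefficient positive and the other negative.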
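The regression coefficients and the relation between them and r can be illustrated with a small Python sketch. The data are hypothetical and the function name is invented; the coefficients are computed from deviations about the means, and the intercept of the y-on-x line is taken as $a = \bar{y} - b_{y.x}\bar{x}$, which follows from the first normal equation.

```python
def regression_coefficients(xs, ys):
    """Return (a, b_yx, b_xy) for the line y = a + b_yx * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    b_yx = s_xy / s_xx           # slope of the regression of y on x
    b_xy = s_xy / s_yy           # slope of the regression of x on y
    a = mean_y - b_yx * mean_x   # intercept of the y-on-x line
    return a, b_yx, b_xy


# Hypothetical paired observations
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b_yx, b_xy = regression_coefficients(xs, ys)
r_squared = b_yx * b_xy  # product of the two regression coefficients

print(a, b_yx, b_xy, r_squared)
# Check of the first normal equation: sum(y) = n*a + b*sum(x)
print(sum(ys), len(xs) * a + b_yx * sum(xs))
```

For these values both regression coefficients are positive, so r is taken as the positive square root of their product.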


TESTING OF HYPOTHESIS

INTRODUCTION

The purpose of hypothesis testing is to aid the clinician, researcher, or administrator in reaching a conclusion concerning a population by examining a sample from that population.

BASIC CONCEPTS

A hypothesis may be defined simply as a statement about one or more populations.

Types of hypothesis: Researchers are concerned with two types of hypotheses: research hypotheses and statistical hypotheses.

Research hypothesis is the conjecture or supposition that motivates the

research.

Statistical hypotheses are hypotheses that are stated in such a way

that they may be evaluated by appropriate statistical techniques.

HYPOTHESIS TESTING STEPS

For convenience, hypothesis testing will be presented as a ten-step procedure. There is nothing magical or sacred about this particular format. It merely breaks the process down into a logical sequence of actions and decisions.

1)

Data

The nature of the data that form the basis of the testing procedures must

be understood, since this determines the particular test to be employed.

Whether the data consist of counts or measurements, for example, must

be determined

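2)

Assumptions

The assumptions underlying the test procedure must be identified, because a general procedure is modified depending on the assumptions. These include, for example, assumptions about the normality of the population distribution, equality of variances, and independence of samples.

3)

Hypotheses

There are two statistical hypotheses involved in hypothesis testing, and these should be stated explicitly.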

Null hypothesis

The null hypothesis is the hypothesis to be tested. It is designated by the symbol $H_0$. The null hypothesis is sometimes referred to as a hypothesis of no difference, since it is a statement of agreement with (no difference from) conditions presumed to be true in the population of interest. In general, the null hypothesis is set up for the express purpose of being discredited. Consequently, the complement of the conclusion that the researcher is seeking to reach becomes the statement of the null hypothesis. In the testing process the null hypothesis either is rejected or is not rejected. If the null hypothesis is not rejected, we will say that the data on which the test is based do not provide sufficient evidence to cause rejection.

Alternative hypothesis

If null hypothesis is rejected, we will say that the data at hand are not

compatible with the null hypothesis, but are supportive of some other

hypothesis. The alternative hypothesis is a statement of what we will

believe is true if our sample data cause us to reject the null hypothesis.

Usually the alternative hypothesis and the research hypothesis are the same, and in fact the two terms are used interchangeably. We shall designate the alternative hypothesis by the symbol $H_A$ or $H_1$.

Rules for stating Statistical hypothesis

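When the hypotheses are of the type considered in this chapter, an indication of equality (either =, $\le$, or $\ge$) must appear in the null hypothesis. Suppose, for example, that we want to answer the question: can we conclude that a certain population mean is not 50? The hypotheses are

$H_0: \mu = 50$

Vs

$H_A: \mu \ne 50$

Suppose we want to know if we can conclude that the population mean is greater than 50. Our hypotheses are

$H_0: \mu \le 50$

Vs

$H_A: \mu > 50$

If we want to know if we can conclude that the population mean is less than 50, the hypotheses are

$H_0: \mu \ge 50$

Vs

$H_A: \mu < 50$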

In summary, we may state the following rules of thumb for deciding what

statement goes in the null hypothesis and what statement goes in the

alternative hypothesis:

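a)

What you hope or expect to be able to conclude as a result of the test
usually should be placed in the alternative hypothesis.

b)

The null hypothesis should contain a statement of equality,
either =, ≤, or ≥.

c)

The null hypothesis is the hypothesis that is tested.

d)

The null and alternative hypotheses are complementary. That
is, the two together exhaust all possibilities regarding the value that the
hypothesized parameter can assume.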

A Precaution

Neither hypothesis testing nor statistical
inference, in general, leads to the proof of a hypothesis; it merely

indicates whether the hypothesis is supported or is not supported by the

available data. When we fail to reject a null hypothesis, therefore, we do

not say that it is true, but that it may be true. When we speak of accepting

a null hypothesis, we have this limitation in mind and do not wish to

convey the idea that accepting implies proof.

4)

Test statistics

The test statistic is some statistic that may be computed from the data of

the sample. As a rule, there are many possible values that the test

statistic may assume, the particular value observed depending on the

particular sample drawn. As we will see, the test statistic serves as a

decision maker, since the decision to reject or not to reject the null

hypothesis depends on the magnitude of the test statistic. An example of

a test statistic is the quantity

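$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$

where μ0 is a hypothesized value of a population mean. This test statistic
is related to the statistic

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

The following is a general formula for a test statistic that will be
applicable in many of the hypothesis tests:

$$ \text{Test statistic} = \frac{\text{relevant statistic} - \text{hypothesized parameter}}{\text{standard error of the relevant statistic}} $$

In the above example, x̄ is the relevant statistic, μ0 is the hypothesized
parameter, and σ/√n is the standard error of x̄.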

5)

Distribution of test statistic

It has been pointed out that the key to statistical inference is the sampling
distribution. We are reminded of this again when it becomes necessary to
specify the probability distribution of the test statistic. The distribution of
the test statistic

$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$

for example, follows the standard normal distribution if the null
hypothesis is true and the assumptions are met.
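The general formula above (relevant statistic minus hypothesized parameter, divided by the standard error) can be sketched in a few lines of Python. This is only an illustration: the function name `z_statistic` and the numbers passed to it are hypothetical and do not come from the text.

```python
from math import sqrt


def z_statistic(sample_mean, mu0, sigma, n):
    """Return (relevant statistic - hypothesized parameter) / standard error."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Hypothetical values chosen only to illustrate the arithmetic:
# sample mean 52, hypothesized mean 50, population SD 6, sample size 36.
z = z_statistic(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
print(z)
```

With these assumed inputs the standard error is 6/√36 = 1, so the statistic is simply the difference between the sample mean and the hypothesized mean.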

6)

Decision rule

All possible values that the test statistic can assume are points on the

horizontal axis of the graph of the distribution of the test statistic and are

divided into two groups; one group constitutes what is known as the

rejection region and the other group makes up the nonrejection region.

The values of the test statistic forming the rejection region are those

values that are less likely to occur if the null hypothesis is true, while the

values making up the nonrejection region are more likely to occur if the

null hypothesis is true. The decision rule tells us to reject the null

hypothesis if the value of the test statistic that we compute from our

sample is one of the values in the rejection region and to not reject the

null hypothesis if the computed value of the test statistic is one of the

values in the nonrejection region.

Significance Level

The decision as to which values go into rejection region and which ones

go into the nonrejection region is made on the basis of the desired level

of significance, designated by α. The term level of significance reflects

the fact that hypothesis tests are sometimes called significance tests, and

a computed value of the test statistic that falls in the rejection region is

said to be significant. The level of significance, α, specifies the area

under the curve of the distribution of the test statistic that is above the

values on the horizontal axis constituting the rejection region.

The level of significance is a probability and, in fact, is the

probability of rejecting a true null hypothesis.

Since to reject a true null hypothesis would constitute an error, it seems

only reasonable that we should make the probability of rejecting a true

null hypothesis small and, in fact, that is what is done. We select a small

value of α in order to make the probability of rejecting a true null

hypothesis small.
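As a sketch of how a chosen α fixes the boundary between the rejection and nonrejection regions, the snippet below looks up standard normal critical values with Python's standard library. It assumes the test statistic is a z statistic; the helper name `critical_values` is hypothetical.

```python
from statistics import NormalDist


def critical_values(alpha, two_sided=True):
    """Standard normal critical value(s) that leave a total area alpha in the rejection region."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if two_sided:
        c = std_normal.inv_cdf(1 - alpha / 2)
        return (-c, c)
    return (std_normal.inv_cdf(1 - alpha),)


lower, upper = critical_values(0.05)                      # two-sided test
(one_sided,) = critical_values(0.05, two_sided=False)     # upper-tail test
print(round(lower, 3), round(upper, 3), round(one_sided, 3))
```

For a two-sided test the area α is split equally between the two tails, which is why the two-sided cutoff is farther from zero than the one-sided cutoff at the same α.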

Types of Errors

The error committed when a true null hypothesis is rejected is called the

type I error. The type II error is the error committed when a false null

hypothesis is not rejected. The probability of committing a type II error is

designated by β.

Whenever we reject a null hypothesis there is always the concomitant

risk of committing a type I error, rejecting a true null hypothesis.

Whenever we fail to reject a null hypothesis the risk of failing to reject a

false null hypothesis is always present. We make α small, but we
generally exercise no control over β, although we know that in most
practical situations it is larger than α.

We never know whether we have committed one of these errors when we
reject or fail to reject a null hypothesis, since the true state of affairs is
unknown. If we reject the null hypothesis, we can take comfort from the
fact that we made α small and, therefore, the probability of committing a
type I error was small. If we fail to reject the null hypothesis, we do not
know the concurrent risk of committing a type II error, since β is usually
unknown but, as has been pointed out, we do know that, in most practical
situations, it is larger than α.

                          Null Hypothesis
Possible Action           True               False
Fail to reject H0         Correct action     Type II error
Reject H0                 Type I error       Correct action

7)

Calculation of test statistic

From the data contained in the sample we compute a value of the test

statistic and compare it with the rejection and nonrejection regions that

have already been specified.

8)

Statistical Decision

The statistical decision consists of rejecting or of not rejecting the null

hypothesis. It is rejected if the computed value of the test statistic falls in

the rejection region, and it is not rejected if the computed value of the test

statistic falls in the nonrejection region.

9)

Conclusion

If H0 is rejected, we conclude that HA is true. If H0 is not rejected, we

conclude that H0 may be true.

10) P-Value

The p-value is a number that tells us how unusual our sample results are,

given that the null hypothesis is true. A p value indicating that the

sample results are not likely to have occurred, if the null hypothesis is

true, provides justification for doubting the truth of the null hypothesis.

The p value is the probability, computed under the null hypothesis, of

obtaining a result at least as extreme as the one observed. With a 5% level of

significance the critical probability is 5 in 100: a result this rare under the null

hypothesis leads to its rejection.

We emphasize that when the null hypothesis is not rejected one should

not say that the null hypothesis is accepted. We should say that the null

hypothesis is not rejected. We avoid using the word accept in this

case because we may have committed a type II error. Since, frequently,

the probability of committing a type II error can be quite high, we do not

wish to commit ourselves to accepting the null hypothesis.
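A minimal sketch of the p value calculation for a z statistic follows, assuming the statistic is standard normal under H0. The function name `p_value`, its `alternative` argument, and the value z = 2.0 are hypothetical and serve only to illustrate the comparison of p with α.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """Probability, under H0, of a z at least as extreme as the one observed."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if alternative == "greater":
        return 1 - phi(z)
    if alternative == "less":
        return phi(z)
    return 2 * (1 - phi(abs(z)))


z = 2.0        # hypothetical computed test statistic
alpha = 0.05   # chosen level of significance
p = p_value(z)
decision = "reject H0" if p <= alpha else "do not reject H0"
print(round(p, 4), decision)
```

Note that the second outcome is worded "do not reject H0" rather than "accept H0", in line with the caution given above.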

Steps involved in testing of hypothesis:

The following are the steps involved in applying a test of significance.

1)

2)

3)

4)

5)

not rejected.

If the objective is to conclude that the two samples are from the same

population or not, without considering the direction of difference, a two-tailed

test of significance is used. On the other hand, if the objective is to conclude

that the mean of one of the samples is larger than the other or not, one

tailed test of significance is used.

The decision about the choice of test statistic depends on the sample

size and on the type of data, whether qualitative or quantitative.

The test of significance is used when the objective is to compare

a)

b)

c)

d)

e)

If we have large sample and the objective is either one of (a) to (d)

mentioned above, we use the normal curve test or the normal test. To

use the normal test, the following assumptions should be satisfied.

i.

ii.

iii.

less the same.

iv.

i).

ii).

the difference.

iii).

iv).

v).

vi).

vii).

(rejecting hypothesis H0)

viii).

level (Not rejecting null hypothesis)

Introduction:

Most of the Statistical Inference procedures we have discussed up

to this point are classified as parametric statistics.

One exception is our use of chi-square as a test of goodness of fit and as a

test of independence.

These uses of chi-square come under the heading of nonparametric

statistics.

Difference between Parametric and Nonparametric Statistics:

The obvious question now is: What is the difference?

In answer to this, let us recall the nature of the inferential procedures that

we have categorized as parametric.

In each case, our interest was focused on estimating or testing a

hypothesis about one or more population parameters. Furthermore, central

to these procedures was a knowledge of the functional form of the

population from which were drawn the samples providing the basis for the

inference.

An example for a parametric statistical test is the widely used t test. The

most common uses of this test are for testing a hypothesis about a single

population mean or the difference between two population means. One of

the assumptions underlying the valid use of this test is that the sampled

population or populations are at least approximately normally distributed.

As we will learn, the procedures that we discuss in this chapter either are

not concerned with population parameters or do not depend on knowledge

of the sampled population. Strictly speaking, only those procedures that

test hypotheses that are not statements about population parameters are

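classified as nonparametric, while those that make no assumption about

the sampled population are called distribution-free procedures. Despite this

distinction, it is customary to use the terms nonparametric and distribution-free interchangeably and to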

discuss the various procedures of both types under the heading of

nonparametric statistics. We will follow the convention.

The following are some of the advantages of nonparametric statistics.

1. They allow for the testing of hypotheses that are not statements

about population parameter values. Some of the chi-square tests of

goodness of fit and tests of independence are examples of tests

possessing this advantage.

2. Nonparametric tests may be used when the form of the sampled

population is unknown.

3. Nonparametric procedures tend to be computationally easier and

consequently more quickly applied than parametric procedures. This

can be desirable feature in certain cases, but when time is not at a

premium, it merits a low priority as a criterion for choosing a

nonparametric test.

4. Nonparametric procedures may be applied when the data being

analyzed consist merely of rankings or classification. That is, the

data may not be based on a measurement scale strong enough to

allow the arithmetic operations necessary for carrying out parametric

procedures.

Although nonparametric statistics enjoy a number of advantages their

disadvantages must also be recognized.

1. The use of nonparametric procedures with the data that can be

handled with a parametric procedure results in a waste of data.

2. The application of some of the nonparametric tests may be laborious

for large samples.

1). SIGN TEST:

The familiar t test is not strictly valid for testing

(1). The null hypothesis that a population mean is equal to some particular value, or

(2). The null hypothesis that the mean of a population of differences between pairs of measurements is equal to zero

unless the relevant populations are at least approximately normally distributed. When

a). The normality assumption cannot be made, or

b). The data at hand are ranks rather than measurements on an interval or ratio scale,

the investigator may wish for an optional procedure. Although the t test is

known to be rather insensitive to violations of the normality assumption,

there are times when an alternative test is desirable.

A frequently used nonparametric test that does not depend on the

assumptions of the t test is the sign test. This test focuses on the

median rather than mean as a measure of central tendency or location.

The median and mean will be equal in symmetric distributions. The only

assumption underlying the test is that the distribution of the variable of

interest is continuous. This assumption rules out the use of nominal

data.

The sign test gets its name from the fact that pluses and minuses,

rather than numerical values, provide the raw data used in the calculations.

Example for Sign Test:

General appearance scores of 10 mentally retarded girls

Girl     1   2   3   4   5   6   7   8   9   10
Score    4   5   8   8   9   6   10  7   6   6

We wish to know if we can conclude that the median score of the
population from which we assume this sample to have been drawn is
different from 5.

Solutions:

1. Data.

2. Assumptions. We assume that the measurements are taken on

a continuous variable.

3. Hypotheses.

H0: the population median is 5.

HA: The population median is not 5.

4. Level of significance: let α = 0.05

5. Test statistic. The test statistic for the sign test is either the
observed number of plus signs or the observed number of minus
signs. The nature of the alternative hypothesis determines
which of these test statistics is appropriate. In a given test, any
one of the following alternative hypotheses is possible.

HA: P (+) > P (-) one-sided alternative
HA: P (+) < P (-) one-sided alternative
HA: P (+) ≠ P (-) two-sided alternative

If the alternative hypothesis is

HA: P (+) > P (-)

a sufficiently small number of minus signs causes rejection of H0.
The test statistic is the number of minus signs. Similarly, if the alternative
hypothesis is

HA: P (+) < P (-)

a sufficiently small number of plus signs causes rejection of H0. The
test statistic is the number of plus signs. If the alternative hypothesis is

HA: P (+) ≠ P (-)

either a sufficiently small number of plus signs or a sufficiently small
number of minus signs causes rejection of the null hypothesis H0. We may
take as the test statistic the less frequently occurring sign.

Scores above (+) and below (-) the hypothesized median, based on the data of the example:

Girl                                        1   2   3   4   5   6   7   8   9   10
Score                                       4   5   8   8   9   6   10  7   6   6
Score relative to hypothesized median 5     -   0   +   +   +   +   +   +   +   +
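The sign counts for this example can be checked with a short Python sketch. It assumes the usual treatment of the sign test, which is not spelled out in the surviving text: scores equal to the hypothesized median are dropped, and under H0 the number of the less frequent sign follows a binomial distribution with probability 0.5. The variable names are hypothetical.

```python
from math import comb

scores = [4, 5, 8, 8, 9, 6, 10, 7, 6, 6]   # the ten scores of the example
hypothesized_median = 5

plus = sum(s > hypothesized_median for s in scores)
minus = sum(s < hypothesized_median for s in scores)
n = plus + minus              # scores equal to the median are not counted
k = min(plus, minus)          # less frequently occurring sign

# P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided alternative
p_one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
p_two_sided = min(1.0, 2 * p_one_tail)
print(plus, minus, n, round(p_two_sided, 4))
```

Under these assumptions the two-sided probability falls below α = 0.05, which would point toward rejecting H0 for this sample.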

2). THE WILCOXON SIGNED RANK TEST FOR LOCATION:

Purpose and Uses:

Sometimes we wish to test a null hypothesis about a population mean,

but for some reason neither z nor t is an appropriate test statistic. If

we have a small sample (n < 30) from a population that is known to be

grossly nonnormally distributed, and the central limit theorem is not

applicable, the z statistic is ruled out. The t statistic is not appropriate

because the sampled population does not sufficiently approximate a

normal distribution.

The sign test may be used when our data consist of a single sample or

when we have paired data. If however, the data for analysis are measured

on at least an interval scale, the sign test may be undesirable since it

would not make full use of information contained in the data. A more

appropriate procedure might be the Wilcoxon signed rank test, which

makes use of the magnitudes of the differences between measurements

and a hypothesized location parameter rather than just the signs of

the differences.

Assumptions. The Wilcoxon test for location is based on the following
assumptions about the data.

1. The sample is random.
2. The variable is continuous.
3. The population is symmetrically distributed about its mean μ.
4. The measurement scale is at least interval.

Hypotheses. The following are the null hypotheses (along with their
alternatives) that may be tested about some unknown population mean μ.

(a)

When we use the Wilcoxon procedure we perform the following calculations.

1. Subtract the hypothesized mean μ0 from each of the observations xi
to obtain

$$ d_i = x_i - \mu_0 $$

If any xi is equal to the mean, so that di = 0, eliminate that di from the
calculations and reduce n accordingly.

2. Rank the usable di from the smallest to the largest without regard to
the sign of di. That is, consider only the absolute values of the di,
designated by |di|, when ranking them. If two or more of the |di| are
equal, assign each tied value the mean of the rank positions the tied
values occupy. If, for example, the three smallest |di| values are
equal, place them in rank positions 1, 2, and 3, but assign each a rank
of (1+2+3) / 3 = 2.

3. Assign each rank the sign of the di that yields that rank.

4. Find T+, the sum of the ranks with positive signs, and T-, the sum of

the ranks with negative signs.
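The four calculation steps above can be sketched as follows. The function name `signed_rank_sums` is hypothetical, and the ten scores from the sign test example are reused with μ0 = 5 purely as illustrative numbers, not as a claim that those data satisfy the Wilcoxon assumptions.

```python
def signed_rank_sums(observations, mu0):
    """Return (T+, T-) for the Wilcoxon signed rank procedure."""
    # Step 1: differences from the hypothesized mean, dropping zeros
    diffs = [x - mu0 for x in observations if x != mu0]
    # Step 2: order by absolute value and give tied values the mean rank
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        mean_rank = (i + 1 + j) / 2      # mean of rank positions i+1 .. j
        for k in range(i, j):
            ranks[k] = mean_rank
        i = j
    # Steps 3 and 4: attach the signs and sum the ranks separately
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return t_plus, t_minus


t_plus, t_minus = signed_rank_sums([4, 5, 8, 8, 9, 6, 10, 7, 6, 6], mu0=5)
print(t_plus, t_minus)
```

A quick check on any such calculation is that T+ and T- together must equal n(n+1)/2, where n is the number of nonzero differences (here 9, so the total is 45).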

3). THE MEDIAN TEST:

A nonparametric procedure that may be used to test the null hypothesis

that two independent samples have been drawn from populations

with equal medians is the median test.

4). THE MANN-WHITNEY U TEST

Purpose and Uses:

It is the most widely used test as an alternative to the t-test when we do

not make the t-test assumptions about the parent population. The median

test does not make full use of all the information present in the two

samples when the variable of interest is measured on at least an ordinal

scale. Reducing an observation's information content to merely that of

whether or not it falls above or below the common median is a waste of

information. If, for testing the desired hypothesis, there is available a

procedure that makes use of more of the information inherent in the data,

that procedure should be used if possible. Such a nonparametric

procedure that can often be used instead of the median test is the Mann-Whitney test, sometimes called the Mann-Whitney-Wilcoxon test. Since

this test is based on the ranks of the observations it utilizes more

information than does the median test.

Assumptions:

1. The two samples, of size n and m, respectively, available for

analysis have been independently and randomly drawn from their

respective populations.

2. The measurement scale is at least ordinal.

3. The variable of interest is continuous.

4. If the populations differ at all, they differ only with respect to their

medians

Hypotheses:

When these assumptions are met we may test the null hypothesis

that the two populations have equal medians against any of the three
possible alternatives.

H0: MX = MY   Vs   HA: MX ≠ MY   (two-sided)
H0: MX ≥ MY   Vs   HA: MX < MY   (one-sided)
H0: MX ≤ MY   Vs   HA: MX > MY   (one-sided)

Level of significance.

Let α = 0.05 (or 5%)

Test statistic.

To compute the test statistic we combine the two samples and rank all
observations from smallest to largest while keeping track of the sample to
which each observation belongs. Tied observations are assigned a rank
equal to the mean of the rank positions for which they are tied.

The test statistic is

$$ T = S - \frac{n(n+1)}{2} $$

where n is the number of sample X observations and S is the sum of the
ranks assigned to the sample observations from the population of X values.

Distribution of test statistic.

Critical values from the distribution of the test statistic MW are given

in table of Quantiles of the Mann-Whitney test statistic for various n, m, p

values.

Decision rule.

In general, for the two-sided situation with

H0: MX = MY Vs HA: MX ≠ MY

computed values of T that are either sufficiently large or sufficiently small
will cause rejection of H0. The decision rule for this case, then, is:

Reject H0: MX = MY if the computed value of T is either less than MW(α/2) or
greater than MW(1 − α/2), where MW(α/2) is the critical value of T for n, m, and
α/2 given in Quantiles of the Mann-Whitney test statistic, and MW(1 − α/2) =
nm − MW(α/2).

For one-sided tests of the type H0: MX ≥ MY Vs HA: MX < MY the decision rule is:

Reject H0: MX ≥ MY if the computed T is less than MW(α), where MW(α) is the critical value
of T obtained by entering Quantiles of the Mann-Whitney test statistic with
n = the number of X observations, m = the number of Y observations,
and α = the chosen level of significance.

If we test

H0: MX ≤ MY Vs HA: MX > MY

sufficiently large values of T will cause rejection so that the decision rule is:

Reject H0: MX ≤ MY if the computed value of T is greater than MW(1 − α),
where MW(1 − α) = nm − MW(α).

Statistical Decision.

When we enter the MW statistic table with n, m, and α, we find the
critical value of MW. If the computed value of T does not fall in the
rejection region, we do not reject H0.

Conclusion.

If H0: MX = MY Vs HA: MX ≠ MY (two-tail test) is tested and H0 is not rejected,
we conclude that MX may be equal to MY. This leads to the conclusion that there
is no significant difference between X and Y.

If H0: MX ≤ MY Vs HA: MX > MY (one-tail test) is tested and H0 is rejected,
we conclude that MX is greater than MY. This leads to the conclusion that X
is more than Y.

p value.

If p > α, we do not reject H0; if p ≤ α, we reject H0.

Large-Sample Approximation. When either n or m is greater than 20 we
cannot use the Mann-Whitney test statistic table to obtain critical values for the
Mann-Whitney test. When this is the case we may compute

$$ z = \frac{T - mn/2}{\sqrt{nm(n+m+1)/12}} $$

and compare the result, for significance, with critical values of the
standard normal distribution.
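The large-sample formula can be sketched in Python as below. The function name `mann_whitney_z` is hypothetical. The inputs T = 44.5, n = 6 and m = 10 are borrowed from the worked example in this section only to show the arithmetic; with samples that small the table of critical values, not this approximation, is the appropriate tool.

```python
from math import sqrt


def mann_whitney_z(T, n, m):
    """Normal approximation: z = (T - mn/2) / sqrt(nm(n + m + 1)/12)."""
    mean_T = m * n / 2
    sd_T = sqrt(n * m * (n + m + 1) / 12)
    return (T - mean_T) / sd_T


# Illustrative inputs only (see the note above about sample size)
z = mann_whitney_z(T=44.5, n=6, m=10)
print(round(z, 4))
```

The value mn/2 is the mean of T when H0 is true, so z measures how many standard deviations the computed T lies from what would be expected under equal medians.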

Example for Mann-Whitney U Test:

The following table shows the time taken for root canal treatment of a tooth in
Conservative Dentistry by junior students and by senior students.

Sl. No.   Junior Student Taken Time   Senior Student Taken Time
1         17.4                        14.4
2         17.6                        16.5
3         18.0                        16.5
4         15.3                        14.1
5         16.8                        15.9
6         14.2                        13.7
7                                     16.7
8                                     14.0
9                                     14.7
10                                    18.0

We wish to know if we can conclude that senior students take less time
than junior students.

Solution:

1. Data: See problem table

2. Assumptions. We presume that the assumptions of the Mann-Whitney U test are met.

3. Hypotheses. The null and alternative hypotheses are as follows

H0: MX ≥ MY

HA: MX < MY

Where,

MX = Median of a population of junior taken time

MY = Median of a population of senior taken time

4. Level of significance. Let α = 0.05.

5. Test statistic. To compute the test statistic we combine the two samples

and rank all observations from smallest to largest while keeping

track of the sample to which each observation belongs. Tied

observations are assigned a rank equal to the mean of the rank

positions for which they are tied. The results of the steps are shown

below.

Table: Original Data and Ranks

Junior Student (X)   Rank of X             Senior Student (Y)   Rank of Y
                                           13.7                 1
                                           14.0                 2
                                           14.1                 3
14.2                 4
                                           14.4                 5
                                           14.7                 6
15.3                 7
                                           15.9                 8
                                           16.5                 (9+10) / 2 = 9.5
                                           16.5                 (9+10) / 2 = 9.5
                                           16.7                 11
16.8                 12
17.4                 13
17.6                 14
18.0                 (15+16) / 2 = 15.5
                                           18.0                 (15+16) / 2 = 15.5
Total                S = 65.5                                   70.5

The test statistic is

$$ T = S - \frac{n(n+1)}{2} $$

where n is the number of sample X observations and S is the sum of the
ranks assigned to the sample observations from the population of X values.

6. Distribution of test statistic. Critical values from the distribution of

the test statistic MW are given in table of Quantiles of the Mann-Whitney test statistic for various n, m, p values.

7. Decision rule. If the median of the X population is, in fact, smaller than
the median of the Y population, as specified in the alternative
hypothesis, we would expect (for equal sample sizes) the sum of the
ranks assigned to the observations from the X population to be
smaller than the sum of the ranks assigned to the observations from
the Y population. The test statistic is based on this rationale in such
a way that a sufficiently small value of T will cause rejection of
H0: MX ≥ MY.

In general, for one-sided tests of the type illustrated here the decision rule
is:

Reject H0: MX ≥ MY if the computed T is less than MW(α), where MW(α) is the critical value
of T obtained by entering Quantiles of the Mann-Whitney test statistic with
n = the number of X observations, m = the number of Y observations,
and α = the chosen level of significance.

If we use the Mann-Whitney procedure to test

H0: MX ≤ MY Vs HA: MX > MY

sufficiently large values of T will cause rejection so that the decision rule
is:

Reject H0: MX ≤ MY if the computed value of T is greater than MW(1 − α),
where MW(1 − α) = nm − MW(α).

For the two-sided situation with

H0: MX = MY Vs HA: MX ≠ MY

computed values of T that are either sufficiently large or sufficiently small
will cause rejection of H0. The decision rule for this case, then, is:

Reject H0: MX = MY if the computed value of T is either less than MW(α/2) or
greater than MW(1 − α/2), where MW(α/2) is the critical value of T for n, m, and
α/2 given in Quantiles of the Mann-Whitney test statistic, and MW(1 − α/2) =
nm − MW(α/2).

For this example the decision rule is:

Reject H0 if the computed value of T is smaller than 15, the critical value of
the test statistic for n = 6, m = 10, and α = 0.05 found in the Quantiles of the
Mann-Whitney test statistic table.

8. Calculation of test statistic. As shown in the table of ranks, S = 65.5, so that

$$ T = S - \frac{n(n+1)}{2} = 65.5 - \frac{6(6+1)}{2} = 44.5 $$

9. Statistical decision. When we enter the Mann-Whitney test statistic table with
n = 6, m = 10, and α = 0.05, we find the critical value of MW to be 15.

Since Computed T value > Critical value of the test statistic

(i.e.,) 44.5 > 15

We do not reject H0.

10. Conclusion. Since H0 is not rejected, we conclude that MX may be greater than
or equal to MY. This leads to the conclusion that junior students do not take
less time than senior students.

11. p value. We have for this test p = 0.0519 > 0.05 (i.e., p > α), so we do not
reject H0.
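The ranking and the value of T in this example can be reproduced with a short Python sketch. The observations are entered as they appear in the table of ranks (six junior times, ten senior times); the helper name `average_rank_map` is hypothetical.

```python
def average_rank_map(values):
    """Map each distinct value to the mean of the rank positions it occupies."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


# Values as listed in the table of ranks for this example
junior = [17.4, 17.6, 18.0, 15.3, 16.8, 14.2]                            # X, n = 6
senior = [14.4, 16.5, 16.5, 14.1, 15.9, 13.7, 16.7, 14.0, 14.7, 18.0]    # Y, m = 10

rank_of = average_rank_map(junior + senior)   # ranks in the combined sample
n = len(junior)
S = sum(rank_of[x] for x in junior)           # rank sum of the X sample
T = S - n * (n + 1) / 2
print(S, T)
```

Because the ranks 1 to 16 always add up to 136, the rank sum of the Y sample can be obtained as 136 − S, which is a convenient check on a hand-built rank table.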

5). RUN TEST:

Purpose: Testing the randomness of a given set of observations.

Procedure

Let X1, X2, X3, ..., Xn be the set of observations arranged in the order in which
they occur; Xi is the i-th observation in the outcome of an experiment. Then
for each of the observations, we see if it is above or below the median of
the observations and write A if the observation is above and B if it is below
the median value. Thus we get a sequence of As and Bs of the type, say,

A B B B A A B B B B A A A A B A A A ....          (1)

A run is an unbroken sequence of identical letters; the sequence shown begins
with the runs A, BBB, AA, BBBB, AAAA, B, AAA.

Null Hypothesis:

H0: That the set of observations is random

Test statistic:

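Let U = the number of runs in sequence (1); U is a random variable with

$$ \text{Mean}(U) = E(U) = \frac{n+2}{2} $$

$$ \text{Var}(U) = \frac{n}{4}\left(\frac{n-2}{n-1}\right) $$

and

$$ SD(U) = \sqrt{\text{Var}(U)} $$

Then

$$ Z = \frac{U - E(U)}{SD(U)} \sim N(0,1), \text{ asymptotically} \qquad (2) $$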

The following data are the teeth sizes of patients; we wish to test whether the
teeth sizes are randomly distributed among the patients. The median size of the teeth is 3.4.

3.4  4.5  3.1  4.6  2.9  2.8  4.2  4.6  3.9  3.5  3.6

Solution:

Data. See the problem

Assumption. The given data are from a continuous variable. If the given data are

ordinal, then no assumption is needed.

Null Hypothesis:

H0: That the set of observations is random

Level of significance. Let α = 0.05 or 5%

Test statistic:

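i. Write A (above the median) if value > 3.4, B (below the
median) if value < 3.4, and no letter if value = 3.4.

ii. Mark the end of each run with a bar and begin the
next run; continue this process to cover all As and Bs.

Value    3.4   4.5   3.1   4.6   2.9   2.8   4.2   4.6   3.9   3.5   3.6
Letter         A  |  B  |  A  |  B     B  |  A     A     A     A     A       ..(1)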

Here U = 5 and n = number of observations = 11.

$$ E(U) = \frac{n+2}{2} = \frac{11+2}{2} = \frac{13}{2} = 6.5 $$

$$ \text{Var}(U) = \frac{n}{4}\left(\frac{n-2}{n-1}\right) = \frac{11}{4}\left(\frac{11-2}{11-1}\right) = \frac{11}{4}\cdot\frac{9}{10} = 2.75(0.9) = 2.48 $$

$$ SD(U) = \sqrt{\text{Var}(U)} = \sqrt{2.48} = 1.57 $$

Distribution of test statistic. Critical values of the test statistic Z, which follows the standard
normal distribution, are given in the standard normal table for various α
values.

Decision rule.

Computed values of |Z| ≥ Critical values of Z will cause rejection of H0.

Calculation of test statistic. For our present example we have

$$ Z = \frac{U - E(U)}{SD(U)} = \frac{5 - 6.5}{1.57} = \frac{-1.5}{1.57} = -0.96 $$

Statistical decision. For α = 0.05, we find the critical value of Z to be 1.64.

Since Computed |Z| value < Critical value of the Z test statistic

(i.e.,) 0.96 < 1.64

We do not reject H0.

Conclusion. We conclude that the given data is random.

p value. For this test p > 0.05 (i.e., p > α), so we do not
reject H0.
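The run count and the Z statistic for this example can be sketched in Python as follows. The sketch follows the text in using the stated median 3.4 as the cut point and n = 11, and it evaluates E(U) from the formula (n + 2)/2; treating the observation equal to the median as carrying no letter is an assumption, and the variable names are hypothetical.

```python
from math import sqrt

sizes = [3.4, 4.5, 3.1, 4.6, 2.9, 2.8, 4.2, 4.6, 3.9, 3.5, 3.6]
median = 3.4   # cut point stated in the example

# A = above the median, B = below; a value equal to the median gets no letter
letters = ["A" if x > median else "B" for x in sizes if x != median]

# A new run starts wherever the letter changes
U = 1 + sum(a != b for a, b in zip(letters, letters[1:]))

n = len(sizes)                          # the example uses n = 11
expected_U = (n + 2) / 2
var_U = n * (n - 2) / (4 * (n - 1))
Z = (U - expected_U) / sqrt(var_U)
print("".join(letters), U, expected_U, round(var_U, 3), round(Z, 2))
```

Without intermediate rounding Z comes out near −0.95 (about −0.96 when the rounded SD of 1.57 is used), so its absolute value stays below the critical value 1.64 either way.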

6). Kruskal-Wallis One Way Analysis Of Variance By Ranks:

Purpose:

One-way analysis of variance may be used to test the null hypothesis

that several population means are equal. When the assumptions

underlying this technique are not met, that is,

i.

When the populations from which the samples are drawn are not

normally distributed with equal variances

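ii.

When the data for analysis consist only of ranks,

a nonparametric alternative to the one-way analysis of variance may be

used to test the hypothesis of equal location parameters.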

Procedure:

1) The n1,n2,n3,.,nk observations from the k samples are combined

into a single series of size n and arranged in order of magnitude

from smallest to largest. The observations are then replaced by

ranks from 1, which is assigned to the smallest observation, to n,

which is assigned to largest observation. When two or more

observations have the same value, each observation is given the

mean of the ranks for which it is tied.

2) The ranks assigned to observations in each of the k groups are

added separately to give k rank sums.

3) The test statistic

$$ H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) \qquad \text{(KW1)} $$

is computed, where

k = the number of samples
nj = the number of observations in the j-th sample
n = the number of observations in all samples combined
Rj = the sum of ranks in the j-th sample

4) When there are three samples and five or fewer observations in
each sample, the significance of the computed H is determined by
consulting the corresponding table. When there are more than five
observations in one or more of the samples, H is compared with
tabulated values of χ² (chi-square) with k-1 degrees of freedom.

Example for Kruskal-Wallis one-way ANOVA test by ranks:

Dental surgery time in minutes of 13 Experimental patients
________________
       Sample
I      II     III
17     8      2
20     7      5
40     9      4
31     8      3
35

Solution:

1. Data: See the problem

2. Assumptions

a. The samples are independent random samples from their

respective populations.

b. The measurement scale employed is at least ordinal.

c. The distributions of the values in the sampled populations are

identical except for the possibility that one or more of the

populations are composed of values that tend to be larger

than those of the other populations.

3. Hypothesis:

H0: The Population centers are equal.

HA: At least one of the populations tends to exhibit larger

values than at least one of the other populations.

4. Level of significance

Let α = 0.01

5. Test statistic.

$$ H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1) $$

Critical values of H for various sample sizes and α levels are given in the
critical values of the Kruskal-Wallis test statistic table.

6. Decision rule:

The null hypothesis will be rejected if the

Computed value of H ≥ Critical value of H.

The null hypothesis will not be rejected if the

Computed value of H < Critical value of H.

7. Calculation of test statistic.

When the three samples are combined into a single series and
ranked, the table of ranks shown below is obtained.

Ordered values:  2, 3, 4, 5, 7, 8, 8, 9, 17, 20, 31, 35, 40
Ranks:           1, 2, 3, 4, 5, 6.5, 6.5, 8, 9, 10, 11, 12, 13

The Data of table Replaced by ranks
____________________________
       Sample
I      II     III
9      6.5    1
10     5      4
13     8      3
11     6.5    2
12

Note: The value 8 occurs two times; each takes the mean of the ranks it occupies, i.e., the mean of 6
and 7, which is (6+7)/2 = 6.5.

R1 = 9+10+13+11+12

= 55

R2 = 6.5+5+8+6.5

= 26

R3 = 1+4+3+2

= 10

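$$ H = \frac{12}{13(13+1)} \sum_{j=1}^{3} \frac{R_j^2}{n_j} - 3(13+1) $$

$$ = \frac{12}{13(14)} \left( \frac{55^2}{5} + \frac{26^2}{4} + \frac{10^2}{4} \right) - 3(14) $$

$$ = \frac{12}{182}(799) - 42 = 52.68 - 42 = 10.68 $$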

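9. Statistical Decision:

From the Kruskal-Wallis statistic table, when the nj are 5, 4, and 4, the critical value of
H is 7.7604 and the probability of obtaining a value of H this large is 0.009. The null
hypothesis can be rejected at the 0.01 level of significance, (i.e.)

Computed value of H ≥ Critical value of H

Here the computed H of about 10.7 exceeds 7.7604.

10. Conclusion. We conclude that there is a difference in the average
reaction time among the three populations.

11. p value. For this test p < 0.01 (i.e., p < α).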
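The rank sums and the statistic H for this example can be checked with the Python sketch below. The three samples are entered as recovered from the ordered list of values and the rank sums R1, R2 and R3 given in the solution; the helper name `average_rank_map` and the other variable names are hypothetical.

```python
def average_rank_map(values):
    """Map each distinct value to the mean of the rank positions it occupies."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


# Sample values as recovered from the ordered series and the rank sums
samples = {
    "I": [17, 20, 40, 31, 35],
    "II": [8, 7, 9, 8],
    "III": [2, 5, 4, 3],
}

pooled = [x for sample in samples.values() for x in sample]
rank_of = average_rank_map(pooled)
n = len(pooled)

rank_sums = {name: sum(rank_of[x] for x in sample) for name, sample in samples.items()}
H = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(sample) for name, sample in samples.items()
) - 3 * (n + 1)
print(rank_sums, round(H, 2))
```

Carrying full precision through the calculation gives H of about 10.68, which is well above the tabulated critical value of 7.7604 quoted for samples of sizes 5, 4 and 4.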

study design are interrelated. Following are the suggested steps to be

followed in a study design (experiment, project, thesis, dissertation,

study, survey, etc.).

i).

Area of study is mapped out and a proper title is given which should

be precise and self explanatory.

ii).

iii).

iv).

v).

vi).

prospective, etc.).

vii).

viii).

ix).

x).

Report preparation.

REFERENCE BOOKS:

1). Wayne W. Daniel (1999) BIOSTATISTICS: A Foundation for Analysis in the

Health Sciences, John Wiley & Sons, INC, New York.

2). G N Prabhakara(2006) BIOSTATISTICS, Jaypee Brothers, New Delhi.

3). P.S.S. Sundar Rao and J. Richard (2006) Introduction To Biostatistics And

Research Methods, Prentice-Hall of India Pvt. Ltd., New Delhi.

4). Dr. Soben Peter (2004) Essentials of Preventive

And Community Dentistry, Arya (Medi) Publishing House, New Delhi.

5). Rebecca G. Knapp and M. Clinton Miller III Clinical Epidemiology and

Biostatistics, Harwal, Malvern, Pennsylvania

And Some Pure Statistics Books:

Part A

(Minimum 5 lines for each)

5 x 2 = 10 Marks

UNIT I

1. Define Bio-statistics?

2. State the uses of Bio-statistics for dental research?

3. Explain the ratio and interval scale?

4. Distinguish between qualitative data and quantitative data?

5. What are the variables and scales in data collection?

6. Distinguish between nominal scale and ordinal scale?

7. Define frequency distribution?

8. Explain the sample and population?

9. Explain the individual data and grouped data?

10. Explain the pilot survey?

11. Explain the simple random sampling. Give an example?

12. Explain the uses of pie and bar diagrams?

UNIT II

13. How do you determine the consistency of two sets of variables?

14. Explain the correlation. Give an example.

15. Explain dental practice using correlation?

16. Explain the uses of regression?

UNIT III

17. What is bias in research?

18. What are the errors in hypothesis testing?

19. Explain the Chi-square test.

20. What is meant by small-sample and large-sample tests?

21. Explain the cohort study design.

UNIT IV

22. What is ANOVA? How will you use it in dental research?

23. Explain the Wilcoxon signed-rank test.

24. Explain non-parametric tests.

25. What are the assumptions of non-parametric tests?

UNIT V

26. Write about the utilization of dentistry research.

27. Explain the descriptive approach in research.

28. Write about the importance of a bibliography.

29. What is a case study?

30. What is research?

31. What is a literature review?

Part B

UNIT I

1. Explain Bio-statistics. State its applications in patient care.

2. Explain the methods of data collection.

3. Explain the applications of Bio-statistics in dentistry research.

4. Explain reliability and validity. How will you assess them?

5. Explain how study results are applied in patient care.

6.

UNIT II

7. Explain the various measures of central tendency and illustrate them with examples.

8. Explain the applications of descriptive statistics in dentistry research.

9. Determine the mean and standard deviation of each of the following sets of analytical measurements. Which set is more precise?

A: 29.5  45.3  28.8  42.9  46.6  24.0  32.7  28.0

B: 35.2  34.2  33.0  35.9  33.7  38.2  33.1  34.5
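
The calculation asked for in question 9 can be sketched in Python as follows. This is a minimal illustration, assuming that the eight values listed after each label make up sets A and B, that the sample standard deviation (divisor n - 1) is intended, and that "more precise" means the set with the smaller spread. The helper name `summarize` is hypothetical.

```python
import statistics

# Measurements as read from the question (assumed grouping: 8 values per set).
set_a = [29.5, 45.3, 28.8, 42.9, 46.6, 24.0, 32.7, 28.0]
set_b = [35.2, 34.2, 33.0, 35.9, 33.7, 38.2, 33.1, 34.5]


def summarize(values):
    """Return (mean, sample standard deviation) for a list of measurements."""
    return statistics.mean(values), statistics.stdev(values)


mean_a, sd_a = summarize(set_a)
mean_b, sd_b = summarize(set_b)

# The set with the smaller standard deviation is taken as the more precise one.
more_precise = "A" if sd_a < sd_b else "B"

print(f"A: mean = {mean_a:.3f}, SD = {sd_a:.3f}")
print(f"B: mean = {mean_b:.3f}, SD = {sd_b:.3f}")
print(f"More precise set: {more_precise}")
```

Under these assumptions both sets have the same mean (34.725), so the comparison rests on the standard deviation alone.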

10. Explain the scatter diagram. How do you draw inferences from a scatter diagram?

11. Explain the two regression equations with an example.

UNIT III

12. State the properties and applications of sampling distributions.

13. Explain the procedure for testing a statistical hypothesis in dental research.

14. Two types of treatment were tried on a group of patients with bleeding teeth disease, and the outcome was recorded as improvement or no improvement.

                 Outcome

Details          Improvement    No Improvement

New Treatment    38             7

Conventional     39             17

Check whether the new method is effective.
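
One way to approach question 14 is a chi-square test of independence on the 2 x 2 table. The sketch below is an illustration under stated assumptions, not a prescribed solution: it reads the counts as New Treatment 38 improved and 7 not improved, Conventional 39 improved and 17 not improved (the layout in the source is ambiguous), applies no continuity correction, and compares the statistic with 3.841, the 5% critical value of chi-square with 1 degree of freedom. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2 x 2 table [[a, b], [c, d]].

    Uses the shortcut formula n(ad - bc)^2 / (row and column totals multiplied),
    without Yates' continuity correction.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumed reading of the table: rows are treatments, columns are outcomes.
chi2 = chi_square_2x2(38, 7, 39, 17)

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CRITICAL_5_PERCENT_DF1 = 3.841
significant = chi2 > CRITICAL_5_PERCENT_DF1

print(f"chi-square = {chi2:.3f}, significant at 5% level: {significant}")
```

With these assumed counts the statistic is about 3.02, which is below 3.841, so the difference between the treatments would not be significant at the 5% level. Applying Yates' correction would lower the statistic further.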

UNIT IV

15. Explain ANOVA. State its applications in dentistry research.

16. Explain the Mann-Whitney U test and state its importance.

17. Explain the Kruskal-Wallis one-way analysis of variance by ranks, with an example.

UNIT V

19. "Formation of a hypothesis is key to dentistry research." Discuss, and identify the role of the researcher in evidence-based practice.

20. Explain the steps in thesis report writing.

21. Explain the criteria of good research.

22. How do you frame a research study? Explain the various steps in preparing a scientific report.

23. Explain the research proposal.

24. Explain the research process.

25. How is statistical estimation helpful in clinical trial research?
