
ASSIGNMENT OF RESEARCH

METHODOLOGY
A PROJECT
in the subject of Research Methodology in Commerce
SUBMITTED TO

UNIVERSITY OF MUMBAI
FOR SEMESTER IV OF

MASTER OF COMMERCE
BY

KHAN MOHD. MOHSIN
Roll No. 10
Specialization: Business Management
UNDER THE GUIDANCE OF

Dr. Vivek Deolankar
YEAR - 2015-16

DECLARATION BY THE STUDENT

I, Shri Khan Mohd. Mohsin, student of M. Com. Part-II, Roll Number 10, at the
Department of Commerce, University of Mumbai, do hereby declare that the project
titled "Assignment of Research Methodology", submitted by me in the subject of
Research Methodology in Commerce for Semester IV during the academic year
2015-16, is based on actual work carried out by me under the guidance and
supervision of Dr. Vivek Deolankar.
I further state that this work is original and not submitted anywhere else for any other
examination.

Date :
Place : Mumbai

Signature of Student

EVALUATION CERTIFICATE

This is to certify that the undersigned have assessed and evaluated the project on
"Assignment of Research Methodology" in the subject of Research Methodology in
Commerce submitted by Kum/Smt/Shri Khan Mohd. Mohsin, student of M. Com.
Part-II at the Department of Commerce, University of Mumbai, for Semester IV during
the academic year 2015-16.

This project is original to the best of our knowledge and has been accepted for Internal
Assessment.

Internal Examiner

External Examiner

Director

Dr. Vivek Deolankar

University of Mumbai
Department of Commerce
Internal Assessment: Subject: Research Methodology in Commerce
Name of Student : First Name: MOHD MOHSIN; Father's Name: MOHD MOIN; Surname: KHAN
Class : M.COM Part-II, Sem IV
Branch : Business Management
Roll Number : 10

Topic for the Project: Assignment of Research Methodology

Marks Awarded                                        Signature

DOCUMENTATION
Internal Examiner (Out of 10 Marks)
External Examiner (Out of 10 Marks)
Presentation (Out of 10 Marks)
Viva and Interaction (Out of 10 Marks)
Total Marks (Out of 40 Marks)
INDEX

Contents

Chapter 1
1.1 Meaning of Data Processing
1.2 Significance of Data Processing
1.3 Problems in Data Processing

Chapter 2
2.1 Data Processing Stages
2.1.1 Editing
2.2.1 Coding
2.3.1 Classification of Data
2.4.1 Tabulation
2.5.1 Graphic Presentation of Data

Chapter 3
3.1 Measure of Central Tendency
3.2 Correlation Analysis
3.3 Regression Analysis
3.4 Measure of Dispersion

4.1 Conclusion
4.2 Bibliography

Questionnaire of the assignment for Research Methodology in Commerce

Question 1: Explain in detail the meaning, significance, and problems in data processing.

Question 2: Enumerate in detail the following stages in data processing:
a. Editing
b. Coding
c. Classification
d. Tabulation
e. Graphic presentation

Question 3: What is statistical analysis? Discuss the following tools and techniques in research methodology in commerce:
a. Measurement of Central Tendency
b. Correlation Analysis
c. Regression Analysis
d. Dispersion Analysis

Answer 1
1.1 Meaning of Data Processing
Processing refers to subjecting the collected data to a process in which the accuracy,
uniformity of entries, and consistency of the information gathered are examined. It is a
very important stage before the data is analyzed. Most commonly, processing is
understood as editing, coding, classification, and tabulation of the data collected.
After collecting data, the methodology of converting raw data into meaningful
statements includes data processing, data analysis, and data interpretation and
presentation. Data processing involves main stages such as editing, coding,
classification, tabulation, and graphic presentation of data.
Data processing is the process of skilfully organizing data for the purpose
of data analysis and interpretation. It can be done manually when
the data collected is limited, or mechanically when the collected data
involves huge quantities.
Data processing is an intermediary stage between data collection and data
analysis. The completed instruments of data collection, such as interview
questionnaires, data sheets, and field notes, contain a vast mass of data. The collected
data instruments are like raw materials and therefore cannot straightaway
provide answers to research questions. Therefore, there is a need for skillful
manipulation of data, i.e., data processing.

1.2 Significance of Data Processing


Significance of Editing
1. Accuracy of Data:
Editing checks the accuracy of data collection. At times, the respondent may
provide incorrect responses to certain questions. The editor uses his judgement to
correct the inconsistencies in responses given by some respondents. For instance, a
respondent belonging to the middle class may respond that he buys premium-priced
products, which is unlikely for a middle-class consumer. Therefore, the editor
may correct the wrong responses and ensure accuracy of data.
2. Uniformity of Responses:
Editing also helps to find out whether or not the respondents have interpreted the
questions uniformly. For instance, a question may have a scale of 1 to 5, where 1 =
excellent and 5 = poor. However, some respondents may have interpreted it the other
way. In such cases, the responses are defective. Therefore, the editor checks the
uniformity in responses of all respondents and makes necessary changes.
3. Completeness of Data:
Editing ensures completeness of data. At times, the respondents may provide
responses only to certain questions and not to all questions. Also, the responses to
certain questions may be incomplete, especially in the case of open-ended questions.
The editor may use judgement to deduce proper answers to unanswered or incomplete
responses based on other responses.
4. Coding of Data:
Editing facilitates coding of data at the post-data-collection stage. After editing
the data, the researcher assigns codes to the responses provided by the respondents,
especially in the case of open-ended questions. In the case of open-ended questions,
all responses are placed in different categories and each category is assigned a code.
For instance, the responses to a question - what are the buying motives for purchasing
a car - may include status in society (Status - code 1), a convenient mode of
travelling for family members (Convenience - code 2), and claiming of depreciation in tax
returns (Depreciation - code 3); therefore, there would be at least three
codes.

Significance of Coding
1. Facilitates Classification of Data:
Coding facilitates classification of data. After providing codes to various
responses, the data can be classified into various categories. The coded responses can
be classified into categories such as age, gender, educational level, income level, area-wise,
occupation-wise, and so on.
2. Facilitates Tabulation of Data:
Since coding facilitates classification of data, it becomes easier for the researcher
to tabulate the data. The coded responses are classified into different categories,
and accordingly the data is transferred to statistical tables. The tabulated data can
then be used for analysis and interpretation.
Significance of Classification
1. Protection and Management of Data:
From the time information is collected or created until it is destroyed, it
should be classified to ensure it is protected, stored and managed appropriately. For
instance, information may be classified as public information, information for
internal use only, and confidential/restricted information. The public information can
be provided to anyone - insiders and outsiders; the information that is classified as
internal use only should not be provided to outsiders; and the information that is
classified as confidential may be restricted only to top authorities in the
organization.
2. Facilitates Tabulation of Data:
Coding and classification facilitate tabulation of data. In fact, coding is
considered an important element of classification. The researcher assigns codes to
responses either at the pre-data-collection stage or at the post-data-collection stage,
and accordingly the responses are classified into different categories. The classified
information is tabulated for proper analysis and interpretation.

3. Facilitates Speedy Searches of Data:
The classified data is easy to locate and retrieve, which helps in speeding up data
searches. For instance, classified data can help an organization quickly retrieve
certain data which may be required for legal and regulatory purposes within a set
time-frame.
4. Grouping of Data:
Classification facilitates grouping of data into different categories. The collected
data can be grouped into categories such as age, gender, education, etc. Each
of the categories can be further sub-classified.
Significance of Tabulation:
1. Analysis and Interpretation of Data:
Tabulation helps to arrange the classified data into statistical tables. The
statistical tables facilitate analysis and interpretation of data. The tabulated data can
facilitate comparative analysis of two or more variables - such as different age
groups, income groups, different states, different periods, and so on. For instance, the
tabulated data can be analyzed in terms of buying patterns of different age groups,
income groups, and so on.
2. Basis for Writing Research Report:
Tabulation provides the basis for writing the research report. Based on tabulated
data, the researcher can analyse and interpret the data. The analyzed data in turn
enables the researcher to draw conclusions and accordingly make recommendations. The
statistical tables, conclusions and recommendations form an important part of the
research report.
3. Correlation between Variables:
Tabulation facilitates correlation between two or more variables. For example,
data on income and saving habits placed in the same table helps to draw certain
conclusions about income levels and saving habits of certain classes of people.
Generally, the higher the income, the higher the savings, but such a general
assumption may differ among different age groups and in different areas.
4. Detects Errors in Coding and Classification:
Tabulation may help to find out errors in coding and classification of data. For
instance, certain coded and classified data may not fit in the statistical tables.
Therefore, this may require changes in coding and classification of data.
5. Ease in Understanding of Data:
Tabulation helps the researchers to determine and communicate the findings in a
form which can be easily understood by others. For instance, the tabulated data may
indicate higher literacy in one state as compared to another. Therefore, one can easily
understand that the former state is more literate than the latter.
6. Facilitates Location of Specific Data:
Tabulation helps to locate specific data required by the researcher. For example,
census data provides a wealth of geographic and demographic data, but a researcher
might need only certain segments of the data from certain locations. This specific
data about certain segments from certain locations can be easily identified from the
statistical tables - with reference to density of population, gender ratio, life
expectancy, etc.
7. Supports Written Matter:
Statistical tables support written matter. The written matter gains importance due to
statistical tables, because statistical tables give a good feel of the written matter
and are easy to understand.
Significance of Graphic Presentation
1. Quick Communication:
Graphs and charts can communicate information at a glance. It does not
take much time to read and understand the message. One can easily understand the
data presented in bar charts, pie diagrams, graphs and so on. For instance, a
graph can indicate at a glance the trend in sales over a period of time, whether
increasing, decreasing or mixed.
2. Effective Appeal:
Graphic presentation may have an effective appeal to the readers. For
instance, the pie diagrams, bar charts, graphs, etc., can be illustrated with the help of
effective colours. The graphs or charts easily attract attention and may create a
good impact on the mind of the readers, with special reference to understanding the
data.
3. Condenses Large Volumes of Data:
Graphs and charts condense large amounts of information into easy-to-understand
formats. The graphs or charts can be expressed in terms of frequencies,
percentages, or some other variables. For instance, a pie diagram can condense the
sales data of various brands of a firm in the form of the percentage of sales of each brand
in the total sales basket of the firm.
4. Educative Value:
Graphs provide educative value to the audience. For instance, graphs and charts
can be used in training sessions. It is more visually appealing to show a colorful graph
than to explain with hundreds of pages of raw data.

1.3 Problems in Data Processing


1. Editor Bias:
Data processing may get affected due to editor bias. For instance, there may be
inconsistency in the responses given by the respondent, or some of the responses
may be incomplete. In such a situation, the editor may edit or complete the responses
in a biased manner.

2. Problem of Uniformity in Editing:
Editing may get affected when there are two or more editors, especially in the
case of large research studies which involve huge volumes of data.
Each editor may edit the data differently. However, the problem of uniformity can be
solved by giving proper guidelines for consistency in editing.

3. Problem of Accuracy of Data:
The respondents may not give accurate responses, which makes the editing of data
difficult. However, this problem can be sorted out if the editor looks
for inconsistent responses and, by using judgement, corrects the
inconsistent and inaccurate responses.

4. Problem of Completeness of Data:
At times, the respondents may provide incomplete responses, or they may provide responses
to a few questions and keep other questions blank. To solve this problem, the
researcher must check whether or not responses are obtained for all questions
from the respondents.

5. Problem of Outdated Data:
Processing of data gets affected when the research staff collects outdated data.
Therefore, the editor may discard the outdated data at the time of editing. Reliable
and up-to-date data would enable the researcher to analyze the data properly and
accordingly draw proper conclusions.

6. Problem of Exclusive Categories:
Classification of data becomes difficult when there are categories which are not
mutually exclusive. For proper classification of data, each category must be mutually
exclusive. This means a specific response must be classified only once, in one
category only. The problem arises when a particular response comes under two or
more categories; it then becomes difficult to classify the response into a
particular category.

7. Problem of Appropriate Category:
The research staff may collect data from different categories of
respondents. At times, the data may be collected from inappropriate categories. The
data processing staff must consider only the appropriate categories.

8. Problem of Tabulation:
Data processing becomes difficult if data is not tabulated properly. When there is a
large number of tables and a lot of data to be tabulated, there is a possibility of errors.
For instance, a figure of Rs 1,00,000 may be tabulated under the column of Rs
10,000, and such faulty tabulation may lead to faulty analysis.

9. Problem of Misunderstanding Graphic Presentation:
The data presented in graphs may be wrongly analyzed by the researcher. For
instance, the data may be wrongly recorded in the graph or pie chart. Therefore, the
data analysis will go wrong. Also, the readers may get a wrong picture of the
situation. At times, the colours used in the diagrams may convey different meanings to
different readers.

10. Problem of Coding:
At times, there may be problems in assigning codes to the responses. Faulty
coding will lead to faulty classification of data; as a result, the data tabulation
would show a different picture, and faulty tabulation in turn will lead to faulty analysis of
data.

Answer 2
2.1 DATA PROCESSING STAGES
The various stages in data processing are as follows:

STAGES OF DATA PROCESSING:
EDITING -> CODING -> CLASSIFICATION -> TABULATION -> GRAPHIC PRESENTATION
2.1.1 EDITING
Editing is the process of examining errors and omissions in the collected data and making
necessary corrections in the same. This is desirable when there is some inconsistency in the
responses as entered in the questionnaire, or when it contains only a partial or
a vague answer.
When data is collected through schedules and questionnaires, there is a chance of
incompleteness, inaccuracy, inconsistency and absence of uniformity in the answers.
Editing is the first stage in data processing. It is a process which looks for such defects
and removes them as far as possible. If an error is left undetected at this stage, the
research would not serve its purpose. Editing
ensures the completeness, reliability and consistency of the data. It is a routine task of
checking the filled schedules and questionnaires.
Editing is a process of checking errors and omissions in data collection, and making
corrections, if required. Editing is required when:
There is inconsistency in responses given by the respondents.
Respondents may provide incorrect or false responses.
Some vague/incomplete answers are given by the respondents.
No responses are provided by the respondents for certain questions.
The following are examples of editing:
The respondent has given answers which are inconsistent with each other. In such a
case, the editor has to change one of the answers so as to make it consistent with the
other one, which can be suitably changed.
The respondent has marked two answers instead of one for a particular question. In
such a case, the editor has to carefully examine which of the two answers would be
more accurate. Sometimes, when a decision cannot be made categorically, he may
prefer to code "no information" for that question.
The respondent has answered a question by checking one of the many possible
categories contained in the questionnaire. In addition, the respondent has written
some remarks in the margin. These remarks do not go well with the particular
category marked by the respondent.

Sometimes the questionnaire contains imaginary and fictitious data. This may be due
to cheating by the interviewer, who may fill in the entries in the questionnaire without
actually interviewing the respondent. This may also happen in the case of a mail
questionnaire, where the respondent has given arbitrary answers without exercising
any care. The editor has to exercise his judgment in this regard.

Another type of editing is central editing, which is undertaken after the questionnaires
have been received at the headquarters. As far as possible, a single editor should carry out
this task so that consistency in editing can be ensured. However, in the case of large
studies, this may not be physically possible. When two or more editors are entrusted with
the task of editing, it is necessary that they are given uniform guidelines so that the maximum
possible consistency in their approaches can be attained.


An editor should be well-versed with the editing of questionnaires. It may be emphasized
that editing a manuscript is different from editing a questionnaire or numeric data.
People who are good at editing descriptive material may not be able to edit numeric data
satisfactorily. Persons with long experience and a special aptitude for editing of data
should be given preference over others.
2.1.2 Essentials of Editing
Accuracy: The editor must look for accurate responses. Sometimes, the interviewee may give
wrong responses. Therefore, interviewers must be given proper training to obtain
accurate responses. The editor must use judgement to correct incorrect responses.
Avoid Bias: The editor must avoid bias. For instance, if the editor is positively
inclined towards advertising as a tool of promotion, he may edit the answers in favour
of advertising, even though the responses given by respondents may be otherwise.
Consistency: If there is more than one editor, they must follow the same pattern of
editing. For instance, if there are incomplete responses, all the editors may follow the
same norm of treating incomplete responses as nil responses, or all the editors
may complete the responses by using their judgement.
Completeness: The questionnaire must be checked to find out whether or not the
respondents have answered all the questions. The editor may complete unanswered
responses based on other completed responses.
Training to Editors: Training not only helps to improve the skill of the editors for
effective editing, but also develops a positive attitude of the editor towards editing
and the organisation.
Reliability: The data collected must be reliable. The editor must consider only the
up-to-date data and discard the outdated data at the time of editing.
Uniformity: Editors need to check whether or not each respondent has interpreted
the questions uniformly. For instance, a question may have a scale of 1 to 5, where
1 = excellent and 5 = poor, but the respondent may interpret the scale the other way.
Economical: Editing must be economical. The time, money and effort involved in
editing must bring good returns to the researcher. As far as possible, the researcher
must keep a check on the costs incurred on editing.
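As an illustration of the completeness check described above, the following is a minimal Python sketch of how a researcher might flag incomplete questionnaires; the question fields and response records are hypothetical.

```python
# Minimal completeness check for questionnaire records.
# Field names and records below are hypothetical examples.
QUESTIONS = ["q1", "q2", "q3"]

responses = [
    {"id": 1, "q1": 4, "q2": 2, "q3": 5},
    {"id": 2, "q1": 3, "q2": None, "q3": 1},  # q2 left blank
]

for record in responses:
    missing = [q for q in QUESTIONS if record.get(q) is None]
    if missing:
        print(f"Respondent {record['id']}: incomplete -> {missing}")
```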


2.1.3 Types of Editing
a. Field Editing
Editing undertaken at the time of the field survey is called field editing. At the time
of the interview, the interviewer may use several abbreviations due to time constraints. These
abbreviations need to be spelled out fully at the time of processing of data.
b. Central Editing
Editing done at the central office is called central editing. A single editor should carry
out this task so that consistency in editing can be ensured. But in the case of large
studies, two or more editors can handle the task. Sometimes, the entire questionnaire
may be divided into two parts, and each part can be edited by a separate editor.

2.2.1 CODING
Coding is the procedure of classifying the answers to a question into meaningful
categories. The symbols used to indicate these categories are called codes. Coding is
necessary to carry out the subsequent operations of tabulating and analyzing data. If
coding is not done, it will not be possible to reduce a large number of heterogeneous
responses into meaningful categories with the result that the analysis of data would be
weak and ineffective, and without proper focus.
Coding involves two steps. The first step is to specify the different categories or classes
into which the responses are to be classified. The second step is to allocate individual
answers to different categories.
Coding facilitates proper tabulation and analysis of data. One of the most important
points in this respect is that the categories must be all-inclusive and mutually exclusive.
All-inclusive means that every response must fit into one of the categories, leaving none
out. The other aspect is that categories must be mutually exclusive, i.e., they must not be
overlapping or ambiguous. To give an example, a person may, by occupation, be an industrial
worker as well as unemployed. Here, two concepts or dimensions have been used: the first is
the occupational category and the second is the current employment status, and the two are
not mutually exclusive. It would, therefore, be advisable to use two category-sets, one for
the occupation and the other for the current employment status.
The problem of coding is not so simple, especially in respect of an open-ended question.
The response to such a question is in a descriptive form, in the words of the respondent
himself. For example, the respondent may be asked: what is your opinion regarding the
prohibition policy of the government? The respondent may give a lengthy answer
indicating what he feels about this policy. In the case of such a response, the coder has to
decide the category in which the response to the open-ended question is to be included. He
may first take down the entire response and then decide the category in which it should be
included.

2.2.1 Types of Codes


Numerical Codes such as Code 1, Code 2, Code 3, and so on.
Alphabetical Codes such Code A, Code B, Code C and so on.
Alpha-Numerical Codes such as Code A1, A2, A3, B1, B2, B3, etc.
Coding can be considered as an element of classification. For example; the researchers
may conduct a study in TV Viewership. The categories may be Males (Code M) and
Females (Code F).
2.2.2 Steps of Coding
Specify the Categories : The researchers or data processor must specify the
categories into which the responses can be classified. For instance, the categories
may include Age, Gender , Education, Income etc.
Allocate Individuals Codes : The researcher must allocate individuals codes to each
category of responses. For example : Males in the four age groups my be allocated
codes as follows : M1, M2, M3, and M4.


2.3.1 CLASSIFICATION OF DATA
Classification is the process of grouping the collected data into different categories;
coding is therefore an element of classification. The classification can be according to
different categories: age-group-wise, gender-wise, educational-level-wise,
income-group-wise, occupation-wise, etc.
Each of the categories can be further divided into sub-groups. For example, the age
group can be further divided into categories such as Children, Teenagers,
Young Adults, Middle Aged, and Senior Citizens.
2.3.2 Principles of Coding / Classification
1. Mutually Exclusive:
The categories must be mutually exclusive. A specific case or response must be
classified only once, in one category only. For instance, on the basis of occupation,
one may place the response of a particular respondent in a definite pre-determined
category. But a problem may arise if the respondent belongs to two categories.
2. Appropriateness:
The classification/coding must be appropriate to the research work. For instance, a
researcher studying brand loyalty for readymade garments may classify the
population into certain groups appropriate to the survey. Senior citizens and
kids may be ignored, as they are not very brand-loyal as far as readymade
garments are concerned.
3. Exhaustive:
The classification must be exhaustive in nature. There must be a category into which
every response can be fitted or placed; each respondent must belong to some
category. For instance, if classification is based on students, then there must be a
category for every class of students. Therefore, there may have to be several categories.
But if there are too many groups, the researcher may include the isolated groups under a
single category called the General Category.

2.4.1 TABULATION OF DATA
Tabulation refers to transferring the classified data into a tabular format for the purpose
of analysis and interpretation. It involves sorting the data into different categories and
counting the number of responses that belong to each category.
Variables in the Tabulation
Univariate: The tabulation is univariate when only one variable is involved, such as
boys. For example, 15 out of 20 boys (75%) have responded positively to a particular
question.
Multivariate: When two or more variables are involved in tabulating the data, it is
called multivariate tabulation. For example, with gender and age groups: 6 males out of
10 (60%) in the age group of 13 to 19 years responded positively to a particular
question, and 5 males out of 10 (50%) in the age group of 20 to 39 responded
positively to a particular question.
The most common method is a frequency distribution with an average or percentage, as
follows:

Tabular Representation - As a member of a club

Response                           Number    %age
1) Yes, I would enroll                124    11.1
2) I would probably enroll            211    18.9
3) I am not sure                      204    18.3
4) I would probably not               200    17.9
5) No, I would not                    376    33.8
Total                                1115   100.0

The same data can be condensed into three categories:

Response                           Number    %age
Yes, I would enroll                   124    11.1
I would probably enroll               211    18.9
Uncertain/uninterested                780    70.0
Total                                1115   100.0
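A table like the condensed one above can be produced mechanically. The sketch below (Python, with hypothetical raw responses whose counts mirror the condensed table) tallies frequencies and percentages.

```python
from collections import Counter

# Hypothetical raw responses; counts mirror the condensed table above.
responses = (["Yes, I would enroll"] * 124
             + ["I would probably enroll"] * 211
             + ["Uncertain/uninterested"] * 780)

freq = Counter(responses)
total = sum(freq.values())  # 1115
for answer, n in freq.most_common():
    print(f"{answer:<28}{n:>6}{100 * n / total:>8.1f}")
```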


2.4.2 Guidelines / Principles of Tabulation


Title:
Each and every statistical table needs to have a clear and suitable title. For instance,
a report on the causes of decline in sales will have a title such as "Causes of Decline in
Sales of Product ABC in the Year 2014".
Units of Measurement:
The units of measurement under each heading or sub-heading must always be
indicated for clear and better understanding. The measurement may be in terms of
amounts such as Rupees or Dollars; volumes such as kilos, quintals or
tonnes; percentages; or some other units.
Numbering of Tables:
Every table needs to have a distinct number to facilitate easy reference. The
numbering of tables is especially required when there is a large number of tables.
Explanatory Footnotes:
Explanatory footnotes relating to the table should be placed below the table, along with
the reference symbols used in the table.
Source of Data:
The table needs to indicate the source of the data. For instance, if the data is taken from
the Economic Survey of India 2014-15, the table should indicate the source.
Approximation of Data:
Generally, it is better to approximate the data before tabulating it. Figures
can be rounded for brevity.
Column Headings:
The column headings and row headings of the table should be clear and brief. For
instance, the literacy rate in urban areas can be shown in the table as "Urban Literacy
Rate".
Row Stubs:
The row stubs, in the first column of the table, should identify the data presented in
each row of the table.
Numbering of Columns:
When there are several columns in a table, each column must be serially
numbered. Numbering of columns facilitates easy reference.
Placing of Columns:
Columns whose data are to be compared should be placed side by side. Such
placement facilitates proper comparison.
Separation of Columns:
The columns should be separated by lines, which make the table easily readable and
attractive. Thick lines may be placed to separate two unrelated columns.
Alignment of Data:
It is important that all the figures in a column are suitably aligned. Positive
and negative signs must also be in perfect alignment.
Displaying of Data:
Display the data either in chronological order for time series or by using some
standard classification. For longer time series, it may be more appropriate to use
reverse chronological order in some cases, such as for monthly unemployment.
No Empty Data Cells:
Do not leave any data cell empty. Missing values should be identified as "not
available" or "not applicable". The abbreviation "NA" can apply to either, so it needs
to be defined in a footnote.

2.5.1 GRAPHIC PRESENTATION OF DATA
The research data needs to be presented effectively for quick and clear understanding.
Bar graphs, pie charts, line graphs, histograms and other pictorial devices are an excellent
means to present the data.
Pie Chart:
A pie chart is a circular chart used to compare parts of a whole. It is divided into
sectors whose sizes are proportional to the quantities they represent. For instance, a pie
chart may be divided into different parts to indicate the percentage sales/profits/market
share, etc., of various brands of a company during a particular period. The reader can
understand at a glance the relationship between the various parts of a pie chart.

Bar Graphs:
A bar graph or bar chart is a chart with rectangular bars with lengths proportional
to the values that they represent. The bars can be plotted vertically or horizontally. A
bar chart is very useful for recording discrete data.
A bar graph uses either horizontal or vertical bars to show comparisons
among categories. One axis of the chart shows the specific categories being compared,
and the other axis represents a discrete value.
Stacked bar graphs present the information in the same sequence on each bar. A
stacked bar can have two or more parts. For instance, the diagram below shows a
stacked bar graph.
The following table shows the subject-wise distribution of students in three colleges:

[Stacked bar chart: number of students in Economics, Management and Accountancy, stacked for College A, College B and College C.]

Number of Students

Subject         College A   College B   College C
Accountancy           300         250         150
Management            200         250         100
Economics             250         150         250
Total                 750         650         500

Line Graphs:
A line graph shows information that is connected in some way. A line chart or line
graph is a type of chart which displays information as a series of data points, called
markers, connected by straight line segments. It is a basic type of chart common in many
fields.
Line charts show how particular data changes at equal intervals of time. A line
chart is often used to visualize a trend in data over intervals of time (a time series);
thus the line is often drawn chronologically.

[Line graph: sales revenue (Rs) plotted over the quarters Jan-Mar, April-June, July-Sept and Oct-Dec.]

Gantt Charts:
A Gantt chart is a type of bar chart, developed by Henry Gantt in the 1910s, that
illustrates a project schedule. For instance, a Gantt chart may consist of two
horizontal or vertical bars for each period of time/activity. One bar indicates the
planned/anticipated performance, and the other bar indicates the actual performance.

Histograms:
A histogram is a special kind of bar graph where the intervals are equal. In
statistics, a histogram is a graphical representation of the distribution of data. It is an
estimate of the probability distribution of a continuous variable and was first
introduced by Karl Pearson.

2.5.2 Guidelines for Graphic Presentation
If there is more than one curve or bar, they should be clearly differentiated from one
another by distinct patterns or colours.
Generally, the horizontal axis indicates the discrete values such as scores, height,
amount, etc.
The vertical axis indicates the frequency or number (of students, males, females, etc.)
for particular discrete values.
Numerical data upon which the graph or chart is based should be presented in an
accompanying table.
The graph or chart should have a clear, concise and simple title. The title should
describe the nature of the data presented.
The measurement variable should be placed from left to right on the horizontal axis
and generally from bottom to top on the vertical axis.
Too many graphic forms detract from rather than illustrate the presentation.
The graphs or charts must follow, not precede, the related text matter.
The researcher needs to define the target audience. Depending on the target audience,
appropriate graphs or charts need to be used.

Answer 3
3.1 MEASURE OF CENTRAL TENDENCY
Meaning
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central
tendency are sometimes called measures of central location. They are also classed as
summary statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the median
and the mode.

The following definitions may be noted. George Simpson and Fritz Kafka state that
"A measure of central tendency is a typical value around which the other values
congregate."
Jeff Clark states that "Average is an attempt to find one single figure to describe
the whole range of figures in a given series."

Characteristics of a Measure of Central Tendency
It should be simple to calculate and easy to understand.
It should be rigidly defined.
It should be based on all the observations.
It should not be affected by extreme items.
It should be capable of further algebraic treatment.
It should have sampling stability.
It should be easily calculable in the case of distributions containing open-end class
intervals.
It should be in the form of a mathematical formula.

3.1.2 TYPES OF MEASURES OF CENTRAL TENDENCY (AVERAGES)

Mathematical averages: Arithmetic Mean, Geometric Mean, Harmonic Mean
Positional averages: Median, Mode

1. MEAN (ARITHMETIC)
The mean (or average) is the most popular and well-known measure of central tendency.
It can be used with both discrete and continuous data, although its use is most often with
continuous data. The mean is equal to the sum of all the values in the data set divided by
the number of values in the data set. So, if we have n values in a data set with values
x1, x2, ..., xn, the sample mean, usually denoted x-bar, is:

x-bar = (x1 + x2 + ... + xn) / n

Arithmetic mean = Sum of values of all items / Total number of items

(A) Individual Distribution (Ungrouped Data)


This formula is usually written in a slightly different manner using the Greek capital
letter Σ, pronounced "sigma", which means "sum of":

x-bar = Σx / n

Example: Calculate the mean wage from the data of daily wages of 6 workers in a
factory: Rs 50, 60, 70, 80, 90, 100.
Solution:
x-bar = (50 + 60 + 70 + 80 + 90 + 100) / 6 = 450 / 6 = 75
Therefore, the mean wage is Rs 75.
(B) Discrete Distribution
In a discrete distribution the arithmetic mean can be calculated using the following
formula:

x-bar = Σfx / N

where
f = frequency of an item
x = value of an item
N = total number of items (Σf)
Σfx = sum of the products of the values of the items and their frequencies.
(C) Continuous Series
In a continuous series the arithmetic mean can be computed by applying any of the
following methods:

Direct Method:
x-bar = Σfx / Σf
where f stands for the frequency of a class, x stands for the midpoint of the class,
and Σ stands for "the sum of".

Short Cut Method:
x-bar = A + Σfd / Σf
where A = assumed mean (any x), f = frequency of each class, d = deviation of the
midpoint from the assumed mean, i.e. x - A, and Σf = sum of frequencies.

Step Deviation Method:
x-bar = A + (Σfd' / Σf) x C
where A = assumed mean (any x), d' = d / C, d = deviation of the midpoint x from A
(the assumed mean), C = common factor, which is the class interval, and f = frequency
of each class.

In a continuous series, the exact values of the items are not known; the midpoint of
each class is used instead.
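The direct method for a continuous series can be verified with a short computation; the classes and frequencies below are made up for illustration.

```python
# Direct method for a continuous series: x is the class midpoint.
# Classes and frequencies are illustrative only.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
f = [5, 8, 12, 5]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean = sum(fi * xi for fi, xi in zip(f, midpoints)) / sum(f)
print(round(mean, 2))  # sum(fx) = 620, sum(f) = 30, mean = 20.67
```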

Merits and Demerits of Arithmetic Mean
MERITS
The arithmetic mean is easy to calculate and easy to understand.
The arithmetic mean is rigidly defined, so the same answer is obtained by using any
method.
It is based on all the observations or items of the series.
Further mathematical treatment is possible in the case of the arithmetic mean.
There is greater stability in the arithmetic mean; it is not affected by fluctuations of
sampling.
The arithmetic mean can be effectively used as a good basis for comparison.
There is no need to arrange the values of items in ascending or descending order to
calculate the arithmetic mean.
If the number of items and their average are known, the sum of the values of those
items can be directly obtained.
DEMERITS
The arithmetic mean may give absurd results; for instance, the average number of
children per family may work out to a fraction.
It may be difficult to compute the arithmetic mean when the values of some items are not
known.
The arithmetic mean is affected by extreme values, which are either too big or too
small. Therefore, the arithmetic mean should not be calculated when there are one or two
extreme values in the series.
The mean value may not be found in the data, because the mean is a mere average of the
total values of all items in the data.
In the case of open-end class intervals, the arithmetic mean cannot be computed unless
some assumption about the size of the class intervals is made.
The arithmetic mean has limited application as quite often it gives irrelevant results,
mainly on account of extreme values.

2. GEOMETRIC MEAN
The geometric mean is the average of a set of products; its calculation is commonly used
to determine the performance results of an investment or portfolio. Technically defined
as "the nth root of the product of n numbers", the formula for calculating the geometric
mean is:

G.M. = (x1 x x2 x ... x xn)^(1/n)

where n represents the number of values in the series. The geometric
mean must be used when working with percentages (which are derived from values),
whereas the standard arithmetic mean works with the values themselves.
Merits of Geometric Mean
It is rigidly defined and it is based on all the observations.
It is less affected by extreme values.
It is useful to obtain averages of percentages and ratios.
It is capable of further algebraic treatment.
It is least affected by the fluctuations of sampling.
It can be used to average rates of changes and to construct index numbers.
Demerits of Geometric Mean
It is difficult to understand and still more difficult to compute.
It can be a value which does not exist in the series.
It brings out the difference in the ratio of change and not the absolute difference.
It gives more weightage to smaller items as compared to larger items.
Its value cannot be obtained when some of the values are negative or zero.
3. HARMONIC MEAN
The harmonic mean of a series is the reciprocal of the arithmetic average of the
reciprocals of the values of its various items.

H.M. = n / [(1/x1) + (1/x2) + ... + (1/xn)]

This formula is used for a series of individual observations.
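The formula can be checked with the classic equal-distance speed example; statistics.harmonic_mean implements it directly.

```python
from statistics import harmonic_mean

# Average speed over two equal distances travelled at 40 and 60 km/h:
# H.M. = 2 / (1/40 + 1/60) = 48, not the arithmetic mean of 50.
print(harmonic_mean([40, 60]))  # 48.0
```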
Merits of Harmonic Mean
It is rigidly defined.
It is based on all observations.
It is capable of further algebraic treatment.
It is not affected by fluctuations in sampling.
It measures relative changes and is extremely useful in averaging certain types of
ratios and rates.


Demerits of Harmonic Mean
It is difficult to understand and interpret.
It is only a summary figure and may not be the actual item in the series.
It cannot be taken as a true representative of the statistical series.

4. MEDIAN
The median is the middle value of a series when the data of the series is arranged in
ascending or descending order. It divides the series into two equal parts.
Calculation of Median
(A) Individual Distribution
Median = value of the ((n + 1) / 2)th item, where n = number of items. The same
formula is used for both even and odd numbers of items; for an even number of items,
the median is the mean of the two middle values.
(B) Discrete Series
Median = value of the ((n + 1) / 2)th item, where n = number of items, located by
taking cumulative frequencies.
(C) Continuous Distribution
Median = L + ((n/2 - c.f.) / f) x h, where L = lower limit of the median class,
c.f. = cumulative frequency preceding the median class, f = frequency of the median
class, and h = class width.

5. MODE
The mode is defined as the value of a variable which occurs most frequently. It is the
value which is repeated the maximum number of times, or with the highest frequency, in
the series.
Croxton and Cowden define it thus: "The mode of a distribution is the value at the point
around which the items tend to be most heavily concentrated."
A.M. Tuttle defines it as: "Mode is the value which has the highest frequency density in
its immediate neighborhood."
(A) Individual Observations
Example: Marks obtained in a test by 10 students: 20, 15, 14, 20, 16, 12, 18, 13, 19, 20.
Solution (arranged in ascending order):
12, 13, 14, 15, 16, 18, 19, 20, 20, 20
In the above series, 20 is the most repeated number (3 times). Therefore, the mode = 20.
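The same marks can be run through Python's statistics module to confirm the mode (and the median of the sorted series).

```python
from statistics import median, mode

marks = [20, 15, 14, 20, 16, 12, 18, 13, 19, 20]
print(mode(marks))    # 20 - the most frequent value
print(median(marks))  # 17.0 - mean of the 5th and 6th sorted values
```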
(B) Discrete Series
The value which has the highest frequency is the mode.
(C) Continuous Distribution
Mode = L + ((f1 - f0) / (2f1 - f0 - f2)) x h, where L = lower limit of the modal class,
f1 = frequency of the modal class, f0 = frequency of the preceding class, f2 = frequency
of the succeeding class, and h = class width.
Merits
Mode is easy to understand and easy to calculate from a given series of items.
Mode is the most typical or representative value.
Mode is not affected by extreme values in the data, as it considers only the value that
occurs most frequently.
Mode can be calculated for open-end class intervals, or in those cases where only the
neighborhood of the point of concentration is known.
To calculate the mode, there is no need to know the values of all items of the series.
The mode can often be identified from mere inspection of the values in the series,
without calculation.
Mode can be determined graphically from a histogram.
Demerits
Mode is not rigidly defined. A distribution may be bi-modal or multi-modal.
There is greater instability in the mode; it is affected by sampling fluctuations.
It is not capable of further mathematical treatment.
The scope of the mode is limited, especially in the case of small samples when none of
the values repeat.
Mode is a poor measure of central tendency.

3.2.1 CORRELATION ANALYSIS
CORRELATION
Correlation and regression analysis are related in the sense that both deal with
relationships among variables. The correlation coefficient is a measure of linear
association between two variables. Values of the correlation coefficient are always
between -1 and +1. A correlation coefficient of +1 indicates that two variables are
perfectly related in a positive linear sense, a correlation coefficient of -1 indicates that
two variables are perfectly related in a negative linear sense, and a correlation coefficient
of 0 indicates that there is no linear relationship between the two variables. For simple
linear regression, the sample correlation coefficient is the square root of the coefficient of
determination, with the sign of the correlation coefficient being the same as the sign of
b1, the coefficient of x1 in the estimated regression equation.
Neither regression nor correlation analysis can be interpreted as establishing
cause-and-effect relationships. They can indicate only how, or to what extent, variables
are associated with each other. The correlation coefficient measures only the degree of
linear association between two variables. Any conclusions about a cause-and-effect
relationship must be based on the judgment of the analyst.
In statistics, dependence is any statistical relationship between two random variables or
two sets of data. Correlation refers to any of a broad class of statistical relationships
involving dependence. The most common measure is the Pearson correlation coefficient,
which is sensitive only to a linear relationship between two variables (which may exist
even if one is a nonlinear function of the other). Other correlation coefficients have been
developed to be more robust than the Pearson correlation, that is, more sensitive to
nonlinear relationships. Mutual information can also be applied to measure dependence
between two variables.

Methods of the Coefficient of Correlation

Pearson's product-moment coefficient
The most familiar measure of dependence between two quantities is the Pearson
product-moment correlation coefficient, or "Pearson's correlation coefficient", commonly
called simply "the correlation coefficient".
The population correlation coefficient ρ(X,Y) between two random
variables X and Y with expected values μX and μY and standard deviations σX and σY is
defined as:

ρ(X,Y) = cov(X,Y) / (σX σY) = E[(X - μX)(Y - μY)] / (σX σY)

The Pearson correlation is defined only if both of the standard deviations are finite and
nonzero. It is a corollary of the Cauchy-Schwarz inequality that the correlation cannot
exceed 1 in absolute value. The correlation coefficient is symmetric:
corr(X,Y) = corr(Y,X).

The Pearson correlation is +1 in the case of a perfect direct (increasing) linear
relationship (correlation), -1 in the case of a perfect decreasing (inverse) linear
relationship (anticorrelation), and some value between -1 and +1 in all other cases,
indicating the degree of linear dependence between the variables. As it approaches zero
there is less of a relationship (closer to uncorrelated). The closer the coefficient is to
either -1 or +1, the stronger the correlation between the variables.
If we have a series of n measurements of X and Y written as xi and yi, where i = 1,
2, ..., n, then the sample correlation coefficient can be used to estimate the population
Pearson correlation r between X and Y. The sample correlation coefficient is written as:

r = Σ(xi - x-bar)(yi - y-bar) / sqrt[ Σ(xi - x-bar)^2 Σ(yi - y-bar)^2 ]

If x and y are results of measurements that contain measurement error, the realistic
limits on the correlation coefficient are not -1 to +1 but a smaller range.
For the case of a linear model with a single independent variable, the coefficient of
determination (R squared) is the square of r, Pearson's product-moment coefficient.
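The sample formula can be evaluated directly; Python 3.10's statistics.correlation computes exactly this Pearson r (the data points are illustrative).

```python
from statistics import correlation  # Python 3.10+

# Sample Pearson product-moment correlation (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)) = 6 / sqrt(60)
print(round(correlation(x, y), 4))  # 0.7746
```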
Rank Correlation
Rank correlation coefficients, such as Spearman's rank correlation coefficient (ρ) and
Kendall's rank correlation coefficient (τ), measure the extent to which, as one variable
increases, the other variable tends to increase, without requiring that increase to be
represented by a linear relationship. If, as the one variable increases, the other
decreases, the rank correlation coefficients will be negative. Rank correlation
coefficients measure a different type of relationship than the Pearson product-moment
correlation coefficient, and are best seen as measures of a different type of association,
rather than as an alternative measure of the population correlation coefficient. To
illustrate the nature of rank correlation, and its difference from linear correlation,
consider the following four pairs of numbers (x, y):
(0, 1), (10, 100), (101, 500), (102, 2000).
As we go from each pair to the next, x increases, and so does y. This relationship is
perfect, in the sense that an increase in x is always accompanied by an increase in y. This
means that we have a perfect rank correlation, and both Spearman's and Kendall's
correlation coefficients are 1, whereas in this example the Pearson product-moment
correlation coefficient is 0.7544, indicating that the points are far from lying on a straight
line. In the same way, if y always decreases when x increases, the rank correlation
coefficients will be -1, while the Pearson product-moment correlation coefficient may or
may not be close to -1, depending on how close the points are to a straight line. Although
in the extreme cases of perfect rank correlation the two coefficients are both equal (being
both +1 or both -1), this is not in general so, and values of the two coefficients cannot
meaningfully be compared. For example, for the three pairs (1, 1), (2, 3), (3, 2), Spearman's
coefficient is 1/2, while Kendall's coefficient is 1/3.
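The four pairs from the text can be checked numerically; the sketch below assumes SciPy is available for the Spearman and Kendall coefficients.

```python
from scipy.stats import pearsonr, spearmanr, kendalltau  # assumes SciPy is installed

# The four (x, y) pairs from the text: rank correlation is perfect (+1),
# but Pearson's r is well below 1 because the points are far from a line.
x = [0, 10, 101, 102]
y = [1, 100, 500, 2000]
print(round(pearsonr(x, y)[0], 4))  # ~0.7544
print(spearmanr(x, y)[0])           # 1.0
print(kendalltau(x, y)[0])          # 1.0
```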
3.3.1 REGRESSION ANALYSIS
Regression analysis involves identifying the relationship between a dependent variable
and one or more independent variables. A model of the relationship is hypothesized, and
estimates of the parameter values are used to develop an estimated regression equation.
Various tests are then employed to determine if the model is satisfactory. If the model is
deemed satisfactory, the estimated regression equation can be used to predict the value of
the dependent variable given values for the independent variables.
TYPES OF REGRESSION MODELS
SIMPLE AND MULTIPLE
In simple linear regression, the model used to describe the relationship between
a single dependent variable y and a single independent variable x is y = a0 + a1x + e.
a0 and a1 are referred to as the model parameters, and e is a probabilistic error term that
accounts for the variability in y that cannot be explained by the linear relationship with x.
If the error term were not present, the model would be deterministic; in that case,
knowledge of the value of x would be sufficient to determine the value of y.
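A least-squares fit of the simple model y = a0 + a1x can be sketched with statistics.linear_regression (Python 3.10+); the data points are illustrative.

```python
from statistics import linear_regression  # Python 3.10+

# Fit y = a0 + a1*x by least squares (illustrative data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = linear_regression(x, y)
print(round(slope, 2), round(intercept, 2))  # a1 = 1.99, a0 = 0.05
```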

TOTAL AND PARTIAL
In the case of a total relationship, all the important variables are considered. Normally,
these take the form of a multiple relationship, because most economic and business
phenomena are affected by a multiplicity of causes. In the case of partial relationships,
one or more variables are considered, but not all, thus excluding the influence of those
not found relevant for a given purpose.
LINEAR AND NON-LINEAR
If the path of the relationship is a straight line, we have linear regression, and the
regression equation is the equation of a straight line. It may be noted that in the case of
linear regression, the dependent variable changes by a constant absolute amount for a
unit change in the value of the independent variable.
The regression is termed non-linear if the curve of regression is not a straight line, in
which case the regression equation will be a function involving terms of higher order,
of the type x^2, x^3.
3.4.1 DISPERSION ANALYSIS
According to George Simpson and F. Kafka, "An average does not tell the full story. It
is hardly representative of a mass of data, unless we know the manner in which the
individual items are scattered around it."
In statistics, measures of dispersion describe how the data vary or are dispersed (spread
out). The two most commonly used measures of dispersion are the range and the standard
deviation. Rather than showing how the data are similar, they show how the data differ
(their variation).
3.4.2 Objectives of Measures of Dispersion
To judge the reliability of the measure of central tendency.
To compare two or more series with regard to their variability.
To control the variability itself.
To facilitate the use of other statistical measures.

MEASURES OF DISPERSION
Absolute Measures: Range, Quartile Deviation, Mean Deviation, Standard Deviation,
Lorenz Curve
Relative Measures: Coefficient of Range, Coefficient of Quartile Deviation, Coefficient
of Mean Deviation, Coefficient of Variation
a. Range
The range is the difference between the maximum value and the minimum value in a
series of data. In other words, it is the difference between the largest value and the
smallest value of the distribution. It is an absolute measure.
Range = Largest Value (L) - Smallest Value (S)
b. Coefficient of Range
It is a relative measure of dispersion and is calculated as:
Coefficient of Range = (L - S) / (L + S)
c. Semi-Inter-Quartile Range
It is defined as follows:
Semi-Inter-Quartile Range = (Q3 - Q1) / 2
The semi-inter-quartile range considers only the middle 50% of the observations and
ignores the first and the last quarters. It is an absolute measure. The quartile
deviation also measures the average amount by which the two quartiles Q1 and Q3
differ from the median.
d. Coefficient of Quartile Deviation
It is a relative measure and is defined as:
Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)
e. Mean Deviation
The range and the quartile deviation do not take into account the deviations from a
central value. The mean deviation considers these differences in absolute values and
averages them.
Thus, the mean deviation, which is an absolute measure, is defined as the arithmetic
mean of the absolute values of the deviations of all the observations from the mean,
median or mode:
M.D. = Σ|x - x-bar| / n

f. Coefficient of Mean Deviation
It is the ratio of the mean deviation to the measure from which the deviations are
taken. Hence it can be used to compare two or more sets of data. It is defined as
follows:
Coefficient of M.D. = Mean Deviation / Mean (or the median or mode from which the
deviations are taken)

g. Standard Deviation
This concept was developed by Karl Pearson in 1893. It is defined as the positive
square root of the arithmetic mean of the squares of the deviations of the observations
from the arithmetic mean. It is denoted by σ (sigma):

σ = sqrt( Σ(x - x-bar)^2 / n )

It is an absolute measure, and the most important and widely used measure of
dispersion.

h. Coefficient of Variation
The coefficient of variation is the relative measure corresponding to the standard
deviation. It is denoted by C.V. and is expressed as a percentage:
C.V. = (Standard Deviation / Mean) x 100
It is used to compare the variability or consistency of two or more distributions.
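All four measures can be computed together for a small data set; the sketch below uses the population standard deviation and reuses the worker-wage figures from the earlier mean example.

```python
from statistics import mean, pstdev

data = [50, 60, 70, 80, 90, 100]  # daily wages from the earlier example
m = mean(data)                                  # 75
rng = max(data) - min(data)                     # Range = 50
md = sum(abs(x - m) for x in data) / len(data)  # Mean deviation about the mean = 15.0
sd = pstdev(data)                               # Population standard deviation ~ 17.08
cv = sd / m * 100                               # Coefficient of variation ~ 22.77%
print(rng, md, round(sd, 2), round(cv, 2))
```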

4.1 Conclusion
After answering all the questions given in this assignment, I have understood how
researchers process data at each stage - editing, coding, classification, tabulation and
graphic presentation - along with the meaning, significance and in-depth details of each.
In the third question, we studied measures of central tendency, correlation analysis,
regression analysis and measures of dispersion, with explanations of the mean, median
and mode, of Karl Pearson's product-moment coefficient and Kendall's rank correlation
coefficient (τ), and of regression analysis and its types: simple and multiple, total and
partial, and linear and non-linear. Under measures of dispersion, we studied absolute and
relative measures. Through this project I have gained considerable knowledge of
statistical mathematics.


4.2 BIBLIOGRAPHY
Research Methodology, by S. Mohan and R. Elangovan, published by Deep
Publications Pvt. Ltd.
Research Methodology, by Dr. S.L. Gupta and Hitesh Gupta, published by
International Book House.
Statistical Mathematics.