UNIT 4 Data Collection

When deciding on a method of data collection, keep in mind the two types of data: primary data and secondary data.

Primary Data: data collected fresh and for the first time, and therefore original in character.
Secondary Data: data that have already been collected by someone else and have already passed through the statistical process.

Factors              Primary Data                Secondary Data
Collection purpose   For the problem in hand     For some other problem
Collection process   Requires high involvement   Requires less involvement
Collection cost      High                        Relatively low
Collection time      Long                        Short

Collection Of Secondary Data

Secondary Data
  Internal
    Ready to use
    Requires further processing
  External
    Published material
      From general business sources
      Government sources
    Computerized database
      Intranet
      Internet
      Offline
    Syndicate services

Internal: data available within the organization. Ready-to-use data can be used in the form in which they are available, but sometimes further statistical processing is required to sort out the information needed.

External: data collected from some outside source.
Published material: published data are usually available in:
  Various publications of central, state or local governments.
  Publications of international bodies.
  Technical and trade journals.
  Books, magazines and newspapers.
  Reports and publications of various associations connected with business and industry, banks and stock exchanges.
  Reports prepared by research scholars.

Computerized Database:
Intranet: requires a computer terminal and a telecommunication network. The advantage is that the data can be accessed only by a limited number of authorized users.
Internet: also requires a computer terminal and a telecommunication network. The data are available to the entire group of users.
Offline: data are available on CDs, DVDs, pen drives, etc. The advantage is that they can be used at any location, without the help of any telecommunication network.

Syndicate Services: marketing research firms that collect, package and sell common pools of data of known commercial value, designed to serve a number of clients. Each client receives the same information.

Characteristics of Secondary Data

Reliability
Suitability
Adequacy


Reliability: can be tested by finding out the following about the data:
  Who collected the data?
  What were the sources of the data?
  Were they collected using proper methods?
  At what time were they collected?
  What level of accuracy was desired, and was it achieved?

Suitability: data that are suitable for one enquiry may not necessarily be suitable for another. It is therefore better to scrutinize the definitions of the various terms and units of collection used. The objective, scope and nature of the original enquiry must also be studied.

Adequacy: if the level of accuracy achieved in the data is found inadequate for the purpose of the present study, the data will be considered inadequate. Data are also considered inadequate if they relate to an area that is either narrower or wider than the area of the present study.

Collection of Primary Data

Primary Data
  Qualitative
    Direct
      Focus Group
      Depth Interview
    Indirect
      Projection Technique
        Association
        Completion
        Expression
  Quantitative
    Interview/Survey
      Telephonic Interviewing
      Structured Personal Interviewing
    Observation
    Other methods

Qualitative: used in exploratory research.

Direct: the purpose of the research is disclosed to the respondent.
Focus Group Interview: conducted by a trained moderator in an unstructured and natural manner with a small group of respondents from the appropriate target market.
Depth Interview: an unstructured, direct, personal interview in which a highly skilled interviewer probes a single respondent to uncover underlying motivations, attitudes and feelings on a topic.

Indirect: the purpose of the research is not disclosed to the respondent.

Projection Technique: Participants are asked to project their feelings and thoughts onto other things. For example: If Coca-Cola was an animal, which animal would it be?

Association: used to find the words that have the strongest association. The interviewer reads out each word from a list, and the respondent is asked to mention, without thinking, the first word that comes to mind. Frequently used in advertising research.

Completion: an extension of the association technique. Informants may be asked to complete a sentence such as "Persons who wear khadi are ..." to find the association of khadi clothes with certain personality characteristics. Analysis of the replies from one informant reveals his or her attitude towards the subject, and the combination of these attitudes across all sample members is then taken to reflect the view of the population.

Expressive: respondents are asked to comment on or explain what other people do, for example "Why do people wear designer clothes?" The answers may reveal the respondents' own motivations. Subjects may also be asked to act out a situation in which they have been assigned various roles, so that the researcher can observe various traits.

Quantitative: used in descriptive research.

Interview: involves the presentation of oral-verbal stimuli and replies in terms of oral-verbal responses. The interview has been called a conversation with a purpose and, more formally, a purposeful discussion between two or more people.

Telephonic Interview: contacting the respondents over the phone.
Merits:
  More flexible than the mailing method.
  Faster.
  Cheaper than personal interviews.
  High rate of response.
  No field staff is required.
  Recall is easy; callbacks are simple and economical.
Demerits:
  Less time is given to respondents for considered answers.
  Restricted to respondents who have a phone.
  Extensive geographical coverage may be restricted by cost considerations.
  Questions have to be short and to the point.

Personal Interview: requires an interviewer asking questions, generally in face-to-face contact with the other person. This method is usually carried out in a structured way and involves the use of a set of predetermined questions.
Merits:
  More in-depth information can be obtained.
  Non-response is very low.
  The language of the interview can be adapted to the ability or educational level of the person interviewed.
Demerits:
  Very expensive.
  More time consuming, especially when the sample is large.
  Selecting, training and supervising the field staff is more complex.
  The presence of the interviewer on the spot may over-stimulate the respondents.

Observation: the most commonly used method in behavioral science. Information is sought through the investigator's own direct observation, without asking the respondents.
Advantages:
  Subjective bias is eliminated.
  The information obtained relates to what is currently happening.
  The method is independent of the respondents' willingness to respond.
Disadvantages:
  It is an expensive method.
  The information provided by this method is very limited.

Structured Observation: characterized by a careful definition of the units to be observed, the style of recording the observed information, standardized conditions of observation, and the selection of useful data only. It is mainly used in descriptive studies. When there is no advance planning of these things, the observation is termed Unstructured Observation, which is mainly used in exploratory studies.

Participant & Non Participant Observation: this distinction depends on whether or not the observer shares the life of the group being observed. If the observer becomes, more or less, a member of the group in order to experience what its members experience, the observation is called Participant. When the observer watches as a detached person, without any attempt to experience through participation what the others feel, it is known as Non Participant.

Questionnaire: a list of questions sent to a number of persons to answer. It secures standardized results that can be tabulated and treated statistically.
Purpose:
  To collect information from respondents who are scattered over a vast area.
  To achieve success in collecting reliable data.

Types of Questionnaire
On the basis of structure: Structured, Unstructured
On the basis of questions: Open ended, Close ended, Mixed, Pictorial

Structured: questionnaires in which there are definite, concrete questions. The questions are presented with exactly the same wording and in the same order to all respondents.

Unstructured: works as a guide to the interviewer. The interviewer is free to arrange the form and timing of the inquiry. The main advantage is flexibility.

Open ended: the respondent is free to express his or her views and ideas rather than being limited to certain stated alternatives.
Close ended: the responses are limited to the stated alternatives, such as Yes or No.
Mixed: a combination of both open and close ended questions.
Pictorial: pictures are used to promote interest in answering the questions.
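
The open ended, close ended and mixed distinction can be sketched in a few lines of Python. This is only an illustration: the function name `questionnaire_type` and the representation of a question as its list of stated alternatives are assumptions made here, not part of any standard tool, and pictorial questionnaires are not modelled.

```python
from typing import List, Optional


def questionnaire_type(questions: List[Optional[List[str]]]) -> str:
    """Classify a questionnaire from its questions.

    Each question is represented only by its list of stated alternatives
    (a hypothetical representation). None or an empty list means the
    respondent answers freely, i.e. the question is open ended.
    """
    if not questions:
        raise ValueError("a questionnaire needs at least one question")
    closed = [bool(alternatives) for alternatives in questions]
    if all(closed):
        return "close ended"
    if not any(closed):
        return "open ended"
    return "mixed"


print(questionnaire_type([["Yes", "No"], ["Yes", "No"]]))  # close ended
print(questionnaire_type([None, None]))                     # open ended
print(questionnaire_type([["Yes", "No"], None]))            # mixed
```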

Advantages:
  Can be used as a method in its own right or as a base for an interview.
  Can be posted, emailed or faxed.
  Can cover a large number of people.
  Wide geographic coverage.
  Relatively cheap.

Disadvantages:
  Design problems.
  Questions have to be relatively simple.
  Low response rate.
  Time delay.
  Assumes no literacy problem.
  No control over who completes it.
  Not possible to give assistance if required.

There is no scientific method for framing questions, but some general guidelines for framing a questionnaire are as follows.

Formulation Of Questionnaire
Step 1: Specify the information needed.
  Try to make a dummy table.
  Categorize the problem.
  Decide the statistical tool in advance.
  Select the parameters.
  Address all the different components of the problem.
  Define the target group.

Step 2: Specify the type of interview method. It can be:
  Personal: complex questions can be used.
  Telephonic: questions of medium complexity.
  Mail: simple questions.

Step 3: Determine the content of each individual question.
  Is the question necessary?
  Are several questions needed instead of one?
  Try to avoid double-barreled questions (questions that ask about two things at once).

Step 4: Design the question to overcome the respondent's inability and unwillingness to answer.
  Overcoming inability to answer: Is the respondent informed? Can the respondent articulate a response?
  Overcoming unwillingness to answer: legitimate purpose; sensitive information.

Step 5: Choose the structure of the question.
  Open ended: respondents are free to give any answer of their choice.
  Close ended: choices are given.
    Multiple choice: more than two alternatives.
    Dichotomous: only two alternatives.

Step 6: Choose the question wording.
  Do not use ambiguous words.
  Questions should be simple and easy to understand.
  Avoid leading and biased questions.

Step 7: Determine the order of the questions.
  Basic: questions that address only the basic problem.
  Classification: questions related to the sociographic and demographic variables of the respondents.
  Identification: questions related to the identity of the individual, such as address and phone number.
  Do not ask sensitive or difficult questions at the beginning.
  Questions should be placed in a logical order.

Step 8: Format and layout.
  The format should be standard and proper because it affects the brand image of the organization.

Step 9: Printing of the questionnaire.
  The printout should be clear and of good quality.
  The questionnaire should look impressive.

Step 10: Pretesting.
  Usually a small number of respondents are selected for the pre-test. The respondents selected for the pilot survey should be broadly representative of the type of respondent to be interviewed in the main survey.
  Protocol Analysis: respondents are allowed to think aloud while answering, in order to identify problems.
  Debriefing: suggestions are taken from respondents after they have filled in the questionnaire.

Questionnaire and Schedule Compared

1. Filling in
   Questionnaire: generally sent through mail to informants, to be answered as specified in a covering letter but otherwise without further assistance from the sender.
   Schedule: generally filled in by the research worker or enumerator, who can interpret the questions when necessary.

2. Cost
   Questionnaire: data collection is cheap and economical.
   Schedule: data collection is more expensive, as money is spent on enumerators and on training them.

3. Non-response
   Questionnaire: non-response is usually high, as many people do not respond and many return the questionnaire without answering all the questions.
   Schedule: non-response is very low, because the schedule is filled in by enumerators who are able to get answers to all questions.

4. Identity of respondent
   Questionnaire: it is not clear who replies.
   Schedule: the identity of the respondent is known.

5. Speed
   Questionnaire: the method is likely to be very slow, since many respondents do not return the questionnaire.
   Schedule: information is collected well in time, as the schedules are filled in by enumerators.

6. Personal contact
   Questionnaire: no personal contact is possible, as the questionnaires are sent to respondents by post and are returned by post.
   Schedule: direct personal contact is established.

7. Literacy
   Questionnaire: can be used only when respondents are literate and cooperative.
   Schedule: the information can be gathered even when the respondents happen to be illiterate.

8. Coverage
   Questionnaire: a wider and more representative distribution of the sample is possible.
   Schedule: there remains the difficulty of sending enumerators over a relatively wider area.

9. Completeness and accuracy
   Questionnaire: the risk of collecting incomplete and wrong information is relatively high.
   Schedule: the information collected is generally complete and accurate, as enumerators can remove any difficulties respondents face in correctly understanding the questions.

10. What success depends on
   Questionnaire: success lies more in the quality of the questionnaire itself.
   Schedule: success depends on the honesty and competence of the enumerators.

Overview of the Stages of Data Analysis

Editing is the process of reviewing the data to ensure maximum accuracy and clarity. Editing should be conducted as the data are being collected. This applies to the editing of the collection forms used for pretesting as well as those for the full-scale project.

Careful editing early in the collection process will often catch misunderstandings of instructions, errors in recording, and other problems at a stage when it is still possible to eliminate them for the later stages of the study.
Early editing has the additional advantage of permitting the questioning of interviewers while the material is still relatively fresh in their minds.

Editing is normally centralized so as to ensure consistency and uniformity in treatment of the data. If the sample is not large, a single editor usually edits all the data to reduce variation in interpretation. In those cases where the size of the project makes the use of more than one editor mandatory, it is usually best to assign each editor a different portion of the data collection form to edit. In this way the same editor edits the same items on all forms, an arrangement that tends to improve both consistency and productivity.

Types of Editing

1. Field Editing
Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.

2. In-house Editing
Editing performed by central office staff; often done more rigorously (that is, more thoroughly and carefully) than field editing.

Legibility of entries.
Legibility is the quality of being clear enough to read. The data must be legible in order to be used. Where an entry is not legible, it may be possible to infer the response from other data collected; but where any real doubt exists about the meaning of the data, it should not be used.

Completeness of entries. On a fully structured collection form, the absence of an entry is ambiguous. It may mean either that the respondent could not or would not provide the answer, that the interviewer failed to ask the question, or that there was a failure to record collected data.
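
A completeness check of this kind can be sketched in Python. The field names in `REQUIRED_FIELDS` and the function `missing_entries` are hypothetical, chosen only for illustration. The check can flag a blank entry, but it cannot say which of the three explanations applies; that still needs a query to the interviewer.

```python
# Hypothetical field names for a fully structured collection form.
REQUIRED_FIELDS = ["age", "occupation", "owns_phone"]


def missing_entries(record):
    """Return the required fields that are absent or blank on one form."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


form = {"age": 34, "occupation": "", "owns_phone": "yes"}
print(missing_entries(form))  # ['occupation']
```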

Consistency of entries.
Inconsistencies raise the question of which response is correct. Discrepancies may be cleared up by questioning the interviewer or callbacks to the respondent. When discrepancies cannot be resolved, discarding both entries is usually the wisest course of action.
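
As a minimal sketch, consistency checks can be written as simple rules that compare related answers on the same form. The field names and the two rules below are assumptions invented for this example; a real study would derive its rules from its own questionnaire. The function only reports a discrepancy and does not decide which entry is correct.

```python
def find_inconsistencies(record):
    """Return descriptions of answers on one form that contradict each other."""
    problems = []

    # Rule 1 (hypothetical): a respondent without a phone cannot report calls.
    calls = record.get("calls_per_week") or 0
    if record.get("owns_phone") == "no" and calls > 0:
        problems.append("owns_phone is 'no' but calls_per_week is positive")

    # Rule 2 (hypothetical): years of employment cannot exceed age.
    age = record.get("age")
    years_employed = record.get("years_employed") or 0
    if age is not None and years_employed > age:
        problems.append("years_employed exceeds age")

    return problems


form = {"owns_phone": "no", "calls_per_week": 3, "age": 30, "years_employed": 5}
print(find_inconsistencies(form))
```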

Accuracy of entries.
An editor should keep an eye out for any indication of inaccuracy in the data. Of particular importance is the detection of any repetitive response patterns in the reports of individual interviews. Such patterns may well be indicative of systematic interviewer bias or interviewer/respondent dishonesty.
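
One simple way to look for repetitive response patterns is to flag forms on which every rating is identical, and then see which interviewers return an unusually high share of such forms. The sketch below assumes each form is a pair of an interviewer identifier and a list of numeric ratings; the names `is_repetitive` and `flag_interviewers` and the thresholds are illustrative assumptions. A flag is only a reason to investigate, not proof of bias or dishonesty.

```python
from collections import defaultdict


def is_repetitive(answers, min_items=5):
    """True when a form has at least min_items ratings and all are identical."""
    return len(answers) >= min_items and len(set(answers)) == 1


def flag_interviewers(forms, threshold=0.5):
    """Return interviewers whose share of repetitive forms meets the threshold.

    forms is a list of (interviewer_id, answers) pairs.
    """
    totals = defaultdict(int)
    repetitive = defaultdict(int)
    for interviewer, answers in forms:
        totals[interviewer] += 1
        if is_repetitive(answers):
            repetitive[interviewer] += 1
    return sorted(i for i in totals if repetitive[i] / totals[i] >= threshold)


forms = [
    ("A", [3, 3, 3, 3, 3]),
    ("A", [3, 3, 3, 3, 3]),
    ("B", [1, 4, 2, 5, 3]),
    ("B", [2, 2, 2, 2, 2]),
    ("B", [1, 2, 3, 4, 5]),
    ("B", [5, 4, 3, 2, 1]),
]
print(flag_interviewers(forms))  # ['A']
```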

Coding is the process of assigning responses to data categories; numbers are assigned to identify the responses with the categories. Pre-coding refers to the practice of assigning codes to categories, and sometimes printing this information on structured questionnaires and observation forms, before the data are collected. Post-coding refers to the assignment of codes to responses after the data are collected. Post-coding is most often required when responses are reported in an unstructured format.

Once a complete code has been established, after post-coding, a formal coding manual or codebook is often created and made available to those who will be entering or analyzing the data.


The mass of data collected has to be arranged in some kind of concise and logical order.
Tabulation summarizes the raw data and displays them in the form of statistical tables. It is an orderly arrangement of data in rows and columns.

Objectives of Tabulation

1. Conserves space & minimizes explanation and descriptive statements.

2. Facilitates process of comparison and summarization.

3. Facilitates detection of errors and omissions.

4. Establishes the basis of various statistical computations.
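
As a rough illustration of tabulation, the sketch below turns a list of raw responses into a simple frequency table with counts and percentages. The sample data, the table title and the function names `frequency_table` and `render` are made up for this example.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows sorted by category."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, count, round(100 * count / total, 1))
        for category, count in sorted(counts.items())
    ]


def render(rows, title, heading):
    """Lay the rows out as a titled plain-text table with column headings."""
    lines = [title, f"{heading:<12}{'No. of persons':>16}{'Percent':>10}"]
    for category, count, percent in rows:
        lines.append(f"{category:<12}{count:>16}{percent:>10.1f}")
    return "\n".join(lines)


# Hypothetical raw data: the occupation recorded on each form.
occupations = ["farmer", "clerk", "farmer", "teacher",
               "farmer", "clerk", "teacher", "farmer"]
rows = frequency_table(occupations)
print(render(rows, "Table 1: Respondents by occupation", "Occupation"))
```

Sorting the categories and rounding the percentages keeps the table compact and makes rows easy to compare.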


Tables should be clear, concise and adequately titled.
Every table should be distinctly numbered for easy reference.
Column headings and row headings should be clear and brief.
Units of measurement should be specified at appropriate places.
Explanatory footnotes concerning the table should be placed at appropriate places.
The source of the data should be clearly indicated.
The columns and rows should be clearly separated with dark lines.
Demarcation should also be made between data of one class and that of another.
Comparable data should be put side by side.
Figures in percentages should be approximated before tabulation.
Figures, symbols, etc. should be properly aligned and adequately spaced to enhance readability.
Abbreviations should be avoided.

Occupation    No. of persons

Occupation    No. of persons
              Male    Female

Occupation    No. of persons
              Male                    Female
              Married    Unmarried    Married    Unmarried
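
The table layouts above classify persons by occupation, then by sex, then by marital status. A minimal sketch of how such cell counts could be produced is given below. The records and the function `cross_tabulate` are hypothetical; a real study would read the records from the edited and coded forms.

```python
from collections import Counter

# Hypothetical records, one per person.
persons = [
    {"occupation": "farmer", "sex": "male", "marital_status": "married"},
    {"occupation": "farmer", "sex": "female", "marital_status": "married"},
    {"occupation": "clerk", "sex": "male", "marital_status": "unmarried"},
    {"occupation": "farmer", "sex": "male", "marital_status": "married"},
    {"occupation": "clerk", "sex": "female", "marital_status": "unmarried"},
]


def cross_tabulate(records, row_key, column_keys):
    """Count records for each (row value, tuple of column values) cell."""
    cells = Counter()
    for record in records:
        column = tuple(record[key] for key in column_keys)
        cells[(record[row_key], column)] += 1
    return cells


# Occupation by sex.
by_sex = cross_tabulate(persons, "occupation", ["sex"])
# Occupation by sex and marital status.
by_sex_and_status = cross_tabulate(persons, "occupation", ["sex", "marital_status"])

print(by_sex[("farmer", ("male",))])                        # 2
print(by_sex_and_status[("farmer", ("male", "married"))])   # 2
```

Because `Counter` returns 0 for a missing key, empty cells such as married male clerks need no special handling when the table is printed.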