
Excel for Data Analysis

3005 30th Street
Boulder, CO 80301
303-444-7863
www.n-r-c.com

Table of Contents

Introduction

Data Entry in Excel
    Unique IDs
    Setting up the Worksheet
    Entering Single-Response, Closed-Ended Questions
    Entering “Multiple-Response” Questions
    Entering Open-Ended Questions
    Creating a Codebook

Analyzing the Data
    Calculating an Average
    Creating a Frequency Distribution for a Single-Response Question
    Creating a Frequency Distribution for a Multiple-Response Question
    Functions and formulas used for simple descriptive analyses in Excel

Presenting the Results: One Quick Idea

Using Pivot Tables for Basic and Advanced Analyses
    Creating PivotTables (“Basic” Analyses)
    Crosstabulation of Data Using PivotTables (“Advanced” Analyses)

APPENDIX I: Example Completed Surveys for Data Entry

APPENDIX II: Example Codebook

APPENDIX III: Example Analysis, with Formulas

APPENDIX IV: Example of an “Annotated Instrument”


Introduction
This handbook is designed to instruct program staff on how to set up data entry processes and
perform simple analyses of data collected through surveys, course evaluations, or by observation
or other record keeping.

Throughout this handbook, a common example is used: data representing the results from six
surveys completed by fictional participants of a fictional training program. A copy of the
completed surveys can be found in Appendix I. The reader may find it helpful to review the
surveys before continuing with the rest of the handbook. In fact, it might be beneficial to pull
out the six surveys and refer to them periodically while reviewing the handbook.

The individual conducting the data analysis is referred to in this handbook as the “analyst.” This
person may be a program staff member, volunteer, board member or other stakeholder willing to
accomplish this task. There is no job description for this analyst. He or she needs only to have a
basic understanding of Microsoft Excel, know how to perform calculations using the contents of
multiple cells, and be familiar with formulas. Reminders about using Excel are found in text
boxes throughout the handbook.

Good luck!

The Staff of NRC

Excel for Data Analysis was written by National Research Center, Inc.
3005 30th Street, Boulder, Colorado 80301
Phone: 303-444-7863 Fax: 303-444-1145 www.n-r-c.com

Copyright © 2003 by National Research Center, Inc. All rights reserved.

Excel for Data Analysis Page 1


© National Research Center Inc. 3005 30th St. • Boulder, CO 80301 • (303) 444-7863
Data Entry in Excel
Before a data set can be analyzed, it must be entered into an electronic file. This can be done
fairly simply using Microsoft Excel.

Unique IDs
Before beginning the data entry, it is advisable to put a unique identifier on each survey or data
form. This will allow the analyst to keep track of his/her progress, and will also make it easier to
track down and set straight any data entry errors. This “identifier” is not one that actually
associates or identifies the survey with a particular person; rather, it is only to make it easier to
find a specific survey at a later date. The surveys do not need to be in any particular order; just
begin at the top of the stack with 1, and number consecutively.

Setting up the Worksheet


To set up a worksheet for data entry, the analyst will use the first row (row 1) for the question or
question part labels. Dedicate the first column (column A) to the IDs. Thus, the analyst will put
the label “ID” in cell A1. Cell B1 would contain the label q1 (for question #1) or whatever is
appropriate for the first question or field of data. Cell C1 would contain the label q2 (or
whatever is appropriate), etc.

Each survey will then be entered into one row; the first survey in row 2 (ID #1), the second
survey in row 3, and so on.

Reminder: Cell References


“Cells” in an Excel spreadsheet are referred to by the intersection of the Column and Row in
which they appear. In the example used for this handbook, the cell that contains the label “ID”
is cell A1, because it is in the first column (A) and the first row (1). The cell that contains the
answer to question #1 of the third survey entered is B4 (the 2nd column and the 4th row).

Entering Single-Response, Closed-Ended Questions


A “closed-ended” question means that the respondent chooses an answer by marking a box or
circling a number from a given list of possible responses. A “single-response” question means
that the respondent is to only choose one answer from the list.

Question #1 (shown below) from the example survey represents a single-response, closed-ended
question.

1) How many of the training sessions did you attend?


☐ 1 to 2
☐ 3 to 4
☐ 5
☐ 6 or more

When entering and analyzing data, it is easiest to work with numbers. To do this, a number is
assigned to each possible response option:
“1 to 2” = 1,
“3 to 4” = 2,
“5” = 3, and
“6 or more” = 4.

Thus, since the respondent to the first survey said they
attended “5” sessions, a “3” would be entered as the answer.
The example to the right shows how the answers to
question #1 would be entered for all six fictional surveys
(from Appendix I).
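For analysts who also work outside Excel, the coding step above can be sketched in Python. This is only an illustration: the response-to-number assignments come from the handbook, while the function name encode_q1 and the choice to return None for a blank answer are assumptions made for this sketch.

```python
# Codes assigned to the response options of question #1 (taken from the handbook).
Q1_CODES = {"1 to 2": 1, "3 to 4": 2, "5": 3, "6 or more": 4}


def encode_q1(answer):
    """Return the numeric code for a checked answer, or None if the question was left blank.

    encode_q1 is a hypothetical helper name used only for this example.
    """
    if answer is None or answer.strip() == "":
        return None
    return Q1_CODES[answer.strip()]


# The respondent to the first survey checked "5", so a 3 is entered.
first_survey = encode_q1("5")
print(first_survey)
```

Keeping the codes in a single lookup table mirrors the role of the codebook: the assignment is written down once and reused for every survey.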

Entering “Multiple-Response” Questions


Question #2 from the fictional survey is a “multiple-response” question, meaning that
respondents could give more than one answer to the question; in this example, they may have
heard of the program from multiple sources.

2) How did you hear about this training? (Please check all that apply.)
☐ Neighborhood newsletter
☐ Bulletin boards in community buildings
☐ Flyers
☐ Your child’s school
☐ Word of mouth
☐ Other

There are two ways the data could be entered from a question of this type. In the first method, a
number is assigned to each response, similar to a single-response question. However, more than
one column is assigned to the question. The number of columns assigned should be as many as
the highest number of answers the analyst believes that the respondent may give; if necessary,
assign as many columns as there are possible responses (in case a respondent checks every box).
In the example at left, 3 columns were assigned to question #2, and the answers entered as shown.

The second approach to multiple-response questions is to assign a column to each possible
response. For the example question #2 (shown above), the following columns
would be assigned:
q2a: Neighborhood newsletter
q2b: Bulletin boards in community buildings
q2c: Flyers
q2d: Your child’s school
q2e: Word of mouth
If a response was marked, place a “1” in the assigned column. If no response was given, leave it
blank, or place a “0” in the column. With this method, it is harder to know if a respondent
skipped a question altogether. The analyst may wish to have a column before q2a where he/she
marks whether or not the question was left blank (1=blank, 2=not blank). This will help in the
analysis, when calculating the percent of respondents giving each answer. The example below
shows how the data could be entered for question #2 using this approach.
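The relationship between the two data entry approaches can be sketched in Python. The sketch below converts one respondent's list of checked answers (first approach) into one column per possible response plus a blank flag (second approach). It assumes the responses are numbered 1 through 5 in the order they appear on the survey; the names to_indicator_row and q2_blank are hypothetical and used only for illustration.

```python
# Column labels for the second approach; codes 1..5 are assumed to follow survey order.
Q2_COLUMNS = ["q2a", "q2b", "q2c", "q2d", "q2e"]


def to_indicator_row(checked_codes):
    """Turn a list of checked response codes into one column per possible response.

    The q2_blank flag follows the handbook's convention: 1 = blank, 2 = not blank.
    """
    row = {"q2_blank": 1 if not checked_codes else 2}
    for code, column in enumerate(Q2_COLUMNS, start=1):
        row[column] = 1 if code in checked_codes else 0
    return row


# A respondent who checked "Neighborhood newsletter" (1) and "Word of mouth" (5).
example_row = to_indicator_row([1, 5])
print(example_row)
```

The blank flag is what later allows the analyst to tell a skipped question apart from a respondent who simply checked none of the listed sources.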

Reminder: “Freeze Panes”


“Freezing the panes” allows the labels at the top of the worksheet and the IDs at the left of the worksheet
to be always visible. To freeze the panes, put the cursor in the cell where the panes should break
(usually B2). Then select “Window” from the menu bar, and then the option to “Freeze Panes.” This
option works as a toggle; that is, if this option is selected again, the panes will “unfreeze.” (If the panes
are frozen, the menu option will read “Unfreeze Panes.”) Using this option is quite helpful where there
are many variables (columns) or cases (surveys, records of data in rows).

Entering Open-Ended Questions
An open-ended question is one in which respondents are invited to answer in their own words,
rather than from a list of responses. Question #6 on the fictional survey represents an
“open-ended” question.

6) Do you have any other comments you would like to make about this training?
________________________________________________________________________
________________________________________________________________________

Depending on the type of open-ended question asked, the analyst may or may not wish to enter
these responses into the dataset at the same time as the other questions are entered. These
questions could be entered later into an appendix for a report, or they could be read and assigned
“codes;” that is, like answers could be grouped into categories. Each category or code could be
assigned a number, and these codes entered into the dataset in a manner similar to the examples
shown above.

For this fictional survey, the answers to Question #6 were deemed short enough to enter verbatim
into the dataset, as shown in the example below:

However, the answers to Question #7 were considered appropriate for coding.

7) What is your race? ____________________

The answers were entered into the dataset as written in by respondents, as shown below, but then
codes were assigned: 1=Latino/a; 2=Asian; 3=White/Caucasian.
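Assigning codes to written-in answers can be sketched as a two-step lookup in Python. The three codes are the ones listed above; the verbatim spellings in the lookup table are invented for illustration (the actual written answers appear only in Appendix I), and code_race is a hypothetical helper name.

```python
# Codes from the handbook: 1=Latino/a; 2=Asian; 3=White/Caucasian.
RACE_CODES = {"Latino/a": 1, "Asian": 2, "White/Caucasian": 3}

# Hypothetical verbatim spellings, grouped into the categories above.
VERBATIM_TO_CATEGORY = {
    "latina": "Latino/a",
    "latino": "Latino/a",
    "asian": "Asian",
    "white": "White/Caucasian",
    "caucasian": "White/Caucasian",
}


def code_race(verbatim):
    """Return the numeric code for a written-in answer, or None if it needs manual review."""
    category = VERBATIM_TO_CATEGORY.get(verbatim.strip().lower())
    return RACE_CODES[category] if category else None


print(code_race(" Latina "), code_race("Caucasian"), code_race("something else"))
```

Returning None for an unrecognized answer keeps the analyst in control: new categories are added to the codebook deliberately rather than guessed.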

Creating a Codebook
The examples above showed how the data entry would occur for each type of question.
Generally, the analyst will want to set up the data entry spreadsheet before beginning the data
entry. By knowing how to enter each type of question, the analyst can determine which
questions will be entered into each column, being sure to reserve the first column for the IDs.

Appendix II shows the codebook for the fictional survey being used as an example in this
handbook. The ID is in column A (shown with a circle around it), question #1 is in column B,
question #2, using the first version of multiple-response data entry, is in columns C through E,
while question #2 using the second version of multiple-response data entry is in columns F
through K (in this example, the “others” were ignored), the three parts of question #3 are in
columns L, M and N, and so on. This codebook also shows the numeric equivalents assigned to
each question response.

It is a good idea to hang on to this codebook. It will serve as a customized guide in data entry,
and in the analysis of the data once the dataset has been created. The example below shows the
entered data for the surveys shown in Appendix I.

(Note: the columns for the open-ended questions were shrunk to allow all the columns to show.)

Reminder: Wrapping Text
Sometimes the text entered into a cell is too long for it to display in its entirety. To turn on text wrapping
(the text will automatically move to the next line if it runs out of room), highlight the cells to be formatted,
then choose “Format” from the menu bar, and then “Cells.”

Click on the “Alignment” tab, and check the box labeled “Wrap text.” Click the OK button to apply the
formatting.

The text should be wrapped in the cell. Note that wrapping text will change the height of the rows.

Analyzing the Data
Now that the data collected by the program has been entered into an electronic dataset, the
analyst is ready to start analyzing the information to get answers to the questions posed. This
next section will demonstrate how to use formulas and functions within Excel to produce the
“statistics” or summaries of the information needed.

Reminder: Formulas
Formulas are used to perform calculations within a spreadsheet. To insert a formula, as opposed to a
number or text, type an equals sign (“=”) in the cell where the calculation is to be performed, and then
type in the rest of the formula. A formula can perform mathematical calculations or execute a wide variety
of functions (see below for more on functions). To add or subtract, use the plus (+) or minus (-) symbol.
To multiply, use an asterisk (*) and to divide use a slash (/). Use parentheses as necessary to indicate
the desired order of operations.
For example, if the analyst wanted to know how many seconds there were in 3 hours, he or she could
type in the formula: =3*60*60. The result displayed in the cell would be 10,800.
There might have been a cell somewhere on the page that had a value of “3” to indicate three hours; for
the sake of an example, this cell is T21. To know how many seconds that represented, use the same
formula as above, but exchange the “3” for the cell reference: =T21*60*60. If the number of hours in cell
T21 changed, the result of the formula would also change.

Reminder: Functions and Referring to a range of cells


Functions can be used within formulas to perform special calculations or manipulations. There are a
large number and variety of functions that can be used in Excel. Some of the functions are mathematical,
some are logical, some are statistical, and others serve yet more purposes.
All functions begin in a similar fashion: the equals sign (=), the function, immediately followed by an open
parenthesis, the references on which the function should operate each separated by a comma (a different
number of references are needed for each function), and a close parenthesis.
For example, the “SUM” function can be used to add the values of several cells.
Some functions will refer to a “range” of cells. For example, if an analyst wanted to total the number of
youth served in the table below, a formula could be used like that found in cell B5: =B2+B3+B4.
Alternatively, the SUM function could be used, referring to the range of cells to be summed, like
this: =SUM(B2:B4). (The range must not include B5 itself, the cell that holds the formula.) The colon
indicates that a range of cells is being referred to, starting with (and including) the cell to the left
of the colon, and ending with (and including) the cell to the right of the colon.
The function “SUM” indicates what is to be done with this range of cells – total all the values together.

Calculating an Average
Calculating the average of a range of cells is a fairly simple procedure within Excel, and
appropriate for certain types of data. For example, in the fictional survey for our training
program, one of the questions asks respondents to report their annual household income. The
average annual income of participants could be calculated and reported.

The function “AVERAGE” would be used to make this calculation. As shown in the table
below, to create this formula an equals sign (=) is first typed, followed by the function name, with
the range of cells following the function name in parentheses; for example, =AVERAGE(AH2:AH7).
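As a rough cross-check outside Excel, the same calculation can be sketched in Python. The sketch mimics the way AVERAGE skips empty cells, so only respondents who gave an answer are included. The income values are made up for illustration and are not the handbook's survey data; average_ignoring_blanks is a hypothetical name.

```python
from statistics import mean


def average_ignoring_blanks(cells):
    """Average the non-blank values in a column, in the spirit of Excel's AVERAGE.

    Blank cells are represented as None. If every cell is blank this sketch returns
    None (Excel itself would display an error in that case).
    """
    values = [value for value in cells if value is not None]
    return mean(values) if values else None


# Made-up incomes for four respondents, one of whom skipped the question.
incomes = [20000, None, 30000, 40000]
result = average_ignoring_blanks(incomes)
print(result)
```

Note that the blank answer is left out of both the total and the count, so the average is taken over three respondents, not four.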

Reminder: Formatting cells


In many of the spreadsheet examples shown in this handbook, some of the cells are formatted as
numbers, and some are formatted as percents. You will want to format the cells appropriately. To format
a cell or group of cells, highlight the cells you wish to format, then choose “Format” from the menu bar,
and then “Cells.” A dialogue box will open, with a number of formatting options. You can format the
alignment of the cell contents, the cell shading or border, or the “Number.” If you choose the “Number”
tab, you will be presented with a list of types of number formats, such as “currency,” “percentage,” etc.
Choose the type, and then decide how many decimals you want. The highlighted cells will be formatted
according to the specifications you choose.

Creating a Frequency Distribution for a Single-Response Question
Creating a frequency distribution, or a count and/or proportion of respondents giving each
response to a question, is an intuitively easy process. However, doing it within Excel for a large
number of cases is actually a multi-step procedure.

The first step is to count how many respondents gave each response. There is a function within
Excel that will help automate this step: “COUNTIF.” To use this function, specify two items:
- What range of cells contains the answers to the question of interest, and
- Which particular answer should be counted (“the criterion”).

The function is set up as: =COUNTIF(range of cells, criterion). To know how many people
attended the training program just one or two times, the analyst would want to count how many
times “1” (the numeric assignment for question #1 to the response “1 to 2”) was entered as the
answer to question #1. The data for question #1 are in column B, and specifically in rows 2
through 7. The formula to enter to find out how many respondents said they attended one or two
sessions would be:
=COUNTIF(B2:B7,1)
The results can be seen in the table below in cell B13. The formula is shown to the right in cell C13.

To get a count of the number of responses to each of the other possible answers, use the same
formula, but change the criterion each time. (See the formulas in cells C14, C15, and C16.)

In this example, no participants attended 1 to 2 sessions, three participants attended 3 to 4
sessions, two participants attended 5 sessions, and one participant attended 6 or more sessions.
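The counting step can be sketched in Python with a small stand-in for COUNTIF. The list of answers is an assumption: the handbook states only that the first survey was coded 3 and what the final counts are, so the order of the remaining five values is invented to match those counts.

```python
def countif(cells, criterion):
    """Count the cells equal to the criterion, like =COUNTIF(range, criterion)."""
    return sum(1 for value in cells if value == criterion)


# Coded answers to question #1 for six surveys (order after the first is illustrative).
q1 = [3, 2, 2, 3, 4, 2]

# One count per possible response code, like the formulas in rows 13 through 16.
counts = {code: countif(q1, code) for code in (1, 2, 3, 4)}
print(counts)
```

Looping over every possible code, including codes nobody chose, ensures that a response with zero answers still appears in the frequency distribution.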

To know the proportion (percent) of respondents attending 6 or more times, the analyst would
want to divide the number who gave that answer by the total number of those who answered the
question. The SUM function can be used to total the number of respondents who answered that
question. In the example above, the formula would be: =SUM(B13:B16). In the table below,
that formula was entered into cell B11.

To determine the proportion of people giving that answer, the contents of cell B16 would need to
be divided by the contents of cell B11. As shown below, those results are displayed in cell B22. The
formulas for calculating the proportion giving each answer to question #1 are also
shown.
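The division step can be sketched in Python as follows. The counts are the ones reported above for question #1; rounding to whole percents is an assumption made here to match how percentages are usually reported.

```python
# Counts for question #1 as reported above: code -> number of respondents.
counts = {1: 0, 2: 3, 3: 2, 4: 1}

# Total number of respondents who answered, like =SUM(B13:B16).
total = sum(counts.values())

# Percent giving each answer, rounded to whole percents.
percents = {code: round(100 * n / total) for code, n in counts.items()}
print(total, percents)
```

With single-response questions the unrounded percents always sum to 100, which is a quick check that the denominator is correct.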

Reminder: Absolute versus relative cell references
In a formula, a cell reference can be made in a “relative” or an “absolute” manner. For example, looking
at the table below, if the analyst wanted to calculate a percent, he or she might create a formula in cell C2
which would display the proportion of youth served who are 12-14 years old. That formula would be:
=B2/B5, which would divide the value of B2 (12) by the value of B5 (112).

The analyst may then wish to also calculate the proportion of youth served who are 15-17 years old. If
the contents of cell C2 were copied to cell C3, the formula would look like this: =B3/B6. This is because
in Excel the cell references in this formula are “relative” references; that is, Excel has assumed that
because in cell C2 the calculated number was derived by dividing the number in the same row and one
column to the left by the number three rows below and one column to the left, the same thing should
happen in the cell to which the formula is copied. However, cell B6 is blank, so the formula in cell C3
would divide by zero and display the error #DIV/0!. This can be fixed by changing the formula after it has been
copied, so that the denominator refers to B5. But, if the formula is then copied to cell C4, the
denominator would again have to be manually changed in the formula to refer to the correct cell that
contains the total number of youth served. If this manual change was not made, the formulas in
column C would look like the formulas in column D in the table below.
If, however, an “absolute” reference was used to refer to the row that contains the total number of youth
served (for example, =B2/B$5), when the formula was copied, the denominator would always refer to row 5.
The dollar sign ($) is used to indicate an absolute reference. In this example, it is only used for the row
designation, not for the column designation. It can be used for both the row and column designation, or
only one or the other.
Excel defaults to assuming that all cell references are relative, unless the change is made manually.
Knowing how to use relative and absolute references can greatly speed up creation of spreadsheets in
Excel.
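The effect of the dollar sign can be imitated with a small Python sketch that "copies" a formula down a number of rows. This is a toy model written for this reminder, not how Excel works internally: copy_down is a hypothetical function, it only handles copying downward, and it only understands simple references such as B2 or B$5.

```python
import re

# Matches a cell reference: optional $, column letters, optional $, row number.
_REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def copy_down(formula, rows):
    """Return the formula as it would read after being copied down `rows` rows."""

    def shift(match):
        column_dollar, column, row_dollar, row = match.groups()
        # A row number preceded by $ is absolute and stays put; otherwise it moves.
        new_row = int(row) if row_dollar else int(row) + rows
        return f"{column_dollar}{column}{row_dollar}{new_row}"

    return _REFERENCE.sub(shift, formula)


print(copy_down("=B2/B5", 1))   # relative denominator drifts to the blank cell B6
print(copy_down("=B2/B$5", 1))  # absolute row keeps the denominator on row 5
```

Copying the second formula down any number of rows changes only the numerator, which is why a single formula with an absolute row reference can fill the whole percent column.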

Creating a Frequency Distribution for a Multiple-Response Question
The approach to be used to calculate the results to a multiple-response question depends upon the
approach used to enter the data.

If the data have been entered using the first approach described, where a numeric
assignment is made for each possible response, but more than one column is designated for entry
of the results (as in columns D, E and F in the table below), then the counts and proportions can
be calculated in a manner quite similar to that of a single-response question. The change would
be in the definition of the range of cells to include in the count. Instead of covering only one
column, it would cover multiple columns. In this example, the number of people who said they
heard of the program through the neighborhood newsletter would be determined using the
formula:
=COUNTIF(D2:F7,1)
Calculating the percent of respondents who heard of the program through the neighborhood
newsletter would also be changed slightly. Instead of dividing the number of respondents giving
a specific answer by the sum of the cells F13 through F17 (which would be the total number of
responses, not respondents answering the question), the denominator is the total number of
respondents answering the question.

To determine this, the number of valid answers entered in column D (the first answer given by each
respondent) would need to be counted.
This can be done using the COUNT function. This formula is not shown in the table below, but
would be entered in cell D11 as follows:
=COUNT(D2:D7)
This function counts the cells in the specified range that contain numbers; here, the non-blank
answers. In this case, every respondent gave at least one answer, so the total is 6, the same as the
number of returned surveys. This same formula (with the correct cell range specification) was used in
cells E11 and F11. The numbers displayed there designate the number of people who gave two
or more answers (4 people, see cell E11) or three answers (1 person, see cell F11).

It should be noted when reporting the percentages to a multiple response question that the
percents may add to more than 100%, as respondents can give more than one answer.
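The difference between counting responses and counting respondents can be sketched in Python. The rows below are invented for illustration; they were chosen only so that six people answer, four give a second answer, and one gives a third, as described above. Each row holds up to three response codes, with None standing for an empty cell.

```python
# Illustrative data: one row per survey, up to three answers to question #2.
rows = [
    [1, 5, None],
    [2, 3, None],
    [1, None, None],
    [3, 4, 5],
    [2, None, None],
    [1, 4, None],
]

# Number of times code 1 appears anywhere in the three columns, like =COUNTIF(D2:F7,1).
newsletter = sum(1 for row in rows for value in row if value == 1)

# Respondents who answered at all: count the first answer column only.
respondents = sum(1 for row in rows if row[0] is not None)

# Total responses across all three columns (the wrong denominator for a percent).
responses = sum(1 for row in rows for value in row if value is not None)

percent_newsletter = round(100 * newsletter / respondents)
print(newsletter, respondents, responses, percent_newsletter)
```

Dividing by the number of responses instead of respondents would understate every percentage, because people who checked several boxes would be counted more than once in the denominator.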

If the answers to question #2 were entered as shown in columns H through M, where each
possible answer was assigned to a column, and a “1” was used to designate when a box was
checked, then a slightly different approach is needed to create the frequency distribution.

First, to get the total number of respondents who gave an answer, column H needs to be
appropriately analyzed. In this instance, a “1” was entered if a respondent gave no answer to the
question, and a “2” was entered if a respondent gave at least one answer. The formula in
cell H11 (not shown in the table below) was =COUNTIF($H$2:$H$7,2), to count the number of
valid answers to question #2. This formula was copied to cells I11, J11, K11, L11 and M11.

To determine the number of people who indicated each potential source of familiarity with the
training, the number of “1” responses in each column was counted, using the COUNTIF
function. The formula for cell M13 (the number of respondents indicating they heard of the
program by word of mouth) is shown in cell N13. A similar formula was used for each of the
other responses.

Next, to determine the proportion of respondents each of those counts represented, the counts
were divided by the number of valid responses to question #2. As shown in cell M19, 33% of
respondents reported they had heard of the training by word of mouth. The formula used to
make that calculation is shown in cell N19. A similar formula was used for each of the other
responses.

Again, it should be noted when reporting the percentages to a multiple response question that the
percents may add to more than 100%, as respondents can give more than one answer.
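The same calculation for the one-column-per-response layout can be sketched in Python. The rows are invented for illustration and are not the Appendix data; they were chosen so that two of six respondents checked “Word of mouth,” matching the 33% reported above. The first value in each row is the blank flag (1=blank, 2=not blank), followed by the indicator columns q2a through q2e.

```python
# Illustrative rows: (blank flag, q2a, q2b, q2c, q2d, q2e).
rows = [
    (2, 1, 0, 0, 0, 1),
    (2, 0, 1, 1, 0, 0),
    (2, 1, 0, 0, 0, 0),
    (2, 0, 0, 1, 1, 1),
    (2, 0, 1, 0, 0, 0),
    (2, 1, 0, 0, 1, 0),
]

# Respondents who answered the question: flag equal to 2, like =COUNTIF($H$2:$H$7,2).
valid = sum(1 for row in rows if row[0] == 2)

# Respondents who checked "Word of mouth": a 1 in the last column.
word_of_mouth = sum(1 for row in rows if row[5] == 1)

percent_word_of_mouth = round(100 * word_of_mouth / valid)
print(valid, word_of_mouth, percent_word_of_mouth)
```

Because every percent uses the same denominator (the number of valid respondents), the percents for the five sources can legitimately total more than 100.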

PivotTables cannot be used to calculate the frequency distribution of multiple response
questions.

Reminder: Functions Revisited
“SUM” is only one of a large number of functions available in Excel. Some of the functions are
mathematical, some are logical, some are statistical, and others serve yet more purposes.
All functions begin in a similar fashion: the function, immediately followed by an open parenthesis, the
references on which the function should operate each separated by a comma (a different number of
references are needed for each function), and a close parenthesis. The functions needed for simple
descriptive analyses in Excel are shown below.

Functions and formulas used for simple descriptive analyses in Excel


The table below displays the functions used to perform the analyses described in this
handbook. The examples all refer to the spreadsheet and examples shown in Appendix III.

Table: Functions and formulas used for simple descriptive analyses in Excel

Calculate: the number of surveys completed
    by: counting the number of rows of data entered (regardless of whether some cells/rows are blank)
    using the function or formula: ROWS
    operators are: range of cells for which the number of rows should be counted
    example: =ROWS(B2:B7)
    value displayed: 6
    what it means: 6 surveys were returned

Calculate: the average rating or answer of those who responded
    by: calculating the average of the ratings or answers given by those who gave an answer
    using the function or formula: AVERAGE
    operators are: range of cells containing the values to be averaged
    example: =AVERAGE(AH2:AH7)
    value displayed: $29,000
    what it means: The average annual income as reported for question #10

Calculate: the lowest number given as an answer
    by: examining the values in a range of cells, and finding the lowest value
    using the function or formula: MIN
    operators are: range of cells containing the values to be examined
    example: =MIN(AH2:AH7)
    value displayed: $15,000
    what it means: The lowest annual income as reported for question #10

Calculate: the highest number given as an answer
    by: examining the values in a range of cells, and finding the highest value
    using the function or formula: MAX
    operators are: range of cells containing the values to be examined
    example: =MAX(AH2:AH7)
    value displayed: $57,000
    what it means: The highest annual income as reported for question #10

Calculate: the number of respondents who gave a specific answer*
    by: counting the number of responses of a certain type within a range of cells
    using the function or formula: COUNTIF
    operators are: 1) the range of cells to be examined; 2) the value to be counted
    example: =COUNTIF(B$2:B$7,3)
    value displayed: 2
    what it means: 2 people gave an answer of “5 times” to question #1

Calculate: the total number of respondents who answered the question**
    by: adding the number of people who gave a valid answer to a question
    using the function or formula: SUM
    operators are: range of cells to be totaled
    example: =SUM(B13:B16)
    value displayed: 6
    what it means: 6 people answered question #1

Calculate: the total number of respondents who answered the question
    by: counting the number of nonblank answers
    using the function or formula: COUNT
    operators are: range of cells to be examined
    example: =COUNT(E2:E7)
    value displayed: 4
    what it means: 4 people gave two or more answers to question #2 (as column E contains the
        second answer people gave to question #2)

Calculate: the proportion (percent) of respondents who gave a specific answer
    by: dividing the number of people who gave a specific answer by the total number of people
        who answered the question
    using the function or formula: (division) [cell reference1]/[cell reference2]
    operators are: cell reference1 is the cell reference of the numerator; cell reference2 is the
        cell reference of the denominator
    example: =B15/B$11
    value displayed: 33%
    what it means: 33% of respondents gave an answer of “5 times” to question #1

*This is used for each “row” or part of a frequency distribution.
**Or the sum of any list of numbers.

Presenting the Results: One Quick Idea
Once the frequency distributions of the data set have been produced, how will the analyst and
other program staff share this information with others? The Excel spreadsheet is not very pretty.

One idea is to create an “annotated instrument”; that is, to type the results into a blank
questionnaire.1 Most evaluation forms or surveys have been created using word processing
software such as Word or WordPerfect, and thus are well-suited to this approach. A new file
should be created from the electronic version of the survey. The check boxes can then be
replaced with the proportion of respondents giving each answer. For example:

1) How many of the training sessions did you attend?


0% 1 to 2
50% 3 to 4
33% 5
17% 6 or more

Staff can write a cover memo or report to accompany the annotated instrument that explains the
methods used to obtain the data and interprets the results.

An example copy of an annotated instrument can be found in Appendix IV.

1 The term “annotated instrument” is one created by and used by staff at National Research Center, Inc. It is NOT a
commonly used evaluation term, but one that we think is descriptive.

Using Pivot Tables for Basic and Advanced Analyses
Pivot tables are an analytic tool available to the Excel user. They take some time to set up
but can be very powerful. Pivot tables offer an alternate way to create frequency
distributions, although they cannot be used for multiple-response questions. They can also be
used to create crosstabulations of data. For example, the analyst might wish to know whether
males and females respond differently to a training, or whether younger respondents feel
more positively about staff than older respondents.

A useful first step before creating a pivot table is to name the range of cells that will be used for
the analyses. This range of cells should include the first row with the variable names.

Reminder: Naming a Range of Cells


To name a range of cells, highlight all the columns and rows that make up the database. Choose Insert
from the menu bar, select Name and then Define…

In general, when the named range of cells will be used for creating pivot tables, it is a good idea to name
the range “Database.” This is the default name used by Excel in the pivot table wizard. The “Define
Name” dialogue box above shows that the name “Database” has been typed in. The field labeled “Refers
to:” shows that Database will refer to the cells starting at A1 and going to W7 in the worksheet labeled
“Data Entry.” These are the cells that contain the data entered for the fictional survey.
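
As a quick check on a named range, the reference can be converted into a row and column count. The Python sketch below is an illustration only; the helper names `column_number` and `range_shape` are hypothetical, and it handles only simple references of the form shown in the “Refers to:” field (without the worksheet name).

```python
import re

def column_number(letters):
    """Convert an Excel column label such as 'A' or 'W' to a 1-based number."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number

def range_shape(ref):
    """Return (rows, columns) covered by a simple reference like 'A1:W7'."""
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?(\d+):\$?([A-Za-z]+)\$?(\d+)", ref)
    if match is None:
        raise ValueError(f"not a simple cell range: {ref!r}")
    col1, row1, col2, row2 = match.groups()
    rows = abs(int(row2) - int(row1)) + 1
    columns = abs(column_number(col2) - column_number(col1)) + 1
    return rows, columns

rows, columns = range_shape("$A$1:$W$7")
print(rows, columns)  # 7 23
```

For the range A1 to W7, this gives 7 rows and 23 columns. With the variable names in the first row, that leaves six rows of data, one for each of the six completed surveys in the fictional example.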

Once a range of cells has been defined, pivot tables can be created from those data. It is easiest
to create the pivot tables on another worksheet within the workbook.

Reminder: Worksheets within a Workbook (or Spreadsheet)
An Excel file is often referred to as a “spreadsheet,” but the file itself is a workbook made up of a group of
“worksheets.” By default, a new workbook in Excel usually contains three worksheets, usually
labeled “Sheet1,” “Sheet2,” and “Sheet3.” The note below was entered in cell B7 on Sheet2. To see a
different worksheet, simply click on the tab of the worksheet to be viewed. To rename a worksheet,
double-click its tab and type a new name. Names are limited to 31 characters.

Creating a PivotTable (“Basic” Analyses)
Before setting up the pivot table, the analyst should place the cursor in the cell where the
pivot table is to appear. To set up a pivot table, go to the Data menu, then select
PivotTable and PivotChart Report… The PivotTable and PivotChart Wizard will walk the analyst
through the rest of the setup. In the example below, the pivot table will be placed in cell B4.

The PivotTable and PivotChart Wizard

Once “PivotTable and PivotChart Report…” has been selected from the Data menu, the
PivotTable and PivotChart Wizard will start and display a series of dialogue boxes. The first
dialogue box is shown below as Step 1 of 3. (Note: Different versions of Excel will have slightly
different PivotTable and PivotChart Wizard dialogue boxes, but the steps to follow are the same
or similar.)

Step 1: Two questions are asked in Step 1 of the Wizard. For the most part, the analyst will
select the Wizard’s default options. In answer to the first question, the data to be analyzed is
an Excel list or database. In answer to the second question, a PivotTable will be created.
(Note: PivotCharts are not discussed in this handbook, but the analyst may wish to try this
option.)

Click Next to continue onto the next step of the Wizard.

Step 2: In Step 2, the Wizard asks for the location of the data to be used in the PivotTable.
The name “Database” is automatically inserted as the answer. If another named range is
desired, it can be typed into the field. If the range of cells to be used has not been named, it
can be selected by clicking on the “Browse…” button.

Click Next to continue onto the next step of the Wizard.

Step 3: In Step 3, the Wizard asks where the PivotTable should be placed. The default is the
location of the cursor when the Wizard was started.

At this point, the analyst will choose the data to be displayed in the PivotTable by clicking on
the “Layout…” button. When this button is clicked, another dialogue box is displayed.

Layout: The Layout dialogue box displays all the variables or fields available for display in
the PivotTable. These fields are shown as a series of buttons in the right half of the dialogue
box. If there are a large number of fields, the scroll button below the fields can be used to
show additional field buttons. In the left half of the dialogue box, a blank template is shown.
To select a field for display, simply drag it from the right into one of the areas on the left.

To create a pivot table that displays the frequency of training attendances, the button q1 (“How
many of the training sessions did you attend?”) would be dragged into the row area, so that the
values in q1 will be listed vertically as rows. A field is also needed for the data section. It does
not really matter what button is dragged into the data section, as it will be used simply as a
counter. However, it should be a field that has no missing data; the ID field is ideal for this
situation. As shown above, the ID field was dragged into the data area. Usually by default the
field in the data area will be shown as a “Count.”
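
The logic of this layout (one row per value of q1, with the ID field acting as a counter) can be sketched in Python. This is an illustration under stated assumptions, not a description of how Excel works internally: the records, the ID values, and the helper name `count_by` are all hypothetical.

```python
from collections import Counter

# Hypothetical records standing in for the "Database" range; the IDs and the
# per-respondent q1 codes are invented for illustration.
records = [
    {"id": 1, "q1": 2}, {"id": 2, "q1": 3}, {"id": 3, "q1": 2},
    {"id": 4, "q1": 4}, {"id": 5, "q1": 2}, {"id": 6, "q1": 3},
]

def count_by(rows, row_field, counter_field="id"):
    """Count rows per value of row_field, skipping rows whose counter field is missing."""
    counts = Counter(
        row[row_field] for row in rows if row.get(counter_field) is not None
    )
    return dict(sorted(counts.items()))

q1_counts = count_by(records, "q1")
print(q1_counts)  # {2: 3, 3: 2, 4: 1}
```

The sketch also shows why the counter field should have no missing data: any record with a missing ID would be left out of the count. A value that no respondent chose (here, code 1) does not appear in the result at all.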

If a different summary is desired, double-click the button, and a dialogue box displaying various
options will be displayed.

PivotTable Field: The Field dialogue box shown to
the left is displayed if a button in the data portion of
the template is double-clicked. In this example, the
data summary chosen is “Count.” In addition, if the
“Options>>” button is clicked, more options for the
display of the data are shown.

In this instance, it would be appropriate to display the information as a proportion, so the
option of showing the data as “% of column” was selected.

Format Cells: To choose a number format for the data display, click on the “Number” button
in the PivotTable Field dialogue box. A Format Cells dialogue box will be displayed, from which
an appropriate number format can be selected.

After this, click the “OK” buttons
until the Step 3 dialogue box is
again showing. At this point, if
the “Finish” button is clicked, the
PivotTable will be displayed. In
this example, the PivotTable will
appear as shown to the right:

Note that when using the PivotTable method for this question, the value 1 (“1 to 2
sessions”) is not listed because no one selected this response in the survey.

Crosstabulation of Data Using PivotTables (“Advanced” Analyses)


Sometimes it is useful to analyze the data based on certain respondent characteristics; for
example, satisfaction ratings by gender or program attended. One of the easiest ways to generate
a table like this is through the use of a PivotTable.

The example to the right shows the PivotTable layout and resulting table for a
crosstabulation of the results to question #5, “How would you rate the overall quality of this
training?”, by the gender of the respondent. (Of course, crosstabulations are recommended with
larger datasets than the one created for these examples, with a sufficient number of cases
within each subgroup examined.)

This PivotTable layout (q9, gender, is placed in the column area, while q5, quality rating, is
placed in the row area; ID is again used for the data section) produces a table in which
females (1) gave more positive answers than did males (2).
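
A crosstabulation of this kind, shown as a percent of each column, can be sketched in Python as follows. The respondent-level records here are invented for illustration (they are not taken from the handbook's example surveys), and the helper name `crosstab_percent` is hypothetical. The codes follow the handbook: for gender, 1 = female and 2 = male; for the quality rating, 1 = “poor” through 4 = “excellent.”

```python
from collections import Counter, defaultdict

# Invented records for illustration only: gender (q9) coded 1 = female,
# 2 = male; quality rating (q5) coded 1 = poor ... 4 = excellent.
records = [
    {"id": 1, "q9": 1, "q5": 4}, {"id": 2, "q9": 1, "q5": 4},
    {"id": 3, "q9": 1, "q5": 3}, {"id": 4, "q9": 2, "q5": 4},
    {"id": 5, "q9": 2, "q5": 3}, {"id": 6, "q9": 2, "q5": 1},
]

def crosstab_percent(rows, row_field, column_field):
    """Return {column value: {row value: rounded percent of that column}}."""
    by_column = defaultdict(Counter)
    for row in rows:
        by_column[row[column_field]][row[row_field]] += 1
    table = {}
    for column, counts in sorted(by_column.items()):
        total = sum(counts.values())
        table[column] = {
            value: round(100 * n / total) for value, n in sorted(counts.items())
        }
    return table

table = crosstab_percent(records, "q5", "q9")
print(table)
```

Each column's percentages are computed against that column's own total, so the two gender groups can be compared even when they contain different numbers of respondents.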

The analysis in the previous example could also be performed using the average quality rating,
on a scale from 1 to 4, where 4 = “excellent” and 1 = “poor.”

This PivotTable layout (q9, gender, is placed in the column area, while q5, quality rating, is
placed in the data area; the type of data summary was changed to “Average,” and the Number
formatting was changed to a number with two decimal places) produces a table that again shows
that females (1) gave higher quality ratings than did males (2).
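
The “Average” summary by group can be sketched the same way. As before, the records are invented for illustration and the helper name `average_by` is hypothetical; only the coding scheme (1 = female, 2 = male; 1 = “poor” through 4 = “excellent”) comes from the handbook.

```python
from collections import defaultdict
from statistics import mean

# Invented records for illustration only: gender (q9) coded 1 = female,
# 2 = male; quality rating (q5) coded 1 = poor ... 4 = excellent.
records = [
    {"id": 1, "q9": 1, "q5": 4}, {"id": 2, "q9": 1, "q5": 4},
    {"id": 3, "q9": 1, "q5": 3}, {"id": 4, "q9": 2, "q5": 4},
    {"id": 5, "q9": 2, "q5": 3}, {"id": 6, "q9": 2, "q5": 1},
]

def average_by(rows, value_field, group_field):
    """Return the mean of value_field for each group, rounded to two decimals."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {
        group: round(mean(values), 2) for group, values in sorted(groups.items())
    }

averages = average_by(records, "q5", "q9")
print(averages)  # {1: 3.67, 2: 2.67}
```

With these invented records, the average rating is 3.67 for group 1 and 2.67 for group 2, rounded to two decimal places to match the number format chosen in the layout.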

APPENDIX I: Example Completed Surveys for Data Entry
The following pages show the completed surveys from six participants in a fictional training
program. These were used for all the examples in this handbook.

APPENDIX II: Example Codebook

APPENDIX III: Example Analysis, with Formulas

APPENDIX IV: Example of an “Annotated Instrument”
The next page shows an example of an “annotated instrument” for the training program using the
data examples as included in the previous appendices.

Training Evaluation: Annotated Instrument
1) How many of the training sessions did you attend?
0% 1 to 2
50% 3 to 4
33% 5
17% 6 or more

2) How did you hear about this training? (Please check all that apply.)
33% Neighborhood newsletter
17% Bulletin boards in community buildings
50% Flyers
50% Your child’s school
33% Word of mouth
0% Other

3) Please rate the following aspects of the training:

                                                Very Poor   Poor   Good   Very Good
The instructor’s knowledge of the topic .......     0%      17%    67%      17%
The instructor’s presentation style/skills ....     0%      25%    50%      25%
The handouts or take-home materials ...........    20%       0%    80%       0%

4) Rate the extent to which you agree or disagree with each of the following statements.

                                                                        Strongly                      Strongly
                                                                        Disagree  Disagree   Agree     Agree
I would strongly recommend this training for my friend ...............     0%       20%       60%       20%
This training will help improve the quality of life for my family ....     0%       17%       50%       33%

                                                                  Poor   Fair   Good   Excellent
5) How would you rate the overall quality of this training? ....   17%     0%    33%      50%

6) Do you have any other comments you would like to make about this training?
• I think we spent too much time reviewing the background information.
• I had a lot of fun. I thought Angela was great.
• This was great! I will definitely apply what I learned at work and at home!

7) What is your race?
50% Latino/a
17% Asian
33% White

8) How long have you lived in Colorado?
17% 6 years
50% 7 years
33% 8 years

9) What is your gender?
50% Female
50% Male

10) What is your annual household income?
Average annual income: $29,000
33% less than $20,000
33% $20,000 to $29,999
17% $30,000 to $39,999
17% $40,000 or more

11) Is your child enrolled in the free lunch program?
50% Yes
50% No

Thank you for your answers!

