10/15/2014

Simple Exposure Analysis

Simple Exposure Analysis
300958 Social Web Analysis
Week 4 Lab

1 Using R
1.1 Reading in Data
The easiest way to get data into R is to use read.csv. CSV stands for comma-separated
values. By default, read.csv makes a numeric variable from any column that is entirely
numeric. If the column contains any character data, the whole column will be made into a
factor (in versions of R before 4.0.0; later versions leave it as character by default).
The Key Metrics sheet from Facebook has a top row with short variable names and a
second row with longer descriptions. Because the description row is text, R will read
every column as a factor. To prevent this we use the as.is=TRUE option in read.csv,
which leaves character variables as character.
This problem doesn't occur for most of the other sheets.

1.1.1 Exercise
Use read.csv to read in the CSV files keyMetrics.csv, LifetimeLikesByGenderAge.csv and
WeeklyReachDemog.csv into data frames. Make sure you give sensible names to the data
frames you create. Use as.is=TRUE for keyMetrics.csv to avoid the mentioned problem.
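
As a minimal sketch of the effect of as.is=TRUE, the following builds a tiny stand-in file with the same two-row header layout and reads it back. The file contents, the temporary file and the commented-out data frame names are assumptions for illustration only; the real sheets have many more columns.

```r
# Build a tiny stand-in for keyMetrics.csv: short names in row 1,
# long descriptions in row 2, data from row 3 onwards (values invented).
csvLines <- c(
  "Date,Reach",
  "Date of the record,Daily Total Reach",
  "04/01/13,120",
  "04/02/13,135"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csvLines, tmp)

# as.is = TRUE keeps text columns as character instead of factor
keyMetrics <- read.csv(tmp, as.is = TRUE)

# Every column is character because of the description row
print(sapply(keyMetrics, class))

# The other sheets have no description row, so a plain call is enough, e.g.
# likes     <- read.csv("LifetimeLikesByGenderAge.csv")
# WeekReach <- read.csv("WeeklyReachDemog.csv")
```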

1.2 Manipulating data frames
The dim function in R gives the dimensions of a data.frame. Use it on your objects. The
head function shows the top few rows, and typing the data.frame name prints all of it out.
Try these out.
The summary command gives a quick summary of each variable (column) in the data frame.
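
A short sketch of these functions on a toy data frame (the column names and values are invented, standing in for one of the sheets):

```r
# Toy data frame standing in for one of the sheets (values invented)
df <- data.frame(
  week  = 1:6,
  reach = c(120, 135, 150, 110, 160, 170)
)

print(dim(df))       # number of rows, then number of columns
print(head(df, 3))   # first three rows (the default is six)
print(summary(df))   # minimum, quartiles, mean and maximum of each column
```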

1.3 Key Metrics
To do anything with the Key Metrics data we have to convert the dates (they will have been
read as character strings), and convert the other columns to numeric data.

1.3.1 Converting Dates
The first column in Key Metrics is a date string. To use it in plots, it needs to be a data
structure that can be manipulated. R has several forms of date structures but we will use
something called POSIXlt. To convert from a string to POSIXlt we use the function
strptime. Here is the code from lectures.
> dates <- keyMetrics[, 1]
> dates <- dates[-1]
> dates <- strptime(dates, format = "%m/%d/%y")

We extract the first column, remove the first row (it is the long description) and convert
using strptime.
Try this with your data frame.
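
To see what strptime returns, here is a self-contained sketch on a few invented date strings. The tz = "UTC" argument is an addition made here so the result does not depend on the local time zone; the lecture code does not use it.

```r
# First entry mimics the long description row; the dates are invented
rawDates <- c("Date of the record", "04/01/13", "04/02/13", "04/03/13")

# Drop the description, then parse month/day/two-digit-year strings
parsed <- strptime(rawDates[-1], format = "%m/%d/%y", tz = "UTC")

print(class(parsed))    # the result is a POSIXlt object
print(parsed$mon + 1)   # month numbers; $mon counts from 0
print(diff(parsed))     # gaps between consecutive dates
```

Because the result is a date structure rather than a string, it can be compared, differenced and used directly as the x axis of a plot.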

1.3.2 Extracting a column of numeric data
In lectures we extracted the Daily and Weekly Total Reach (columns 15 and 16) using
code like the following.
> reach <- keyMetrics[, 15]
> reach <- as.numeric(reach[-1])
> wreach <- as.numeric(keyMetrics[-1, 16])

Identify and extract the "Weekly Total impressions" variable from the keyMetrics. HINT:
the names function will give a list of column names.
Impressions are the views of any content on a page.
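
One way to follow the hint is to search the column names by pattern instead of counting columns by hand. This is only a sketch: the toy data frame, its column name Weekly.Total.Impressions and the variable wimp are invented, so check names() on the real data frame for the actual spelling.

```r
# Toy stand-in for keyMetrics read with as.is = TRUE.
# Column names and values are invented for illustration.
keyMetrics <- data.frame(
  Date = c("Date of the record", "04/01/13", "04/02/13"),
  Weekly.Total.Impressions = c("Weekly impressions description", "900", "1100"),
  stringsAsFactors = FALSE
)

print(names(keyMetrics))

# Find the column position by pattern, ignoring upper/lower case
idx <- grep("weekly.*impressions", names(keyMetrics), ignore.case = TRUE)

# Drop the description row and convert the rest to numbers
wimp <- as.numeric(keyMetrics[-1, idx])
print(wimp)
```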

1.3.3 Plotting
Try plotting the "Weekly Total impressions" by date. Remember for reach we used
plot(dates, reach, type="l")

Look at the help page for plot (see the examples) and try changing the axis labels (xlab=
and ylab=), giving a title (main=) and changing colours and line type (col= and lty=).
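
A sketch of these options together, using invented dates and counts so that the call runs on its own (the variable name wimp and the label text are assumptions):

```r
# Invented dates and counts, only to make the plot call runnable
dates <- strptime(c("04/01/13", "04/08/13", "04/15/13", "04/22/13"),
                  format = "%m/%d/%y", tz = "UTC")
wimp <- c(900, 1100, 1050, 1300)

plot(dates, wimp, type = "l",
     xlab = "Date", ylab = "Weekly total impressions",
     main = "Weekly impressions over time",
     col = "darkgreen", lty = 2)   # lty = 2 draws a dashed line
```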

1.4 Likes and Reach
The Likes and Reach by Demographics sheets (CSV files) are a different format. Generally
we are interested in these demographics for a particular date.
The following template code will extract the 158th row, including only the females and
males (columns 3 to 16).
> tab <- matrix(as.numeric(WeekReach[158, 3:16]), nrow = 2, byrow = TRUE)
> colnames(tab) <- c("13-17", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
> rownames(tab) <- c("Female", "Male")
> print(tab)

Try this for your data frames, using different rows (between 1 and 158).
Use barplot(tab, legend=TRUE, col=c("pink","lightblue")) to make a barplot.
Investigate changing the colours (col=) and the beside= option.
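
The difference between the default stacked bars and beside=TRUE can be seen with a small made-up table. The counts below are invented and do not come from the lab data.

```r
# Invented counts for one week: 7 female age groups, then 7 male
counts <- c(12, 40, 55, 30, 18, 9, 4,
            10, 35, 60, 28, 15, 7, 3)
tab <- matrix(counts, nrow = 2, byrow = TRUE)
colnames(tab) <- c("13-17", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
rownames(tab) <- c("Female", "Male")

# Stacked bars (the default), then side-by-side bars for comparison
barplot(tab, legend = TRUE, col = c("pink", "lightblue"))
barplot(tab, legend = TRUE, col = c("orange", "steelblue"), beside = TRUE)
```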

2 Manual Calculations
The purpose of this section is to give you practice with manual calculations of confidence intervals
for proportions and $\chi^2$ tests. You should use your calculator to verify that you can get these
results. If you don't have your calculator, you can use R as a calculator.

2.1 Confidence Intervals for Proportions
$z_{0.025} = 1.960$ and $z_{0.05} = 1.645$.
The proportion of males that reach a page is 89 out of 104 in a particular week. Find
a 95% confidence interval for the proportion of males in the audience for this page.
ANSWER: (0.789, 0.923)
For 52 out of 200 find a 95% confidence interval for the proportion.
ANSWER: (0.199, 0.321)
For 441 out of 762 find a 90% confidence interval for the proportion.
ANSWER: (0.55, 0.608)
Your answers may be slightly different due to rounding.
Try this in R using prop.test (remember the results may be slightly different).
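
The first question can be checked in R as follows. This sketch assumes the intended manual method is the usual normal-approximation interval, $\hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n}$, which is consistent with the answers given above; the variable names are chosen here for illustration.

```r
x <- 89    # males reached
n <- 104   # total reached
p <- x / n

z  <- qnorm(0.975)             # about 1.960 for a 95% interval
se <- sqrt(p * (1 - p) / n)    # standard error of the proportion
ci <- c(p - z * se, p + z * se)
print(round(ci, 3))

# prop.test uses a different (score-based) interval, with a continuity
# correction unless correct = FALSE, so its limits differ slightly
print(prop.test(x, n)$conf.int)
```

For a 90% interval, replace qnorm(0.975) with qnorm(0.95) and pass conf.level = 0.90 to prop.test.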

2.2 $\chi^2$ tests
Find the $\chi^2$ statistic and the degrees of freedom for the following tables.
        <25  25-34  35+
Female   46     15   13
Male     31     16  123

ANSWER: $\chi^2 = 3.7698$, degrees of freedom = 2


        <25  25-34  35+
Female  107    106   92
Male    101    136   68

ANSWER: $\chi^2 = 17.2952$, degrees of freedom = 2


        <25  25-34  35-44  45+
Female   58     87     76   55
Male     54     85     60   77

ANSWER: $\chi^2 = 11.1275$, degrees of freedom = 3
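
The manual calculation can be mirrored in R. The sketch below uses a hypothetical table (not one of the tables above) so that the arithmetic is easy to follow: expected counts are row total times column total divided by the grand total, the statistic sums (observed - expected)^2 / expected over all cells, and the degrees of freedom are (rows - 1) times (columns - 1).

```r
# Hypothetical 2 x 3 table of counts (not one of the tables above)
tab <- matrix(c(20, 30, 25,
                30, 20, 25), nrow = 2, byrow = TRUE,
              dimnames = list(c("Female", "Male"), c("<25", "25-34", "35+")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

stat <- sum((tab - expected)^2 / expected)
df   <- (nrow(tab) - 1) * (ncol(tab) - 1)
print(c(statistic = stat, df = df))

# chisq.test should report the same statistic and degrees of freedom
print(chisq.test(tab))
```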

3 More R: a challenge
Construct the $2 \times 2$ table containing the number of males and females (independent of age), versus
the months of April and May. Does a $\chi^2$ test show that the reach for gender is independent of the
two months?
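
One possible shape for the solution is sketched below on a toy data frame. Everything about the toy data is an assumption: the column layout, the names and the values are invented, and whether to sum the weeks within a month or pick a single week per month is a choice the question leaves open.

```r
# Toy stand-in for the weekly reach sheet; layout and values are invented
WeekReach <- data.frame(
  Date = c("04/07/13", "04/21/13", "05/05/13", "05/19/13"),
  F.18.24 = c(40, 42, 50, 55), F.25.34 = c(30, 33, 35, 36),
  M.18.24 = c(35, 30, 38, 31), M.25.34 = c(25, 27, 26, 24),
  stringsAsFactors = FALSE
)

# Month number for each row ($mon counts from 0, so add 1 to get April = 4)
month <- strptime(WeekReach$Date, format = "%m/%d/%y", tz = "UTC")$mon + 1

# Total over age groups for each gender, one value per row
female <- rowSums(WeekReach[, 2:3])
male   <- rowSums(WeekReach[, 4:5])

# Sum the rows within each month to fill the 2 x 2 table
tab22 <- rbind(Female = tapply(female, month, sum),
               Male   = tapply(male, month, sum))
colnames(tab22) <- c("April", "May")
print(tab22)

print(chisq.test(tab22))
```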

http://staff.scm.uws.edu.au/~lapark/300958/labs/300958.week04.lab.html
