Summary of R Commands for SDS 302

The following is a summary of R commands that we will be using in SDS 302. Please refer to
the pre-labs for examples of their usage, including the appropriate arguments of the commands.

Basic arithmetic and logic
== equal to
!= not equal to
<, >, <=, >= inequality operators
& logical AND
| logical OR
log(x, base) logarithm of x to the given base (log_b x)
# marks the start of a comment
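A minimal sketch of these operators typed at the R console. The numbers are arbitrary and chosen only to show what each operator returns.

```r
# Comparison operators return TRUE or FALSE
3 == 3            # TRUE: equal to
3 != 4            # TRUE: not equal to
5 >= 2            # TRUE: an inequality operator

# Logical operators combine comparisons
(3 > 1) & (2 > 5) # FALSE: & needs both sides to be TRUE
(3 > 1) | (2 > 5) # TRUE: | needs at least one side to be TRUE

# Logarithm with an explicit base
log(8, base = 2)  # 3, because 2^3 = 8
log(100, 10)      # 2, because 10^2 = 100
```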

Import data from class package
library(SDSIntroR) loads the class package
bike <- BikeData stores the BikeData dataset as bike

Identify a variable or a value
Karloff$Sex identifies the variable Sex in the dataframe Karloff
Karloff[n, k] identifies the value in the nth row and kth column of a dataset
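The Karloff data come from the class package, so this sketch substitutes R's built-in mtcars dataframe (an assumption made only to keep the example self-contained). The $ and [n, k] forms behave the same way on any dataframe.

```r
# mtcars ships with R (datasets package); it stands in for Karloff here
head(mtcars, 3)

# $ extracts one variable (column) as a vector
mpg <- mtcars$mpg

# [n, k] extracts the value in row n, column k
mtcars[1, 1]      # 21: the mpg of the first car (Mazda RX4)
mtcars[1, "mpg"]  # the same value, selected by column name

# Leaving an index blank keeps the whole row or the whole column
mtcars[1, ]       # entire first row
mtcars[, 1]       # entire first column, same as mtcars$mpg
```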

Create a vector
newdata <- c(1,3,5,7,9) creates a vector containing the numbers 1, 3, 5, 7 and 9
credit <- Karloff$Credit.Hours creates a vector of credit hours for all students in Karloff
full_temp <- Karloff$Temp[Karloff$Full.Time == "Yes"] creates a vector of Temp values for full-time students only
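A short sketch of building vectors with c() and with logical subsetting. Because Karloff is only available through the class package, it uses the built-in iris data instead; note that a text value such as "setosa" must be quoted, just like "Yes" above.

```r
# c() combines values into a vector
newdata <- c(1, 3, 5, 7, 9)
sum(newdata)            # 25

# Logical subsetting keeps only the values whose condition is TRUE
setosa_length <- iris$Sepal.Length[iris$Species == "setosa"]
length(setosa_length)   # 50 setosa flowers
```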

Create a dataframe
males <- Karloff[Karloff$Sex == "M", ] creates a dataframe containing only the males in Karloff
fulltime <- Karloff[Karloff$Credit.Hours >= 12, ] creates a dataframe of students with 12 or more credit hours
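The same row-filtering pattern, sketched on the built-in iris dataframe as a stand-in for Karloff. The trailing comma matters: it tells R that the condition selects rows and that every column should be kept.

```r
# Text condition: keep only the setosa rows, all columns
setosa <- iris[iris$Species == "setosa", ]
dim(setosa)             # 50 rows, 5 columns

# Numeric condition: flowers with a sepal length of at least 7
long_sepals <- iris[iris$Sepal.Length >= 7, ]
summary(long_sepals$Sepal.Length)
```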

Descriptive statistics
fivenum(x) five-number summary of variable x (minimum, lower hinge, median, upper hinge, maximum)
mean(x) sample mean of variable x
sd(x) sample standard deviation of variable x
cor(x,y) correlation between variable x and variable y
pnorm(0.375) area under the standard normal curve at or below z = 0.375
1 - pnorm(0.375) area under the standard normal curve above z = 0.375
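A small worked example of the descriptive functions on a made-up vector, chosen so the answers can be checked by hand. The standard deviation is sqrt(10) because the squared deviations from the mean of 5 sum to 40, and 40 / (5 - 1) = 10.

```r
x <- c(1, 3, 5, 7, 9)
y <- 2 * x + 1          # a perfectly linear companion variable

fivenum(x)              # 1 3 5 7 9: min, lower hinge, median, upper hinge, max
mean(x)                 # 5
sd(x)                   # sqrt(10), about 3.162
cor(x, y)               # 1: perfect positive linear association

# Standard normal areas
pnorm(0)                         # 0.5: half the area lies below z = 0
pnorm(1.96)                      # about 0.975
1 - pnorm(1.96)                  # about 0.025
pnorm(1.96, lower.tail = FALSE)  # the same upper-tail area, computed directly
```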

Histogram options
hist(x) histogram of variable x
hist(x, main = "title", xlab = "x axis label") with title and axis label (text goes in quotes)
hist(x, breaks = 15) with about 15 bins (R treats the number as a suggestion)
hist(x, breaks = seq(1.5, 5.25, 0.25)) with bins of width 0.25 that range from 1.5 to 5.25
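A sketch combining the histogram options on R's built-in faithful data (Old Faithful eruption times in minutes), used here only as example data whose range happens to fit the 1.5 to 5.25 breaks. Custom breaks must cover every data value, otherwise hist() stops with an error.

```r
# faithful ships with R: eruption times in minutes
eruptions <- faithful$eruptions
range(eruptions)        # all values fall between 1.5 and 5.25

# Quoted title and axis label, bins of width 0.25
h <- hist(eruptions,
          breaks = seq(1.5, 5.25, 0.25),
          main = "Old Faithful eruption times",
          xlab = "Eruption time (minutes)")

# hist() also returns the bin boundaries and counts it drew
h$breaks                # 16 boundaries, so 15 bins
h$counts
```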

Missing data corrections
mean(x, na.rm = TRUE) sample mean of variable x, ignoring NA values
newdata <- na.omit(mydata) removes every observation (row) that has missing data
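A sketch of both corrections on R's built-in airquality data, which contains missing Ozone and Solar.R readings. It shows why na.rm is needed: a single NA makes the plain mean NA.

```r
# airquality ships with R and has missing values in Ozone and Solar.R
mean(airquality$Ozone)                # NA: one missing value spoils the mean
mean(airquality$Ozone, na.rm = TRUE)  # mean of the non-missing readings

# na.omit() drops every row with at least one NA
complete <- na.omit(airquality)
nrow(airquality)                      # 153
nrow(complete)                        # 111
```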

Counts and probabilities
length(x) number of values in x
table(x) frequency of each value of variable x
table(x,y) two-way table of counts for variables x and y
prop.table(table(x)) probabilities of each value of categorical variable x
prop.table(table(x,y)) joint proportions (each cell divided by the total count)
prop.table(table(x,y), 1) conditional proportions of y, given x (each row sums to 1)
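A sketch of counts and proportions on the built-in mtcars data (cylinders versus transmission type), used as a stand-in for class data. prop.table() divides by the grand total unless a margin is given: margin = 1 makes each row sum to 1, and margin = 2 makes each column sum to 1.

```r
cyl <- mtcars$cyl   # number of cylinders: 4, 6 or 8
am  <- mtcars$am    # transmission: 0 = automatic, 1 = manual

length(cyl)                  # 32 cars
table(cyl)                   # count of cars for each cylinder value
prop.table(table(cyl))       # proportion of cars for each cylinder value

tab <- table(cyl, am)        # two-way table: rows = cyl, columns = am
prop.table(tab)              # joint proportions: all six cells sum to 1
prop.table(tab, margin = 1)  # am within each cyl: each row sums to 1
prop.table(tab, margin = 2)  # cyl within each am: each column sums to 1
```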

Displays
plot(x, main = "title", xlab = "x axis label") barplot of categorical variable x (x must be a factor)
barplot(table(x,y), legend.text = TRUE, beside = TRUE, main = "title") side-by-side barplot: one group of bars per level of y, one bar per level of x
boxplot(x) boxplot of variable x
boxplot(x~y) side-by-side boxplots of variable x by category y
plot(x, y, main = "title", xlab = "x axis label", ylab = "y axis label") scatterplot of y against x
abline(lm(y~x)) adds the least-squares regression line to the current scatterplot
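A sketch of each display on the built-in mtcars data, standing in for class data. Converting cyl with factor() is what makes plot() draw bars rather than points, and abline() must be called after the scatterplot it draws on.

```r
# Barplot of a categorical variable: plot() draws bars for a factor
cyl <- factor(mtcars$cyl)
plot(cyl, main = "Cars by cylinder count", xlab = "Cylinders")

# Side-by-side bars: one group per transmission type, one bar per cylinder count
tab <- table(mtcars$cyl, mtcars$am)
mids <- barplot(tab, legend.text = TRUE, beside = TRUE,
                main = "Cylinders by transmission (0 = automatic, 1 = manual)")

# Side-by-side boxplots of mpg for each cylinder count
b <- boxplot(mpg ~ cyl, data = mtcars)

# Scatterplot of mpg against weight, then the least-squares line on top
plot(mtcars$wt, mtcars$mpg, main = "Fuel economy vs. weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))
```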

Modeling
linFit(x,y) fit a linear model to predict y from x
expFit(x,y) fit an exponential model to predict y from x
logisticFit(x,y) fit a logistic model to predict y from x
tripleFit(x,y) fit all three models simultaneously
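linFit, expFit, logisticFit and tripleFit come from the class package, and this sheet does not describe how they fit their models. The sketch below is a swapped-in base-R approach to the same three model families, not those functions: lm() for the linear model, lm() on log(y) for the exponential model (which fits on the log scale and may give slightly different coefficients than a direct nonlinear fit), and nls() with the self-starting SSlogis() model for the logistic curve. The data are the growth records of chick 1 from R's built-in ChickWeight dataset.

```r
# Chick 1 from the built-in ChickWeight data: weight (grams) by Time (days)
chick1 <- ChickWeight[ChickWeight$Chick == 1, ]

# Linear model: weight = a + b * Time
lin_fit <- lm(weight ~ Time, data = chick1)

# Exponential model: weight = A * exp(k * Time), fitted as a line on log(weight)
exp_fit <- lm(log(weight) ~ Time, data = chick1)
A <- exp(coef(exp_fit)[["(Intercept)"]])
k <- coef(exp_fit)[["Time"]]

# Logistic model: weight = Asym / (1 + exp((xmid - Time) / scal)),
# fitted by nonlinear least squares with a self-starting model
logis_fit <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = chick1)

coef(lin_fit)
c(A = A, k = k)
coef(logis_fit)
```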

Predict values
expFitPred(x,y,95) use exponential model to predict value of y when x=95
logisticFitPred(x,y,95) use logistic model to predict value of y when x=95
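expFitPred and logisticFitPred are class-package functions whose internals are not given here. As a base-R analogue (an assumption, not their implementation), predict() evaluates a fitted model at a new x value; the sketch refits the same three ChickWeight models so it runs on its own, and uses Time = 15 as an arbitrary example point.

```r
chick1  <- ChickWeight[ChickWeight$Chick == 1, ]
new_day <- data.frame(Time = 15)   # arbitrary x value to predict at

lin_fit   <- lm(weight ~ Time, data = chick1)
exp_fit   <- lm(log(weight) ~ Time, data = chick1)
logis_fit <- nls(weight ~ SSlogis(Time, Asym, xmid, scal), data = chick1)

p_lin <- predict(lin_fit, new_day)        # linear prediction
p_exp <- exp(predict(exp_fit, new_day))   # undo the log to return to grams
p_log <- predict(logis_fit, new_day)      # logistic prediction
c(linear = p_lin, exponential = p_exp, logistic = p_log)
```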

Random sampling
sample(x, size = 10) draws a random sample of size 10 from x (without replacement)
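A brief sketch of sample() on the numbers 1 to 100. The seed value 302 is arbitrary; setting any seed makes the same random draw reproducible.

```r
set.seed(302)                                  # makes the draw repeatable
x <- 1:100
s <- sample(x, size = 10)                      # 10 distinct values, no replacement
s_rep <- sample(x, size = 10, replace = TRUE)  # values may repeat
s
s_rep
```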
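
t-tests
t.test(x, mu = 100) one-sample t-test where the null hypothesis says the mean is 100
t.test(x, mu = 100, alternative = "greater") one-tailed t-test where the alternative says the mean is > 100
t.test(x1, x2) two-sample t-test (by default Welch's version, which does not assume equal variances)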
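A sketch of the three t-tests on the built-in mtcars data, with 20 mpg as an arbitrary hypothesized mean. Since the sample mean (about 20.09) is above 20, the one-tailed "greater" p-value is half the two-sided one.

```r
mpg_auto   <- mtcars$mpg[mtcars$am == 0]   # automatic transmissions
mpg_manual <- mtcars$mpg[mtcars$am == 1]   # manual transmissions

# One-sample, two-sided: H0 says the mean mpg is 20
one <- t.test(mtcars$mpg, mu = 20)

# One-sample, one-sided: Ha says the mean mpg is greater than 20
one_gt <- t.test(mtcars$mpg, mu = 20, alternative = "greater")

# Two-sample: Welch by default; add var.equal = TRUE for the pooled version
two <- t.test(mpg_manual, mpg_auto)

one$p.value
one_gt$p.value
two$method
```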

Chi-square
chisq.test(table(x,y), correct = FALSE) test of independence for variable x and variable y
chisq.test(table(x), p = c(.25,.25,.25,.25), correct = FALSE) goodness-of-fit test where the expected proportions for variable x are equal, with 25% in each of four categories
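A sketch of both chi-square tests on built-in data standing in for class data. The correct argument only changes the result for 2 x 2 tables (it controls the continuity correction there), so it matters in the independence test below but not in the goodness-of-fit test. Because iris has exactly 50 flowers of each species, the equal-proportions test gives a statistic of 0.

```r
# Test of independence: transmission type (am) vs engine shape (vs), a 2 x 2 table
indep <- chisq.test(table(mtcars$am, mtcars$vs), correct = FALSE)
indep

# Goodness of fit: are the three iris species equally represented?
gof <- chisq.test(table(iris$Species), p = c(1/3, 1/3, 1/3))
gof$expected    # 50 for each species
gof$statistic   # 0, since observed counts match the expected counts
```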

ANOVA
aggregate(Pulse ~ Class, Karloff, mean) mean pulse for each class in the Karloff dataset
aggregate(Pulse ~ Class, Karloff, sd) standard deviation of pulse for each class in the Karloff dataset
model <- aov(Pulse ~ Class, data = Karloff) one-way ANOVA comparing pulse rates across classes
summary(model) shows the ANOVA summary table (F statistic and p-value)
TukeyHSD(model) shows the results of the Tukey HSD post hoc test (pairwise comparisons of class means)
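The full ANOVA workflow, sketched on R's built-in PlantGrowth data (plant weight for a control group and two treatment groups) in place of Karloff's Pulse and Class. With three groups, Tukey's test reports three pairwise comparisons.

```r
# PlantGrowth ships with R: weight by group (ctrl, trt1, trt2)
means <- aggregate(weight ~ group, PlantGrowth, mean)  # mean weight per group
sds   <- aggregate(weight ~ group, PlantGrowth, sd)    # sd of weight per group
means
sds

model <- aov(weight ~ group, data = PlantGrowth)
summary(model)          # ANOVA table: df, sums of squares, F value, p-value

tukey <- TukeyHSD(model)
tukey                   # pairwise differences with adjusted p-values
```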
