
R Notebook

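How to Select a File from a Dialog Box and Load It into a Data Frame
# file.choose() opens a file dialog; read.table() parses the chosen file as comma-separated with a header row
df = read.table(file.choose(), header = TRUE, sep = ",")
View(df)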
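`read.csv()` is a wrapper around `read.table()` whose defaults already include `header = TRUE` and `sep = ","`. The minimal sketch below assumes a small inline CSV string in place of the file picked with `file.choose()` (the column names and values are hypothetical) and checks that both calls produce the same data frame.

```r
# Hypothetical CSV content standing in for the file chosen via file.choose()
csv_text <- "Brain,Body\n85.5,60.1\n90.2,72.3\n94.9,80.0"

# Explicit read.table() call, as in the notebook
df_table <- read.table(text = csv_text, header = TRUE, sep = ",")

# read.csv() uses header = TRUE and sep = "," by default
df_csv <- read.csv(text = csv_text)

print(identical(df_table, df_csv))
```

In a script that should run without a dialog, passing a path string instead of `file.choose()` works with either function.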

Extract a Column from the Data Frame

# Single brackets keep the result as a one-column data frame named "Brain"
brain = df["Brain"]
print(summary(brain))

## Brain
## Min. : 79.06
## 1st Qu.: 85.48
## Median : 90.54
## Mean : 90.68
## 3rd Qu.: 94.95
## Max. :107.95
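Single-bracket indexing, as used above, returns a data frame, which is why `summary()` prints the column name as a header. A short sketch with a hypothetical data frame (the values are made up) contrasts it with double-bracket indexing, which returns the underlying numeric vector.

```r
# Hypothetical data frame with a Brain column (values are made up)
df <- data.frame(Brain = c(79.06, 85.48, 90.54, 107.95), Body = c(1, 2, 3, 4))

brain_df  <- df["Brain"]    # single brackets: a one-column data frame
brain_vec <- df[["Brain"]]  # double brackets (or df$Brain): a plain numeric vector

print(class(brain_df))   # "data.frame"
print(class(brain_vec))  # "numeric"

# summary() works on both; the data frame version keeps the column name as a header
print(summary(brain_df))
print(summary(brain_vec))
```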

Perform Data Visualization using ggplot2


library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.4.4

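# Load the tab-separated customer data file chosen in the dialog
custdata = read.table(file.choose(), header=TRUE, sep='\t')

View(custdata)
# Histogram of customer ages using 5-year-wide bins
ggplot(custdata) + geom_histogram(aes(x=age), binwidth=5)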
Let’s increase the binwidth to 10 and see what happens.
ggplot(custdata) + geom_histogram(aes(x=age), binwidth=10)
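The effect of `binwidth` can be seen by counting values per bin directly. This sketch uses base R `cut()` on hypothetical ages standing in for `custdata$age`; it illustrates the idea only, since ggplot2 chooses its own bin boundaries and its exact counts may differ.

```r
# Hypothetical ages standing in for custdata$age
ages <- c(22, 25, 31, 34, 38, 41, 45, 47, 52, 58, 63, 67)

# Count ages in bins of width 5 and width 10 over [20, 70)
counts5  <- table(cut(ages, breaks = seq(20, 70, by = 5),  right = FALSE))
counts10 <- table(cut(ages, breaks = seq(20, 70, by = 10), right = FALSE))

print(counts5)   # 10 narrow bins
print(counts10)  # 5 wider bins: each merges two adjacent width-5 bins
```

Wider bins smooth out small fluctuations but can hide detail, which is the trade-off the two histograms above show.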

Let’s plot some categorical data.
ggplot(custdata) + geom_bar(aes(x=marital.stat))
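With only an `x` aesthetic, `geom_bar()` draws one bar per category whose height is the number of rows in that category. The same tally can be produced with `table()`; the marital-status values below are hypothetical stand-ins for `custdata$marital.stat`.

```r
# Hypothetical marital-status values standing in for custdata$marital.stat
marital <- c("Married", "Never Married", "Married", "Divorced/Separated",
             "Married", "Widowed", "Never Married")

# One count per level: the bar heights geom_bar() would draw
counts <- table(marital)
print(counts)
```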

Compute the Correlation Between Age and Income


cor(custdata$age, custdata$income)

## [1] 0.02742927
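By default `cor()` computes the Pearson coefficient, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below computes it from that definition on hypothetical values (not from `custdata`) and compares it with `cor()`.

```r
# Hypothetical age and income values (not from custdata)
age    <- c(25, 32, 47, 51, 62)
income <- c(30000, 42000, 55000, 48000, 39000)

# Pearson correlation from its definition: covariance scaled by both standard deviations
r_manual <- sum((age - mean(age)) * (income - mean(income))) /
  sqrt(sum((age - mean(age))^2) * sum((income - mean(income))^2))
r_builtin <- cor(age, income)  # method = "pearson" is the default

print(c(manual = r_manual, builtin = r_builtin))
```

The coefficient always lies between -1 and 1; a value near 0, such as the 0.027 above, indicates essentially no linear relationship between age and income.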

# Drop implausible ages and non-positive incomes, then recompute the correlation
custdata2 = subset(custdata, age > 0 & age < 100 & income > 0)
cor(custdata2$age, custdata2$income)

## [1] -0.02240845
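A few invalid records can change a correlation noticeably, which is why the filter above matters. The sketch uses a hypothetical data frame containing an impossible age and a negative income, applies the same conditions, and compares the correlation before and after.

```r
# Hypothetical records, including impossible ages (0, 120) and a negative income
cust <- data.frame(
  age    = c(0, 23, 35, 44, 58, 120),
  income = c(50000, 28000, 41000, 52000, 61000, -5000)
)

# Keep only plausible ages and positive incomes, as in the notebook's filter
cust_clean <- subset(cust, age > 0 & age < 100 & income > 0)

print(nrow(cust_clean))                        # 4 rows survive
print(cor(cust$age, cust$income))              # correlation on all rows
print(cor(cust_clean$age, cust_clean$income))  # correlation after filtering
```

In this made-up example the two extreme rows flip the sign of the correlation, so cleaning before computing summary statistics is worth checking on real data too.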

Business Data Analysis

We use the Yelp academic business dataset (yelp_academic_dataset_business.json.csv) for data visualization.


library(ggplot2)
business_data = read.csv(file = "E:/Training/Coursera/Social Media Data Analyitcs/Week 3/R Data Analytics/yelp_academic_dataset_business.json.csv")
ggplot(business_data) + geom_bar(aes(x=state), fill="gray")
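The bar heights in the chart above are the number of businesses per state. When there are many states, tallying and sorting the counts can make the chart easier to interpret; this sketch uses hypothetical state codes standing in for `business_data$state`.

```r
# Hypothetical state codes standing in for business_data$state
states <- c("AZ", "NV", "AZ", "NC", "AZ", "NV", "PA", "AZ")

# Count businesses per state and order from most to least frequent
state_counts <- sort(table(states), decreasing = TRUE)
print(state_counts)

# Share of businesses in the most common state
print(state_counts[1] / sum(state_counts))
```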
Now we plot a pie chart that illustrates the distribution of star ratings.
ggplot(data=business_data, aes(x=factor(1), fill=factor(stars))) + geom_bar(width=1) + coord_polar(theta = "y")
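The pie chart is a single stacked bar (`x = factor(1)`, `width = 1`) that `coord_polar(theta = "y")` wraps around a circle, so each slice's angle is proportional to the share of businesses at that star level. The sketch computes those shares and angles on hypothetical ratings standing in for `business_data$stars`.

```r
# Hypothetical star ratings standing in for business_data$stars
stars <- c(3.5, 4, 4, 4.5, 5, 3.5, 4, 2, 4.5, 4)

# Share of businesses at each star level (the slice fractions)
shares <- prop.table(table(stars))

# Slice angles in degrees: the stacked bar's full height maps to the full circle
angles <- shares * 360

print(round(shares, 2))
print(round(angles, 1))
```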
Perform Visualization on Yelp’s User Profile Dataset
user_data = read.csv(file = "E:/Training/Coursera/Social Media Data Analyitcs/Week 3/R Data Analytics/yelp_academic_dataset_user.json.csv")

Let’s get some insights from the data.


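# Keep the three vote columns together for later analysis
user_votes = user_data[,c("cool_votes","funny_votes","useful_votes")]
# Correlation between the funny votes a user received and the user's number of fans
cor(user_data$funny_votes, user_data$fans)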

## [1] 0.7312495
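The notebook selects the three vote columns into `user_votes` but does not use them afterwards. Passing a data frame of numeric columns to `cor()` returns the full matrix of pairwise correlations in one call; the vote counts below are hypothetical.

```r
# Hypothetical vote counts standing in for user_data's vote columns
user_votes <- data.frame(
  cool_votes   = c(10, 3, 45, 0, 22),
  funny_votes  = c(8, 1, 30, 2, 15),
  useful_votes = c(20, 5, 60, 1, 35)
)

# cor() on a numeric data frame returns a symmetric matrix of pairwise correlations
vote_cor <- cor(user_votes)
print(round(vote_cor, 3))
```

Each diagonal entry is 1 (a column correlated with itself), and entry `[i, j]` equals entry `[j, i]`.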

Let’s do some regression analysis on our dataset.


# Linear model: useful_votes is the dependent variable; review_count and fans are the independent variables
my.lm = lm(useful_votes ~ review_count + fans, data = user_data)
coeffs = coefficients(my.lm)
coeffs

##  (Intercept) review_count         fans
##   -18.259629     1.419287    22.686274
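The fitted model predicts `useful_votes = -18.259629 + 1.419287 * review_count + 22.686274 * fans`. As a worked example, this sketch plugs the reported coefficients into that equation for a hypothetical user with 10 reviews and 2 fans.

```r
# Coefficients reported by the fitted model above
coeffs <- c("(Intercept)" = -18.259629, review_count = 1.419287, fans = 22.686274)

# Hypothetical user: 10 reviews and 2 fans (the 1 multiplies the intercept)
new_user <- c("(Intercept)" = 1, review_count = 10, fans = 2)

# Predicted useful_votes = intercept + b1 * review_count + b2 * fans
predicted <- sum(coeffs * new_user)
print(predicted)  # about 41.31
```

With the fitted model object itself, `predict(my.lm, newdata = data.frame(review_count = 10, fans = 2))` performs the same calculation. Read the coefficients as: holding review_count fixed, each additional fan is associated with about 22.7 more useful votes.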

Plot Number of Reviews


ggplot(user_data) + geom_bar(aes(x=review_count), fill ="gray")
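Apply k-means clustering, an unsupervised machine learning algorithm. K-means assigns each data point to the nearest of k cluster centers and moves the centers to minimize the within-cluster sum of squares, which helps reveal how the data are distributed and whether a specific structure exists in the data.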
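# Cluster users into 3 groups using columns 3 and 11 of user_data
userCluster = kmeans(user_data[,c(3,11)], 3)
# factor() makes ggplot2 use discrete colors for the cluster labels
ggplot(user_data, aes(review_count, fans, color=factor(userCluster$cluster))) + geom_point()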
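A minimal k-means sketch on synthetic data, assuming two features analogous to `review_count` and `fans`. Two choices here go beyond the notebook and are assumptions: `set.seed()` plus `nstart = 25` make the result reproducible and less sensitive to the random starting centers, and `scale()` standardizes the columns so that the feature with the larger range does not dominate the Euclidean distance.

```r
set.seed(42)  # k-means starts from random centers; fixing the seed makes clusters reproducible

# Synthetic stand-in for user_data's review_count and fans: three well-separated groups
users <- data.frame(
  review_count = c(rnorm(20, 10, 2), rnorm(20, 100, 10), rnorm(20, 400, 20)),
  fans         = c(rnorm(20, 1, 0.5), rnorm(20, 20, 3), rnorm(20, 80, 5))
)

# Standardize both columns (mean 0, sd 1)
scaled <- scale(users)

# Run k-means from 25 random starts and keep the solution with the lowest within-cluster SS
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Number of users assigned to each cluster
print(table(fit$cluster))
```

`fit$cluster` holds one label per row, so it can be added to the data frame and used as a discrete color (via `factor()`) in a scatter plot.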