II.
Here in the US, the internet consumes my life at home through schoolwork, social media, and
news. I am on my computer or my phone for the majority of my day, and if the internet were
ever destroyed, my life would be in pieces. In countries like India, however, many people do
not have internet access. Broadband and internet provider service are lacking, and the
internet service and internet-capable devices that do exist are expensive and limited to the
wealthy. People like Mark Zuckerberg have therefore taken it upon themselves to popularize
the internet and make it cheaper in poorer countries in order to establish a truly worldwide
connection. I hypothesized that GDP per Capita and Internet Usage per 100 People would be
linearly related across countries because the higher a country's economic output per person,
the more its citizens should be able to consume, spend, and access luxuries like the internet.
On a scatterplot, I plotted 49 countries by GDP per Capita (the explanatory variable) and
Internet Usage per 100 People (the response variable), based on 2014 data from the World
Bank. I found that the two variables had a moderate, positive, linear correlation.
The least squares regression line that models my data is y hat = 33.30492 + (0.00105)x.
In this equation, y hat is the predicted number of Internet Users per 100 People, and x is
the GDP per Capita in US dollars. The slope of 0.00105 means that for each 1 dollar increase
in GDP per Capita, the number of Internet Users per 100 People is predicted to increase by
0.00105. The y-intercept has no meaningful interpretation because x = 0 means GDP per
Capita = GDP/Population = 0, so GDP = 0. An economy with zero GDP would produce no food or
shelter at all and the country could not survive, so every country has a positive GDP and
x = 0 lies outside the data. The r^2 value is 0.62125, which means that about 62.1% of the
variation in Internet Users per 100 People can be explained by the linear relationship with
GDP per Capita. The r value is 0.7881941, which indicates a moderate, positive, linear
relationship between the two variables.
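As a quick consistency check on the numbers above, the sketch below recomputes r from r^2 and restates the slope per 1,000 dollars, which is easier to read than a per-dollar change. It uses only the values reported in the text; the variable names are my own.

```r
# Values reported in the text for the full data set
intercept <- 33.30492
slope     <- 0.00105   # Internet Users per 100 People, per US dollar
r_squared <- 0.62125

# r carries the sign of the slope, so the positive root applies here
r <- sign(slope) * sqrt(r_squared)
round(r, 4)

# The same slope expressed per 1,000 US dollars of GDP per Capita
slope_per_1000 <- slope * 1000
slope_per_1000
```

Under this model, a 1,000 dollar increase in GDP per Capita corresponds to a predicted increase of about 1.05 Internet Users per 100 People.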
The residual plot of the complete data set is shown below. The plot suggests that there is an
outlier in the x direction because most of the points are clustered close to the y-axis.
To confirm the presence of an outlier, I made a boxplot for each variable. If a point is
positioned outside the range from the 25th Percentile - (1.5 x IQR) to the 75th Percentile
+ (1.5 x IQR), then it is an outlier. As illustrated by the boxplots below, there was an
outlier in the x direction, but not in the y direction. That point, Qatar, is an influential
point: it has an extreme x-value and is positioned away from the least squares regression
line on the scatterplot, so it pulls the line toward itself.
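The 1.5 x IQR rule described above can be sketched in R as follows. The helper name iqr_outliers and the toy vector are my own illustrations, and the toy values are made up rather than taken from the World Bank data.

```r
# Return the values of x that fall outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
iqr_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x[x < q[1] - fence | x > q[2] + fence]
}

# Made-up values for illustration only (not the project data)
toy <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)
iqr_outliers(toy)
```

With the project's data frame, the equivalent call would be iqr_outliers(lee$GDP.per.Capita). Base R's boxplot.stats(x)$out applies the same 1.5 multiplier but computes the quartiles as hinges, so the two can disagree slightly for points near the fences.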
If I remove the outlier, the r^2 value of the data increases to 0.68104 and the r value
increases to 0.8252515. The intercept decreases to 30.25639 Internet Users per 100 People,
while the slope increases to 0.00128 Internet Users per 100 People per dollar of GDP per
Capita.
Prediction: Arbitrarily selected data point: Hungary, with a GDP per Capita of
$13902.7 and 76.1 Internet Users per 100 People.
y hat = 33.30492 + (0.00105)x.
x = GDP per Capita = $13902.7
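The arithmetic for this prediction can be carried through in R as a minimal sketch, using only the regression coefficients and the Hungary values stated above. The variable names are my own.

```r
# Full-data regression line reported in the text
intercept <- 33.30492
slope     <- 0.00105

# Hungary's values as given in the text
gdp_hungary      <- 13902.7   # GDP per Capita, US dollars
observed_hungary <- 76.1      # Internet Users per 100 People

# Predicted value from the line, and residual = observed - predicted
predicted <- intercept + slope * gdp_hungary
residual  <- observed_hungary - predicted

round(predicted, 2)  # 47.9
round(residual, 2)   # 28.2
```

The residual is positive, so the linear model underpredicts Hungary's internet usage by about 28.2 users per 100 people.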
Business analysts would use this type of analysis in their jobs. They would be identifying
trends in sales and performance as well as analyzing risks and consumer behavior in order to
maximize profits for a business. These analysts are the reason businesses are able to make
educated decisions about when and where to expand, which products to stop manufacturing,
and which products to introduce.
III.
IV.
GDP per Capita and Internet Users per 100 People have a moderate, positive, linear
relationship: when GDP per Capita rises, the number of Internet Users per 100 People is
predicted to rise. Even so, the two variables are better represented by a logistic
regression than by a linear regression. From the logistic regression, about 78.2% of the
variance in Internet Users per 100 People can be explained by GDP per Capita, which is
very high for real-life data. We cannot, however, conclude that changes in GDP per
Capita cause changes in Internet Users per 100 People, because there is the possibility of a
lurking variable.
Works Cited:
"World Development Indicators." World Databank. The World Bank, 2014. Web. 14 Nov. 2015.
<http://databank.worldbank.org/data/reports.aspx?Code=SL.UEM.TOTL.ZS&id=af3ce82b&report_name=Popular_indicators&populartype=series&ispopular=y#advancedDownloadOptions>.
V.
R Code:
> `Lee,M_Project2DataVersion2` <- read.csv("~/Lee,M_Project2DataVersion2.csv")
> lee<-`Lee,M_Project2DataVersion2`
> plot(lee$GDP.per.Capita, lee$Internet.Users, main = "GDP per Capita vs Internet Users",
xlab = 'GDP per Capita (Current US$)', ylab = 'Internet Users per 100 People')
> abline(lm(lee$Internet.Users~lee$GDP.per.Capita))
> boxplot(lee$GDP.per.Capita, horizontal = TRUE, xlab = 'GDP per Capita (Current US$)',
main = 'GDP per Capita Boxplot')
> boxplot(lee$Internet.Users, horizontal = TRUE, xlab = 'Internet Users per 100 people',
main = 'Internet Users Boxplot')
> # linFit() is not part of base R; presumably a course-provided helper that prints the fit
> linFit(lee$GDP.per.Capita, lee$Internet.Users)
Intercept = 33.30492
Slope = 0.00105
R-squared = 0.62125
> plot(lee$GDP.per.Capita, lee$Internet.Users - ((.00105*lee$GDP.per.Capita)+33.30492),
main = "Residual Plot of GDP per Capita vs Internet Users",
xlab = 'GDP per Capita (Current US$)', ylab = 'Residual (Internet Users per 100 People)')
> abline(a=0, b=0)
> graph<- lm(lee$Internet.Users ~ lee$GDP.per.Capita)
> resid(graph)