Statistics Term Project

Term Project
Erik Kantrowitz
STA 3032
11/24/2014
PART A
1.
General Electric vs. Siemens
A.
General Electric Closing Price (September 2011 to September 2014)
Mean                 22.50102
Standard Error       0.26476
Standard Deviation   3.31743
Variance             11.00534
Q1                   19.84
Median               23.12
Q3                   25.51
IQR                  5.67
Range                13.13
Siemens Closing Price (September 2011 to September 2014)
Mean                 109.7586
Standard Error       1.265151
Standard Deviation   15.8523
Variance             251.2953
Q1                   98.52
Median               106.16
Q3                   125.715
IQR                  27.195
Range                57.52
B.
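Analysis of the data shows that Siemens had greater variability than General Electric over the
period of September 2011 through September 2014: its standard deviation (15.8523 versus 3.31743),
IQR (27.195 versus 5.67), and range (57.52 versus 13.13) are all larger.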
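The comparison can be checked directly from the summary statistics reported in part A. The sketch below is only an illustration: the dictionary layout and function name are made up for this example, and the coefficient of variation (standard deviation divided by mean) is an extra, scale-free measure that the original analysis does not use.

```python
# Summary statistics copied from the two tables in part A.
ge = {"mean": 22.50102, "sd": 3.31743, "iqr": 5.67, "range": 13.13}
siemens = {"mean": 109.7586, "sd": 15.8523, "iqr": 27.195, "range": 57.52}


def coefficient_of_variation(stats):
    """Standard deviation expressed as a fraction of the mean (unitless)."""
    return stats["sd"] / stats["mean"]


# Absolute spread: compare each reported measure side by side.
for measure in ("sd", "iqr", "range"):
    print(measure, ge[measure], siemens[measure], siemens[measure] > ge[measure])

# Relative spread: not part of the original report, shown for context only.
cv_ge = coefficient_of_variation(ge)
cv_siemens = coefficient_of_variation(siemens)
print(f"CV: GE {cv_ge:.3f}, Siemens {cv_siemens:.3f}")
```

In absolute terms every spread measure is larger for Siemens, which matches the conclusion above. Relative to price level the two stocks come out close (about 0.147 and 0.144), because Siemens trades at a much higher price.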
C.
General Electric
20% Trimmed mean of volume of shares purchased: 194248489.1
Siemens
20% Trimmed mean of volume of shares purchased: 2287490.583
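A trimmed mean can be sketched as follows. This assumes that "20% trimmed" means dropping the lowest 20% and the highest 20% of the observations before averaging; some software instead splits the 20% across both tails. The function name and the sample volumes are hypothetical and are not the project's data.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of observations cut from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical weekly volumes with one extreme week (not the project's data).
volumes = [120, 135, 150, 160, 170, 180, 190, 200, 210, 5000]

plain_mean = sum(volumes) / len(volumes)
robust_mean = trimmed_mean(volumes, 0.20)
print(plain_mean, robust_mean)
```

The single extreme week pulls the plain mean far above the typical value, while the trimmed mean ignores it. That is the usual reason for using a trimmed mean on trading volume.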
2.
United States Census
A.
[Figure: Probability Plot of 50STATESCENSUS, Normal. X axis: 50STATESCENSUS; Y axis: Percent.
Mean 6053834, StDev 6823984, N 51, AD 3.958, P-Value <0.005]
B.
The census data do not follow a normal distribution. This is evident from how far the plotted
points deviate from the fitted line in the probability plot. Further analysis suggests that the
distribution resembles a lognormal distribution rather than a normal one.
[Figure: Probability Plot for 50STATESCENSUS, four panels, each with 95% CI: Normal, Exponential,
Lognormal, Weibull. X axis: 50STATESCENSUS; Y axis: Percent.]
Goodness of Fit Test
Normal        AD = 3.958   P-Value < 0.005
Exponential   AD = 0.480   P-Value = 0.511
Lognormal     AD = 0.366   P-Value = 0.422
Weibull       AD = 0.479   P-Value = 0.233
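The AD values above are Anderson-Darling statistics, where a smaller value indicates a closer fit. The sketch below computes the basic A-squared statistic against a normal distribution with the mean and standard deviation estimated from the sample, and applies it to a skewed sample before and after a log transform (a lognormal fit is a normal fit to the logs). The sample is synthetic, not the census data. The function name is made up, no p-value is computed, and the result is not guaranteed to match any particular package's reported figure.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """A-squared statistic against a normal fitted by sample mean and SD."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    eps = 1e-12  # keep the logarithms finite in the extreme tails
    cdf = [min(max(fitted.cdf(x), eps), 1 - eps) for x in xs]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n


# Synthetic right-skewed sample of 51 values (NOT the census data):
# exponentials of evenly spaced standard normal quantiles.
standard = NormalDist()
skewed = [math.exp(standard.inv_cdf((i + 0.5) / 51)) for i in range(51)]

ad_raw = anderson_darling_normal(skewed)
ad_log = anderson_darling_normal([math.log(x) for x in skewed])
print(ad_raw, ad_log)
```

For data of this shape, the statistic on the raw values is much larger than on the logged values. The table above shows the same pattern for the census data (3.958 for Normal versus 0.366 for Lognormal).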
C.
[Figure: Pie chart, Population of states]
Florida - 13.9%
Illinois - 9.5%
New York - 14.3%
California - 27.5%
Indiana - 4.8%
Georgia - 7.1%
Texas - 18.5%
Missouri - 4.4%
3.
Golf Association Distances
A.
Stem-and-Leaf: Distances
Leaf Unit = 1.0
Stem   Leaf
22     6
23     2334677
24     001124445578
25     01111223344445555556677899
26     00001112333444456677888
27     000011222223333344466788999
28     0035
The stem-and-leaf diagram shows that most distances fall between 250 and 280 yards, with the
median around 260.
B.
Mean                 260.302
Standard Deviation   13.40828
Median               260.85
90th percentile      277.68
C.
[Figure: Histogram of Distances with fitted normal curve. X axis: Distances; Y axis: Frequency.
Mean 260.3, StDev 13.41, N 100]
The histogram above shows a left-skewed distribution, with a peak just after 250 and a drop-off
just before 280.
D.
The histogram's peak lines up with what was shown in the stem-and-leaf diagram. If the
stem-and-leaf diagram were tilted onto its side, it would look the same as the histogram.
E.
[Figure: Boxplot of Distances. Y axis: Distances, 220 to 290.]
The boxplot shows the interquartile range running from about 250 to just over 270, very similar
to what was seen in the histogram and the stem-and-leaf diagram. It also shows the median around
260.
F.
The boxplot is a much better interpretive tool than the stem-and-leaf diagram when the median and
quartiles are what is needed. The stem-and-leaf diagram is better when access to the individual
values is needed. For instance, if I were looking for the mode of the data, the boxplot would not
be much help, but with the stem-and-leaf diagram it is easily determined to be 255 yards.
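Reading the mode off the stem-and-leaf diagram amounts to tallying the leaves. The sketch below rebuilds the 100 values from the diagram in part A, with each wrapped row written as the two pieces that appear in the printout. Because the leaf unit is 1.0, the values are whole yards, so the median found here is only close to the reported 260.85. The variable names are chosen for this example.

```python
from collections import Counter
from statistics import median

# Stems and leaves copied from the diagram in part A (leaf unit = 1.0).
stem_and_leaf = {
    22: "6",
    23: "2334677",
    24: "001124445578",
    25: "011112233444455555566" + "77899",
    26: "000011123334444566778" + "88",
    27: "000011222223333344466" + "788999",
    28: "0035",
}

# Rebuild each observation as stem * 10 + leaf.
distances = [
    stem * 10 + int(leaf)
    for stem, leaves in stem_and_leaf.items()
    for leaf in leaves
]

mode_value, mode_count = Counter(distances).most_common(1)[0]
print(len(distances), mode_value, mode_count, median(distances))
```

The tally gives 100 observations, with 255 occurring most often (six times).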
G.
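Taking an interval of about two standard deviations on either side of the mean to correspond to
95 percent, the 95 percent confidence interval for the mean distance, based on the standard error
of the mean, is (257.64, 262.96).
Alternative: using two standard deviations of the individual distances instead gives
(233.48, 287.12).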
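Both intervals can be reproduced from the summary statistics in part B. This is a sketch under assumptions: the first interval is taken to be a t-interval for the mean, with the critical value 1.984 (two-sided 95%, 99 degrees of freedom) hard-coded rather than computed, and the second is taken to be the mean plus or minus two sample standard deviations.

```python
import math

# Summary statistics from part B.
mean_distance = 260.302
sd_distance = 13.40828
n = 100

# Assumed critical value: two-sided 95% Student t with n - 1 = 99 d.f.
T_CRIT_95_DF99 = 1.984

# Interval for the MEAN: uses the standard error s / sqrt(n).
standard_error = sd_distance / math.sqrt(n)
ci_mean = (
    mean_distance - T_CRIT_95_DF99 * standard_error,
    mean_distance + T_CRIT_95_DF99 * standard_error,
)

# Interval for INDIVIDUAL distances: mean plus or minus two standard deviations.
two_sd = (mean_distance - 2 * sd_distance, mean_distance + 2 * sd_distance)

print(tuple(round(v, 2) for v in ci_mean))
print(tuple(round(v, 2) for v in two_sd))
```

The two intervals answer different questions. The narrow one locates the average distance, while the wide one describes where roughly 95 percent of individual distances would fall if the data were approximately normal.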
4.
[Figure: Histogram of WEIGHT with fitted normal curve. X axis: WEIGHT; Y axis: Frequency.
Mean 3238, StDev 566.8, N 98]
B.
DRIVSTAR   Count   % of Row   % of Column
2          4       100        4.08
3          17      100        17.35
4          59      100        60.20
5          18      100        18.37
All        98      100        100.00
Percent of cars with 3 stars or less is 21.43%

PASSSTAR   Count   % of Row   % of Column
2          2       100        2.04
3          23      100        23.47
4          52      100        53.06
5          21      100        21.43
All        98      100        100.00
Percent of cars with 3 stars or less is 25.51%
The percentage of cars with three stars or fewer depends on which rating is considered (21.43%
for DRIVSTAR, 25.51% for PASSSTAR), but both fall close to one quarter of the cars.
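The two percentages follow directly from the counts in the tables. A minimal sketch, with a helper name chosen for this example:

```python
# Counts per star rating, copied from the DRIVSTAR and PASSSTAR tables.
drivstar = {2: 4, 3: 17, 4: 59, 5: 18}
passstar = {2: 2, 3: 23, 4: 52, 5: 21}


def percent_at_or_below(counts, max_stars):
    """Percentage of cars whose rating is max_stars or lower."""
    total = sum(counts.values())
    low = sum(count for stars, count in counts.items() if stars <= max_stars)
    return 100 * low / total


driver_pct = percent_at_or_below(drivstar, 3)
passenger_pct = percent_at_or_below(passstar, 3)
print(round(driver_pct, 2), round(passenger_pct, 2))
```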
C.
The PASSCHEST mean is 50.224 while the DRIVCHEST mean is 49.663, so the mean DRIVCHEST does not
exceed the mean PASSCHEST. The difference between the two means is 0.561, so PASSCHEST has the
higher injury rate on average.
D.
Variable    N    Mean     StDev   SE Mean   99% CI
DRIVCHEST   98   49.663   6.670   0.674     (47.893, 51.434)
PASSCHEST   98   50.224   7.107   0.718     (48.338, 52.111)
Part B
Big data is defined as data that is too large to be processed with current conventional
processing power. Thomas Davenport, the author of the book Big Data at Work, describes big data
as too big to be processed on one server, too fast-moving to be sequestered in a data warehouse,
or too unstructured to fit into a conventional database. [1] This definition will change as
technologies are developed and new data emerges. Big data is not limited to one field of study,
or even to a small cluster of fields; as Davenport describes, across almost every field of human
endeavor, research suggests that data and analysis yield more accurate and reliable decisions.
[1] Big data brings with it seemingly endless possibilities, from the Internet of Things, which
can lead to smarter homes, cars, and factories, to improvements in how companies do business.
Big data is changing the way data is analyzed and how it affects everyday life.
One way big data is changing the world we live in is through the Internet of Things, a term used
to describe the ever-increasing connectivity of the objects in our lives: a TV that we can
control with a tablet, a smartphone that remotely locks and unlocks the doors, or an app that
dims the lights from halfway across the world. These things collect data and use it to give the
user a more personalized experience, such as an alarm clock that slowly raises the intensity of
the room lights to create a more ideal wake-up for the individual. Currently, though, most
Internet of Things smart devices are not in the home or on the phone; they are in factories,
businesses, and healthcare. [2]
One big (excuse the pun) example of how the Internet of Things is affecting everyday life is the
smart factory. The idea of a smart factory is to have a fully functional factory using
intelligent machines connected by an Industrial Internet. [3] GE has partnered with Amazon and is
already working on the development of a Hadoop-based software platform that would allow for big
data analytics within an interconnected smart factory system. That is not the only way big data
is making factories smarter, though. Analysis of all the data gathered on the factory floor
allows errors to be identified and corrected, optimizing the factory's production and saving
money in the long run. [3]
Optimization is hugely important for factories, and the amount of data that these factories can
create is enormous. For instance, Raytheon Corp. monitors its assembly operations down to the
turn of a screw, and the factory will shut down when it detects a problem in order to reduce
manufacturing defects. "[The] industry stands to reap many benefits from Big Data as more
sophisticated and automated data analytics technologies are developed." [4] Through this
optimization, big data can also decrease workplace accidents by reducing the number of people
needed to run a plant. For example, FANUC Robotics, a Japanese producer of industrial robots, has
automated its newest tool plant to the point where it can run mostly unattended for 700 hours.
[4] That is not to say that a factory can operate completely on its own, but the people running
the plant are no longer in as dangerous a position: they can sit at a command module rather than
right next to the dangerous machinery.
As big data expands and more tools are developed to gather it, more data becomes available for
use. One example is the new Ford Focus Electric. The car gathers data while being driven and
gives the driver all sorts of useful information derived from it, such as the distance to the
nearest charging station. The data does not end there: even when the car is not being driven,
the sensors continue to stream data about the tire pressure and battery system. [4] Nor does the
analysis end there, because Ford gets information from the car as well. Engineers in Detroit use
the car's communication modules and remote application management software to aggregate
information about driving behaviors. Using this data, Ford can better understand how its cars
are used, so that when the company develops the next model it has all this data at its
fingertips. [4]
For all the great things about big data, there are also plenty of issues with it. One is that,
because the field is still so new and the data sets are so big, analysis technology cannot keep
up. Part of the earlier definition of big data is that it consists of data sets too big for
current processing technology. Big data also runs into the challenge of bogus correlations
within the data. Bogus correlations can be thought of as coincidences: they arise randomly, but
in such large data sets they can appear to be significant. "If you look 100 times for
correlations between two variables, you risk finding, purely by chance, about five bogus
correlations that appear statistically significant." [5]
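The "about five in 100" figure is what would be expected if each test uses a 5 percent significance level and none of the tested relationships is real. The quoted passage does not state the significance level, so the values below are assumptions, as is treating the 100 tests as independent.

```python
# Assumptions (not stated in the quoted source): each test uses a 0.05
# significance level, no real relationship exists, and tests are independent.
alpha = 0.05
n_tests = 100

# Expected number of tests that look "significant" purely by chance.
expected_false_positives = n_tests * alpha

# Probability that at least one of the tests looks significant by chance.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(expected_false_positives)
print(round(prob_at_least_one, 4))
```

Under these assumptions, a chance finding somewhere among 100 tests is almost certain, which is why large-scale correlation searches need some correction for multiple comparisons.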
Despite these issues, big data is constantly growing and maturing. With this continuing maturity,
our ability to use data to better the world we live in grows as well. One example would be
tracking and predicting inventory so that groceries are shipped to your house before you even
realize you need more. It is not hard to see a world so interconnected that our objects start to
work together to make life easier, and as factories become more interconnected, their ability to
manufacture more proficiently will spark a drive for more innovation. Big data and smart
technology, even as new as they are, are changing the world around us. As they continue to
develop, they bring limitless possibilities to improve every facet of people's lives, turning
the future that science fiction movies have been portraying for decades into a reality. With the
way big data is advancing, it is not hard to imagine a world like the Jetsons' in the near
future.
[1] Smith, N., "What's the big (data) idea? [Book interview]," Engineering & Technology,
vol. 9, no. 4, pp. 92-93, May 2014,
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6823999&isnumber=6809530
doi: 10.1049/et.2014.0434
[2] Data Floq. How the internet of things will make a smart world [online], 2014,
https://datafloq.com/read/internet-of-things-will-make-our-world-smart-infographic/302
(Accessed: 20 November 2014).
[3] Data Floq. The industrial internet will bring a revolution to the manufacturing industry
[online], https://datafloq.com/read/industrial-internet-bring-revolutionmanufacturing-industry/141
(Accessed: 20 November 2014).
[4] Noor, Ahmed. Putting big data to work, Mechanical Engineering magazine [online], 2014,
https://www.asme.org/wwwasmeorg/media/ResourceFiles/Network/Media/Mechanical%20Engineering%20Magazine/1013BigData.pdf
(Accessed: 20 November 2014).
[5] Marcus G, Davis E. BRW, Nine large problems with big data [online], 10 April 2014,
http://www.brw.com.au/p/techgadgets/nine_large_problems_with_big_data_BOkbvT5G7f6Y2Jc2qiMgGM
(Accessed: 20 November 2014).
