
# Running Linear Regression using mysql and R

This is a basic introduction to regression using mysql.

We will verify the results using R, which is an excellent tool for data analysis. We source our data from Yahoo Finance.

## Retrieve data from Yahoo Finance

Export from Yahoo Finance into GE.csv and GSPC.csv.

## Create tables in mysql

In mysql, create the tables for the two files.
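The table definitions themselves are not reproduced in the text. A minimal sketch of what they might look like, assuming the export has the usual Yahoo Finance columns (date, open, high, low, close, volume, adjusted close). The table names `ge` and `gspc` and all column names are hypothetical, so check them, and their order, against the header row of your own files:

```sql
-- Hypothetical layout: one column per field of the CSV export.
-- Names and column order are assumptions; match them to your file's header row.
CREATE TABLE ge (
    dt          DATE NOT NULL PRIMARY KEY,  -- trading date
    open_price  DECIMAL(12,4),
    high_price  DECIMAL(12,4),
    low_price   DECIMAL(12,4),
    close_price DECIMAL(12,4),
    volume      BIGINT,
    adj_close   DECIMAL(12,4)               -- adjusted close
);

-- Same structure for the index data.
CREATE TABLE gspc LIKE ge;
```

Using the date as the primary key keeps one row per trading day, which helps later when the two series are matched on identical dates.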

## Load the data into the newly created tables

I am loading as root. Also, note the path form C:/ (forward slash, not the backward slash '\').
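A sketch of the load step, assuming hypothetical tables `ge` and `gspc` that have one column per CSV field in the same order as the file, and a hypothetical `C:/data/` directory that the server is allowed to read (the `secure_file_priv` setting may restrict this):

```sql
-- Paths are hypothetical; note the forward slashes even on Windows.
LOAD DATA INFILE 'C:/data/GE.csv'
INTO TABLE ge
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'   -- use '\r\n' if the file has Windows line endings
IGNORE 1 LINES;            -- skip the header row

LOAD DATA INFILE 'C:/data/GSPC.csv'
INTO TABLE gspc
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```

A quick `SELECT COUNT(*)` on each table afterwards is an easy way to confirm that the row counts match the files.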

## Data Quality

Quality of data is paramount. I am going to create a new table with data for GSPC and GE, ensuring identical dates.
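One way to build such a table is an inner join on the date, which keeps only the days present in both series. This is a sketch under assumptions: the source tables `gspc` and `ge`, the `dt` column and the choice of the adjusted close as the price are all hypothetical. The table name `ols` and its columns `x` (GSPC) and `y` (GE) are the ones used by the queries in this post.

```sql
-- Keep only dates that exist in both tables.
CREATE TABLE ols AS
SELECT s.dt,
       s.adj_close AS x,   -- GSPC, independent variable
       g.adj_close AS y    -- GE, dependent variable
FROM gspc AS s
INNER JOIN ge AS g ON g.dt = s.dt
WHERE s.adj_close IS NOT NULL
  AND g.adj_close IS NOT NULL;
```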

I will treat GSPC as the independent variable and GE as the dependent variable, as if GSPC had a reliable predictive capability. I need these calculations from the wiki.
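The formulas themselves do not appear in the text. For reference, the standard least-squares estimators for a single-variable regression $y = \alpha + \beta x$ can be written with sample means, which is the form that maps directly onto `avg()` in SQL. The symbols $\hat{\alpha}$, $\hat{\beta}$ and the overbar notation are notation chosen here, not taken from the original:

$$
\hat{\beta} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^{2}} - \bar{x}^{2}},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
$$

Here $\bar{x}$ and $\bar{y}$ are the means of x and y, $\overline{xy}$ is the mean of the products, and $\overline{x^{2}}$ is the mean of the squares of x.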

I will do these in SQL using mysql math functions.

Query for the intercept (result: -2.?6):

`select avg(y) - (avg(x*y) - avg(x)*avg(y)) / (avg(x*x) - avg(x)*avg(x)) * avg(x) from ols`

Query for the slope (result: 0.02):

`select (avg(x*y) - avg(x)*avg(y)) / (avg(x*x) - avg(x)*avg(x)) from ols`

In continuation of this thread:

Slope is

\$ntercept is

The dataset and these SQL queries are in the attached xls.

Is this mathematically fine? But does GSPC really cause a move in GE like that? Could this be random? How likely is it that this is random? We need R-squared, which gives us the goodness of fit (is this formula trustworthy? does X explain Y?). Then comes the P-value; I will cover that in Part III.

Recall that ols has the time series of GSPC and GE. We can extract the time series into a csv file as follows:
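The export command itself is not shown in the text. One possible way to do it from within mysql is `SELECT ... INTO OUTFILE`, sketched below. It is an assumption that this matches the original method. The output is tab-separated with a header row because the R call in this post reads the file with `header=TRUE` and `sep="\t"`, and the path is the one from that call, written with forward slashes.

```sql
-- Header row first, then the data; tab-separated to match the R import.
-- The server must be allowed to write to this directory (see secure_file_priv),
-- and the target file must not already exist.
SELECT 'x', 'y'
UNION ALL
SELECT x, y FROM ols
INTO OUTFILE 'C:/Users/mari/Downloads/xy.csv'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
```

Row order in a `UNION ALL` is not formally guaranteed, so it is worth opening the file once to confirm that the header line really comes first.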

I have xy.csv. Let us install R and load this file into R as follows:

`data <- read.table("C:\\Users\\mari\\Downloads\\xy.csv", header=TRUE, sep="\t")`

If you examine data, it will have two columns:

`x <- data[,1]`

`y <- data[,2]`

data has two vectors, accessed as data[,1] and data[,2]:

`y <- data[,1]`

`x <- data[,2]`

## We run a linear regression

`m <- lm(y ~ x)`

`summary(m)`

The intercept is -2.?5? and the slope is 2.06?e-02, precisely the intercept and slope we had come up with using mysql.

Since the p-value is 2e-16, much less than 0.05 or 0.01, we reject the null hypothesis. This being a single-variable linear regression, the null hypothesis is that the slope is zero and there is no relationship between GSPC and GE. We reject the claim that there is no relationship between GSPC and GE.

The p-value is a statistical measure; its significance is of statistical origin. Statisticians quibble about "Evidence of Absence vs Absence of Evidence" (EA != AE). Correlation and causation are two different things.

Some of you enquired about SE, the t-test, etc.; http://stattrek.com/regression/slope-test.aspx is your source.

I have screenshots that show typos and miscellaneous problems. Getting multiple technologies to work together requires determination to complete and never quit. We have to persist, and to highlight that I have shown the snafus I experienced. Questions and feedback are always welcome.