
Linear Regression Population vs. CO2 Emissions in 176 Countries in 2001

Krista Bangs
Crow 2nd
11/18/15
D/C Statistics

The United States is the second largest emitter of carbon dioxide in the world and is quickly catching up to China, the world's largest emitter of CO2. These two countries have the 3rd largest and largest populations in the world, respectively. Since both the U.S. and China have large populations and high CO2 emissions, I wanted to see if population directly influences a country's CO2 emissions. As an environmentally concerned citizen and a student taking AP Environmental Science, I decided to study the effect of population on CO2 emissions. I expected that population would have a strong linear relationship with CO2 emissions because respiration, transportation, and energy use all increase with larger populations, and they all emit CO2 as a byproduct. However, I decided to study population density vs. CO2 emissions instead, because I thought it would be more interesting and population density is an indicator of urbanization, which may lead to greater CO2 emissions. I anticipated a moderately strong positive linear relationship between population density and CO2 emissions. I found population density and CO2 emissions data for 176 countries from 2001, which I used for my investigation.
To test the relationship between population density and CO2 emissions, I made a scatterplot with population density as the explanatory variable and CO2 emissions as the response variable. The scatterplot showed a very weak positive linear relationship with a slope of .00028. This was much lower than I had expected. I used the five-number summary to detect any outliers. When I removed these outliers (some of which were also influential points), the slope changed to .00094, still much weaker than I had expected. The influential points that affected the slope of the line were the countries that had low population densities and high CO2 emissions. The outliers that had high population density but low CO2 emissions were not influential points because they did not affect the slope of the line of best fit. To find influential points, I looked at the original graph and chose the points farthest from the line of best fit, because those would have the most effect on the slope. When the influential points and outliers were removed, the slope, the R squared value, and the R value all increased, but the y-intercept decreased.
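The outlier check described above can be sketched in R. This is only a minimal sketch, assuming the common 1.5 x IQR rule (the report does not say which rule was applied to the five-number summary). The vector `density` holds made-up values, not the study's data, and `flag_outliers` is a hypothetical helper name.

```r
# Made-up population densities (people per square mile); NOT the study data.
density <- c(12, 35, 60, 88, 140, 210, 330, 490, 16000)

# Flag values more than 1.5 * IQR beyond the quartiles (assumed rule).
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - 1.5 * spread | x > q[2] + 1.5 * spread
}

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
fivenum(density)

# Values that the assumed rule would treat as outliers.
density[flag_outliers(density)]
```

With these made-up numbers, only the one extremely dense value is flagged. A fixed cutoff, such as the `<= 5000` filter used later in this report, is a simpler alternative.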
The regression equation for the first scatterplot (data with outliers) was Y = .00028X + 4.64962. This equation tells us that each time the population density of a country increases by 1 person per square mile, CO2 emissions go up by .00028 metric tons. The regression equation for the second scatterplot (data with no outliers) is Y = .00094X + 3.75997, which tells us that each time the population density of a country increases by 1 person per square mile, CO2 emissions rise by .00094 metric tons. The intercepts are not interpretable in the context of the problem because if population density were zero, you would not have any CO2 emissions other than the emissions from plants and the natural CO2 present in the atmosphere, which is not equal to 4.64962 metric tons (the y-intercept in the original data with outliers) or 3.75997 metric tons (the y-intercept in the scatterplot with no outliers).
As the outliers were taken out, the R squared value increased from 0.00744 (original data) to 0.01368 (data with no outliers). The new R squared value shows slightly smaller residuals between the line of best fit and the data points, but even with no outliers the residuals are still large. The R squared values in both scatterplots (with and without outliers) show that variation in the explanatory variable accounts for only about 0.7% and 1.4% of the variation in the response variable, which is why almost every data point has a substantial residual. This data suggests that knowing population density will not help you predict CO2 emissions.
R, the measurement of how strongly correlated the variables are, improves as outliers are removed from the data. The R value goes from .086 to .117, showing an increase in the weak positive linear correlation. As outliers are removed, the positive linear correlation between population density and CO2 emissions increases. The original R value of .086 shows that there is a very weak relationship between population density and CO2 emissions and that the data is not very linearly related.
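The R values quoted above follow from the reported R squared values. In a regression with one explanatory variable, R is the square root of R squared and takes the sign of the slope. A short check, using only numbers reported in this paper:

```r
# R squared values reported by linFit in this report.
r_squared <- c(all_data = 0.00744, no_outliers = 0.01368)

# Both slopes are positive, so R is the positive square root.
r <- sqrt(r_squared)

# Rounded to three decimals for comparison with the values quoted in the text.
round(r, 3)
```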

Scatter Plot with All 176 Countries' Data Present
> plot(carbon$population.per.capita, carbon$CO2.emissions, main="Population Density Vs. CO2 emissions", xlab="Population Density", ylab="CO2 Emissions")
> abline(lm(carbon$CO2.emissions ~ carbon$population.per.capita))

> linFit(carbon$population.per.capita, carbon$CO2.emissions)
Intercept = 4.64962
Slope = 0.00028
R squared = 0.00744
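
`linFit` is not part of base R. The following is a sketch of how the same three numbers could be obtained with `lm`, assuming a data frame with the same column names as `carbon`. The `demo` data frame holds made-up values, not the study's data, so its output will not match the numbers above.

```r
# Made-up data with the same column names as `carbon`; NOT the study data.
demo <- data.frame(
  population.per.capita = c(10, 50, 120, 300, 800),
  CO2.emissions         = c(2.1, 7.4, 0.9, 5.6, 3.3)
)

# Least-squares fit of CO2 emissions on population density.
fit <- lm(CO2.emissions ~ population.per.capita, data = demo)

intercept <- unname(coef(fit)[1])
slope     <- unname(coef(fit)[2])
r_squared <- summary(fit)$r.squared

cat("Intercept =", intercept, "\nSlope =", slope, "\nR squared =", r_squared, "\n")
```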

Graphs with No Outliers
> No <- carbon[carbon$population.per.capita <= 5000, ]
> View(No)
> Noo <- No[No$CO2.emissions <= 20, ]
> View(Noo)
> plot(Noo$population.per.capita, Noo$CO2.emissions, main="Population Density Vs. CO2 emissions NO Outliers", xlab="Population Density", ylab="CO2 Emissions")
> abline(lm(Noo$CO2.emissions ~ Noo$population.per.capita))

> linFit(Noo$population.per.capita, Noo$CO2.emissions)
Intercept = 3.75997
Slope = 0.00094
R squared = 0.01368

Residual Plots
> plot(carbon$population.per.capita, carbon$CO2.emissions - (.00028*carbon$population.per.capita + 4.64962), main="Residual Plot Population Vs. CO2 emissions", xlab="Population Density", ylab="CO2 Emissions")
> abline(a=0, b=0)

> plot(Noo$population.per.capita, Noo$CO2.emissions - (.00094*Noo$population.per.capita + 3.75997), main="Residual Plot Population Vs. CO2 emissions No Outliers", xlab="Population Density", ylab="CO2 Emissions")
> abline(a=0, b=0)

The graphs above show that some data points have a very large residual value and some do not. Since the data does not have a strong linear relationship (in fact, it barely has one at all), almost all of my points have sizable residuals, another indicator that the linear regression line is not a good fit and the data is not linearly related.
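
A residual is the observed value minus the value predicted by the line. The sketch below, which uses made-up numbers rather than the study's data, shows that computing residuals by hand from the fitted coefficients gives the same result as R's `resid()`, which avoids retyping rounded coefficients.

```r
# Made-up data; NOT the study data.
x <- c(10, 50, 120, 300, 800)     # population density
y <- c(2.1, 7.4, 0.9, 5.6, 3.3)   # CO2 emissions

fit <- lm(y ~ x)

# Residual = observed value minus the value predicted by the fitted line.
by_hand <- y - (coef(fit)[1] + coef(fit)[2] * x)

# resid() returns the same residuals directly from the model object.
from_model <- resid(fit)

# Residual plot with a horizontal reference line at zero.
plot(x, from_model, xlab = "Population Density", ylab = "Residual")
abline(h = 0)
```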
When an explanatory data point is randomly chosen and plugged into the original equation (with outliers and influential points), the prediction underestimates the actual response data, leaving a positive residual. Switzerland has a population density of 490 people per square mile and CO2 emissions of 5.9 metric tons. When this density is put into the linear regression equation for the data including outliers, you get 4.78682 metric tons.
> .00028*490 + 4.64962
[1] 4.78682
The residual of Switzerland's data point in the scatterplot is about 1.11 metric tons. The prediction is not accurate and is quite a bit under the actual value of Switzerland's CO2 emissions.
> 5.9 - 4.78682
[1] 1.11318

Linear regression is not the best fit method for this data because the scatterplot shows such a weak linear correlation. When the outliers and influential points are taken out of the data, the relationship is still very weakly linear, so a sloped line isn't a good fit. The best fit line would be a horizontal line going through the mean of all y values.
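
To illustrate the last point: a least-squares line with no slope sits at the mean of the response values. A minimal sketch with made-up emissions values (not the study's data), fitting an intercept-only model:

```r
# Made-up CO2 emissions (metric tons); NOT the study data.
co2 <- c(2.1, 7.4, 0.9, 5.6, 3.3)

# An intercept-only model is a horizontal line.
flat <- lm(co2 ~ 1)

# The fitted height of that line equals the mean of the response values.
unname(coef(flat))
mean(co2)
```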
The data collected in this investigation would help environmental scientists. They could use this data to help create new environmental policy or, if total population (rather than population density) proves to have a strong positive linear relationship with CO2 emissions, to advocate for population control. Environmental scientists are very important because, with global warming and unsustainable amounts of greenhouse gases in our atmosphere, we need to make informed choices to maintain our planet and preserve it for future generations.
Overall, the data shows a very weak linear relationship between population density and CO2 emissions. Since the relationship is so weak, you can conclude that knowing population density won't help you predict CO2 emission levels. The data also cannot show that high population density causes high CO2 emissions, or the other way around, because linear relationships do not imply causation. There are many factors that could be responsible for the weak relationship. Since population density is an indicator of urbanization, people in big cities don't need to travel as much because everything is centrally located. Although cities may seem like big areas with lots of CO2 emissions, many people walk to get around, take public transportation, and don't drive cars, which are one of the biggest sources of CO2 emissions. Also, big businesses, frequently located in urban areas, are under pressure from environmental groups to use eco-friendly fuels and have limits on how much CO2 they can emit without paying a fine or breaking pollution ordinances. All these factors help explain why population density does not linearly correlate with CO2 emissions.

Works Cited

"World DataBank." The World Bank DataBank. The World Bank Group, n.d. Web. 17 Nov. 2015.

"Population Density per Square Mile of Countries." Infoplease. Infoplease, 2007. Web. 17 Nov. 2015.
