Imperial College London. Coursework on Machine Learning and Neural Computation.

Attribution Non-Commercial (BY-NC)

Coursework issue date: 26/10/2013

Coursework submission: 18/11/2013 (online via CATE; New submission date with 3 extra days)

Coursework return & feedback: within 14 days after submission

Coursework has to be submitted individually and will be marked as such. You are welcome to discuss your work with other students and your GTAs during the lab sessions. All text-based files must contain a commented line identifying the Name and CID of the submitter. Submissions may be individually checked for plagiarism.

Download and open the coursework zip-file from

https://www.dropbox.com/s/f95x4km5kxbyewf/MLNCAssessedCoursework2013-2014.zip

This coursework focuses on supervised learning and has three questions: A, B and C.

Please read the whole coursework through carefully and make sure you understand it; only then start working on it. Your coursework submission should not be longer than 2,000 words (shorter submissions are quite welcome; code is of course excluded from the word count). All figures should conform to the visualising scientific data guidelines explained at the end of this coursework.

Space-time series of 1-bedroom/studio rents in London

You have been provided with a spatio-temporal dataset of over 57,000 rental prices (pcm) for single

bedroom/studio properties in London along with corresponding Lat/Long coordinates. Also provided

is a set of coordinates for all stations on the London Underground network.

This is a real dataset collected using automated scripts from current sources (our crawler has been running since October 2012). Please note that, as with every real-world dataset, there may be some corrupted or strange data points. For example, there may be a number of properties that are not located within Greater London.

The aim of this project is to use supervised learning techniques from the course to learn, from the supplied sample data, the mapping from geographical location and date to rental cost per calendar month. Thus, for example, for an arbitrary location in Greater London and a date in the recent past or present, we want to know the rental price predicted by your machine learning system. You can choose any of the methods discussed in the course, provided that you implement them yourself from first principles.

Since this dataset contains geolocation information, you are invited to use Google Maps

(http://maps.google.com) to search for the specific locations of rental prices (e.g. paste longitude and

latitude numbers into the map search window).

Figure 1: Coordinates of properties (blue dots) in the dataset and tube stations (red circles). Horizontal axis: Longitude [°]; vertical axis: Latitude [°]. This figure was created using the Matlab commands below (footnote 1).

Footnote 1: Matlab commands to generate the figure

plot(rental(:,4),rental(:,3),'.')

hold on

plot(tube.location(:,2),tube.location(:,1),'ro')

axis equal

ylabel('Latitude [^\circ]')

xlabel('Longitude [^\circ]')

In your coursework zip directory you will find the following files:

rental.mat: contains the location and price data that you will be using for the exercise, stored as a structure named rental, with four columns: 1. rent (rent per calendar month), 2. time stamp (see footnote 2), and finally the geographic location in degrees, 3. latitude and 4. longitude. In addition, it contains a struct named tube with two fields, station and location, covering all London Underground stations.

trainRegressor.m: an empty stub file in which you have to write your code

testRegressor.m: an empty stub file in which you have to write your code

sanityCheck.m: see explanation in Question B

In the following sub-questions you will be required to generate plots from data or your results. Note: Plots that do not have appropriate labels (i.e. the name of the dimension and the units in square brackets, e.g. Rent [£]) will not be counted. Figures should be exported so that lines and data points are clearly visible and any text in the figures is clearly readable. Use export figure to export the figure from Matlab and store it as PNG. Untidy or grainy figures will not be considered.

Footnote 2: This date information is given in Matlab datenum format; use datevec to convert it into normal calendar dates.
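As a small illustration of the conversion mentioned in the footnote, the sketch below builds a datenum from a known calendar date and converts it back. The date used is simply the coursework issue date, chosen for illustration; it is not a value taken from rental.mat.

```matlab
% Round trip between a calendar date and Matlab's serial date number.
% The date is the coursework issue date, used here only as an example.
t = datenum(2013, 10, 26);        % serial date number (days, as a double)
v = datevec(t);                   % [year month day hour minute second]
s = datestr(t, 'yyyy-mm-dd');     % formatted string for axis labels or text

disp(v);
disp(s);
```

datevec also accepts a column vector of datenum values and then returns one six-element row per entry, which may be convenient for grouping the data by month.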

A [20%]

Familiarise yourself with the data set. Plot the rental prices over time. We now want to know whether rental prices increased with time or remained constant. Fit a 0th-order and a 1st-order polynomial basis function set to this plot.

a) Give your ML estimates of all parameters involved, including the noise in the model.

b) Which model is more likely: the 0th-order (flat prices) or the 1st-order (linearly increasing prices)? Justify your answer with the data.

c) What may be a problem in comparing the 1st-order and 0th-order models using likelihoods?

B [45%]

Next, we are going to pool the data together across time (i.e. we just have price and position):

a) Visualise in a 3D plot (plot3) the rental price raw data for Greater London (rotate the view

appropriately to be as informative as possible).

b) Write the two functions trainRegressor and testRegressor mentioned above and train your system with the data set. Your task is to write two functions (the function files are provided for your convenience), so as to be able to answer the sub-questions described below:

1. trainRegressor.m: a Matlab function that accepts training inputs trainIn (a two-column matrix of Longitude and Latitude pairs, where the rows correspond to different training data points) and training outputs trainOut (a column vector of rent prices, where each row's rental price corresponds to the rental location in the same row of trainIn). The function is to return a single variable (potentially a structure or matrix) params.

>>params = trainRegressor ( trainIn , trainOut )

2. testRegressor.m: a Matlab function that accepts testing inputs testIn (a two-column matrix of Longitude and Latitude pairs, where the rows correspond to different test data points) and params (the parameter variable created by trainRegressor). The function should return a column vector of rental prices for the corresponding locations, called results.

>>results = testRegressor ( testIn , params )

To test your code, sanityCheck.m is a Matlab function that will check whether your two functions match the specifications we use. Use the function with the following command in Matlab:

>>sanityCheck ( @trainRegressor ,@testRegressor )

Note: If your functions do not pass the automatic tests performed by this function we

will be unable to automatically evaluate their performance and may deduct marks.
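The call conventions described above can be illustrated with the following sketch. It only demonstrates the required input and output shapes: the constant-mean "model", the field name meanRent and the toy numbers are placeholders invented for this illustration, not a suggested solution and not coursework data.

```matlab
% Interface illustration only. In a submission each function lives in its
% own file (trainRegressor.m, testRegressor.m); here they are local
% functions at the end of a script so the example is self-contained.
trainIn  = [-0.17 51.50; -0.10 51.52; 0.02 51.54];  % hypothetical [Longitude Latitude]
trainOut = [1500; 1300; 1000];                      % hypothetical rents
testIn   = [-0.12 51.51; 0.00 51.53];               % hypothetical query points

params  = trainRegressor(trainIn, trainOut);
results = testRegressor(testIn, params);
disp(results);

function params = trainRegressor(trainIn, trainOut)
% A comment line with the submitter's Name and CID belongs here.
% Placeholder: ignores trainIn and stores only the mean rent.
    params.meanRent = mean(trainOut);               % hypothetical field name
end

function results = testRegressor(testIn, params)
% Returns one prediction per row of testIn, as a column vector.
    results = params.meanRent * ones(size(testIn, 1), 1);
end
```

The point of the sketch is that params is a single variable carrying everything testRegressor needs, and that results has exactly one row per row of testIn.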

Your code should be documented with sufficient comments so that a stranger is able to follow what you did and why. Any free parameters that you do not want to, or cannot, learn from the provided data set have to be set as constants within the trainRegressor function. Explain and discuss your solution approach and how you chose any parameters that you set as constants. We suggest that you use Gaussian basis functions.

c) Test your two functions by calculating the Root Mean Square Error (RMSE) of your rental price predictions, by performing cross-validation (you will have to choose suitable sizes for the training and testing data chunks). How does the number of training points affect the accuracy of your regression? Hint: an acceptable solution will typically have an RMSE lower than £950 per calendar month.
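The mechanics of a k-fold RMSE estimate can be sketched as follows. Everything here is an assumption made for illustration: the six toy rents, the choice of k = 3, the deterministic fold assignment, and the placeholder predictor (the mean of the training rents), which stands in for a real train/test function pair.

```matlab
% k-fold cross-validation of a placeholder predictor on toy data.
y = [1000; 1200; 900; 1500; 1100; 1300];   % hypothetical rents
n = numel(y);
k = 3;                                     % number of folds (assumed choice)
fold = mod((0:n-1)', k) + 1;               % fold label 1..k for every row

rmse = zeros(k, 1);
for f = 1:k
    testIdx  = (fold == f);                % rows held out in this fold
    trainIdx = ~testIdx;                   % rows used for "training"
    % Placeholder predictor: mean of the training rents.
    prediction = mean(y(trainIdx)) * ones(nnz(testIdx), 1);
    err = y(testIdx) - prediction;
    rmse(f) = sqrt(mean(err.^2));          % root mean square error
end
meanRmse = mean(rmse);
fprintf('Mean RMSE over %d folds: %.1f\n', k, meanRmse);
```

With real data the rows would normally be shuffled first (for example with randperm) so that the folds are not tied to the order of the file.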

d) Calculate the predicted rental price at all London tube stations. What are the mean and standard deviation of these rental prices?

e) Use a bar chart (bar) to visualise the rent profile for the Central Line from Ealing to Stratford.

f) What is the price to rent at Imperial College Campus (Latitude 51.499019, Longitude -0.176256)? Discuss whether you believe the predicted value, based on the data and the machine learning strategy you used.

g) What is the rent price at Upminster station? Discuss whether you believe the predicted value, based on the data and the machine learning strategy you used.

h) Generate a finely spaced grid of sample points (e.g. you could use the Matlab function meshgrid) and visualise the landscape of London rental prices using the function surf. Add the tube stations (but not necessarily all tube station names) to the plot (using plot3 and text). Rotate the view so that you see London from above; you will thus get a heatmap. Use colorbar to further annotate the plot. This allows you to answer the following questions:

1. What is your predictor's most expensive location in London and where is it located? Provide price, latitude and longitude and use Google Maps to visualise your location.

2. Where is the price gradient largest - i.e. where is the transition from an expensive area

to a cheaper area steepest?
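The plotting calls named in h) fit together roughly as in the sketch below. The surface is a synthetic Gaussian bump standing in for a trained regressor, and the single "station" is a made-up point; both are assumptions for illustration only. The colorbar label syntax assumes Matlab R2014b or later.

```matlab
% Heatmap-style view of a synthetic price surface with one labelled point.
lonGrid = linspace(-0.5, 0.3, 100);
latGrid = linspace(51.3, 51.7, 80);
[LON, LAT] = meshgrid(lonGrid, latGrid);   % 80-by-100 grids

% Synthetic stand-in for regressor output (hypothetical values).
Z = 800 + 1200 * exp(-((LON + 0.15).^2 + (LAT - 51.51).^2) / (2 * 0.1^2));

figure;
surf(LON, LAT, Z, 'EdgeColor', 'none');
hold on;

% Made-up station, drawn just above the surface so it stays visible.
stLon = -0.10; stLat = 51.50; stZ = max(Z(:)) + 1;
plot3(stLon, stLat, stZ, 'ko', 'MarkerFaceColor', 'k');
text(stLon, stLat, stZ, '  Example station');

view(2);                                   % look straight down: heatmap
cb = colorbar;
cb.Label.String = 'Rent [GBP pcm]';
xlabel('Longitude [^\circ]');
ylabel('Latitude [^\circ]');

% Location of the maximum of the gridded surface.
[zMax, idx] = max(Z(:));
fprintf('Max %.0f at lat %.4f, lon %.4f\n', zMax, LAT(idx), LON(idx));
```

Linear indexing with idx works because LON, LAT and Z all share the same grid shape.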

i) How well does your machine learner predict your own rent and what is the relative error?

C [35%]

Now, we want to account for the passage of time and changes in rental prices. To this end we

will augment our data space to include position, price and time.

a) There are three main ways to achieve this:

1. Augment your sum of basis functions that operate on space (but not on time) with additional basis functions that operate on time (but not on space).

2. Use basis functions that all take space and time as a joint argument.

3. Chunk the data in time and repeat the analysis of B on each chunk.

Discuss each approach individually and sketch out a solution approach in a paragraph each. What challenges or problems could arise when explaining the data using each approach?

Choose approach 1 or 2, implement it, and show and compare your results with those of approach 3. If your choice requires you to write new regressor functions (e.g. ones that take 3-column input arguments, etc.), please call them trainRegressorTime and testRegressorTime (you will have to generate the files yourself, following the template from Question B). Plot the change in prices for each month of the data set as predicted by your regressor for each station of the Central Line (choose a suitable way of visualising all this in a single figure).

1. trainRegressorTime.m: a Matlab function that accepts training inputs trainIn (a three-column matrix of 1. Datevec dates, 2. Longitude and 3. Latitude, where the rows correspond to different training data points) and training outputs trainOut (a column vector of rent prices, where each row's rental price corresponds to the rental location in the same row of trainIn). The function is to return a single variable (potentially a structure or matrix) params.

>>params = trainRegressorTime ( trainIn , trainOut )

2. testRegressorTime.m: a Matlab function that accepts testing inputs testIn (a three-column matrix of 1. Datevec dates, 2. Longitude and 3. Latitude, where the rows correspond to different test data points) and params (the parameter variable created by trainRegressorTime). The function should return a column vector of rental prices for the corresponding locations, called results.

>>results = testRegressorTime ( testIn , params )

b) Which coordinates displayed the highest increases in rental prices? Show and state your

results visualised on a London map.

c) Is the temporal change in rental prices a strong effect? We can test this by building a naive Bayesian classifier that, given a rental price and location (and using your regressors from the previous sub-questions to give you a likelihood in time), predicts whether the data point came from the early (e.g. first half) or late (e.g. second half) part of the data set. Choose a suitable and convincing way to test your classifier on the data.

IT IS VERY IMPORTANT THAT YOU FOLLOW THE FUNCTION CALL CONVENTIONS

THAT ARE SPECIFIED. WE USE AUTOTESTING SCRIPTS TO TEST YOUR CODE. IF YOUR

CODE BREAKS DOWN IN AUTOTESTING BECAUSE IT IS NOT FOLLOWING THE

SPECIFIED CONVENTIONS THE GTAS CAN NOT BE EXPECTED TO FIX YOUR CODE TO

MAKE IT WORK. CODE THAT DOES NOT WORK MAY RESULT IN A DOWNGRADE OF

YOUR GRADE.

PLEASE SUBMIT ALL THE CODE THAT GENERATED YOUR FIGURES OR RESULTS, ALONGSIDE THE EXPLICITLY REQUESTED MATLAB FILES. YOUR REPORT SHOULD BE SUBMITTED AS A PDF FILE.

How to visualise and present your scientific data for this coursework

(and many other applications):

A figure says more than 1000 words: ideally about what it should say; otherwise it raises, in as many words, doubts about the author's abilities. Therefore, each figure deserves to look good. So make figures in Matlab/Illustrator/Powerpoint/GIMP/TGIF etc. It is well worth making a good-looking figure. A figure is almost always preferable to text. Figures are usually viewed in a small form, so use large fonts. It is often a good idea if the caption of the figure allows the reader to understand the conclusion from the figure. Ideally, the figures allow the reader to understand the entire paper without reading the text. Use the same font across all figures and at most 3 sizes. A figure should not have more than 8 panels and usually at least 2.

Reference all your figures and subfigures in your main text (everything here also holds for tables, numbered equations, etc.). A figure that is not referenced will be considered invisible by the reader. Also, if there is no meaningful way to reference your figure/subfigure in your main text, this perhaps suggests that you do not need that figure. Your figure caption should explain in direct terms what one can see in the figure. If it is a graph containing data, the type of graph (e.g. scatter plot), the axes, and the data symbols (e.g. circles, dots) should be linked to the entities discussed. The main text should refer to the figure and explain first what the reader is supposed to see in the figure (imagine the reader is blind: what message should they be able to extract from the figure?).

Ask yourself: Does each figure present evidence in favor of exactly one point that you are trying to make with it, and does it present such evidence as clearly as you can imagine? E.g., if you are conveying that one thing does better than another, are you showing relative performance, delta performance or absolute performance? Absolute performance requires that the reader subtract in their head but may be important when absolute performance is in itself meaningful (e.g. patient survival); relative performance imposes no such requirement but the absolute value is lost; delta performance may be helpful to convey that conditions are different from each other. This is certainly the most important consideration: the only question I'd answer if I were standing on one foot. The rest are details.

Check list for your figures:

Are all axes labeled, e.g. "Height"? Are the axes tipped with arrows?

Do all axis labels have units, and do you specify them using the square bracket convention, e.g. "Height [cm]"?

If there is 1 color, is it black? If there are 2, are they black and grey (or blue and red, but do not use red and green, please)?

If there are multiple lines/dots, is each a different line style and color?

Are all lines sufficiently thick? (If you used Matlab, and they are the default

thickness, the answer is no, it should be at least 2pt)

Are all font sizes legible? (If you used Matlab, and they are the default font size, the answer is no; it should be at least 16-18pt.)

Is the figure exported at sufficiently high DPI (400+)?

If there are multiple lines, is there a legend, or other textual description of each line?

If errorbars make sense, are they there? If there, does the caption explain

whether they are standard error or standard deviation? If they are not

either, is there a good justification provided for that?

Is your method in the figure named something other than "proposed method" or "our method"? If not, name it and use that name throughout.

Are your axes tight (that is, are the bounds of the axes just larger than the max and min of what you want to show)? If not, do you have good reason for the excess, e.g. because you need to show absolute levels?

Are all graphics that can be vector graphics actually vector graphics (unless you export 600 DPI bitmaps, which may actually be the better way sometimes in Matlab)?

Do all axes have either clear tick marks or gridlines indicating magnitudes

of everything?

Are all the letters/numbers fully visible (i.e., not obscured by part of the

figure)?

Is the aspect ratio correct? For data where the different axes have the same units, it may be important to show equal units as equal widths on all axes. (Hint: if you rescaled both the width and height separately, it is probably not.)

If the data are 2D, are you displaying it in 2D? (If not, remove that additional

3rd dimension, it is just confusing and obfuscatory.)

Does the caption begin with a sentence (fragment) stating what the figure

is demonstrating (i.e., why it is there)?

Does the caption end by pointing out particularly interesting aspects of the

figure that one should note?

Does the caption define all acronyms used in the figure (especially if they

are not used anywhere else)?
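Several items on the checklist (line thickness, font size, axis labels with units, tight axes, export resolution) can be set directly in code. The following is a minimal sketch on a synthetic sine curve; the data, the axis labels and the file name exampleFigure.png are placeholders chosen for this illustration.

```matlab
% Applying a few checklist items to a toy plot and exporting it as PNG.
t = linspace(0, 10, 50);                    % synthetic x data
figure;
h = plot(t, sin(t), 'k-', 'LineWidth', 2);  % single black line, 2pt thick
set(gca, 'FontSize', 16);                   % tick and label font size
xlabel('Time [s]');                         % name and units in brackets
ylabel('Amplitude [a.u.]');
axis tight;                                 % bounds just around the data

% Export at 400 DPI; the file name is a hypothetical example.
print(gcf, 'exampleFigure.png', '-dpng', '-r400');
```

Setting these properties in the plotting script, rather than by hand in the figure window, keeps all figures in a report consistent.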
