
INF 1340H Assignment 3

A Data Mining exercise


Assignment 3
Fall 2013

Lead Instructor: Periklis Andritsos
Office: BL-615
Office Hours: Monday 12:00 pm - 1:00 pm (also by appointment)
E-Mail: periklis.andritsos@utoronto.ca
Course Web Page: Sign-in to Blackboard
Teaching Assistant: Matthew (Matt) Wells
Teaching Assistant Office Hours: By appointment at: matthew.wells@mail.utoronto.ca

NOTE: Text items in blue are clickable hyperlinks.

DEADLINE: December 9, 2013, 11:59pm. Please submit it on Blackboard.

Introduction


The goal of this project is to gain more practice with file I/O, lists, functions and dictionaries.

Data mining is the process of sifting through large amounts of data to discover the knowledge hidden in it. It is now a mainstream tool for business intelligence organizations and financial analysts, and is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. Note the new terms Big Data and Data Science that have recently emerged.

In this project, you will perform data mining on the prices of Google stock. Your program will calculate the monthly average prices of Google stock from 2004 to 2008, and report the 6 best and 6 worst months for Google.

This assignment is worth 10% of the final mark, and at the end of the handout you will find the criteria on which we will evaluate you. We will give you a score out of 100 now and weight it (to 10% of the final mark) at the end of the course.

You are allowed to use WingIDE or IDLE (or, of course, any other IDE that you prefer). Please make sure you use Python 3.3.

Self and peer assessment: Assignment submission will include self and peer assessment forms that must be completed by each group member separately. When students upload their assignments (on Blackboard) they will be asked to upload a special form discussing the teamwork. These forms are strictly confidential and will be provided during the course (on Blackboard).

Data Mining of Google Stock Data
Project Specifications


A file of Google stock's historical prices will be given to you; its name is table.csv. This file can be opened with Notepad or WordPad, and is comma-delimited. If you open it with Excel, the commas will not be shown.

You are asked to write the following functions:

1. get_data_list(FILE_NAME): In this function, you are required to read the file of the stock's historical prices. After reading each line, split it into a list and append this list to another main list, say data_list. So data_list is a list of lists, i.e. a 2-D list. At the end of this function, return data_list.

2. get_monthly_averages(data_list): This function takes the data_list generated by the first function as its parameter. Use Date, Volume, and Adj Close to calculate the average monthly prices. What is a good way to calculate the average price? Below are the instructions:

   Suppose one day's volume and close price are $V_1$ and $C_1$, respectively; then that day's total sales equals $V_1 \times C_1$. We will use the Volume column for the day's volume and the Adj Close column for the day's close. Now suppose another day's volume and close price are $V_2$ and $C_2$. The average of these 2 days is the sum of the total sales divided by the total volume. So, the average price of these two days is calculated in this way:

   $$\text{average price} = \frac{V_1 \times C_1 + V_2 \times C_2}{V_1 + V_2} \tag{1}$$

   To average a whole month you just add up the total sales ($V_i \times C_i$) for each day and divide by the sum of all the volumes ($V_1 + V_2 + \dots + V_n$).

   For each month create a tuple with 2 items: the average for that month, and the date (you only need the month and year). Append the tuple for each month to a list (e.g. monthly_averages_list), and after calculating all the monthly averages, return this list.

3. print_info(monthly_averages_list): In this function, you need to use the list of monthly averages calculated in the 2nd function. Here you will need to find and print to a file the 6 best (highest average price) and 6 worst (lowest average price) months for Google stock. You will print to a file named monthly_averages.txt. You will first print a header like "6 best months" and then print the 6 best months, 1 month per line, from highest price to lowest, in the following format: month-year, average price (to 2 decimal places). You will then print a blank line and then another header like "6 worst months" and print the 6 worst months, 1 month per line from lowest price to highest, in the same format as for the best months (example output can be found below). This function does not return anything.

Deliverables
- the group_x_datamining.py file with source code solution
- the group_x_monthly_averages.txt file with the result of your analysis

List of Files to Download
- table.csv
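As a rough illustration of the first two functions described above, here is a minimal sketch rather than a reference solution. It rests on two assumptions that the handout does not state: that the first row of table.csv names the columns (so Date, Volume and Adj Close can be located by name), and that dates are written as YYYY-MM-DD. The dictionary names sales and volumes are hypothetical. It uses str.format so that it also runs on Python 3.3.

```python
def get_data_list(file_name):
    """Read a comma-delimited file and return its rows as a list of lists."""
    data_list = []
    with open(file_name) as data_file:
        for line in data_file:
            line = line.strip()
            if line:  # skip blank lines
                data_list.append(line.split(","))
    return data_list


def get_monthly_averages(data_list):
    """Return a list of (average_price, "MM-YYYY") tuples, one per month."""
    # Assumption: the first row is a header that names the columns.
    header = data_list[0]
    date_col = header.index("Date")
    volume_col = header.index("Volume")
    close_col = header.index("Adj Close")

    sales = {}    # "MM-YYYY" -> sum of volume * adjusted close
    volumes = {}  # "MM-YYYY" -> sum of volume
    for row in data_list[1:]:
        # Assumption: dates are written as YYYY-MM-DD.
        year, month, _day = row[date_col].split("-")
        key = "{0}-{1}".format(month, year)
        volume = float(row[volume_col])
        close = float(row[close_col])
        sales[key] = sales.get(key, 0.0) + volume * close
        volumes[key] = volumes.get(key, 0.0) + volume

    # Volume-weighted average: total sales divided by total volume.
    return [(sales[key] / volumes[key], key) for key in sales]
```

For instance, two days in the same month with volumes 100 and 300 and adjusted closes 10.00 and 20.00 give (100 × 10 + 300 × 20) / (100 + 300) = 17.50, whereas a plain mean of the two closes would give 15.00. A month whose total volume is zero would raise ZeroDivisionError in this sketch; a full solution would need to decide how to report that.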


Your code should be easily readable. The functions should include docstrings and comments, and variables should be properly named. They should produce messages in case of errors (e.g. if the input strings do not comply with the specification of the inputs).

Example output

6 best months for Google stock price averages:
12-2007, 693.76
11-2007, 676.55
10-2007, 637.38
01-2008, 599.42
05-2008, 576.29
06-2008, 555.34

6 worst months for Google stock price averages:
09-2004, 116.38
10-2004, 164.52
11-2004, 177.09
12-2004, 181.01
03-2005, 181.18
01-2005, 192.96
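One possible way to produce output in this layout is sketched below. It assumes the input is a list of (average, "MM-YYYY") tuples as the specification describes. The optional out_name parameter is a hypothetical addition that only makes the sketch easier to try out; the specification itself asks for a single parameter and a file named monthly_averages.txt.

```python
def print_info(monthly_averages_list, out_name="monthly_averages.txt"):
    """Write the 6 best and 6 worst months to out_name; returns nothing."""
    # Tuples are (average, "MM-YYYY"), so sorting orders by average first.
    ranked = sorted(monthly_averages_list)
    worst = ranked[:6]        # lowest average first
    best = ranked[-6:][::-1]  # highest average first

    with open(out_name, "w") as out_file:
        out_file.write("6 best months for Google stock price averages:\n")
        for average, month in best:
            # {1:.2f} rounds the average to 2 decimal places.
            out_file.write("{0}, {1:.2f}\n".format(month, average))
        out_file.write("\n")
        out_file.write("6 worst months for Google stock price averages:\n")
        for average, month in worst:
            out_file.write("{0}, {1:.2f}\n".format(month, average))
```

Putting the average first in each tuple is what lets a plain sorted() call rank the months without a key function.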

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Documentation
=============
- Overall file/module documented:      ____ / 3
- Documents every function:            ____ / 5
- Comments are clear and descriptive:  ____ / 7
OVERALL DOCUMENTATION MARK:            ____ / 15

Style
=====
- Passes the style checker:            ____ / 10
- Variables have meaningful names:     ____ / 5
- Modular code, no repeated code:      ____ / 5
- General readability:                 ____ / 5
OVERALL STYLE MARK:                    ____ / 25

Correctness
===========
- Exercise 1                           ____ / 60
OVERALL CORRECTNESS MARK:              ____ / 60

A2 MARK TOTAL:                         ____ / 100
