Throughout this project, I will have to learn many new programming concepts in order to complete it successfully. This includes becoming familiar with new tools and software, learning programming languages, and developing an understanding of computing and Machine Learning principles.
The tools that I will be using include Google Sheets, BigQuery, Data Studio, Cloud Datalab, Machine Learning Studio, the PyCharm Python IDE, and an API to connect the Python code to the Google cloud tools. All of these tools will allow me to progress toward creating the application. Google Sheets is simple to use, and I have used it many times in the past; therefore, it will be the easiest tool that I work with. Cloud Datalab and Machine Learning Studio will probably be the most challenging stages, as they are professional development studios and neither is as straightforward as Google Sheets. However, since I will be stair-stepping up from the easier tools, the transition should not be too difficult. Additionally, they are more powerful than the tools I have worked with before, so I should be able to attain interesting results. Using PyCharm will be doable, as I am comfortable with similar Java IDEs. The most challenging aspect of this step will be identifying an appropriate API to assist in the development of the application. This will require extensive research, as I want a relatively simple method of linking the code and the data that is also reliable.
The two programming languages that I will be working with are Python and SQL. SQL will be somewhat challenging to learn, as I have never done any database programming and the language is completely new to me. Furthermore, it is not similar to the languages I am familiar with, Java and Swift. However, SQL will be crucial to almost all stages of this project, especially the initial ones. Learning SQL will also allow me to connect applications to databases and analyze the statistics. To learn the language, I have been watching Khan Academy videos and have started a Codecademy course. I will also get a lot of practice with SQL, as I will be using it exclusively while working with BigQuery. Python should be easier for me to pick up, as it is similar to Java and Swift. Since I am already familiar with programming logic, the only aspect that will be new is some Python syntax. To learn Python, I will be reading through a reference book, using online tutorials, and practicing on Khan Academy. I have already started looking through some of the Python resources.
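The kind of query I am practicing first can be sketched in Python's built-in sqlite3 module; this is only a stand-in for BigQuery, and the table and values below are invented for illustration:

```python
import sqlite3

# In-memory practice database; the temperature data here is made up,
# standing in for the public data sets I am practicing with.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (city TEXT, temp REAL)")
conn.executemany("INSERT INTO temps VALUES (?, ?)",
                 [("Austin", 31.0), ("Austin", 29.0), ("Boston", 18.5)])

# Aggregate with GROUP BY -- one of the first SQL patterns to learn.
avgs = conn.execute(
    "SELECT city, AVG(temp) FROM temps GROUP BY city ORDER BY city"
).fetchall()
print(avgs)  # [('Austin', 30.0), ('Boston', 18.5)]
```

The same SELECT/GROUP BY structure carries over to BigQuery's standard SQL, which is why practicing on a small local database transfers well.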
The most important aspect of this project is that I learn more about Machine Learning and about cloud, data, and distributed computing. Since I hope to continue researching these fields through college and in the workplace, I want to have a solid understanding of the fundamentals. I will learn these principles through discussions with my mentor and by reading research articles as I develop this project.
Materials:
This entire project will be completed on a laptop, and the materials are free. There may be some costs associated with the Google development studios and the API; however, for my relatively simple purposes, the free tier should not be exceeded. The following data, websites, and software will be utilized:
Methodology:
Prior to starting any data programming or Machine Learning project, it is vital to have an extensive amount of data to use, as more data allows for more profound and more accurate results. My mentor came across a large set of NCAA data; therefore, this step is already complete.
The official first step of this project is compiling a data dictionary from the rows and columns available in the spreadsheet. The purpose of the data dictionary is to gain familiarity with the available data and to note each field's data type in preparation for programming with the data. I have already completed this step. It is followed by using Google Sheets to create simple visuals to examine the data, including filtering the data itself, creating pivot tables, and creating charts. These steps allow me to look more closely at the data and gain a perspective on the effects of certain variables, without requiring much background knowledge. I have also completed this step.
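The pivot-table step can be sketched in plain Python; the rows and column names below are invented stand-ins for the NCAA spreadsheet, but the grouping logic is the same one a Google Sheets pivot table performs:

```python
from collections import defaultdict

# Tiny stand-in for the NCAA spreadsheet; these rows are hypothetical.
games = [
    {"season": 2016, "team": "Duke", "points": 78},
    {"season": 2016, "team": "UNC", "points": 85},
    {"season": 2017, "team": "Duke", "points": 81},
]

# Pivot: total points per season, like a pivot table grouped on "season".
totals = defaultdict(int)
for row in games:
    totals[row["season"]] += row["points"]
print(dict(totals))  # {2016: 163, 2017: 81}
```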
Following this is using SQL in BigQuery to identify patterns in the data and to look closely at certain categories, such as the 10 lowest scores, the 20 longest games, or the 15 games with the least travel time for the visiting team. I am currently working on this step; however, I have not been able to work with the game data due to security restrictions. Instead, I have been getting accustomed to the BigQuery platform by working with public data sets, such as temperature records. BigQuery is notable for enabling fast analysis and iterating through data at great speed; its parallel processing capabilities will make the SQL queries I write run quickly.
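The "N lowest scores" style of query follows a single ORDER BY ... LIMIT pattern. Here is a minimal sketch using sqlite3 with an invented two-column game table (the real NCAA schema will differ), scaled down to N = 2:

```python
import sqlite3

# Hypothetical miniature of the NCAA game table; the query shape is the
# same standard SQL I would run in BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game_id INTEGER, total_score INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)",
                 [(1, 130), (2, 95), (3, 110), (4, 88)])

# "N lowest-scoring games" pattern (N = 2 here instead of 10).
lowest = conn.execute(
    "SELECT game_id, total_score FROM games "
    "ORDER BY total_score ASC LIMIT 2"
).fetchall()
print(lowest)  # [(4, 88), (2, 95)]
```

Swapping `ASC` for `DESC`, or ordering on a duration or travel-time column, gives the other categories mentioned above.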
After this, I will work with Data Studio and Cloud Datalab. They are professional development studios and will allow me to create more sophisticated and more meaningful visualizations of the data. This will be crucial, as it will allow me to truly understand the data and begin using Machine Learning. Data Studio builds on BigQuery and makes it easier to observe significant patterns, while Cloud Datalab, which is used by professional researchers, allows reviewing, visualizing, and coding all in one place. This stage will therefore be the most profound, as it is at the heart of the project: analyzing visual representations and using Machine Learning algorithms to make predictions. It will also require that I go through a multitude of articles and tutorials so that I can make the most of the platforms.
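As a first intuition for what "making predictions" means, here is a minimal sketch of fitting a straight line to past data and extrapolating. The numbers are invented, and the real project would use a proper model on the Datalab platform rather than this hand-rolled least squares:

```python
# Hypothetical past observations (these values are made up).
xs = [1.0, 2.0, 3.0, 4.0]   # e.g. a game statistic
ys = [2.0, 4.1, 5.9, 8.0]   # e.g. an outcome to predict

# Ordinary least squares: slope and intercept of the best-fit line.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Predict the outcome for an unseen input.
prediction = slope * 5.0 + intercept
print(round(prediction, 2))  # about 9.95
```

Real Machine Learning models generalize this idea to many input variables, but the train-then-predict loop is the same.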
Following this, I will use Python to create the application. First, I will create a diagram and design my application in Lucidchart. While working on this, I will keep in mind the software development principles I have learned through research and the coding best practices I have learned through the BPA Java Competition. This step will allow me to be more methodical and efficient when I am programming. Additionally, I will mock up the application in photo editing software so that I can reason through what code I need in order to create the output I am hoping for. After that, I will focus on learning how to connect the local Python code to the online Machine Learning server and the online data spreadsheet, and then code that aspect. This will entail researching and finding an appropriate API that enables me to complete the task.
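Since the exact API has not been chosen yet, only the general shape of the connection can be sketched: the cloud service returns a JSON payload, and the local Python code parses out the prediction. The payload below is a hand-written stand-in, not a real service response:

```python
import json

# Hypothetical JSON response from a prediction API; the field names
# ("prediction", "winner", "confidence") are invented for illustration.
raw_response = '{"prediction": {"winner": "Duke", "confidence": 0.72}}'

data = json.loads(raw_response)
winner = data["prediction"]["winner"]
confidence = data["prediction"]["confidence"]
print(winner, confidence)  # Duke 0.72
```

Whatever API I settle on, the local code will follow this request-then-parse pattern.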
After getting all of the data flowing in, I will look into using Tkinter to code the graphics. This part will be a little tricky, as I have not had much experience with animating programmatically; however, it aligns more closely with the programming I am used to. I will also create the images and backgrounds in photo editing software during this step. This will allow me to finish coding the application and ensure it can be downloaded and run on devices.
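A Tkinter window of the kind I have in mind can be sketched as follows. The layout and names here are hypothetical placeholders, not a finished design; the import sits inside the function so the sketch loads even on a machine without a display, and calling build_window() is what actually opens the GUI:

```python
def build_window():
    # Imported here so merely defining the sketch needs no display.
    import tkinter as tk

    root = tk.Tk()
    root.title("NCAA Predictor")  # placeholder title

    # Canvas for the graphics; a rectangle stands in for a team image
    # until the real assets are made in photo editing software.
    canvas = tk.Canvas(root, width=400, height=300, bg="white")
    canvas.pack()
    canvas.create_rectangle(50, 50, 150, 120, fill="steelblue")

    tk.Button(root, text="Predict", command=root.quit).pack()
    root.mainloop()  # hands control to the GUI event loop
```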
Conclusion:
This project is apt for me to work on, as it will expose me to areas of Computer Science in which I do not have much experience: data programming and Machine Learning. Since both of these areas are growing rapidly in the tech industry, the knowledge will be useful to me in the future and give me a taste of the field I hope to dive into during and after college.
Even though I may not publish this application for the public, I will be sharing the steps, along with my mentor, on the Google blogs. I hope to have a product that is captivating to look at and reflects months of work. Furthermore, since I will be using the same resources in the future, I want to work through every aspect with my best effort so that I can look back at the product if I get stuck later.
Calendar
Completed Steps
Create a data dictionary
Use Google Sheets to filter the data, create tables, and create charts
Analyze the visuals from Google Sheets
Week One (3/10-3/16)
Learn the fundamentals of SQL
Work with BigQuery and use standard SQL to look at specific patterns
Work on formatting queries, exploring my data to optimize cost, commenting query code, and substring matching within BigQuery to familiarize myself with the platform
Bind Data Studio to my data source and start reading tutorials
Week Two (3/17-3/23)
Create simple tables, graphs, and charts on Data Studio
Start working with Cloud Datalab and get accustomed to it by watching tutorials
Week Three (3/24-3/30)
Export the NCAA data to BigQuery
Begin Machine Learning through Cloud Datalab
Week Four (3/31-4/06)
Wrap up the Machine Learning portion with Machine Learning Studio and
Cloud Datalab
Week Five (4/07-4/13)
Create the UML Diagram
Design the GUI
Research how to connect Python code to Machine Learning servers and cloud
databases
Continue learning Python
Week Six (4/14-4/20)
Connect the application to the cloud database and Machine Learning server
Create all the images on a photo editing software
Week Seven (4/21-4/27)
Code the graphics in the application
Week Eight (4/28-5/04)
Finish coding the application
Make the application able to run on devices
Start working on the Final Presentation script
Week Nine (5/05-5/11)
Attain feedback and make adjustments if necessary to the application
Post tutorial to Google blogs
Practice for Final Presentation night
Week Ten (5/12-5/18)
Practice for Final Presentation night