
Final assignment (Online Media Analysis, 2017 Fall)

Deadline: December 20, 2017, midnight

For this assignment, you need to complete the following two tasks.

Task 1

What to do?

For the game players whose IDs are saved in the ‘list_of_players_fa.txt’ file, you need to do a social network
analysis.

1) For each player, collect the following information using the API provided by Kongregate.com:

- list of friends

- game level

* Skip any player whose data cannot be collected, either because the player has set the information to
private (or personal) or because the player no longer exists on the site.
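
The skipping rule in the note above can be sketched as follows. This is only a sketch under stated assumptions, not the Kongregate API: `fetch_player` is a hypothetical callable standing in for whatever request code you write, assumed to return a dict with `friends` and `level` keys and to raise `LookupError` when a profile is private or missing. The sample data is made up.

```python
def collect_players(player_ids, fetch_player):
    """Collect friends and level per player, skipping unavailable profiles.

    fetch_player is a hypothetical callable (an assumption, not part of
    the Kongregate API) that returns {"friends": [...], "level": int}
    or raises LookupError for private or deleted accounts.
    """
    collected = {}
    skipped = []
    for player_id in player_ids:
        try:
            info = fetch_player(player_id)
        except LookupError:
            # Private or non-existent player: skip and move on.
            skipped.append(player_id)
            continue
        collected[player_id] = {
            "friends": list(info["friends"]),
            "level": info["level"],
        }
    return collected, skipped


def fake_fetch(player_id):
    """Stand-in fetcher with made-up data, for demonstration only."""
    fake_site = {
        "alice": {"friends": ["bob", "carol"], "level": 12},
        "bob": {"friends": ["alice"], "level": 5},
    }
    if player_id not in fake_site:
        raise LookupError(player_id)
    return fake_site[player_id]


collected, skipped = collect_players(["alice", "bob", "ghost"], fake_fetch)
```

Keeping the list of skipped IDs makes it easy to state in the report how many players were left out.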

2) Using the friends data for each player, construct a social network with the ‘NetworkX’ module. The
network should consist only of the players in the text file. For each node, also add a node attribute that
holds the player’s game level.
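
One possible way to build such a network is sketched below. It assumes the collected data is held in a dict that maps each player ID to its friends list and level (a layout chosen for illustration), and it assumes an edge from a player to a friend means "the player follows the friend". The sample players are made up.

```python
import networkx as nx


def build_network(players):
    """Build a directed friend network restricted to the listed players.

    players maps a player id to {"friends": [...], "level": int};
    this layout is an assumption made for the example.
    """
    graph = nx.DiGraph()
    # Add every listed player as a node with its game level.
    for player_id, info in players.items():
        graph.add_node(player_id, level=info["level"])
    # Add an edge only when the friend is also in the player list.
    for player_id, info in players.items():
        for friend_id in info["friends"]:
            if friend_id in players and friend_id != player_id:
                graph.add_edge(player_id, friend_id)
    return graph


players = {
    "alice": {"friends": ["bob", "dave"], "level": 12},
    "bob": {"friends": ["alice", "carol"], "level": 5},
    "carol": {"friends": [], "level": 30},
}
graph = build_network(players)
```

Here "dave" is not in the player list, so the edge from "alice" to "dave" is dropped.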

3) Using the ‘NetworkX’ module, find the following information for each player:

- indegree (i.e., number of other players who follow the player)

- outdegree (i.e., number of other players whom the player follows)
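
As a small illustration of the two measures, the sketch below reads indegree and outdegree from a directed NetworkX graph. The three-player graph is made up, and the edge direction (follower to followed) is an assumption.

```python
import networkx as nx

# Made-up example: an edge (a, b) means "a follows b".
graph = nx.DiGraph()
graph.add_edges_from([("alice", "bob"), ("bob", "alice"), ("bob", "carol")])

# in_degree: number of followers; out_degree: number of players followed.
in_degree = dict(graph.in_degree())
out_degree = dict(graph.out_degree())

for player in graph.nodes:
    print(player, in_degree[player], out_degree[player])
```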

4) Discuss the relationship between a player’s game level and the player’s in/outdegree.

5) Visualize the network using Gephi.

- Set the size of a node according to the player’s level

- Show the label of each player
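
To get the network into Gephi, one option is to export it from NetworkX as a GEXF file, which Gephi can open. The sketch below assumes a graph with a `level` node attribute; the file name `players.gexf` and the sample data are placeholders.

```python
import networkx as nx

# Made-up graph with the game level stored as a node attribute.
graph = nx.DiGraph()
graph.add_node("alice", level=12)
graph.add_node("bob", level=5)
graph.add_edge("alice", "bob")

# Write the graph, including node attributes, to a GEXF file.
nx.write_gexf(graph, "players.gexf")
```

After the file is opened in Gephi, the `level` attribute should be available as a node attribute for sizing nodes, and the node IDs can be shown as labels.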

6) Report the analysis results in a Word file, including the discussion from step 4 and the
visualization result.

What to submit?

- Python code (i.e., a Python file, either .ipynb or .py)

- The Word file

Task 2

What to do?

For the news articles whose URLs are saved in the ‘list_of_articles_fa.txt’ file (they are news articles from
the People’s Daily, a Chinese newspaper), you need to do a semantic network analysis. Analyze all the
news articles together, not separately.

1) First, collect the content of all the news articles using Web scraping techniques and a ‘for’
statement.

2) Do text preprocessing.

- Do not remove the symbols that mark the end of a sentence, such as ‘.’, ‘?’, and ‘!’.
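
A minimal sketch of preprocessing that keeps sentence boundaries is shown below. It uses only the standard `re` module; the set of symbols to drop and the sample text are assumptions for illustration. Case is left unchanged so that ‘DPRK’ stays recognizable.

```python
import re


def split_sentences(text):
    """Split text into sentences, keeping '.', '?' and '!' attached."""
    # Split at whitespace that follows an end-of-sentence symbol.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]


def clean_sentence(sentence):
    """Drop symbols other than letters, digits, spaces and . ? !"""
    cleaned = re.sub(r"[^A-Za-z0-9\s.?!]", " ", sentence)
    # Collapse the runs of spaces left behind by removed symbols.
    return re.sub(r"\s+", " ", cleaned).strip()


# Made-up sample text, for demonstration only.
text = "The DPRK responded (again) on Monday. Will talks resume? Officials hope so!"
sentences = [clean_sentence(s) for s in split_sentences(text)]
```

This simple rule also splits at abbreviations such as "U.S.", so a dedicated sentence tokenizer may be a better choice for real articles.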

3) Select only nouns.

- Also select ‘DPRK’, which means North Korea.

4) For the 20 most frequently used nouns, do a semantic network analysis.

A tie between two words exists if they are used in the same sentence. For each tie, also add an
attribute that holds its weight (i.e., the number of times the two words have been used in the same
sentence). The name of this attribute should be ‘weight’.
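
The tie definition above can be sketched with the standard library as follows. The sketch assumes the nouns have already been extracted per sentence (the input layout and the sample sentences are made up), and it counts a pair once per sentence even if a word repeats within that sentence, which is one reading of the weight definition.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(sentences, top_n=20):
    """Count sentence-level co-occurrences among the top_n nouns.

    sentences is a list of noun lists, one list per sentence
    (an assumed layout for the preprocessed articles).
    """
    frequency = Counter(word for nouns in sentences for word in nouns)
    top_words = {word for word, _ in frequency.most_common(top_n)}
    weights = Counter()
    for nouns in sentences:
        # Unique top words in this sentence, sorted so each pair has one key.
        present = sorted(set(nouns) & top_words)
        for first, second in combinations(present, 2):
            weights[(first, second)] += 1
    return weights


# Made-up noun lists, for demonstration only.
sentences = [
    ["DPRK", "missile", "test"],
    ["DPRK", "sanction"],
    ["missile", "DPRK", "sanction"],
]
weights = cooccurrence_weights(sentences)
```

Each counted pair can then be added to a NetworkX graph with `graph.add_edge(first, second, weight=count)`, which stores the count under the attribute name ‘weight’.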

5) Visualize the network using Gephi.

- Show the label of each word.

6) Report the analysis results in a Word file and briefly discuss them. In particular, discuss which
words have been frequently used with the word ‘DPRK’ and the implications of the result. Also
include the visualization result.

What to submit?

- Python code (i.e., a Python file, either .ipynb or .py)

- The Word file

Where to submit?

- On YSCEC
