
Assignment 2

Deadline: 5pm 18th May 2017


Ideas Bank
1. Background

Patents can protect inventors' new technologies. However, many
companies blackmail users of new technologies by cornering patents. Some of these patents
cover nothing more than simple ideas. The patent war between Apple and Samsung a few years ago is an
example. Something like slide-to-unlock or bounce-back is by no means high
technology but just a fashion (arguably).

One way to stop giants from cornering trivial patents is to publish ideas before anybody
uses them in a patent application. A number of websites are available for publishing new
ideas, such as ideaaday.org, ideastorm.com and ideastormz.com. One problem with such
a website is that, after a certain period of accumulation, the bank of ideas becomes a big
mess, full of similar, duplicated and trivial ideas. How to maintain such a website is a big
challenge. This assignment is to write a C++ or Java application that simulates such a
website (you are not required to create a website) and solves some of these
problems. The technology we use in this assignment is also useful for implementing
search engines [1].

2. Description of the problem

In this assignment you are asked to write a C++ or Java application that can be used for
maintaining a simple Ideas Bank (a collection of ideas) [2]. The program should be able to
accept a user's input of a new idea, show all related ideas in the bank and search for related
ideas in the bank. To guarantee efficient searching, an indexing
technique must be applied.

Indexing is a technique widely used in the implementation of search engines [3]. The
simplest index data structure is the so-called inverted index, which stores a map from a
word to a list of text sources (documents or files) [4].

[1] http://en.wikipedia.org/wiki/Web_search_engine
[2] http://en.wikipedia.org/wiki/Ideas_bank
[3] http://en.wikipedia.org/wiki/Search_engine_indexing
[4] http://en.wikipedia.org/wiki/Inverted_index

For instance, consider the following texts:

Text 1: "it is what it is"
Text 2: "what is it"
Text 3: "it is a banana"

An inverted index for these texts is the following map (Table 1), which lists all the words
that occur in the texts, each followed by the set of indices of the texts that contain the
word:

Table 1: Inverted index

Word        Texts
"a"         {3}
"banana"    {3}
"is"        {1, 2, 3}
"it"        {1, 2, 3}
"what"      {1, 2}

When we search for a word, say "what", in the text files, we consult the inverted index
to find which files contain this word, instead of traversing all the original text files to find
the word. This makes the search much more efficient. Note that the inverted index
must be updated whenever a change is made to the text files.
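
The three example texts and Table 1 can be reproduced with a few lines of C++. The following is only a minimal sketch: the names InvertedIndex, buildInvertedIndex and lookup are made up for this illustration, and std::map is used as the container purely for brevity.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias: maps each word to the set of text numbers that contain it.
using InvertedIndex = std::map<std::string, std::set<int>>;

// Builds an inverted index over texts numbered from 1, splitting on whitespace.
InvertedIndex buildInvertedIndex(const std::vector<std::string>& texts) {
    InvertedIndex index;
    for (std::size_t i = 0; i < texts.size(); ++i) {
        std::istringstream words(texts[i]);
        std::string word;
        while (words >> word) {
            index[word].insert(static_cast<int>(i) + 1);
        }
    }
    return index;
}

// Looks a word up in the index instead of scanning every text.
std::set<int> lookup(const InvertedIndex& index, const std::string& word) {
    auto it = index.find(word);
    if (it == index.end()) {
        return {};
    }
    return it->second;
}
```

With the three texts above, lookup(index, "what") yields the set {1, 2}, matching the last row of Table 1.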

An inverted index not only makes search more efficient but also helps to find relevant
ideas. We say an idea A is relevant to another idea B if one of B's keywords appears in
A (either as a keyword or in its content).

3. Format of an idea and criteria of relevance

An idea object contains the following information: id, proposer, keywords and content.
The following are two examples of idea objects:

idea id: 370
proposer: Dongmo Zhang
keywords: smartphone, tablet
content: Every smartphone or tablet is equipped with a detachable stylus, which has built
in earphone and microphone (possibly even a camera), wirelessly connected to its body.

idea id: 451
proposer: ideastorm.com
keywords: computer, laptop, camera
content: Have you ever tried to record a video with a laptop or a front facing camera
(all in one desktop) of a subject other than yourself. It is not easy, so it would be very
productive to have a rear built in camera so that the user could just sit at the desktop
monitor as normal and record the subject that is in front. It would be nice if the camera
could change directions as well.

You can add more features (attributes) to an idea object if you like, but you must include the
above four data items.

Based on the definition of relevance, we can say that idea 370 is relevant to idea 451
because one of 451's keywords, "camera", appears in 370's content. According to the same
criterion, idea 451 is not relevant to idea 370 because none of 370's keywords appears in
idea 451.
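
The criterion can be written down as a small C++ check. This is a sketch under stated assumptions: the struct IdeaText and the functions wordsOf and isRelevantTo are hypothetical names, words are compared case-insensitively, and any non-alphanumeric character is treated as a separator (so "camera)," counts as "camera").

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal idea record, just enough to illustrate the criterion.
struct IdeaText {
    std::vector<std::string> keywords;
    std::string content;
};

// Splits text into lowercase words; any non-alphanumeric character is a separator.
std::set<std::string> wordsOf(const std::string& text) {
    std::set<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            words.insert(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        words.insert(current);
    }
    return words;
}

// Returns true if `a` is relevant to `b`, i.e. some keyword of `b`
// occurs among the keywords or the content words of `a`.
bool isRelevantTo(const IdeaText& a, const IdeaText& b) {
    std::set<std::string> wordsInA = wordsOf(a.content);
    for (const std::string& k : a.keywords) {
        std::set<std::string> kw = wordsOf(k);
        wordsInA.insert(kw.begin(), kw.end());
    }
    for (const std::string& k : b.keywords) {
        for (const std::string& w : wordsOf(k)) {
            if (wordsInA.count(w) > 0) {
                return true;
            }
        }
    }
    return false;
}
```

Applied to the two example ideas, isRelevantTo(idea370, idea451) is true and isRelevantTo(idea451, idea370) is false, which shows that the relation is not symmetric.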

4. Program tasks and specification

This assignment consists of a number of tasks. You do not have to do all of them. You
can complete your assignment by choosing some of the tasks. Your mark will be the
sum of the marks for all the tasks you have completed, up to a maximum of 100%.

Task 1 (20%): Create a class, named Idea, to model an idea object. Each object of the
class represents a single idea. The class should contain at least four data items: id,
proposer, keywords and content. You have the freedom to select the data type for each
item. Make sure that the id of each idea object is unique.

Hint: You may use int for id, string for proposer and content, and array or vector of
strings for keywords. For simplicity, you may assume that a keyword consists of only one
word rather than a phrase.
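
One possible shape for such a class is sketched below. It is not the required design: the member names and the use of a static counter to hand out unique ids are assumptions made for this illustration, and other ways of guaranteeing uniqueness are equally acceptable.

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative Idea class; every name here is an assumption, not a requirement.
class Idea {
public:
    Idea(std::string proposer, std::vector<std::string> keywords, std::string content)
        : id_(nextId_++),
          proposer_(std::move(proposer)),
          keywords_(std::move(keywords)),
          content_(std::move(content)) {}

    int id() const { return id_; }
    const std::string& proposer() const { return proposer_; }
    const std::vector<std::string>& keywords() const { return keywords_; }
    const std::string& content() const { return content_; }

private:
    static int nextId_;  // shared counter: each newly constructed idea takes the next value
    int id_;
    std::string proposer_;
    std::vector<std::string> keywords_;
    std::string content_;
};

int Idea::nextId_ = 1;
```

Note that copying an Idea object copies its id as well, so a copy denotes the same idea rather than a new one.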

Task 2 (5%): Implement a method in class Idea to check if a given word is in the list of
keywords or appears in the content.

Hint: You may implement two small functions, one searching for keywords and the other
one searching in the content, and then combine them together.
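
The two-function approach in the hint might look as follows. This is a sketch only: inKeywords, inContent and containsWord are hypothetical free functions that a method of class Idea could delegate to, and the choices to ignore case and to match whole words only are assumptions.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns a lowercase copy of s.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if word equals one of the keywords, ignoring case.
bool inKeywords(const std::vector<std::string>& keywords, const std::string& word) {
    const std::string target = toLower(word);
    return std::any_of(keywords.begin(), keywords.end(),
                       [&](const std::string& k) { return toLower(k) == target; });
}

// True if word occurs in content as a whole word, ignoring case.
bool inContent(const std::string& content, const std::string& word) {
    const std::string text = toLower(content);
    const std::string target = toLower(word);
    if (target.empty()) {
        return false;
    }
    std::size_t pos = text.find(target);
    while (pos != std::string::npos) {
        const std::size_t end = pos + target.size();
        const bool startOk = pos == 0 || !std::isalnum(static_cast<unsigned char>(text[pos - 1]));
        const bool endOk = end == text.size() || !std::isalnum(static_cast<unsigned char>(text[end]));
        if (startOk && endOk) {
            return true;
        }
        pos = text.find(target, pos + 1);
    }
    return false;
}

// Combines the two checks.
bool containsWord(const std::vector<std::string>& keywords, const std::string& content,
                  const std::string& word) {
    return inKeywords(keywords, word) || inContent(content, word);
}
```

The whole-word test prevents a query such as "camera" from matching inside a longer word such as "camerawork".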

Task 3 (20%): Create a class, named IdeasBank, to implement a database of ideas
(a collection of ideas). You have the freedom to select a data structure for storing idea
objects. Implement the necessary functions for users to (1) input new ideas from the keyboard;
(2) display an idea by giving its id; (3) display all ideas in the database; (4) delete an idea from the
database.
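
As a rough illustration of the storage side of such a class, the sketch below keeps ideas in a std::vector and supports adding, finding by id and deleting. Keyboard input and display are left out, and StoredIdea and the member function names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain record standing in for the Idea class.
struct StoredIdea {
    int id;
    std::string proposer;
    std::vector<std::string> keywords;
    std::string content;
};

class IdeasBank {
public:
    // Adds an idea; returns false if the id is already taken.
    bool add(StoredIdea idea) {
        if (find(idea.id) != nullptr) {
            return false;
        }
        ideas_.push_back(std::move(idea));
        return true;
    }

    // Returns a pointer to the idea with the given id, or nullptr if absent.
    const StoredIdea* find(int id) const {
        auto it = std::find_if(ideas_.begin(), ideas_.end(),
                               [id](const StoredIdea& i) { return i.id == id; });
        return it == ideas_.end() ? nullptr : &*it;
    }

    // Removes the idea with the given id; returns false if absent.
    bool remove(int id) {
        auto it = std::find_if(ideas_.begin(), ideas_.end(),
                               [id](const StoredIdea& i) { return i.id == id; });
        if (it == ideas_.end()) {
            return false;
        }
        ideas_.erase(it);
        return true;
    }

    // Read-only access to all stored ideas, e.g. for displaying them.
    const std::vector<StoredIdea>& all() const { return ideas_; }

private:
    std::vector<StoredIdea> ideas_;
};
```

Finding and deleting by id are linear in the number of ideas with this container; a different container would change that trade-off.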

Hint: You may use an STL container such as vector or list to store idea objects.

Task 4 (5%): Implement a method in class IdeasBank to input ideas from text files.

Hint: Design your own text format to facilitate file input.
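
To make the hint concrete, here is one possible format and a reader for it. Everything about it is an assumption for illustration: one idea per line, fields separated by '|', keywords separated by commas, and the names IdeaRecord and readIdeas.

```cpp
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain record for one parsed idea.
struct IdeaRecord {
    int id;
    std::string proposer;
    std::vector<std::string> keywords;
    std::string content;
};

// Reads ideas from a stream, one per line, in the assumed format
//   id|proposer|keyword1,keyword2|content
// Lines without all four fields, or whose id field does not start with a number, are skipped.
std::vector<IdeaRecord> readIdeas(std::istream& in) {
    std::vector<IdeaRecord> ideas;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string idText, proposer, keywordText, content;
        if (!std::getline(fields, idText, '|') || !std::getline(fields, proposer, '|') ||
            !std::getline(fields, keywordText, '|') || !std::getline(fields, content)) {
            continue;
        }
        IdeaRecord idea;
        try {
            idea.id = std::stoi(idText);
        } catch (const std::exception&) {
            continue;
        }
        idea.proposer = proposer;
        idea.content = content;
        std::istringstream keywordStream(keywordText);
        std::string keyword;
        while (std::getline(keywordStream, keyword, ',')) {
            // Trim surrounding spaces so "smartphone, tablet" gives "tablet", not " tablet".
            const std::size_t first = keyword.find_first_not_of(' ');
            const std::size_t last = keyword.find_last_not_of(' ');
            if (first != std::string::npos) {
                idea.keywords.push_back(keyword.substr(first, last - first + 1));
            }
        }
        ideas.push_back(idea);
    }
    return ideas;
}
```

Because the function takes a std::istream, it can be called with a std::ifstream opened on a text file as well as with a std::istringstream for testing.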

Task 5 (30%): Implement an indexing algorithm for the ideas bank using an inverted index.
You are required to use an AVL tree to store the indices. The algorithm can be
implemented as part of the IdeasBank class or as a new class. You do
not have to write your own AVLTree ADT. You can download the AVLTree ADT (for
C++ or Java) from vUWS or from the Internet. You can change any part of the provided
code if necessary.

Hint: (1) The indices stored in the AVL tree must be structured according to the
requirements of the AVLTree ADT you use. For example, if you use the provided C++
ADT, I suggest you use the following data structure for indices:

    struct Index {
        string key;
        vector<int> idList;
    };

where key represents a word and idList represents the list of ids of the ideas in which the word appears.
The attribute key can then be used as the key of the AVL tree. For instance, based on the
example ideas in Section 3, we have an index with the following data:

    key = "camera"
    idList = {370, 451}

(2) You might need to re-index whenever you add a new idea item to, or delete an
idea item from, the ideas bank.
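
The re-indexing in hint (2) can be sketched as follows. Since the interface of the provided AVLTree ADT is not shown here, this sketch substitutes std::map (typically implemented as a red-black tree, not an AVL tree) as a stand-in balanced search tree; IndexTree, indexIdea and unindexIdea are hypothetical names, and the words passed in are assumed to be already split and lowercased.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

struct Index {
    std::string key;          // the word
    std::vector<int> idList;  // ids of the ideas in which the word appears
};

// Stand-in for the AVL tree: a balanced search tree ordered by Index::key.
using IndexTree = std::map<std::string, Index>;

// Call after adding an idea: records the id under every word of the idea.
void indexIdea(IndexTree& tree, int id, const std::vector<std::string>& words) {
    for (const std::string& word : words) {
        Index& entry = tree[word];
        entry.key = word;
        if (std::find(entry.idList.begin(), entry.idList.end(), id) == entry.idList.end()) {
            entry.idList.push_back(id);
        }
    }
}

// Call after deleting an idea: drops the id everywhere and removes empty entries.
void unindexIdea(IndexTree& tree, int id) {
    for (auto it = tree.begin(); it != tree.end();) {
        std::vector<int>& ids = it->second.idList;
        ids.erase(std::remove(ids.begin(), ids.end(), id), ids.end());
        if (ids.empty()) {
            it = tree.erase(it);
        } else {
            ++it;
        }
    }
}
```

With an actual AVLTree ADT, the same two operations would be expressed through that ADT's own insert, search and remove functions.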

Task 6 (15%): Implement a search algorithm as a function of the IdeasBank class.
Whenever a user inputs a word (query), your algorithm should return the list of ids of all
relevant ideas, i.e., those ideas that contain this word either in their keywords or in
their content.

Hint: For an efficient implementation, you should use the indexing algorithm you implemented
in Task 5 to find all relevant ideas. You will receive only 5% if you do not use the
indexing technique (that is, if you simply call the search algorithm implemented in Task 2).
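
An index-based search reduces to a single lookup, as the sketch below suggests. It assumes the index maps lowercase words to id lists and again uses std::map as a stand-in for the AVL tree; WordIndex and search are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Assumed index layout: lowercase word -> ids of the ideas containing it.
using WordIndex = std::map<std::string, std::vector<int>>;

// Returns the ids of all ideas containing the query word, or an empty list.
std::vector<int> search(const WordIndex& index, std::string query) {
    std::transform(query.begin(), query.end(), query.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    auto it = index.find(query);
    if (it == index.end()) {
        return {};
    }
    return it->second;
}
```

The lookup in a balanced search tree takes time logarithmic in the number of distinct words, whereas calling the Task 2 check on every idea takes time proportional to the number of ideas.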

Task 7 (15%): Write a document that contains the following content:

1. List all the data structures you use in your program and briefly describe the
   characteristics of each data structure (2%).
2. Describe your search algorithm (implemented based on Task 2) and analyze its
   complexity using Big-O notation (Hint: Take the size of the ideas bank as n. Assume
   that the number of words in each idea is constant. Quote your code in your
   analysis rather than giving a pure text description.) (5%)
3. Describe your search algorithm (implemented in Task 6) and analyze its
   complexity using Big-O notation (Hint: Take the total number of words as n.
   Quote your code in your analysis rather than giving a pure text description.) (5%)
4. Justify your selection of data structures based on your complexity analysis
   (Hint: You may argue that, had you used another data structure, the complexity of
   searching would be higher. You must justify your claims.) (3%)

Task 8 (10%): Enhance your search algorithm so that a query may contain the Boolean
operators AND and OR. For instance, the query "word1 AND word2" means to find all
the ideas that contain both words. The query "word1 OR word2" means
to find all the ideas that contain either word1 or word2.
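
One way to evaluate such queries is to intersect or merge the id lists of the two words. The sketch below handles only a single word or exactly one operator, assumes each id list in the index is sorted in ascending order, and uses hypothetical names (WordIndex, idsFor, booleanSearch) with std::map standing in for the AVL tree.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumed index layout: word -> ids of the ideas containing it, sorted ascending.
using WordIndex = std::map<std::string, std::vector<int>>;

// Returns the id list for a word, or an empty list if the word is not indexed.
std::vector<int> idsFor(const WordIndex& index, const std::string& word) {
    auto it = index.find(word);
    if (it == index.end()) {
        return {};
    }
    return it->second;
}

// Evaluates "word", "word1 AND word2" or "word1 OR word2".
// Any other query shape yields an empty result.
std::vector<int> booleanSearch(const WordIndex& index, const std::string& query) {
    std::istringstream in(query);
    std::string left, op, right, extra;
    in >> left >> op >> right;
    if (left.empty() || (in >> extra)) {
        return {};
    }
    if (op.empty()) {
        return idsFor(index, left);
    }
    if (right.empty()) {
        return {};
    }
    const std::vector<int> a = idsFor(index, left);
    const std::vector<int> b = idsFor(index, right);
    std::vector<int> result;
    if (op == "AND") {
        std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(result));
    } else if (op == "OR") {
        std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(result));
    }
    return result;
}
```

Both std::set_intersection and std::set_union require sorted input ranges, which is why the sketch assumes the id lists are kept in ascending order.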

Task 9 (10%): Convert your code from C++ to Java or from Java to C++. The two sets of
code must be closely related, with similar data structures and similar implementations of
the algorithms.

5. Marking criteria

Grade             Requirement

Pass              Complete any set of tasks that give you 50-64% of total
                  marks. You may complete tasks 1-4, part of task 6 without
                  indexing.

Credit            Complete a set of tasks that give you 65-74% of total
                  marks. You must submit a document by completing all or
                  part of task 5.

Distinction       Complete a set of tasks, including tasks 5 & 6, which give
                  you 75-84% of total marks. You must submit a document
                  by completing all or part of task 7.

High Distinction  Complete a set of tasks, including tasks 5 & 6, which give
                  you more than 84% of marks. You must submit a document
                  by completing all questions in task 7.

No matter which level of program you have implemented, your program should be
executable. No incomplete program is acceptable.

6. Deliverables

6.1 Source code

You can use either C++ or Java to code your solution, but using both is highly recommended.
You can use any compiler or IDE to demonstrate your program provided it is available
during your demonstration. You are allowed to demonstrate your program on your
laptop. All comments must be deleted from your source code when you demonstrate.
The code should be written purely by you. No part of the code can be written by any
other person or copied from any other source, except for the AVLTree ADT.

6.2 Declaration

All students are required to submit a document containing the following:

DECLARATION
I hereby certify that no part of this assignment has been copied from any other student's work or
from any other source. No part of the code has been written/produced for me by another person
or copied from any other source.

I hold a copy of this assignment that I can produce if the original is lost or damaged.

6.3 Documentation

In addition to the declaration, students who seek a Distinction or higher are required
to submit a document with the content specified in Task 7. The document should be
formatted in Word and submitted to vUWS in Word or PDF. Print a hard copy and hand
it to me when you demonstrate your program.

7. Submission

Both the documentation and the source code should be submitted via vUWS before the
deadline. Your programs (.h, .cpp or .java) can be put in separate files (an executable file is
not required). All these files should be zipped into one file with your student id as the
zipped file name. Submissions that do not follow this format will not be accepted.

Email submission is not acceptable (strict rule).

8. Demonstration

You are required to demonstrate your program during your scheduled practical session
on 19 May 2017, or on any earlier day by appointment. I will check your code and your
understanding of the code. You will receive no marks if you fail the demonstration,
in particular if you are absent at the specified time. Note that it is the student's
responsibility to obtain the appropriate compilers or IDEs to run their programs. You are
allowed to run your program from your laptop. Feedback on your work will be
delivered orally during the demonstration. No further feedback or comments will be
given afterwards. Print a hard copy of your documentation and give it to me during the
demonstration. The demonstrated program should be the same as the one you submit,
except that all comments should be removed for the demonstration.
