One idea for preventing giants from cornering trivial patents is to publish ideas before anybody
uses them to apply for patents. A number of websites are available for publishing new
ideas, such as ideaaday.org, ideastorm.com and ideastormz.com. One problem with such
a website is that, after a certain period of accumulation, the bank of ideas becomes a big
mess, full of similar, duplicated and trivial ideas. How to maintain such a website is a big
challenge. This assignment is to write a C++ or Java application that simulates such a
website (you are not required to create an actual website) and provides solutions to some of
these problems. The technology used in this assignment is also useful for implementing
search engines [1].
In this assignment you are asked to write a C++ or Java application that can be used to
maintain a simple Ideas Bank (a collection of ideas) [2]. The program should be able to
accept a user's input of a new idea, show all related ideas in the bank, and search the bank
for related ideas. To guarantee efficient searching, an indexing technique must be applied.
An index is a data structure widely used in the implementation of search engines [3]. The
simplest index data structure is the so-called inverted index, which stores a map from each
word to the list of text sources (documents or files) that contain it [4].
[1] http://en.wikipedia.org/wiki/Web_search_engine
[2] http://en.wikipedia.org/wiki/Ideas_bank
[3] http://en.wikipedia.org/wiki/Search_engine_indexing
[4] http://en.wikipedia.org/wiki/Inverted_index
For instance, consider the following texts:
An inverted index for these texts is the following map (Table 1), which lists all the words
that occur in the texts, each followed by the set of indices of the texts that contain that
word:
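The following C++ sketch shows one way such a map could be built. It is an illustration, not a required design: it assumes a std::map from each word to a std::set of text indices, splits words on whitespace only, and compares them case-sensitively. The names InvertedIndex and buildInvertedIndex are hypothetical.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Maps each word to the (sorted) set of indices of the texts that contain it.
using InvertedIndex = std::map<std::string, std::set<int>>;

// Hypothetical helper: builds an inverted index over a list of texts.
// Words are split on whitespace only; punctuation and case are not normalised.
InvertedIndex buildInvertedIndex(const std::vector<std::string>& texts) {
    InvertedIndex index;
    for (int i = 0; i < static_cast<int>(texts.size()); ++i) {
        std::istringstream words(texts[i]);
        std::string word;
        while (words >> word) {
            index[word].insert(i);  // a set ignores repeated occurrences within one text
        }
    }
    return index;
}
```

For three made-up texts, "it is what it is", "what is it" and "it is a banana", the resulting map sends "what" to {0, 1} and "banana" to {2}, and contains five distinct words in total.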
When we search for a word, say "what", in the text files, we consult the inverted index to
find which files contain the word, instead of traversing all the original text files. This
makes the search much more efficient. Note that the inverted index must be updated
whenever a change is made to the text files.
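To make the contrast concrete, here is a minimal C++ sketch, again assuming a std::map from words to sets of text indices (a stand-in, not the AVL tree that Task 5 requires). searchWord answers a query with a single O(log n) lookup instead of a scan over every text, while addText and removeText keep the index consistent when texts change. All three function names are hypothetical.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

using InvertedIndex = std::map<std::string, std::set<int>>;

// Looks a word up in the index instead of scanning every text.
// Returns an empty set when the word occurs in no text.
std::set<int> searchWord(const InvertedIndex& index, const std::string& word) {
    auto it = index.find(word);
    return it == index.end() ? std::set<int>{} : it->second;
}

// Call after adding text number `id` so the index stays up to date.
void addText(InvertedIndex& index, int id, const std::string& text) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) index[word].insert(id);
}

// Call before deleting text number `id` (or before re-adding an edited version).
void removeText(InvertedIndex& index, int id, const std::string& text) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        auto it = index.find(word);
        if (it == index.end()) continue;          // already removed (repeated word)
        it->second.erase(id);
        if (it->second.empty()) index.erase(it);  // drop words that no longer occur anywhere
    }
}
```

An edited text can be handled as removeText with the old version followed by addText with the new one.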
An inverted index not only makes search more efficient but also helps find relevant
ideas. We say that an idea X is relevant to another idea Y if one of Y's keywords appears
in X (either as a keyword or in its content).
An idea object contains the following information: id, proposer, keywords and content.
The following are two examples of idea objects:
You can add more features (attributes) to an idea object if you like, but it must include
the above four data items.
Based on this definition of relevance, idea 370 is relevant to idea 451 because one of
451's keywords, "camera", appears in 370's content. By the same criterion, idea 451 is not
relevant to idea 370, because none of 370's keywords appears in idea 451.
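The relevance test can be sketched directly from this example. The C++ below is a minimal illustration that assumes keywords are single words and that a keyword "appears in the content" when it equals one of the content's whitespace-separated words; the Idea struct and the functions mentions and isRelevant are hypothetical names, and the two sample ideas in the check are invented to reproduce the 370/451 relationship.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

struct Idea {
    int id;
    std::string proposer;
    std::vector<std::string> keywords;
    std::string content;
};

// True if `word` is one of the idea's keywords or a whitespace-separated word of its content.
bool mentions(const Idea& idea, const std::string& word) {
    if (std::find(idea.keywords.begin(), idea.keywords.end(), word) != idea.keywords.end())
        return true;
    std::istringstream words(idea.content);
    std::string w;
    while (words >> w)
        if (w == word) return true;
    return false;
}

// x is relevant to y if at least one of y's keywords appears in x.
bool isRelevant(const Idea& x, const Idea& y) {
    return std::any_of(y.keywords.begin(), y.keywords.end(),
                       [&x](const std::string& k) { return mentions(x, k); });
}
```

Note that the relation is not symmetric: isRelevant(a, b) and isRelevant(b, a) can differ, exactly as in the example above.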
This assignment consists of a number of tasks. You do not have to complete all of them;
you can choose which tasks to do. Your mark will be the sum of the marks for all the
tasks you complete, up to a maximum of 100%.
Task 1 (20%): Create a class named Idea to model an idea object. Each object of the
class represents a single idea. The class should contain at least four data items: id,
proposer, keywords and content. You are free to choose the data type of each item.
Make sure that the id of each idea object is unique.
Hint: You may use int for id, string for proposer and content, and an array or vector of
strings for keywords. For simplicity, you may assume that each keyword is a single
word rather than a phrase.
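One way to guarantee unique ids, sketched here in C++ under the types suggested in the hint, is a static counter that hands out the next id whenever an Idea is constructed. The accessor names (getId, getProposer, and so on) and the counter nextId_ are hypothetical choices, not part of the specification.

```cpp
#include <string>
#include <utility>
#include <vector>

class Idea {
public:
    // Each new Idea receives the next id from the shared counter.
    Idea(std::string proposer, std::vector<std::string> keywords, std::string content)
        : id_(nextId_++), proposer_(std::move(proposer)),
          keywords_(std::move(keywords)), content_(std::move(content)) {}

    int getId() const { return id_; }
    const std::string& getProposer() const { return proposer_; }
    const std::vector<std::string>& getKeywords() const { return keywords_; }
    const std::string& getContent() const { return content_; }

private:
    static int nextId_;  // next id to hand out; shared by all Idea objects
    int id_;
    std::string proposer_;
    std::vector<std::string> keywords_;
    std::string content_;
};

int Idea::nextId_ = 1;
```

The counter restarts at 1 every time the program runs, so if ideas with stored ids are loaded from a file, the counter would also need to be advanced past the largest loaded id.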
Task 2 (5%): Implement a method in class Idea that checks whether a given word is in the
list of keywords or appears in the content.
Hint: You may implement two small functions, one that searches the keywords and another
that searches the content, and then combine them.
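Following this hint, a C++ sketch might split the check into two private helpers. It assumes that "appears in the content" means an exact match with one whitespace-separated word, so "cam" does not match "camera" and a word followed by punctuation (such as "camera.") does not match either. The method names containsWord, inKeywords and inContent are hypothetical, and accessors are omitted for brevity.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class Idea {
public:
    Idea(int id, std::string proposer, std::vector<std::string> keywords, std::string content)
        : id_(id), proposer_(std::move(proposer)), keywords_(std::move(keywords)),
          content_(std::move(content)) {}

    // Task 2: true if `word` is a keyword or occurs in the content.
    bool containsWord(const std::string& word) const {
        return inKeywords(word) || inContent(word);
    }

private:
    bool inKeywords(const std::string& word) const {
        return std::find(keywords_.begin(), keywords_.end(), word) != keywords_.end();
    }

    // Whole-word match on whitespace-separated tokens.
    bool inContent(const std::string& word) const {
        std::istringstream tokens(content_);
        std::string token;
        while (tokens >> token)
            if (token == word) return true;
        return false;
    }

    int id_;
    std::string proposer_;
    std::vector<std::string> keywords_;
    std::string content_;
};
```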
Hint: You may use an STL container such as vector or list to store idea objects.
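As a rough sketch of this hint, an IdeasBank could simply wrap a std::vector of ideas. The C++ below uses a plain Idea struct for brevity, and its member functions addIdea, removeIdea, findById and size are hypothetical; the actual task may ask for different operations.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Idea {
    int id;
    std::string proposer;
    std::vector<std::string> keywords;
    std::string content;
};

class IdeasBank {
public:
    void addIdea(const Idea& idea) { ideas_.push_back(idea); }

    // Removes the idea with the given id; returns false if there is none.
    bool removeIdea(int id) {
        auto it = std::find_if(ideas_.begin(), ideas_.end(),
                               [id](const Idea& i) { return i.id == id; });
        if (it == ideas_.end()) return false;
        ideas_.erase(it);
        return true;
    }

    // Returns a pointer to the idea with the given id, or nullptr if absent.
    // The pointer becomes invalid after the next addIdea or removeIdea.
    const Idea* findById(int id) const {
        for (const Idea& i : ideas_)
            if (i.id == id) return &i;
        return nullptr;
    }

    std::size_t size() const { return ideas_.size(); }

private:
    std::vector<Idea> ideas_;  // std::list would also work and keeps pointers stable on erase
};
```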
Task 4 (5%): Implement a method in class IdeasBank to input ideas from text files.
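The file format is not specified, so the following C++ sketch assumes a hypothetical one-idea-per-line layout, `id|proposer|keyword1,keyword2,...|content`, and skips blank or malformed lines. readIdeas takes any std::istream, so it works with a std::ifstream as well as with a string stream in tests; the function name and format are assumptions.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Idea {
    int id;
    std::string proposer;
    std::vector<std::string> keywords;
    std::string content;
};

// Assumed (hypothetical) format, one idea per line:
//   id|proposer|keyword1,keyword2,...|content
// Blank lines and lines that do not parse are skipped.
std::vector<Idea> readIdeas(std::istream& in) {
    std::vector<Idea> ideas;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string idText, proposer, keywordText, content;
        if (!std::getline(fields, idText, '|') || !std::getline(fields, proposer, '|') ||
            !std::getline(fields, keywordText, '|') || !std::getline(fields, content))
            continue;  // fewer than four fields
        Idea idea;
        try {
            idea.id = std::stoi(idText);
        } catch (const std::exception&) {
            continue;  // id field is not a valid number
        }
        idea.proposer = proposer;
        idea.content = content;
        std::istringstream kw(keywordText);
        std::string keyword;
        while (std::getline(kw, keyword, ','))
            if (!keyword.empty()) idea.keywords.push_back(keyword);
        ideas.push_back(idea);
    }
    return ideas;
}
```

Usage with a real file would look like `std::ifstream file("ideas.txt"); std::vector<Idea> ideas = readIdeas(file);` (with `<fstream>` included), where "ideas.txt" is a placeholder name.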
Task 5 (30%): Implement an indexing algorithm for the ideas bank using an inverted index.
You are required to use an AVL tree to store the index entries. The algorithm can be
implemented as part of the IdeasBank class or in a new class. You do not have to write
your own AVLTree ADT; you can download an AVLTree ADT (for C++ or Java) from vUWS
or from the Internet. You may change any part of the provided code if necessary.
Hint: (1) The index entries stored in the AVL tree must be structured according to the
requirements of the AVLTree ADT you use. For example, if you use the provided C++
ADT, I suggest the following data structure for index entries:
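#include <string>
#include <vector>

// One entry of the inverted index, stored as the data of an AVL tree node.
struct Index {
    std::string key;          // a word; used as the key of the AVL tree
    std::vector<int> idList;  // ids of the ideas in which the word appears
};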
where key represents a word and idList is the list of ids of the ideas in which the word
appears. The attribute key can then be used as the key of the AVL tree. For instance,
based on the example ideas in Section 3, we have an index entry with the following data:
key = "camera"
idList = {370, 451}
(2) You may need to update the index whenever you add an idea to, or delete an idea
from, the ideas bank.
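If you prefer to see the mechanics rather than download an ADT, the following is a minimal C++ sketch of an AVL tree keyed on Index::key, supporting insertion and lookup only (deletion is omitted). It is not the provided vUWS ADT, whose interface may differ; the class IndexTree and its methods add, find and height are hypothetical. Rebalancing uses the standard four AVL rotation cases.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Index {
    std::string key;
    std::vector<int> idList;
};

// Minimal AVL tree of Index entries, ordered by key (insert and lookup only).
class IndexTree {
public:
    IndexTree() = default;
    IndexTree(const IndexTree&) = delete;             // owns raw nodes: no copying
    IndexTree& operator=(const IndexTree&) = delete;
    ~IndexTree() { destroy(root_); }

    // Records that `word` occurs in idea `id`, creating the entry if needed.
    void add(const std::string& word, int id) { root_ = insert(root_, word, id); }

    // Returns the entry for `word`, or nullptr if the word is not indexed.
    const Index* find(const std::string& word) const {
        for (Node* n = root_; n != nullptr; n = word < n->data.key ? n->left : n->right)
            if (n->data.key == word) return &n->data;
        return nullptr;
    }

    int height() const { return h(root_); }

private:
    struct Node {
        Index data;
        Node* left = nullptr;
        Node* right = nullptr;
        int height = 1;
    };
    Node* root_ = nullptr;

    static int h(Node* n) { return n ? n->height : 0; }
    static void update(Node* n) { n->height = 1 + std::max(h(n->left), h(n->right)); }

    static Node* rotateRight(Node* y) {
        Node* x = y->left;
        y->left = x->right;
        x->right = y;
        update(y);
        update(x);
        return x;
    }
    static Node* rotateLeft(Node* x) {
        Node* y = x->right;
        x->right = y->left;
        y->left = x;
        update(x);
        update(y);
        return y;
    }

    static Node* insert(Node* n, const std::string& word, int id) {
        if (n == nullptr) {
            Node* fresh = new Node;
            fresh->data.key = word;
            fresh->data.idList.push_back(id);
            return fresh;
        }
        if (word == n->data.key) {  // existing word: just record the id once
            std::vector<int>& ids = n->data.idList;
            if (std::find(ids.begin(), ids.end(), id) == ids.end()) ids.push_back(id);
            return n;
        }
        if (word < n->data.key) n->left = insert(n->left, word, id);
        else n->right = insert(n->right, word, id);
        update(n);
        int balance = h(n->left) - h(n->right);
        if (balance > 1) {  // left-heavy
            if (word > n->left->data.key) n->left = rotateLeft(n->left);  // left-right case
            return rotateRight(n);
        }
        if (balance < -1) {  // right-heavy
            if (word < n->right->data.key) n->right = rotateRight(n->right);  // right-left case
            return rotateLeft(n);
        }
        return n;
    }

    static void destroy(Node* n) {
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
};
```

Indexing an idea then means calling add(word, id) for each of its keywords and content words; deleting an idea would additionally require removing its id from the affected entries.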
Hint: For an efficient implementation, you should use the indexing algorithm implemented
in Task 5 to find all relevant ideas. You will receive only 5% of the marks if you do not use
the indexing technique (i.e., if you simply call the search method implemented in Task 2).
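With an index in place, the ideas relevant to a given idea Y are the union of the index entries for Y's keywords, since each entry lists exactly the ideas in which that keyword appears. The C++ sketch below uses a std::map as a stand-in for the AVL tree and excludes Y itself from the result; the names WordIndex and findRelevant are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Stand-in for the AVL-tree index: word -> ids of the ideas containing it
// (as a keyword or in the content).
using WordIndex = std::map<std::string, std::set<int>>;

// Ids of all ideas relevant to the idea (ideaId, keywords): every idea in which
// at least one of those keywords appears. The idea itself is excluded.
std::set<int> findRelevant(const WordIndex& index, int ideaId,
                           const std::vector<std::string>& keywords) {
    std::set<int> result;
    for (const std::string& k : keywords) {
        auto it = index.find(k);
        if (it != index.end()) result.insert(it->second.begin(), it->second.end());
    }
    result.erase(ideaId);
    return result;
}
```

This costs one index lookup per keyword instead of one containsWord call per idea in the bank.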
Task 8 (10%): Enhance your search algorithm so that a query may contain the Boolean
operators AND and OR. For instance, the query "word1 AND word2" means finding all
the ideas that contain both words, and the query "word1 OR word2" means finding all
the ideas that contain either word1 or word2.
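Because index lookups return sorted sets of ids, AND maps to set intersection and OR to set union. The C++ sketch below assumes a std::map index (a stand-in for the AVL tree) and evaluates operators strictly left to right, with no precedence and no parentheses; a trailing operator without a following word is ignored and an unknown operator yields an empty result. The names WordIndex, lookup and evaluate are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

using WordIndex = std::map<std::string, std::set<int>>;

static std::set<int> lookup(const WordIndex& index, const std::string& word) {
    auto it = index.find(word);
    return it == index.end() ? std::set<int>{} : it->second;
}

// Evaluates queries such as "camera AND phone" or "camera OR lens OR phone",
// applying the operators strictly from left to right.
std::set<int> evaluate(const WordIndex& index, const std::string& query) {
    std::istringstream tokens(query);
    std::string word, op;
    if (!(tokens >> word)) return {};
    std::set<int> result = lookup(index, word);
    while (tokens >> op >> word) {
        std::set<int> next = lookup(index, word);
        std::set<int> combined;
        if (op == "AND")
            std::set_intersection(result.begin(), result.end(), next.begin(), next.end(),
                                  std::inserter(combined, combined.begin()));
        else if (op == "OR")
            std::set_union(result.begin(), result.end(), next.begin(), next.end(),
                           std::inserter(combined, combined.begin()));
        else
            return {};  // unknown operator: treat the query as invalid
        result = std::move(combined);
    }
    return result;
}
```

For example, with "camera" indexed in ideas {370, 451}, "phone" in {370} and "lens" in {451, 502}, the query "camera AND lens OR phone" yields {370, 451}.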
Task 9 (10%): Convert your code from C++ to Java or from Java to C++. The two versions
must be closely related, with similar data structures and similar implementations of the
algorithms.
4. Marking criteria
Grade            | Requirement
Pass             | Complete any set of tasks that gives you 50-64% of the total marks. For example, you may complete Tasks 1-4 and part of Task 6 without indexing.
High Distinction | Complete a set of tasks, including Tasks 5 and 6, that gives you more than 84% of the total marks. You must also submit a document answering all the questions in Task 7.
Whichever level of program you implement, your program must be executable. Incomplete
programs are not acceptable.
5. Deliverables
You may use either C++ or Java to code your solution, but using both is highly recommended.
You can use any compiler or IDE to demonstrate your program, provided it is available
during your demonstration. You are allowed to demonstrate your program on your
laptop. All comments must be deleted from your source code when you demonstrate.
The code must be written entirely by you. No part of the code may be written by any
other person or copied from any other source, except for the AVLTree ADT.
5.2 Declaration
DECLARATION
I hereby certify that no part of this assignment has been copied from any other student's work or
from any other source. No part of the code has been written/produced for me by another person
or copied from any other source.
I hold a copy of this assignment that I can produce if the original is lost or damaged.
5.3 Documentation
In addition to the declaration, students seeking a Distinction or higher are required
to submit a document with the content specified in Task 7. The document should be
formatted in Word and submitted to vUWS in Word or PDF format. Print a hard copy and
hand it to me when you demonstrate your program.
6. Submission
Both the documentation and the source code should be submitted via vUWS before the
deadline. Your programs (.h, .cpp or .java) can be put in separate files (an executable file
is not required). All these files should be zipped into one file, named with your student id.
Submissions that do not follow this format will not be accepted.
7. Demonstration
You are required to demonstrate your program during your scheduled practical session
on 19 May 2017, or on any earlier day by appointment. I will check your code and your
understanding of the code. You will receive no marks if you fail the demonstration,
especially if you are absent at the specified time. Note that it is the student's
responsibility to obtain the appropriate compilers or IDEs to run their programs. You are
allowed to run your program on your laptop. Feedback on your work will be
delivered orally during the demonstration; no further feedback or comments will be
given afterwards. Print a hard copy of your documentation and give it to me during the
demonstration. The program you demonstrate should be the same as the one you submit,
except that all comments should be removed for the demonstration.