iASSIST: An Intelligent Assistance System

Mr. Shah Sahil K. (ME-II Computer Engg.), Vidya Pratishthan's College of Engineering, Baramati.
Prof. Takale Sheetal A., Assistant Professor, Information Technology Department, Vidya Pratishthan's College of Engineering, Baramati.
Key terms
Intelligent Helpdesk, Semantic Similarity, Web Search Results, Document Summarization

1.1. Existing System

1.2. Motivation

• Help desk Systems: Many industries use help desk systems/customer care for solving various customer queries. Companies provide the solution to the customer's problem in three ways, viz. an online help desk system, a customer representative, or a customer care representative (telephonic enquiry). Many of the problems/queries solved by these systems are answered by referring to solutions for similar problems faced previously by customers, or by asking the user different questions related to the problem and narrow-

nonnegative matrix factorization (SNMF) algorithm and request-focused summarization technique.

1.4. Features of Proposed System

• It automatically finds the "problem-solution" pattern from web search engine results. Since there is no need to maintain an up-to-date case history, the system can address queries from any domain.

• Semantic Role Labeling and a semantic dictionary are used to extract the semantics of the sentences and of the query.
quality, decision confidence, perceived ease of use and perceived usefulness.

Drawbacks:
This model was developed for an online shopping system, but it considers only very small domains when tracing different decisions. The model was quantitative and structural, considering input, process, and output variables, but it does not trace the processes themselves. Another approach would be to model the user as a Bayesian information processor; this approach would require updating probabilistic beliefs as users acquire information.

Incremental Case-Based Reasoning (I-CBR) is an incremental case-retrieval technique based on information-theoretic analysis. The technique is incremental in the sense that it does not require the entire target case description to be available at the start; instead, it builds the description up by asking the user focused questions. The ordering of these questions reflects their power to discriminate effectively between the set of candidate cases at each step.

Drawbacks:
When the description of cases or items becomes complicated, these case-based systems suffer from the curse of dimensionality, and the similarity/distance between cases or items becomes difficult to measure. Furthermore, the similarity measurements used in these systems are usually based on keyword matching, which lacks semantic analysis of customer requests and existing cases.

2.1.2. Conversational Recommender Systems with Feature Selection [10]

In these systems, given an initial user query, the recommender system asks the user to provide additional features describing the searched products, in order to generate questions/features that a user would likely reply to and that, if replied to, would effectively reduce the result size of the initial query. Classical entropy-based feature selection methods can be effective in terms of result-size reduction, but they select questions uncorrelated with user needs and therefore unlikely to be replied to. Feature-selection methods that combine feature entropy with an appropriate measure of feature relevance can better capture questions related to the user's needs and can avoid unwanted questions.

Drawbacks:
Similar to the case-based systems, similarity is measured based on keyword matching, which has difficulty understanding the text deeply, i.e. it does not consider contextual relevance between user requests and stored past cases.
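The entropy-versus-relevance trade-off described above can be sketched with a toy example. The catalogue, the relevance scores, and the scoring rule below are all hypothetical; they only illustrate how weighting a feature's entropy by a relevance measure steers question selection toward questions a user is likely to answer.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy of a feature's value distribution over candidate items."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def next_question(items, features, relevance):
    """Pick the feature whose answer best splits the remaining items,
    weighting entropy by an externally supplied relevance score in [0, 1]."""
    def score(f):
        return entropy([item[f] for item in items]) * relevance.get(f, 0.0)
    return max(features, key=score)

# Hypothetical catalogue of laptops still matching the user's initial query.
items = [
    {"brand": "A", "ram": 8,  "screen": 13},
    {"brand": "A", "ram": 16, "screen": 15},
    {"brand": "B", "ram": 8,  "screen": 13},
    {"brand": "B", "ram": 16, "screen": 17},
]
# Relevance scores stand in for how likely the user is to answer each question.
relevance = {"brand": 0.2, "ram": 0.9, "screen": 0.5}
print(next_question(items, ["brand", "ram", "screen"], relevance))
```

Here "screen" has the highest raw entropy, but the relevance weighting makes "ram" the better question to ask, which is the combined criterion the section describes.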
Table 1: Mathematical Model of Proposed System-I

Existing search engines often return a long list [...] according to the context. Top-ranking documents are clustered using the Minimum Description Length (MDL) principle [1]. The Sentence Clustering module groups sentences with similar meaning into a cluster using Symmetric Non-negative Matrix Factorization (SNMF) [3]. The Sentence Cluster Summarization module then selects the most relevant sentences [2] from each cluster to form a concise summary, which is presented as the reference solution to the user.
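The pipeline just described (rank, cluster, summarize) can be sketched as a skeleton. All component functions below are hypothetical stand-ins; only the data flow mirrors the modules named above.

```python
def iassist_pipeline(query, documents, rank, cluster, summarize):
    """Skeleton of the processing flow: rank retrieved documents against
    the query, cluster the top-ranking ones by context, then pick the
    most relevant content from each cluster as the reference solution."""
    ranked = sorted(documents, key=lambda d: rank(query, d), reverse=True)
    top = ranked[:3]                # keep the top-ranking documents
    clusters = cluster(top)         # e.g. MDL-based document clustering
    return [summarize(c) for c in clusters]

# Toy stand-ins for the real components.
rank = lambda q, d: len(set(q.split()) & set(d.split()))   # word overlap
cluster = lambda docs: [docs]                              # single cluster
summarize = lambda docs: docs[0]                           # best document

solution = iassist_pipeline(
    "add memory to computer",
    ["how to add memory", "printer setup", "memory upgrade guide for a computer"],
    rank, cluster, summarize,
)
print(solution)  # ['how to add memory']
```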
Table 2: Mathematical Model of Proposed System-II
This is an important step towards making sense of the meaning of a sentence. A semantic representation of this sort is at a higher level of abstraction than a syntax tree. For instance, the sentence "The book was sold by Riya to Abbas" has a different syntactic form, but the same semantic roles.

In order to analyze the user query and the documents, the semantic roles of each sentence are computed by passing the sentences through a semantic role parser. This helps in categorizing the documents based on their semantic importance with respect to the user query. In iASSIST, NEC SENNA is used as the semantic role labeler, which is based on the PropBank [9] semantic annotation. The labeler labels each verb in a sentence with its propositional arguments, and the labelling for each particular verb is called a "frame." Therefore, for each sentence, the number of frames generated by the parser equals the number of verbs in the sentence. A set of abstract arguments given by the labeler indicates the semantic role of each term in a frame. In general, Arg[m] represents the role of a term in the given sentence, where m indicates the argument number within the sentence; for example, Arg0 is the actor, and Arg-NEG indicates negation. A given sentence is parsed into different arguments by the semantic role labeler with the syntax shown below.

The documents retrieved from the search engine must be ranked based on their semantic importance to the input user query. In order to rank these documents, similarity scores between the retrieved documents and the input user query are computed. Simple keyword-based similarity measurements, such as the cosine similarity, cannot capture semantic similarity. Thus, this system uses a method that calculates the semantic similarity between the user query and the sentences in the retrieved documents based on semantic role analysis. Along with this, the similarity computation uses WordNet in order to better capture semantically related words. Figure 3 gives the algorithmic design of SLSS calculation and top document ranking.

Algorithm: Sentence-Level Semantic Similarity Calculation and Top Document Ranking
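A rough sketch of the frame idea, with hand-written PropBank-style labels standing in for real SRL output (a real pipeline would obtain these from a labeler such as SENNA). The point is that the active and passive variants of the example sentence yield the same frame, so a frame-overlap similarity sees them as identical where a surface comparison would not.

```python
# Frames as a hypothetical SRL output: verb -> {role: term}.
# "Riya sold the book to Abbas" (active) vs.
# "The book was sold by Riya to Abbas" (passive).
active  = {"sold": {"Arg0": "Riya", "Arg1": "book", "Arg2": "Abbas"}}
passive = {"sold": {"Arg0": "Riya", "Arg1": "book", "Arg2": "Abbas"}}

def frame_similarity(f1, f2):
    """Jaccard overlap of (verb, role, term) triples from two frame sets."""
    t1 = {(v, r, t) for v, roles in f1.items() for r, t in roles.items()}
    t2 = {(v, r, t) for v, roles in f2.items() for r, t in roles.items()}
    return len(t1 & t2) / max(len(t1 | t2), 1)

print(frame_similarity(active, passive))  # 1.0: same roles despite different syntax
```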
3.2.3. Document Clustering Using MDL Principle

The identified top-ranking cases are all relevant to the user query, but these relevant cases may actually belong to different categories. For example, if the user query is "Give information about Taj Mahal", the relevant cases may involve Taj Mahal as a tea brand, Taj as a five-star hotel, or Taj Mahal as the white marble mausoleum. Therefore, it is necessary to further group these cases into different contexts. The proposed system makes use of the Minimum Description Length (MDL) principle in order to cluster documents with similar meaning into one group. The MDL principle states that "the best model inferred from given data is the one which minimizes the length of the model, in bits, plus the length of the encoding of the data, in bits." Figure 4 describes the detailed document clustering steps using the MDL approach.

MDL Cost Equation:

MDL Cost of C = (β/α) · (no. of 1s in M_TC + no. of 1s and −1s in M_Δ) + |D| · log2 |D|

where α and β are computed using the M_TD matrix:

β = Σ_{x ∈ {0,1}} −Pr(x) log2 Pr(x)
α = total no. of 1s in matrix M_TD

Algorithm: Document Clustering using MDL Principle

Algorithm AggloMDL(D)
Begin
1. Let C = {c1, c2, c3, ..., cn}, with ci = ({di})
2. Select the best cluster pair (ci, cj) from C for merging and form a new cluster ck:
   (ci, cj, ck) := GetBestPair(C)
3. while (ci, cj, ck) is not empty do {
4.   C := C − {ci, cj} ∪ {ck}
5.   (ci, cj, ck) := GetBestPair(C);
6. }
7. return C
End
procedure GetBestPair(C)
Begin
1. MDLcost_min := ∞
2. for each pair (ci, cj) of clusters in C do
3. {
4.   (MDLcost, ck) := GetMDLCost(ci, cj, C);
     /* GetMDLCost returns the optimal MDL cost when ck is made by merging ci and cj */
5.   if MDLcost < MDLcost_min then
6.   {
7.     MDLcost_min := MDLcost;
8.     (ci^B, cj^B, ck^B) := (ci, cj, ck);
9.   }
10. }
11. return (ci^B, cj^B, ck^B)
End
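The greedy merge loop of AggloMDL and GetBestPair can be sketched as follows. The cost function here is a toy stand-in, not the MDL cost defined above; only the "merge the best pair until no merge improves the cost" structure mirrors the pseudocode.

```python
def agglo_mdl(docs, cost):
    """Agglomerative clustering in the shape of AggloMDL: start with
    singleton clusters, repeatedly merge the pair whose merge lowers
    the model cost the most, and stop when no merge helps.
    `cost` is a stand-in for the MDL cost of a clustering."""
    clusters = [frozenset([d]) for d in docs]
    while True:
        best, best_cost = None, cost(clusters)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = (
                    [c for k, c in enumerate(clusters) if k not in (i, j)]
                    + [clusters[i] | clusters[j]]
                )
                c = cost(merged)
                if c < best_cost:
                    best, best_cost = merged, c
        if best is None:
            return clusters
        clusters = best

# Toy cost: a fixed charge per cluster, plus a penalty for mixing
# documents with different "term profiles" (the first letter here).
def toy_cost(clustering):
    penalty = 0
    for c in clustering:
        kinds = {d[0] for d in c}
        penalty += 2 + 3 * (len(kinds) - 1)
    return penalty

docs = ["a1", "a2", "b1", "b2"]
result = {tuple(sorted(c)) for c in agglo_mdl(docs, toy_cost)}
print(result)  # {('a1', 'a2'), ('b1', 'b2')}
```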
procedure GetMDLCost(ci, cj, C)
Begin
1. Dk = Di ∪ Dj;
3. ck = (Dk);
4. C = C − {ci, cj} ∪ {ck};
5. MDL := approximate MDL cost of C by the MDL Cost Equation
6. return (MDL, ck);
End

greater than zero. This factorization is carried out in order to extract important objects. As the input matrix is symmetric, we use the SNMF algorithm here. The stepwise procedure to cluster sentences using SNMF is shown in figure 5.

Algorithm: Symmetric Non-negative Matrix Factorization (SNMF)
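A minimal numerical sketch of SNMF. It assumes the multiplicative update rule of Ding et al. [6] for factorizing a symmetric nonnegative similarity matrix W as H·Hᵀ; the similarity matrix, cluster count, and iteration budget below are made up for illustration.

```python
import numpy as np

def snmf(W, k, iters=500, seed=0, eps=1e-9):
    """Symmetric NMF: approximate a symmetric nonnegative matrix W (n x n)
    as H @ H.T with H >= 0 (n x k), using the damped multiplicative update
    H <- H * (1/2) * (1 + (W H) / (H H^T H)) from Ding et al.
    The row-wise argmax of H gives a cluster assignment."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    H = rng.random((n, k)) + 0.1
    for _ in range(iters):
        WH = W @ H
        HHtH = H @ (H.T @ H)
        H = H * 0.5 * (1.0 + WH / (HHtH + eps))
    return H

# Toy sentence-similarity matrix with two obvious blocks.
W = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
])
H = snmf(W, k=2)
print(H.argmax(axis=1))  # cluster label per sentence
```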
2. Identifying important discrimination between documents and covering the informative content as much as possible.

Internal Similarity Measure:

F1(Si) = (1 / (N − 1)) · Σ_{Sj ∈ Ck − Si} Sim(Si, Sj)

Figure 7 shows the flow of the proposed system.

Figure 7: Flow of Proposed System
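The internal similarity measure can be computed directly from its definition. The Jaccard word overlap below is only a stand-in for the system's actual Sim() function, and the cluster contents are hypothetical.

```python
def internal_similarity(cluster, sim, i):
    """F1(Si): average similarity of sentence i to the other
    sentences in its cluster Ck (N = cluster size)."""
    n = len(cluster)
    return sum(sim(cluster[i], s_j)
               for k, s_j in enumerate(cluster) if k != i) / (n - 1)

def jaccard(a, b):
    """Toy word-overlap similarity standing in for the real Sim()."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

cluster = ["add memory to computer", "add more memory", "computer memory upgrade"]
print(round(internal_similarity(cluster, jaccard, 0), 3))  # 0.4
```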
Questions/queries were selected from different contexts, and the search results returned by the search engine were used as the dataset. During the user survey, users were asked to manually generate a solution for the selected queries by referring to the dataset. The sentences in the solution generated by the user are considered the relevant sentence set.

In this section, some illustrative scenarios are presented, in which the proposed request-focused case-ranking results are analyzed against user-evaluated summarization, which is assumed to have high accuracy.

Scenario 1: Give information about Taj Mahal.
Table 4 shows the concise solution generated by iASSIST and the manually evaluated summary, respectively. For iASSIST, the word "give" is a verb, and its corresponding semantic role is "rel." Therefore, cases related to the keyword "give" receive a lower similarity score than cases containing actual information about the Taj Mahal.
Scenario 2: The computer in the printing room needs to add memory.

Table 4: Top-ranking Summary Samples by Manual Evaluation and iASSIST in Scenario I

Table 5: Top-ranking Summary Samples by Manual Evaluation and iASSIST in Scenario II

In this scenario (Table 5), the search engine takes "printing" as the keyword and returns many cases related to printing or printers as the search results. Obviously, these are not results that are useful to the user. In iASSIST, while ranking the different cases, the semantic role of the word "printing" is the location tag, which ensures that cases related merely to "printing" are not retrieved. Instead, more importance is given to the term "add", since its semantic role is rel (verb). This helps in returning cases related to how to add memory to a computer.

The performance of the proposed system is analysed by comparing the solutions generated by iASSIST with the results of standard automated summarization tools. Performance of iASSIST is measured using the standard IR measures precision and recall:

Recall = |Sman ∩ Ssys| / |Sman|

Precision = |Sman ∩ Ssys| / |Ssys|

where
Sman : set of sentences selected by manual evaluation
Ssys : set of sentences selected by iASSIST in the final summary

We assume that the sentences selected by the user during manual evaluation are always relevant from the user's perspective. Thus, Sman is considered the relevant sentence set. Table 6 shows precision and recall values for sample user queries. Figures 7 and 8 show the average precision and recall of the two techniques. Graphically, recall and precision are shown in figures 9 and 10 for different user queries. The higher precision and recall values of iASSIST compared to automated summarization tools demonstrate that the semantic similarity calculation better captures the meanings of user requests and of the case documents returned by the search engine. A comparison of the proposed iASSIST system with current helpdesk systems is shown in Table 7.

Table 6: Performance Analysis of iASSIST

From the analysis, it is observed that user satisfaction can be improved by capturing semantically related cases rather than only keyword-matched cases. From the recall and precision values obtained for the sample scenarios, we conclude that combining the MDL principle, which groups documents according to different contexts, with the SNMF clustering algorithm can help users easily find their desired solutions across multiple physical pages. The problem of maintaining an up-to-date history of past cases is solved by using the search engine as the database, and the user can pose a problem from any domain.

5. Conclusion

The proposed system presents a new approach to the problem of intelligent help desk systems and addresses the problem of search result summarization. The proposed iASSIST system provides its users a single point of access for their problems by providing solutions from different domains. The system automatically finds a problem-solution pattern for a new request given by the user, making use of the search results returned by the search engine. The use of semantic case ranking, MDL clustering, and SNMF with request-focused multi-document summarization helps to improve the performance of iASSIST. In this work, we presented a new technique in which text documents are clustered using the MDL principle. The basic idea of clustering using MDL was originally applied to clustering web pages and extracting templates; we adapted this technique to cluster the text documents returned from the search engine. As the proposed system uses search engine results as
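The precision and recall definitions used in the evaluation translate directly into code; the sentence IDs below are hypothetical.

```python
def precision_recall(s_man, s_sys):
    """Precision = |Sman ∩ Ssys| / |Ssys|; Recall = |Sman ∩ Ssys| / |Sman|,
    where Sman is the manually selected relevant sentence set and
    Ssys is the sentence set selected by the system."""
    overlap = len(s_man & s_sys)
    return overlap / len(s_sys), overlap / len(s_man)

# Hypothetical sentence IDs: 4 relevant sentences, 5 returned by the system.
s_man = {1, 2, 3, 4}
s_sys = {2, 3, 4, 7, 9}
p, r = precision_recall(s_man, s_sys)
print(p, r)  # 0.6 0.75
```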
Figure 9: Recall of retrieved solutions
References

[1] Chulyun Kim and Kyuseok Shim, Member, IEEE
[4] R. Agrawal, R. Rantzau, and E. Terzi, "Context-sensitive ranking," in Proc. SIGMOD, 2006, pp. 383-
[6] C. Ding, T. Li, W. Peng, and H. Park, "Orthogonal nonnegative matrix t-factorizations for clustering," in Proc. 12th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, 2006, pp. 126-135.