e-ISSN: 2455-5703
I. INTRODUCTION
Much unpredictable information is available as documents, databases or multimedia resources, and users' current activities do not explicitly initiate a search to access that information. We therefore adopt a just-in-time retrieval system, which spontaneously recommends documents by analyzing users' current activities. Here, the activities are recorded as conversations, for instance conversations recorded in a meeting. A real-time Automatic Speech Recognition (ASR) system supplies the transcript from which implicit queries for document retrieval are constructed, and recommendations are drawn from the web or from a local repository. A just-in-time retrieval system must construct implicit queries from conversations, which contain a much larger number of words than a typical query. For instance, four people must make a list of all the items required to survive in the mountains: a short fragment of 120 seconds of conversation containing 600 words is recorded, and it is then split according to the variety of domains it touches, such as Job, Plan or Business. The multiplicity of topics and speech disfluencies produce ASR noise. The main objective is to maintain multiple hypotheses about users' information needs. In this paper, extracting a relevant and diverse set of keywords is the first step; the keywords are then clustered into topic-specific queries ranked by importance. This topic-based clustering technique decreases the effect of ASR errors and increases the diversity of document recommendations. For instance, word frequency alone retrieves Wikipedia pages such as Job, Hiring and Interview, whereas users would prefer pages matching the fragment's topics, such as Job, Plan and Business.
Relevance and diversity are addressed in three stages. First, keyword extraction retrieves the most representative words. Second, one or several implicit queries are built: with a single query, irrelevant documents may be retrieved, whereas multiple queries maintain a diversity constraint. Third, the results are ranked, reordering the document list recommended to users. Previous methods for formulating implicit queries rely on word-frequency weights; other methods perform keyword extraction using topical similarity, but they do not enforce a topic-diversity constraint.
From the ASR output, users' information needs are captured and the number of irrelevant words is reduced. Keyword extraction is followed by clustering, which builds topically separated queries; these are run independently rather than as a single topically mixed query. The results are then ranked and organized.
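The three stages above can be sketched as follows. This is an illustrative outline, not the paper's exact implementation: the function names, the word-frequency baseline for stage 1, and the `search_fn` retrieval backend are all assumptions.

```python
# Sketch of the three-stage just-in-time retrieval pipeline:
# (1) extract keywords from an ASR transcript fragment,
# (2) build one or several implicit queries,
# (3) retrieve d documents per query and merge them into one list.
from collections import Counter

def extract_keywords(fragment_words, stopwords, k=10):
    """Stage 1: keep the k most frequent content words (WF baseline)."""
    counts = Counter(w.lower() for w in fragment_words
                     if w.lower() not in stopwords)
    return [w for w, _ in counts.most_common(k)]

def build_queries(keywords, clusters=None):
    """Stage 2: one mixed query, or one query per topical cluster."""
    if clusters is None:
        return [" ".join(keywords)]          # single implicit query
    return [" ".join(c) for c in clusters]   # topically disjoint queries

def recommend(queries, search_fn, d=5):
    """Stage 3: take the first d hits per query, dropping duplicates."""
    merged, seen = [], set()
    for q in queries:
        for doc in search_fn(q)[:d]:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged
```

With a real system, `search_fn` would wrap a web or local-repository search engine; here it is only a placeholder parameter.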
The paper is organized as follows. Section II-A reviews the just-in-time retrieval system and the policies used for query formulation. Section II-B discusses keyword extraction methods. Section III presents the proposed technique for formulating implicit queries. Section IV introduces the data sets and compares keyword sets for document retrieval using crowdsourcing. Section V presents the experimental results on keyword extraction and clustering.
Ranking of Document Recommendations from Conversations using Probabilistic Latent Semantic Analysis
(GRDJE / CONFERENCE / ICIET - 2016 / 022)
diversity in the keyword set. For this, a parameter is set to balance these constraints.
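One way to realize the relevance-diversity trade-off controlled by that parameter is a greedy selection maximizing a concave topic-coverage reward, where an exponent lam < 1 (e.g. .75, echoing the D(.75) notation) shrinks the gain from topics that are already well covered. This is a hedged sketch under that assumption; the paper's exact objective may differ.

```python
# Greedy diversity-aware keyword selection: repeatedly pick the word
# whose addition most increases sum_z coverage(z)**lam, where
# coverage(z) accumulates p(z|w) over the selected words. With lam < 1
# the marginal gain of an already-covered topic shrinks, so the
# selection spreads across topics.

def diverse_keywords(candidates, topic_dist, lam=0.75, k=5):
    """candidates: list of words; topic_dist: word -> {topic: p(z|w)}."""
    topics = {z for w in candidates for z in topic_dist.get(w, {})}
    selected = []
    coverage = {z: 0.0 for z in topics}

    def reward(cov):
        return sum(c ** lam for c in cov.values())

    while candidates and len(selected) < k:
        base = reward(coverage)
        best, best_gain = None, float("-inf")
        for w in candidates:
            trial = {z: coverage[z] + topic_dist.get(w, {}).get(z, 0.0)
                     for z in topics}
            gain = reward(trial) - base
            if gain > best_gain:
                best, best_gain = w, gain
        selected.append(best)
        for z in topics:
            coverage[z] += topic_dist.get(best, {}).get(z, 0.0)
        candidates = [w for w in candidates if w != best]
    return selected
```

For example, given two words strongly tied to one topic and a third word tied to another, the selector picks one word per topic rather than the two most probable words of the dominant topic.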
B. Keyword Clustering
The keywords extracted using the diverse keyword extraction method address users' needs in the form of topics. Diversity of topics and reduction of noise are maintained by splitting the keyword set into several topically disjoint subsets. Each subset forms an implicit query used to retrieve documents. These subsets are formed by clustering topically similar keywords, as explained below.
For keyword clustering, the keywords are ranked for each main topic z of the conversation by decreasing values of p(z|w). The keywords with the highest values are assigned to topic z, and within each topic they are ranked among themselves by their p(z|w) values.
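Assuming a PLSA model has already produced p(z|w) for every keyword w, the clustering step above can be sketched as:

```python
# Assign each keyword to its highest-probability topic, then rank the
# keywords inside each cluster by decreasing p(z|w). The input format
# (a dict of per-word topic distributions) is an illustrative assumption.

def cluster_keywords(p_z_given_w):
    """p_z_given_w: word -> {topic: p(z|w)}. Returns topic -> ranked words."""
    clusters = {}
    for w, dist in p_z_given_w.items():
        z = max(dist, key=dist.get)              # dominant topic of w
        clusters.setdefault(z, []).append(w)
    for z, words in clusters.items():            # rank within each topic
        words.sort(key=lambda w: p_z_given_w[w][z], reverse=True)
    return clusters
```

Each resulting cluster then serves as one topically disjoint implicit query.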
C. Document Recommendation
For document retrieval, implicit queries are prepared for each conversation fragment. Formulating multiple implicit queries improves the retrieval results. After ranking, the first d documents retrieved by each implicit query are kept; in this way, recommendation lists are prepared from the document sets. Ranking based on topical similarity to the queries constitutes the baseline algorithm.
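Merging the per-query result lists into one recommendation list can be done in several ways; a simple round-robin interleaving, shown below, preserves each query's internal ranking while alternating between topics. This is an illustrative choice, not necessarily the paper's actual merging strategy.

```python
# Round-robin merge of ranked result lists (one list per implicit
# query): take each query's rank-1 document first, then rank-2, and so
# on, skipping duplicates and stopping after d documents.
from itertools import zip_longest

def merge_results(result_lists, d=5):
    """Interleave ranked lists, dropping duplicates, keep first d."""
    merged, seen = [], set()
    for rank_slice in zip_longest(*result_lists):
        for doc in rank_slice:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:d]
```

Interleaving guarantees that every topical query contributes its top hits to the final list, which is the point of keeping the queries topically disjoint.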
F. Single Queries
Single-query comparisons are made between the D(.75), TS and WF methods over the fragments, with the suggested documents listed from the transcripts. Again, D(.75) > WF > TS is the resulting ranking, as for the diversity keyword extraction technique itself. When single queries are used, the relevance of the resulting document sets follows the same ordering as that of the extraction techniques.
Compared Methods (m1 vs. m2)    Relevance of m1 (%)    Relevance of m2 (%)
TS vs. WF                       40                     60
WF vs. D(.75)                   42                     58
TS vs. D(.75)                   20                     80
Fig. 1: Comparative Relevance of Single Queries, Inferred As: D(.75) > WF > TS.
G. Multiple Queries
Binary comparisons were also performed using multiple queries, which result in topically disjoint keyword sets. Here, the TS and D(.75) keyword extraction methods are used; the resulting methods are denoted CTS and CD(.75) (C indicates clustering of keywords), derived from the TS and D(.75) keyword lists after clustering. Clustering is not applicable to WF because it is not based on topic modeling. CD(.75) outperforms CTS with 62% vs. 38%.
H. Single versus Multiple
Single and multiple queries built from the same keyword lists are compared for D(.75) and TS. The comparison reveals that multiple queries lead to more relevant documents than single queries, i.e. CD(.75) > D(.75) and CTS > TS, with relevance scores of 65% vs. 35% for both D(.75) and TS.
1) Example
The priority given to CD(.75) shows that it retrieves the most relevant results. In Appendix A, a list of 12 items needed to survive in a cold mountain is discussed and listed out. The keyword lists extracted by D(.75), TS and WF do not consider topical information (Job, Work, and Company), so no relevant information is retrieved. The CTS method covers the main topics of the fragment with the keywords Plan, Interview, Company and Business; time and questions are then selected to cover the remaining topics, and Resume, studies, mail and school cover all the other topics.
The highest-ranked retrieval results are obtained by WF, TS, D(.75), CD(.75) and CTS, and the multiple queries (CTS and CD(.75)) retrieve the most relevant documents. WF does not recommend relevant documents, whereas D(.75) retrieves the largest number of topics. Still, the multiple queries CTS and CD(.75) retrieve more relevant documents than the single queries, because TS and D(.75) do not separate the mixture of topics; this motivates the move to CTS and CD(.75). Moreover, CD(.75) covers more topics than CTS.
WF            TS            D(.75)
Interview     Resume        Job offer
Job           School        Hire
Hiring        Interview     Hiring Company
Resume        Campus

CTS           CD(.75)
Job           Finance
Interview     Job
Hiring        Advice
Company       Academic
Agency        Business

Fig. 5: Comparison of Single and Multiple Queries
V. CONCLUSION
A specific just-in-time retrieval system for conversational environments recommends documents to users according to their information needs. First, topic modeling is performed and implicit queries are derived from conversation fragments. A novel keyword extraction technique covers the maximum number of important topics in a fragment; the keywords are then clustered to divide the keyword set into smaller, topically disjoint subsets from which implicit queries are formulated.
The relevance of the retrieved documents is measured against the WF and TS baselines. Considering both relevance and diversity thus improves the keyword extraction technique; it could be improved further by considering n-grams in addition to individual words.
The current goal is to process explicit queries and rank the results to cover all the information needs of the user. These techniques have not yet been integrated into a working environment with human users in real-life meetings.
VI. APPENDIX
The following transcript of a four-party conversation (speakers A-D) was submitted to the document recommendation system. The keyword lists, queries and documents retrieved are shown in Tables VII, VIII, and IX, respectively.
Nancy: Hi. It is good to see you, John.
John: Same here, Nancy. It has been a long time since I last saw you.
Nancy: Yes, the last time we saw each other was New Year's Eve. How are you doing?
John: I am doing OK. It would be better if I had a new job right now.
Nancy: You are looking for a new job? Why?
John: I already finished my studies and graduated last week. Now, I want to get a job in the
Finance field. Payroll is not exactly Finance.
Nancy: How long have you been looking for a new job?
John: I just started this week.
Nancy: Didn't you have any interviews with those firms that came to our campus last month? I believe quite a few companies
came to recruit students for their Finance departments.
John: I could only get one interview with Fidelity Company because of my heavy work schedule. A month has already gone
by, and I have not heard from them. I guess I did not make it.
Nancy: Don't worry, John. You always did well in school. I know your good grades will help you get a job soon. Besides, the
job market is pretty good right now, and all companies need financial analysts.
John: I hope so.
Nancy: You have prepared a resume, right?
John: Yes.
Nancy: Did you mail your resume to a lot of companies? How about recruiting agencies?
John: I have sent it to a dozen companies already. No, I have not thought about recruiting agencies. But, I do look closely at
the employment ads listed in the newspaper every day.
Nancy: Are there a lot of openings?
John: Quite a few. Some of them require a certain amount of experience and others are willing to train.
Nancy: My friends told me that it helps to do some homework before you go to an interview. You need to know the company
well: what kind of business is it in? What types of products does it sell? How is it doing lately?
John: Yes, I know. I am doing some research on companies that I want to work for. I want to be ready whenever they call me
in for an interview.
Nancy: Have you thought about questions they might ask you during the interview?
John: What types of questions do you think they will ask?
Nancy: Well, they might ask you some questions about Finance theories to test your academic understanding.
John: I can handle that.
Nancy: They might tell you about a problem and want you to come up with a solution.
John: I don't know about that. I hope I will be able to give them a decent response if the need arises.
Nancy: They will want to know you a little bit before they make a hiring decision. So, they may ask you to describe yourself.
For example, what are your strengths and your weaknesses? How do you get along with people?
John: I need to work on that question. How would I describe myself? Huh!
Nancy: Also, make sure you are on time. Nothing is worse than to be late for an interview. You do not want to give them a
bad impression, right from the start.
John: I know. I always plan to arrive about 10 or 15 minutes before the interview starts.
Nancy: Good decision! It seems that you are well prepared for your job search. I am sure you will find a good job in no time.
John: I hope so.
Nancy: I need to run; otherwise, I will be late for school. Good luck in your job search, John.
John: Thank you for your advice. Bye!