K-Means in Text Mining For News Sources

Academic presentation on K-Means being used in Text mining for News resources.

Copyright: Attribution Non-Commercial (BY-NC)

Source - http://www.wakhok.ac.jp/~dipesh/files/THESIS-DipeshShrestha.pdf
Abstract
In his thesis, Dipesh Shrestha presents a new method for clustering news documents using non-negative matrix factorization (NMF). The key idea is to cluster the documents after measuring their proximity to the extracted features. The extracted features serve as the final cluster labels, and each document is assigned to a feature by cosine similarity, which is equivalent to k-means with a single iteration.
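The equivalence claimed above rests on a standard identity: for unit-length vectors d and w, the squared Euclidean distance is 2 - 2 cos(angle), so the feature with the largest cosine similarity is also the nearest one in the k-means sense. The following is a minimal sketch of that check; the function names and the toy vectors are hypothetical and do not come from the thesis.

```python
import math

def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nearest_by_cosine(doc, features):
    """Index of the feature with the largest cosine similarity to doc."""
    d = unit(doc)
    sims = [sum(a * b for a, b in zip(d, unit(f))) for f in features]
    return max(range(len(features)), key=lambda i: sims[i])

def nearest_by_distance(doc, features):
    """Index of the feature at the smallest Euclidean distance from doc,
    i.e. the assignment step of k-means, applied to unit vectors."""
    d = unit(doc)
    dists = [math.dist(d, unit(f)) for f in features]
    return min(range(len(features)), key=lambda i: dists[i])

# Hypothetical toy data: two extracted features and three documents.
features = [[1.0, 0.1, 0.0], [0.0, 0.2, 1.0]]
docs = [[3.0, 1.0, 0.0], [0.5, 0.0, 2.0], [1.0, 1.0, 0.9]]

labels_cos = [nearest_by_cosine(d, features) for d in docs]
labels_dist = [nearest_by_distance(d, features) for d in docs]
print(labels_cos, labels_dist)
```

Both label lists agree because the ranking by cosine similarity and the ranking by distance are the same once all vectors have unit length.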

The tools used to prove this concept are Apache Lucene and Hadoop MapReduce. Apache Mahout, the machine-learning library for Hadoop, includes k-means among its clustering algorithms. The application was named 'Swami'. MapReduce is a software framework introduced by Google to process large-scale data.
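To make the MapReduce idea concrete, one k-means iteration can be phrased as a map step that emits each point under the index of its nearest centroid and a reduce step that averages the points for each index. The sketch below is plain Python for illustration only; it does not use the Hadoop or Mahout APIs, and the function names and sample points are assumptions.

```python
import math
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: math.dist(p, centroids[i]))
        yield idx, p

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: average the points under each key to get the updated centroid."""
    return {key: [sum(col) / len(pts) for col in zip(*pts)]
            for key, pts in groups.items()}

# Hypothetical 2-D points and starting centroids.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
centroids = [[1.0, 1.0], [9.0, 9.0]]

new_centroids = reduce_phase(shuffle(map_phase(points, centroids)))
print(new_centroids)  # {0: [0.0, 1.0], 1: [11.0, 10.0]}
```

Because each point is processed independently in the map step, the work can be split across machines; only the per-centroid averages need to be combined.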
The proposed technique was evaluated on news documents from the 20 Newsgroups dataset. Accuracy was found to be above 80% for 2 clusters and above 75% for 3 clusters. Since the experiments were carried out on a single Hadoop cluster, the MapReduce implementation gave a significant reduction in time when the cluster size exceeded 9, i.e. 40 documents averaging 1.5 kilobytes.

Process
Let the documents in the term-document matrix be V = {d1, d2, d3, ..., dn} and the extracted feature vectors be F = {f1, f2, f3, ..., fk}.
1. Construct the term-document matrix V from the files of a given folder using term frequency-inverse document frequency (TF-IDF).
2. Normalize the columns of V to unit Euclidean length.
3. Perform NMF on V and get W and H using (3).
4. Apply cosine similarity to measure the distance between each document di and the extracted features (the vectors of W). Assign di to wx if the angle between di and wx is the smallest. This is equivalent to the k-means algorithm with a single iteration.

To run the parallel version of the k-means algorithm, Hadoop is started in local reference mode and in pseudo-distributed mode, and the k-means job is submitted to the JobClient. The time taken for steps 1 through 3 and the total time taken were noted separately.
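The four steps of the process can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the thesis implementation: the NMF step uses the widely known multiplicative update rule as a stand-in for the unspecified equation (3), the smoothed idf formula is one common variant, and the sample documents and function names are hypothetical.

```python
import math
import random

def tfidf(docs):
    """Step 1: term-document matrix V (rows are terms, columns are documents)."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n = len(docs)
    # Smoothed idf, log(n / df) + 1, is an assumed variant.
    idf = {t: math.log(n / sum(t in doc for doc in tokens)) + 1.0 for t in vocab}
    return [[doc.count(t) * idf[t] for doc in tokens] for t in vocab]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def normalize_columns(V):
    """Step 2: scale every column of V to unit Euclidean length."""
    cols = []
    for col in transpose(V):
        norm = math.sqrt(sum(x * x for x in col)) or 1.0
        cols.append([x / norm for x in col])
    return transpose(cols)

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Step 3: factor V ~ W H with multiplicative updates (assumed rule)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)]
             for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
    return W, H

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign(V, W):
    """Step 4: label each document with the column of W at the smallest angle."""
    features = transpose(W)
    return [max(range(len(features)), key=lambda a: cosine(d, features[a]))
            for d in transpose(V)]

# Hypothetical mini-corpus with two topics.
docs = ["stock market shares trading", "market shares fall stock",
        "football match goal team", "team wins football match"]
V = normalize_columns(tfidf(docs))
W, H = nmf(V, k=2)
labels = assign(V, W)
print(labels)
```

The columns of W play the role of the extracted features, so the final assignment needs no further k-means iterations. In a real setting the number of features k, the iteration count, and the tokenizer would all need tuning.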

Ganesh Yanamandra

NMIMS MPE-03 BA Assignment 1

Slide 1 of 3

Method & Results

Conclusion
The accuracy of this model was tested and found to be 80% for 2 clusters of documents and 75% for 3 clusters, and the results average 65% for 2 through 10 clusters. Non-negative matrix factorization has been shown to be a good method for clustering documents, and this work shows similar results when the extracted features are used as the final cluster labels for the k-means algorithm. To scale document clustering, the proposed model uses the MapReduce implementation of k-means from the Apache Hadoop project, and it has been shown to scale even on a single cluster computer when the cluster size exceeded 9, i.e. 40 documents averaging 1.5 kilobytes.

