Web Mining and Semantic Web
Presented By:
Mohammad Aminul Islam (11103812)
Muhammad Misbahur Rahman (11101850)
Contents
What is web mining?
Classification of web mining
Web structure mining
HITS Algorithm
PageRank algorithm
Web content mining
Web usage mining
Conclusion
References
Web mining
The Web is a collection of interrelated files
on one or more web servers.
Web mining is the application of data
mining techniques to extract knowledge
from web data.
It discovers global as well as local structure
within and between web pages.
It helps transform human-understandable
content into machine-understandable
semantics.
Example 1 (figure): a search for "obama", captioned "Yes, I am looking for this"
Example 2 (figure): web pages connected by hyperlinks
Algorithms
For web structure mining there are
two main algorithms:
HITS (Hypertext Induced Topic Search)
PageRank algorithm
HITS
Hub: A page that points to lots of other
pages, such as Google, Yahoo,
Facebook, etc.
Authority: A page that lots of other
pages refer to.
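As a first intuition for the two roles, one can simply count links on a small graph: pages with many outgoing links look like hubs, and pages with many incoming links look like authorities. The sketch below does only that. The page names and the `links` dictionary are made up for illustration, and plain link counting is a simplification, not the HITS scoring itself.

```python
from collections import Counter

# Hypothetical toy web: each page maps to the pages it links to.
links = {
    "portal": ["news", "wiki", "shop"],
    "blog": ["news", "wiki"],
    "news": ["wiki"],
    "wiki": [],
    "shop": [],
}

# Out-degree: how many pages a page points to (hub-like behaviour).
out_degree = {page: len(targets) for page, targets in links.items()}

# In-degree: how many pages refer to a page (authority-like behaviour).
in_degree = Counter(t for targets in links.values() for t in targets)

best_hub = max(out_degree, key=out_degree.get)
best_authority = in_degree.most_common(1)[0][0]

print("hub candidate:", best_hub)
print("authority candidate:", best_authority)
```

Counting links treats every link as equally valuable. HITS refines this by weighting a link from a good hub more than a link from a poor one.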
HITS Algorithm
In the HITS algorithm, the web pages
are first collected by matching their
textual content against a given
query.
After collecting the web pages, the
HITS algorithm concentrates only on
the link structure and ignores the
content of the web pages.
HITS Algorithm
Step 1: Initialize the number of pages N.
Step 2: Calculate the hub score: a good hub links to many
good authorities.
$H(x) = \sum_{y : x \to y} A(y)$
Step 3: Calculate the authority score: a good authority is
referenced by many good hubs.
$A(x) = \sum_{y : y \to x} H(y)$
Step 4: Normalize H and A:
$\sum_{x} H(x)^2 = 1, \quad \sum_{x} A(x)^2 = 1$
Note: PageRank can be seen as half of HITS.
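The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `hits`, the dictionary representation of the graph, the fixed iteration count, and the starting scores of 1.0 are all assumptions made for the example.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) scores for a graph given as
    {page: [pages it links to]}. Illustrative sketch only."""
    # Step 1: collect the N pages and give each a starting score of 1.
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Step 2: H(x) = sum of A(y) over pages y that x links to.
        hub = {p: sum(auth[y] for y in links.get(p, [])) for p in pages}

        # Step 3: A(x) = sum of H(y) over pages y that link to x.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # Step 4: normalize so the squared scores sum to 1.
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub = {p: v / h_norm for p, v in hub.items()}
        auth = {p: v / a_norm for p, v in auth.items()}

    return hub, auth

# Hypothetical graph: A and B both link to C and D.
hub, auth = hits({"A": ["C", "D"], "B": ["C", "D"], "C": [], "D": []})
print(hub)
print(auth)
```

In this toy graph A and B end up as the hubs and C and D as the authorities, since all links run from the first pair to the second.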
Equation
Suppose page A has pages T1 to Tn pointing to it
(incoming links). To calculate the PageRank of
page A we can use the following equation:
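A commonly cited form of the PageRank equation that uses the same T1 to Tn notation is PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)). Here d is a damping factor (often set to 0.85) and C(T) is the number of outgoing links of page T. Both symbols are assumptions brought in for this sketch, as are the function name `pagerank`, the starting value of 1.0 per page, and the fixed iteration count.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively apply PR(A) = (1 - d) + d * sum(PR(T) / C(T))
    over the pages T that link to A. Illustrative sketch only."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}  # assumed starting value for every page

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Pages T1..Tn that point to this page (incoming links).
            incoming = [t for t in pages if page in links.get(t, [])]
            # C(T) is the number of outgoing links of T.
            new_pr[page] = (1 - d) + d * sum(
                pr[t] / len(links[t]) for t in incoming
            )
        pr = new_pr
    return pr

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
pr = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(pr):
    print(page, round(pr[page], 3))
```

In this small graph C collects links from both A and B, so it ends up with the highest rank, while B, which is linked only from A, ends up with the lowest.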
Authority page
Lots of other important pages refer
to this page.
How is it found?
HITS algorithm
PageRank algorithm
Conclusion
Web mining is related to search
engine optimization. If we have good
knowledge of content mining,
usage mining, and structure mining, then
we will be able to make good websites.
References
http://en.wikipedia.org/wiki/PageRank
http://www.ijcsit.com/docs/vol1issue3/ijcsit2010010308.pdf
https://mathscinotes.wordpress.com/2012/01/02/worked-pagerank-example/
http://infomesh.net/2001/swintro/
https://www.youtube.com/watch?v=OGg8A2zfWKg
http://kobra.bibliothek.uni-kassel.de/handle/urn:nbn:de:hebis:34-2009022726508
http://www.semantic-web-journal.net/content/inductive-learning-semantic-web-what-does-it-buy
http://blog.seagatesoft.com/wp-content/uploads/2012/03/web_mining_diagram.png
http://www.expertsupdates.com/ArticleAttachments/seo/web-mining/Figure2.gif
http://soltisconsulting1.files.wordpress.com/2013/08/hubs_and_authorities.gif