
Data Mining and Semantic Web

Presented By:
Mohammad Aminul Islam (11103812)
Muhammad Misbahur Rahman (11101850)

Web Mining

Contents
What is web mining?
Classification of web mining
Web structure mining
HITS Algorithm
Page rank algorithm
Web content mining
Web usage mining
Conclusion
References

Web mining
The Web is a collection of interrelated files
on one or more web servers.
Web mining is the application of data
mining techniques to extract knowledge
from web data.
It discovers global as well as local structure
within and between web pages.
It helps transform human-understandable
content into machine-understandable
semantics.

Example 1
Yes, I am looking for this
obama

Example 2

Areas of Web mining

Web content: text, images, records, etc.
Web structure: hyperlinks, tags, etc.
Web usage: HTTP logs, web server logs, etc.

Web Mining Diagram


Web Mining

Web Structure Mining


The structure of the web consists of web pages as
nodes and hyperlinks as edges, each connecting
two related pages.

hyperlinks
Web pages

Web Structure Mining


Web structure mining is the process
of discovering structure information
from the web.
This type of mining can be performed
within a single page or at the
hyperlink level.
Research at the hyperlink level is also
called hyperlink analysis.

Algorithms
For web structure mining there are
two main algorithms
HITS (Hypertext Induced Topic Search)
Page Rank Algorithm

HITS (Hypertext Induced Topic Search)
Hypertext Induced Topic Search, also
known as Hubs and Authorities, is a
link analysis algorithm that rates web
pages.
It was introduced by Jon Kleinberg.

HITS
Hub: A page that points to lots of other
pages, such as Google, Yahoo,
Facebook, etc.
Authority: A page that lots of other
pages refer to.

HITS Algorithm
In the HITS algorithm, the web pages to
rank are first collected by matching
their textual content against a given
query.
After collecting the web pages, the
HITS algorithm concentrates only on
the link structure and ignores the
content of the web pages.

HITS Algorithm
Step 1: Initialize the number of pages N
Step 2: Calculate the hub score (a good hub links to many
good authorities)
$H(x) = \sum_{y : x \to y} A(y)$
Step 3: Calculate the authority score (a good authority is
referenced by many good hubs)
$A(x) = \sum_{y : y \to x} H(y)$
Step 4: Normalize H, A: $\sum_x H(x)^2 = \sum_x A(x)^2 = 1$
Page rank is the half of HITS
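The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `hits`, the fixed iteration count, and the tiny link graph (`portal`, `blog`, `a`, `b`, `c`) are all assumptions made up for the example.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the list of pages it points to (hypothetical input format)."""
    # Step 1: collect the N pages and start every score at 1.
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Step 2: hub score = sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}

        # Step 3: authority score = sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]

        # Step 4: normalize so the squared scores sum to 1.
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub = {p: v / h_norm for p, v in hub.items()}
        auth = {p: v / a_norm for p, v in auth.items()}

    return hub, auth

# Hypothetical graph: two link-heavy pages pointing at three content pages.
example_links = {
    "portal": ["a", "b", "c"],
    "blog": ["a", "b"],
}
hub_scores, auth_scores = hits(example_links)
```

In this toy graph `portal` ends up with the highest hub score, and `a` and `b` outrank `c` as authorities because both link-heavy pages point to them.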

Page Rank Algorithm


PageRank is an algorithm used by Google to
order the pages shown in its search engine
results.
It calculates the importance of a page from how many
pages refer to it.
The pages linking to a page are called
the back links of that page.
A link from one page to another page is considered
a vote.
A page's rank depends not only on the number of
votes but also on the importance and relevance of the
pages casting them.

Equation
Suppose page A has pages T1 to Tn pointing to it
(incoming links). To calculate the page rank of
page A we can use the following equation:

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

Here d is the damping factor, whose value is 0.85 (to
stop other pages having too much influence, the
total vote is damped down by multiplying it by
0.85). C(Ti) is the number of links going out of page Ti.

Example

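We assume that initially every page has page rank 1.

PR(A)=1, PR(B)=1, PR(C)=1, PR(D)=1, PR(E)=1, PR(F)=1, d=0.85
PR(B) = 1-d + d (PR(A) + PR(D)/3 + PR(C)/3 + PR(E)/4) = 2.28
PR(C) = 1-d + d (PR(B)/3 + PR(D)/3 + PR(E)/4) = 1.62
PR(D) = 1-d + d (PR(B)/3 + PR(C)/3 + PR(E)/4) = 1.62
PR(E) = 1-d + d (PR(B)/3 + PR(C)/3 + PR(D)/3) = 1.71
PR(F) = 1-d + d (PR(E)/4) = 0.51

The values shown are the ones reached by repeating these updates
until they stop changing, with PR(A) kept at 1. A single pass from
the initial values gives different numbers (for example, PR(F) would
be 0.36).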
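The example can be checked with a short Python sketch. Since the link diagram is not reproduced here, the graph below is an assumption inferred from the denominators in the equations (A links to B; B, C and D each have three outgoing links; E has four). Keeping PR(A) fixed at 1 is also an assumption, chosen because no update equation is given for A. The function name `pagerank` and its `fixed` parameter are hypothetical.

```python
def pagerank(links, d=0.85, iterations=100, fixed=None):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over pages q linking to p.

    links maps each page to the list of pages it links to.
    fixed maps pages to rank values that are never updated (an assumption
    used here only to mirror the slide, which keeps PR(A) at 1).
    """
    fixed = fixed or {}
    pages = set(links) | {q for targets in links.values() for q in targets}
    pr = {p: 1.0 for p in pages}  # every page starts with rank 1
    pr.update(fixed)

    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            if p in fixed:
                new_pr[p] = fixed[p]
                continue
            # Each page q that links to p passes on PR(q) / C(q),
            # where C(q) is the number of outgoing links of q.
            incoming = sum(
                pr[q] / len(links[q]) for q in pages if p in links.get(q, [])
            )
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Link graph inferred from the example's equations (assumption).
example_links = {
    "A": ["B"],
    "B": ["C", "D", "E"],
    "C": ["B", "D", "E"],
    "D": ["B", "C", "E"],
    "E": ["B", "C", "D", "F"],
    "F": [],
}
ranks = pagerank(example_links, fixed={"A": 1.0})
for page in sorted(ranks):
    print(page, round(ranks[page], 2))
```

Under these assumptions the ranks settle close to the values listed in the example: about 2.28 for B, 1.62 for C and D, 1.71 for E, and 0.51 for F.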

Web Content Mining


Discovering useful information from
the web content
Content means text, audio, video
etc.
Content can be structured, semi-structured,
or unstructured
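As a small illustration of mining semi-structured content, the sketch below pulls the visible text and the hyperlinks out of an HTML snippet using Python's standard `html.parser` module. The class name `ContentExtractor` and the sample page are made up for the example; real content mining would go on to analyze the extracted text.

```python
from html.parser import HTMLParser

class ContentExtractor(HTMLParser):
    """Collects visible text fragments and link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every <a href="..."> tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty text between tags.
        text = data.strip()
        if text:
            self.text_parts.append(text)

# Hypothetical sample page.
page = (
    "<html><body><h1>Web Mining</h1>"
    '<p>See the <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a> article.</p>'
    "</body></html>"
)
extractor = ContentExtractor()
extractor.feed(page)
print(extractor.text_parts)
print(extractor.links)
```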

Example

Web usage mining


Existing tools report the number of hits of Web
pages and where the hits came from. Although
useful, the information is not sufficient to learn
user behavior. Tools providing further analysis of
such information are useful.

Example: HTTP logs, web server logs, etc.
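A minimal sketch of the kind of report described above: counting hits per page and where the hits came from, then grouping requests by visitor as a first step toward studying behavior. It assumes log lines in the common Apache "combined" layout. The sample lines, the pattern name `LOG_PATTERN`, and the function name `summarize` are invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches the host, request path, status and (optional) referrer
# of a combined-format access log line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)")?'
)

# Hypothetical log lines.
sample_log = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start" "Mozilla/5.0"',
    '192.0.2.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products.html HTTP/1.1" 200 5120 "http://example.com/index.html" "Mozilla/5.0"',
    '192.0.2.7 - - [10/Oct/2023:14:01:10 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]

def summarize(lines):
    hits = Counter()                    # number of hits per page
    referrers = Counter()               # where the hits came from
    paths_by_host = defaultdict(list)   # pages requested by each visitor, in order
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        hits[match.group("path")] += 1
        referrer = match.group("referrer")
        if referrer and referrer != "-":
            referrers[referrer] += 1
        paths_by_host[match.group("host")].append(match.group("path"))
    return hits, referrers, paths_by_host

hits, referrers, paths_by_host = summarize(sample_log)
print(hits.most_common())
print(referrers.most_common())
print(dict(paths_by_host))
```

The per-visitor page sequences in `paths_by_host` are the sort of further analysis that plain hit counts do not provide.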

Why are those links showing on the first page?

Answer
Authority page
Lots of other important pages refer
to this page
HOW?
HITS Algorithm
Page rank algorithm

Conclusion
Web mining is related to search
engine optimization. If we have good
knowledge of content mining,
usage mining, and structure mining, then
we will be able to make good web sites.

References

http://en.wikipedia.org/wiki/PageRank
http://www.ijcsit.com/docs/vol1issue3/ijcsit2010010308.pdf
https://mathscinotes.wordpress.com/2012/01/02/worked-pagerank-example/
http://infomesh.net/2001/swintro/
https://www.youtube.com/watch?v=OGg8A2zfWKg
http://kobra.bibliothek.uni-kassel.de/handle/urn:nbn:de:hebis:34-2009022726508
http://www.semantic-web-journal.net/content/inductive-learning-semantic-web-what-does-it-buy
http://blog.seagatesoft.com/wp-content/uploads/2012/03/web_mining_diagram.png
http://www.expertsupdates.com/ArticleAttachments/seo/web-mining/Figure2.gif
http://soltisconsulting1.files.wordpress.com/2013/08/hubs_and_authorities.gif
