
VIETNAM NATIONAL UNIVERSITY, HANOI

UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Lê Thị Kim Dung

SOME POPULAR IMAGE RANKING ALGORITHMS
AND THEIR APPLICATION IN AN EXPERIMENTAL
IMAGE META-SEARCH SYSTEM

UNDERGRADUATE THESIS, FULL-TIME PROGRAM

Major: Information Technology

HANOI - 2010






Supervisor: Assoc. Prof. Dr. Hà Quang Thụy
Co-supervisor: MSc. Nguyễn Cẩm Tú














Acknowledgements

First of all, I would like to express my deepest gratitude to Assoc. Prof. Dr. Hà Quang Thụy and MSc. Nguyễn Cẩm Tú, who devotedly guided and advised me throughout the work on this thesis.
I sincerely thank the lecturers who created favorable conditions for my study and research at the University of Engineering and Technology.
I would also like to thank the senior and fellow students of the "Data Mining" group, whose support with specialized knowledge helped me greatly in completing this thesis.
Finally, I am boundlessly grateful to my family and friends, the loved ones who always stood by me and encouraged me throughout the work on this thesis.
Thank you very much!

Student
Lê Thị Kim Dung



Abstract

The steady growth in the number of images on the Web has created a rich supply of images for human needs. Although several image search engines have appeared and partly meet the demand for image search, improving search quality remains an open problem. Image ranking is the core problem of image search engines, and improving image ranking quality has received, and continues to receive, special attention.
First, the thesis surveys image ranking algorithms, in particular VisualRank [39], in which the similarity between images is computed from textual content features and visual content features. The thesis then proposes a model of an image meta-search engine [18] [11] that uses this algorithm as its image ranking component. The system uses a database that stores queries and their corresponding images as a way to shorten query response time. The system also uses a dictionary to support queries written in Vietnamese.
The experiments carried out in the thesis gave fairly encouraging initial results: the precision of the system when applying the algorithm with both textual and visual features reached 81.2%. Within the scope of the thesis's experiments, this result is better than that of the two large image search engines Google and Yahoo, and it confirms the feasibility of the model.


Table of contents

Introduction
Chapter 1. Overview of ranking algorithms
1.1. Introduction to the ranking problem
1.2. Web page ranking
1.2.1. Link-based ranking
1.2.2. Context-oriented ranking
1.3. Entity ranking
1.4. A brief overview of image ranking
1.5. Some related work
Summary of Chapter 1
Chapter 2. Some popular image ranking algorithms
2.1. Introduction
2.2. VisualRank
2.3. Multiclass VisualRank
2.4. Visual ContextRank
2.5. Remarks
Summary of Chapter 2
Chapter 3. Image meta-search engine model
3.1. General architecture of a meta-search engine
3.1.1. User interface
3.1.2. Dispatcher
3.1.3. Result processor
3.1.4. Scoring module
3.2. The MetaSEEk image meta-search engine model
3.2.1. Content-based visual query
3.2.2. Query interface
3.2.3. Dispatcher
3.2.4. Display interface
3.2.5. Evaluation
3.3. Image ranking in an image meta-search engine
Summary of Chapter 3
Chapter 4. Experiments
4.1. Experimental model
4.1.1. Approach
4.1.2. The proposed model and its components
4.2. Environment and components of the software system
4.2.1. Hardware configuration
4.2.2. Components of the software system
4.3. Building the data set
4.3.1. Query set
4.3.2. Set of source search engines
4.3.3. Dictionary
4.4. Experimental procedure and scenarios
4.5. Experimental results and evaluation
Conclusion
References







List of tables

Table 1. Example of the record of an image in the database
Table 2. Hardware configuration used in the experiments
Table 3. Software used
Table 4. Libraries used
Table 5. Average precision over 35 queries





List of figures

Figure 1. Illustration of the authority and hub properties
Figure 2. Expanding the base set T from the root set S
Figure 3. A learning-to-rank model in an entity search engine
Figure 4. An illustration of an image similarity graph
Figure 5. Transformation of the adjacency matrix
Figure 6. Ranking results of 3 methods for the query "Notre Dame"
Figure 7. Image ranking model using the ContextRank algorithm
Figure 8. An example of a visual words representation
Figure 9. Architecture of a typical meta-search engine
Figure 10. A design of the dispatcher
Figure 11. Overall architecture of MetaSEEk
Figure 12. Display interface of MetaSEEk
Figure 13. Hierarchical structure of the database
Figure 14. The proposed model
Figure 15. Interface of the program
Figure 16. Chart comparing the average precision of the systems
Figure 17. Chart of precision at K for some Vietnamese queries
Figure 18. The first 10 results for the query "sun" in the search engines



List of abbreviations

CSDL         Cơ sở dữ liệu (database)
AP           Average Precision
Google CSE   Google Custom Search Engine
HITS         Hypertext Induced Topic Search
MAP          Mean Average Precision
SIFT         Scale Invariant Feature Transform






List of terms

No.  English term                    Vietnamese term
1    Content-based Image Ranking     Xếp hạng ảnh dựa trên nội dung hiển thị
2    Content-based visual query      Truy vấn trực quan dựa trên nội dung hiển thị
3    Display interface               Thành phần hiển thị
4    Edge                            Cạnh
5    Image tag                       Thẻ ảnh
6    Inter-image Context Modeling    Mô hình ngữ cảnh ngoài ảnh
7    Intra-image Context Modeling    Mô hình ngữ cảnh nội ảnh
8    Local features                  Các thuộc tính cục bộ
9    Offline                         Ngoại tuyến
10   Online                          Trực tuyến
11   Performance database            Cơ sở dữ liệu hiệu suất
12   Performance score               Điểm số hiệu suất
13   Query dispatcher                Bộ điều vận truy vấn
14   Query translator                Bộ dịch truy vấn
15   Random surfer model             Mô hình duyệt ngẫu nhiên
16   Re-rank                         Xếp hạng lại
17   Scoring module                  Mô đun tính hạng
18   Text-based Image Ranking        Xếp hạng ảnh dựa trên văn bản
19   Texture                         Kết cấu
20   Title                           Tiêu đề
21   Topic Sensitive PageRank        PageRank theo chủ đề
22   Visual hyperlink                Siêu liên kết trực quan
23   Visual vocabulary               Tập từ vựng trực quan

Introduction
Ranking objects on the Web (Web pages and entities in general, and images in particular) is a problem of great importance in the field of search. The emergence and continuous development of search engines over nearly two decades has led to a considerable number of published studies on Web page ranking, among which the PageRank algorithm has become one of the ten most representative data mining algorithms. Recently, publications on entity ranking and on image ranking have been increasing rapidly.
Image ranking algorithms are usually developed on the basis of Web page ranking algorithms, including context-oriented and user-oriented solutions as well as solutions based only on the link graph. We have also carried out some related research in a student scientific research project.
This thesis, entitled "Some popular image ranking algorithms and their application in an experimental image meta-search system", surveys and analyzes image ranking solutions, presents a model of an image meta-search engine, and implements an image ranking solution in an experimental image meta-search engine.
The thesis consists of the following main parts:
Chapter 1: Overview of ranking algorithms presents some typical page ranking algorithms that are widely used in search engines. The chapter also outlines the basics of the entity ranking and image ranking problems, and mentions some related work in Vietnam and abroad.
Chapter 2: Some popular image ranking algorithms focuses on image ranking algorithms based on the visual content of images. Each algorithm is analyzed and evaluated, and its advantages and disadvantages are stated. On that basis, the thesis proposes an image ranking algorithm that applies VisualRank to both the visual features and the textual features of images.
Chapter 3: Image meta-search engine model presents the general model of a meta-search engine. The chapter then goes into the details of one image meta-search model, MetaSEEk, in order to study the components required in an image meta-search system, and from there identifies the components that must be built for the image meta-search engine intended in this thesis.
Chapter 4: Experiments presents an image meta-search engine model that applies the algorithm proposed in Chapter 2. The chapter describes the components of the model and the experimental work carried out in the thesis. The results obtained are then evaluated and compared with other systems.
The conclusion summarizes the results obtained, states the contributions of the thesis, and outlines some directions for further research.


Chapter 1. Overview of ranking algorithms
Ranking is a common problem that is important and has many practical applications. This chapter clarifies the notion of the general ranking problem, presents some typical page ranking algorithms, and gives a brief introduction to the image ranking problem.
1.1. Introduction to the ranking problem
Ranking objects by some criterion (as simple as ranking the students of a class by grade point average, or ranking universities) is essential in many applications, above all in ranking the results returned by a search engine. To rank objects is to order them by their relevance to criteria that depend on the specific application. A measure of how well a retrieved object matches the user's request under the stated criteria must therefore be defined [1] [2] [3] [4].
A typical instance of the object ranking problem is the ranking of the results returned by a search engine. In common search engines (such as Google and Yahoo), importance, also called page rank (PageRank), is the basic quantity used for ranking. The base value of the page rank is computed by analyzing the links between Web pages. Ranking is the last task performed in a search engine, but it is no less important than the others. Given a set of documents $D = \{d_1, \dots, d_n\}$ and a user query $q$, the search engine must find the documents in $D$ that match $q$. Ranking is the process of ordering the documents found by the search engine by decreasing relevance to the query and decreasing importance. The choice of the ranking function plays a decisive role in the quality of a search engine. Two approaches to defining the ranking function are of interest:
- The first approach uses the page rank of a Web page as its relevance to the user's request. Most studies accept the assumption that a Web page linked to by many other Web pages is an important page. In this case the page rank is computed only from the links between Web pages. Typical algorithms of this kind are PageRank and Modified Adaptive PageRank.
- The second approach holds that the relevance of a Web page to the user's query depends not only on the page rank but also on the relation between the content of the page and the content of the query. The ranking function is then a combination of the similarity between document and query, $\mathrm{similarity}(q, d_i)$, and the page rank. Ranking algorithms of this kind are called context-oriented ranking algorithms. A typical context-oriented ranking algorithm is Topic Sensitive PageRank.
For applications whose result is a list of objects that must be ordered, ranking helps users quickly reach the results closest to their request. This shows that ranking is an important and meaningful problem. In the following we study some Web page ranking methods; each of them is either an early fundamental method or one in use in typical Internet search engines such as Google and Yahoo!.
1.2. Web page ranking
As stated above, there are two approaches to measuring the importance of a Web page with respect to the user's request: the first ignores the role of the query in ranking, whereas the second depends directly on the user's query. These two approaches correspond to ranking algorithms based on the links between Web pages and to context-oriented ranking algorithms. This section presents some typical algorithms of both kinds.
1.2.1. Link-based ranking
1.2.1.1. PageRank
PageRank [30] is a link analysis algorithm developed by Larry Page and colleagues at Stanford University (USA) and used by the Google search engine. Intuitively, the home page of Yahoo! is more important than the home page of some individual A. This is reflected in the fact that more pages link to the Yahoo! home page than to the home page of A. The number of links pointing to a page could therefore be used to measure its importance. However, this does not work well, because it is easy to create Web pages that link to a given page and thereby raise its rank.
PageRank extends the old idea by taking into account the importance of the Web pages that link to the page under consideration. The method assumes that if there is a link from page A to page B, then the importance of page A also influences (is shared with) the importance of page B.
Simple PageRank
Let $G = (V, E)$ be a graph of Web pages, where $V = \{1, 2, \dots, n\}$ is the set of $n$ vertices (each vertex is a Web page to be ranked) and $E$ is the set of edges, $E = \{(i, j) \mid \text{there is a hyperlink from page } i \text{ to page } j\}$. We assume that the Web graph is connected, that is, from any page there is a path of links to any other page in the graph.
Given such a Web graph $G$, for each Web page $i$ let $N(i)$ denote the number of links going out of page $i$ and $B(i)$ the set of Web pages that link to page $i$. The rank $r(i)$ of Web page $i$ is then defined as:
$$r(i) = \sum_{j \in B(i)} \frac{r(j)}{N(j)} \qquad (1.1)$$
Dividing by $N(j)$ expresses that the pages linking to page $i$ distribute their rank among the Web pages they link to.
These equations can be rewritten in matrix form as $r = rP$, where:
- $r = [r_1, r_2, \dots, r_n]$ is the PageRank vector, with $r_i$ the rank of Web page $i$ in the Web graph.
- $P$ is the $n \times n$ transition matrix whose entries are
$$p_{ij} = \begin{cases} 1/N_i & \text{if there is a link from } i \text{ to } j \\ 0 & \text{otherwise} \end{cases}$$
The PageRank formula then becomes:
$$r = rP \qquad (1.2)$$
This equation shows that the PageRank vector $r$ is the (left) eigenvector of the transition matrix $P$ associated with the eigenvalue $\lambda = 1$. Linear algebra offers several methods for computing eigenvectors of a matrix, but because the matrix in question is extremely large, the authors of [30] use an iterative method to compute the PageRank vector in practice.
Computing PageRank
As mentioned above, one of the simplest ways to compute an eigenvector of a matrix is to multiply an arbitrary vector by the matrix repeatedly until the vector converges. We first assign an arbitrary initial value to the PageRank vector. We then repeatedly multiply this vector by the given matrix until the convergence condition is met. The resulting vector is the PageRank vector.
The procedure is as follows:
1. $s \leftarrow$ an arbitrary vector
2. $r \leftarrow sP$
3. If $\lVert r - s \rVert < \varepsilon$ then stop ($\varepsilon$ is a very small positive number, called the iteration error); $r$ is the PageRank vector.
   Otherwise $s \leftarrow r$ and go back to step 2.
The convergence rate of the iteration depends on the gap between the two largest eigenvalues of the matrix (in other words, on the difference between the two largest eigenvalues). Page and Brin state that the iteration converges fairly quickly, within about 100 iterations.
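The iteration above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in [30]: the graph is given as a hypothetical dictionary `links` mapping each page to the list of pages it links to, every page is assumed to have at least one out-link (the simple model), the start vector is uniform, and the distance in step 3 is taken to be the L1 norm.

```python
def pagerank_simple(links, eps=1e-10, max_iter=1000):
    """Power iteration r <- sP for the simple PageRank model.

    links: dict mapping each page to the list of pages it links to.
    Assumes every page has at least one out-link (no dangling nodes).
    """
    pages = sorted(links)
    n = len(pages)
    s = {p: 1.0 / n for p in pages}           # step 1: arbitrary start vector
    for _ in range(max_iter):
        r = {p: 0.0 for p in pages}
        for i in pages:                        # step 2: r = sP
            share = s[i] / len(links[i])       # page i splits its rank over N(i) links
            for j in links[i]:
                r[j] += share
        diff = sum(abs(r[p] - s[p]) for p in pages)   # step 3: ||r - s|| in the L1 norm
        s = r
        if diff < eps:
            break
    return s


# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_simple(toy_graph)
```

For this toy graph the balance equations (1.1) can be solved by hand and give r(A) = 0.4, r(B) = 0.2, r(C) = 0.4, which is what the iteration converges to.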
Random surfer model
The computation of PageRank can be viewed as the behavior of a person browsing the Web. Imagine a user who browses the Web by randomly following the links on the pages he or she visits. This random browsing is equivalent to a random walk on a directed graph. It shows that the PageRank vector is proportional to the stationary probability distribution of a random process. The PageRank of a Web page is the probability that a random surfer visits that page.
PageRank in practice
In practice many Web pages have no incoming links or no outgoing links. Such a page may contain only an image, a PDF file or a data table, or it may be a page whose linked pages have not yet been fetched by the search engine. Such isolated pages are called dangling nodes [9]. In this case, solving equation (1.2) gives the dangling nodes a rank of 0, and the importance of those pages cannot be computed. This does not match reality, because any Web page that has been built carries some meaning, that is, it has positive importance.
Because the real Web graph is not connected, the matrix $P$ still contains rows consisting entirely of zeros, so $P$ has no stable stationary probability distribution, that is, no PageRank vector.
Page and colleagues [30] propose to handle these problems by replacing the all-zero rows of $P$ with a probability distribution vector $v$, where $v_i$ is the probability that Web page $i$ is visited at the first step of browsing. When context is not taken into account, $v$ can be chosen as $v = [1/n]_{n \times 1}$.
Let $a$ be the $n \times 1$ vector with:
$$a_i = \begin{cases} 1 & \text{if } N(i) = 0 \\ 0 & \text{otherwise} \end{cases}$$
The matrix $P$ is transformed into the matrix $P'$:
$$P' = P + a v^{T} \qquad (1.3)$$
To guarantee a stable (unique) stationary distribution, the PageRank formula is adjusted by adding a damping factor $d$, which takes values in the interval $[0, 1]$. Under this new definition only a fraction $d$ of the rank of a Web page is distributed among the pages it links to; the rest of the rank is distributed evenly among all pages on the Web. The modified PageRank formula is:
$$r(i) = d \cdot \sum_{j \in B(i)} \frac{r(j)}{N(j)} + \frac{1 - d}{n} \qquad (1.4)$$
The Markov matrix is redefined as follows, where $\mathbf{1}$ denotes the $n \times 1$ all-ones vector, so that every row of $\mathbf{1} v^{T}$ equals $v^{T}$:
$$P'' = d P' + (1 - d)\, \mathbf{1} v^{T} \qquad (1.5)$$
Adding the damping factor $d$ (in experiments usually chosen as $d = 0.85$) has the effect of supplying additional rank to the pages that have no outgoing links. The original PageRank formula is the special case of this PageRank value with $d = 1$.
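As an illustration of equations (1.3) to (1.5), the following sketch iterates $r \leftarrow rP''$ without ever building the dense matrix. It is a minimal sketch under assumptions that are not taken from [30]: the graph is a hypothetical dictionary `links`, $v$ is the uniform distribution, pages that appear only as link targets are treated as dangling nodes, and convergence is measured in the L1 norm.

```python
def pagerank_damped(links, d=0.85, eps=1e-12, max_iter=1000):
    """Iterate r <- r P'' with P'' = d (P + a v^T) + (1 - d) 1 v^T and uniform v.

    links: dict mapping a page to the list of pages it links to.
    Pages with an empty (or missing) list are dangling nodes.
    """
    pages = sorted(set(links) | {q for out in links.values() for q in out})
    n = len(pages)
    v = 1.0 / n                                  # uniform v_i (assumption)
    s = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank currently held by dangling nodes; the term a v^T spreads it over v.
        dangling_mass = sum(s[p] for p in pages if not links.get(p))
        r = {p: (1.0 - d) * v + d * dangling_mass * v for p in pages}
        for i in pages:
            out = links.get(i, [])
            for j in out:                        # the d * P part of the update
                r[j] += d * s[i] / len(out)
        diff = sum(abs(r[p] - s[p]) for p in pages)
        s = r
        if diff < eps:
            break
    return s


# Hypothetical graph in which page C is a dangling node.
ranks = pagerank_damped({"A": ["B"], "B": ["A", "C"], "C": []})
```

Because each row of $P''$ sums to 1, the total rank stays equal to 1 at every iteration, and the dangling page C ends up with a strictly positive rank.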
Reordering PageRank
Langville and Meyer [9] point out that discarding the dangling nodes during rank computation can make the result inaccurate, because some dangling nodes may have a high PageRank. For example, a PDF file with good content may be linked to from many sources and may therefore receive a high rank.
Langville and Meyer propose a solution to this problem that differs from that of Page and colleagues [30], called the Reordering PageRank algorithm [8] [9]. Their method uses a linear system and exploits the dangling nodes to reduce the amount of computation, which leads to a matrix whose entries are reordered appropriately.
According to [9], the PageRank vector is computed from the following equation, where $v^{T}$ is the row form of the distribution vector $v$:
$$r (I - dP) = v^{T} \qquad (1.6)$$
Here $I$ is the identity matrix and $(I - dP)$ is a coefficient matrix whose properties are presented in detail in [8]. We need to note the last property, which is stated as follows:
- The row of the inverse matrix $(I - dP)^{-1}$ corresponding to a dangling node $i$ is the transposed vector $e_i^{T}$, where $e_i$ is the $i$-th column of the identity matrix $I$.
This property makes the computation of the PageRank vector particularly efficient. Suppose the rows and columns of $P$ are permuted so that the rows corresponding to dangling nodes lie at the bottom of the matrix. Then $P$ has the form:
$$P = \begin{pmatrix} P_{11} & P_{12} \\ 0 & 0 \end{pmatrix}$$
where the first block of rows and columns is indexed by $ND$, the set of nodes that are not dangling nodes, and the second block by $D$, the set of dangling nodes. The PageRank vector can then be computed by:
$$r = \left( v_1^{T} (I - dP_{11})^{-1} \;\middle|\; d\, v_1^{T} (I - dP_{11})^{-1} P_{12} + v_2^{T} \right) \qquad (1.7)$$
where the vector $v$ is split into two parts: the nondangling part $v_1$ and the dangling part $v_2$. We continue by moving the zero rows of the submatrices $P_{11}$ and $P_{12}$ to the bottom and partitioning these matrices in the same way as $P$. This transformation is repeated on ever smaller submatrices until a submatrix without zero rows is reached. When the transformation ends, the PageRank vector is computed recursively as follows:
1. Solve $r_1 (I - dP_{11}) = v_1^{T}$ for $r_1$.
2. Compute $r_2 = d\, r_1 P_{12} + v_2^{T}$.
3. Compute $r_3 = d\, r_1 P_{13} + d\, r_2 P_{23} + v_3^{T}$.
   ...
4. Compute $r_b = d\, r_1 P_{1b} + d\, r_2 P_{2b} + \dots + d\, r_{b-1} P_{b-1,b} + v_b^{T}$.
5. Combine and normalize: $r = [r_1\; r_2\; \dots\; r_b] \,/\, \lVert [r_1\; r_2\; \dots\; r_b] \rVert_1$.
The reordering method proposed by Langville and Meyer uses algebraic transformations to split the matrix $P$ into smaller submatrices and then computes the rank vector for each submatrix, so its computation time is fairly short and it can be applied to a very large Web graph. Experiments show that this method converges at least as fast as the original PageRank method.
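The first level of this reordering (one split into nondangling and dangling pages, steps 1, 2 and 5) can be sketched as follows. This is a simplified illustration and not the algorithm of [8] [9] in full: the recursive splitting of $P_{11}$ is omitted, $v$ is assumed uniform, the graph is a hypothetical dictionary `links`, and the linear system of step 1 is solved by the fixed-point iteration $r_1 \leftarrow v_1^{T} + d\, r_1 P_{11}$ instead of a direct solver.

```python
def pagerank_reordered(links, d=0.85, eps=1e-12, max_iter=1000):
    """One-level reordering: solve on nondangling pages, then read off dangling ones."""
    pages = sorted(set(links) | {q for out in links.values() for q in out})
    n = len(pages)
    nd = [p for p in pages if links.get(p)]        # set ND: pages with out-links
    dg = [p for p in pages if not links.get(p)]    # set D: dangling pages
    v = 1.0 / n                                     # uniform v (assumption)

    # Step 1: solve r1 (I - d P11) = v1^T by iterating r1 <- v1^T + d r1 P11.
    r1 = {p: v for p in nd}
    for _ in range(max_iter):
        nxt = {p: v for p in nd}
        for i in nd:
            share = d * r1[i] / len(links[i])
            for j in links[i]:
                if links.get(j):                    # j is nondangling: entry of P11
                    nxt[j] += share
        diff = sum(abs(nxt[p] - r1[p]) for p in nd)
        r1 = nxt
        if diff < eps:
            break

    # Step 2: r2 = d r1 P12 + v2^T (a single pass, no iteration needed).
    r2 = {p: v for p in dg}
    for i in nd:
        share = d * r1[i] / len(links[i])
        for j in links[i]:
            if not links.get(j):                    # j is dangling: entry of P12
                r2[j] += share

    # Step 5: combine the two parts and normalise by the L1 norm.
    r = {**r1, **r2}
    total = sum(r.values())
    return {p: x / total for p, x in r.items()}


ranks = pagerank_reordered({"A": ["B"], "B": ["A", "C"], "C": []})
```

The point of the design is that the iteration runs only over the nondangling block $P_{11}$; the ranks of the dangling pages are obtained afterwards by one matrix-vector product.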
Evaluation of PageRank
According to [9], PageRank is a fairly good ranking method whose computation is independent of the user, so it can be carried out separately and does not affect search speed. PageRank as implemented in the Google search engine has given very good results. However, because the algorithm considers only the links between Web pages and ignores their content, it can easily be fooled by spam techniques. It is therefore necessary to speed up the computation of PageRank and to pay more attention to the content of Web pages with respect to the user's query.
1.2.1.2. Modified Adaptive PageRank
PageRank is a good and effective method for ranking pages by analyzing the links between Web pages. The PageRank values of all Web pages are obtained by computing an eigenvector of the adjacency matrix that represents the links between Web pages. However, given the enormous size of the WWW, this computation can take many days. The computation of page ranks therefore needs to be accelerated, for two reasons:
- The results are needed early so that rank information can be passed on to the other components of the search engine; computing the PageRank vector quickly makes it possible to use the idle time of those components.
- Recent methods all focus on evaluation criteria that take the user's interests into account, so many PageRank vectors must be computed, each directed at a different topic. Computing many vectors also requires each individual vector to be computed quickly.
Several methods for improving the computational performance of PageRank have been proposed. One popular acceleration method is Modified Adaptive PageRank, introduced by Sepandar Kamvar and colleagues [32]. The idea rests on the following observation: while the program runs, the importance values of different Web pages converge at different speeds; some pages converge quickly and others slowly. We can therefore exploit the pages that converge early: the importance values of pages that have already converged need not be computed again. This removes redundant computation and thus increases the computational efficiency of the system. The method is thus essentially an improvement of PageRank that speeds up the computation by reducing redundant work.
The Adaptive PageRank method
As introduced above, the global PageRank vector of the Web pages is computed by an iterative method. Suppose the computation of the PageRank vector has reached iteration $k$ and the next step is:
$$x^{(k+1)} = A x^{(k)} \qquad (1.8)$$
Let $C$ be the set of Web pages whose rank has converged to within some tolerance $\varepsilon$, and $N$ the set of Web pages whose rank has not yet converged. We split the matrix $A$ into two submatrices: $A_N$, of size $m \times n$, is the adjacency matrix representing the links of the $m$ pages that have not converged, and $A_C$, of size $(n - m) \times n$, is the adjacency matrix representing the links of the $(n - m)$ pages that have converged.
Similarly, we split the vector $x^{(k)}$ at iteration $k$ into two vectors: $x_N^{(k)}$ holds the components of $x^{(k)}$ that have not yet converged, and $x_C^{(k)}$ holds the components of $x^{(k)}$ that have converged. The matrix $A$ and the vector $x^{(k)}$ are then written as:
$$x^{(k)} = \begin{pmatrix} x_N^{(k)} \\ x_C^{(k)} \end{pmatrix} \quad \text{and} \quad A = \begin{pmatrix} A_N \\ A_C \end{pmatrix}$$
and equation (1.8) becomes:
$$\begin{pmatrix} x_N^{(k+1)} \\ x_C^{(k+1)} \end{pmatrix} = \begin{pmatrix} A_N \\ A_C \end{pmatrix} \begin{pmatrix} x_N^{(k)} \\ x_C^{(k)} \end{pmatrix} \qquad (1.9)$$
Because the components of $x_C^{(k)}$ have converged, $x_C^{(k+1)}$ need not be computed. The computation is reduced considerably, since $A_C x^{(k)}$ is no longer evaluated and only the following is needed:
$$x_N^{(k+1)} = A_N x^{(k)} \qquad (1.10)$$
$$x_C^{(k+1)} = x_C^{(k)} \qquad (1.11)$$
Improving Adaptive PageRank
Because the WWW is very large, reordering the matrix $A$ to form the submatrix $A_N$ is hardly feasible at every iteration. Moreover, there is no efficient way to ignore the unnecessary entries (the links to pages that have converged), so in practice the algorithm can be implemented as follows.
Define the matrix $A'$ and the vector $x_C'^{(k)}$ by:
$$A'_{ij} = \begin{cases} 0 & \text{if } i \in C \\ A_{ij} & \text{otherwise} \end{cases} \qquad \left(x_C'^{(k)}\right)_i = \begin{cases} x_i^{(k)} & \text{if } i \in C \\ 0 & \text{otherwise} \end{cases}$$
That is, $A'$ is obtained by replacing row $i$ of $A$ with zeros whenever $i \in C$. This amounts to filtering out the entries of $A_C$, because they are no longer needed in the computation.
Equation (1.8) is rewritten as:
$$x^{(k+1)} = A' x^{(k)} + x_C'^{(k)} \qquad (1.12)$$
The matrix $A'$ has the same dimensions as $A$ but is much sparser (it has many more zero entries, and computing with zeros is trivial), so the computation is faster than reordering the matrix of links between Web pages to obtain the submatrices $A_N$ and $A_C$.
The main idea of Adaptive PageRank is to reduce redundant computation by recomputing PageRank according to equations (1.10) and (1.11). In addition, [32] describes in more detail how to speed up the computation by splitting the matrix $A$ into four submatrices.
The matrix $A$ is rewritten as:
$$A = \begin{pmatrix} A_{NN} & A_{CN} \\ A_{NC} & A_{CC} \end{pmatrix} \qquad (1.13)$$
where $A_{NN}$ is the adjacency matrix representing the links from pages whose PageRank has not converged to pages whose PageRank has not converged, $A_{CN}$ is the adjacency matrix representing the links from pages whose PageRank has converged to pages whose PageRank has not converged, and similarly for the other blocks $A_{NC}$ and $A_{CC}$.
Because $x_C$, and hence $A_{CN} x_C$, no longer changes after iteration $k$ (these components have converged), equation (1.8) can be rewritten as:
$$x_N^{(k+1)} = A_{NN} x_N^{(k)} + A_{CN} x_C^{(k)} \qquad (1.14)$$
The matrix $A$ is split up and some of its submatrices no longer have to be recomputed, so the amount of computation can be reduced considerably. Moreover, the term involving $A_{CN}$ need not be recomputed at every iteration; it can be re-examined periodically.
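The update of equations (1.10) to (1.12) can be sketched as follows: once a component is judged to have converged it is simply copied, and its row is skipped. This is only an illustrative sketch and not the implementation of [32]. The graph is a hypothetical dictionary `links` in which every page has at least one out-link, $A$ is taken to be the damped PageRank operator with uniform $v$, and the convergence test (relative change below `tol` for `patience` consecutive iterations) is an assumption added here to avoid freezing a page whose value happens not to move in a single early step.

```python
def adaptive_pagerank(links, d=0.85, tol=1e-9, patience=3, max_iter=1000):
    """Adaptive iteration: converged components are frozen and no longer recomputed.

    links: dict mapping each page to the list of pages it links to
           (every page is assumed to have at least one out-link).
    """
    pages = sorted(links)
    n = len(pages)
    incoming = {p: [] for p in pages}          # B(i): pages that link to i
    for i in pages:
        for j in links[i]:
            incoming[j].append(i)

    x = {p: 1.0 / n for p in pages}
    stable = {p: 0 for p in pages}             # consecutive iterations with tiny change
    converged = set()                          # the set C
    for _ in range(max_iter):
        if len(converged) == n:
            break
        nxt = dict(x)                          # frozen entries are copied: x_C(k+1) = x_C(k)
        for i in pages:
            if i in converged:
                continue                       # row i of A' is zero: no work for page i
            nxt[i] = (1.0 - d) / n + d * sum(x[j] / len(links[j]) for j in incoming[i])
            if abs(nxt[i] - x[i]) <= tol * abs(nxt[i]):
                stable[i] += 1
            else:
                stable[i] = 0
        for i in pages:
            if i not in converged and stable[i] >= patience:
                converged.add(i)               # move page i from N to C
        x = nxt
    return x


ranks = adaptive_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

On a graph this small nothing is gained, of course; the saving described in the text comes from skipping the rows of the many pages that converge early on a Web-scale graph.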
Evaluation
Splitting and filtering the matrix $A$ not only removes redundant computation but also avoids unnecessary reading of inputs and writing of outputs, which further improves computational performance. The method also reduces the memory cost of the computation. The experimental results in [32] show that the ranking time can be reduced by more than 20% compared with the original PageRank algorithm.
1.2.1.3. HITS
The HITS method (Hypertext Induced Topic Search), proposed by Kleinberg [23], does not rank a Web page by a single importance value as PageRank does; instead each Web page is given two different weights: authority and hub.
- Authority pages: pages considered most relevant to a given query. For example, the Yahoo home page is the authority page for the query "yahoo".
- Hub pages: pages that need not be authorities themselves but point to many authority pages. For example, Searchenginewatch.com is a hub page because it links to the home pages of many search engines. Hub pages are important, first because they contain information that can be used to find useful information, and second because they are used by the HITS algorithm to compute authority. Since a hub is a page that points to many authorities, a good authority can be regarded as a page pointed to by many hubs.
The HITS algorithm
HITS does not work on the whole Web graph but only on a small set of Web pages, which it combines into a graph of Web pages (called the subgraph). Unlike PageRank, the algorithm is not completely independent of the user: it depends on the user's query, and the computation must be repeated for every different query. However, the query plays a role only in building the subgraph and does not affect the computation itself. Therefore the subgraph of pages must first be built for the query, and the links between the pages in the graph are then analyzed to determine the authority and hub values of the pages.

Figure 1. Illustration of the authority and hub properties
Determining the subgraph:
The subgraph of Web pages, also called the base set S, can be built by the following algorithm:
1. R ← the set of t Web pages containing the query string (the root set)
2. S ← R
3. For each page p in R:
   (a) Add the pages that p links to into S
   (b) Add the Web pages that link to p into S (at most d pages)
4. The graph formed by S is the required subgraph.
The root set for a query can be obtained from the results of other search engines such as Google. For example, the root set can be taken from the first results, say the first 10 Web page addresses returned for the query. Alternatively it can consist of pages whose address contains the query, for example for the query "java" the home page http://java.sun.com. The Web pages in the subgraph S are indexed from 1 to n, and the graph is represented by the adjacency matrix A.
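The expansion from the root set R to the base set S can be sketched as below. The names `out_links` and `in_links` are hypothetical lookup tables introduced for illustration (in a real system these links would come from a crawl or a search engine's link index), and taking the first d in-linking pages in sorted order is an arbitrary choice made here only to keep the result deterministic.

```python
def build_base_set(root, out_links, in_links, d=50):
    """Expand the root set R into the base set S (steps 2 and 3 of the algorithm).

    root:      iterable of pages returned for the query (the root set R)
    out_links: dict page -> pages that the page links to
    in_links:  dict page -> pages that link to the page
    d:         maximum number of in-linking pages added per root page
    """
    base = set(root)                                    # step 2: S <- R
    for p in root:                                      # step 3
        base.update(out_links.get(p, ()))               # (a) pages that p links to
        base.update(sorted(in_links.get(p, ()))[:d])    # (b) at most d pages linking to p
    return base


# Hypothetical link data for a single root page "r".
base = build_base_set(["r"], {"r": ["x"]}, {"r": ["a", "b", "c"]}, d=2)
```

The cap d keeps a very popular root page, which may have a huge number of in-links, from dominating the base set.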
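
Figure 2. Expanding the base set T from the root set S
Computing authority and hub:
The authority weight $a_i$ and the hub weight $h_i$ of each Web page are initialized to 1 and are then computed by the formulas below, where $B(i)$ is the set of pages that link to page $i$ and $N(i)$ here denotes the set of pages that page $i$ links to:
$$a_i = \sum_{j \in B(i)} h_j \quad \text{and} \quad h_i = \sum_{j \in N(i)} a_j \qquad (1.15)$$
Let $A$ be the adjacency matrix representing the links between the Web pages, with:
$$a_{ij} = \begin{cases} 1 & \text{if there is a link from page } i \text{ to page } j \\ 0 & \text{otherwise} \end{cases}$$
Writing (1.15) in matrix form gives:
$$a = A^{T} h \quad \text{and} \quad h = A a \qquad (1.16)$$
where $a = (a_1, a_2, \dots, a_n)$ and $h = (h_1, h_2, \dots, h_n)$ are the vectors of authority and hub weights of the pages in the set S.
Substituting one equation of (1.16) into the other gives:
$$a = A^{T} A\, a \quad \text{and} \quad h = A A^{T} h$$
Thus, as in the PageRank method, the vectors $a$ and $h$ are eigenvectors of the matrices $A^{T} A$ and $A A^{T}$ respectively. As in the computation of PageRank, the convergence property can therefore be used to compute $a$ and $h$. The vectors $a$ and $h$ are usually normalized: $\sum_i a_i = \sum_i h_i = 1$.
Kleinberg [23] showed that the authority and hub weights converge, that is, the algorithm terminates, but did not give a bound on the number of iterations required. Experiments, however, show that the algorithm converges quickly.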
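The iteration of (1.15) and (1.16) can be sketched as follows. This is a minimal illustration rather than Kleinberg's reference implementation: the base set is given as a hypothetical dictionary `links`, a fixed number of iterations replaces a convergence test, and both vectors are normalized so that their components sum to 1.

```python
def hits(links, iterations=50):
    """Iterate a = A^T h, h = A a on the subgraph described by links.

    links: dict mapping a page to the list of pages it links to.
    Returns two dicts (authority, hub), each normalised to sum to 1.
    """
    pages = sorted(set(links) | {q for out in links.values() for q in out})
    auth = {p: 1.0 for p in pages}            # weights are initialised to 1
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # a_j = sum of h_i over the pages i that link to j
        new_auth = {p: 0.0 for p in pages}
        for i in pages:
            for j in links.get(i, []):
                new_auth[j] += hub[i]
        # h_i = sum of a_j over the pages j that i links to
        new_hub = {i: sum(new_auth[j] for j in links.get(i, [])) for i in pages}
        a_sum = sum(new_auth.values()) or 1.0  # guard against a graph with no links
        h_sum = sum(new_hub.values()) or 1.0
        auth = {p: x / a_sum for p, x in new_auth.items()}
        hub = {p: x / h_sum for p, x in new_hub.items()}
    return auth, hub


# Hypothetical subgraph: h1 links to a1 and a2, h2 links to a1 only.
authority, hub_score = hits({"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []})
```

In this toy subgraph a1 receives the highest authority weight because both hubs point to it, and h1 receives the highest hub weight because it points to both authorities.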
Evaluation
According to [9], the HITS algorithm is partly user-oriented, because it uses the query to select Web pages whose content relates to the query string when building the subset S of Web pages. The algorithm expresses the close relationship between authoritative pages (authorities) and central pages (hubs).
However, HITS faces the rather serious difficulty that it must be computed online: the subgraph is built only after the search engine has received the query, and only then are the authority and hub weights computed. This delays the return of results to the user. Nevertheless, the HITS algorithm can be applied in link spam detection methods, to compute the influence of bad pages on other pages once a root set of bad pages has been identified.
1.2.2. Context-oriented ranking
1.2.2.1. Topic-sensitive PageRank
PageRank is an effective ranking method and is currently used by the Google search engine. However, it considers only the links and ignores the content of the Web pages that contain them, which can distort the search results. What is needed is a method that is as fast as PageRank but also takes the content of the linking pages into account. It would be even more useful if the user's interests could be exploited when computing how well a page matches the user's question. To meet these requirements, Taher H. Haveliwala [35] proposed topic-sensitive PageRank, which uses the notion of a context scope to express the user's interests. The method captures the importance of Web pages, supports context-based search and, most importantly, can find pages that match the content of the user's query at an acceptable speed.
The algorithm has two steps, outlined below.
o The first step is performed offline, during the crawler's preprocessing, and is completely independent of the queries, as in ordinary PageRank. In this step the Web pages in the database are divided into classes by topic $c_1, c_2, \ldots, c_n$; let $T_j$ be the set of Web pages belonging to topic $c_j$. Each class has its own PageRank vector over the pages. The PageRank vector of topic $c_j$ is computed with $v = v_j$, where
$$v_{ji} = \begin{cases} \dfrac{1}{|T_j|} & \text{if } i \in T_j \\ 0 & \text{otherwise} \end{cases} \tag{1.17}$$
Let $D_j$ be the keyword vector containing all the keywords in the documents of topic $c_j$; $D_{jt}$ is the number of occurrences of keyword $t$ in all the documents of topic $c_j$.
o The second step is performed at query time: the importance of the pages is computed only once the search engine has received the user's query. Suppose we have a query $q$, and let $q'$ be the context scope of $q$. The context scope means that if the query $q$ was issued by highlighting the keyword $q$ in some Web page $u$, then $q'$ contains the keywords in $u$, including $q$. For an ordinary query without context, $q' = q$. We then compute the probability that $q'$ belongs to each topic. This step can be seen as a classification step that decides which topic class $q'$ belongs to.
A Bayes classifier is used, with:
Training set: the pages listed under the topics.
Input: the query or the context scope of the query.
Output: the probability that the input belongs to each topic.
Let $q'_i$ be the $i$-th keyword in the context $q'$. For each class $c_j$, the probability that $q' \in c_j$ is:
$$P(c_j \mid q') = \frac{P(c_j)\, P(q' \mid c_j)}{P(q')} \propto P(c_j) \prod_i P(q'_i \mid c_j) \tag{1.18}$$
where $P(q'_i \mid c_j)$ is computed from the keyword vector $D_j$ defined above.
The value $P(c_j)$ is either set to the same value for every topic or obtained as follows: suppose there are $k$ users and that, for each user, we know how often that user's queries relate to each topic; from this we can compute $P_k(c_j)$, and combining these values gives $P(c_j)$.
Let $rank_{jd}$ be the rank of document $d$ given by $PR(d, v_j)$, the PageRank vector of topic $c_j$. The query-dependent importance $S_{qd}$ is then computed as:
$$S_{qd} = \sum_j P(c_j \mid q') \cdot rank_{jd} \tag{1.19}$$
Topic-sensitive PageRank can give more accurate results because it relies on both the links and the content of the Web pages. It also has a drawback: the division into topics may be incomplete and may not cover every subject. This can be addressed by adding more topics, but doing so inevitably increases the computation time.
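The query-time step, equations (1.18) and (1.19), can be illustrated with a small sketch. It is a hedged example rather than the method of [35] as implemented: the function names, the add-one (Laplace) smoothing and the toy topic data are all assumptions introduced here.

```python
import math


def topic_posteriors(query_terms, term_counts, priors):
    """Naive Bayes estimate of P(c_j | q') as in (1.18).

    term_counts[topic][term] plays the role of D_jt.
    Add-one smoothing is an assumption made to avoid zero probabilities.
    """
    vocabulary = set()
    for counts in term_counts.values():
        vocabulary.update(counts)
    log_scores = {}
    for topic, counts in term_counts.items():
        total = sum(counts.values())
        score = math.log(priors[topic])
        for term in query_terms:
            score += math.log((counts.get(term, 0) + 1) / (total + len(vocabulary)))
        log_scores[topic] = score
    # Normalize in a numerically stable way so the posteriors sum to 1.
    peak = max(log_scores.values())
    weights = {topic: math.exp(s - peak) for topic, s in log_scores.items()}
    norm = sum(weights.values())
    return {topic: w / norm for topic, w in weights.items()}


def query_sensitive_score(posteriors, topic_ranks, doc):
    """S_qd = sum over topics of P(c_j | q') * rank_jd, as in (1.19)."""
    return sum(p * topic_ranks[topic].get(doc, 0.0) for topic, p in posteriors.items())


# Hypothetical toy data: two topics and two documents.
counts = {"sports": {"football": 8, "goal": 2}, "computing": {"java": 9, "code": 1}}
ranks = {"sports": {"d1": 0.7, "d2": 0.1}, "computing": {"d1": 0.2, "d2": 0.6}}
posteriors = topic_posteriors(["java"], counts, {"sports": 0.5, "computing": 0.5})
score_d2 = query_sensitive_score(posteriors, ranks, "d2")
```

Because the per-topic PageRank vectors are computed offline, only the cheap classification and the weighted sum have to run when the query arrives.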
1.3. Entity ranking
Entity search on the Web is a new direction built on top of ordinary text search. With the development of information extraction techniques, entity search engines are attracting more and more research interest. With an entity search engine, users can easily find information about a particular object. For example, for the query "universities in Vietnam", an entity search engine returns a list of the names of universities in Vietnam, exactly as the user wants. An ordinary search engine, by contrast, returns a list of Web pages containing the query keywords, so the user has to read through many pages with no guarantee that the desired information appears among the first results. The results of an entity search engine are the entities of the requested object type, and each entity is determined not from a single page alone but may be aggregated from many Web pages. Placing the entities that best match the query at the top of the result list is therefore very important. In other words, entity ranking is the core problem of an entity search engine.
The entity ranking problem is stated as follows:
Let $E = \{e_1, e_2, \ldots, e_n\}$ be the set of entities extracted from the Web pages. Each entity $e_i$ is represented by (<attribute>, <value>) pairs. Define $d(e_i) = \{I_i, P_i\}$ as a description of the entity $e_i$, where $I_i$ is the entity identifier, $I_i = id(e_i)$, and the property set $P_i = \{(a_1^i, v_1^i), \ldots, (a_n^i, v_n^i)\}$ is a set of (<attribute>, <value>) pairs. For example, the University of Engineering and Technology (Đại học Công Nghệ) has the ID DHCN and properties such as (name, Đại học Công Nghệ), (year_founded, 2005), and so on.
A query $q = \{(a_1, v_1), \ldots, (a_n, v_n)\}$ is a set of (<attribute>, <value>) pairs expressing the user's request for entities whose values match the attributes $a_1, \ldots, a_n$.
Given a set of entity descriptions $D = \{d(e_1), \ldots, d(e_n)\}$ and a query $q$ as input, the output of an entity ranking system is a ranked list of entities $E = \{e_i, \ldots, e_j\}$. The relevance of the entity $e_i$ to the query $q$ is determined by the scoring function $\Phi(q, d(e_i))$.
The value of $\Phi(q, d(e_i))$ is used to rank the returned results, so choosing the function $\Phi(q, d(e_i))$ is the key issue. For each entity ranking problem and each object type, suitable ranking algorithms depend on the attributes of the objects being searched for.
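To make the role of the scoring function concrete, the sketch below ranks entity descriptions with one deliberately simple choice of $\Phi$: the fraction of query pairs found in the entity's property set. This particular $\Phi$ is a hypothetical stand-in chosen for illustration; the text does not prescribe it, and the second entity in the example is invented.

```python
def score(query, description):
    """Hypothetical scoring function: share of query pairs present in the entity.

    description is (identifier, properties), where properties is a set of
    (attribute, value) pairs; query is a list of (attribute, value) pairs.
    """
    _identifier, properties = description
    if not query:
        return 0.0
    matched = sum(1 for pair in query if pair in properties)
    return matched / len(query)


def rank_entities(query, descriptions):
    """Return the descriptions sorted by decreasing score."""
    return sorted(descriptions, key=lambda d: score(query, d), reverse=True)


# DHCN follows the example in the text; "XYZ" is an invented entity.
entities = [
    ("XYZ", {("name", "Some Other University"), ("year_founded", "1990")}),
    ("DHCN", {("name", "Đại học Công Nghệ"), ("year_founded", "2005")}),
]
ranked = rank_entities([("year_founded", "2005")], entities)
```

A learned ranking model would replace `score` while leaving the surrounding sort unchanged.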

Figure 3. A learning-to-rank model in an entity search engine [4]
1.4. Overview of image ranking
With the explosion of information on the Web and the development of digital technology, the number of images stored on the Web is growing rapidly. Every day, millions of pictures are uploaded to online photo sites such as Flickr¹, Photobucket² and Facebook³. According to statistics, there were 10 billion images on Facebook (as of October 2008), 3 billion on Flickr (as of November 2008) and 6.2 billion on Photobucket (as of October 2008) [19].
1 Flickr: http://www.flickr.com
2 Photobucket: http://www.photobucket.com
3 Facebook: http://www.facebook.com

Alongside general information search, image search is a need that receives great attention from users. With such a huge number of images on the Internet, however, searching becomes extremely difficult. To address this, image search systems such as Yahoo, MSN, Google Image Search and Bing have appeared. As in ordinary search systems and other entity search systems, the ranking module is a core component of an image search engine. Image ranking has become one of the typical problems of data mining in general and of entity ranking in particular.
To search for and rank images on the Web, search engines usually rely on the attributes that images already have. Images on the Web are identified through attributes that fall into two groups: text and visual content. Text attributes include the image name, image tags¹, the text surrounding the image, the title of the Web page containing the image, and so on. The visual content of an image includes color, shape, texture, local features, or any other information derived from the content of the picture itself. Following these two kinds of features, image ranking algorithms also split into two directions: ranking by visual content and ranking by text. Today's popular image search engines, such as Google Image Search, Yahoo! Image Search, MSN and AltaVista, rank the returned images using the text that accompanies them. These systems let users enter query strings describing the subject of the images they are looking for; by analyzing the text accompanying the pictures, the system returns images whose labels match the requested subject. This approach gives good results and responds quickly. For ambiguous queries, however, the results may not match the request, because the accompanying text cannot fully describe the image content. Another research direction analyzes the visual features of the images and ranks by those features. Typical content-based image search tools include Google Image Swirl, Tiltomo and Byo Image Search. These tools take a text query string or a picture as input and let users choose which features to search by. However, they usually exploit only part of the image content and tend to be slow, because the content of the pictures has to be analyzed.

1 Tags: words that mark a region of an image; when the mouse moves over that region, the words are displayed as a caption for the picture.


One research direction that aims to overcome these problems combines the analysis of image features with the features of the query string during image search. This is a new direction that has drawn the attention of many international venues, such as the International Journal of Computer Vision and IEEE conferences.
1.5. Related work
Research on Web search began in the 1990s. As Web search tools kept improving, page ranking algorithms received close attention at international conferences. The PageRank algorithm [30] marked a leap forward for Web search engines, the typical example being Google, one of today's leading search engines. It was followed by a series of other page ranking algorithms [9] [23] [32] [35] that aim to improve on PageRank.
Most Web search research focuses on searching Web pages (text documents), and only a small part deals with searching multimedia information on the Web (images, video, MP3 and so on). In recent years, however, searching and ranking multimedia objects on the Web, and images in particular, has attracted a great deal of attention from researchers around the world. This is shown by the growing number of published studies on image ranking algorithms [17] [29] [30] [34] [36] [38] [39] [40]. In addition, image search engines have appeared, and ordinary search engines increasingly integrate image search services.
A new development for Web search engines that is receiving much attention is the meta-search engine. A number of studies on meta-search engines have been published [11] [14] [18] [28], and several meta-search engines (Dogpile, Clusty, KartOO, Google CSE and others) have been put into practical use. These tools, however, have not yet achieved outstanding results and cannot yet compete with Google.
In Vietnam, research on and applications of Web search and ranking are also receiving much attention. Several companies already work on search engines, such as Bamboo, Zing, Xalo and Socbay. Nguyễn Minh Hồng¹, Deputy Minister of Information and Communications, considers the launch of these online search engines a major contribution to Vietnam's IT industry. These products, however, still cannot overtake the search tools of the large foreign companies in the domestic market. According to Lê Ngọc Quang², Director of Business Development and Technology at IDG Ventures Vietnam, Vietnamese search tools are currently almost unused, generate no revenue and have very few users, which is a waste. Besides the search engines themselves, a number of studies on search and ranking have been published. Early work includes the improved page ranking algorithm of Nguyễn Hoài Nam [2], the learning-to-rank model of Nguyễn Thu Trang [4], and the MP3 search tool for Vietnamese built by Nguyễn Hoàng Trung [5].

1 http://www.tin247.com/xalovn_gia_nhap_thi_truong_cong_cu_tim_kiem_viet-4-21280879.html
2 http://vietnamnet.vn/cntt/2005/11/517349/

The work of Nguyễn Hoài Nam [2] starts from several basic page search and ranking methods and proposes improvements to topic-sensitive PageRank. The method in [2] assigns different importance values to links in order to make the search results more accurate. Specifically, a link from a page in the same topic as the linked page can contribute more value to that page than a link from a page outside the topic. The method was tested on the Vietseek search engine and gave promising initial results.
Another study on ranking is the work of Nguyễn Thu Trang [4] on learning to rank for object ranking and document cluster labelling. It surveys and analyzes current learning-to-rank methods and proposes an entity ranking model applied to entity search in Vietnamese, specifically drug entity search, and to learning to rank for labelling document clusters. The results demonstrate the role and effectiveness of learning to rank in search engines.
Nguyễn Hoàng Trung [5] built an experimental MP3 search component for Vietnamese for the Socbay search engine. The system searches MP3 files using the files' description fields. It gives fairly accurate results, within an acceptable time, for Vietnamese queries both with and without diacritics.
Having surveyed research in Vietnam and abroad, and seeing that the practical need is pressing, this thesis concentrates on image ranking algorithms and then applies them to build an experimental image meta-search engine model. I believe this work is practical and will be the foundation for my further research.
Summary of Chapter 1
Chapter 1 surveyed and analyzed several typical page ranking algorithms that are widely used today. It also gave an overview of object ranking in general and image ranking in particular. The next chapter presents in more detail the algorithms that rank images by visual content.



Chapter 2. Common image ranking algorithms
2.1. Introduction
As presented in the previous chapter, image ranking is a typical problem in entity ranking and is attracting a lot of research attention. Current research on image ranking focuses mainly on analyzing the visual content features of images.
A common approach uses graph theory to model the relationships between images. It builds a graph that connects similar images and then uses the eigenvector to find the images that are central in the graph. A simple and effective way of handling visual content information was proposed by Yushi Jing and Shumeet Baluja [39] [40]. Their method uses a similarity measure between images to build a similarity graph and ranks the images with the PageRank algorithm. Several improvements on the algorithm of Jing and Baluja have been proposed along the same lines [34] [29].
Another technique builds clusters of images and then uses the similarity within a cluster, or the cluster center, to find the most prominent image [27] [36]. The work of T. L. Berg and A. C. Berg extends the clustering idea by looking for images that contain a clearly prominent object and are therefore most likely to be representative. There are also user-oriented approaches [38] and semi-supervised learning approaches [17]. These methods usually incorporate the text features of the images as well.
This chapter presents in more detail several common algorithms that rank images by visual content.
2.2. VisualRank
VisualRank is an image ranking algorithm based on analyzing the content similarity between images, proposed by Yushi Jing and Shumeet Baluja [39] [40]. The method takes its basic idea from the PageRank link analysis algorithm. Like PageRank, VisualRank uses graph theory to build an image graph and uses eigenvector centrality to rank the images.
The intuition is that a user who views an image is likely to be interested in other images similar to it. That is, if links between images express their similarity, there is some probability that a user viewing one image will move on to a similar one.
A graph is built from the image data set, with each vertex representing an image in the set. Vertices are connected by edges weighted by the similarity between the two images they represent. These edges are called visual hyperlinks between the images. VisualRank uses a random walk over these links to rank the images. If an image u has a link to an image v, there is some probability that the user moves from u to v. Intuitively, images relevant to the query are pointed to by many other images and are therefore visited often, and frequently visited images are considered important. Moreover, if an image v is important and links to an image w, it contributes its importance to that of w. VisualRank is defined as:
$$VR = S^{*} \times VR \tag{2.1}$$
where $S^{*}$ is the column-normalized version of the matrix $S$, and $S_{u,v}$ is the similarity between the two images u and v. Repeatedly multiplying VR by $S^{*}$ yields the eigenvector of $S^{*}$. Although VR has a fixed-point solution, in practice it can be estimated more efficiently with an iterative approach.


Figure 4. An illustration of an image similarity graph

VisualRank converges only when the matrix $S^{*}$ is aperiodic and irreducible. As in PageRank, Jing introduces a damping factor $d$ into VisualRank to guarantee that the image graph is strongly connected. Jing also points out that the similarity matrix $S$ may be symmetric; in that case, introducing the damping factor can break the symmetry of the matrix.
For a set of $n$ images, VisualRank is redefined as:
$$VR = d\, S^{*} \times VR + (1 - d)\, p, \quad \text{where } p = \left[\tfrac{1}{n}\right]_{n \times 1} \tag{2.2}$$
In experiments, $d$ is usually chosen with $d > 0.8$.
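Equation (2.2) can be illustrated with a short power iteration. The sketch below is an assumption-laden example rather than the implementation of [39] [40]: the function names, the choice d = 0.85 (one value satisfying d > 0.8), the iteration count and the toy similarity matrix are all introduced here.

```python
def column_normalize(similarity):
    """Return S*, the similarity matrix with each column scaled to sum to 1."""
    n = len(similarity)
    col_sums = [sum(similarity[u][v] for u in range(n)) for v in range(n)]
    return [
        [similarity[u][v] / col_sums[v] if col_sums[v] else 0.0 for v in range(n)]
        for u in range(n)
    ]


def visual_rank(similarity, d=0.85, iterations=100):
    """Iterate VR = d * S* x VR + (1 - d) * p with uniform p = [1/n]."""
    n = len(similarity)
    s_star = column_normalize(similarity)
    vr = [1.0 / n] * n
    for _ in range(iterations):
        vr = [
            d * sum(s_star[u][v] * vr[v] for v in range(n)) + (1 - d) / n
            for u in range(n)
        ]
    return vr


# Toy graph: image 0 is similar to images 1 and 2, which are not similar to each other.
ranks = visual_rank([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
```

In this toy graph image 0 receives the highest score, since both other images link to it.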
A reliable similarity measure is decisive for the effectiveness of VisualRank, because it strongly affects the structure of the graph. After analyzing image features, Jing and Baluja concluded that local features are more informative and remain stable under various transformations. In their study, Jing and Baluja [40] therefore chose SIFT features [24] [25] and orientation histograms to represent the images. The similarity matrix is built from the similarities of the image pairs in the whole image data set. The similarity of a pair of images can be the number of matching local features between them.
Given the enormous number of images on the Web, the number of results an image search engine returns for a query is very large. Since building the similarity graph S for billions of images is infeasible, Jing and Baluja proposed, for practical deployments of VisualRank, pre-clustering the Web images using their text attributes in order to reduce the input set. This can be done through commercial search engines: take the first N images returned for the query by an ordinary commercial search engine, then build the similarity graph and compute VisualRank on this subset of N images only.
VisualRank offers a simple technique for bringing the benefits of link and network analysis, as used in Web page search, to image search. The authors tested the algorithm and obtained better results than the ranking of the Google image search engine for most queries, while keeping the computation efficient enough for large-scale deployment.

2.3. Multiclass VisualRank
Multiclass VisualRank is an image ranking algorithm that extends the idea of the VisualRank method of Jing and Baluja [39] [40] to rank images in several image classes. It was proposed by Mitsuru Ambai and Yuichi Yoshida [29]. Multiclass VisualRank divides the images returned by a search engine into classes according to their content features and ranks the images within each class. It consists of three steps:
o Computing content similarity: As in VisualRank, Ambai uses the SIFT algorithm to compute the similarity $w_{i,j}$ between two images $I_i$ and $I_j$. The original VisualRank uses as its similarity measure the ratio $C_{ij}$: the number of key points shared by $I_i$ and $I_j$ divided by the average number of key points extracted from $I_i$ and $I_j$. However, image search engines often return identical images for the same query. In that case the value $C_{ij}$ becomes too large compared with the other similarity values and can make the clustering result inaccurate. Multiclass VisualRank therefore applies a sigmoid function to $C_{ij}$ to reduce large values.
o Clustering: This step partitions the set of images into classes by clustering the similarity values.
The more alike two images are, the larger their similarity, so the similarity graph contains several clusters corresponding to the different image classes. [29] therefore uses the Normalized cut technique to cluster the images in the data set by clustering the similarity values in the similarity matrix. The clustering is computed from:
$$(D - W)\, v = \lambda D v \tag{2.3}$$
where $W$ is an adjacency matrix whose entries are the similarity values $w_{ij}$, $D$ is a diagonal matrix, $\lambda$ is an eigenvalue and $v$ is an eigenvector.



Figure 5. Transforming the adjacency matrix
o Ranking: As in Jing's method, Ambai also uses PageRank to rank the images:
$$r \leftarrow (1 - \alpha)\, W r + \alpha p \tag{2.4}$$
where $r = (r_1, \ldots, r_N)^{T}$ is the ranking vector of the images and $\alpha$ is the damping factor.
Because the images have been divided into classes, the ranking score of an image in one class is not affected by the ranking scores of images in other classes. The similarities between different classes, that is, the links between images belonging to different classes, can therefore be dropped. The adjacency matrix $W$ is modified as follows:
$$w'_{ij} = \begin{cases} w_{ij} & \text{if } I_i \text{ and } I_j \text{ belong to the same class} \\ 0 & \text{otherwise} \end{cases} \tag{2.5}$$
Transforming the adjacency matrix $W$ into the matrix $W'$ reduces the computation considerably. After the similarities between images of different classes are removed, the more an image resembles the representative of its class, the higher it ranks.
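The transformation (2.5) amounts to masking the adjacency matrix with the class labels. A minimal sketch follows; the function name `mask_by_class`, the label encoding and the toy values are assumptions made for illustration, and the class labels are taken as given (for example from the clustering step) rather than computed here.

```python
def mask_by_class(w, labels):
    """Apply (2.5): keep w[i][j] only when images i and j share a class label."""
    n = len(w)
    return [
        [w[i][j] if labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Images 0 and 1 are in class 0, image 2 is in class 1 (hypothetical labels).
w = [[0.0, 0.9, 0.1], [0.9, 0.0, 0.2], [0.1, 0.2, 0.0]]
w_prime = mask_by_class(w, [0, 0, 1])
```

After masking, the matrix is block-structured by class, so the ranking iteration (2.4) can be run on each class independently.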
In experiments, Multiclass VisualRank gives good rankings, with a precision approximately equal to that of VisualRank. The precision of the top 10 images ranked by Multiclass VisualRank is 0.949, compared with 0.953 for VisualRank [29].


Figure 6. Ranking results of three methods for the query "Notre Dame"
2.4. Visual ContextRank
The previous sections presented a fairly effective image ranking method proposed by Jing and colleagues, which builds the similarity matrix of 1000 images and uses VisualRank to rank each image. However, that method treats all local features as equally important and ignores context information, which can cause many errors when computing the similarity between images. The image ranking method presented below uses context information to overcome these problems.
ContextRank, proposed by Shuhui Wang and colleagues [34], uses a Markov random walk between the visual words¹ within one image and between images in order to re-rank the images returned by a commercial image search engine. Wang and colleagues make two assumptions:
i. It is possible to move between neighboring visual words within the same image.
ii. It is possible to move from a visual word A in image i to a visual word B in image j if a link exists between these two visual words.

1 visual word: an index describing the prototype of a cluster of local feature descriptors of images [33]

ContextRank treats each visual word as a state in a Markov state space. Walking through the visual words corresponds to state transitions from the current state r to the end state s.

Figure 7. Image ranking model using the ContextRank algorithm
The ContextRank ranking model proposed by Wang and colleagues has the following components:
o Data preparation: Given a keyword query string as input, ContextRank submits it to image search engines that work on the text attributes of images. It then extracts the first N results returned by the search engine and re-ranks these N images.
o Intra-image context modeling: This component builds links between the visual words within the same image, for each of the N images obtained.
o Inter-image context modeling: This component builds links between the visual words in different images of the set of N images.
o Combination: The intra-image context and the inter-image context are combined, and random-walk graph theory is used to model the intra-image and inter-image relationships of the visual words. The images are then re-ranked using the scores of the visual words in each image.
The rank of an image is computed by ContextRank as follows:
Suppose a visual vocabulary with K visual keywords is given. Each local feature of an image is assigned a visual word index. For a set of the first N images returned by text-based image search engines, the visual words in the N images are represented as a graph in which each visual word occurring in each image is a vertex. Let $I_{i,m}$, with $i = 1, \ldots, N$ and $m = 1, \ldots, K$, denote a vertex. There are thus $K \times N$ vertices in total. The goal of the algorithm is to find the score $\pi(I_{i,m})$ of the $m$-th visual word in image $i$.
The visual word graph is represented by the transition matrix $P$. Let $\pi = \{\pi(I_{i,m})\}$, $i = 1, \ldots, N$, $m = 1, \ldots, K$. $\pi$ is the eigenvector of $P$ with eigenvalue 1, and each element of $\pi$ is the score of one visual word. $\pi$ is computed from:
$$\pi = P \pi \tag{2.6}$$
By the properties of Markov chains, computing the eigenvector of $P$ requires the visual word graph to be connected: for any two visual words i and j there must be a path from i to j and back. In practice, however, quite a few visual words have no links to other visual words. The matrix $P$ therefore has to be modified so that the walk does not get stuck at vertices without links. Moreover, to guarantee a stable (unique) stationary distribution $\pi$, meaning that the walk can move from any vertex to any other vertex, a damping factor must be added. Equation (2.6) is rewritten as:
$$\pi = \alpha P \pi + (1 - \alpha) D \tag{2.7}$$
where:
$\alpha$ is the damping coefficient. In practice $\alpha$ is usually set to 0.8.
$D$ is a $KN \times 1$ vector whose elements sum to 1. $D$ can be computed as:
$$D[(i - 1) \times K + m] = \begin{cases} \dfrac{IS(i)}{num\_nz(m)} & \text{if } vw(m) \in img(i) \\ 0 & \text{otherwise} \end{cases} \tag{2.8}$$
$IS(i)$ is the initial score of image $i$, and $num\_nz(m)$ is the number of visual words with non-zero frequency.
Since $P$ is a very sparse matrix, multiplication by its zero elements is trivial. $P$ can therefore be divided into sub-matrices. Each sub-matrix is a transition matrix between the visual words in image i and the visual words in image j. Let $B_{i,j}$ denote such a sub-matrix. $P$ is reorganized into two component matrices:
$$P = \beta G + (1 - \beta) C \tag{2.9}$$
$G$ is a matrix consisting of the sub-matrices $B_{i,i}$, $i = 1, \ldots, N$, with all other sub-matrices equal to 0.
$C$ is a matrix in which the sub-matrices $B_{i,i}$, $i = 1, \ldots, N$, are equal to 0 and all other sub-matrices are non-zero.
$G$ captures the transition probabilities between visual words within the same image, while $C$ captures those between visual words in different images. The parameter $\beta$ is a weighting factor.
Once the eigenvector $\pi$ over the visual words of every image has been computed, the rank of image $i$ is given by:
$$R(i) = \sum_{m=1}^{K} \pi(I_{i,m}), \quad i = 1, \ldots, N \tag{2.10}$$
where $\pi(I_{i,m})$ is the score of the $m$-th visual word in image $i$. For a visual keyword $m$ that does not occur in image $i$, $\pi(I_{i,m}) = 0$.


Figure 8. An example of a visual word representation
In the experiments, Wang extracted about 200 SIFT features per image and used a vocabulary of about 400 words. The algorithm was tested on two different data sets and gave significantly better results than VisualRank for image re-ranking.
2.5. Remarks
The image ranking methods above all apply link analysis to the relationships between images based on visual features. They are fairly simple and give promising rankings. However, they rely only on the visual content of the images and ignore the text data that accompanies them. Because this text is created by users, it carries meaning about the visual content of the image, as the success of text-based image search engines confirms. In addition, advances in text analysis and processing make it much easier to compute the similarity between texts.
Given these strengths and weaknesses, this thesis uses an image ranking method that applies the VisualRank algorithm to both the visual features and the text features of images, as proposed in our student scientific research report.
Summary of Chapter 2
Chapter 2 presented in detail several typical image ranking methods based on visual content. It also discussed the strengths and weaknesses of these methods and introduced the image ranking method, combining visual content with the text data accompanying the images, that is used in the experimental part of the thesis. The next chapter presents the general model of a meta-search engine and a model of an image meta-search engine, and discusses some issues of image ranking in an image meta-search engine.


Chapter 3. Image meta-search engine model
A meta-search engine [18] [28] is a search engine that finds information by relying on other search engines. For each user query, the meta-search engine forwards it to other search engines (called source search engines) and then processes the results they return before presenting them to the user in a single interface. Meta-search engines are mainly used to widen the search scope by drawing on results from the data sources of several search engines, which increases the user's chances of finding the desired information.
This chapter presents the general architecture of a meta-search engine, introduces a model of an image meta-search engine, and then gives an overview of image ranking in an image meta-search engine.
3.1. General architecture of a meta-search engine

Figure 9. Architecture of a typical meta-search engine [18]
The architecture of a Web meta-search engine is close to that of an ordinary Web search engine [18]. The basic difference is that a meta-search engine has no database component for storing Web pages. Instead it has a virtual database consisting of a dispatcher, the other ordinary search engines, and a result processor. A meta-search engine has four main components: the user interface, the dispatcher, the result processor, and the scoring module.

3.1.1. User interface
The user interface receives the user's input query and displays the output results. It is usually a Web page with a box for entering a description of the information the user is looking for. A meta-search engine typically offers some options, such as choosing which engines to query from a given list of ordinary search engines, or setting the search depth and the search time. One limitation of meta-search engines is that searches tend to be slow, because they must wait for results from the other search engines. The more search engines a meta-search engine queries, the slower it becomes.
3.1.2. Dispatcher
The dispatcher of a meta-search engine resembles the query processor of an ordinary search engine. The query processor creates database queries from the input queries, whereas the dispatcher creates queries to ordinary search engines from the user's query. A dispatcher must determine which search engines it will query and how to query them.

Figure 10. A dispatcher design [18]
A dispatcher can consist of four components:
Source Selector: This component selects the ordinary search engines to query. If the dispatcher sends requests to too many search engines, it may overload network resources and take a long time to complete the search. Deciding which search engines to query is very important, because each search engine returns a different data set and so affects the results of the meta-search engine. If search engine X returns results far better than those of search engines Y and Z, a meta-search engine combining all three will not necessarily do better than X alone. Combining the results of several search engines that are individually not so good, however, can improve the results.
Query Generator: This component modifies the queries to suit each source search engine. Each search engine usually works well only on certain forms of query, so an unsuitable query gives poor search results.
Request Generator: This component combines the user's query with the selected source search engine and chooses the modified query in order to create a valid request.
Request Submitter: This component receives the requests from the request generator and executes them. The request submitter must interact with low-level protocols and ensure that any errors are logged appropriately.
3.1.3. Result processor
The result processor of a meta-search engine receives the search results of the ordinary search engines and processes them for the scoring module. The results it passes to the scoring module are like the results an ordinary search engine obtains from its database. The result processor receives the responses from the search engines and extracts the individual results from them.
The response page of a search engine contains only minimal information about each result. The information provided also varies between search engines, since each has its own output format. For example, one search engine may provide the title, URL and summary, while another provides the title, date, URL and query context. The same Web page returned by two different search engines may carry different weights. An advanced result processor can gather additional information to enrich the data of each result.
Furthermore, the results returned by different source search engines may contain the same items. The result processor therefore has to recognize these duplicates and remove the redundant ones, keeping a single result.
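The duplicate-removal task described above can be sketched as a merge keyed on the result URL. This is a simplified illustration: the record layout (dictionaries with a `url` field), the function name `merge_results`, the engine names and the trailing-slash normalization are all assumptions made for the example, and a production system would need more careful URL canonicalization.

```python
def merge_results(result_lists):
    """Merge per-engine result lists, keeping one record per URL.

    result_lists maps an engine name to a list of result dictionaries.
    Results are matched on their URL after stripping a trailing slash,
    which is a deliberately simplistic normalization (an assumption).
    """
    merged = {}
    for engine, results in result_lists.items():
        for result in results:
            key = result["url"].rstrip("/")
            if key not in merged:
                merged[key] = dict(result, sources=[engine])
            else:
                record = merged[key]
                record["sources"].append(engine)
                # Enrich the kept record with fields the first engine lacked.
                for field, value in result.items():
                    record.setdefault(field, value)
    return list(merged.values())


# Hypothetical responses from two source engines.
responses = {
    "engine_a": [{"url": "http://example.com/a/", "title": "A"}],
    "engine_b": [
        {"url": "http://example.com/a", "date": "2010-05-01"},
        {"url": "http://example.com/b", "title": "B"},
    ],
}
unique_results = merge_results(responses)
```

Keeping the list of engines that returned each result also gives the scoring module one extra signal to work with.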
3.1.4. Scoring module
Like the scoring module of an ordinary search engine, the scoring module of a meta-search engine ranks every result in the data it receives from the result processor. A meta-search engine needs effective algorithms to determine which results, among those gathered from many sources, best match the user, and then returns the results in the new ranked order. Unlike ordinary search engines, a meta-search engine has only limited information about the results it receives, and this lack of information makes ranking harder.
3.2. The MetaSEEk image meta-search engine model
In recent years, as Internet services have kept developing, more and more new search engines have appeared, with increasingly rich and diverse functions. Among them are the meta-search engines. Typical examples mentioned in this context include Google CSE, Helios, Dogpile, VisualSEEk, WebSEEk and MetaSEEk. MetaSEEk [11] is a meta-search engine specialized in searching for images on the Web based on image content. This section presents the model of the MetaSEEk image meta-search engine.

Figure 11. Overall architecture of MetaSEEk [11]

Like other meta-search engines, the MetaSEEk model has three main components: the query translator, the query dispatcher and the display interface.
When it receives a query from the user, the dispatcher selects suitable source search engines using a database that records how well earlier queries performed. This database holds scores indicating how well or badly each past query performed with each search option. These scores are called the performance scores of the search options for each query, and the database that stores them is called the performance database. Once the source search engines have been selected, the query translator converts the query into a suitable form and sends it to them. Finally, the display component merges and re-ranks the results received from the search engines and shows them to the user. MetaSEEk evaluates the quality of the results returned by each search option using feedback from the users.
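The interplay between the dispatcher and the performance database can be sketched as follows. This is a hypothetical illustration, not MetaSEEk's actual data structure: the nested dictionary layout, the function names, the query-class key and the plus-one/minus-one feedback update are assumptions chosen for the example.

```python
def select_options(performance_db, query_class, k=2):
    """Return the k search options with the highest performance score."""
    scores = performance_db.get(query_class, {})
    return sorted(scores, key=scores.get, reverse=True)[:k]


def record_feedback(performance_db, query_class, option, liked):
    """Raise or lower an option's score after user feedback (assumed +1 / -1)."""
    scores = performance_db.setdefault(query_class, {})
    scores[option] = scores.get(option, 0) + (1 if liked else -1)


# A search option is modelled here as an (engine, feature) pair.
db = {
    "sunset-like": {
        ("VisualSEEk", "color"): 3,
        ("QBIC", "color"): 1,
        ("Virage", "texture"): -2,
    }
}
chosen = select_options(db, "sunset-like", k=2)
record_feedback(db, "sunset-like", ("QBIC", "color"), liked=True)
```

With such a table, options that repeatedly please users for a given kind of query rise to the top and are queried first.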
3.2.1. Content-based visual queries
As mentioned in earlier chapters, the input to an image search engine can be a keyword or a picture. The user can submit a query in the form of an image taken from a random set of sample images or from the images returned by a keyword query, or the query can be the URL of an image on the Web. Queries of this kind are called content-based visual queries. For an image query, the search engine computes the features of the input image and uses them to find the most similar images in the database. Keyword-based image search can be used to group images by topic and thereby narrow the search scope. To process visual queries efficiently on a large image database, search engines also need clustering and indexing techniques over the image feature vectors.
3.2.2. Query interface
MetaSEEk searches for images through four source search engines: VisualSEEk, WebSEEk, QBIC and Virage. Each has its own features and limitations. VisualSEEk, QBIC and Virage provide ways of searching digital images by visual features using examples. QBIC and VisualSEEk let users customize searches with visual sketches (for example, by specifying a color palette) or by supplying a sample image, while Virage lets users set the importance weight of each feature in the search. QBIC also offers keyword-based image search. WebSEEk is a semi-automatic image search and classification engine. It supports both visual-content-based and text-based search, but MetaSEEk uses only WebSEEk's text-based image search.

Figure 12. MetaSEEk display interface
MetaSEEk provides an interface for searching images by both visual content and keywords. For content-based visual queries, the user can choose any sample image from the supported databases or enter the URL of an image on the Web. The interface lets the user choose the search features. Two features are available: color and texture. The user can search by color, by texture, or by both. As mentioned above, only some of the source engines are used for each option. The dispatcher knows the search capabilities of each database and decides which engines receive the query.
In addition, the user can specify the maximum search time, the number of search options, and the category of interest. The user can also adjust the number of queries sent to the individual search tools according to network load. During periods of low network traffic, the user can increase the number of simultaneous queries. Setting a short maximum waiting time prevents the system from being held up by slow responses from the source engines.
3.2.3. Query dispatcher
Each time a query arrives from the user, the dispatcher selects the source engines and the search options to which the query is sent. A search option is one query method on one specific search tool. For example, a query sent to VisualSEEk that requests texture-based image search is one search option. The dispatcher decides on the basis of the type of query submitted to the meta-search engine and the database of scores describing the performance of past queries. If the user requests a random sample image or submits a keyword query, the system simply queries the engines that support those actions (QBIC, Virage and VisualSEEk support random-sample queries; QBIC and WebSEEk support keyword queries).
For content-based visual queries, the source engines are selected from the performance database. This database holds scores indicating how well or how badly each search option performed in the past on the source engines for each query image. A visual query is defined by an image, a feature group, and a topic. On receiving a query, MetaSEEk looks up the performance scores of the query image in the performance database. In this database each row corresponds to a previously executed query and each score column corresponds to a search option. The dispatcher chooses the search options with the highest scores that also satisfy the user's requirements. MetaSEEk evaluates the quality of the results returned by each search option from user feedback. An automatic probing procedure is run to set the initial performance scores, so that a performance database can be built from a number of training samples.
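The lookup described above can be illustrated with a minimal sketch. The table layout, the sample scores, the function names and the one-step feedback update are all assumptions made for illustration; the source states only that rows are past queries, columns are search options, and the highest-scoring options compatible with the user's choices are selected.

```python
# Hypothetical performance database: one row per past query image,
# one score column per search option (engine, feature).
PERFORMANCE_DB = {
    "img_001": {
        ("VisualSEEk", "color"): 3,
        ("VisualSEEk", "texture"): -1,
        ("QBIC", "color"): 2,
        ("Virage", "texture"): 1,
    },
}


def select_options(query_image, allowed_features, max_options, db=PERFORMANCE_DB):
    """Return up to max_options search options with the highest scores,
    restricted to the features the user selected."""
    row = db.get(query_image)
    if row is None:
        return []  # new query: no recorded scores, the caller must fall back
    candidates = [(opt, score) for opt, score in row.items()
                  if opt[1] in allowed_features]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [opt for opt, _ in candidates[:max_options]]


def apply_feedback(query_image, option, liked, db=PERFORMANCE_DB):
    """Assumed update rule: move the score one step up or down."""
    row = db.setdefault(query_image, {})
    row[option] = row.get(option, 0) + (1 if liked else -1)
```

For example, `select_options("img_001", {"color"}, 2)` returns the two color options ordered by score, while an unknown image yields an empty list, which corresponds to the new-query case.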
For new queries with no performance scores recorded in the database, the simplest solution proposed by the authors is to pick a search option at random. The system also considers another approach: relating the new query to past queries for which performance information is available. The images in the database are grouped into topics according to their visual content. Each topic contains feature groups: color, texture, and both features together. When the user's query is new, the system downloads the image and matches it against the corresponding clusters to obtain a list of the best-matching clusters. Images selected from the few closest clusters are shown to the user. The dispatcher can then choose suitable engines from the average performance score of the cluster the user selects. Finally, the new image is stored in the database for use in later queries.
MetaSEEk uses the K-means algorithm to cluster the images in the database. The clustering is rerun every time 10 new images have been stored. Color and texture are the features used for clustering.
Topic classification
The system groups images by topic in order to constrain the search scope. The databases of the source engines tend to cover a few distinct categories. For example, the QBIC database consists largely of images of people, whereas the Virage database covers many other categories. If the user is looking for a picture of a baby, QBIC is more likely to return suitable results in less time. The system exploits this by offering search within a specific topic. If the user selects a specific topic, the user is assumed to be searching only for images in that topic.
Database structure
The MetaSEEk database stores the feature vectors (color and texture), performance scores, clusters, and topics of every image that has been queried on MetaSEEk. The dispatcher needs all of this information to choose suitable engines for incoming queries. The database is organized hierarchically. First, images are classified by topic according to their semantics (for example, animals or people). The assignment of an image to a topic is made by the user. K-means is then used to cluster the images within each topic into classes based on color, texture, or both.
Figure 13. Hierarchical structure of the database
At the lowest level of the database, each image has a record holding the information shown in Table 1. The left column lists the search options and the right column gives the score of each option. These scores are updated from user feedback each time the image is queried.
Table 1. Example record of an image in the database

3.2.4. Display component
Each time results are retrieved from the source engines, the display component reorganizes them and shows them to the user. How this is done depends on the type of query: a random sample image, a keyword, or an image. If the query is a random sample image or a keyword, the results returned by the source engines are mixed together and displayed in random order. In this case the display order does not matter. For content-based visual queries, the images are ranked before being displayed.
Content-based image search returns a list of images sorted by similarity to the query image. MetaSEEk ranks these images using the performance scores in the database. The images returned by each search option are interleaved before they are displayed. The performance scores of the query image determine the display order and the number of images taken from each search option in each interleaved group. For example, suppose images are received from two search options with performance scores 2 and 1. The display component shows 2 images from the option with score 2, then 1 image from the option with score 1, and repeats this until all returned images have been displayed.
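The interleaving rule in the example above can be sketched as follows. The function name and the dictionary-based inputs are hypothetical, and taking at least one image per round for options whose score is zero or negative is an assumption that the source does not address.

```python
def interleave(results_by_option, scores):
    """Merge ranked result lists round by round.

    In each round, every search option contributes as many images as its
    performance score, visiting options in decreasing score order.
    """
    order = sorted(results_by_option, key=lambda opt: scores[opt], reverse=True)
    positions = {opt: 0 for opt in order}
    merged = []
    while any(positions[o] < len(results_by_option[o]) for o in order):
        for opt in order:
            start = positions[opt]
            take = max(1, scores[opt])  # assumption: at least one image per round
            chunk = results_by_option[opt][start:start + take]
            merged.extend(chunk)
            positions[opt] = start + len(chunk)
    return merged
```

With scores 2 and 1, each round contributes two images from the first option and one from the second, and the loop continues until both lists are exhausted.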
3.2.5. Evaluation
The authors of MetaSEEk ran a series of experiments to evaluate its performance [11]. The experiments covered different types of queries, and each experiment was run twice to measure the improvement gained by using performance scores. The results were then compared with the previous version of MetaSEEk (which did not classify images by topic) and with a meta-search engine that does not use performance evaluation. The comparison shows that MetaSEEk retrieves better results than the other two systems.
The experimental results show that the performance of a meta-search system can be improved considerably if the search tools are integrated and selected intelligently by evaluating performance for different classes of queries, and if user feedback is used when ranking the returned results.
3.3. Image ranking in image meta-search engines
After receiving the results from the source engines, the meta-search engine has to merge and sort them into a single list of images to return to the user. The list is ordered so that images more relevant to the user's query rank higher. This sorting step is also called re-ranking.
Re-ranking these results is a major challenge for a meta-search engine because the source engines are heterogeneous. The results from each source engine are usually ranked on different image features. Some conventional image search engines search and rank using only the textual features of an image, while others search on visual content features. For example, Google Image Search, Yahoo Image Search and Bing search by text, whereas Byo Image Search searches by color and Tiltomo by color and texture. As a result, the set of images obtained from the source engines is often very diverse. The difficulty is how to merge these images into a single, sensibly ordered list. This difficulty is also an advantage, however, because the images returned by one source engine tend to form a cluster according to that engine's search features. The ranking these images already have in the source engine can be exploited as well. Another challenge for ranking in an image meta-search engine is time. The whole process (receiving the user's query, sending requests to the source engines and receiving their results, ranking the results, and returning a sorted list to the user) runs online, so an image meta-search engine needs efficient ranking algorithms that meet its time constraints.
Given the difficulties and advantages analyzed above, several ranking methods have been applied in image meta-search engines. The method used in MetaSEEk [11] clusters images by topic and by visual features, and relies on the search options and on user feedback to find the most suitable set of images. The rank of an image in this set is then computed by combining the image's rank in its source engine with the assessed quality of the image set returned by that source engine.
Bo Luo et al., in their study on the use of image features [14], also proposed two ranking methods based on both the textual and the visual features of images. The first method clusters images by color and shape features, starting from the initial image set returned by text-only search engines. The second method uses user feedback to rank the images. It selects a few sample images from the clusters (which may come from text-based image search) and shows them to the user to choose from. Based on the user's interest, the system finds the images most similar to the selected image and sorts them in decreasing order of similarity.
Given the benefit of combining visual content with text, in this thesis I apply the VisualRank image-ranking algorithm to both kinds of features, as discussed in Chapter 2. To address the running time of the algorithm, I divide queries into two states: old queries and new queries. An old query is one that has already been submitted to the meta-search engine. A new query is one that has never been seen before or is not similar to any previous query. For a new query, I rank on text only and return the results to the user. I then re-rank these images on both text and visual content for use in later searches. This re-ranking is carried out offline. Because text analysis and processing is fast, the response time of the system always stays within acceptable limits.
Summary of Chapter 3
This chapter presented the general model of a meta-search engine, described one image meta-search engine model in detail, and reviewed several image-ranking methods used in image meta-search engines. It also proposed a way to handle ranking time in an image meta-search engine. The next chapter introduces an image meta-search model that applies the ranking algorithm described above, together with the issues involved in testing this model.

Chapter 4. Experiments
4.1. Experimental model
4.1.1. Approach
The survey and evaluation of image-ranking methods shows that VisualRank [39] [40] is a simple and fairly effective image-ranking algorithm. However, the approach of Jing and Baluja relies only on the similarity of visual content. Intuitively, images with similar visual content tend to be given similar names by users, and the comments on an image tend to be about the subject it shows. The text that accompanies an image can therefore describe part of its visual content. For this reason, in this thesis I apply VisualRank to both the visual and the textual features of images. Because visual features still reflect the content of an image most faithfully, they are given a higher weight in the experiments.
Moreover, Y. Jing and S. Baluja [39] [40] point out that the similarity matrix cannot be computed for the billions of images on the Web. A simple solution is to build the similarity matrix only for the top N images returned by commercial search engines. I therefore apply VisualRank within the image meta-search model proposed in this thesis. The model presented below searches for images through several conventional image search engines, extracts the top N images returned by these source engines, and uses the algorithm to re-rank them.
A major drawback of ranking by visual content is the time it takes: downloading the images from the Web, extracting their features and building the similarity graph are all very time-consuming. To apply the algorithm effectively in a meta-search engine, the thesis uses a different ranking method for each state of the user's query. User queries are divided into two states: new and old. A query is new if no user has ever submitted it to the search engine, in other words if it is not yet in the database. Conversely, a query is old if it already exists in the search engine's database. For a new query, the search engine ranks the images on textual features only. It then stores the query in the database and re-ranks the images found for it on both visual and textual content, for use in later queries. In this way the meta-search engine always meets the user's search-time requirement.
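The two-state handling described above can be sketched as a small cache that answers new queries with a fast text-only ranking and schedules the slower combined ranking for later. The class name, the method names and the in-memory dictionary are hypothetical; the thesis stores queries in a database and does not specify this interface.

```python
from collections import deque


class MetaSearchCache:
    """Serve old queries from storage; rank new queries on text first."""

    def __init__(self, rank_by_text, rank_by_text_and_visual):
        self.rank_by_text = rank_by_text            # fast, used online
        self.rank_full = rank_by_text_and_visual    # slow, used offline
        self.stored = {}          # query -> ranked image list
        self.pending = deque()    # queries awaiting offline re-ranking

    def search(self, query, fetch_images):
        if query in self.stored:            # old query: answer from storage
            return self.stored[query]
        images = fetch_images(query)        # new query: ask the source engines
        ranked = self.rank_by_text(images)
        self.stored[query] = ranked
        self.pending.append((query, images))
        return ranked

    def run_offline_pass(self):
        """Replace text-only rankings with the combined ranking."""
        while self.pending:
            query, images = self.pending.popleft()
            self.stored[query] = self.rank_full(images)
```

The point of the design is that the expensive step never sits on the response path: the user of a new query sees the text-based order immediately, and later users of the same query see the combined order once the offline pass has run.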
To keep the results returned to users up to date, after a fixed interval the search engine takes all the old queries in the database and resends them to the source engines to refresh the image database, then recomputes the ranks of the resulting image set. When the similarity matrix is recomputed, the computation is optimized by calculating similarities only among the newly downloaded images and between the new images and those already in the database. These matrices are then combined with the existing similarity matrix of the old images into a single matrix. This saves the time of recomputing similarities among images that are already in the database.
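A minimal sketch of this incremental update, assuming a symmetric similarity function and a matrix stored as nested Python lists (both assumptions; the thesis uses the Jama matrix library in Java). The function name and signature are hypothetical.

```python
def extend_similarity(old_matrix, old_items, new_items, similarity):
    """Build the similarity matrix over old_items + new_items.

    Old/old entries are copied from old_matrix; `similarity` is called only
    for pairs that involve at least one new item.
    """
    n_old = len(old_items)
    items = list(old_items) + list(new_items)
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n_old):
        for j in range(n_old):
            matrix[i][j] = old_matrix[i][j]      # reuse, no recomputation
    for i in range(n_old, n):
        for j in range(i + 1):                   # old/new and new/new pairs
            value = similarity(items[i], items[j])
            matrix[i][j] = value
            matrix[j][i] = value                 # assumes symmetry
    return matrix
```

With n_old stored images and n_new new ones, the function is called n_new * n_old + n_new * (n_new + 1) / 2 times instead of once for every pair in the enlarged set.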
When the number of user queries becomes very large, storing them becomes a concern. Storing every query is costly, and re-ranking the results of all of them may take too long to remain manageable. Storing queries that are rarely used is also inefficient. One solution is to store only the queries considered popular and to refresh the query database regularly. A threshold on the number of times a query is used within a given period can determine whether it is popular. In the early stages, while the number of queries is still small, all queries can be treated as popular.
To serve Vietnamese users, the image search model proposed in this thesis integrates a dictionary that supports Vietnamese queries. A Vietnamese query is translated into English before it is sent to the source engines. Because it supports both Vietnamese and English queries, the proposed model is convenient for Vietnamese users.
The image meta-search model that applies the methods and techniques described above is presented below.
4.1.2. Proposed model and its components
Input: a query in the form of a string of keywords.
Output: a list of images sorted in decreasing order of relevance to the query.

Proposed model

Figure 14. Proposed model
Components of the model
• Display interface
  o The component that interacts with the user. It has two main functions:
     - Receive the query keyword string from the user and pass it to the dispatcher.
     - Receive the sorted list of images from the ranking module or from the database and display it to the user.
• Query dispatcher
  o Check whether the query is in English or in Vietnamese. If it is in Vietnamese, use the dictionary to translate it into English.
  o Preprocess the query: convert to lowercase, remove stop words and special characters, and reduce words to their stems.
  o Fetch the old queries from the database (data flow (1)) to check whether the incoming query is new or old.
     - If it is a new query: select the source engines to contact, rewrite the query in the format expected by each selected engine, and send the requests.
     - If it is an old query: send a notification with the query id to the display interface (data flow (2)).
  o After a fixed interval, fetch the stored queries from the database and send requests to the source engines to update the database (data flow (3)).
• Result processor
  o Receive the results returned by the source engines, merge them into a single list, and handle duplicates.
  o Extract the images and the related information needed by the ranking module.
• Ranking module
  o If the user's query is a new query:
     - Receive the required information about the images from the result processor, rank the images on textual content, and return the result to the display component.
     - Then index the images and combine their visual and textual features to re-rank them. Store the ranking in the database for later queries.
• Database
  o Stores the images and the information about them. Images in the database are grouped by user query.
  o Stores the set of queries that users have submitted to the meta-search engine together with their ranking results.
4.2. Environment and software components
4.2.1. Hardware configuration
Table 2. Hardware configuration used in the experiments
Component            Specification
CPU                  1 Pentium IV 3.06 GHz
RAM                  1 GB
OS                   Windows XP Service Pack 2
Secondary storage    80 GB
4.2.2. Software components
Software tools used:
Table 3. Software used
No. | Software | Author | Source
1 | eclipse-SDK-3.4.1-win32 | | http://www.eclipse.org/downloads
2 | XAMPP 1.7.3 | | http://www.apachefriends.org/en/xampp-windows.html#522
3 | Apache Tomcat 6.0.26 | | http://tomcat.apache.org/download-60.cgi
Libraries used:
Table 4. Libraries used
No. | Library | Author | Source
1 | Lire-0.8 | Caliph & Emir | http://www.semanticmetadata.net/lire/
2 | Jama-1.0.2 | Geoffrey Fox | http://math.nist.gov/javanumerics/jama
3 | Json-simple-1.1 | Douglas Crockford | http://www.json.org/java/
4 | nusoap | NuSphere & Dietrich Ayala | http://sourceforge.net/projects/nusoap/
5 | google-api-translate-java-0.92.jar | | http://code.google.com/p/google-api-translate-java/downloads/list
In addition to the tools above, I implemented the processing modules in Java. They consist of the following main packages:
• searcher: collects data from the Google and Yahoo image search engines.
• CBIRMetaSearch: carries out the tasks of the meta-search components: query processing, result processing, and ranking.
• Translator: detects the language of the query. If the query is in Vietnamese, it is sent to Google Translate to be translated into English.
The interface module is written in PHP and consists of a single file, ImageMetaSearch.php, which displays a form where the user enters a query string and shows the results.
I created a web service to handle communication between the interface module and the processing modules.









Program interface

Figure 15. Program interface
4.3. Building the data set
4.3.1. Query set
To create a sample query set for evaluating the quality of the system, I looked for the most frequently used queries. Users often rely on image tags to search for images precisely. I therefore extracted the query set from the popular image tags listed by Flickr¹.
In this thesis I focus on finding images with similar visual content. In practice, I found that for a query about an event, such as "autumn festival", or a query with a very general meaning, such as "architecture", the images belonging to the topic are very diverse. For such images, textual features usually express the topic better than visual features, so ranking them on visual features is not effective. For this reason, among the popular image tags and their related tags, I kept only the tags that name concrete objects and places as sample queries.
In this way I extracted a set of 35 queries from the popular image tags and their related tags for use in evaluating the system.

¹ http://www.flickr.com/photos/tags/
4.3.2. Source search engines
The source engines to which I send requests and from which I collect data are the Google and Yahoo image search engines. They were chosen for two reasons. First, Google and Yahoo are currently the two largest text-based image search engines and their search quality is fairly good. Second, both accept a keyword query as input, so querying them is straightforward.
4.3.3. Dictionary
The dictionary used in the experiments is Google Translate, a free online tool from Google that supports language detection and multilingual translation (including Vietnamese to English). The tool is fairly easy to use and its Vietnamese-to-English translation quality is fairly good.
4.4. Experimental procedure and settings
The experiments were carried out as follows:
Running the queries: the sample queries are submitted to the search engine one by one. Each query is run twice in order to evaluate the two ranking methods: ranking on text, and ranking on both visual content and text.
Collecting the data: for each query, the system extracts the first 64 images returned by the Google image search engine¹ and the first 50 images returned by the Yahoo image search engine². These images are then merged into a single list and re-ranked.

1 http://www.google.com/uds/samples/apidocs/image.html
2 http://images.search.yahoo.com/

Ranking: the ranking process has two phases.
Phase 1: for a new query, the images are ranked on textual features. This phase runs online.
A distance measure between two character strings is used to compute the similarity of the text fields. The textual features used in this thesis are the image file name, the image title, and the text that accompanies and describes the image (content). In the experiments, weights of 0.3 for the file name, 0.1 for the title, and 0.6 for the accompanying text gave the best ranking.
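The weighted text similarity can be sketched as below. The weights come from the text; the string measure is a stand-in, because the thesis does not name the distance it uses. Here `difflib.SequenceMatcher.ratio()` from the Python standard library is assumed, and the field names `filename`, `title` and `content` are hypothetical.

```python
from difflib import SequenceMatcher

# Weights reported in the text for the three textual features.
WEIGHTS = {"filename": 0.3, "title": 0.1, "content": 0.6}


def string_similarity(a, b):
    """Stand-in measure (assumption): difflib ratio, a value in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def text_similarity(img_a, img_b):
    """Weighted sum of the per-field string similarities of two images."""
    return sum(weight * string_similarity(img_a[field], img_b[field])
               for field, weight in WEIGHTS.items())
```

Because the weights sum to 1, two images with identical text fields score 1 and two images with no characters in common score 0.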
VisualRank is then run on the text-based similarities with 100 iterations and a damping factor d = 0.85.
In addition, I observed in practice that the rank assigned by the source engine is itself very important, and that the search quality of Google is clearly better than that of Yahoo. For each image I therefore add an old-rank score (the rank computed by the source engine), with a weight of 0.2 for the old-rank score and 0.8 for the new score computed from the similarity between images. These coefficients were determined experimentally.
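The iteration can be sketched as a plain power iteration in the style of VisualRank [39] [40], using the 100 iterations and d = 0.85 reported above. The uniform damping vector, the column normalization of the similarity matrix, and the form of the source-engine score (assumed to be already on the same scale as the VisualRank scores) are assumptions; the thesis does not state how a source rank is converted into a score.

```python
def visual_rank(similarity, damping=0.85, iterations=100):
    """Power iteration: VR = d * S* . VR + (1 - d) * p, with uniform p.

    S* is the similarity matrix with each column normalised to sum to 1.
    """
    n = len(similarity)
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            flow = sum(similarity[i][j] / col_sums[j] * rank[j]
                       for j in range(n) if col_sums[j] > 0)
            new_rank.append(damping * flow + (1.0 - damping) / n)
        rank = new_rank
    return rank


def blend_with_source_rank(new_scores, source_scores, old_weight=0.2):
    """Weighted sum: 0.2 for the source-engine score, 0.8 for the new score."""
    return [old_weight * s + (1.0 - old_weight) * r
            for r, s in zip(new_scores, source_scores)]
```

An image that is similar to many other images in the result set receives rank mass from all of them, so it rises in the list, while an outlier keeps little more than the damping share.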
Phase 2: the image set is re-ranked. This phase runs offline.
The images are downloaded, duplicates are removed, and the remaining images are stored in the database.
Lire is used to extract the visual features of the images and to index the images by these features. The visual features used are color and edge features.
The similarity between images is computed from these features.
The text-based similarity and the visual-content similarity are combined with a weight of 0.3 for the text-based measure and 0.7 for the visual measure. This ratio gave the best ranking.
The remaining computations are carried out as in phase 1, using the combined similarity.
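The combination step amounts to an element-wise weighted sum of two similarity matrices. A minimal sketch, assuming both matrices have the same size and are scaled to the same range (the source does not say how the two measures are normalized); the function name is hypothetical.

```python
def combine_similarity(text_sim, visual_sim, text_weight=0.3, visual_weight=0.7):
    """Element-wise weighted sum of two equally sized similarity matrices."""
    n = len(text_sim)
    return [[text_weight * text_sim[i][j] + visual_weight * visual_sim[i][j]
             for j in range(n)]
            for i in range(n)]
```

The resulting matrix then takes the place of the text-only matrix as the input to the ranking iteration.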
4.5. Experimental results and evaluation
The thesis uses Average Precision [4] to compare the ranking produced by the system with the rankings of the two source engines, Google and Yahoo. It also compares the two rankings produced for the same query. I ran the experiment on the set of 35 queries and evaluated the precision of the first 50 images returned.
Suppose we have 5 objects: a, b, c, d, e, where a, b, c are relevant and d, e are not relevant.
A ranking of these objects to be evaluated is: c, a, d, b, e.
Average precision is defined as follows:

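\[
AP = \frac{\sum_{K=1}^{n} P@K \times I(K)}{\sum_{j=1}^{n} I(j)}
\]

where:
n is the number of objects considered.
\( P@K = \frac{Match@K}{K} \) (Match@K = the number of relevant objects in the first K positions)
I(K) = 1 if the object at position K is relevant, otherwise I(K) = 0.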
Example: P@1 = 1/1, P@2 = 2/2, P@3 = 2/3, P@4 = 3/4. The average precision is therefore:
\[
AP = \frac{\frac{1}{1} \times 1 + \frac{2}{2} \times 1 + \frac{3}{4} \times 1}{3} \approx 0.92
\]
The mean over m rankings (for the search problem, the mean of AP over the queries) is:
\[
MAP = \frac{\sum_{i=1}^{m} AP_i}{m}
\]
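The two measures can be checked with a short sketch. The function names and the encoding of a ranking as a list of 0/1 relevance flags are assumptions made for illustration.

```python
def average_precision(relevance):
    """AP for one ranking, given 0/1 relevance flags in rank order."""
    hits = 0
    total = 0.0
    for k, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k      # P@K counted only at relevant positions
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: the mean of AP over a collection of rankings."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For the ranking c, a, d, b, e the flags are `[1, 1, 0, 1, 0]`, and `average_precision` returns (1 + 1 + 3/4) / 3, which rounds to the 0.92 of the worked example.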

The precision figures for the first 50 images of each query on each search engine show that the system achieves a fairly high mean average precision (MAP = 81.2%). This is especially true of queries about objects with a definite shape and color, such as "candle" (AP = 100%), "guitar" (AP = 90.1%) and "iphone" (AP = 93.0%). The precision of the system when it ranks on textual features only is also fairly high (MAP = 79.7%), whereas the MAP of Google is 76.1% and that of Yahoo is 66.8%. This shows that the system works well for both new and old queries.
However, for queries whose target object is not clearly defined, such as "cloud" and "wave", the ranking produced by the system is not yet satisfactory. For "wave", the precision of the system is 43.0% when ranking on visual content, compared with 60.7% when ranking on text and 55.5% for Google.
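Table 5. Average precision (%) over the 35 queries (MS_Text: the proposed system ranking on text only; MS_Content: ranking on text and visual content)
Query Google Yahoo MS_Text MS_Content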
ball 53.8 24.0 71.8 76.0
beach 95.5 40.4 97.4 88.5
bicycle 71.5 68.3 86.0 88.8
bike 53.5 41.1 81.2 79.0
bird 70.0 60.1 66.8 82.8
bridge 91.3 85.5 81.7 91.8
cake 76.8 92.0 84.9 92.3
candle 89.2 84.0 94.9 100
car 92.6 76.9 91.1 94.2
cat 97.2 81.5 86.2 97.1
christmas tree 95.7 91.3 100 96.3
church 69.1 34.7 65.2 76.9
cloud 56.9 49.6 42.5 40.8
cloud gate 86.9 55.4 73.1 70.5
cup 33.1 51.4 39.4 52.0
drums 87.7 70.2 95.5 90.5
duck 70.4 72.8 79.0 82.8
feathers 56.0 57.3 65.0 63.7
guitar 76.2 73.0 80.2 90.1
iphone 95.4 96.6 96.3 93.0
kids 51.2 82.0 70.9 75.1
kitten 83.8 93.9 91.4 82.9
lake 93.1 65.5 95.8 87.7
leaves 84.3 80.1 82.6 95.0
lemon 70.9 38.7 79.4 79.5
monkey 86.2 83.1 89.2 95.6
railway 61.2 92.5 72.4 68.0
river 72.7 66.5 69.7 78.3
road 78.9 81.2 91.3 83.0
snow 87.6 91.7 86.8 80.3
sun 70.1 45.1 70.9 73.6
sunrise 85.2 17.6 91.1 78.6
train 92.5 86.5 78.1 85.6
tree 70.6 74.6 84.4 88.4
wave 55.5 34.1 60.7 43.0
MAP 76.1 66.8 79.7 81.2


Figure 16. Comparison of average precision across the systems
To evaluate how well the system searches and ranks for Vietnamese keywords, I tested 5 Vietnamese queries and measured the precision of the first 50 results of each. The Vietnamese queries chosen were: "Bác Hồ" (Uncle Ho), "quả táo" (apple), "con ong" (bee), "máy bay" (airplane), and "hoa hồng" (rose).
[Figure 16 chart: average precision from 0% to 100% for the queries Sun, Guitar, Bicycle, Cat, Car, Leaves; series: Google, Yahoo, MS_Text, MS_Content]


Figure 17. Precision at K for several Vietnamese queries
The chart above shows the precision at K of several Vietnamese queries run on the image meta-search system. It shows that the system ranks the first 20 images fairly accurately. Although the average precision over the first 50 images is not particularly good, users usually look only at the first 10 to 20 results, so the top 20 images are the ones that matter.
To evaluate the execution speed of the system, I measured the ranking time of the test queries. The average ranking time per query is 40 seconds. This includes the time to extract the features, find and handle duplicate images, compute the similarity matrices from visual and textual content, rank the images, and write the results to a file. I also measured the response time of the system for new queries, from the moment the system receives the query until it returns the results to the user. The average response time for a new query is 20 seconds. The execution time of the system is therefore within acceptable limits for an image search engine.
[Figure 17 chart: precision from 0% to 100% at P@5, P@10, P@20, P@30, P@40, P@50; series: Bác Hồ, Quả táo, Con ong, Máy bay, Hoa hồng]


Figure 18. First 10 results for the query "sun" in the search engines

Conclusion
Given the diversity and abundance of image data on the Internet, an image-ranking system is much needed. Although research on Web image search and ranking has been under way for a long time, many problems in this field remain open. In response to this need, the thesis studied an image-ranking algorithm based on the text that accompanies images and on the visual content of the images themselves, and applied it experimentally in an image meta-search engine model.
Main results
• Surveyed Web page ranking algorithms and typical image-ranking algorithms, and on that basis proposed applying VisualRank to both the textual and the visual features of images.
• Proposed an image meta-search engine model in which to test the proposed algorithm. The model takes the state of the user's query into account and supports Vietnamese queries, which makes this work useful for Vietnamese users.
• Tested the model on a set of 35 queries extracted from popular Flickr tags. The results are promising for both ranking methods. The precision of ranking on text only is 79.7% and the precision of ranking on both visual content and text is 81.2%, better than Google (76.1%) and Yahoo (66.8%). The thesis also tested several Vietnamese queries, and the results show that the model ranks the first 20 images fairly well. These initial results support the soundness of the model.
Open issues
• Although the model has achieved promising initial results on the test data, the ranking algorithm does not yet handle queries about events or about non-concrete objects well.
• The time needed for re-ranking and the storage space for images also need attention as the system's database grows. A suitable solution is needed that can store data for as many queries as possible and still re-rank the results of all of them.
• For Vietnamese queries that are proper names (names of people, places, and so on), translating the query into English makes the search results incorrect. In addition, an inaccurate dictionary translation leads to many errors in the search. Searching directly in Vietnamese may therefore give better results.
Future work
Besides continuing to address the open issues, I plan the following research directions:
• Study image feature extraction algorithms further in order to improve the computation of similarity between images.
• Study Vietnamese language processing methods in order to search for images directly in Vietnamese.
• Use user feedback to improve the quality of the system.



References
Vietnamese
[1] Đỗ Thị Diệu Ngọc, Nguyễn Hoài Nam, Nguyễn Thu Trang, Nguyễn Yến Ngọc (2004). Giải pháp tính hạng trang Modified Adaptive PageRank trong máy tìm kiếm. Chuyên san "Các công trình nghiên cứu về CNTT và Truyền thông", Tạp chí BCVT, 14: 65-71, 4-2005.
[2] Nguyễn Hoài Nam (2004). Thuật toán tính hạng trang và xây dựng mô đun thử nghiệm. Khóa luận Đại học, Trường ĐHKHTN, ĐHQGHN.
[3] Nguyễn Thu Trang (2006). Link spam với đồ thị Web và hạng trang Web. Khóa luận Đại học, Trường ĐHCN, ĐHQGHN.
[4] Nguyễn Thu Trang (2009). Học xếp hạng trong tính hạng đối tượng và phân cụm tài liệu. Luận văn Thạc sỹ, Trường ĐHCN, ĐHQGHN.
[5] Nguyễn Hoàng Trung (2009). Xây dựng search engine. Luận văn Thạc sỹ, Trường ĐHCN, ĐHQGHN.

English
[6] Mehmet S. Aktas, Mehmet A. Nacar, Filippo Menczer (2004). Personalizing
PageRank based on domain profiles. WebKDD 2004: 83-90.
[7] Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, Panayiotis Tsaparas
(2005). Link analysis ranking: algorithms, theory, and experiments. ACM Trans.
Inter. Tech., 5(1):231-297.
[8] Amy N.Langville and Carl D.Meyer (2005). Deeper inside pagerank. Internet
Mathematics Journal, 1(3):335-380.
[9] Amy N.Langville, Carl D. Meyer (2004). A Reordering for the PageRank problem.
SIAM J. Sci. Comput., 27(6): 2112-2120.
[10] Anselm Spoerri (2004). RankSpiral: Toward Enhancing Search Results
Visualizations. IEEE Symposium on Information Visualization: 215.18.
[11] Benitez A.B., Beigi M., Shih-Fu Chang (1998). Using relevance feedback in
content-based image metasearch. IEEE Internet Computing, 2(4): 59-69.
[12] B. Uygar Oztekin, George Karypis, Vipin Kumar (2002). Expert agreement
and content based reranking in a meta search environment using Mearf. WWW
2002: 333-344.
[13] Baoning Wu and Brian D. Davison (2005). Identifying link farm spam pages.
WWW (Special interest tracks and posters) 2005: 820-829.
[14] Bo Luo, Xiaogang Wang, and Xiaoou Tang (2003). A World Wide Web Based
Image Search Engine Using Text and Image Content Features. IS&T/SPIE
Electronic Imaging 2003, Internet Imaging IV, 5018: 123-130.
[15] Chik Ching Yiu, Ip Che Yin (2002). Image Ranking Schemes Using Link-
Structure Analysis Algorithm. WWW2002, http://www2002.org/CDROM/poster/
114/
[16] Cam-Tu Nguyen, Xuan-Hieu Phan, Susumu Horiguchi, Thu-Trang Nguyen,
Quang-Thuy Ha (2009). Web Search Clustering and Labeling with Hidden
Topics. ACM Trans. Asian Lang. Inf. Process. 8(3): 1-40.
[17] Eva Horster, Malcolm Slaney, Marc Aurelio Ranzato, Kilian Weinberger
(2009). Unsupervised image ranking. LS-MMRM '09: 81-88.
[18] Eric J. Glover (2001). Using Extra-Topical User Preferences To Improve Web-
Based Metasearch. PhD Thesis, The University of Michigan.
[19] G. Park, Y. Baek, and H. Lee (2003). Majority based ranking approach in web
image retrieval. CIVR 2003: 111-120.
[20] Hsinchun Chen, Haiyan Fan, Michael Chau, and Daniel Zeng (2001).
MetaSpider: Meta-Searching and Categorization on the Web. JASIST,
52(13): 1134-1147.
[21] Hervé Jégou, Matthijs Douze, Cordelia Schmid (2010). Product quantization
for nearest neighbor search. 2010 IEEE TPAMI, http://www.irisa.fr/texmex/
people/jegou/publications.php
[22] Herve Jegou, Matthijs Douze, Cordelia Schmid (2008). Recent Advances in
Large Scale Image Search. ETVC 2008: 305-326.
[23] Jon M. Kleinberg (1999). Authoritative Sources in a Hyperlinked Environment.
J.ACM, 46(5): 604-632.
[24] Kamarul Hawari Ghazali (2007). Feature Extraction technique using SIFT
keypoints descriptors. The International Conference on Electrical and Engineering
and Informatics Institut technology, Bandung, Indonesia, June 17-19, 2007.
[25] Lowe David (2004). Distinctive image features from scale-invariant keypoints.
Inter. J. Computer Vision 2004, 60(2): 91-110.
[26] Liangliang Cao, Andrey Del Pozo, Xin Jin, Jiebo Luo, Jiawei Han and
Thomas S. Huang (2010). RankCompete: simultaneous ranking and clustering of
web photos. WWW 2010: 1071-1072.
[27] L.S. Kennedy and M. Naaman (2008). Generating diverse and representative
image search results for landmarks. ACM Multimedia 2008: 349-358.
[28] Manoj M., Elizabeth Jacob (2008). Information retrieval on Internet using meta-
search engines: A review. J. Scientific & Industrial Research, 67(10):739-746.
[29] Mitsuru Ambai, Yuichi Yoshida (2009). Multiclass VisualRank: Image Ranking
Method in Clustered Subsets Based on Visual Features. SIGIR 2009: 732-733.
[30] Page, L., Brin, S., Motwani, R. and Winograd, T. (1998). The PageRank
citation ranking: bringing order to the Web. Technical report, Stanford University.
[31] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang (2008). Image
Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing
Surveys, 40(2), April 2008.
[32] Sepandar Kamvar, Taher Haveliwala, and Gene Golub (2003). Adaptive
Methods for the Computation of PageRank. Technical report, Stanford University.
[33] Shiliang Zhang, Qi Tian, Gang Hua, Qingming Huang, Shipeng Li (2009).
Descriptive Visual Words and Visual Phrases for Image Applications. ACM
Multimedia 2009: 75-84.
[34] Shuhui Wang, Qingming Huang, Shuqiang Jiang, Lei Qin, Qi Tian (2009).
Visual ContextRank for web image re-ranking. The First ACM workshop on
Large-scale multimedia retrieval and mining: 121-128.
[35] Taher H. Haveliwala (2002). Topic-sensitive PageRank. Technical report,
Stanford University. May 7-11, 2002, Honolulu, Hawaii, USA.
[36] T.L. Berg, A.C. Berg (2009). Finding iconic images. The 2nd Internet Vision
Workshop at Conference on Computer Vision and Pattern Recognition (CVPR): 1-8.
[37] Viswanathan, M., Chang, C.-K., Moon, J.-H., Patlolla, A. (2009). Goggle (or
Gist on the Google Phone): A Content-Based Image Retrieval System for the
gPhone. CSCI-546 Project. http://ilab.usc.edu/~kai/projects/cs546-Spring2009-
Google.pdf
[38] Xinmei Tian, Dacheng Tao (2010). Active Reranking for Web Image Search.
IEEE Transactions on Image Processing, 19(3): 805-820.
[39] Yushi Jing, Shumeet Baluja (2008). PageRank for product image search.
WWW 2008: 307-316.
[40] Yushi Jing, Shumeet Baluja (2008). VisualRank: Applying PageRank to Large-
Scale Image Search. IEEE Trans. Pattern Anal. Mach. Intell., 30(11): 1877-1890.
[41] Z. Gyongyi and H. Garcia-Molina (2005). Web Spam Taxonomy. AIRWeb
2005: 39-47.
[42] Z. Gyongyi, H. Garcia-Molina, and J. Pedersen (2004). Combating Web
Spam with TrustRank. VLDB 2004: 576-587.
