,
, . , ,
. , ,
.
Word Frequencies in Written and Spoken English (Leech et al. 2001).
, .
,
. , ,
, , ,
, , .
: ..
(1963), .. (1977), . (1993) .,
(400 1 )
:
.
,
, ( 1999, . 2003, . 1996),
. ,
(Josselson 1953), ( 1970),
( . 2008). , ,
; ,
,
,
.
. ,
Davies 2005 and Davies & Gardner 2010.
, , ().
.
,
.
,
1950–2007 . ,
, 92 . .
(http://www.ruscorpora.ru)
( 2003, 2005, 2006–2008 .), 2001
.
XVIII XXI ( ), , ,
, , ,
, ,
.
,
, . .
.
,
, (British National Corpus),
(Corpus del español), (Český národní
korpus) . , (,
, . .) .
.
(, ),
, .
( 2005).
(, ..),
(, ..), .
, :
( )1, (
, , ..), (, , ..),
, , (, ..)
(, ..). , , , , ,
( 100 ).
54 , :
.
. ( ,
), ( , , ..) .
.
.
: , ,
1
.
, , - .
.
, ,
, - .
( ).
, , . 1.
Table 1.
, ,
39.04%     45 150 317   35 150 521    2 418
42.21%     48 818 173   39 739 644   27 390
16.96%     19 618 518   15 478 151    7 495
  11.30%   13 067 152                 3 994
   1.62%    1 872 482                 1 075
   1.49%    1 727 363                   133
   1.44%    1 664 804                   488
   0.57%      659 707                 1 232
   0.48%      556 291                   439
   0.26%      295 206                   134
 0.88%      1 017 568      758 407    1 005
 0.90%      1 037 468      827 580       61
100%                    91 954 303   38 369
, ,
(), , , ,
. . ,
, .
, . ; 5%
.
(400 . ,
):
. , , 1970
( . 1972), , 1600–1700
400 . . ,
:
(. . ),
(. . ).
2 ,
150 ,
( . Sharoff 2006). , ,
, ( 200–500
), .
(, , )
. (
), 2.
. 2. ( , ipm)
202
609
364
1094
138
1058
436
756
428
818
69
15
11
499
421
250
282
292
193
110
75
78
415
58
632
242
595
135
503
91
650
110
,
. ,
,
.
,
. .
, .
,
, ,
.3 , ,
, .
2
.
, , .
3
(Church 2000), - whelk
problem (Kilgarriff 1997). (1983–1989 .). ,
, 1989
, ,
. Whelk ,
.
2 ,
, . ,
, ,
.
,
. , ( BNC ),
. , ., ,
,
, Cieri & Liberman 2002),
.
, .
, , (),
, , .
,
.
, 25 .
, .
,
(. , , 1970, ), (),
(/, /), ( , ) . .
(. 5 ),
91 982 416 , ,
, ,
. ,
115 642 044 ( [ , -- ] ).
686 566 (, ), 1 729 928
, 564 555 70 931 ,
. 270 498 , 203 185 0
, 106 874 . 16.5%
, 100 37%, 1 000 60%, 2 000 69%, 10 000 85% (.
. 6.7).
4.1.
, ipm
(instances per million words). ,
, . ,
55 400 . , 364
39 653 , ipm
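The ipm (instances per million words) normalization introduced in this subsection can be illustrated with a minimal sketch. The function name `ipm` and its parameters are hypothetical and chosen only for illustration; the sketch assumes that ipm is simply the absolute count scaled to a corpus of one million words.

```python
def ipm(count: int, corpus_size: int) -> float:
    """Return the frequency in instances per million words.

    count       -- absolute number of occurrences (hypothetical parameter name)
    corpus_size -- total number of words in the corpus or subcorpus
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    # 300 occurrences in a 20 000 000-word subcorpus
    print(ipm(300, 20_000_000))
    # 1000 occurrences in a 100 000 000-word corpus
    print(ipm(1000, 100_000_000))
```

Because the value is relative to corpus size, counts taken from subcorpora of different sizes become directly comparable.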
4.2. R (range) (D)
, .
, , ,
, (.
). ,
, .
(ARF, Average Reduced Frequency),
(Čermák & Křen 2005).
(, ,
, Lyne 1985) D, . (Juilland's
D, . Juilland et al. 1970; .
Gries 2008).
D
:
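$$D = 100 \times \left(1 - \frac{\sigma}{\mu \sqrt{n - 1}}\right)$$
where $\mu$ is the mean frequency of the word across the segments, $\sigma$ is the standard deviation of its frequency across the segments, and $n$ is the number of segments.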
n (
, 100 , 90 ).
,
(, ) .
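The two dispersion measures named in this section's heading, R (range) and D, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard Juilland formulation D = 100 × (1 − V / √(n − 1)) with V = σ / μ, a population standard deviation, and equal-sized segments. The function names and the input format (a list of per-segment frequencies) are hypothetical.

```python
import math
import statistics


def juilland_d(segment_freqs):
    """Juilland's D on a 0..100 scale for one word.

    segment_freqs -- frequencies of the word in n equal-sized corpus
                     segments (assumed input format).
    """
    n = len(segment_freqs)
    if n < 2:
        raise ValueError("at least two segments are required")
    mean = statistics.fmean(segment_freqs)
    if mean == 0:
        raise ValueError("the word does not occur in any segment")
    # Population standard deviation (divides by n, not n - 1).
    sd = statistics.pstdev(segment_freqs)
    return 100 * (1 - sd / (mean * math.sqrt(n - 1)))


def segment_range(segment_freqs):
    """R (range): the number of segments in which the word occurs."""
    return sum(1 for f in segment_freqs if f > 0)


if __name__ == "__main__":
    even = [5] * 100            # same frequency in every segment
    burst = [500] + [0] * 99    # all occurrences in a single segment
    print(juilland_d(even), segment_range(even))
    print(juilland_d(burst), segment_range(burst))
```

Under these assumptions a perfectly even distribution yields D = 100, and a word confined to one segment yields D close to 0 with R = 1.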
R (range) , .
/ 0 (
) 1 ( ).
, D , , 100,
, , 0.4
, (R=100),
5381.4 ipm, , 97.
100 , 395.0
ipm,
, 76.
, 10.2 ipm, 916
( ) 3
9 , 9.
( 1) ipm, R
D, (), .
,
. , ,
. ,
,
(, ),
. , ,
(
). , , R
. ,
, ( ,
47 ). ,
, , R=71. ,
.
D ,
, , ,
, (Lyne 1986).
, ,
( 25 ipm), D 46,
78, 97, ,
( )
.
D R (range)
, :
. , :
(R=91) ,
400 (D=28).
, D
, , D
. . :
4
D
(. . ). , ,
,
.
4.3. LLscore ( )
. ,
. ,
, , , , , ,
, . ,
5 .
(log
likelihood), :
a    b    a+b
c    d    c+d
$G^2$ (LLscore)
:
$$G^2 = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right); \quad E_1 = c\,\frac{a+b}{c+d}; \quad E_2 = d\,\frac{a+b}{c+d}$$
a, b, c, d , E1 E2
(. Rayson & Garside 2000).
(
),
. , ,
10 , , 5
500 .
, .
,
.
15.31, 99% ,
(Rayson &
Garside, 2000).
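The log-likelihood computation referred to above (Rayson & Garside 2000) can be sketched in a few lines. The sketch assumes that a and b are the absolute frequencies of a word in two non-overlapping parts of the corpus and that c and d are the sizes of those parts; the function name `ll_score` is hypothetical. The 15.31 threshold used in the usage example is the value quoted in the text.

```python
import math


def ll_score(a: float, b: float, c: float, d: float) -> float:
    """G2 = 2 * (a*ln(a/E1) + b*ln(b/E2)).

    a, b -- word frequency in part 1 and part 2 (assumed disjoint)
    c, d -- sizes of part 1 and part 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in part 1
    e2 = d * (a + b) / (c + d)  # expected frequency in part 2

    def term(observed: float, expected: float) -> float:
        # A zero observed count contributes nothing (limit of x*ln(x)).
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2 * (term(a, e1) + term(b, e2))


if __name__ == "__main__":
    # 300 of 1000 occurrences fall into a 20M-word part of a 100M-word corpus.
    score = ll_score(300, 700, 20_000_000, 80_000_000)
    print(round(score, 2), score > 15.31)
```

Note that the score depends on absolute counts, not only on the ipm ratio: the same 15 ipm versus 10 ipm contrast produces a much smaller score when the counts are ten times lower.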
, , .
( ipm)
( ).
(15 ipm 10ipm ), (a) (a+b)
, .
, .
5
2003: 17-19 .
, ipm
( 10:1).
Table 3. LLscore
a / a+b    300 / 1000                  30 / 100                    30 / 100                  30 / 1000
c / c+d    20 000 000 / 100 000 000    20 000 000 / 100 000 000    2 000 000 / 10 000 000    2 000 000 / 100 000 000
ipm        15 / 10                     1.5 / 1                     15 / 10                   15 / 10
E1         200                         20                          20                        20
E2         800                         80                          80                        980
LL         56.34                       5.63                        5.63                      4.43
10
(300
, )
(
15.31).
( D)
.
, ,
(Kilgarriff, 2005). , 1950–60
(
),
( , , ,
).
5
5.1
, :
,
. (Zipf 1935) (r, ) (f):
$$f \approx k r^{-\gamma},$$
k , (
), ,
( , ,
; . . 1975).
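The relation between rank r and frequency f (Zipf 1935) can be explored with a small fitting sketch. It assumes the common power-law form f ≈ k · r^(−γ) and estimates k and γ by ordinary least squares on log-transformed data; both the functional form and the fitting method are assumptions made for illustration, and `fit_zipf` is a hypothetical name.

```python
import math


def fit_zipf(freqs):
    """Estimate (k, gamma) for f ~ k * r**(-gamma).

    freqs -- positive frequencies sorted in descending order, so that
             the item at index i has rank i + 1.
    """
    if len(freqs) < 2:
        raise ValueError("at least two frequencies are required")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -gamma
    intercept = mean_y - slope * mean_x    # equals ln(k)
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic list that follows f = 1000 / r exactly.
    synthetic = [1000 / r for r in range(1, 51)]
    k, gamma = fit_zipf(synthetic)
    print(k, gamma)
```

On real frequency lists the fit is usually only approximate, particularly at the very top and in the low-frequency tail, so the estimated coefficients depend on which rank interval is used.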
1:
.
Fig. 1: [plot not reproduced]
,
. ,
20 000 ,
30 000, .
,
. ,
, .
100
5 (ipm),
13 000 (
460 ). , ,
, , . ,
,
, , ?
,
, , 40 .
2.6 ipm ( 2,
20 000 ) 0.4 ipm ( 1, 50 000 ,
33 ).
5.2
( )
. : ,
(. , ,
, ) ;
( , , , , , , , ).
,
( . 2005),
( 1977).
, .
. :
s (, , , ),
a (, , ),
num (, , ),
anum (, , )
v (, ),
adv (, ); (, ,
) (, ),
spro (, ),
apro (, ),
advpro (, );
(, ),
pr (, ),
conj (, ),
part (, , ),
intj (, ),
init (., .) 6.
,
,
.. . , ( 1977)
(. , , , , , , ;
) ( .
., .). ,
,
. .
,
.
, , (. ,
, , ),
(. , ).
,
( .. ).
(. /)
(. , ).
.
, , ,
, (,
, , ..,
6
Mystem Dialing ,
.
, , .
.
, ,
(. ).
,
, ,
(), (),
(). Mystem
3% 45% .
,
.. .. (. . 2007),
.
5.3
, (.
, ,
, ). ,
.
( 1977, 1993)
, . ,
100 .
,
, (Čermák et al. 2004).
,
. ( 4,8 . ,
5% ) Dialing ( 2004);
.
Mystem ( & 1998).
,
(
.. , .. ., , . & 2005).
93.81%.7
,
, 20 ,
.
, +
. ,
,
, :
;
+ + + . 3 000
93.07%.
97-98%.
1) ,
, . , , ,
, . (.
, ).
2) , , , .
, , , ,
, , , , ,
, . :
, ,
. , ,
. ,
, ;
, , .
3) pluralia tantum, ,
, . , , ,
, , , , ,
, , , .
.
5.4
1 2, 1 2, , . .
, ( ), ,
. , ,
, (. , ).
, , :
Table 4.
Lemma   PoS   F(ipm)   R     D    Doc
...     s     32.6     100   93   952
...     v     8.7      95    93   511
. , .
, , , ,
. , ,
, ,
, ( ), ( ),
( ),
.
,
, : , . . ,
, . , (,
. , .), ,
.
, ,
',
.
(. ).
.
, ,
VS .
, , (
/, /), 8
,
, . .
:
1. ( )
2. ( )
3. ( )
3.1..
3.1..
3.2..
3.2..
3.3..
3.3..
3.4..
3.4..
4.
5.
5.1.
5.2.
5.3.
5.4.
5.5. (, , , )
5.6.
5.7.
6.
7.
( 1) , PoS,
F(ipm), R (range), D
() Doc, . 50 000
() .
(. , ),
(*). 1 , , ipm
PoS   F(ipm)   R    D    Doc
s     0.5      15   63   22
v     0.4      18   72   25
v     1.0      51   85   76
adv   0.7      41   84   54
, ( 2),
Rank, , PoS, F(ipm)
(1950–1969 , 1970–1989 , 1990–2007 )
8. 20 000
.
. 6. 2 ( )
Lemma
PoS
s
s
F(ipm)
32.5
32.5
1950–1960
.
1970–1980
1990–2000
6.4
2.1
11.0
4.0
15.7
16.0
1950–1960
1970–1980
22.2
10.4
23.9
74.5
1990–2000
64.5
52.1
. 7.
,
1950–1969
.
1970–1989
1990–2007
5 642 070
7 818 865
21 756 323
309
585
674 566
509
2 725 968
623
1 524
34 950 394
26 264
, 1990–2000 (.
. 7),
60 .
, , ,
,
.
,
, , 1975-2003 , 1900-2000- .
,
( 1). 2.6 ipm,
.
( 3)
, ,
. 5 000
. F(ipm).
, .
Table 8. 3.4 ( :
)
Lemma   PoS   F(sp)
...     s     22.7
...     a     26.5
(. . 4). , 9
, , , , . ,
.
, F(all) ipm,
, ipm, LLscore.
Table 9. 3.4 ( )
Lemma   PoS    F(all)   F(sp)     LL
...     part   1114.6   17208.0   50672
...     part   787.5    11847.0   34394
...     part   1785.1   15698.6   32662
( 4)
5 ipm, , (
20 ). .
, , ,
. , :
. 10. 4 ( )
Word
F(ipm)
3504.1
631.5
5.5
276.9
45.7
, , ,
,
.
,
, , , . .
, , ipm , 100,
1000, 100000 .. , .
5 ( )
: , , , ( . .
), , , (,
, , ). F(ipm)
( ) Rank. 1
. , .
Table 11. 5.7 (
: )
Lemma   PoS    F(sp)
...     pr     147.3
...     conj   134.9
( 6)
, , ,
. 6.1
F(abs) (%)
. 6.2–6.5 , ,
. ;
F(abs) Rank. 6.6
( , ).
6.7 : (Rank)
(Coverage). , ,
( 1) 3.6% , . . 3.6%
, 1–2 6.7% ,
1–10 16.6% , 93%
150000.
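The relation between Rank and Coverage described for section 6.7 can be reproduced from any frequency list with a cumulative sum. The sketch below is a minimal illustration under the assumption that coverage at rank r is the share of all occurrences accounted for by the r most frequent items; the function name `coverage_by_rank` and the toy data are hypothetical.

```python
from itertools import accumulate


def coverage_by_rank(freqs):
    """Return cumulative coverage (in percent) for ranks 1..len(freqs).

    freqs -- absolute frequencies of the items, in any order.
    """
    ordered = sorted(freqs, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    return [100 * running / total for running in accumulate(ordered)]


if __name__ == "__main__":
    # Toy frequency list; real input would be the dictionary's lemma counts.
    toy = [20, 50, 30]
    for rank, cov in enumerate(coverage_by_rank(toy), start=1):
        print(rank, cov)
```

Reading the resulting list at ranks such as 1, 10, 100 or 1000 gives figures of the same kind as the coverage percentages quoted in the text.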
6.8 . 1–100
, 101–200 . .,
NT(im), NT(n)
NT(nf). ,
. 6.9 :
(L), (Example), (N)
ipm (F) (all) (im),
(n), (nf) (sp).
.
( 7).
,
,
. 1993 ,
.
,
ipm .
( Doc, R D)
.
. ,
, 150 (1.6 ipm)
50 .
,
90 , . , , , , .
, , , .. , ,
, , ,
, .., (,
, ), (15).
, , , , ,
.
7 , 2 500
. 1,
F(ipm), R D Doc.
Table 12. 7 ( )
Lemma   F(ipm)   R     D    Doc
...     9.1      72    88   372
...     52.4     99    67   522
...     12.0     90    87   275
...     115.9    100   91   3387
...     11.3     57    82   305
,
(. , ),
(*).
(, ,
, , /
).
(), (), (), (), /, / ..
, .
, F(ipm).
***
. .. (), (Universitetet i Tromsø, ),
(University of Leeds, ), .
.
, , ,
,
. .. ,
, ,
.. .
.. , .. ,
.. , .. , .. , .. , ..
,
.. .. , .. , .. , C.. , .. ,
.. ,
.. , .. , .. ,
.. .. ,
, .. ,
.
.. .
( 66).
.
. .. (http://dict.ruslang.ru).
.., .. , .. (1975). //
. 2. 1. . 920. http://kudrinbi.ru/public/442/.
.., .. , .. (.) (1996).
. 4 . : .
.. (1977). : . .; 4 .:
.: , 2003.
.. (.) (1977). . .: .
(.) (1993). (Lönngren,
Lennart. The Frequency Dictionary of Modern Russian). Acta Univ. Ups., Studia Slavica Upsaliensia
Uppsala 32. Uppsala.
.., .., .. (2007).
// .. (.),
2007. : . . . 118–125.
2003–2005: 2003–2005:
. .: , 2005.
2006–2008: 2006–2008. .: ,
2009.
.., .. , .. (1972). . .:
.
.. (2005). ? //
2003–2005. . .: . . 6–20.
.. (1999). ( .. )
// 99
. , 1999. . . 2. . 230–236.
C.O. (2005). //
2003–2005. . .: . .
62–88.
.., .. (1998).
//
'98 . , 1998. .2. . 547–552.
.. (2004). www.aot.ru //
:
2004. . http://www.dialog-21.ru/Archive/2004/Sokirko.pdf.
.., .. (2005).
(
) // 2005.
.: ndex. . 80–94.
.. (1970). . .: .
.., .., .. (2003).
. .: .
.., .., .. (2008).
(1990 ). .: .
.. (2003). //
. 2. 5. . 8–19.
.. (1963). .
.
Čermák, František & Michal Křen (2005). New generation corpus-based frequency dictionaries: The
case of Czech // International Journal of Corpus Linguistics, 10. P. 453–467.
Čermák, František, Michal Křen et al. (2004). Frekvenční slovník češtiny. Praha: NLN.
Church, Kenneth W. (2000). Empirical estimates of adaptation: the chance of two Noriegas is closer to
p/2 than p² // Proceedings of the 17th conference on Computational linguistics. Saarbrücken,
Germany, 2000. P. 180–186.
Cieri, Christopher & Mark Liberman (2002). Language resources creation and distribution at the
Linguistic Data Consortium // Proceedings of LREC'02. Las Palmas, Spain, 2002. P. 1327–1333.
Davies, Mark (2005). A Frequency Dictionary of Spanish: Core Vocabulary for Learners. London–
N.Y.: Routledge.
Davies, Mark & Dee Gardner (2010). A Frequency Dictionary of American English: Word Sketches,
Collocates, and Thematic Lists. London–N.Y.: Routledge. http://www.wordfrequency.info/
Gries, Stefan Th. (2008). Dispersions and adjusted frequencies in corpora // International Journal of
Corpus Linguistics 13, 4. P. 403–437.
Josselson, Harry H. (1953). The Russian word count and frequency analysis of grammatical categories
of Standard Literary Russian. Detroit: Wayne University Press.
Juilland, Alphonse, Dorothy Brodin & Catherine Davidovitch (1970). Frequency dictionary of French
words. The Hague–Paris: Mouton.
Kilgarriff, Adam (1997). Putting frequencies in the dictionary // International Journal of
Lexicography, 10(2). P. 135–155.
Kilgarriff, Adam (2005). Language is never ever ever random // Corpus Linguistics and Linguistic
Theory 1 (2). P. 263–276. http://www.kilgarriff.co.uk/Publications/2005-K-lineer.pdf
Leech, Geoffrey, Paul Rayson & Andrew Wilson (2001). Word Frequencies in Written and Spoken
English: based on the British National Corpus. Longman, London.
Lyne, Anthony A. (1986). In Praise of Juilland's 'D'; a contribution to the empirical evaluation of
various measures of dispersion applied to word frequencies // Ch. Muller (ed.) Méthodes quantitatives
et informatiques dans l'étude des textes. Genève–Paris. P. 588–595.
Lyne, Anthony A. (1985). The vocabulary of French business correspondence: word frequencies,
collocations and problems of lexicometric method. Genève: Slatkine, Paris: Champion. (Travaux de
linguistique quantitative, 23).
Rayson, Paul & Roger Garside (2000). Comparing corpora using frequency profiling // Proceedings of
the Comparing Corpora Workshop at ACL 2000. Hong Kong, 2000. P. 1–6.
Sharoff, Serge (2006). Creating general-purpose corpora using automated search engine queries //
Baroni, Marco, Silvia Bernardini (eds.): WaCky! Working papers on the Web as Corpus. Bologna:
Gedit. P. 63–98. http://wackybook.sslmit.unibo.it.
Zipf, George Kingsley (1935). The Psycho-Biology of Language: An Introduction to Dynamic
Philology. Boston: Houghton Mifflin.