
.. , ..

,
, . , ,
. , ,
.
Word Frequencies in Written and Spoken
English (Leech et al. 2001) .
, .
,
. , ,
, , ,
, , .
: ..
(1963), .. (1977), . (1993) .,
(400 1 )
:
.
,
, ( 1999, . 2003, . 1996),
. ,
(Josselson 1953), ( 1970),
( . 2008). , ,
; ,
,
,
.
. ,
Davies 2005 Davies & Gardner 2010.
, , ().
.
,
.

,
1950–2007 . ,
, 92 . .
(http://www.ruscorpora.ru)
( 2003, 2005, 2006–2008 .), 2001
.
XVIII XXI ( ), , ,
, , ,
, ,
.
,
, . .
.
,
, (British National Corpus),
(Corpus del español), (Český národní
korpus) . , (,
, . .) .
.
(, ),
, .
( 2005).
(, ..),
(, ..), .
, :
( )1, (
, , ..), (, , ..),
, , (, ..)
(, ..). , , , , ,
( 100 ).
54 , :
.
. ( ,
), ( , , ..) .

.
.
: , ,
1

.
, , - .
.
, ,
, - .

( ).
, , . 1.
. 1.

, ,

39.04% | 45 150 317 | 35 150 521 | 2 418
42.21% | 48 818 173 | 39 739 644 | 27 390
16.96% | 19 618 518 | 15 478 151 | 7 495
. .
11.30% | 13 067 152 | | 3 994
1.62% | 1 872 482 | | 1 075
1.49% | 1 727 363 | | 133
1.44% | 1 664 804 | | 488
0.57% | 659 707 | | 1 232
0.48% | 556 291 | | 439
0.26% | 295 206 | | 134
0.88% | 1 017 568 | 758 407 | 1 005
( .. )
0.90% | 1 037 468 | 827 580 | 61
100% | 115 642 044 | 91 954 303 | 38 369

, ,
(), , , ,
. . ,
, .

, . ; 5%
.


(400 . ,
):
. , , 1970
( . 1972), , 16001700
400 . . ,

:
(. . ),

(. . ).
2 ,
150 ,
( . Sharoff 2006). , ,
, ( 200500
), .
(, , )
. (

), 2.
. 2. ( , ipm)

202
609

364
1094

138
1058

436
756

428
818

69

15

11

499

421

250

282

292

193

110

75

78

415
58

632
242

595
135

503
91

650
110

,
. ,
,
.
,
. .
, .
,
, ,
.3 , ,
, .
2

.
, , .
3
(Church 2000), - whelk
problem (Kilgarriff 1997). (1983–1989 .). ,
, 1989
, ,
. Whelk ,
.

2 ,
, . ,
, ,
.
,
. , ( BNC ),
. , ., ,
,
, Cieri & Liberman 2002),

.
, .
, , (),
, , .
,
.
, 25 .
, .
,
(. , , 1970, ), (),
(/, /), ( , ) . .
(. 5 ),
91 982 416 , ,
, ,
. ,
115 642 044 ( [ , -- ] ).

686 566 (, ), 1 729 928
, 564 555 70 931 ,
. 270 498 , 203 185 0
, 106 874 . 16.5%
, 100 37%, 1 000 60%, 2 000 69%, 10 000 85% (.
. 6.7).

4. 1.
, ipm
(instances per million words). ,
, . ,
55 400 . , 364
39 653 , ipm

137.5, 364.0 435.6, . ipm


92 (
92 ).
, F(ipm) 92, ,
805.8 ipm x 92 = 74 134 .
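The conversion between ipm and absolute counts described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the corpus size of 92 million words is an assumption inferred from the arithmetic in the text (805.8 ipm x 92 = 74 134).

```python
def to_ipm(absolute_count: float, corpus_size: int) -> float:
    """Normalise an absolute count to instances per million words (ipm)."""
    return absolute_count * 1_000_000 / corpus_size


def from_ipm(ipm: float, corpus_size: int) -> float:
    """Recover the approximate absolute count from an ipm value."""
    return ipm * corpus_size / 1_000_000


# Assumed corpus size: 92 million words (inferred from the worked example).
CORPUS_SIZE = 92_000_000

absolute = round(from_ipm(805.8, CORPUS_SIZE))
ipm_back = round(to_ipm(absolute, CORPUS_SIZE), 1)
print(absolute, ipm_back)
```

Because ipm values are rounded to one decimal place, the recovered absolute count is only approximate for rare words.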
, , .
1, 2 ..,
10 000 . ,
,
,
(. 2). ipm
1, , , 1 000
(, , ), 120 ipm.
6.7. 6: , 1 000
0.6094, , 1 1 000 (
) 60.94% ; 50 000
93% .
(. 4), ,
, (. 6).
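The coverage figures quoted above (for example, the share of running text covered by the 1 000 most frequent lemmas) are cumulative sums over a frequency-ranked list. A minimal sketch of that computation follows; the function name and the toy frequencies are illustrative assumptions, not data from the dictionary.

```python
from itertools import accumulate


def coverage_by_rank(frequencies):
    """Return cumulative text coverage for ranks 1..N.

    Element i is the share of all occurrences accounted for by the
    i + 1 most frequent items.
    """
    ordered = sorted(frequencies, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Toy frequency list (hypothetical values, 100 occurrences in total).
toy_frequencies = [50, 30, 10, 5, 3, 2]
coverage = coverage_by_rank(toy_frequencies)
print(coverage[0], coverage[1], coverage[-1])
```

Reading the list at index 999 would give the coverage of the top 1 000 lemmas when the input is the full lemma frequency list.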

4. 2. R (range) (D)

, .
, , ,
, (.
). ,
, .

(ARF, Average Reduced Frequency),
(Čermák & Křen 2005).
(, ,
, Lyne 1985) D, . (Juilland's
D, . Juilland et al. 1970; .
Gries 2008).
D
:

\[ D = 100 \times \left(1 - \frac{\sigma}{\mu\sqrt{n-1}}\right) \]
, ,
(. . , n),
.
n (
, 100 , 90 ).
,
(, ) .
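A minimal sketch of how Juilland's D and the range R could be computed from per-segment frequencies. It assumes equal-sized segments, the population standard deviation, and that R counts the segments containing at least one occurrence; the function names and toy data are hypothetical, and this is not presented as the authors' implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def juilland_d(segment_freqs):
    """Juilland's D on a 0..100 scale for equal-sized corpus segments."""
    n = len(segment_freqs)
    if n < 2:
        raise ValueError("at least two segments are required")
    mu = mean(segment_freqs)
    if mu == 0:
        raise ValueError("the word does not occur in any segment")
    sigma = pstdev(segment_freqs)  # population standard deviation
    return 100 * (1 - sigma / (mu * sqrt(n - 1)))


def segment_range(segment_freqs):
    """R: number of segments in which the word occurs at least once."""
    return sum(1 for freq in segment_freqs if freq > 0)


# Toy data for n = 100 segments.
evenly_spread = [5] * 100            # same frequency in every segment
concentrated = [500] + [0] * 99      # all occurrences in one segment

print(juilland_d(evenly_spread), segment_range(evenly_spread))
print(round(juilland_d(concentrated), 6), segment_range(concentrated))
```

The two toy cases mark the ends of the scale: a perfectly even distribution yields D = 100, and a word confined to a single segment yields D close to 0.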

R (range) , .
/ 0 (
) 1 ( ).
, D , , 100,
, , 0.4
, (R=100),
5381.4 ipm, , 97.
100 , 395.0
ipm,
, 76.
, 10.2 ipm, 916
( ) 3
9 , 9.
( 1) ipm, R
D, (), .
,
. , ,
. ,
,
(, ),
. , ,
(
). , , R
. ,
, ( ,
47 ). ,
, , R=71. ,
.
D ,
, , ,
, (Lyne 1986).
, ,
( 25 ipm), D 46,
78, 97, ,
( )
.
D R (range)
, :
. , :
(R=91) ,
400 (D=28).
, D
, , D
. . :
4

Leech et al. 2001.


100. .

D
(. . ). , ,
,
.

4. 3. LLscore ( )

. ,
. ,
, , , , , ,
, . ,
5 .
(log
likelihood), :


a | b | a+b
c | d | c+d

G2 (LLscore)
:

\[ G^2 = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right); \quad E_1 = c\,\frac{a+b}{c+d}; \quad E_2 = d\,\frac{a+b}{c+d} \]

, b, c, d , E1 E2
(. Rayson & Garside 2000).
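The log-likelihood score of Rayson & Garside (2000) can be sketched as below, with a and b as the word's frequencies in the two corpora being compared and c and d as the sizes of those corpora. The helper `ll_subcorpus_vs_rest` is a hypothetical convenience that compares a subcorpus with the remainder of the corpus; that reading is an assumption, inferred from the E1 and E2 values reported in Table 3.

```python
from math import log


def log_likelihood(a, b, c, d):
    """G2 (LL-score) for frequencies a, b in corpora of sizes c, d."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    score = 0.0
    if a > 0:                   # a zero count contributes nothing
        score += a * log(a / e1)
    if b > 0:
        score += b * log(b / e2)
    return 2 * score


def ll_subcorpus_vs_rest(freq_sub, size_sub, freq_total, size_total):
    """Compare a subcorpus with the rest of the corpus (assumed setup)."""
    return log_likelihood(
        freq_sub,
        freq_total - freq_sub,
        size_sub,
        size_total - size_sub,
    )


# Frequencies and corpus sizes taken from Table 3.
print(round(ll_subcorpus_vs_rest(300, 20_000_000, 1000, 100_000_000), 2))
print(round(ll_subcorpus_vs_rest(30, 20_000_000, 100, 100_000_000), 2))
print(round(ll_subcorpus_vs_rest(30, 2_000_000, 1000, 100_000_000), 2))
```

With the same ratio of relative frequencies (15 ipm against 10 ipm), the score drops sharply as the absolute counts shrink, which is the point the table illustrates.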
(
),
. , ,
10 , , 5
500 .
, .
,
.
15.31, 99% ,
(Rayson &
Garside, 2000).
, , .
( ipm)
( ).
(15 ipm 10ipm ), (a) (a+b)
, .
, .
5

2003: 17-19 .

, ipm
( 10:1).
. 3. LLscore

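Frequency | 300 | 1000 | 30 | 100 | 30 | 100 | 30 | 1000
Corpus size | 20 000 000 | 100 000 000 | 20 000 000 | 100 000 000 | 2 000 000 | 10 000 000 | 2 000 000 | 100 000 000
ipm | 15 | 10 | 1.5 | 1 | 15 | 10 | 15 | 10
E1 | 200 | | 20 | | 20 | | 20 |
E2 | 800 | | 80 | | 80 | | 980 |
LL | 56.34 | | 5.63 | | 5.63 | | 4.43 |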

(300
, )
(
15.31).
( D)
.
, ,
(Kilgarriff, 2005). , 195060
(
),

( , , ,
).

5
5.1

, :
,
. (Zipf 1935) (r, ) (f):

\[ f \approx \frac{k}{r}, \]

k , (
), ,
( , ,
; . . 1975).
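A small sketch of the rank-frequency relation, assuming the usual reading of Zipf's law in which frequency is roughly inversely proportional to rank (f ≈ k / r). Under that assumption the product of rank and frequency stays close to the constant k. The function names and the toy frequency list are hypothetical and chosen so that the law holds exactly.

```python
def zipf_expected(rank, k):
    """Expected frequency at a given rank under f = k / r."""
    return k / rank


def rank_frequency_products(frequencies):
    """Return rank * frequency for each item of a frequency list.

    Items are ranked by descending frequency, starting at rank 1.
    Under an ideal Zipfian distribution all products equal k.
    """
    ordered = sorted(frequencies, reverse=True)
    return [rank * freq for rank, freq in enumerate(ordered, start=1)]


# Toy frequencies that follow f = 1200 / r exactly (illustrative only).
toy_frequencies = [1200, 600, 400, 300, 240]
products = rank_frequency_products(toy_frequencies)
print(products)
print(zipf_expected(4, 1200))
```

On real corpus data the products drift away from a constant at the top and at the tail of the list.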
1:

.



. 1: ( ).

,
. ,
20 000 ,
30 000, .
,
. ,
, .
100
5 (ipm),
13 000 (
460 ). , ,
, , . ,
,
, , ?
,

, , 40 .
2.6 ipm ( 2,
20 000 ) 0.4 ipm ( 1, 50 000 ,
33 ).

5.2

( )
. : ,
(. , ,
, ) ;
( , , , , , , , ).

,
( . 2005),
( 1977).
, .
. :
s (, , , ),
a (, , ),
num (, , ),
anum (, , )
v (, ),
adv (, ); (, ,
) (, ),
spro (, ),
apro (, ),
advpro (, );
(, ),
pr (, ),
conj (, ),
part (, , ),
intj (, ),
init (., .) 6.
,
,
.. . , ( 1977)
(. , , , , , , ;
) ( .
., .). ,
,
. .
,
.
, , (. ,
, , ),
(. , ).
,
( .. ).
(. /)
(. , ).
.
, , ,
, (,
, , ..,
6

Mystem Dialing ,
.
, , .
.

, ,
(. ).
,
, ,
(), (),
(). Mystem
3% 45% .
,
.. .. (. . 2007),
.

5.3


, (.
, ,
, ). ,
.
( 1977, 1993)
, . ,
100 .
,
, , (Čermák et al. 2004).

,
. ( 4,8 . ,
5% ) Dialing ( 2004);
.
Mystem ( & 1998).

,
(
.. , .. ., , . & 2005).
93.81%.7
,
, 20 ,
.
, +
. ,
,
, :

;
+ + + . 3 000
93.07%.
97-98%.

1) ,
, . , , ,
, . (.
, ).
2) , , , .
, , , ,
, , , , ,
, . :
, ,
. , ,
. ,
, ;
, , .
3) pluralia tantum, ,
, . , , ,
, , , , ,
, , , .

.

5.4

1 2, 1 2, , . .
, ( ), ,
. , ,
, (. , ).
, , :
. 4.
Lemma | PoS | F(ipm) | R | D | Doc
 | s | 32.6 | 100 | 93 | 952
 | v | 8.7 | 95 | 93 | 511

. , .
, , , ,
. , ,
, ,
, ( ), ( ),
( ),
.
,
, : , . . ,
, . , (,
. , .), ,
.


, ,
',
.
(. ).
.
, ,
VS .
, , (
/, /), 8
,
, . .

:
1. ( )
2. ( )
3. ( )
3.1..
3.1..
3.2..
3.2..
3.3..
3.3..
3.4..
3.4..
4.
5.
5.1.
5.2.
5.3.
5.4.
5.5. (, , , )
5.6.
5.7.
6.

7.
( 1) , PoS,
F(ipm), R (range), D
() Doc, . 50 000
() .
(. , ),
(*). 1 , , ipm


, 100, 1000, 100000 .. ,


.
. 5. 1 ( )
Lemma | PoS | F(ipm) | R | D | Doc
 | s | 0.5 | 15 | 63 | 22
 | v | 0.4 | 18 | 72 | 25
 | v | 1.0 | 51 | 85 | 76
 | adv | 0.7 | 41 | 84 | 54

, ( 2),
Rank, , PoS, F(ipm)
(1950–1969 , 1970–1989 , 1990–2007 )
8. 20 000
.
. 6. 2 ( )
Lemma

PoS

s
s

F(ipm)
32.5
32.5

19501960

.
197019901980
2000

6.4
2.1

11.0
4.0

15.7
16.0

19501960

19701980

22.2
10.4

23.9
74.5

19902000

64.5
52.1

. 7.
,

19501969
.

19701989

19902007

5 642 070

7 818 865

21 756 323

309

585

674 566

509

2 725 968
623

1 524

34 950 394
26 264

, 1990–2000 (.
. 7),
60 .
, , ,
,
.

,
, , 1975-2003 , 1900-2000- .

,
( 1). 2.6 ipm,
.
( 3)
, ,
. 5 000
. F(ipm).
, .
. 8. 3.4 ( :
)
Lemma | PoS | F(sp)
 | s | 22.7
 | a | 26.5



(. . 4). , 9
, , , , . ,
.
, F(all) ipm,
, ipm, LLscore.
. 9. 3.4 ( )
Lemma | PoS | F(all) | F(sp) | LL
 | part | 1114.6 | 17208.0 | 50672
 | part | 787.5 | 11847.0 | 34394
 | part | 1785.1 | 15698.6 | 32662

( 4)
5 ipm, , (
20 ). .
, , ,
. , :
. 10. 4 ( )
Word | F(ipm)
 | 3504.1
 | 631.5
 | 5.5
 | 276.9
 | 45.7

, , ,
,
.
,
, , , . .


, , ipm , 100,
1000, 100000 .. , .
5 ( )
: , , , ( . .
), , , (,
, , ). F(ipm)
( ) Rank. 1
. , .
. 11. 5.7 (
: )
Lemma | PoS | F(sp)
 | pr | 147.3
 | conj | 134.9

( 6)
, , ,
. 6.1
F(abs) (%)

. 6.26.5 , ,
. ;
F(abs) Rank. 6.6
( , ).
6.7 : (Rank)
(Coverage). , ,
( 1) 3.6% , . . 3.6%
, 12 6.7% ,
110 16.6% , 93%
150000.
6.8 . 1100
, 101200 . .,
NT(im), NT(n)
NT(nf). ,
. 6.9 :
(L), (Example), (N)
ipm (F) (all) (im),
(n), (nf) (sp).
.
( 7).
,
,
. 1993 ,

.

,
ipm .
( Doc, R D)
.
. ,
, 150 (1.6 ipm)
50 .

,
90 , . , , , , .
, , , .. , ,
, , ,
, .., (,
, ), (15).
, , , , ,
.
7 , 2 500
. 1,
F(ipm), R D Doc.
. 12. 7 ( )
Lemma | F(ipm) | R | D | Doc
 | 9.1 | 72 | 88 | 372
 | 52.4 | 99 | 67 | 522
 | 12.0 | 90 | 87 | 275
 | 115.9 | 100 | 91 | 3387
 | 11.3 | 57 | 82 | 305

,
(. , ),
(*).
(, ,
, , /
).
(), (), (), (), /, / ..
, .
, F(ipm).

***

. .. (), (Universitetet i Tromsø, ),
(University of Leeds, ), .
.
, , ,

,

. .. ,
, ,
.. .
.. , .. ,
.. , .. , .. , .. , ..
,

.. .. , .. , .. , C.. , .. ,
.. ,
.. , .. , .. ,
.. .. ,
, .. ,
.
.. .

( 66).

.

. .. (http://dict.ruslang.ru).


.., .. , .. (1975). //
. 2. 1. . 920. http://kudrinbi.ru/public/442/.
.., .. , .. (.) (1996).
. 4 . : .
.. (1977). : . .; 4 .:
.: , 2003.
.. (.) (1977). . .: .
(.) (1993). (Lönngren,
Lennart. The Frequency Dictionary of Modern Russian). Acta Univ. Ups., Studia Slavica Upsaliensia
Uppsala 32. Uppsala.
.., .., .. (2007).
// .. (.),
2007. : . . . 118125.
2003–2005: 2003–2005:
. .: , 2005.

2006–2008: 2006–2008. .: ,
2009.
.., .. , .. (1972). . .:
.
.. (2005). ? //
20032005. . .: . . 620.
.. (1999). ( .. )
// 99
. , 1999. . . 2. . 230236.
C.O. (2005). //
20032005. . .: . .
6288.
.., .. (1998).
//
'98 . , 1998. .2. . 547552.
.. (2004). www.aot.ru //
:
2004. . http://www.dialog-21.ru/Archive/2004/Sokirko.pdf.
.., .. (2005).
(
) // 2005.
.: ndex. . 8094.
.. (1970). . .: .
.., .., .. (2003).
. .: .
.., .., .. (2008).
(1990 ). .: .
.. (2003). //
. 2. 5. . 819.
.. (1963). .
.
Čermák, František & Michal Křen (2005). New generation corpus-based frequency dictionaries: The
case of Czech // International Journal of Corpus Linguistics, 10. P. 453–467.
Čermák, František, Michal Křen et al. (2004). Frekvenční slovník češtiny. Praha: NLN.


Church, Kenneth W. (2000). Empirical estimates of adaptation: the chance of two Noriegas is closer to
p/2 than p^2 // Proceedings of the 17th conference on Computational linguistics. Saarbrücken,
Germany, 2000. P. 180–186.
Cieri, Christopher & Mark Liberman (2002). Language resources creation and distribution at the
Linguistic Data Consortium // Proceedings of LREC'02. Las Palmas, Spain, 2002. P. 1327–1333.
Davies, Mark (2005). A Frequency Dictionary of Spanish: Core Vocabulary for Learners. London–
N.Y.: Routledge.
Davies, Mark & Dee Gardner (2010). A Frequency Dictionary of American English: Word Sketches,
Collocates, and Thematic Lists. London–N.Y.: Routledge. http://www.wordfrequency.info/
Gries, Stefan Th. (2008). Dispersions and adjusted frequencies in corpora // International Journal of
Corpus Linguistics 13, 4. P. 403–437.
Josselson, Harry H. (1953). The Russian word count and frequency analysis of grammatical categories
of Standard Literary Russian. Detroit: Wayne University Press.
Juilland, Alphonse, Dorothy Brodin & Catherine Davidovitch (1970). Frequency dictionary of French
words. The Hague–Paris: Mouton.
Kilgarriff, Adam (1997). Putting frequencies in the dictionary // International Journal of
Lexicography, 10(2). P. 135–155.
Kilgarriff, Adam (2005). Language is never ever ever random // Corpus Linguistics and Linguistic
Theory 1 (2). P. 263–276. http://www.kilgarriff.co.uk/Publications/2005-K-lineer.pdf
Leech, Geoffrey, Paul Rayson & Andrew Wilson (2001). Word Frequencies in Written and Spoken
English: based on the British National Corpus. London: Longman.
Lyne, Anthony A. (1986). In Praise of Juilland's 'D'; a contribution to the empirical evaluation of
various measures of dispersion applied to word frequencies // Ch. Muller (ed.) Méthodes quantitatives
et informatiques dans l'étude des textes. Genève–Paris. P. 588–595.
Lyne, Anthony A. (1985). The vocabulary of French business correspondence: word frequencies,
collocations and problems of lexicometric method. Genève: Slatkine, Paris: Champion. (Travaux de
linguistique quantitative, 23).
Rayson, Paul & Roger Garside (2000). Comparing corpora using frequency profiling // Proceedings of
the Comparing Corpora Workshop at ACL 2000. Hong Kong, 2000. P. 1–6.
Sharoff, Serge (2006). Creating general-purpose corpora using automated search engine queries //
Baroni, Marco, Silvia Bernardini (eds.): WaCky! Working papers on the Web as Corpus. Bologna:
Gedit. P. 63–98. http://wackybook.sslmit.unibo.it.
Zipf, George Kingsley (1935). The Psycho-Biology of Language: An Introduction to Dynamic
Philology. Boston: Houghton Mifflin.

