
String Patterns: Searching for Interesting Words and Numbers

Department of Mathematics Amherst College, Amherst, Massachusetts Thursday, October 6, 2011


Roger Bilisoly, PhD Associate Professor of Statistics Central Connecticut State University

Overview of Talk

String Patterns and Examples
o Unusual words, squares, and primes

Anagrams of Words and Numbers
o Including square anagrams

Birthday Problem and Pangrams
o Analyzing Dickens's A Christmas Carol

1. String Patterns
Regular expressions (also called regexes) are used to find string patterns. Many software packages implement them, e.g., Mathematica, Perl, SAS, and Emacs. We'll use them to find interesting words (from wordlists available on the web) and interesting numbers (e.g., squares with unusual digit patterns).

Perl Regexes
For a wordlist, one word per line:
/cat/ would match cat, cats, scatter but NOT Cat
/[cC]at/ would match cat, Cat, or Catcher
/cat/i would match cat, CaT, or sCaTtEr (i stands for case insensitive)
/cat|dog/ would match either cat or dog
We'll see examples of more complex string patterns.

See Chapter 2 of Bilisoly (2008b).
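The four patterns above can be tried out directly. The following is a minimal sketch in Python's re module rather than Perl; the small word list and the helper name matches are made up for illustration.

```python
import re

# Illustrative word list (an assumption, not the talk's wordlist)
words = ["cat", "cats", "scatter", "Cat", "Catcher", "CaT", "sCaTtEr", "dog"]

def matches(pattern, flags=0):
    """Return the words in which the pattern occurs anywhere."""
    rx = re.compile(pattern, flags)
    return [w for w in words if rx.search(w)]

print(matches(r"cat"))                 # plain, case-sensitive match
print(matches(r"[cC]at"))              # character class for the first letter
print(matches(r"cat", re.IGNORECASE))  # counterpart of Perl's /cat/i
print(matches(r"cat|dog"))             # alternation
```

Using search mirrors Perl's unanchored matching: the pattern may occur anywhere in the word.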

Example: Some Unusual Words


Are there other words like bookkeeper with three double letters in a row?
Essentially no: {bookkeeper, bookkeepers, bookkeeping, bookkeepings}

Are there words containing "mile" in their interior?

{besmiled, besmiles, camomiles, facsimiles, homiletic, outsmiled, outsmiles, similes, smiled, smiler, smilers, smiles}
wordlist = Import["c:\\CROSSWD.TXT","Lines"];
threepair = Pick[wordlist, StringMatchQ[wordlist, RegularExpression[".*(.)\\1(.)\\2(.)\\3.*"]]]
milewords = Pick[wordlist, StringMatchQ[wordlist, RegularExpression[".+mile.+"]]]
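For readers without Mathematica, the same two searches can be sketched in Python. The short in-line word list below is a stand-in for CROSSWD.TXT and is purely an assumption for illustration.

```python
import re

# Stand-in for the real wordlist file (assumption)
wordlist = ["bookkeeper", "balloon", "committee", "smiled",
            "mile", "miles", "similes", "homiletic"]

three_pairs = re.compile(r"(.)\1(.)\2(.)\3")   # three doubled letters in a row
inner_mile = re.compile(r".+mile.+")           # "mile" with a letter on each side

threepair = [w for w in wordlist if three_pairs.search(w)]
milewords = [w for w in wordlist if inner_mile.fullmatch(w)]

print(threepair)
print(milewords)
```

fullmatch plays the role of StringMatchQ, which requires the whole string to match; that is why "mile" and "miles" are rejected by the second pattern.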

Word Graphs
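Each example word is listed with the distinct edges of its graph:
before: b-e, e-f, f-o, o-r, r-e
concern: c-o, o-n, n-c, c-e, e-r, r-n
state: s-t, t-a, t-e
decency: d-e, e-c, e-n, n-c, c-y
within: w-i, i-t, t-h, h-i, i-n
rather: r-a, a-t, t-h, h-e, e-r
prepared: p-r, r-e, e-p, p-a, a-r, e-d
prospers: p-r, r-o, o-s, s-p, p-e, e-r, r-s
eagle: e-a, a-g, g-l, l-e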

Each node is a distinct letter, and each edge connects letters that are adjacent in the word. Graphs are directed: the arrows are understood. From Section 24 of Eckler (1996).
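The construction just described is easy to sketch in code. This Python version is a simplification: it treats the edges as undirected and merges repeated edges, and the function names are hypothetical.

```python
def word_graph(word):
    """Return (nodes, edges) for a word.

    nodes: the distinct letters, sorted.
    edges: undirected edges between letters adjacent in the word
           (direction and repeated edges are ignored in this sketch).
    """
    nodes = sorted(set(word))
    edges = {frozenset(pair) for pair in zip(word, word[1:]) if pair[0] != pair[1]}
    return nodes, edges

def is_linear(word):
    """A word with no repeated letter gives a simple path (no branches)."""
    return len(set(word)) == len(word)

nodes, edges = word_graph("eagle")
print(nodes, sorted("-".join(sorted(e)) for e in edges))
print(is_linear("amherst"))
```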

Searching for word graphs

Linear words have no branches, e.g., A-M-H-E-R-S-T.

o What is the longest such word?
o Answer: ambidextrously has 14 letters.
o lycanthropies, metalworkings, multibranched, and unpredictably are the only examples with 13 letters.

How many square cyclic words are there (like EAGLE)?

o Need to match the regular expression /^(.)...\1$/ and have 4 distinct letters. The latter can be checked by taking intersection.
o There are 417 such words, including dazed (and other 4-letter weak verbs starting with d in the past tense), sails (and other 4-letter nouns starting with s), etc.

Longest cyclic words are 12 letters long: spaceflights, speculations, subharmonics, subordinates, switchblades, switchboards, sympathizers.
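As a rough illustration of the cyclic-word test, here is a Python sketch. It assumes a cyclic word is one whose first and last letters agree while all other letters are distinct; the function names are invented for this example.

```python
import re

SQUARE_CYCLE = re.compile(r"^(.)...\1$")   # five letters, first equals last

def is_square_cyclic(word):
    """Five-letter word whose graph is a 4-cycle, like 'eagle'."""
    return bool(SQUARE_CYCLE.match(word)) and len(set(word)) == 4

def is_cyclic(word):
    """General case: first letter equals last, no other repeats."""
    return len(word) >= 4 and word[0] == word[-1] and len(set(word)) == len(word) - 1

for w in ["eagle", "dazed", "sails", "level", "switchblades"]:
    print(w, is_square_cyclic(w), is_cyclic(w))
```

The distinct-letter check is done with a set here instead of the intersection mentioned on the slide.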

C(3) through C(8) Alphabets


Letter  C(3)  C(4)   C(5)    C(6)     C(7)      C(8)
a       area  aroma  asthma  amnesia  asphyxia  angostura
b       bomb  blurb  benumb  brewpub  ---       ---
c       chic  comic  cosmic  chronic  catholic  chromatic
d       dead  dread  demand  dogsled  disabled  diagnosed
e       ease  eagle  excuse  eclipse  earphone  enjoyable
f       fief  ---    ---     ---      ---       ---
g       gong  going  gaming  glowing  gambling  gathering
h       high  hatch  health  hawkish  hyacinth  haircloth
i       impi  iambi  incubi  intagli  ---       ---
j       ---   ---    ---     ---      ---       ---
k       kick  knock  kopeck  kinfolk  kinsfolk  ---
l       leal  local  lawful  logical  lightful  ---
m       maim  maxim  medium  midterm  mealworm  mechanism
n       noun  nylon  napkin  newborn  nitrogen  ---
o       ouzo  outdo  overdo  oregano  obligato  ---
p       pump  plump  pickup  parsnip  pawnshop  playgroup
q       ---   ---    ---     ---      ---       ---
r       roar  razor  rather  regular  roadster  regulator
s       saws  stars  shirts  sailors  sardines  seafronts
t       text  theft  throat  tourist  tolerant  therapist
u       unau  ---    ---     ---      ---       ---
v       ---   ---    ---     ---      ---       ---
w       whew  widow  window  whipsaw  withdraw  worldview
x       ---   xerox  ---     ---      ---       ---
y       ---   yolky  yearly  ---      yeastily  ---
z       ---   ---    ---     ---      ---       ---

Cyclic Word Mathematica Code

Applying String Patterns to Square Integers (Squares)


Numbers are strings, too, so they are amenable to regexes. We'll apply regexes to find some unusual squares. We will also investigate the randomness of the digits of squares.

Squares without Doubled Adjacent Digits


Suppose the digits in a square (in base 10) are random. Then P(adjacent digits are unequal) = 9/10. So P(100-digit square has no adjacent digits equal) = 0.9^99 = 0.0000295127, or about 30 in a million. We can check this estimate by stochastic computer searches.
51737187749414391248343906418265954222307660356158 ^2 = 2676736596218154562635394063470527614352180564931651707698543214192825076469507142492903267408520964
92247034353159048872216907430912506658722672391337 ^2 = 8509515346952905702167423230637367421917540367430367521058492897148214941569637843428692738072647569
46684912798986257941402247358809860358456884854826 ^2 = 2179481083048950920786370528301507294579140187693485325858417852926259858127813068545985375095490276
72400570183600937943662908692744405525154232178778 ^2 = 5241842562910525153020949368218058035430952649615191319089815789036984216473806049461870608953573284
etc.

Code to Compute Counts of Such Squares


(* 50 repetitions of searching a million squares *)
Do[
  total = 0; nsteps = 1000000; ndigits = 100;
  start = Ceiling[Sqrt[10^(ndigits-1)]];
  stop = Floor[Sqrt[10^(ndigits)]];
  Do[
    square = Random[Integer, {start,stop}]^2;
    (* look for any digit immediately followed by itself *)
    match = StringCases[ToString[square], RegularExpression["(.)\\1"]];
    If[Length[match] == 0, ++total, Null],
    {i,1,nsteps}
  ];
  Print[total],
  {nreps,1,50}
]
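A scaled-down version of this search can be sketched in Python. The digit length, sample size, and seed below are arbitrary choices for a quick run, not the talk's settings, and the function names are hypothetical.

```python
import math
import random
import re

DOUBLE = re.compile(r"(.)\1")   # any digit immediately followed by itself

def has_double(n):
    """True if the decimal form of n has two equal adjacent digits."""
    return bool(DOUBLE.search(str(n)))

def count_no_doubles(ndigits, nsteps, rng):
    """Sample nsteps random ndigits-digit squares; count those with no doubled digit."""
    start = math.isqrt(10 ** (ndigits - 1) - 1) + 1   # smallest root giving ndigits digits
    stop = math.isqrt(10 ** ndigits - 1)              # largest such root
    total = 0
    for _ in range(nsteps):
        if not has_double(rng.randint(start, stop) ** 2):
            total += 1
    return total

rng = random.Random(2011)   # fixed seed so that runs are repeatable
observed = count_no_doubles(20, 20000, rng)
print(observed, "observed vs.", 20000 * 0.9 ** 19, "expected if digits were random")
```

With 20-digit squares the event is common enough (0.9^19 is roughly 13.5%) that a small sample already gives a usable comparison.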

Are the Digits of Squares Random? The initial digits are not.
The limiting proportion of squares with initial digit k, 1 ≤ k ≤ 9, is given by: (Sqrt[k+1] - Sqrt[k] + Sqrt[10(k+1)] - Sqrt[10k])/9.

Leading digits  Lower  Upper  Lower^2  Upper^2
1               100    141    10000    19881
2               142    173    20164    29929
3               174    199    30276    39601
4               200    223    40000    49729
5               224    244    50176    59536
6               245    264    60025    69696
7               265    282    70225    79524
8               283    299    80089    89401
9               300    316    90000    99856
10              317    447    100489   199809
20              448    547    200704   299209
30              548    632    300304   399424
40              633    707    400689   499849
50              708    774    501264   599076
60              775    836    600625   698896
70              837    894    700569   799236
80              895    948    801025   898704
90              949    999    900601   998001

Digit  Prob.
1      19.16%
2      14.70%
3      12.39%
4      10.92%
5      9.87%
6      9.08%
7      8.45%
8      7.93%
9      7.50%
Total  100.00%
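The limiting proportions can be recomputed from the formula above. A small Python sketch follows; the function name is made up.

```python
from math import sqrt

def leading_digit_share(k):
    """Limiting share of squares whose first digit is k (1 <= k <= 9)."""
    return (sqrt(k + 1) - sqrt(k) + sqrt(10 * (k + 1)) - sqrt(10 * k)) / 9

shares = {k: leading_digit_share(k) for k in range(1, 10)}
for k, p in shares.items():
    print(k, f"{100 * p:.2f}%")
print("total", f"{100 * sum(shares.values()):.2f}%")
```

The sum telescopes to (Sqrt[10] - 1 + 10 - Sqrt[10])/9 = 1, which is a quick sanity check on the formula.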

Aside: Benford's Law

Benford's Law says that initial digits often follow the following probability distribution:
Log[10, (k + 1)/k] for 1 ≤ k ≤ 9.

{a_n} satisfies Benford's law iff Log[10, a_n] (mod 1) is uniformly distributed. See http://en.wikipedia.org/wiki/Benford's_law. Benford's Law does not fit the distribution of initial digits of squares.

Digit  Prob.
1      30.10%
2      17.61%
3      12.49%
4      9.69%
5      7.92%
6      6.69%
7      5.80%
8      5.12%
9      4.58%
Total  100.00%

The final digits of squares are not random.


Final digit can only be:
{0,1,4,5,6,9} (only 60% of possible one-digit endings are allowed)

Final two digits can only be:
{00,01,04,09,16,21,24,25,29,36,41,44,49,56,61,64,69,76,
81,84,89,96} (only 22% of possible two-digit endings allowed)

Final three digits can only be:
{000,001,004,009,016,024,025,036,041,044,049,056,064,076,
081,084,089,096,100,104,116,121,124,129,136,144,156,161,
164,169,176,184,196,201,204,209,216,224,225,236,241,244,
249,256,264,276,281,284,289,296,304,316,321,324,329,336,
344,356,361,364,369,376,384,396,400,401,404,409,416,424,
436,441,444,449,456,464,476,481,484,489,496,500,504,516,
521,524,529,536,544,556,561,564,569,576,584,596,600,601,
604,609,616,624,625,636,641,644,649,656,664,676,681,684,
689,696,704,716,721,724,729,736,744,756,761,764,769,776,
784,796,801,804,809,816,824,836,841,844,849,856,864,876,
881,884,889,896,900,904,916,921,924,929,936,944,956,961,
964,969,976,984,996} (only 15.9% allowed)

The percentages are decreasing, but to what value?
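The allowed endings can be enumerated by brute force, since the last n digits of x^2 depend only on x mod 10^n. A short Python sketch (function name hypothetical):

```python
def square_endings(n):
    """All possible final n digits of a square, as integers mod 10**n."""
    m = 10 ** n
    return sorted({(x * x) % m for x in range(m)})

for n in range(1, 5):
    endings = square_endings(n)
    print(n, len(endings), len(endings) / 10 ** n)
```

This is only practical for small n, because the loop runs over all 10^n residues.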

Walter Penney's Theorem

Pick n digits at random, and let P(n) = probability that these n digits are the final digits of a square. Theorem (Penney, 1960): As n → ∞, P(n) → 5/72 ≈ 6.94%.
Penney shows that P(n) = (2^(n-1) + 4)(5^(n+1) + 7)/(36*10^n) for n even and P(n) = (2^(n-1) + 5)(5^(n+1) + 11)/(36*10^n) for n odd.

n  P(n)      P(n)/P(n-1)
1  .6000000  .6000
2  .2200000  .3667
3  .1590000  .7227
4  .1044000  .6566
5  .0912100  .8736
6  .0781320  .8564
7  .0748719  .9590

For a proof see Walter Penney (1960), On the Final Digits of Squares. Also see Walter Stangl (1996), Counting Squares in Z_n.
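Penney's closed form can be evaluated exactly with rational arithmetic. This Python sketch reproduces the P(n) values; the function name is an assumption.

```python
from fractions import Fraction

def penney(n):
    """Penney's P(n): share of n-digit endings that can end a square."""
    if n % 2 == 0:
        num = (2 ** (n - 1) + 4) * (5 ** (n + 1) + 7)
    else:
        num = (2 ** (n - 1) + 5) * (5 ** (n + 1) + 11)
    return Fraction(num, 36 * 10 ** n)

for n in range(1, 8):
    print(n, float(penney(n)))
print("limit 5/72 =", 5 / 72)
```

For large n the leading terms dominate, giving 2^(n-1) 5^(n+1) / (36 * 10^n) = 5/72.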

Equi-Pandigital Primes
An equi-pandigital number in base b contains each digit from 0 through (b-1) exactly the same number of times.
Theorem. For b > 3, there are no equi-pandigital primes.
Proof. Let n be an equi-pandigital number in base b. Then mod (b - 1), n is congruent to the sum of its digits because b^n ≡ 1^n = 1. Let r be the # of repetitions of 0, 1, 2, ..., b - 1, which sum to b(b - 1)/2. So we have n ≡ r b (b - 1)/2 (mod b - 1).
If b is even, then n ≡ 0 (mod b - 1) since b/2 is an integer, so (b - 1) divides n.
If b is odd, then (b - 1)/2 is an integer, so either n ≡ 0 or (b - 1)/2. In both cases, (b - 1)/2 divides n.
For b > 3, (b - 1) and (b - 1)/2 are nontrivial, so n is not prime. QED
Remark 1: Finding a base 10 equi-pandigital prime will take some trickery.
Remark 2: 10_2, 100101_2, 101001_2, 10001011_2, etc. are prime, as are 102_3, 201_3, 100012212_3, 100022112_3, etc.
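The theorem and Remark 2 can be checked by brute force for small bases. The Python sketch below enumerates every equi-pandigital number for a given base b and repetition count r; the names are invented, and trial division is used only because the numbers are tiny.

```python
from itertools import permutations

def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def equi_pandigital_primes(b, r=1):
    """Primes whose base-b form uses each digit 0..b-1 exactly r times (no leading 0)."""
    digits = [d for d in range(b) for _ in range(r)]
    found = set()
    for perm in set(permutations(digits)):
        if perm[0] == 0:
            continue
        n = 0
        for d in perm:
            n = n * b + d          # read the digit string in base b
        if is_prime(n):
            found.add(n)
    return sorted(found)

print(equi_pandigital_primes(2, 3))   # base 2, three 0s and three 1s
print(equi_pandigital_primes(3))      # base 3
print([equi_pandigital_primes(b) for b in (4, 5, 6)])
```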

Equi-Pandigital Gaussian Primes


Theorem 9.15 of Deskins (1964): Let p be a prime in Z; then all Gaussian primes in Z[i] fall into one of the following three cases, up to units.
(a) If p ≡ 3 (mod 4), then p is a Gaussian prime.
(b) If p ≡ 1 (mod 4), then p = (a + b i)(a - b i) is the Gaussian prime factorization, where a, b is the unique (up to order) solution to p = a^2 + b^2.
(c) If p = 2, then 2 = (1 + i)(1 - i) is the factorization into Gaussian primes.
Proof: See Deskins (1964). Remark: Brillhart's algorithm can find a and b for p ≡ 1 (mod 4). See Williams (1995).
Let's make an estimate of the number of (5,5)-digit pandigital primes. Since P(m ∈ Z is prime) ≈ 1/log(m) (just differentiate the logarithmic integral), we need to multiply this by the number of (a + b i)'s satisfying a > b and condition (b) above. Hence 10^5 > a > b > 10^4.5, which implies that 2(10^10) > a^2 + b^2. Pick two digits from 1 through 9 to use as a and b, and the rest of the digits can be put in any order, so the total number of expected Gaussian primes is approximately:

Search Results
By computer search, there are 69774 equi-pandigital Gaussian primes of the form a + b i, a > b > 0. Here are some interesting ones:
Pandigital Gaussian prime  Distinguishing property
96530 + 87421i             Max norm
20468 + 13597i             Min norm
98765 + 10234i             Max (real - imaginary parts)
60143 + 59872i             Min (real - imaginary parts)
86420 + 79513i             Largest real part with all even digits
20864 + 13579i             Smallest imaginary part with all odd digits
97531 + 82604i             Largest real part with all odd digits

All the Equi-Pandigital Gaussian Primes

2. Anagrams of Words and Numbers


An anagram of a word is a non-identity permutation of that word's letters.
E.g., Amherst is an anagram of hamster. One-word anagrams are sometimes called transpositions in wordplay. In wordplay, some require an anagram to have a related meaning to the original word.

We'll also consider anagrams of numbers. In what follows, initial zeros are forbidden.
E.g., 13^2 = 169, 14^2 = 196, and 31^2 = 961 are anagrams of each other (in base 10).

How to find English word anagrams Step 1: Obtain a wordlist.


There are now a variety of sources available:
American Cryptogram Association at http://cryptogram.org/cdb/words/words.html
National Puzzlers' League at http://www.puzzlers.org/dokuwiki/doku.php?id=solving:wordlists:about:start
Grady Ward's Moby word lists (in the public domain) at http://icon.shef.ac.uk/Moby/

The above wordlists include all the inflected forms of words: nouns with both singular and plural forms, adjectives with comparative forms, verbs with all conjugated forms, etc.

Step 2: Read in the wordlist and store it in a hash.


The hash key equals the letters of the word sorted in alphabetical order. Examples:
aah -> aah
aahed -> aadeh
aahing -> aaghin
aardvark -> aaadkrrv
Perl program from Section 3.7.2 of Bilisoly (2008b):
open(WORDS, "CROSSWD.TXT") or die;
while (<WORDS>) {
    chomp;
    @letters = split(//);                # split the word into letters
    $key = join('', sort(@letters));     # hash key: the letters in sorted order
    if ( exists($dictionary{$key}) ) {
        $dictionary{$key} .= ",$_";      # key already seen: append this anagram
    } else {
        $dictionary{$key} = $_;
    }
}
foreach $key (sort keys %dictionary) {
    print "$key, $dictionary{$key}\n";
}

If key already exists, then an anagram has been discovered. Example: evil, live, vile, veil all have the key eilv.

aa aah aahed aahing aahs aal aalii aaliis aals aardvark aardvarks aardwolf aardwolves aas aasvogel aasvogels aba abaca abacas abaci aback abacus abacuses

Step 3: Print out the hash with the keys sorted in alphabetical order.
The result (see below) is an anagram dictionary. Invaluable for word games such as Scrabble and Jumble: just sort the letters at hand and check if they form a word. Looking for entries with two or more commas reveals word anagrams.
Most words do not have anagrams.
aa, aa
aaaaabbcdrr, abracadabra
aaaabcceelrstu, baccalaureates
aaaabcceelrtu, baccalaureate
aaaabdilmorss, ambassadorial
aaaabenn, anabaena
aaaabenns, anabaenas
aaaaccdiiklllsy, lackadaisically
aaaaccdiiklls, lackadaisical
aaaaccrr, caracara
aaaaccrrs, caracaras
aaaacgnr, caragana
aaaacgnrs, caraganas
aaaacmnrst, catamarans
aaaacmnrt, catamaran
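The three steps (wordlist, sorted-letter key, sorted printout) carry over to other languages almost line for line. Here is a hedged Python sketch using a tiny in-line word list in place of CROSSWD.TXT; the function name is hypothetical.

```python
from collections import defaultdict

def anagram_dictionary(words):
    """Map each sorted-letter key to the list of words sharing that key."""
    table = defaultdict(list)
    for w in words:
        table["".join(sorted(w))].append(w)
    return dict(table)

# Small stand-in wordlist (assumption)
words = ["evil", "live", "vile", "veil", "aah", "aahed", "aardvark"]
dictionary = anagram_dictionary(words)

for key in sorted(dictionary):
    print(f"{key}, {','.join(dictionary[key])}")
```

Keys that map to more than one word are exactly the anagram classes.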

Numbers are formed from the alphabet {0,1,2,3,4,5,6,7,8,9}.

The program above can easily be modified to find anagrams of a set of numbers. In recreational mathematics, it is well known that 12^2 = 144, 21^2 = 441; and 13^2 = 169, 31^2 = 961, 14^2 = 196. Unlike words, it turns out that it is easy to find two or more squares that are anagrams. For example, the following 87 squares are anagrams of each other:
1026753849, 1042385796, 1098524736, 1237069584, 1248703569, 1278563049, 1285437609, 1382054976, 1436789025, 1503267984, 1532487609, 1547320896, 1643897025, 1827049536, 1927385604, 1937408256, 2076351489, 2081549376, 2170348569, 2386517904, 2431870596, 2435718609, 2571098436, 2913408576, 3015986724, 3074258916, 3082914576, 3089247561, 3094251876, 3195867024, 3285697041, 3412078569, 3416987025, 3428570916, 3528716409, 3719048256, 3791480625, 3827401956, 3928657041, 3964087521, 3975428601, 3985270641, 4307821956, 4308215769, 4369871025, 4392508176, 4580176329, 4728350169, 4730825961, 4832057169, 5102673489, 5273809641, 5739426081, 5783146209, 5803697124, 5982403716, 6095237184, 6154873209, 6457890321, 6471398025, 6597013284, 6714983025, 7042398561, 7165283904, 7285134609, 7351862049, 7362154809, 7408561329, 7680594321, 7854036129, 7935068241, 7946831025, 7984316025, 8014367529, 8125940736, 8127563409, 8135679204, 8326197504, 8391476025, 8503421796, 8967143025, 9054283716, 9351276804, 9560732841, 9614783025, 9761835204, 9814072356.

A Pattern Emerges

# Digits  # Squares  # Anasquares  Proportion
1         3          0             0.00%
2         6          0             0.00%
3         22         7             31.82%
4         68         13            19.12%
5         217        86            39.63%
6         683        293           42.90%
7         2163       1212          56.03%
8         6837       4699          68.73%
9         21623      17380         80.38%
10        68377      60623         88.66%

In fact, looking at n-digit squares, it seems that as n increases, the proportion of squares with square anagrams (let's call these anasquares) keeps increasing. What is the limit?
The above table is Table 1 from Bilisoly (2008a). Also see http://oeis.org/A177952.
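The counts for small digit lengths can be recomputed with the same sorted-key idea applied to digits. A minimal Python sketch, with hypothetical function names, assuming base 10 and no leading zeros:

```python
from collections import defaultdict
from math import isqrt

def square_classes(d):
    """Group the d-digit squares by their sorted digit string (the hash key)."""
    lo = isqrt(10 ** (d - 1) - 1) + 1   # smallest root whose square has d digits
    hi = isqrt(10 ** d - 1)             # largest such root
    classes = defaultdict(list)
    for r in range(lo, hi + 1):
        s = r * r
        classes["".join(sorted(str(s)))].append(s)
    return classes

def anasquare_counts(d):
    """Return (# d-digit squares, # of those sharing a key with another square)."""
    classes = square_classes(d)
    n_squares = sum(len(v) for v in classes.values())
    n_ana = sum(len(v) for v in classes.values() if len(v) > 1)
    return n_squares, n_ana

for d in range(1, 7):
    print(d, anasquare_counts(d))
print(len(square_classes(10)["0123456789"]), "squares use each digit exactly once")
```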

The limit is 100%!


Let S_{d,b} be the set of squares with exactly d digits when written in base b. Define a pattern of a number n to be the digits of n in base b sorted from least to greatest. Note that a pattern is a hash key.
Theorem (Bilisoly, 2008a): The proportion of anasquares in S_{d,b} → 1 as d → ∞ with b fixed. Proof: A lower bound to the number of anasquares occurs when as many patterns as possible correspond to exactly 1 square. To find this lower bound, we count the number of patterns and d-digit squares. First, thinking back to the Perl program, the hash key of a number is obtained by sorting its digits. Let d_i be the number of times the digit i appears in a square. Then this hash key can be represented by:

\[ \underbrace{*\,*\cdots*}_{d_0} \;|\; \underbrace{*\,*\cdots*}_{d_1} \;|\; \cdots \;|\; \underbrace{*\,*\cdots*}_{d_{b-1}} \]

End of Proof
The number of distinct hash keys is:

\[ \binom{d+b-1}{d} = \frac{(d+b-1)(d+b-2)\cdots(d+1)}{(b-1)!} \]

Second, the number of d-digit squares is approximately:

\[ \sqrt{b^{d}} - \sqrt{b^{d-1}} = b^{d/2} - b^{(d-1)/2} = b^{d/2}\,(1 - 1/\sqrt{b}). \]

Hence the number of d-digit squares is exponential (in d), but the number of patterns is a polynomial (in d), so the proportion of anasquares is bounded below by the following, which → 1 as d → ∞ (with b fixed):

\[ \max\!\left( \frac{ b^{d/2}\,(1 - 1/\sqrt{b}) - \dfrac{(d+b-1)(d+b-2)\cdots(d+1)}{(b-1)!} }{ b^{d/2}\,(1 - 1/\sqrt{b}) },\; 0 \right) \]

QED

3. Birthday Problem and Pangrams


The basic birthday problem is famous: For n people, assuming all days are equally likely, what is the probability that at least two people share the same birthday? The following are related:
Let N_shared = number of people needed until some birthday appears at least 2 times. Let N_all = number of people needed until all 365 birthdays appear at least once.

Note E(N_shared) = 24.6166 > the usual # of people quoted. Why?

P(22 people, at least 2 share) = 0.475695
P(23 people, at least 2 share) = 0.507297

Results from: Flajolet, Gardy, and Thimonier (1992)


Suppose all days are not equally likely; then let p_i = P(day i is a birthday).
Corollary (The Birthday Problem): We need j = 1 day to appear at least k = 2 times.

\[ E(N_{\text{shared}}) = \int_0^\infty \prod_{i=1}^{365} (1 + p_i t)\, \exp(-t)\, dt \]

Corollary (The Coupon Problem): We need all letters to appear at least once.

\[ E(N_{\text{all}}) = \int_0^\infty \Bigl( 1 - \prod_{i=1}^{365} \bigl(1 - \exp(-p_i t)\bigr) \Bigr)\, dt \]

Application to Birthdays
What is the expected number of people needed so that 2 people share a birthday?
Mathematica gives E(N_shared) = 24.6166, which assumes each day is equally likely. Note E(N_all) = 2364.65.
What is the expected number of people born in 1978 needed so that 2 people share a birthday?
Mathematica gives E(N_shared) = 24.5262, and note E(N_all) = 2435.14.
Figure: plot of Julian day vs. proportion of births on that day for 1978. Which days does the lower band represent?
Data source: Todd Swanson's home page: http://www.math.hope.edu/swanson/data/birthdays.txt
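For equally likely days, the quoted values can be reproduced without the integrals by using elementary sums. The Python sketch below covers only the uniform case (the 1978 figures need the actual daily proportions); the function names are made up.

```python
def p_shared(n, days=365):
    """P(at least two of n people share a birthday), equally likely days."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1 - p_distinct

def expected_until_shared(days=365):
    """E(N_shared) = sum over n >= 0 of P(first n birthdays are all distinct)."""
    total, p_distinct = 0.0, 1.0
    for n in range(days + 1):
        total += p_distinct
        p_distinct *= (days - n) / days
    return total

def expected_until_all(days=365):
    """E(N_all) for the uniform coupon collector: days * H_days."""
    return days * sum(1 / k for k in range(1, days + 1))

print(p_shared(22), p_shared(23))
print(expected_until_shared(), expected_until_all())
```

The mean of N_shared exceeds 23 because 23 is where the probability first passes one half, that is, the median, and the distribution is skewed to the right.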

Pangrammatic Windows
The Spirit dropped beneath it, so that the extinguisher covered its whole form; but though Scrooge pressed it down with all his force, he could not hide the light: which streamed from under it, in an unbroken flood upon the ground. He was conscious of being exhausted, and overcome by an irresistible drowsiness; and, further, of being in his own bedroom. He gave the cap a parting squeeze, in which his hand relaxed; and had barely time to reel to bed, before he sank into a heavy sleep. AWAKING in the middle of a prodigiously tough snore, and sitting up in bed to get his thoughts together, Scrooge had no occasion to be told that the bell was again upon the stroke of One. He felt that he was restored to consciousness in the right nick of time, for the especial purpose of holding a conference with the second messenger dispatched to him through Jacob Marley's intervention.

This text is from Charles Dickens's A Christmas Carol. The blue portion is a pangrammatic window, i.e., it contains each letter of the alphabet at least once. There are 679 letters in color. The search started with "The Spirit", and the window could be shortened by dropping letters from the beginning.

Pangrams in A Christmas Carol


We'll search A Christmas Carol for pangrams by selecting random starting positions. Then we compare this to independently generated letters using the letter frequencies of this novel. The counts and the proportions are listed below. Of course, letters are not independent, but the question is this: How do the actual pangram lengths differ from the simulated independent pangram lengths?

Letter Frequencies in A Christmas Carol
Letter  Count  Proportion
a       9308   0.076892
b       1943   0.016051
c       3035   0.025072
d       5674   0.046872
e       14850  0.122674
f       2433   0.020099
g       2979   0.024609
h       8368   0.069127
i       8294   0.068515
j       113    0.000933
k       1031   0.008517
l       4553   0.037612
m       2840   0.023461
n       7960   0.065756
o       9690   0.080048
p       2119   0.017505
q       97     0.000801
r       7031   0.058082
s       7900   0.065261
t       10869  0.089787
u       3335   0.02755
v       1022   0.008443
w       3096   0.025576
x       131    0.001082
y       2298   0.018983
z       84     0.000694
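Measuring one pangram length from a chosen starting position takes only a few lines. This Python sketch counts letters only (spaces and punctuation are skipped), which is an assumption about how the lengths were measured; the function name is hypothetical.

```python
import string

def pangram_length(text, start=0):
    """Letters read from text[start:] until all 26 have appeared; None if never."""
    missing = set(string.ascii_lowercase)
    n_letters = 0
    for ch in text[start:].lower():
        if ch in string.ascii_lowercase:   # ignore spaces, digits, punctuation
            n_letters += 1
            missing.discard(ch)
            if not missing:
                return n_letters
    return None

sample = "The quick brown fox jumps over the lazy dog."
print(pangram_length(sample))
print(pangram_length("Marley was dead: to begin with."))
```

To mimic the experiment, one would call this with random start values over the full text of the novel and collect the returned lengths.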

Pangram Lengths
The left histogram shows lengths of pangrams found in A Christmas Carol using random starting points. The right histogram shows lengths of pangrams found in a simulated string of independent letters using the proportions found in A Christmas Carol.

Left histogram: N = 1000. Right histogram: N = 1000, theoretical mean = 2473.8. Note the long right tail.

Figures from Bilisoly (2009)

Concordancing words with the letter z reveals

out after dark in a breezy spot--say Saint Pau
d-stone, Scrooge! a squeezing, wrenching, graspin
The cold within him froze his old features, n
e court outside, go wheezing up and down, beatin
'em through a round dozen of months presented
e chattering in its frozen head up there. The d
  a great fire in a brazier, round which a part
eir eyes before the blaze in rapture. The wat
            Scrooge seized the ruler with such
n the gloom. Half-a-dozen gas-lamps out of th
her-beds, Abrahams, Belshazzars, Apostles putting
ring at those fixed glazed eyes, in silence fo
 the vision's stony gaze from himself.
from other regions, Ebenezer Scrooge, and is con
pe of my procuring, Ebenezer."
Exchange pay to Mr. Ebenezer Scrooge or his orde
st have sunk into a doze unconsciously, and
er a long way below freezing; that he was clad b
         The Spirit gazed upon him mildly. It

     "Why, it's old Fezziwig! Bless his heart; i
ess his heart; it's Fezziwig alive again!"
              o Old Fezziwig laid down his pen,
     "Yo ho, there! Ebenezer! Dick!"
 ho, my boys!" said Fezziwig. "No more work to-n
e, Dick. Christmas, Ebenezer! Let's have the shu
ters up," cried old Fezziwig, with a sharp clap
illi-ho!" cried old Fezziwig, skipping down from
-ho, Dick! Chirrup, Ebenezer!"
ared away, with old Fezziwig looking on. It was
aches. In came Mrs. Fezziwig, one vast substanti
came the three Miss Fezziwigs, beaming and lovabl
 brought about, old Fezziwig, clapping his hands
 overley." Then old Fezziwig stood out to dance
 to dance with Mrs. Fezziwig. Top couple, too; w
ah, four times--old Fezziwig would have been a m
, and so would Mrs. Fezziwig. As to her, she was
eared to issue from Fezziwig's calves. They shon
 next. And when old Fezziwig and Mrs. Fezziwig h
d Fezziwig and Mrs. Fezziwig had gone all throug
gain to your place; Fezziwig "cut"--cut so deftl
ke up. Mr. and Mrs. Fezziwig took their stations
hearts in praise of Fezziwig: and when he had do

luence over him, he seized the extinguisher-ca
e the cap a parting squeeze, in which his hand
ore and centre of a blaze of ruddy light, whi
ore alarming than a dozen ghosts, as he was p
; and such a mighty blaze went roaring up the
, half thawed, half frozen, whose heavier part
ught fire, and were blazing away to their dear
anding his gigantic size, he could accommoda
chit, kissing her a dozen times, and taking o
erness and flavour, size and cheapness, were
, so hard and firm, blazing in half of half-a-q
e flickering of the blaze showed preparations
g grew but moss and furze, and coarse rank gr
 of endeavouring to seize you, which would ha
relents," she said, amazed, "there is! Nothing
grave his own name, EBENEZER SCROOGE.
er they've sold the prize Turkey that was han
re?--Not the little prize Turkey: the big one
 it. It's twice the size of Tiny Tim. Joe Mi
e passed the door a dozen times, before he ha

The middle section has 43 of the 84 z's, but represents only 3 of 83 pages of Dickens (1986).

Pangram Lengths: Fezziwig Effect

N = 1000 Actual pangrams.

N = 1000 Simulated pangrams. Theoretical Mean = 3620.5

Endpoint with Fezziwig

100,000 Simulated Pangram Lengths

Best fit lognormal distribution shown.

References
Roger Bilisoly (2008a). Anasquares: Square Anagrams of Squares. Mathematical Gazette, 92, 58-63.
Roger Bilisoly (2008b). Practical Text Mining with Perl, Wiley.
Roger Bilisoly (2009). Two Language-based Examples for Use in the Statistics Classroom. American Statistical Association Proceedings of the Joint Statistical Meetings, Section on Statistical Education.
Gunnar Blom, Lars Holst, and Dennis Sandell (1993). Problems and Snapshots from the World of Probability, Springer.
W. E. Deskins (1964). Abstract Algebra, MacMillan.
Charles Dickens (1986). A Christmas Carol, Bantam.
Philippe Flajolet, Daniele Gardy, and Loys Thimonier (1992). Birthday Paradox, Coupon Collectors, Caching Algorithms and Self-Organizing Search. Discrete Applied Mathematics, 39, 207-229.
Walter Penney (1960). On the Final Digits of Squares. The American Mathematical Monthly, Vol. 67, No. 10, pp. 1000-1002.
Walter Stangl (1996). Counting Squares in Z_n. Mathematics Magazine, Vol. 69, No. 4, pp. 285-289.
Kenneth Williams (1995). "Some Refinements of an Algorithm of Brillhart," Canadian Mathematical Society Conference Proceedings, Volume 15, 409-416. Available at http://www.math.carleton.ca/~williams/papers/pdf/202.pdf.

Web References
Benford's Law: http://mathworld.wolfram.com/BenfordsLaw.html and http://en.wikipedia.org/wiki/Benford's_law
Squares with 3 distinct digits: http://www.asahi-net.or.jp/~KC2H-MSM/mathland/math02/math0203.htm
Counterexample to Ed Pegg: http://mathworld.wolfram.com/Baxter-HickersonFunction.html
American Cryptogram Association: http://cryptogram.org/
National Puzzlers' League: http://www.puzzlers.org/
Moby Word Lists: http://icon.shef.ac.uk/Moby/
Anasquare counts: http://oeis.org/A177952
1978 birthday data: http://www.math.hope.edu/swanson/data/birthdays.txt
Word Ways: http://wordways.com/

Wordplay References

Tony Augarde (1994). The Oxford A to Z of Word Games, Oxford.
Tony Augarde (2003). The Oxford Guide to Word Games, Oxford.
o Has historical information.
Dmitri Borgmann (1967). Beyond Language, Scribner's.
Ross Eckler (1979). Word Recreations, Dover.
o Most examples originally appeared in Word Ways.
Ross Eckler (1996). Making the Alphabet Dance, St. Martin's.
o Most examples originally appeared in Word Ways.
Dave Morice (1997). Alphabet Avenue, Chicago Review Press.
Dave Morice (2001). The Dictionary of Word Play, Teachers and Writers Collaborative.
Warren F. Motte, Jr. (1998). Oulipo: A Primer of Potential Literature, Dalkey Archive.
o Oulipo stands for Ouvroir de Litterature Potentielle, which is a group of writers, mathematicians, and other people interested in literary structures.

The Key Wordplay Resource: Word Ways: A Journal of Recreational Linguistics

Established by Dmitri Borgmann in 1968
o He is the author of Language on Vacation and Beyond Language

Bought by A. Ross Eckler, Jr. in 1968. He was editor and publisher from 1968-2006.
o PhD in mathematics from Princeton, 1954
o Worked at Bell Labs, 1954-84
o Published Word Recreations (1979), Names and Games: Onomastics and Recreational Linguistics (1986), Making the Alphabet Dance (1996)

Current editor is Jeremiah Farrell, professor emeritus of mathematics at Butler University

Online at http://wordways.com/

Open question: What are the upper and lower bounds of this plot? Points are squares in base 10 with 12 or fewer digits. This is Figure 2 of Bilisoly (2008a).

Brillhart Algorithm (See Slide 18)

Let us generalize the birthday problem.

Let A represent an alphabet of size n_a.
For birthdays let A = {d_1, d_2, ..., d_365}, so n_a equals 365. Let p_i = P(d_i occurs), so that each day need not be equally likely.

Define N_jk = number of letters drawn from A (with replacement) so that there are j distinct letters that each appear at least k times. Let e_k(t) = kth order Taylor series expansion of exp(t). Theorem 1 of Flajolet, Gardy and Thimonier (1992) states:

\[ E(N_{jk}) = \sum_{l=0}^{j-1} \int_0^\infty [x^l] \prod_{i=1}^{n_a} \Bigl( e_{k-1}(p_i t) + x\bigl(\exp(p_i t) - e_{k-1}(p_i t)\bigr) \Bigr) \exp(-t)\, dt \]

The product is a product of 1st degree polynomials in x, and [x^l] extracts the coefficient of x^l.

Corollary (The Birthday Problem): We need j = 1 day to appear at least k = 2 times. Note that the sum has only one term, and N_12 = N_shared.

\[ E(N_{12}) = \int_0^\infty \prod_{i=1}^{365} (1 + p_i t)\, \exp(-t)\, dt \]

See Corollary 1 of Flajolet et al. (1992)

Generalized Coupon Collector's Problem

Theorem 2 of Flajolet, Gardy and Thimonier (1992): The expected number of letters drawn to get the complete alphabet, A, is given below. Their proof follows fairly easily from Theorem 1.

\[ E(N_{\text{all}}) = \int_0^\infty \Bigl( 1 - \prod_{i=1}^{n_a} \bigl(1 - \exp(-p_i t)\bigr) \Bigr)\, dt \]

For uniformly likely birthdays, 2364.65 people are needed on average to get all 365 days to appear. For 1978, we expect to need 2435.14 people.
Pangrams have A = {a, b, c, ..., z}, n_a = 26, and p_i determined by frequencies found in a text sample.

Example of Mathematica 8 code to find 14 letter words with no multiple edges and diameter = 2.

Squares Having only Three Distinct Digits: Investigated by Hisanori Mishima


Largest known sporadic example: 81401637345465395512991484^2 = 6626226562522666562566262626266252566552622656522256
However, there are infinitely many patterned squares with three distinct digits.

n         n^2
97        9409
997       994009
9997      99940009
99997     9999400009
999997    999994000009

n         n^2
1235      1525225
12335     152152225
123335    15211522225
1233335   1521115222225
12333335  152111152222225

See http://www.asahi-net.or.jp/~KC2H-MSM/mathland/math02/math0203.htm

How many sporadic solutions?

Assume that digits of squares are independent.

Binomial[10,3] (3/10)^n ≈ P(n-digit square with 3 distinct digits)
10^(n/2)(1 - 1/Sqrt[10]) ≈ number of n-digit squares
Expected # of n-digit squares with 3 distinct digits ≈ Binomial[10,3] (3/10)^n * 10^(n/2)(1 - 1/Sqrt[10]) = constant * (3/Sqrt[10])^n
But Σ_n (3/Sqrt[10])^n converges, which suggests a finite # of solutions. Sum = 360(1 - 1/Sqrt[10])(3 + Sqrt[10]) ≈ 1517.

However, the analogous argument for squares with 4 distinct digits results in Σ_n (4/Sqrt[10])^n, which diverges.
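The heuristic sum is a geometric series, so it can be checked numerically. A small Python sketch follows; the names are invented, and the terms are written with ratio k/Sqrt[10] to avoid floating-point overflow for large n.

```python
from math import comb, sqrt

def expected_count(n, k=3):
    """Heuristic expected number of n-digit squares using only k chosen digits."""
    # comb(10, k) * (k/10)^n * 10^(n/2) * (1 - 1/sqrt(10)), with the powers combined
    return comb(10, k) * (1 - 1 / sqrt(10)) * (k / sqrt(10)) ** n

closed_form = 360 * (1 - 1 / sqrt(10)) * (3 + sqrt(10))
partial = sum(expected_count(n) for n in range(1, 2000))
print(partial, closed_form)

# For k = 4 the common ratio exceeds 1, so the terms grow without bound
print(4 / sqrt(10), expected_count(50, k=4))
```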

Three Examples of 150-digit Squares with exactly 9 distinct digits.


This square has no 1s. 590286760507408218847058025821601275020644462041449539546951025992081988403 ^2 = 348438459630330307258664735742002640854590982059075240585330803623545707923 270640840935682702648690975443382535993405344806539300344597863650226490409

This square has no 8s. 705635480731670264258949343062158505097813112879657762505544377338770578675 ^2 = 497921431667415396779522503575462537630442531656110964327039001763590201921 651000116765556629356041206321334237073176796236102099130225925794364755625

This square has no 7s. 624228579548317386188320909320329013264935555613585034560249848356296289668 ^2 = 389661319524910006943311531238425925016482192128500480080280569984852381981 266389000443336616268805696010466681530390083280245809868986959183363550224

Squares with 9 Distinct Digits


How common is an n-digit square with only 9 distinct digits?
Binomial[10,9] (9/10)^n ≈ probability of a square with 9 distinct digits.
For n = 150, this gives 1.36891 E-6 ≈ 1 in a million.
Hence a computer program checking 5,000,000 random 150-digit squares should find 5,000,000*1.36891 E-6 = 6.84 such squares. This was done 30 times, and the counts are given in the histogram.

Mean = 7.27 SD = 2.65

Ed Pegg's Failed Conjecture

In April of 1999, on sci.math, Ed Pegg conjectured that there are only finitely many cubes without the digit 0. D. Hickerson found a counterexample, and a few days later Lew Baxter found the example given below.
baxter[n_] := (2 10^(5 n)-10^(4 n)+2 10^(3 n)+10^(2 n)+10^n+1)/3
Do[Print[{baxter[i],baxter[i]^3}],{i,1,5}]
{64037, 262598918898653}
{6634003367, 291962492648791178822648631863}
{666334000333667, 295852962482593148779111778815593148629851963}
{66663334000033336667, 296251862962481592598148777911117778814892598148629651852963}
{6666633334000003333366667, 296291851962962481492592648148777791111177778814822592648148629631851862963}

Function given at http://mathworld.wolfram.com/Baxter-HickersonFunction.html
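The same function is easy to evaluate with Python's exact integers. This sketch checks that the first few cubes contain no digit 0; it is a spot check of the listed values, not a proof for all n.

```python
def baxter(n):
    """Baxter's example: (2*10^(5n) - 10^(4n) + 2*10^(3n) + 10^(2n) + 10^n + 1)/3."""
    num = (2 * 10 ** (5 * n) - 10 ** (4 * n) + 2 * 10 ** (3 * n)
           + 10 ** (2 * n) + 10 ** n + 1)
    assert num % 3 == 0   # the numerator is always divisible by 3
    return num // 3

for i in range(1, 6):
    cube = baxter(i) ** 3
    print(baxter(i), cube, "0" in str(cube))
```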

Miscellaneous Example: Number Words and Numbers Graph


Let A = 1, B = 2, C = 3, ..., Z = 26.
Let s(word) = sum of its alphabetic values
o Example: s(bad) = 2 + 1 + 4 = 7
Let nn(number) = its number name
o Example: nn(3) = three
Consider the dynamical system of composing s and nn
o That is, iterate n -> nn(n) -> s(nn(n)) -> nn(s(nn(n))), etc.
o Example: 1, 34, 160, 205, 174, 278, 291, 253, 254, 258, 247, 281, 240, 216, 228, 288, 255, 240
1 ends in a 5-cycle, so what else can happen?
o Answer first published by Dmitri Borgmann in 1967 in Beyond Language.
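The iteration can be sketched in Python. The number-name function below is an assumption: it covers 1 through 999 and uses American-style names without "and" (e.g., "one hundred sixty"), which is the convention that reproduces the example sequence.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def nn(n):
    """Number name for 1 <= n <= 999, without 'and'."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only names 1..999")
    parts = []
    if n >= 100:
        parts += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        parts.append(TENS[n // 10])
        n %= 10
    if n:
        parts.append(ONES[n])
    return " ".join(parts)

def s(word):
    """Sum of alphabetic values (a = 1, ..., z = 26); non-letters are ignored."""
    return sum(ord(c) - 96 for c in word.lower() if c.isalpha())

def orbit(n, steps):
    """Return n followed by `steps` iterates of n -> s(nn(n))."""
    seq = [n]
    for _ in range(steps):
        n = s(nn(n))
        seq.append(n)
    return seq

print(orbit(1, 17))
```

Starting points other than 1 can be explored by calling orbit with different arguments, as long as the iterates stay within 1..999.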
