Howto

-- DOCUMENT IN DEVELOPMENT --
PROCESSES TO
  DO INFLECTIONS
  PREPARE DICTIONARY ADDITIONS
  UPGRADE LATIN DICTLINE
  CHECK LATIN DICTLINE
  MAINTAIN LATIN DICTLINE
  CHECK DICTLINE FOR ENGLISH SPELLING
  GENERATE WORDS SYSTEM
    PREPARE LATIN DICTIONARY PHASE
    PREPARE ENGLISH DICTIONARY PHASE
OTHER FORMS OF DICTIONARY
  DICTPAGE
    Like a paper dictionary
  LISTALL
    All words that DICTLINE and INFLECTS can generate
    For spellcheckers
    Will not catch ADDONS and TRICKS words
TOOLS
  CHECK.ADB
  DUPS.ADB
  DICTORD.ADB
  FIXORD.ADB
  LINEDICT.ADB
  LISTORD.ADB
  DICTPAGE.ADB
  DICTFLAG.ADB
  INVERT.ADB
  INVSTEMS.ADB
  ONERS.ADB
  CCC.ADB
  SLASH.ADB
  PATCH.ADB
  SORTER.ADB
------------------- DO INFLECTIONS ----------------------
INFLECTS.LAT contains the inflections in human-readable form
with comments, and in useful order.
This is the input for MAKEINFL, which produces INFLECTS.SEC.
(LINE_INF uses INFLECTS.LAT as input to produce INFLECTS.LIN,
clean and ordered, but still readable.
Run
LINE_INF
which produces
INFLECTS.LIN
and INFLECTS.SEC)
----------------------------------------------------------
------------PREPARE DICTIONARY ADDITIONS----------------
----------------------------------------------------------
This process is to prepare a submission of new dictionary entries
for inclusion in DICTLINE. The normal starting point is a text file
in DICTLINE (LIN) form, the full entry on one line, spaced appropriately.
The other likely form is an edit file (ED) in which the entry is broken
into three lines
STEMS
PART and TRAN
MEAN
For this form, spacing is not important, as long as there are spaces
separating individual elements.
This is transformed into LIN form by the program LINEDICT
LINEDICT.IN (ED form) -> LINEDICT.OUT (LIN form)
The inverse of this, LIN to ED, is useful to produce a more easily
editable file (3 lines per entry so it is all on one screen)
LISTDICT.IN (LIN DICTLINE form) -> LISTDICT.OUT (ED form)
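The LIN-to-ED direction can be sketched in Python as below. This is not the LISTDICT source. It assumes the DICTLINE column layout implied by the SORTER field specifications used elsewhere in this document (STEMS 1 75, PART 77 24, TRAN 101 10, MEAN 111 80), read as 1-based start and length. The function name and the sample entry are hypothetical.

```python
# Sketch only: column positions are taken from the SORTER specs in this
# document and interpreted as 1-based (start, length) pairs.
FIELDS = {"STEMS": (1, 75), "PART": (77, 24), "TRAN": (101, 10), "MEAN": (111, 80)}

def lin_to_ed(line):
    """Split one LIN-form line into the three lines of ED form."""
    def cut(name):
        start, length = FIELDS[name]
        return line[start - 1:start - 1 + length].strip()
    # Collapse runs of blanks; ED form only needs single separating spaces.
    stems = " ".join(cut("STEMS").split())
    part_tran = " ".join((cut("PART") + " " + cut("TRAN")).split())
    return [stems, part_tran, cut("MEAN")]

# Hypothetical entry, padded out to the assumed columns.
sample = "aqu".ljust(76) + "N 1 1 F T".ljust(24) + "X X X A O".ljust(10) + "water;"
print("\n".join(lin_to_ed(sample)))
```

Going the other way (ED to LIN) needs to know where PART ends and TRAN begins on the second line, which is why the real LINEDICT understands the entry syntax rather than just counting columns.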
Having a LIN form, one can create a DICTLINE.SPE and do checking on that.
Besides running CHECK to validate syntax, one can run DICTORD and create
a file in which leading words are in dictionary entry form. One can then
run this against the existing WORDS and DICTLINE to check for overlap.
DICTORD makes # file in long format
DICTORD.IN -> DICTORD.OUT
Takes DICTLINE form, puts # and dictionary form at beginning.
This file can be sorted to produce word order of paper dictionary.
SORTER on (1 300) (with or without U for I/J U/V conversion)
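As an illustration of what sorting with I/J U/V conversion means, here is a minimal Python sketch of a sort key that folds J into I and V into U before comparing. It assumes this is what the U option does and that the (1 300) field is a 1-based start and length; SORTER's actual collation may differ, and the sample words are hypothetical.

```python
# Sketch only: folds J into I and V into U before comparing, so that
# spelling variants sort together. SORTER's real collation may differ.
FOLD = str.maketrans("JjVv", "IiUu")

def ij_uv_key(line, start=1, length=300):
    """Key on a 1-based (start, length) column slice with I/J and U/V folded."""
    field = line[start - 1:start - 1 + length]
    return field.translate(FOLD).lower()

words = ["#vulgus", "#iam", "#uva", "#jacio"]
print(sorted(words, key=ij_uv_key))
```

With the folding, jacio sorts before iam and uva before vulgus, as in a paper dictionary that does not distinguish the letters; a plain sort would keep the i, j, u and v words in four separate groups.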
One can then run WORDS against this file using DEV (!) parameters
DO_ONLY_INITIAL_WORD and FOR_WORD_LIST_CHECK,
and (#) parameters
HAVE_OUTPUT_FILE, WRITE_OUTPUT_TO_FILE, WRITE_UNKNOWNS_TO_FILE
The output provides a check on whether the new submissions
are duplicated in the existing dictionary and, even if the forms are,
whether the meanings are the same.
After editorial review in light of the WORDS run, the new submission
is ready for inclusion by the usual process with CHECK and SPELLCHECK.
----------------------------------------------------------
----------------UPGRADE DICTIONARY ----------------------
----------------------------------------------------------
This is a variation of the additions process.
This process is to prepare a section of DICTLINE for upgrade.
A section (about 100 entries) is extracted and ordered alphabetically.
It is then put in a form for convenient editing and compared to
the OLD and L+S. Entries are checked and additions are made.
The edit form is returned to DICTLINE form and inserted in
place of the extracted section.
Much the same process is involved in preparing an independent submission
of new entries.
DICTORD makes # file in long format
DICTORD.IN -> DICTORD.OUT
Takes DICTLINE form, puts # and dictionary form at beginning,
a file that can be sorted to produce word order of paper dictionary
SORTER on (1 300)
LISTORD Takes # (DICTORD) long format to ED file
(3 lines per entry so it is all on one screen)
LISTORD.IN -> LISTORD.OUT
Edit
FIXORD produces clean ED file
LINEDICT makes long format (LINE_DIC/IN/OUT)
----------------------------------------------------------
-------ADDING A BLOCK OF NEW ENTRIES TO DICTIONARY -------
----------------------------------------------------------
This may be in association with the upgrade process or from
a block of new entries submitted by a developer or user.
The format may be strange. It is usually easiest to reduce/edit
it down to the 3 line ED form, because that has no column restrictions.
From there one does the usual, making LINEDICT format and preparing the addition.
One quirk is that there may be entries that duplicate the current DICTLINE.
This is so even if the supplier was working from and checking his current DICTLINE,
because there may have been later additions to the master.
While DUPS will catch these, that is a big effort for a full DICTLINE.
One would rather check just the new input.
Take the input and run DICTORD. This gives a format with the dictionary entry
word first. Run the current WORDS against that with NO FIXES/TRICKS and
FIRST_WORD and FOR_WORDLIST parameters. Anything not UNKNOWN in the output
should be examined.
Then run CHECK and spellcheck the English.
----------------------------------------------------------
------------PREPARE DICTIONARY (DICTLINE) WITH ADDITIONS-----------
----------------------------------------------------------
Save present copies of DICTLINE.GEN, DICTLINE.SPE, DICT.LOC,
and whatever else, in case you foul up and have to redo.
Add DICT.LOC to DICTLINE.GEN
Copy DICT.LOC LINEDICT.IN
Run LINEDICT
Copy LINEDICT.OUT+DICTLINE.GEN DICTLINE.NEW
Or if there is a SPE that you want to integrate
COPY DICTLINE.GEN+DICTLINE.SPE DICTLINE.NEW
Or any other combination.
Sort DICTLINE.NEW in the normal fashion (to check for duplicates)
SORTER
DICTLINE.NEW -- Or whatever you call it
1 75 -- STEMS
77 24 P -- PART
111 80 -- MEAN -- To order |'s
101 10 -- TRAN
DICTLINE.SOR -- Where to put result
Check the sort for oddities and any blank lines.
(Look for long/run-on lines.)
Then run CHECK and examine CHECK.OUT
Run
CHECK
to produce
CHECK.OUT
Examine CHECK.OUT and make any corrections required
(The easiest way is to edit CHECK.IN and rerun as necessary.
Then copy the final CHECK.IN to DICTLINE.)
Errors are cited by line number in CHECK.IN.
Edit examining CHECK.OUT from the bottom, so that changes do not
affect the numbering of the rest of CHECK.IN
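The reason for working from the bottom can be shown with a short Python sketch. It is a hypothetical helper, not part of the WORDS tools: corrections are keyed by the line numbers cited in CHECK.OUT and applied highest number first, so a deletion never shifts a line that is still waiting to be fixed.

```python
# Sketch only: `fixes` maps a 1-based line number (as cited in CHECK.OUT)
# to a replacement line, or to None to delete that line.
def apply_fixes(lines, fixes):
    result = list(lines)
    # Highest line number first, so deletions never shift a line
    # that is still waiting to be fixed.
    for number in sorted(fixes, reverse=True):
        if fixes[number] is None:
            del result[number - 1]
        else:
            result[number - 1] = fixes[number]
    return result

check_in = ["entry one", "", "entry too", "entry three"]
print(apply_fixes(check_in, {2: None, 3: "entry two"}))
```

Applied top down, deleting line 2 first would move "entry too" up to line 2, and the fix for line 3 would then overwrite the wrong entry.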
CHECK is very fussy. The hits are primarily warnings to look for
the possibility of error. Most will not be wrong. In fact, over
one percent of correct lines will trigger some warning, more false
positives than real errors.
This makes a full run and edit of DICTLINE a considerable burden.
Sort the fixed CHECK.IN again if there have been any changes in order.
Check for duplicates in columns 1..100
(DUPS checks for '|' in column 111 so that it does not give
hits on lines known to be continuations, provided the sort is in order.)
COPY CHECK.IN DUPS.IN
Run DUPS
1 100
Examine DUPS.OUT and fix DUPS.IN (again from the bottom).
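The duplicate check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the DUPS source: it compares columns 1..100 of adjacent lines in an already sorted file and skips any line with '|' in column 111. The function name and the sample entries are hypothetical.

```python
# Sketch only: reports adjacent lines of a sorted file whose columns
# 1..100 match, unless the later line has '|' in column 111 (a known
# continuation of the entry above it).
def find_dups(lines, first=1, last=100, mark_col=111):
    hits = []
    for number in range(1, len(lines)):
        prev, cur = lines[number - 1], lines[number]
        same = prev[first - 1:last].rstrip() == cur[first - 1:last].rstrip()
        continuation = len(cur) >= mark_col and cur[mark_col - 1] == "|"
        if same and not continuation:
            hits.append(number + 1)  # 1-based line number of the repeat
    return hits

head = "aqu".ljust(110)
dups_in = [head + "water;", head + "|sea, lake;", head + "water (dup);",
           "aquil".ljust(110) + "eagle;"]
print(find_dups(dups_in))
```

Because only neighbouring lines are compared, the check depends on the sort being in order, as noted above.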
Resort if necessary.
Copy the final product to DICTLINE.GEN
This only checks DICTLINE for syntax.
----------------------------------------------------------
----------CHECK DICTLINE FOR ENGLISH SPELLING-------------
----------------------------------------------------------
To check DICTLINE further, one can check the spelling of MEAN.
The fixed format of DICTLINE facilitates this process.
Just running DICTLINE through a spellchecker is impossible,
since all lines contain Latin stems, which will fail not only
an English spellchecker, but a Latin spellchecker as well
(since they are just stems, not proper words).
The process is to extract the MEAN portion, spellcheck this,
and reassemble, making sure to preserve the exact line order.
I use two personal tools, SLASH and PATCH.
Run SLASH on DICTLINE
SLASH takes a file and cuts it into two, lines or columns.
In this case we want to separate the first 110 columns from the rest.
SLASH
c -- Rows or columns
110 -- How many in first
LEFT. -- Name of left file
RIGHT. -- Name of right file
-- Or whatever you want to call them
Save LEFT for later and work on RIGHT, which is only MEANs.
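For readers without SLASH and PATCH, the column split and its inverse might look like the Python sketch below. It is written in the spirit of those tools, not taken from them; the function names and sample lines are hypothetical. The left pieces are padded to the full width so that joining them back restores the original columns.

```python
# Sketch only: a column split and its inverse, in the spirit of SLASH
# and PATCH. Left pieces are padded so the join restores the columns.
def slash_columns(lines, width):
    left = [line[:width].ljust(width) for line in lines]
    right = [line[width:] for line in lines]
    return left, right

def patch(left, right):
    # Line order must be unchanged between the split and the join.
    if len(left) != len(right):
        raise ValueError("LEFT and RIGHT have different line counts")
    return [(a + b).rstrip() for a, b in zip(left, right)]

dictline = ["aqu".ljust(110) + "water;", "zzz".ljust(76) + "N 9 9 N N"]
left, right = slash_columns(dictline, 110)
print(right)
print(patch(left, right) == dictline)
```

The line-count check is the one safeguard worth keeping in any version: if the spellchecker adds or drops a line in RIGHT, every MEAN after that point would be attached to the wrong entry.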
There is one additional complication.
Some MEANs have a translation example element [... => ...]
This will contain some Latin (the left half) as well as English.
The rest I do with editors, but I suppose I should make tools.
Introduce 80 blanks in front of any [
SLASH out the first 80 columns, giving the MEAN omitting the []
Spellcheck that
In the [] file, left justify and add 80 blanks before the =
SLASH out the first 80 columns and spellcheck
Reassemble the three parts of MEAN
Eliminate blanks, leaving a simple MEAN/RIGHT.
PATCH LEFT. and RIGHT together to give DICTLINE.
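The bracket handling above is done by hand with editors, but the idea can be sketched in Python. The helper below is hypothetical and handles only the first [... => ...] example in a MEAN: it separates the text before the example, the Latin left half, the English right half, and the text after, and can put them back together unchanged.

```python
import re

# Sketch only: splits one MEAN around its first [... => ...] example.
EXAMPLE = re.compile(r"\[(.*?)=>(.*?)\]")

def split_mean(mean):
    """Return (before, latin, english, after); empty parts if no example."""
    match = EXAMPLE.search(mean)
    if match is None:
        return mean, "", "", ""
    return (mean[:match.start()], match.group(1),
            match.group(2), mean[match.end():])

def join_mean(before, latin, english, after):
    """Reassemble the parts produced by split_mean."""
    if not latin and not english:
        return before + after
    return before + "[" + latin + "=>" + english + "]" + after

# Hypothetical MEAN with a translation example.
parts = split_mean("water; [aqua pluvia => rain water];")
print(parts)
print(join_mean(*parts))
```

Only the first, third and fourth parts would go to an English spellchecker; the Latin half is left alone.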
___________________________________________
To Prepare English Dictionary
__________________________________________
The first part of the following procedure is only for those
starting from scratch. If porting with a full package,
EWDSLIST.GEN will be provided and you can skip down.
---------------------------------------------------------
Preparing the dictionary for the English mode also
involves checks on the syntax of MEAN.
Run MAKEEWDS against DICTLINE.GEN
(There may be some errors cited. Correct as appropriate.)
This extracts the English words from DICTLINE MEAN (G or S)
Makes EWDSLIST.GEN (or .SPE)
Make sure that if running from DICTLINE.GEN that the extra ESSE line
is added. If we start from DICTFILE.GEN, it is already in.
type EWDS_RECORD is
  record
    W    : EWORD;                       --  1
    AUX  : AUXWORD;                     -- 40
    N    : INTEGER;                     -- 50
    POFS : PART_OF_SPEECH_TYPE := X;    -- 62
  end record;
Ah 1 INTERJ
Aulus 2 N
Roman 2 N
praenomen 2 N
abbreviated 2 N
__________________________________________________
Sort EWDSLIST making a revised version (same name)
1 24 A
1 24 C
51 6 R
75 2 N D
(Run ONERS on ONERS.IN if you want to see FREQ)
(Sort ONERS.OUT 1 11 D; 13 99)
_____________________________________________________
If you are supplied with EWDSLIST.GEN as part of a port package,
the above process is not done.
_____________________________________________________
Run MAKE_EWDSFILE against EWDSLIST.GEN, producing EWDSFILE.GEN
(This also removes some duplicates, entries in which the
key word appears more than once.)
(At present these will act to produce an EWDSFILE.SPE, but
WORDS is not yet set up to use that - only English on GEN for now.)
----------------------------------------------------------
------------PREPARE WORDS SYSTEM-------------------------
----------------------------------------------------------
If using GNAT, build as follows; otherwise compile with your favorite compiler.
gnatmake -O3 words
gnatmake -O3 makedict
gnatmake -O3 makestem
gnatmake -O3 makeewds
gnatmake -O3 makeefil
gnatmake -O3 makeinfl
This produces executables (.EXE files) for
WORDS
MAKEDICT
MAKESTEM
MAKEEWDS
MAKEEFIL
MAKEINFL
(You may also need my SORTER to prepare the data if you are modifying data.
gnatmake -O3 sorter)
(If you have modified DICTLINE, SORTER sort
1 75 -- STEMS
77 24 P -- PART
111 80 -- MEAN
101 10 -- TRAN
Actually the order of DICTLINE is not important for the programs;
it is only a convenience for the human user.)
Run MAKEDICT against the DICTLINE.GEN - When it asks for dictionary, reply G for GENERAL
This produces DICTFILE.GEN
("against" means that the data file and the program are in the same folder/subdirectory.)
(This assumes that you are using the presorted STEMFILE.GEN
which comes with distribution and matches that DICTLINE.GEN.
Otherwise make and run WAKEDICT (Identical to MAKEDICT with
PORTING parameter set in source). This produces DICTFILE.GEN
and a STEMLIST.GEN, which has to be sorted by SORTER.
MAKE ABSOLUTELY SURE YOU ARE USING THE RIGHT MAKEDICT/WAKEDICT!
Invoke SORTER to sort the stems with I/J and U/V equivalence
and replace initial STEMLIST with the sorted one.
SORTER
STEMLIST.GEN -- Input
1 18 U
20 24 P
1 18 C
1 56 A
58 1 D
STEMLIST.GEN -- Output
The output file is also STEMLIST.GEN - Enter/CR for the name works.)
(All SORTER parameters are based on the layout of WORDS 1.97E.
Later versions may have further/expanded fields.)
Run MAKESTEM against STEMLIST.GEN (with dictionary "G"); this produces STEMFILE.GEN and INDXFILE.GEN
The same procedures can generate DICTFILE.SPE and STEMFILE.SPE (input S)
if there is a SPECIAL dictionary, DICTLINE.SPE
For the English part, if you use the presorted EWDSLIST.GEN
run MAKEEFIL against it.
(This assumes that you are using the presorted EWDSLIST.GEN
which comes with distribution and matches that DICTLINE.GEN.
Otherwise make and run MAKEEWDS against DICTLINE.GEN
This produces EWDSLIST.GEN which has to be sorted by SORTER.
Check the beginning of EWDSLIST with an editor.
If there are any strange lines, remove them.
Invoke SORTER. The input file is EWDSLIST.GEN.
The sort fields are
SORTER
EWDSLIST.GEN
1 24 A -- Main word
1 24 C -- Main word for CAPS
51 6 R -- Part of Speech
72 5 N D -- RANK
58 1 D -- FREQ
EWDSLIST.GEN -- Store
The output file is also EWDSLIST.GEN - Enter/CR for the name works.)
(For this distribution, there is no facility for English from a SPECIAL dictionary -
there is no D_K field yet)
Run MAKEEFIL against the sorted EWDSLIST.GEN producing EWDSFILE.GEN
Run MAKEINFL against INFLECTS.LAT producing INFLECTS.SEC
Along with ADDONS.LAT and UNIQUES.LAT,
this is the entire set of data for WORDS.
WORDS.EXE
INFLECTS.SEC
ADDONS.LAT
UNIQUES.LAT
DICTFILE.GEN
STEMFILE.GEN
INDXFILE.GEN
EWDSFILE.GEN
-- And whatever .SPE as appropriate
(If you go through the process and have a working WORDS but it
gives the wrong output, the most likely source of error is
a missing or improper sort.)
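Before suspecting a sort, it can save time to confirm that every data file in the list above is actually present. The Python sketch below is a hypothetical helper, not part of the distribution. It checks a folder for the listed file names, ignoring case, and leaves out WORDS.EXE and the optional .SPE files.

```python
from pathlib import Path

# File names are the ones listed above; .SPE files are optional and not checked.
REQUIRED = ["INFLECTS.SEC", "ADDONS.LAT", "UNIQUES.LAT", "DICTFILE.GEN",
            "STEMFILE.GEN", "INDXFILE.GEN", "EWDSFILE.GEN"]

def missing_files(folder, required=REQUIRED):
    """Return the required data files that are absent from `folder`."""
    present = {p.name.upper() for p in Path(folder).iterdir() if p.is_file()}
    return [name for name in required if name not in present]

if __name__ == "__main__":
    # Check the current folder, where WORDS expects to find its data.
    print(missing_files("."))
```

An empty list only shows that the files exist; it says nothing about whether they were built from correctly sorted input.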
--------------------------------------------------------------
Viewing WORD.STA
A view to see what ADDONS and TRICKS were used
Sort WORD.STA on
1 12 -- The STAT name
55 25 -- STAT details
32 20 -- Word in question
16 10 -- Line number
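A sort on several column fields, like the one specified above, can be sketched in Python as below. It assumes each pair is a 1-based start column and a length, compared left to right as plain text. SORTER's option letters (A, C, P, U and so on) are not modelled, and the three sample lines are hypothetical, not WORD.STA content.

```python
# Sketch only: sorts lines on a list of 1-based (start, length) column
# fields, compared left to right as plain text.
def column_sort(lines, fields):
    def key(line):
        return tuple(line[s - 1:s - 1 + n] for s, n in fields)
    return sorted(lines, key=key)

# The field list given above for WORD.STA.
WORD_STA_FIELDS = [(1, 12), (55, 25), (32, 20), (16, 10)]

# Tiny hypothetical demonstration on two one-column fields.
print(column_sort(["B 2", "A 2", "A 1"], [(1, 1), (3, 1)]))
```

For the real file one would call `column_sort(lines, WORD_STA_FIELDS)` on the lines read from WORD.STA.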
------------------------------------------------------------------
------------------PREPARING DICTPAGE------------------------------
------------------------------------------------------------------
Preparing DICTPAGE, the listing as of a paper dictionary.
IMPORTANT NOTE
During the process, you may find it useful to edit some entries. Feel free to do so.
But remember that you have to keep the separate files (.TXT) and reassemble at the end
into a new DICTLINE.
For a release, ideally DICTPAGE is done before the final DICTLINE,
because in the process there may be some editing of entries.
To first order, this is accomplished by running DICTPAGE
against DICTLINE, producing a listing of DICTLINE with each
entry preceded by # and the DICTIONARY_FORM.
DICTPAGE is a simple modification of DICTORD to produce a
more readable output.
Some polishing of this process gives a better product.
Extracting a few groups of entries for special handling
will simplify the process.
1) Use the regular DICTLINE sort.
Those entries with first stem zzz may give an output
which sorts to #-. But it is likely the second term which
you want to represent this entry. For this and other reasons
these entries will require some hand editing, so extract them
from their place at the end of the regular DICTLINE, run DICTPAGE
on them, sort output on full line, and process separately.
(About 30 entries, but half handled completely by DICTPAGE)
It is likely that this set has not changed much since the last run,
so check to see if you have to do it over.
2) Sort remaining DICTLINE on (77, 8), (110, 80), (1, 75). Extract ADJ 2 X.
Many Greek adjectives are handled in DICTLINE in two or three parts
(ADJ 2 X) by gender. The full declension is the
sum of these partials. (The Greek adjective form 3 6 is handled in the
regular process and does not have to be extracted.) Extract these ADJ declensions
from a sort of DICTLINE by PART. Sort this output on stem and meaning to group
the constituent parts, run DICTPAGE and polish by hand edit to make
a single paper entry from the parts. (About 150 entries, half that
after editing, not too hard, but a program could do the modification.)
It is very likely that this has not changed.
3) The qu-/aliqu- PRONOUN/PACKON (PRON/PACK 1) are yet more complicated
than the Greek adjectives, and are handled in the same manner.
Extract them, sort on meaning, DICTPAGE, and polish output by hand.
Also PRON 5 (only 8 of these). Both of these are sufficiently
unchanging that one could archive the final edit and reuse on a later run.
4) The rest are automatically done by DICTPAGE.
5) UNIQUES are a special case, handled by UNIQPAGE. This processes UNIQUES.LAT
(as UNIQPAGE.IN) into a raw form compatible with the regular PAGE material
(UNIQPAGE.OUT which is copied into UNIQPAGE.pg), added to, and sorted with.
The various phases are assembled into a whole and sorted on the lead,
producing DICTPAGE.RAW
DICTPAGE.RAW is ZIPped to provide a source for others to process for their purposes.
DICTPAGE.RAW is processed herein by PAGE2HTM (with the addition of PREAMBLE.txt
and an end BODY) to give the presentation form DICTPAGE.HTM
The process:
First do a SORT of DICTLINE on STEM to find zzz stems
SORTER
DICTLINE.GEN -- Or whatever
1 75 -- STEMS
77 24 P -- PART
DICTLINE.TXT -- Where to put result
Extract the zzz stems from the end of the file into ZZZ.TXT leaving DICTLINE.NOZ
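The extraction of the zzz stems is normally done in an editor, since after the sort they sit together at the end of the file. A minimal Python sketch of the same separation follows. The function names and the sample entries are hypothetical, and it assumes the first stem is the first blank-delimited word on the line.

```python
# Sketch only: separates entries whose first stem is zzz from the rest,
# keeping the original order within each group.
def first_stem(line):
    parts = line.split()
    return parts[0] if parts else ""

def split_zzz(lines):
    zzz = [line for line in lines if first_stem(line) == "zzz"]
    rest = [line for line in lines if first_stem(line) != "zzz"]
    return zzz, rest

# Hypothetical sorted entries; the first list would be written to
# ZZZ.TXT and the second to DICTLINE.NOZ.
sorted_dictline = ["aqu aqu", "zygi zygi", "zzz bon", "zzz me"]
zzz, noz = split_zzz(sorted_dictline)
print(zzz)
print(noz)
```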
Sort these
SORTER
ZZZ.TXT
77 24 P -- PART
1 75 -- STEMS
101 10 -- TRAN
ZZZ.TXT -- Where to put result
Extract the PRON 5 to a PRON5.TXT -- More to come
Now sort the rest
SORTER
DICTLINE.NOZ
77 24 P -- PART
1 75 -- STEMS
101 10 -- TRAN
DICTLINE.NOZ -- Where to put result
Now extract from DICTLINE.NOZ the remaining PRON 5, the Greek adjectives,
and the qu-/aliqu- PRON/PACK 1, giving
ZZZ.TXT
GKADJ.TXT
PRON1.TXT
PRON5.TXT
After those are removed, the remaining is REST.TXT.
Run DICTPAGE on each of these 5 files
(Copy them to DICTPAGE.IN, run DICTPAGE, copy DICTPAGE.OUT to the appropriate file .PG)
----------------ZZZ
Process the remaining (less PRON 5) ZZZ.TXT with DICTPAGE
(Copy ZZZ.TXT to DICTPAGE.IN, run DICTPAGE, copy DICTPAGE.OUT to ZZZ.PG)
Most of them will be handled. Hand edit the rest.
Some should be expanded (archaic forms in one stem need to be filled out).
Some should be modified (e.g., the plurals).
Some should be trimmed (adjectives with no positive).
There are some kludges (artificial entries which generate irregular forms)
here. Some may just be excluded from the .PG .
----------------GKADJ
Sort GKADJ to get the various parts together for a multiple entry
SORTER
GKADJ.TXT
1 75 -- STEMS
101 10 -- TRAN
77 24 P -- PART
GKADJ.TXT -- Where to put result
Run DICTPAGE and edit. This edit is straightforward but tedious.
I should prepare a procedure to do this automatically, but have not yet.
It is likely that there are few or no changes
from the previous run and those results can be used/modified.
The product is GKADJ.PG

----------------PRON1
This must be hand edited. However it may not change much between versions.
----------------PRON5
Very small.
----------------UNIQUES
UNIQUES are treated by UNIQPAGE.EXE, giving UNIQPAGE.PG
----------------
----------------
The resulting files (with extensions appropriate to the phase of the operation,
ending in .PG) are
GKADJ
PRON1
PRON5
REST
UNIQPAGE
ZZZ
----------------FINISH
Assemble the 6 .PG files to DICTPAGE.PG and sort to produce DICTPAGE.RAW
SORTER
DICTPAGE.PG
1 300 C -- Everything
1 300 A -- For Caps
DICTPAGE.RAW -- Where to put result
Then process with PAGE2HTM and add PREAMBLE.TXT at beginning and end BODY at end
to get DICTPAGE.HTM
---------------------------------------------------------------------
------------------------------------------------------------------
----------------------THE SHORT FORM------------------------------
------------------------------------------------------------------
------ SORT DICTLINE
SORTER
DICTLINE.GEN
1 75 -- STEMS
77 24 P -- PART
101 10 -- TRAN
DICTLINE.GEN -- Where to put result
WAKEDICT/MAKEDICT
------ SORT STEMLIST IF NOT PROVIDED
SORTER
STEMLIST.GEN -- Input
1 18 U
20 24 P
1 18 A
1 56 C
STEMLIST.GEN -- Output
MAKESTEM
MAKEEWDS
------ SORT EWDSLIST
SORTER
EWDSLIST.GEN
1 24 A -- Main word
1 24 C -- Main word for CAPS
51 6 R -- Part of Speech
72 5 N D -- RANK
58 1 D -- FREQ
EWDSLIST.GEN -- Output
MAKEEFIL
