==============
By Przemyslaw Czerpak (druzus/at/priv.onet.pl)
Hi All,
I collected this text from notes I created while analyzing
Clipper's PP, and I have since updated it in a few places. Sorry, but
I do not have enough energy to check and update it further. It is much
shorter than I planned and does not cover many important
things I encoded in the new PP code. Sorry, you will have to look at
the new code, because right now I do not want to think about the PP any more.
After the last few days I hate the PP, and I would be very happy if I could
forget about it for at least a few days. I spent much more time on it than
I planned, and I am really frustrated with the mind-numbing work I have
been doing over the last days.
----------------------------------------------------------------------------

1. Clipper's PP is a lexer which divides source code into tokens
and then operates on these tokens, not on text data. This is the
main reason why the current [x]Harbour PP cannot be Clipper compatible.
Tokenization is the fundamental requirement which dictates much of
Clipper PP's behavior, and as long as we do not replicate it we
will never be Clipper compatible.
Even code as simple as this cannot be correctly preprocessed and
compiled by the current [x]Harbour PP:
#define a -1
? 1-a
and it cannot be fixed in the current code without breaking some
other things, e.g. match markers, which depend on the number of spaces
between tokens. So from the start we have to forget about updating the
current PP. It will never be Clipper compatible, and it cannot be,
because it is not a lexer.
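To make the point concrete, here is a rough Python sketch (my own illustration, not Harbour code; all names are mine) of token-level #define expansion. It shows why the `? 1-a` line above survives token substitution but not text substitution:

```python
import re

def tokenize(src):
    # Minimal stand-in lexer: numbers, identifiers, single-char operators.
    return re.findall(r"\d+|\w+|\S", src)

def expand(tokens, defines):
    # Substitute each defined identifier with its token *list*, not its text.
    out = []
    for tok in tokens:
        out.extend(defines.get(tok, [tok]))
    return out

defines = {"a": tokenize("-1")}           # models:  #define a -1

tokens = expand(tokenize("1-a"), defines)
print(tokens)                             # ['1', '-', '-', '1']
print(" ".join(tokens))                   # 1 - - 1  (two separate '-' tokens)

# A purely textual preprocessor produces "1--1" instead, and a later
# lexer is then free to misread it (e.g. as a '--' token):
print("1-a".replace("a", "-1"))           # 1--1
```

The token list keeps the two minus signs as separate tokens no matter how the output is spaced, which is exactly what a text-based preprocessor cannot guarantee.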
2. When dividing the input into tokens, and later when looking for match
patterns, Clipper PP always tries to consume the longest possible run of
input data as a given type, even if this breaks some other possible
interpretation of the input. This can be seen in the wild match marker
<*marker*> behavior, in optional clauses in match patterns, in operator
tokenization, etc.
It greatly simplifies the code, though it introduces some limitations,
e.g.:
#xcommand CMD [FROM] FROM <*x*> => ? #<x>
or:
#xcommand CMD <x,...> , <id> => ? #<x>
or:
#xcommand CMD <*x*> END => ? #<x>
are accepted by Clipper PP but can never match any line.
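A rough Python sketch (my illustration, not Harbour's real matcher) shows why: a greedy wild marker consumes every remaining token, so a literal that follows it in the pattern, like the END above, is left with nothing to match:

```python
def match(pattern, tokens):
    """pattern: literal tokens or ('wild', name) markers; wild is greedy."""
    binds, i = {}, 0
    for p in pattern:
        if isinstance(p, tuple):
            binds[p[1]] = tokens[i:]   # wild marker: consume to end of line
            i = len(tokens)
        elif i < len(tokens) and tokens[i] == p:
            i += 1
        else:
            return None                # a literal could not be matched
    return binds if i == len(tokens) else None

# CMD <*x*> END : the wild marker swallows END, so nothing ever matches.
print(match(["CMD", ("wild", "x"), "END"], ["CMD", "a", "b", "END"]))
# None

# Without a trailing literal the wild marker is fine:
print(match(["CMD", ("wild", "x")], ["CMD", "a", "b"]))
# {'x': ['a', 'b']}
```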
3. The preprocessor should extract all quoted strings and create separate
tokens from them. The contents of string tokens cannot be modified later
by any rules. Quoting with [] creates a string token when the '[' is not
just after a keyword, a macro, or one of the closing brackets: ) } ]
We will have to change this to keep already existing extensions working,
like accessing string characters with the [] operator, so I suggest
changing this condition to also not create a string token when '[' follows
a constant value of any type, not only a string. It will be usable for
scalar classes and overloading the [] operator, e.g. someone can create a
LOGICAL class where:
.T.[1] => ".T.", .T.[2] => "TRUE", .T.[3] => "YES"
The opening square bracket '[' has to be closed with ']' on the same line.
Such quoting has a very high priority, like normal string quoting, e.g.:
? [ ; // /* ]
should generate:
QOut( " ; // /* " )
This implies one important thing: the PP has to read the whole physical
line from the file, then convert it to tokens, and, if necessary (';' is
the last token after preprocessing), read the next line(s).
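The line-joining part can be sketched in Python as follows (an illustration with names of my own choosing; `split()` stands in for the real tokenizer, which must of course extract normal and []-quoted strings first so that a ';' inside a string does not join lines):

```python
def logical_lines(physical_lines):
    """Join physical lines into logical ones while they end in ';'."""
    tokens = []
    for line in physical_lines:
        tokens.extend(line.split())      # stand-in for real tokenization
        if tokens and tokens[-1] == ";":
            tokens.pop()                 # ';' means: continue on next line
            continue
        if tokens:
            yield tokens
            tokens = []
    if tokens:
        yield tokens

src = ["? 1, ;", "  2, ;", "  3", "? 4"]
print(list(logical_lines(src)))
# [['?', '1,', '2,', '3'], ['?', '4']]
```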
There is also one exception to the above. When Clipper PP finds a '['
character and the previous token is a keyword or macro, it still checks
for the closing bracket, and if it finds an odd number of the other text
delimiters (' ") in the scanned text, it ignores the type of the previous
token and always creates a string. This behavior breaks some valid code,
e.g. Clipper cannot compile code like:
x := a[ f("]") ] $ "test"
or:
x := a[ f( "'" ) ] $ "test"
If it finds the closing ']' without an odd number of other text delimiters,
it creates a different token than for other opening square brackets '[',
open_array_index, which has a different meaning in later preprocessing
and allows the compiler to convert the group of tokens inside to a string.
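The base classification rule for '[' can be sketched like this (illustrative only; `classify_open_bracket` and its shape are mine, not Harbour's code, and the exception about odd delimiter counts is left out). Note that `\w+` also lets numeric constants through, which happens to line up with the extension proposed above:

```python
import re

def classify_open_bracket(prev_token):
    """Decide whether '[' opens an index expression or a []-quoted string."""
    if prev_token is None:
        return "string"                  # line start:  ? [ text ]
    if prev_token in (")", "}", "]"):
        return "index"                   # f(x)[1], {1,2}[1], a[1][2]
    if re.fullmatch(r"\w+|&\w+", prev_token):
        return "index"                   # keyword/identifier or &macro
    return "string"                      # e.g. after ':='

print(classify_open_bracket(None))       # string
print(classify_open_bracket("a"))        # index
print(classify_open_bracket(")"))        # index
print(classify_open_bracket(":="))       # string
```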
If something is not recognized by the preprocessor as a string token or
open_array_index, it should never become a string token. It does not
matter how it is preprocessed later, e.g.:
#define O1 [
#define O2 ]
? O1 b O2
should generate:
QOut( [ b ] )
not:
QOut( " b " )
but:
#command A <x> => ? <x>
A [ b ]
also generates:
QOut( [ b ] )
and in this case the Clipper compiler makes the conversion to a string.
It means that the preprocessor decides what can or cannot be a string
token only during the initial line preprocessing. I think that we do not
have to replicate this behavior exactly, and we should allow string
conversion also when '[' is not marked as open_array_index, in a final
preprocessor pass which will create a string token from the group of
tokens between the '[' and ']' tokens, using the initial stringify
condition which checks the type of the preceding token.
In fact, with the new PP such an operation will be done by the still
existing lexer after preprocessing: the preprocessed tokens are converted
back to a string which is then once again divided into tokens by FLEX or
SIMPLEX. This is redundant, and because neither FLEX nor SIMPLEX is MT
safe, and both have limitations like a maximum line size, we will not be
able to fully benefit from the new code (read below about it).
4. # directives tokenization.
In a #define directive, strings in the result pattern cannot be quoted
with []. Such brackets will always be used as an array index or (in
#[x]command and #[x]translate) as an optional expression (when not quoted
with '\'). Characters like [] are not allowed in a #define match pattern.
Quoting with [] in a #[x]command and #[x]translate match pattern
produces an optional clause. The left square bracket can be quoted with
'\' to disable this special meaning; in such a case, if the Clipper PP
marker was a regular one, it is converted to a stringify dump, otherwise
the marker type is unchanged.
When substitution is done, optional parts are repeated as many times as
the biggest number of values accepted by the multiple-match markers which
appear in the processed optional part. After each repetition the values
are shifted, but only for markers which accepted more than one value.
This is the only condition; the type or state of the marker is unimportant.
The above shows that there is no correlation between the type of a match
marker and the type of a result marker. The type of conversion depends
only on the contents of the matched expression(s) and the type of the
result marker.
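The repetition rule above can be sketched as follows (a simplified Python model under my own naming, not Harbour's code; real markers carry much more state). The optional clause is emitted once per value of its widest list marker, while single-value markers keep repeating their one value:

```python
def expand_optional(template, values):
    """template: literals and marker names; values: marker name -> list."""
    used = [values[t] for t in template if t in values]
    if not used or all(len(v) == 0 for v in used):
        return []                        # no matched values: emit nothing
    count = max(len(v) for v in used)    # widest list marker wins
    out = []
    for i in range(count):
        for t in template:
            if t in values:
                v = values[t]
                # shift only markers that accepted more than one value
                out.append(v[i] if len(v) > 1 else v[0])
            else:
                out.append(t)
    return out

# models:  [ INDEX <ix,...> TAG <tg> ]  with <ix> = 2 values, <tg> = 1
print(expand_optional(["index", "ix", "tag", "tg"],
                      {"ix": ["i1", "i2"], "tg": ["t1"]}))
# ['index', 'i1', 'tag', 't1', 'index', 'i2', 'tag', 't1']
```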
Clipper does not support nested optional result patterns. I can add such
support, but I do not know if it is necessary. To keep the base rules used
by Clipper PP, the outer optional pattern should be repeated as many
times as the maximum number of repetitions in any of its nested optional
patterns. It could be usable in some seldom cases for someone who knows
what will happen, but IMHO in most cases it will create problems, so
probably refusing such expressions is the best choice.
In optional clauses you can observe one Clipper bug I do not want to
replicate. When Clipper PP finds '[', it takes all following tokens up to
the first unquoted ']'. If it finds one, it preprocesses the tokens inside
as a new result pattern, setting a flag that further nested clauses are
forbidden. But when it extracts the tokens for the new optional result
pattern, it strips the quote characters, so when the optional pattern is
preprocessed, every '[' token, even one properly quoted in the source
code, will cause a C2073 error. Clipper also does not respect the context
of the preprocessed tokens when it looks for the optional pattern, so it
will break restricted match markers which contain a ']' token. For me
this is nothing more than a poor implementation which should be fixed.
Some deeper tests also show other bugs in Clipper PP when the matched
tokens end with ','.
In such a case the blockify result marker does not create an empty
codeblock for the last token, even though it does for all the empty
expressions before it. The same happens with the normal and smart
stringify result markers, but here there is yet another problem when
there are more commas at the end: the last one is converted to a string
token with the comma inside, "," ;-)
I do not think we should replicate such behaviors, though it would be
quite easy, because they look like simple bugs which can appear in the
most trivial implementation of these conditions.
In general I think that many of Clipper's PP behaviors, even the
documented ones, were not intentionally designed. Simply, someone in the
past created the preprocessor, and then the same person, or probably
someone else, documented, more or less precisely, some side effects and
even bugs of this implementation as expected behavior.
17. Storing real expression strings for later stringify operation in PP
output and stringify result patterns.
*
*
*
*
Do While .T.
   Do
      For each token check if it matches some #define rule
         If token(s) can be substituted then substitute
      Next
   While anything substituted
   Do
      For each token check if it matches some #[x]translate rule
         If token(s) can be substituted then substitute
      Next
   While anything substituted
   If anything substituted
      continue
   Do
      Do While the 1-st token matches some #[x]command pattern
         substitute
      EndDo
   While anything substituted
   Output the processed tokens up to the last one or the ';' token
   If the 1-st token is '#'
      continue
   Remove all tokens in the list up to the last one or the ';' token
   break
EndDo
Output EOL
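The precedence part of the loop above can be modeled compactly in Python (illustrative only; `preprocess_line`, `make_sub` and `make_cmd` are my own stand-ins, the real rule matching is far more involved, and recursive rules are assumed absent):

```python
def preprocess_line(tokens, defines, translates, commands):
    """Each rule set substitutes once and reports whether it changed anything."""
    while True:
        changed = False
        while defines(tokens):        # #define pass, until stable
            changed = True
        while translates(tokens):     # #[x]translate pass, until stable
            changed = True
        if changed:
            continue                  # rescan, starting from #define again
        if commands(tokens):          # #[x]command: only from the 1st token
            continue
        return tokens

def make_sub(mapping):
    # Substitute the first matching token in place; report any change.
    def sub(tokens):
        for i, t in enumerate(tokens):
            if t in mapping:
                tokens[i:i + 1] = mapping[t]
                return True
        return False
    return sub

def make_cmd(mapping):
    # #[x]command rules match only starting at the first token.
    def cmd(tokens):
        if tokens and tokens[0] in mapping:
            tokens[0:1] = mapping[tokens[0]]
            return True
        return False
    return cmd

out = preprocess_line(["CMD", "x"],
                      make_sub({"x": ["1"]}),      # #define x 1
                      make_sub({}),                # no #translate rules
                      make_cmd({"CMD": ["?"]}))    # #command CMD => ?
print(out)   # ['?', '1']
```

The key property of this ordering is that #define and #translate rules are exhausted, and re-exhausted, before any #[x]command rule is even tried.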
The above algorithm differs from the one used by [x]Harbour, and this is
the next reason why we are not Clipper compatible in substitution
precedence. This code illustrates the problem:
#define RULE( p ) ? "define value", p
#translate RULE(<p>) => ? "translate value", <p>
#command RULE(<p>) => ? "command value", <p>
#define DEF( p ) RULE( p )
#translate TRS(<p>) => RULE(<p>)
#command CMD(<p>) => RULE(<p>)

proc main()
   DEF("def")
   TRS("trs")
   CMD("cmd")
   return
Compile it with Clipper and with [x]Harbour and compare the results.
The next important thing is that Clipper preprocesses the whole body of
indirect # directives. It means that in Clipper it is not possible to
execute an indirect #undef DEFNAME, because if DEFNAME is already defined,
it will be preprocessed, and as a result we will have #undef <DEFNAME_value>
before the PP executes this # directive. We can replicate this behavior,
but personally I do not like it. For me it is a limitation, not a feature,
and I do not want to replicate it.