
Chapter 7 PHP Objects (Code operation optimising and Classes)

PHP File I/O and String Regular Expression engine

Types of files

Before going further it is wise to explain a small difference between file handling on Windows/DOS systems
and on UNIX systems such as Linux, because the two treat text reads and writes differently. On Windows/DOS you must remember
to open binary files, or any file not read or written by the text read/write methods, using the "b" setting. In PERL the equivalent is to call
binmode() on the file handle returned by open(); in PHP, "b" is added to the mode argument of fopen() to request a binary read or write. You must use "b", as you
must use binmode() in PERL, or on Windows/DOS the data can be corrupted by newline translation. Often this is explained as "opening binary files", but
that misses the point: as I said before, text and data are both bytes, and the server knows little about a file beyond the extension on its name,
which it uses only for its own application-specific purposes. The purpose of programming is to gain efficiency through customised
application interaction and operation. What matters is the read/write method you use, not the file's type or MIME type.
A UNIX server does not care whether the read is binary or text, but it is good practice to always use the "b" setting in all scripts,
chosen according to the read/write method and not the file's data composition.
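
As a minimal sketch of the "b" setting, assuming a scratch file in the system temp directory (the file name prefix and the byte values are made up for illustration), the following writes four bytes, including a line feed and a carriage return, and reads them back unchanged:

```php
<?php
// Hypothetical scratch file; tempnam() creates it in the system temp directory.
$path = tempnam(sys_get_temp_dir(), 'bin');

// Bytes that include LF (0x0A) and CR (0x0D), which a translating text mode could alter.
$bytes = "\x00\x0A\x0D\xFF";

// "wb": write, binary. The "b" asks for no newline translation.
$out = fopen($path, 'wb');
fwrite($out, $bytes);
fclose($out);

// "rb": read, binary.
$in = fopen($path, 'rb');
$back = fread($in, strlen($bytes));
fclose($in);

unlink($path); // remove the scratch file

print(bin2hex($back) . "\n"); // prints 000a0dff
```

The mode is chosen for the read/write method, so the same "b" would be used even if the file happened to hold text.
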
So there are binary and text files; well, actually there are more. Other files are closer to addresses and are referred to in web programming and
networking as resources, URLs (Uniform Resource Locators) or URIs (Uniform Resource Identifiers). Here we are more interested in the URI,
because another type of file is used for networking and is called a socket (addressed through a port). Confusing as that may be, another type
of file (an associated pair of them) is also important. It is similar to a socket but not the same thing, and is provided to every process
by the operating system: Standard Input and Standard Output. The main difference is that Standard Input/Output carries
data and commands between a process and the system running it, whereas sockets and ports extend
the filing system into other machines, an activity called networking (e.g. routing and bridging), and also to external devices e.g. a printer.
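
PHP reaches these non-disk files through the same functions it uses for ordinary files. A minimal sketch, assuming the built-in php://memory and php://stdout stream wrappers and a script run from the command line (the sample text is made up):

```php
<?php
// php://memory is a built-in stream wrapper: a "file" that lives only in RAM.
$mem = fopen('php://memory', 'w+b');
fwrite($mem, "hello stream\n");
rewind($mem);          // move back to the start before reading
$line = fgets($mem);   // read one line, exactly as from a disk file
fclose($mem);

// php://stdout is the Standard Output of the running process.
$stdout = fopen('php://stdout', 'wb');
fwrite($stdout, $line);
fclose($stdout);
```

The point of the sketch is only that fopen(), fwrite(), fgets() and fclose() do not change when the resource is not a file on disk.
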

Testing, Opening and closing files in PHP

Because PHP is built largely for web servers (although it has many other scripting uses, including some lower-level
language functions), the method of opening, reading and writing a file shown here is coded conservatively with regard to security. Often
you will have trouble setting the policy for a file to be read or created on a web server because of the PHP security settings.
A further difference, unlike both the PHP manual's example of opening a file and a script for administrative or
personal use inside one machine, is that opening and reading securely requires two to three tests to be performed before the complete
read or write action can be done.
First the file to be opened must be tested for its existence. Then it must be tested for its ability to be read or written. The latter is not the same as
testing whether or not the read or write succeeded.
One important point about file handling must be remembered: the mode of fopen(), set in the second argument to the function.
In the script I have set it to an "r+" mode because that allows reading and writing but does not truncate (delete the contents of) the file when it is opened. The warning
here is always to be sure of the mode given to fopen(), because some modes truncate the file. The example below is actually more
suitable for binary files; to operate on text it is recommended to use the file() or file_get_contents() functions for reading.

<?php
#################### web-read-write.php ################################
$dirjoin='/server/asite/www'; # or $_SERVER['DOCUMENT_ROOT'] when run under a web server
# $attachment_to_enc is declared at the top level of the script (almost global scope)
$attachment_to_enc=$dirjoin."/wreadfile.txt"; # note the "/" between the directory and the file name
clearstatcache();
if(file_exists($attachment_to_enc)){
    if(is_file($attachment_to_enc)){
        if(is_readable($attachment_to_enc)){
            if(is_writable($attachment_to_enc)){
                /* START THE OPENING SEQUENCE: "r+" reads and writes without truncating, "b" is the binary setting */
                if(!$knob=fopen($attachment_to_enc,"r+b")){
                    exit("CANNOT OPEN FILE FOPEN");
                }else{
                    /* BELONGS TO THE FOPEN IF: NOW TO DO SOME READING */
                    ## filesize() returns an integer and can be used before opening a file
                    $size=filesize($attachment_to_enc);
                    $data=($size>0)?fread($knob,$size):""; ## fread() does not accept a length of 0
                    print($data);
                    /* print the file's contents since I know it is a text file */
                    fseek($knob,0,SEEK_END); ## DO NOT USE http:// or ftp:// protocol in fopen() when using fseek()
                    if(fwrite($knob,"more content"."\n\n")===false){
                        exit("CANNOT WRITE TO FILE FWRITE");
                    }
                    if(!fclose($knob)){
                        exit("CANNOT CLOSE FILE FCLOSE");
                    }
                }
            }else{
                /* BELONGS TO THE writable IF */
                exit("SETTING FOR WRITABLE IS OFF");
            }
        }else{
            /* BELONGS TO THE readable IF */
            print('SETTING FOR READ PERMS IS OFF');
        }
    }else{
        exit("the file is not a normal file");
    }
}else{
    exit("the file does not exist");
}
?>
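
The warning about fopen() modes can be checked directly. The following is a minimal sketch, assuming a scratch file in the system temp directory (the file name prefix and the two sample lines are made up): it opens the same file with "r+b" and then with "wb" and compares the sizes, and in between reads the text with file_get_contents() and file() as recommended above.

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'mode'); // hypothetical scratch file
file_put_contents($path, "line one\nline two\n");

// "r+b" opens for reading and writing and keeps the existing contents.
$h = fopen($path, 'r+b');
fclose($h);
clearstatcache();
$sizeAfterRPlus = filesize($path);

// For text, file_get_contents() returns the whole file as one string,
// and file() returns an array with one element per line.
$text  = file_get_contents($path);
$lines = file($path, FILE_IGNORE_NEW_LINES);

// "wb" truncates the file to zero length as soon as it is opened.
$h = fopen($path, 'wb');
fclose($h);
clearstatcache();
$sizeAfterW = filesize($path);

unlink($path); // remove the scratch file

print($sizeAfterRPlus . " bytes after r+, " . $sizeAfterW . " bytes after w\n");
// prints: 18 bytes after r+, 0 bytes after w
```
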

Regular Expressions (specialised matching of text sequences accurately)

PHP regular expressions differ little in syntax from PERL regular expressions at the level of the pattern
used inside the template. As I wrote before, PHP does much of its work through predefined function
calls rather than through language syntax. PHP's templates have minor differences from PERL's, and the PERL-compatible functions
do not support a limited set of PERL's regular expression features. To start I will go
through the common set of symbols used in most script engines for constructing regular expression templates. Remember
these are not the same as the language operators, though some are similar; each is specific to regular expressions.
! negation, but only inside an assertion such as (?!...) ; on its own it is an ordinary character
* find zero times or as many as wanted
? find zero times or once; placed after another quantifier it is the greed operator (find in as short a set as possible)
+ find once or more
. any character except new line (dependent on language and expression engine configuration)
() grouping operator (much as in maths)
[] class operator (makes a set of characters, any one of which matches a single input character e.g. [aCdeFk] )
- range operator inside a class (from-to in a logically ordered set e.g. the classes [0-9] [a-z] [A-Z] )
| pipe operator to use in a group (meaning either alternative e.g. (\s|\n) finds one whitespace character or one new line)
^ start of input of searched string; as the first character inside a class it negates the class e.g. [^0-9]
$ end of input of searched string
\ escape character
{} expression find quantity e.g. {2} find exactly twice, {4,9} find 4 at least and 9 at most
\w a word character (letter, digit or underscore)
\d a digit
\W a non word character
\D a non digit character
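
To make the symbols concrete, here is a minimal sketch using preg_match() on a made-up sample string. The variable names are illustrative only; preg_match() returns 1 for a match and 0 for no match.

```php
<?php
$input = 'Order 66 shipped on 2024-05-17'; // made-up sample text

$hasDigits   = preg_match('/\d+/', $input);                  // + : one or more digits
$startsOrder = preg_match('/^Order/', $input);               // ^ : start of input
$endsDate    = preg_match('/\d{4}-\d{2}-\d{2}$/', $input);   // {n} counts, $ : end of input
$classHit    = preg_match('/[A-Z][a-z]+/', $input, $m);      // [] classes with - ranges
$either      = preg_match('/(shipped|returned)/', $input, $alt); // | inside a group
$literalDot  = preg_match('/\./', $input);                   // \ escapes the . meta character

print($m[0] . ' ' . $alt[1] . ' ' . $literalDot . "\n"); // prints: Order shipped 0
```

The last test finds nothing because the sample contains no full stop; without the backslash, the bare . would have matched any character.
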
Two more things to know about PHP regular expressions. The results are placed in an array of your naming, and
the template is fed to the regular expression function as a string, not as a bareword symbol set.
The escape character is important in regular expressions and is used extremely often when building a regular expression
template. To place any of the above characters into a template as text to be searched for, the
character must be escaped, e.g. to match the "$" sign as searched text requires using "\$" in the template.
Thankfully since PHP 4.3.3 the \Q and \E literalisers are available to put in the template. \Q starts the
literalisation and \E ends it. This pair has been available in PERL regular expression
syntax for some time. In PHP it is a little tricky to use, e.g. when making the template from a variable containing
the dollar sign again.

<?php
#################### expigate.php ################################
/* PERL compatible function type: note the forward slashes (regexp delimiters) at each end of the template string */
$input="$ is a standard economic unit symbol"; # contains a meta character
$template="/";
$template.='\Q$\E';
$template.="/"; # makes /\Q$\E/
if(preg_match($template,$input)){
    print('$ '."matched<p>");
}else{
    print('$ '."did not match<p>");
}
# same again, but with the template constructed from a variable
$input="$ is a standard economic unit symbol"; # contains a meta character
$dollar="$"; # this is likely to be how a literalised template will gain its main input pattern parts (as a fed $variable)
$template="/\Q$dollar\E/"; # makes /\Q$\E/ because \Q and \E are not PHP string escapes and pass through unchanged
if(preg_match($template,$input)){
    print('$ '."matched<p>");
}else{
    print('$ '."did not match<p>");
}
/* Without \Q \E or quotemeta() a meta character in the template is dangerous: the tests below still "match", but as a meta character */
/* ereg() and eregi() were removed in PHP 7, so preg_match() with the i flag stands in for eregi() here */
$template_php='$';
if(preg_match('/'.$template_php.'/i',$input)){ # matches only because $ means end of input
    print('$ '."matched<p>");
}else{
    print('$ '."did not match<p>");
}
if(preg_match("/\$/i",$input)){ # no better: inside double quotes PHP turns \$ into $ before the engine sees it
    print('$ '."matched<p>");
}else{
    print('$ '."did not match<p>");
}
$template_php="$";
$template_php=quotemeta($template_php); # gives \$ so the dollar sign is matched as a literal character
if(preg_match('/'.$template_php.'/i',$input)){ # the i flag makes the match test non case sensitive
    print('$ '."matched<p>");
}else{
    print('$ '."did not match<p>");
}
?>

preg_match() and preg_match_all() relate to the first single find and the global (all) finds through a searched string.
Above, PHP uses a string conditioning function, quotemeta(), to escape the common meta characters passed to the template in a string.
Other string functions exist with varied levels of accuracy and specialisation, but their main jobs are to
operate on character case (upper/lower), ignore case throughout, replace a substring by regular expression, or test for
matches.
preg_match_all(), used in the next script example, takes four arguments in its call. The last is a changeable flag that
selects what to acquire and how to arrange (PREG_PATTERN_ORDER) the results in the multi-dimensional results array.
One of the tools PHP has is the return, in the second array, of any parenthesised subpatterns.
The script below uses the range operator in square brackets in the template, and also the pipe
operator with grouping for an either-or choice.
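
For the PERL compatible functions PHP also provides preg_quote(), which escapes the regular expression meta characters and, given a second argument, the delimiter as well. A minimal sketch, assuming a made-up price string as the text to find:

```php
<?php
$input  = 'Total: $4.50 (approx.)'; // made-up sample text
$needle = '$4.50';                  // contains two meta characters: $ and .

// preg_quote() escapes the meta characters; the second argument names the delimiter.
$quoted = preg_quote($needle, '/');
print($quoted . "\n"); // prints \$4\.50

$withQuote    = preg_match('/' . $quoted . '/', $input); // literal text is found
$withoutQuote = preg_match('/' . $needle . '/', $input); // $ is read as end of input
print($withQuote . ' ' . $withoutQuote . "\n"); // prints: 1 0
```

The unquoted template fails because nothing can follow the end of input, which is the same trap the dollar sign sets in the scripts above.
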

<?php
######################## unexpigated.php ##########################
# ereg() and eregi() were removed in PHP 7; preg_match() with the i flag gives the same non case sensitive tests
$input="$ is a standard economic unit symbol"; # contains a meta character
$template_php="f-g"; # [f-g] is the equivalent of (f|g) ## the brackets are added after quotemeta(), which would escape them
$template_php=quotemeta($template_php);
if(preg_match('/['.$template_php.']/i',$input)){
    print('[f-g] '."matched<p>");
}else{
    print('[f-g] '."did not match<p>"); # neither f nor g is in the string
}
/* as (f|g) */
$template_php="f|g"; ## the parentheses are added after quotemeta(), which would escape them
$template_php=quotemeta($template_php);
if(preg_match('/('.$template_php.')/i',$input)){
    print('(f|g) '."matched<p>");
}else{
    print('(f|g) '."did not match<p>");
}
/* now as (d|e) with letters known to be present in the string */
$template_php="d|e";
$template_php=quotemeta($template_php);
if(preg_match('/('.$template_php.')/i',$input)){
    print('(d|e) '."matched<p>");
}else{
    print('(d|e) '."did not match<p>");
}
/* now as [d-e] */
$template_php="d-e"; # is the equivalent of (d|e)
$template_php=quotemeta($template_php);
if(preg_match('/['.$template_php.']/i',$input)){
    print('[d-e] '."matched<p>");
}else{
    print('[d-e] '."did not match<p>");
}
/* now with a letter known to be present in the string */
$template_php="a";
$template_php=quotemeta($template_php);
$ar=array();
if(preg_match('/'.$template_php.'+/i',$input)){
    preg_match_all(('/'.$template_php.'/'),$input,$ar,PREG_PATTERN_ORDER);
    print('a '."matched and the first match is ".'$ar[0][0]: '.$ar[0][0]."<br>and there are ".count($ar[0])." matches");
}else{
    print('a '."did not match<p>");
}
/* now a little more complex */
$template_php='a';
$template_php=quotemeta($template_php);
$ar=array();
if(preg_match('/.'.$template_php.'.+/i',$input)){
    preg_match_all(('/(.{1}'.$template_php.'.{1})/'),$input,$ar,PREG_PATTERN_ORDER);
    $cast=count($ar[0]);
    print('<p><p>a '."matched ".$cast." times and the input string is<p>".$input
        .'<br>preg_match_all((\'/(.{1}\'.$template_php.\'.{1})/\'),$input,$ar,PREG_PATTERN_ORDER)<br>');
    for($ot=0;$ot<$cast;$ot++){
        print('$ar[0]['.$ot.'] is:'." ".$ar[0][$ot].':: first parenthesised sub-pattern match->$ar[1]['.$ot.'] is:'." ".$ar[1][$ot].'<br>');
    }
}else{
    print('a '."did not match<p>");
}
# $ is a standard economic unit symbol
# $ar[0][0] is: " a " $ar[1][0] is: " a " (a with a space each side)
# $ar[0][1] is: tan $ar[1][1] is: tan
# $ar[0][2] is: dar $ar[1][2] is: dar
#
# the $ar[0][] set are the extracted matches; the $ar[1][] set are the first parenthesised matches in the template
?>
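
The arrangement flag is easier to see with two sub-patterns. The following minimal sketch reuses the input string from the script above with a slightly different, illustrative template, /(.)a(.)/, and compares PREG_PATTERN_ORDER with the alternative flag PREG_SET_ORDER:

```php
<?php
$input = '$ is a standard economic unit symbol';

// PREG_PATTERN_ORDER: $byPattern[0] holds every full match,
// $byPattern[1] every first sub-pattern, $byPattern[2] every second sub-pattern.
preg_match_all('/(.)a(.)/', $input, $byPattern, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: $bySet[n] holds the full match and sub-patterns of the n-th find.
preg_match_all('/(.)a(.)/', $input, $bySet, PREG_SET_ORDER);

print($byPattern[0][1] . ' ' . $byPattern[1][1] . ' ' . $byPattern[2][1] . "\n"); // prints: tan t n
print($bySet[1][0] . ' ' . $bySet[1][1] . ' ' . $bySet[1][2] . "\n");             // prints: tan t n
```

Both calls find the same three matches; only the indexing of the results array changes, so the choice depends on whether the script loops over finds or over sub-patterns.
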

One final operator to know here in some detail is the greed operator, ? (also known as the lazy or non-greedy
modifier). Placed after a quantifying operator such as + or * it limits the number of characters a match will contain,
counterbalancing that quantifier. It operates as in the problem below: if the
template is "a.*a", matching from an "a" across any number of unspecified characters, the match will
reach the last "a" in the string whether or not other "a" characters lie between. With ? added, the match stops at the first "a" it can.

<?php
######################## shlurp.php ##########################
$input="$ is a standard economic unit symbol"; # prints match of ----a standa---- as far as it can reach
if(preg_match("/a.*a+/",$input)){
    preg_match_all("/a.*a+/",$input,$ar,PREG_PATTERN_ORDER);
    $cnt=count($ar[0]);
    $up=0;
    print($input.'<br>');
    while($up<$cnt):
        print("match is: ".$ar[0][$up].'<br>exp: a.*a+<br>');
        $up++;
    endwhile;
}
#
print('<p>now with the greed operator<p>');
# prints match of ----a sta---- the shortest possible
# (to match case insensitively use "/a.*?a+/i", the i flag in the expression for the PERL compatibles)
if(preg_match("/a.*?a+/",$input)){
    preg_match_all("/a.*?a+/",$input,$ar,PREG_PATTERN_ORDER);
    $cnt=count($ar[0]);
    $up=0;
    print($input.'<br>');
    while($up<$cnt):
        print("match is: ".$ar[0][$up].'<br>exp: a.*?a+<br>');
        $up++;
    endwhile;
}
?>
