
(Do not be afraid of)

PHP Compiler Internals


Sebastian Bergmann
June 13th 2009
Who I Am

• Sebastian Bergmann
• Involved in the PHP project since 2000
• Creator of PHPUnit
• Co-Founder and Principal Consultant with thePHP.cc
Under PHP's Hood

Extensions (date, dom, gd, json, mysql, pcre, pdo, reflection, session, standard, …)

PHP Core
  Request Management
  File and Network Operations

Zend Engine
  Compilation and Execution
  Memory and Resource Allocation

Server API (SAPI) (mod_php, FastCGI, CLI, ...)

This slide contains material by Sara Golemon


How PHP executes code
• Lexical Analysis
  Converts the source from a sequence of characters into a sequence of tokens
• Syntax Analysis
  Analyzes a sequence of tokens to determine their grammatical structure
• Bytecode Generation
  Generates bytecode based on the information gathered by analyzing the source code
• Bytecode Execution
Lexical Analysis
Scan a sequence of characters
<?php
if (TRUE) {
    print '*';
}
?>
Lexical Analysis
Scan a sequence of characters
<?php           T_OPEN_TAG
if (TRUE) {     T_IF T_WHITESPACE ( T_STRING ) T_WHITESPACE { T_WHITESPACE
    print '*';  T_PRINT T_WHITESPACE T_CONSTANT_ENCAPSED_STRING ; T_WHITESPACE
}               } T_WHITESPACE
?>              T_CLOSE_TAG
Lexical Analysis
Scan a sequence of characters
T_OPEN_TAG <?php
T_IF if
T_WHITESPACE
(
T_STRING TRUE
)
T_WHITESPACE
{
T_WHITESPACE
T_PRINT print
T_WHITESPACE
T_CONSTANT_ENCAPSED_STRING '*'
;
T_WHITESPACE
}
T_WHITESPACE
T_CLOSE_TAG ?>
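The token stream above can be reproduced in miniature with a hand-written scanner. The following C sketch is not PHP's scanner: the `TOK_*` names, the `struct token` layout and `next_token()` are hypothetical, and it only covers the handful of tokens used in the example. As a simplification it matches keywords case-sensitively and ignores the open and close tags.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; the names echo PHP's tokens, the values do not. */
enum token_kind {
    TOK_EOF, TOK_IF, TOK_PRINT, TOK_STRING, TOK_WHITESPACE,
    TOK_CONSTANT_ENCAPSED_STRING, TOK_CHAR
};

struct token {
    enum token_kind kind;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
};

/* Scan one token starting at *cursor and advance the cursor past it. */
static struct token next_token(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);

    const char *p = *cursor;
    struct token t = { TOK_EOF, p, 0 };

    if (*p == '\0') {
        return t;  /* end of input: the cursor is left where it is */
    }

    if (isspace((unsigned char)*p)) {
        /* A run of whitespace becomes one token. */
        while (isspace((unsigned char)*p)) p++;
        t.kind = TOK_WHITESPACE;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* Read the whole word first, then check whether it is a keyword. */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        size_t n = (size_t)(p - t.start);
        if (n == 2 && strncmp(t.start, "if", 2) == 0) {
            t.kind = TOK_IF;
        } else if (n == 5 && strncmp(t.start, "print", 5) == 0) {
            t.kind = TOK_PRINT;
        } else {
            t.kind = TOK_STRING;
        }
    } else if (*p == '\'') {
        /* Single-quoted string: consume up to and including the closing quote. */
        p++;
        while (*p != '\0' && *p != '\'') p++;
        if (*p == '\'') p++;
        t.kind = TOK_CONSTANT_ENCAPSED_STRING;
    } else {
        /* Any other character is a one-character token such as ( or {. */
        p++;
        t.kind = TOK_CHAR;
    }

    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Calling `next_token()` repeatedly on `if (TRUE) { print '*'; }` yields `TOK_IF`, `TOK_WHITESPACE`, a `TOK_CHAR` for `(`, a `TOK_STRING` for `TRUE`, and so on until `TOK_EOF`.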
Lexical Analysis
Scanner Generators

• You do not want to write a scanner by hand
• At least when the code for the scanner should be efficient and maintainable
• Tools such as flex or re2c generate the code for a scanner from a set of rules

<ST_IN_SCRIPTING>"if" {
    return T_IF;
}
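The `<ST_IN_SCRIPTING>` prefix in the rule above is a start condition: the rule only applies while the scanner is in that state. A minimal C sketch of the idea follows, assuming just two states and a simplified open tag. The `TOK_*` constants and `scan()` are hypothetical names, and this is hand-written code, not what a scanner generator would emit.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Two scanner states: outside PHP code and inside PHP code. */
enum scanner_state { ST_INITIAL, ST_IN_SCRIPTING };

/* Hypothetical token numbers for this sketch only. */
enum { TOK_END = 0, TOK_INLINE_HTML, TOK_OPEN_TAG, TOK_IF, TOK_OTHER };

/* Return the next token; which rules apply depends on *state. */
static int scan(enum scanner_state *state, const char **cursor)
{
    assert(state != NULL && cursor != NULL && *cursor != NULL);

    const char *p = *cursor;
    int token;

    if (*p == '\0') {
        return TOK_END;
    }

    if (*state == ST_INITIAL) {
        if (strncmp(p, "<?php", 5) == 0) {
            /* The open tag switches the scanner into scripting mode. */
            p += 5;
            *state = ST_IN_SCRIPTING;
            token = TOK_OPEN_TAG;
        } else {
            /* Everything up to the next open tag is inline HTML. */
            const char *tag = strstr(p, "<?php");
            p = tag ? tag : p + strlen(p);
            token = TOK_INLINE_HTML;
        }
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* In scripting mode a complete word "if" is the keyword. */
        const char *start = p;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        token = (p - start == 2 && strncmp(start, "if", 2) == 0)
                    ? TOK_IF : TOK_OTHER;
    } else {
        p++;
        token = TOK_OTHER;
    }

    *cursor = p;
    return token;
}
```

Scanning `if <?php if` starting in `ST_INITIAL` shows the effect: the first `if` is part of a `TOK_INLINE_HTML` token, while the second one, read after the open tag, is returned as `TOK_IF`.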
Lexical Analysis
PHP Tokens

T_ABSTRACT       T_CONCAT_EQUAL              T_ELSE                     T_FUNCTION
T_AND_EQUAL      T_CONST                     T_ELSEIF                   T_FUNC_C
T_ARRAY          T_CONSTANT_ENCAPSED_STRING  T_EMPTY                    T_GLOBAL
T_ARRAY_CAST     T_CONTINUE                  T_ENCAPSED_AND_WHITESPACE  T_GOTO
T_AS             T_CURLY_OPEN                T_ENDDECLARE               T_HALT_COMPILER
T_BAD_CHARACTER  T_DEC                       T_ENDFOR                   T_IF
T_BOOLEAN_AND    T_DECLARE                   T_ENDFOREACH               T_IMPLEMENTS
T_BOOLEAN_OR     T_DEFAULT                   T_ENDIF                    T_INC
T_BOOL_CAST      T_DIR                       T_ENDSWITCH                T_INCLUDE
T_BREAK          T_DIV_EQUAL                 T_ENDWHILE                 T_INCLUDE_ONCE
T_CASE           T_DNUMBER                   T_END_HEREDOC              T_INLINE_HTML
T_CATCH          T_DOC_COMMENT               T_EVAL                     T_INSTANCEOF
T_CHARACTER      T_DO                        T_EXIT                     T_INT_CAST
T_CLASS          T_DOLLAR_OPEN_CURLY_BRACES  T_EXTENDS                  T_INTERFACE
T_CLASS_C        T_DOUBLE_ARROW              T_FILE                     T_ISSET
T_CLONE          T_DOUBLE_CAST               T_FINAL                    T_IS_EQUAL
T_CLOSE_TAG      T_DOUBLE_COLON              T_FOR                      T_IS_GREATER_OR_EQUAL
T_COMMENT        T_ECHO                      T_FOREACH                  T_IS_IDENTICAL


Lexical Analysis
PHP Tokens

T_IS_NOT_EQUAL         T_OBJECT_CAST           T_SR_EQUAL
T_IS_NOT_IDENTICAL     T_OBJECT_OPERATOR       T_START_HEREDOC
T_IS_SMALLER_OR_EQUAL  T_OLD_FUNCTION          T_STATIC
T_LINE                 T_OPEN_TAG              T_STRING
T_LIST                 T_OPEN_TAG_WITH_ECHO    T_STRING_CAST
T_LNUMBER              T_OR_EQUAL              T_STRING_VARNAME
T_LOGICAL_AND          T_PAAMAYIM_NEKUDOTAYIM  T_SWITCH
T_LOGICAL_OR           T_PLUS_EQUAL            T_THROW
T_LOGICAL_XOR          T_PRINT                 T_TRY
T_METHOD_C             T_PRIVATE               T_UNSET
T_MINUS_EQUAL          T_PUBLIC                T_UNSET_CAST
T_ML_COMMENT           T_PROTECTED             T_USE
T_MOD_EQUAL            T_REQUIRE               T_VAR
T_MUL_EQUAL            T_REQUIRE_ONCE          T_VARIABLE
T_NAMESPACE            T_RETURN                T_WHILE
T_NS_C                 T_SL                    T_WHITESPACE
T_NEW                  T_SL_EQUAL              T_XOR_EQUAL
T_NUM_STRING           T_SR
Syntax Analysis
Analyze a sequence of tokens
Syntax Analysis
Parser Generators

• You do not want to write a parser by hand
• At least when the code for the parser should be efficient and maintainable
• Tools such as bison or lemon generate the code for a parser from a set of rules

T_IF '(' expr ')' { ... }
    statement { ... }
    elseif_list else_single { ... }
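To make the rule above concrete, here is a small C sketch that checks whether a token sequence has the shape `T_IF '(' expr ')' statement`. It uses hand-written recursive descent, which is a different technique from the table-driven parsers that bison or lemon generate. The grammar is cut down to `if`, `print` and blocks, whitespace tokens are assumed to be filtered out already, and the `enum tok` values and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token numbers; T_END marks the end of the token array. */
enum tok {
    T_END, T_IF, T_PRINT, T_STRING, T_CONSTANT_ENCAPSED_STRING,
    T_LPAREN, T_RPAREN, T_LBRACE, T_RBRACE, T_SEMICOLON
};

struct parser {
    const enum tok *tokens;  /* T_END-terminated input */
    size_t pos;              /* index of the next unread token */
};

/* Consume the next token if it is the expected one. */
static int accept_tok(struct parser *p, enum tok expected)
{
    if (p->tokens[p->pos] == expected) {
        p->pos++;
        return 1;
    }
    return 0;
}

/* expr: T_STRING | T_CONSTANT_ENCAPSED_STRING */
static int parse_expr(struct parser *p)
{
    return accept_tok(p, T_STRING) ||
           accept_tok(p, T_CONSTANT_ENCAPSED_STRING);
}

/* statement: T_IF '(' expr ')' statement
 *          | T_PRINT expr ';'
 *          | '{' statement* '}'
 */
static int parse_statement(struct parser *p)
{
    if (accept_tok(p, T_IF)) {
        return accept_tok(p, T_LPAREN) && parse_expr(p) &&
               accept_tok(p, T_RPAREN) && parse_statement(p);
    }
    if (accept_tok(p, T_PRINT)) {
        return parse_expr(p) && accept_tok(p, T_SEMICOLON);
    }
    if (accept_tok(p, T_LBRACE)) {
        while (!accept_tok(p, T_RBRACE)) {
            if (!parse_statement(p)) {
                return 0;  /* also reached when the input ends early */
            }
        }
        return 1;
    }
    return 0;
}

/* Returns 1 if the whole input is exactly one valid statement. */
static int parse_program(const enum tok *tokens)
{
    assert(tokens != NULL);
    struct parser p = { tokens, 0 };
    return parse_statement(&p) && accept_tok(&p, T_END);
}
```

The tokens for `if (TRUE) { print '*'; }` are accepted by `parse_program()`, while a sequence that omits the opening parenthesis is rejected. A real parser would also run an action, such as emitting bytecode, at each `{ ... }` position of the rule.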
PHP Bytecode
Disassembling with vld
1 <?php
2 if (TRUE) {
3     print '*';
4 }
5 ?>
sb@thinkpad ~ % php -dextension=vld.so -dvld.active=1 -dvld.execute=0 if.php
filename:       /home/sb/if.php
function name:  (null)
number of ops:  8
compiled vars:  none
line     #  op          fetch  ext  return  operands
-------------------------------------------------------------------------------
   2     0  EXT_STMT
         1  JMPZ                            true, ->6
   3     2  EXT_STMT
         3  PRINT                   ~0      '%2A'
         4  FREE                            ~0
   4     5  JMP                             ->6
   6     6  EXT_STMT
         7  RETURN                          1
PHP Bytecode
Disassembling with bytekit-cli
1 <?php
2 if (TRUE) {
3     print '*';
4 }
5 ?>
sb@thinkpad ~ % bytekit if.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /home/sb/if.php
Function:           main
Number of oplines:  8

line  #  opcode    result  operands
-----------------------------------------------------------------------------
   2  0  EXT_STMT
      1  JMPZ              true, ->6

   3  2  EXT_STMT
      3  PRINT     ~0      '*'
      4  FREE              ~0
   4  5  JMP               ->6

   6  6  EXT_STMT
      7  RETURN            1
PHP Bytecode
Bytecode visualization with bytekit-cli
1 <?php
2 if (TRUE) {
3     print '*';
4 }
5 ?>
sb@thinkpad ~ % bytekit --graph /tmp --format svg if.php
PHP Bytecode
Disassembling with bytekit-cli
1 <?php
2 $a = 1;
3 $b = 2;
4 print $a + $b;
5 ?>
sb@thinkpad ~ % bytekit add.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /home/sb/add.php
Function:           main
Number of oplines:  10
Compiled variables: !0 = $a, !1 = $b

line  #  opcode    result  operands
-----------------------------------------------------------------------------
   2  0  EXT_STMT
      1  ASSIGN            !0, 1
   3  2  EXT_STMT
      3  ASSIGN            !1, 2
   4  4  EXT_STMT
      5  ADD       ~2      !0, !1
      6  PRINT     ~3      ~2
      7  FREE              ~3
   6  8  EXT_STMT
      9  RETURN            1
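The listing above can be read as a program for a small virtual machine: `!n` operands are compiled variables, `~n` operands are temporaries, and each opline reads its operands and may write a result. The C sketch below executes the ten oplines of add.php under those assumptions. It is an illustration only, not the Zend Engine's executor: the struct layouts, the integer-only values, the output buffer and the names `execute()` and `add_php` are all hypothetical, and there are no bounds checks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum opcode { OP_EXT_STMT, OP_ASSIGN, OP_ADD, OP_PRINT, OP_FREE, OP_RETURN };
enum operand_kind { UNUSED, CONSTANT, COMPILED_VAR, TEMP_VAR };

struct operand { enum operand_kind kind; long value; };
struct opline  { enum opcode opcode; struct operand result, op1, op2; };

struct vm {
    long cv[4];    /* compiled variables: !0, !1, ... */
    long tmp[8];   /* temporaries: ~0, ~1, ... */
    char out[64];  /* everything PRINT has written so far */
};

#define NO_OPERAND { UNUSED, 0 }
#define LITERAL(v) { CONSTANT, (v) }
#define CV(i)      { COMPILED_VAR, (i) }
#define TMP(i)     { TEMP_VAR, (i) }

/* The oplines of add.php, transcribed from the listing. */
static const struct opline add_php[] = {
    { OP_EXT_STMT, NO_OPERAND, NO_OPERAND, NO_OPERAND },
    { OP_ASSIGN,   NO_OPERAND, CV(0),      LITERAL(1) },
    { OP_EXT_STMT, NO_OPERAND, NO_OPERAND, NO_OPERAND },
    { OP_ASSIGN,   NO_OPERAND, CV(1),      LITERAL(2) },
    { OP_EXT_STMT, NO_OPERAND, NO_OPERAND, NO_OPERAND },
    { OP_ADD,      TMP(2),     CV(0),      CV(1)      },
    { OP_PRINT,    TMP(3),     TMP(2),     NO_OPERAND },
    { OP_FREE,     NO_OPERAND, TMP(3),     NO_OPERAND },
    { OP_EXT_STMT, NO_OPERAND, NO_OPERAND, NO_OPERAND },
    { OP_RETURN,   NO_OPERAND, LITERAL(1), NO_OPERAND },
};

/* Fetch the current value of an operand. */
static long read_operand(const struct vm *vm, struct operand o)
{
    switch (o.kind) {
    case CONSTANT:     return o.value;
    case COMPILED_VAR: return vm->cv[o.value];
    case TEMP_VAR:     return vm->tmp[o.value];
    default:           return 0;
    }
}

/* Run oplines until RETURN and give back the returned value. */
static long execute(struct vm *vm, const struct opline *ops)
{
    assert(vm != NULL && ops != NULL);
    for (const struct opline *op = ops; ; op++) {
        switch (op->opcode) {
        case OP_ASSIGN:
            vm->cv[op->op1.value] = read_operand(vm, op->op2);
            break;
        case OP_ADD:
            vm->tmp[op->result.value] =
                read_operand(vm, op->op1) + read_operand(vm, op->op2);
            break;
        case OP_PRINT: {
            size_t used = strlen(vm->out);
            snprintf(vm->out + used, sizeof vm->out - used, "%ld",
                     read_operand(vm, op->op1));
            vm->tmp[op->result.value] = 1;  /* print evaluates to 1 */
            break;
        }
        case OP_RETURN:
            return read_operand(vm, op->op1);
        case OP_EXT_STMT:  /* statement marker, nothing to do here */
        case OP_FREE:      /* nothing to release for plain integers */
            break;
        }
    }
}
```

Running `execute()` on `add_php` with a zero-initialized `struct vm` leaves `3` in the output buffer and returns 1, matching the `RETURN 1` opline.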
PHP Bytecode
List of Opcodes

NOP               IS_NOT_EQUAL         POST_INC     ADD_VAR             UNSET_DIM
ADD               IS_SMALLER           POST_DEC     BEGIN_SILENCE       UNSET_OBJ
SUB               IS_SMALLER_OR_EQUAL  ASSIGN       END_SILENCE         FE_RESET
MUL               CAST                 ASSIGN_REF   INIT_FCALL_BY_NAME  FE_FETCH
DIV               QM_ASSIGN            ECHO         DO_FCALL            EXIT
MOD               ASSIGN_ADD           PRINT        DO_FCALL_BY_NAME    FETCH_R
SL                ASSIGN_SUB           JMPZ         RETURN              FETCH_DIM_R
SR                ASSIGN_MUL           JMPNZ        RECV                FETCH_OBJ_R
CONCAT            ASSIGN_DIV           JMPZNZ       RECV_INIT           FETCH_W
BW_OR             ASSIGN_MOD           JMPZ_EX      SEND_VAL            FETCH_DIM_W
BW_AND            ASSIGN_SL            JMPNZ_EX     SEND_VAR            FETCH_OBJ_W
BW_XOR            ASSIGN_SR            CASE         SEND_REF            FETCH_RW
BW_NOT            ASSIGN_CONCAT        SWITCH_FREE  NEW                 FETCH_DIM_RW
BOOL_NOT          ASSIGN_BW_OR         BRK          FREE                FETCH_OBJ_RW
BOOL_XOR          ASSIGN_BW_AND        BOOL         INIT_ARRAY          FETCH_IS
IS_IDENTICAL      ASSIGN_BW_XOR        INIT_STRING  ADD_ARRAY_ELEMENT   FETCH_DIM_IS
IS_NOT_IDENTICAL  PRE_INC              ADD_CHAR     INCLUDE_OR_EVAL     FETCH_OBJ_IS
IS_EQUAL          PRE_DEC              ADD_STRING   UNSET_VAR           FETCH_FUNC_ARG


PHP Bytecode
List of Opcodes

FETCH_DIM_FUNC_ARG  INIT_STATIC_METHOD_CALL
FETCH_OBJ_FUNC_ARG  ISSET_ISEMPTY_VAR
FETCH_UNSET         ISSET_ISEMPTY_DIM_OBJ
FETCH_DIM_UNSET     PRE_INC_OBJ
FETCH_OBJ_UNSET     PRE_DEC_OBJ
FETCH_DIM_TMP_VAR   POST_INC_OBJ
FETCH_CONSTANT      POST_DEC_OBJ
EXT_STMT            ASSIGN_OBJ
EXT_FCALL_BEGIN     INSTANCEOF
EXT_FCALL_END       DECLARE_CLASS
EXT_NOP             DECLARE_INHERITED_CLASS
TICKS               DECLARE_FUNCTION
SEND_VAR_NO_REF     RAISE_ABSTRACT_ERROR
CATCH               ADD_INTERFACE
THROW               VERIFY_ABSTRACT_CLASS
FETCH_CLASS         ASSIGN_DIM
CLONE               ISSET_ISEMPTY_PROP_OBJ
INIT_METHOD_CALL    HANDLE_EXCEPTION
Extending the Compiler
Test First!
Zend/tests/unless.phpt
--TEST--
unless statement
--FILE--
<?php
unless (FALSE) {
    print 'unless FALSE is TRUE, this is printed';
}

unless (TRUE) {
    print 'unless TRUE is TRUE, this is printed';
}
?>
--EXPECT--
unless FALSE is TRUE, this is printed
Extending the Compiler
• Add token for unless to the scanner
• Add rule for unless to the parser
• Generate bytecode for unless in the compiler
• Add token for unless to ext/tokenizer
Add unless scanner token
Zend/zend_language_scanner.l
<ST_IN_SCRIPTING>"if" {
    return T_IF;
}

<ST_IN_SCRIPTING>"unless" {
    return T_UNLESS;
}

<ST_IN_SCRIPTING>"elseif" {
    return T_ELSEIF;
}

<ST_IN_SCRIPTING>"endif" {
    return T_ENDIF;
}

<ST_IN_SCRIPTING>"else" {
    return T_ELSE;
}
Add unless parser rule
Zend/zend_language_parser.y
%token T_NAMESPACE
%token T_NS_C
%token T_DIR
%token T_NS_SEPARATOR
%token T_UNLESS
.
.
unticked_statement:
        '{' inner_statement_list '}'
    |   T_IF '(' expr ')' {
.
.
    |   T_UNLESS '(' expr ')' {
            zend_do_unless_cond(&$3, &$4 TSRMLS_CC);
        } statement {
            zend_do_if_after_statement(&$4, 1 TSRMLS_CC);
        } {
            zend_do_if_end(TSRMLS_C);
        }
.
.
How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
}

typedef struct _znode {
    int op_type;
    union {
        zval constant;

        zend_uint var;
        zend_uint opline_num;
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var;
            zend_uint type;
        } EA;
    } u;
} znode;

zend_do_if_cond() is called when an if statement is compiled


How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);
}

struct _zend_op {
    opcode_handler_t handler;
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode;
};

Allocate a new opline in the current oparray


How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;

Set the opcode of the new opline to JMPZ (jump if zero)


How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;

Set the first operand of the new opline to the if condition


How if is compiled
Zend/zend_compile.c
void zend_do_if_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int if_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = if_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}

Perform bookkeeping tasks such as marking the second operand of the new
opline as unused or incrementing the backpatching counter for the
current oparray
Add unless to compiler
Zend/zend_compile.c
void zend_do_unless_cond(const znode *cond, znode *closing_bracket_token TSRMLS_DC)
{
    int unless_cond_op_number = get_next_op_number(CG(active_op_array));
    zend_op *opline = get_next_op(CG(active_op_array) TSRMLS_CC);

    opline->opcode = ZEND_JMPNZ;
    opline->op1 = *cond;
    closing_bracket_token->u.opline_num = unless_cond_op_number;
    SET_UNUSED(opline->op2);
    INC_BPC(CG(active_op_array));
}

All we have to do to generate code for the unless statement, compared to
generating code for the if statement, is to use the JMPNZ (jump if not zero)
opcode instead of the JMPZ (jump if zero) opcode
Add unless to compiler
The generated bytecode
1 <?php
2 unless (FALSE) {
3     print '*';
4 }
5 ?>
sb@thinkpad ~ % bytekit unless.php
bytekit-cli 1.0.0 by Sebastian Bergmann.

Filename:           /home/sb/unless.php
Function:           main
Number of oplines:  8

line  #  opcode    result  operands
-----------------------------------------------------------------------------
   2  0  EXT_STMT
      1  JMPNZ             true, ->6

   3  2  EXT_STMT
      3  PRINT     ~0      '*'
      4  FREE              ~0
   4  5  JMP               ->6

   6  6  EXT_STMT
      7  RETURN            1
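How the two conditional jumps behave at run time can be sketched with a very small interpreter loop. In the C code below, JMPZ jumps when its condition is false and JMPNZ jumps when it is true, which is what turns an `if` body into an `unless` body. The sketch makes several assumptions: the condition is stored directly in the opline as a boolean, PRINT is only counted rather than performed, and `struct opline` and `run()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum opcode { OP_JMPZ, OP_JMPNZ, OP_JMP, OP_PRINT, OP_RETURN };

struct opline {
    enum opcode opcode;
    bool condition;  /* only used by JMPZ and JMPNZ */
    int target;      /* only used by the jump opcodes */
};

/* Execute oplines until RETURN; the result is the number of PRINT
 * oplines that were reached. */
static int run(const struct opline *ops)
{
    assert(ops != NULL);

    int printed = 0;
    int pc = 0;  /* number of the opline to execute next */

    for (;;) {
        const struct opline *op = &ops[pc];
        switch (op->opcode) {
        case OP_JMPZ:   /* jump if the condition is false */
            pc = op->condition ? pc + 1 : op->target;
            break;
        case OP_JMPNZ:  /* jump if the condition is true */
            pc = op->condition ? op->target : pc + 1;
            break;
        case OP_JMP:    /* unconditional jump */
            pc = op->target;
            break;
        case OP_PRINT:
            printed++;
            pc++;
            break;
        case OP_RETURN:
            return printed;
        }
    }
}
```

With the opline sequence conditional jump, PRINT, JMP, RETURN, a JMPZ on a true condition executes the body once, a JMPNZ on a true condition skips it, and a JMPNZ on a false condition executes it. This matches the expected output of the unless test, where only the `unless (FALSE)` body prints.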
Run the test
sb@thinkpad php-5.3-unless % make test TESTS=Zend/tests/unless.phpt

Build complete.
Don't forget to run 'make test'.

=====================================================================
PHP         : /usr/local/src/php/php-5.3-unless/sapi/cli/php
PHP_SAPI    : cli
PHP_VERSION : 5.3.0RC3-dev
ZEND_VERSION: 2.3.0
PHP_OS      : Linux 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux
INI actual  : /usr/local/src/php/php-5.3-unless/tmp-php.ini
More .INIs  :
CWD         : /usr/local/src/php/php-5.3-unless
Extra dirs  :
VALGRIND    : Not used
=====================================================================
Running selected tests.
PASS unless statement [Zend/tests/unless.phpt]
=====================================================================
Number of tests :    1                 1
Tests skipped   :    0 (  0.0%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Expected fail   :    0 (  0.0%) (  0.0%)
Tests passed    :    1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    0 seconds
=====================================================================
Add unless to ext/tokenizer
ext/tokenizer/tokenizer_data.c

sb@thinkpad tokenizer % ./tokenizer_data_gen.sh
Wrote tokenizer_data.c
The End

Thank you for your interest!

These slides will be linked soon from
http://sebastian-bergmann.de/

You can vote for this talk on
http://joind.in/582
Acknowledgements
• Thomas Lee, whose Python Language Internals presentation at OSDC 2008 inspired this presentation
• Stefan Esser for creating the Bytekit extension that provides PHP bytecode access and analysis features
• Derick Rethans, David Soria Parra, and Scott MacVicar for reviewing these slides
References

• http://www.php.net/manual/en/tokens.php
• http://www.zapt.info/opcodes.html
• Sara Golemon: "Extending and Embedding PHP"
• http://derickrethans.nl/vld.php
• http://bytekit.org/
• http://github.com/sebastianbergmann/bytekit-cli/
License
• This presentation material is published under the Attribution-Share Alike 3.0 Unported license.
• You are free:
  ✔ to Share – to copy, distribute and transmit the work.
  ✔ to Remix – to adapt the work.
• Under the following conditions:
  ● Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
  ● Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.
