Pin: Intel's Dynamic Binary Instrumentation Engine Pin Tutorial

Software & Services Group
Intel Corporation
Presented By:

Tevi Devor

CGO ISPASS 2012

2
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS
DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF
ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and
reflect the approximate performance of Intel products as measured by those tests. Any difference in
system hardware or software design or configuration may affect actual performance. Buyers should
consult other sources of information to evaluate the performance of systems or components they are
considering purchasing. For more information on performance tests and on the performance of Intel
products, reference www.intel.com/software/products.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright 2010. Intel Corporation.

3
Intel compilers, associated libraries and associated development tools may include or utilize options
that optimize for instruction sets that are available in both Intel and non-Intel microprocessors (for
example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition,
certain compiler options for Intel compilers, including some that are not specific to Intel micro-
architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler
options, including the instruction sets and specific microprocessors they implicate, please refer to the
"Intel Compiler User and Reference Guides" under "Compiler Options." Many library routines that are
part of Intel compiler products are more highly optimized for Intel microprocessors than for other
microprocessors. While the compilers and libraries in Intel compiler products offer optimizations for
both Intel and Intel-compatible microprocessors, depending on the options you select, your code and
other factors, you likely will get extra performance on Intel microprocessors.
Intel compilers, associated libraries and associated development tools may or may not optimize to the
same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include Intel Streaming SIMD Extensions 2 (Intel SSE2),
Intel Streaming SIMD Extensions 3 (Intel SSE3), and Supplemental Streaming SIMD Extensions 3
(Intel SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability,
functionality, or effectiveness of any optimization on microprocessors not manufactured by
Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel
microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best
performance on Intel and non-Intel microprocessors, Intel recommends that you evaluate other
compilers and libraries to determine which best meet your requirements. We hope to win your business
by striving to offer the best performance of any compiler or library; please let us know if you find we do
not.
Notice revision #20110307
Optimization Notice

4
Agenda
Part1: Introduction to Pin

Part2: Larger Pin tools and writing efficient Pin
tools

Part3: Deeper into Pin API

Part4: Advanced Pin API

Part5: Performance #s

5
Part1: Introduction to Pin
Dynamic Binary Instrumentation
Pin Capabilities
Overview of how Pin works
Sample Pin Tools

6
What Does Pin Stand For?
Three Letter Acronyms (TLAs) @ Intel
26^3 possible TLAs
26^3 - 1 are in use at Intel
Only 1 is not approved for use at Intel
Guess which one:
Pin is not an acronym
Pin is based on the post link optimizer Spike
Pin is a small Spike
Spike is EOL
http://www.cgo.org/cgo2004/papers/01_82_luk_ck.pdf

7
Which one of these people is the Pin Performance Guru? tevi.devor@intel.com

8
Instrumentation
A technique that inserts code into a program to collect run-time information
Program analysis: performance profiling, error detection, capture & replay
Architectural study: processor and cache simulation, trace collection
Source-Code Instrumentation
Static Binary Instrumentation
Dynamic Binary Instrumentation
Instrument code just before it runs (Just In Time, JIT)
No need to recompile or re-link
Discover code at runtime
Handle dynamically-generated code
Attach to running processes
Pin is a dynamic binary instrumentation engine

9
Advantages of Pin Instrumentation
Programmable Instrumentation:
Provides a rich set of APIs to write your own instrumentation tools, called PinTools, in C, C++, or assembly
APIs are designed to maximize ease of use
They abstract away the underlying instruction set idiosyncrasies
Multiplatform:
OSs: Windows, Linux
Architectures: IA-32, Intel64
Robust:
Instruments real-life applications: databases, web browsers, ...
Instruments multithreaded applications
Supports signals and exceptions, self-modifying code
If you can run it, you can Pin it
Efficient:
Applies compiler optimizations on instrumentation code
Pin can be used to instrument all the user-level code in an application

10
Pin Instrumentation Capabilities
Use Pin APIs to write PinTools that:

Replace application functions with your own
Call the original application function from within your replacement
function

Fully examine any application instruction, insert a call to your
instrumenting function to be executed whenever that instruction
executes
Pass parameters to your instrumenting function from a large set of
supported parameters
Register values (including IP), Register values by reference (for modification)
Memory addresses read/written by the instruction
Full register context
.

Track function calls including syscalls and examine/change
arguments
Track application threads
Intercept signals
Instrument a process tree
Many other capabilities

If Pin doesn't have it, you don't want it

11
Usage of Pin at Intel
Profiling and analysis products
Intel Parallel Studio
Amplifier (Performance Analysis)
Lock and waits analysis
Concurrency analysis
Inspector (Correctness Analysis)
Threading error detection (data race and deadlock)
Memory error detection

Architectural research and enabling
Emulating new instructions (Intel SDE)
Trace generation
Branch prediction and cache modeling
[Diagram: product stack, top to bottom: GUI, Algorithm, PinTool, Pin]

12
Example Pin-tools
PinPlay: MT workload capture & deterministic replay
PinPoints: simulation region selection
pinLIT: trace generation
SDE: instruction emulation (new instructions)
CMP$IM: cache simulation
All of these are built on Pin
SDE: http://software.intel.com/en-us/articles/intel-software-development-emulator

CMP$IM: http://www-mount.ece.umn.edu/~jjyi/MoBS/2008/program/02A-Jaleel.pdf

PinPlay: Paper presented at CGO2010 http://www.cgo.org/cgo2010/program.html

13
Pin Usage Outside Intel
Popular and well supported
30,000+ downloads, 400+ citations

Free DownLoad
www.pintool.org
Includes: Detailed user manual, source code for 100s of
Pin tools

Pin User Group (PinHeads)
http://tech.groups.yahoo.com/group/pinheads/
Pin users and Pin developers answer questions

14
Pin Invocation (Windows)
pin.exe -t inscount.dll -- gzip.exe input.txt
Count 258743109
inscount.dll: PinTool that counts application instructions executed, prints Count at end

Launcher Process (PIN.EXE):
CreateProcess (gzip.exe, input.txt, suspended)
Inject Pin BootRoutine and Data (firstAppIp, inscount.dll) into application:
WriteProcessMemory(BootRoutine, BootData)
GetContext(&firstAppIp)
SetContext(BootRoutineIp)
Resume at BootRoutine

Application Process:
Boot Routine: Load PINVM.DLL
Start PINVM.DLL running (firstAppIp, inscount.dll)
Load inscount.dll and run its main()
Starting at first application IP, read a Trace from Application Code (Decoder)
Jit it, adding instrumentation code from inscount.dll
Encode the jitted trace into the Code Cache (Encoder)
Execute jitted code
Execution of Trace ends
Call into PINVM.DLL to Jit next trace, passing in the app IP of the Trace's target
Source Trace exit branch is modified to directly branch to Destination Trace

[Diagram: components of the Application Process: Application Code and Data, PINVM.DLL (System Call Dispatcher, Event Dispatcher, Thread Dispatcher, Decoder, Encoder), Code Cache, inscount.dll with PIN.LIB, and NTDLL.DLL, running above the Windows kernel]

15
All code in this presentation is covered
by the following:
/*BEGIN_LEGAL
Intel Open Source License

Copyright (c) 2002-2010 Intel Corporation. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer. Redistributions
in binary form must reproduce the above copyright notice, this list of
conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution. Neither the name of
the Intel Corporation nor the names of its contributors may be used to
endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE INTEL OR
ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
END_LEGAL */

16
Instruction Counting Tool (inscount.dll)
#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine (execution time routine):
// executes each time the jitted INStruction executes
void docount() { icount++; }

// Instrumentation routine (jitting time routine, Pin Callback):
// called during jitting of INS. INS is only valid inside this routine.
void Instruction(INS ins, void *v)
{
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)docount, IARG_END);
}

void Fini(INT32 code, void *v)
{ std::cerr << "Count " << icount << std::endl; }

int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
INS_AddInstrumentFunction(Instruction, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram(); // Never returns
return 0; }

Code inserted before each application instruction when docount is not inlined:
switch to pin stack
save registers
call docount
restore registers
switch to app stack

Jitted code when docount is inlined:
inc icount
sub $0xff, %edx
inc icount
cmp %esi, %edx
save eflags
inc icount
restore eflags
jle <L1>
inc icount
mov $0x1, %edi

For 10pts:
Which function is the analysis routine?
For 20pts:
Which function is executed more often?

17
Instrumentation vs. Analysis
Concepts borrowed from the ATOM tool:
Instrumentation routines define where instrumentation is inserted
e.g., before instruction
Occurs when an instruction is being jitted
Analysis routines define what to do when instrumentation is activated
e.g., increment counter
Occurs every time an instruction is executed
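
The split between the two kinds of routine can be sketched without Pin at all. The block below is a toy model, not Pin code: ToyIns, ToyJit, and RunToyLoop are hypothetical names invented for this illustration. The instrumentation callback runs once per instruction, when that instruction is first "jitted", and the analysis routine it selects runs on every execution.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an application instruction (not a Pin type).
struct ToyIns { std::uint64_t address; };

// Hypothetical miniature JIT: the instrumentation callback is asked once per
// instruction what to insert; the result is cached and reused afterwards.
struct ToyJit {
    using Analysis = std::function<void()>;
    using Instrument = std::function<Analysis(const ToyIns&)>;

    explicit ToyJit(Instrument cb) : instrument(std::move(cb)) {}

    void Execute(const ToyIns& ins) {
        auto it = codeCache.find(ins.address);
        if (it == codeCache.end()) {
            // "Jitting": call the instrumentation routine, cache its choice.
            it = codeCache.emplace(ins.address, instrument(ins)).first;
        }
        it->second();  // the analysis routine runs on every execution
    }

    Instrument instrument;
    std::unordered_map<std::uint64_t, Analysis> codeCache;
};

struct RunCounts {
    std::uint64_t instrumentationCalls;
    std::uint64_t analysisCalls;
};

// Executes a three-instruction loop body `iterations` times.
RunCounts RunToyLoop(int iterations) {
    RunCounts counts{0, 0};
    ToyJit jit([&counts](const ToyIns&) {
        ++counts.instrumentationCalls;                  // jitting time
        return [&counts]() { ++counts.analysisCalls; }; // execution time
    });
    const std::vector<ToyIns> loopBody = {{0x1000}, {0x1004}, {0x1008}};
    for (int i = 0; i < iterations; ++i)
        for (const ToyIns& ins : loopBody) jit.Execute(ins);
    return counts;
}
```

Calling RunToyLoop(1000) reports 3 instrumentation calls and 3000 analysis calls, which is why the analysis routine is the one worth optimizing.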

18
pin.exe -t inscount.dll -- gzip.exe input.txt
[Diagram, repeated from the Pin invocation slide: inside the Application Process, starting at the first app IP, PINVM.DLL reads a Trace from Application Code (Decoder), adds instrumentation from inscount.dll, and encodes the jitted trace into the Code Cache (Encoder)]

19
Trace
[Diagram: original code with BBL#1 through BBL#7 connected by fall-through (FT) and taken (TK) edges; the Trace built from it contains BBL#1, BBL#2, BBL#3, with early exits via stubs after BBL#1 and BBL#2 and a trace exit via stub after BBL#3]

Trace: A sequence of continuous instructions, with
one entry point
BBL: has one entry point and ends at first control
transfer instruction
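
The BBL definition can be made concrete with a small standalone sketch. ToyIns, SplitIntoBbls, and SampleTrace are hypothetical names for this illustration (none of them is a Pin API); the sample data is the four-instruction application trace shown on the jitted-code slide of this tutorial.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction record: text plus a control-transfer flag.
struct ToyIns {
    std::string text;
    bool isControlTransfer;
};

// Cuts a straight-line trace into BBLs: each BBL ends at the first
// control-transfer instruction after its entry point.
std::vector<std::vector<ToyIns>> SplitIntoBbls(const std::vector<ToyIns>& trace) {
    std::vector<std::vector<ToyIns>> bbls;
    std::vector<ToyIns> current;
    for (const ToyIns& ins : trace) {
        current.push_back(ins);
        if (ins.isControlTransfer) {
            bbls.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) bbls.push_back(current);  // trailing partial BBL
    return bbls;
}

// The application trace from the jitted-code slide.
std::vector<ToyIns> SampleTrace() {
    return {
        {"cmp rax, rdx", false},
        {"jz 0x77f1eac9", true},
        {"movzx ecx, [rax+0x2]", false},
        {"call 0x77ef7870", true},
    };
}
```

SplitIntoBbls(SampleTrace()) yields two BBLs of two instructions each, which matches the two `add [r14], 0x2` updates in the jitted code for that trace.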

20

#include <cstdio>
#include "pin.H"

UINT64 icount = 0;

void PIN_FAST_ANALYSIS_CALL docount(INT32 c) { icount += c; }

void Trace(TRACE trace, void *v){// Pin Callback
for(BBL bbl = TRACE_BblHead(trace);
BBL_Valid(bbl);
bbl = BBL_Next(bbl))
BBL_InsertCall(bbl, IPOINT_ANYWHERE,
(AFUNPTR)docount, IARG_FAST_ANALYSIS_CALL,
IARG_UINT32, BBL_NumIns(bbl),
IARG_END);
}

void Fini(INT32 code, void *v) {// Pin Callback
fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
TRACE_AddInstrumentFunction(Trace, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram();
return 0;
}
ManualExamples/inscount2.cpp

21

Application Trace
APP IP
0x77ec4600 cmp rax, rdx
0x77ec4603 jz 0x77f1eac9
0x77ec4609 movzx ecx, [rax+0x2]
0x77ec460d call 0x77ef7870
How many BBLs in this trace?

Jitted trace in the Code Cache
0x001de0000 mov r14, 0xc5267d40 //inscount2.docount
0x001de000a add [r14], 0x2 //inscount2.docount
0x001de0015 0x77ec4600 cmp rax, rdx
0x001de0018 jz 0x1deffa0 L1 //patched in future
0x001de001e mov r14, 0xc5267d40 //inscount2.docount
0x001de0028 mov [r15+0x60], rax
0x001de002c lahf
0x001de002e seto al
0x001de0031 mov [r15+0xd8], ax
0x001de0039 mov rax, [r15+0x60]
0x001de003d add [r14], 0x2 //inscount2.docount
0x001de0048 0x77ec4609 movzx edi, [rax+0x2] //ecx alloced to edi
0x001de004c push 0x77ec4612 //push retaddr
0x001de0051 nop
0x001de0052 jmp 0x1deffd0 L2 //patched in future

L2:
0x001deffd0 mov [r15+0x40], rsp // save app rsp
0x001deffd4 mov rsp, [r15+0x2d0] // switch to pin stack
0x001deffdb call [0x2f000000] // call VmEnter
// data used by VmEnter pointed to by return-address of call
0x001deffe8_svc(VMSVC_XFER)
0x001defff0_sct(0x00065fb60) // current register mapping
0x001defff8_iaddr(0x077ef7870) // app target IP of call at 0x77ec460d

L1:
0x001deffa0 mov [r15+0x40], rsp // save app rsp
0x001deffa4 mov rsp, [r15+0x2d0] // switch to pin stack
0x001deffab call [0x2f000000] // call VmEnter
// data used by VmEnter pointed to by return-address of call
0x001deffb8_svc(VMSVC_XFER)
0x001deffc0_sct(0x00065f998) // current register mapping
0x001deffc8_iaddr(0x077f1eac9) // app target IP of jz at 0x77ec4603

Notes:
The mov r14 / add [r14] pairs are the compiler generated code for docount, inlined by Pin
The sequence from 0x001de0028 through 0x001de0039 saves the status flags
r14 is allocated by Pin
r15 is allocated by Pin and points to the per-thread spill area
22
#include <cstdio>
#include "pin.H"
INT32 numThreads = 0;
const INT32 MaxNumThreads = 10000;
struct THREAD_DATA
{
UINT64 _count;
UINT8 _pad[56]; // guess why?
} icount[MaxNumThreads];
// Analysis routine
VOID PIN_FAST_ANALYSIS_CALL docount(ADDRINT c, THREADID tid) { icount[tid]._count += c;}
// Pin Callback
VOID ThreadStart(THREADID threadid, CONTEXT *ctxt, INT32 flags, VOID *v){numThreads++;}

VOID Trace(TRACE trace, VOID *v) { // Jitting time routine: Pin Callback
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
BBL_InsertCall(bbl, IPOINT_ANYWHERE, (AFUNPTR)docount, IARG_FAST_ANALYSIS_CALL,
IARG_UINT32, BBL_NumIns(bbl), IARG_THREAD_ID, IARG_END); }

VOID Fini(INT32 code, VOID *v){// Pin Callback
for (INT32 t=0; t<numThreads; t++)
printf ("InsCount[of thread#%d]= %llu\n",t,(unsigned long long)icount[t]._count); }

int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
for (INT32 t=0; t<MaxNumThreads; t++) {icount[t]._count = 0;}
PIN_AddThreadStartFunction(ThreadStart, 0);
TRACE_AddInstrumentFunction(Trace, 0);
PIN_AddFiniFunction(Fini, 0);
PIN_StartProgram(); return 0; }
SimpleExamples/inscount2_mt.cpp
Why is there NO synchronization?
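
One way to read the two questions on this slide: each thread only ever updates its own icount[tid] slot, so no two threads write the same counter, and the 56 bytes of padding round each slot up to 64 bytes so that neighbouring slots do not land in the same cache line. The sketch below checks only the layout half of that argument. It assumes a 64-byte cache line, which is an assumption about the target CPU, and BareSlot, PaddedSlot, and AdjacentSlotsSharingLine are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: 64-byte cache lines (8-byte counter + 56 bytes of padding).
constexpr std::size_t kAssumedLineSize = 64;

// Counter slot without padding: eight of these fit in one assumed line.
struct BareSlot { std::uint64_t count; };

// Counter slot padded the same way as THREAD_DATA on the slide.
struct PaddedSlot {
    std::uint64_t count;
    std::uint8_t pad[kAssumedLineSize - sizeof(std::uint64_t)];
};
static_assert(sizeof(PaddedSlot) == kAssumedLineSize,
              "one slot per assumed cache line");

// True if both addresses fall into the same assumed cache line.
inline bool SameLine(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kAssumedLineSize ==
           reinterpret_cast<std::uintptr_t>(b) / kAssumedLineSize;
}

// Counts neighbouring slots whose counters share an assumed cache line,
// i.e. pairs of threads that would contend for the same line.
template <typename Slot>
std::size_t AdjacentSlotsSharingLine(const std::vector<Slot>& slots) {
    std::size_t sharing = 0;
    for (std::size_t i = 0; i + 1 < slots.size(); ++i)
        if (SameLine(&slots[i].count, &slots[i + 1].count)) ++sharing;
    return sharing;
}
```

With 16 padded slots no neighbouring counters share a line, while with 16 bare slots almost every neighbouring pair does, which is the false-sharing cost the padding avoids.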

23
Multi-Threading

Pin supports multi-threading

Application threads execute jitted code including
instrumentation code (inlined and not inlined), without any
serialization introduced by Pin

Instrumentation code can use Pin and/or OS synchronization
constructs to introduce serialization if needed.
Will see examples of this in Part4

Pin provides APIs for thread local storage.
Will see examples in Part3

Pin callbacks are serialized

Jitting is serialized
Only one application thread can be jitting code at any time

24
#include <cstdio>
#include <map>
#include <string>
#include "pin.H"
std::map<ADDRINT, std::string> disAssemblyMap;
VOID ReadsMem (ADDRINT applicationIp, ADDRINT memoryAddressRead, UINT32 memoryReadSize) {
printf ("%p %s reads %u bytes of memory at %p\n",
reinterpret_cast<void *>(applicationIp), disAssemblyMap[applicationIp].c_str(),
memoryReadSize, reinterpret_cast<void *>(memoryAddressRead));}

VOID Instruction(INS ins, void * v) {// Jitting time routine
// Pin Callback
if (INS_IsMemoryRead(ins))
{
disAssemblyMap[INS_Address(ins)] = INS_Disassemble(ins);
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) ReadsMem,
IARG_INST_PTR,// application IP
IARG_MEMORYREAD_EA,
IARG_MEMORYREAD_SIZE,
IARG_END);
} }

int main(int argc, char * argv[]) {
PIN_Init(argc, argv);
INS_AddInstrumentFunction(Instruction, 0);
PIN_StartProgram(); return 0; }

Memory Read Logger Tool

Jitted code for two application instructions:
Switch to pin stack
push 4
push %eax
push 0x7f083de
call ReadsMem
Pop args off pin stack
Switch back to app stack
inc DWORD_PTR[%eax]
Switch to pin stack
push 4
lea %ecx,[%esi]0x8 (Pin has determined that it can overwrite ecx)
push %ecx
push 0x7f083e4
call ReadsMem
Pop args off pin stack
Switch back to app stack
inc DWORD_PTR[%esi]0x8

Many other IARGs:
IARG_BRANCH_TARGET_ADDR: The target address of a branch instruction when executed (including indirect branches)
IARG_BRANCH_TAKEN: Is the branch taken (when executed)
IARG_REG_VALUE, <REG>: The value of register REG
IARG_REG_REFERENCE, <REG>: A pointer to a register
IARG_FUNCARG_ENTRYPOINT_VALUE, <ARG#>: The value of ARGUMENT # of the function (RTN instrumentation)
IARG_CONTEXT: Handle to the full register context of the executing thread
Work in progress: IARG_MAKE_ME_A_COFFEE

Pin does full register allocation during jitting.
At the start of jitting a trace, all registers accessed by the application code are considered to be virtual registers.
Pin allocates each of these virtual registers to either:
A physical register.
Or a per-thread register spill area pointed to by ebx (IA-32) or r15 (Intel64)
25
#include "pin.H"
void * MallocWrapper( CONTEXT * ctxt, AFUNPTR pf_malloc, size_t size)
{ // Simulate out-of-memory every so often
void * res;
if (TimeForOutOfMem())
return (NULL);
PIN_CallApplicationFunction(ctxt, PIN_ThreadId(),
CALLINGSTD_DEFAULT, pf_malloc,
PIN_PARG(void *), &res, PIN_PARG(size_t), size);
return res; }

VOID ImageLoad(IMG img, VOID *v) { // Pin callback. Registered by IMG_AddInstrumentFunction
if (strstr(IMG_Name(img).c_str(), "libc.so") ||
strstr(IMG_Name(img).c_str(), "MSVCR80") || strstr(IMG_Name(img).c_str(), "MSVCR90"))
{
RTN mallocRtn = RTN_FindByName(img, "malloc");

PROTO protoMalloc = PROTO_Allocate( PIN_PARG(void *), CALLINGSTD_DEFAULT,
"malloc", PIN_PARG(size_t), PIN_PARG_END() );

RTN_ReplaceSignature(mallocRtn, AFUNPTR(MallocWrapper),
IARG_PROTOTYPE, protoMalloc,
IARG_CONST_CONTEXT,
IARG_ORIG_FUNCPTR,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
} }

int main(int argc, CHAR *argv[]) {
PIN_InitSymbols();
PIN_Init(argc,argv);
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_StartProgram(); return 0; }
Malloc Wrapping
PIN_CallApplicationFunction is rather expensive
It also has 2 synchronization points
The ImageLoad callback is called for each image
(exe, shared library) loaded into the process

It is called before any code in the loaded image
is executed

This is referred to as ahead-of-time instrumentation
26
#include "pin.H"
ADDRINT mallocReturnIp = 0;

VOID BeforeMalloc(ADDRINT returnIp, ADDRINT size) {
mallocReturnIp = returnIp;
printf ("(tool) call to malloc for %5d bytes call will return to %p\n", size, returnIp); }

int CheckReturn(ADDRINT sp, ADDRINT returnRegVal) { // is this a return from malloc
return (mallocReturnIp == *(reinterpret_cast<ADDRINT *>(sp))); }

VOID ProcessReturnFromMalloc(ADDRINT ip, ADDRINT returnVal) {
printf ("(tool) return from malloc at %p malloc returns %p\n", ip, returnVal);
mallocReturnIp = 0; }

static void Instruction(INS ins, void *v) {
if( INS_IsRet(ins)) {
INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CheckReturn,
IARG_REG_VALUE, REG_STACK_PTR, IARG_END);
INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)ProcessReturnFromMalloc,
IARG_INST_PTR, IARG_FUNCRET_EXITPOINT_VALUE, IARG_END); }}

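VOID ImageLoad(IMG img, VOID *v) { // Pin callback. Registered by IMG_AddInstrumentFunction
if (strstr(IMG_Name(img).c_str(), "libc.so") || strstr(IMG_Name(img).c_str(), "MSVCR80")) {
RTN mallocRtn = RTN_FindByName(img, "malloc");
RTN_Open(mallocRtn);
RTN_InsertCall(mallocRtn, IPOINT_BEFORE, AFUNPTR(BeforeMalloc),
IARG_RETURN_IP, IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
RTN_Close(mallocRtn); } }

int main(int argc, CHAR *argv[]) { PIN_InitSymbols(); PIN_Init(argc,argv);
INS_AddInstrumentFunction(Instruction, 0);
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_StartProgram(); return 0; }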

Cheaper Malloc Wrapping
IPOINT_AFTER is NOT guaranteed to find all returns
CheckReturn (the IfCall function): is inlinable, returns TRUE/FALSE
ProcessReturnFromMalloc (the ThenCall function): is not inlinable, will be called iff the IfCall function (CheckReturn) returns TRUE

27
Pin Probe-Mode
Probe mode is a method of using Pin to instrument at the
function level only. Wrap, Replace, call Analysis function
before/after.

Replacement or Wrapping function can call the replaced
(original) function.

The application and the replacement routine are run natively
(not Jitted).
Faster than Jit-mode
Puts more responsibility on the tool writer.
Probes can only be placed on RTN boundaries
Must be inserted within the Image load callback.
Pin will automatically remove the probes when an image is unloaded.

Many of the PIN APIs that are available in JIT mode are not
available in Probe mode.

28
A Sample Probe
A probe is a jump instruction that overwrites
original instruction(s) in the application
Instrumentation invoked with probes
Pin copies/translates original bytes so probed (replaced)
functions can be called from the replacement function

Original function entry point:
0x400113d4: push %ebp
0x400113d5: mov %esp,%ebp
0x400113d7: push %edi
0x400113d8: push %esi

Entry point overwritten with probe:
0x400113d4: jmp 0x41481064
0x400113d9: push %ebx

Copy of entry point with original bytes:
0x50000004: push %ebp
0x50000005: mov %esp,%ebp
0x50000007: push %edi
0x50000008: push %esi
0x50000009: jmp 0x400113d9

Tool wrapper function:
0x41481064: push %ebp // tool wrapper func
::::::::::::::::::::
0x414827fe: call 0x50000004 // call original func

29
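#include "pin.H"
void * MallocWrapper(AFUNPTR pf_malloc,
size_t size)
{ // Simulate out-of-memory every so often
void * res;
if (TimeForOutOfMem())
return (NULL);
// In probe mode the original function is called directly
res = ((void * (*)(size_t))pf_malloc)(size);
return res; }

VOID ImageLoad (IMG img, VOID *v) {
if (strstr(IMG_Name(img).c_str(), "libc.so") ||
strstr(IMG_Name(img).c_str(), "MSVCR80") || strstr(IMG_Name(img).c_str(), "MSVCR90"))
{
RTN mallocRtn = RTN_FindByName(img, "malloc");
if ( RTN_Valid(mallocRtn) &&
RTN_IsSafeForProbedReplacement(mallocRtn) )
{
PROTO proto_malloc = PROTO_Allocate(PIN_PARG(void *), CALLINGSTD_DEFAULT, "malloc",
PIN_PARG(size_t), PIN_PARG_END() );

RTN_ReplaceSignatureProbed (mallocRtn,
AFUNPTR(MallocWrapper),
IARG_PROTOTYPE, proto_malloc,
IARG_ORIG_FUNCPTR,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
} } }

int main(int argc, CHAR *argv[]) {
PIN_InitSymbols(); PIN_Init(argc,argv);
IMG_AddInstrumentFunction(ImageLoad, 0);
PIN_StartProgramProbed(); return 0; }
Malloc Wrapping Probe-Mode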

30
SDE
SDE: A fast functional simulator for
applications with new instructions
New instructions have been defined
Compiler generates code with new
instructions
What can be used to run the apps with
the new instructions?
Use PinTool that emulates new instructions.

vmovdqu ymm?, mem256 vmovdqu mem256, ymm?
16 new 256 bit ymm registers
Read/Write ymm register from/to memory.

31
pin.exe -t sde.dll -- gzip.exe input.txt
gzip.exe is compiled with a compiler that generates the new instructions
[Diagram, same layout as the Pin invocation slide with sde.dll as the tool: starting at the first app IP, PINVM.DLL reads a Trace from Application Code, adds instrumentation from sde.dll, encodes the jitted trace into the Code Cache, and executes it]
The Decoder can decode the new instructions
The host CPU can NOT execute them
Each new instruction is replaced with a call to an emulation function in the tool

32
#include "pin.H"

// ymmRegs is the tool's emulated ymm register file (its declaration is not shown on the slide)

VOID EmVmovdquMem2Reg(unsigned int ymmDstRegNum, ADDRINT * ymmMemSrcPtr) {
PIN_SafeCopy(ymmRegs[ymmDstRegNum], ymmMemSrcPtr, 32); }

VOID EmVmovdquReg2Mem(unsigned int ymmSrcRegNum, ADDRINT * ymmMemDstPtr) {
PIN_SafeCopy(ymmMemDstPtr, ymmRegs[ymmSrcRegNum], 32); }

VOID Instruction(INS ins, VOID *v) {
switch (INS_Opcode(ins))
{
:::::
case XED_ICLASS_VMOVDQU:
if (INS_IsMemoryRead(ins)) // vmovdqu ymm? <= mem256
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)EmVmovdquMem2Reg,
IARG_UINT32, REG(INS_OperandReg(ins, 0)) - REG_YMM0,
IARG_MEMORYREAD_EA,
IARG_END);
else if (INS_IsMemoryWrite(ins)) // vmovdqu mem256 <= ymm?
INS_InsertCall(ins, IPOINT_BEFORE,
(AFUNPTR)EmVmovdquReg2Mem,
IARG_UINT32, REG(INS_OperandReg(ins, 1)) - REG_YMM0,
IARG_MEMORYWRITE_EA,
IARG_END);
INS_DeleteIns(ins); //Processor does NOT execute this instruction
break;
} }

sde_emul.dll Schema

33
Linux Invocation+Injection
pin -t inscount.so -- gzip input.txt
inscount.so: PinTool that counts application instructions executed, prints Count at end

Pin forks: the original Pin process is the Injectee (it becomes gzip), the Child is the Injector

Injectee (Pin, later gzip):
exitLoop = FALSE;
Ptrace TraceMe
while(!exitLoop){}
execv(gzip); // Injectee Freezes
MiniLoader loads Pin+Tool, allocates Pin stack
Kill(SigTrace, Injector): Freezes until Ptrace Cont

Injector (Child):
Ptrace Injectee: Injectee Freezes
Injectee.exitLoop = TRUE;
Ptrace continue (unFreezes Injectee)
Execution of Injector resumes after execv(gzip) in Injectee completes
Ptrace Copy (save, gzip.CodeSegment, sizeof(MiniLoader))
PtraceGetContext (gzip.OrigContext)
PtraceCopy (gzip.CodeSegment, MiniLoader, sizeof(MiniLoader))
Ptrace continue@MiniLoader (unFreezes Injectee)
Wait for MiniLoader complete (SigTrace from Injectee)
Ptrace Copy (gzip.CodeSegment, save, sizeof(MiniLoader))
Ptrace Copy (gzip.pin.stack, gzip.OrigCtxt, sizeof (ctxt))
Ptrace SetContext (gzip.IP=pin, gzip.SP=pin.Stack)
Ptrace Detach

[Diagram: address space of gzip (Injectee) after injection: gzip Code and Data, Pin Code and Data, MiniLoader, inscount.so, and the Pin stack holding gzip OrigCtxt; IP points into Pin]
34
Part1 Summary
Pin is Intel's dynamic binary instrumentation engine

Pin can be used to instrument all user level code
Windows, Linux
IA-32, Intel64, IA64
Product level robustness
Jit-Mode for full instrumentation: Thread, Function, Trace, BBL, Instruction
Probe-Mode for Function Replacement/Wrapping/Instrumentation only.
Pin supports multi-threading, no serialization of jitted application nor of instrumentation code

Pin API makes Pin Tools easy to write
Presented 6 full Pin tools, each one fit on 1 ppt slide


Free DownLoad
www.pintool.org
Includes: Detailed user manual, source code for 100s of Pin tools

Pin User Group

35
Part2: Larger Pin tools and writing
efficient Pin tools

36
CMP$im: A CMP Cache Simulation Pin Tool
CMP$im author: Aamer.Jaleel@intel.com
Modeling an 8-core CMP using CMP$im
[Diagram: the WORK LOAD runs under PIN; Instrumentation Routines pass ThreadID, Address, Size, and Access Type to the Cache model; the modeled hierarchy has 8 DL1, 8 L2, and 8 LLC caches (PRIVATE LLC / SHARED BANKED LLC) joined by an INTERCONNECT; params configure # cache levels, size, threads/cache, etc.]
37
CMP$im Instrument Memory References
Pin Tool structure: MAIN, INSTR ROUTINES, ANALYSIS ROUTINES
VOID Instruction(INS ins, VOID *v)
{
if( INS_IsMemoryRead(ins) ) // If instruction reads
// from memory
INS_InsertCall(ins,
IPOINT_BEFORE, (AFUNPTR)MemoryReference,
IARG_THREAD_ID, IARG_MEMORYREAD_EA,
IARG_MEMORYREAD_SIZE, IARG_UINT32,
ACCESS_TYPE_LOAD, IARG_END);
if( INS_IsMemoryWrite(ins) ) // If instruction writes
// to memory
INS_InsertCall(ins,
IPOINT_BEFORE, (AFUNPTR)MemoryReference,
IARG_THREAD_ID, IARG_MEMORYWRITE_EA,
IARG_MEMORYWRITE_SIZE, IARG_UINT32,
ACCESS_TYPE_STORE, IARG_END);
}

38
CMP$im Analyze Memory References
Pin Tool structure: MAIN, INSTR ROUTINES, ANALYSIS ROUTINES
#include "cache_model.h"

CACHE_t CacheHierarchy[MAX_NUM_THREADS][MAX_NUM_LEVELS];

VOID LookupHierarchy(int tid, int level, ADDRINT addr, int accessType);

VOID MemoryReference(
int tid, ADDRINT addrStart, int size, int type)
{
for(ADDRINT addr=addrStart; addr<(addrStart+size);
addr+=LINE_SIZE)
LookupHierarchy( tid, FIRST_LEVEL_CACHE, addr, type);
}

VOID LookupHierarchy(
int tid, int level, ADDRINT addr, int accessType) {
int result = CacheHierarchy[tid][level]->Lookup(
addr, accessType );
if( result == CACHE_MISS ) {
if( level == LAST_LEVEL_CACHE ) return;
if( IsShared(level) ) AcquireLock(&lock[level], tid); // Synchronization point
LookupHierarchy(tid, level+1, addr, accessType);
if( IsShared(level) ) ReleaseLock(&lock[level]);
}
}
Synchronization point: the lock taken around lookups in shared cache levels
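
The lookup-on-miss recursion above can be tried out with a much smaller stand-in. The sketch below is not CMP$im: ToyCache and ToyHierarchy are hypothetical names, the caches are direct-mapped with an assumed 64-byte line, there is a single thread, and no locking is modeled. It walks every cache line touched by a reference and looks up the next level only on a miss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kLineSize = 64;       // assumed line size
constexpr std::uint64_t kInvalidTag = ~0ULL;  // marks an empty cache slot

// Hypothetical direct-mapped cache: one tag per slot, filled on a miss.
struct ToyCache {
    explicit ToyCache(std::size_t numLines) : tags(numLines, kInvalidTag) {}

    // Returns true on a hit; on a miss the line replaces the old occupant.
    bool Lookup(std::uint64_t addr) {
        const std::uint64_t line = addr / kLineSize;
        std::uint64_t& slot = tags[line % tags.size()];
        if (slot == line) { ++hits; return true; }
        slot = line;
        ++misses;
        return false;
    }

    std::vector<std::uint64_t> tags;
    std::uint64_t hits = 0;
    std::uint64_t misses = 0;
};

// Hypothetical single-threaded hierarchy: levels[0] is the first-level cache.
struct ToyHierarchy {
    std::vector<ToyCache> levels;

    // A miss at one level triggers a lookup at the next level, if any.
    void Lookup(std::size_t level, std::uint64_t addr) {
        if (levels[level].Lookup(addr)) return;
        if (level + 1 == levels.size()) return;
        Lookup(level + 1, addr);
    }

    // Touches every cache line covered by [addrStart, addrStart + size).
    // Assumes size > 0.
    void MemoryReference(std::uint64_t addrStart, std::uint32_t size) {
        const std::uint64_t first = addrStart / kLineSize;
        const std::uint64_t last = (addrStart + size - 1) / kLineSize;
        for (std::uint64_t line = first; line <= last; ++line)
            Lookup(0, line * kLineSize);
    }
};
```

Unlike the slide's loop, this sketch iterates over line indices, so an unaligned reference that straddles two lines touches both; that is a design choice of the sketch, not a statement about how CMP$im handles it.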

39
Intel Thread Checker
Detect data races

Instrumentation
Memory operations
Synchronization operations

Analysis
Use dynamic history of lock acquisition and release
to form a partial order of memory references
[Lamport 1978]
Unordered read/write and write/write pairs to same
location are races
Paul Petersen,
Zhiqiang Ma

40

a documented data race in the
art benchmark is detected

41
PinPlay : Workload capture and
deterministic replay
Problem: Multi-threaded programs are inherently non-deterministic,
making their analysis, simulation, and debugging very challenging

Solution: PinPlay : A Pin-based framework for
capturing an execution of multi-threaded program and
replaying it deterministically under Pin

Harish Patil & Cristiano Pereira
Joint work with Brad Calder, UCSD
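[Diagram: Logging: the Application and its input run under Pin with the PinPlay logger, producing LOGS. Replay: Pin with the PinPlay replayer consumes the LOGS for deterministic replay on any machine. App and input are not needed once we have the log.]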

42
Logging to provide deterministic
behavior
Start with checkpoint: memory image of code and
data
A thread is deterministic if every load sees either:
Data from original checkpoint
Or a value computed and stored on the thread

Potential non-determinism when a load sees a
memory location written by an external agent
Another thread
Or system call, DMA, etc.
Log these values with timestamps

43

Applying multi-threaded tracing to
software tools
Debugging. Customer interested in debugging tools
derived from PinPlay
Capture bug at customer, bring home log to debug
Capture multi-threaded heisenbug, replay multiple times
How: combine PinPlay tracing with transparent debugging
[Diagram: PinPlay LOGS feed the Replayer running on Pin; a Pin Debug Agent connects to the debugger over a standard protocol]
Pin debug agent enables custom debugger commands

44
Reducing Instrumentation Overhead
Total Overhead = Pin Overhead + Pintool Overhead
Pin Overhead: ~5% for SPECfp and ~50% for SPECint
The Pin team's job is to minimize this
Pintool Overhead: usually much larger than Pin overhead
Pintool writers can help minimize this!

45
Reducing the Pintool's Overhead
Pintool's Overhead = Instrumentation Routines Overhead + Analysis Routines Overhead
Analysis Routines Overhead = Frequency of calling an Analysis Routine x Work required in the Analysis Routine
Work required in the Analysis Routine = Work required for transiting to Analysis Routine + Work done inside Analysis Routine

46
Reducing Work in Analysis Routines

Key: Shift computation from analysis routines to
instrumentation routines whenever possible

This usually has the largest speedup

47
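Counting control flow edges
[Diagram: control flow graph containing call, jne, ret, jne, and jmp instructions, with edges labeled by execution counts 100, 60, 40, 60, 40, 40, and 1]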

48
Edge Counting: a Slower Version
...
void docount2(ADDRINT src, ADDRINT dst, INT32 taken)
{
COUNTER *pedg = Lookup(src, dst);
pedg->count += taken;
}
void Instruction(INS ins, void *v) {
if (INS_IsBranchOrCall(ins))
{
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount2,
IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
IARG_BRANCH_TAKEN, IARG_END);
}
}
...
Instrumentation
Analysis

49
Edge Counting: a Faster Version
void docount(COUNTER* pedg, INT32 taken) {
pedg->count += taken;
}
void docount2(ADDRINT src, ADDRINT dst, INT32 taken) {
COUNTER *pedg = Lookup(src, dst);
pedg->count += taken;
}
void Instruction(INS ins, void *v) {
if (INS_IsDirectBranchOrCall(ins)) {
// Target is known at jitting time: do the Lookup once, here
COUNTER *pedg = Lookup(INS_Address(ins),
INS_DirectBranchOrCallTargetAddress(ins));
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount,
IARG_ADDRINT, (ADDRINT)pedg, IARG_BRANCH_TAKEN, IARG_END);
} else if (INS_IsBranchOrCall(ins))
// Indirect branch or call: target only known at execution time
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR) docount2,
IARG_INST_PTR, IARG_BRANCH_TARGET_ADDR,
IARG_BRANCH_TAKEN, IARG_END);
}

Analysis
Instrumentation
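
The saving in the faster version comes from doing the Lookup once per instrumented branch instead of once per execution. The standalone sketch below imitates that without Pin; EdgeTable, RunSlow, and RunFast are hypothetical names for this illustration, and a std::map stands in for whatever table Lookup uses in the real tool.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

struct Counter { std::uint64_t count = 0; };

// Hypothetical edge table that also records how often Lookup is called.
class EdgeTable {
public:
    Counter* Lookup(std::uint64_t src, std::uint64_t dst) {
        ++lookups_;
        return &edges_[std::make_pair(src, dst)];  // map nodes stay put
    }
    std::uint64_t lookups() const { return lookups_; }

private:
    std::map<std::pair<std::uint64_t, std::uint64_t>, Counter> edges_;
    std::uint64_t lookups_ = 0;
};

struct EdgeRun {
    std::uint64_t edgeCount;
    std::uint64_t lookups;
};

// Slower version: the analysis step performs the Lookup on every execution.
EdgeRun RunSlow(std::uint64_t src, std::uint64_t dst, int executions) {
    EdgeTable table;
    Counter* pedg = nullptr;
    for (int i = 0; i < executions; ++i) {
        pedg = table.Lookup(src, dst);
        pedg->count += 1;  // branch taken
    }
    return {pedg ? pedg->count : 0, table.lookups()};
}

// Faster version: Lookup once at "instrumentation time", then the analysis
// step only increments through the saved pointer.
EdgeRun RunFast(std::uint64_t src, std::uint64_t dst, int executions) {
    EdgeTable table;
    Counter* pedg = table.Lookup(src, dst);
    for (int i = 0; i < executions; ++i) pedg->count += 1;
    return {pedg->count, table.lookups()};
}
```

For a direct branch executed 1000 times both variants report the same edge count, with 1000 lookups in the slow variant and 1 in the fast one.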

50

Analysis Routines: Reduce Call Frequency

Key: Instrument at the largest granularity
whenever possible

Instead of inserting one call per instruction
Insert one call per basic block or trace

51
Slower Instruction Counting
counter++;
sub $0xff, %edx
counter++;
cmp %esi, %edx
counter++;
jle <L1>
counter++;
mov $0x1, %edi
counter++;
add $0x10, %eax

52
Faster Instruction Counting
Counting at BBL level
counter += 3
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
counter += 2
mov $0x1, %edi
add $0x10, %eax
Counting at Trace level
sub $0xff, %edx
cmp %esi, %edx
jle <L1>          (early exit to L1: counter += 3)
mov $0x1, %edi
add $0x10, %eax
counter += 5      (end of trace)
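
The effect of coarser granularity can be sketched in plain C++ without Pin: both strategies below arrive at the same instruction count, but the per-block version makes far fewer analysis calls. `Block`, `Result`, `CountPerInstruction` and `CountPerBlock` are hypothetical names for this illustration, and a basic block is modeled only by its instruction count.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A basic block is modeled only by its instruction count (hypothetical model).
struct Block { uint32_t numIns; };

struct Result {
    uint64_t counter = 0;        // instructions counted
    uint64_t analysisCalls = 0;  // how many times an analysis routine ran
};

// One analysis call per instruction: counter++ before every instruction.
Result CountPerInstruction(const std::vector<Block>& executed) {
    Result r;
    for (const Block& b : executed) {
        assert(b.numIns > 0);  // a basic block holds at least one instruction
        for (uint32_t i = 0; i < b.numIns; ++i) {
            r.counter++;
            r.analysisCalls++;
        }
    }
    return r;
}

// One analysis call per basic block: counter += numIns, where numIns is a
// constant computed once at instrumentation time.
Result CountPerBlock(const std::vector<Block>& executed) {
    Result r;
    for (const Block& b : executed) {
        assert(b.numIns > 0);
        r.counter += b.numIns;
        r.analysisCalls++;
    }
    return r;
}
```

For the five-instruction example on the slide (a block of 3 followed by a block of 2), per-instruction counting makes 5 analysis calls and per-block counting makes 2.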

53
Reducing Work for Analysis Transitions
Reduce number of arguments to analysis
routines
Inline analysis routines
Pass arguments in registers
Instrumentation scheduling

54
Reduce Number of Arguments
Eliminate arguments only used for debugging
Instead of passing TRUE/FALSE, create 2 analysis
functions
Instead of inserting a call to:
Analysis(BOOL val)
Insert a call to one of these:
AnalysisTrue()
AnalysisFalse()
IARG_CONTEXT is very expensive (> 10
arguments)
Use the new IARG_CONST_CONTEXT
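
The TRUE/FALSE split can be sketched in plain C++ without Pin. Because the boolean is already known when the call is inserted, the choice is made once, and the routine that runs on every execution takes no argument. `SelectAnalysis`, `Run` and the two counters are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstdint>

static uint64_t trueCount = 0;
static uint64_t falseCount = 0;

// Original shape: one routine, with an argument passed on every call.
void Analysis(bool val) { if (val) ++trueCount; else ++falseCount; }

// Specialized shape: no arguments and no branch inside the routine.
void AnalysisTrue()  { ++trueCount; }
void AnalysisFalse() { ++falseCount; }

using AnalysisFn = void (*)();

// "Instrumentation time": val is known here, so pick the routine once.
AnalysisFn SelectAnalysis(bool val) { return val ? AnalysisTrue : AnalysisFalse; }

// "Run time": the selected routine is called with no arguments.
void Run(AnalysisFn f, int times) {
    assert(f != nullptr);
    for (int i = 0; i < times; ++i) f();
}
```

Calling `Run(SelectAnalysis(true), 3)` has the same effect on the counters as three calls to `Analysis(true)`, without passing anything at each call.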

55
Inlining
Pin will inline analysis functions into
jitted application code
int docount0(int i) {
    x[i]++;
    return x[i];
}
Inlinable
int docount1(int i) {
    if (i == 1000)
        x[i]++;
    return x[i];
}
Not-inlinable
int docount2(int i) {
    x[i]++;
    printf("%d", i);
    return x[i];
}
Not-inlinable
void docount3() {
    for (int i = 0; i < 100; i++)
        x[i]++;
}
Not-inlinable

56
Inlining
Use the -log_inline invocation switch to record inlining decisions in pin.log

pin -log_inline -t mytool -- app

Look in pin.log

Analysis function (0x2a9651854c) from mytool.cpp:53 INLINED

Analysis function (0x2a9651858a) from mytool.cpp:178 NOT INLINED
The last instruction of the first BBL fetched is not a ret instruction

Look at source or disassembly of the function in mytool.cpp at line 178

0x0000002a9651858a push rbp
0x0000002a9651858b mov rbp, rsp
0x0000002a9651858e mov rax, qword ptr [rip+0x3ce2b3]
0x0000002a96518595 inc dword ptr [rax]
0x0000002a96518597 mov rax, qword ptr [rip+0x3ce2aa]
0x0000002a9651859e cmp dword ptr [rax], 0xf4240
0x0000002a965185a4 jnz 0x11

The function could not be inlined because it contains a control-flow changing
instruction (other than ret)

57
Conditional Inlining
Inline a common scenario where the analysis
routine has a single if-then
The "if" part is always executed
The "then" part is rarely executed
Useful cases:
1. The "if" part can be inlined, the "then" part cannot
2. The "if" part has a small number of arguments, the "then" part has many arguments
(or IARG_CONST_CONTEXT)

Pintool writer breaks the analysis routine into two:
INS_InsertIfCall (ins, ..., (AFUNPTR)doif, ...)
INS_InsertThenCall (ins, ..., (AFUNPTR)dothen, ...)

58
IP-Sampling (a Slower Version)
const INT32 N = 10000; const INT32 M = 5000;
INT32 icount = N;
// Analysis routine: called before every instruction
VOID IpSample(VOID* ip) {
    --icount;
    if (icount == 0) {
        fprintf(trace, "%p\n", ip);
        icount = N + rand()%M; // icount is in the range [N, N+M)
    }
}
// Instrumentation routine
VOID Instruction(INS ins, VOID *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)IpSample,
                   IARG_INST_PTR, IARG_END);
}

59
IP-Sampling (a Faster Version)
// Analysis routine: inlined
INT32 CountDown() {
    --icount;
    return (icount==0);
}
// Analysis routine: not inlined
VOID PrintIp(VOID *ip) {
    fprintf(trace, "%p\n", ip);
    icount = N + rand()%M; // icount is in the range [N, N+M)
}
// Instrumentation routine
VOID Instruction(INS ins, VOID *v) {
    // CountDown() is always called before an instruction is executed
    INS_InsertIfCall(ins, IPOINT_BEFORE, (AFUNPTR)CountDown,
                     IARG_END);

    // PrintIp() is called only if the preceding call to CountDown()
    // returns a non-zero value
    INS_InsertThenCall(ins, IPOINT_BEFORE, (AFUNPTR)PrintIp,
                       IARG_INST_PTR, IARG_END);
}
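
The control flow that the If/Then pair produces at each instruction can be modeled in plain C++ without Pin. In this sketch `ExecuteInstruction` and the `samples` vector are hypothetical, and the sampling period is fixed at a small `N` (rather than `N + rand()%M`) so that the behavior is deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const int32_t N = 4;                  // fixed sample period for this sketch
static int32_t icount = N;
static std::vector<uint64_t> samples; // stand-in for the trace file

// "If" routine: a tiny predicate that runs before every instruction.
int32_t CountDown() {
    assert(icount > 0);  // invariant: the counter is re-armed before it hits 0
    --icount;
    return icount == 0;
}

// "Then" routine: heavier work that runs only when CountDown() returned
// non-zero.
void PrintIp(uint64_t ip) {
    samples.push_back(ip);  // stand-in for fprintf(trace, "%p\n", ip)
    icount = N;             // re-arm the counter
}

// Models what the If/Then pair does at each executed instruction.
void ExecuteInstruction(uint64_t ip) {
    if (CountDown()) PrintIp(ip);
}
```

With `N = 4`, executing instructions with IPs 1 through 10 records IPs 4 and 8; the heavier routine runs twice while the cheap predicate runs ten times.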

60
Using Liveness Information
Use -xyzzy liveness 1

Requirement: the app does NOT have an exception handler that examines
registers that are dead to the application

Performance gain comes mainly from NOT having to save app flags

61
Jitting time
Jitting is expensive
Takes far more time to jit an instruction than to execute
a jitted instruction

Jitting time dominates execution time in portions of a
workload where very many IPs are jitted and each is
executed only a small number of times
E.g.
Startup of a GUI app
Compiler compiling a small file
Vs. a loop executing a large number of times
Jitting time is amortized over execution time
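
The amortization argument can be made concrete with a toy cost model. This is only a sketch: `Workload`, `JitFraction` and the cost values used with it are hypothetical and expressed in arbitrary units, not measurements of Pin.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model; all costs are arbitrary units, not measurements.
struct Workload {
    uint64_t uniqueInstructions;  // instructions that must each be jitted once
    uint64_t executionsPerInstr;  // how many times each one then executes
};

// Fraction of total time spent jitting, for given per-instruction costs.
double JitFraction(const Workload& w, double jitCost, double execCost) {
    assert(w.uniqueInstructions > 0 && jitCost >= 0.0 && execCost > 0.0);
    const double jit  = static_cast<double>(w.uniqueInstructions) * jitCost;
    const double exec = static_cast<double>(w.uniqueInstructions) *
                        static_cast<double>(w.executionsPerInstr) * execCost;
    return jit / (jit + exec);
}
```

Assuming jitting an instruction costs 1000 units and executing it costs 1 unit, a startup-like workload (100000 instructions, each run twice) spends over 90% of its time jitting, while a loop-like workload (100 instructions, each run 10 million times) spends under 1%.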

62
Optimizing Your Pintools -
Summary
Baseline Pin has fairly low overhead for non-jitting
portions of workloads (~5-20%)

Adding instrumentation can increase overhead
significantly, but you can help!

1. Move work from analysis to instrumentation
routines
2. Explore larger granularity instrumentation
3. Explore conditional instrumentation
4. Understand when Pin can inline instrumentation

63
OS Specifics

64
Windows-specific Challenges (1/2)
Handling system calls
Pin must intercept system calls to regain control of the application
on return from the system
Pin must monitor system calls to notify instrumentation when DLLs
are loaded/unloaded, threads are created/terminated, etc.
System call interface is undocumented
System call numbers potentially change with each system
build
Handling exceptions and asynchronous interruptions
To maintain control and notify instrumentation about control flow
changes Pin must intercept all transitions from kernel to user mode
Windows is not designed to have an independent agent
interposed between the kernel and application
The kernel dispatches interruptions via (undocumented) entry
points in ntdll.dll
The main obstacle: direct interface between user-level code and Windows kernel is undocumented

65
Windows-specific Challenges (2/2)
Injection
PIN VMM is a DLL that must be loaded into the address space of the
application to get initial control of the process
Windows is not designed for a proprietary loader
Common practice: intercept control at the entry point of the
application
Instrumentation cannot observe initialization procedures in
statically linked application DLLs
Injection presented in the introduction is referred to as Late
Injection
It misses the initialization procedures in statically linked application DLLs
Early injection is not trivial
Isolation of instrumentation from the application
Instrumentation runs in the same process as the application it is
observing
Enabling C run-time in the instrumentation causes sharing of system
libraries (e.g. kernel32.dll) and their state with the application
To be transparent, Pin must
Preserve original state of system resources
Avoid reentrant use of shared libraries
Pin minimizes its dependence on Windows system services in order to maximize observability and achieve better
isolation

66
Pin Architecture in Windows
[Figure: components shown are PIN.EXE (Launcher, Injection Helper, Symbol Server with DBGHELP.DLL), which starts the Application Process via CreateProcess at startup time and shares memory with it; the Application Process, containing Application Code and Data, KERNEL32.DLL, NTDLL.DLL, PINTOOL.DLL and PINVM.DLL (Virtual Machine Monitor: JIT Compiler, Code Cache, Thread Dispatcher, Event Dispatcher, System Call Emulator, System Gate); and the Windows Kernel]

67
Injection
Injection is the procedure for loading the PINVM.DLL into the
address space of an application and gaining control of execution
Other systems hook the entry point of the application
Too late: initialization procedures in application DLLs cannot be
instrumented
For maximum observability, Pin should inject itself into a new
process as early as possible, however
Pin depends on some basic system services so it is not possible to
load PINVM.DLL until the loader and kernel32.dll have initialized
The optimal injection point: just after initialization of
kernel32.dll
Injection presented in the introduction is referred to as Late
Injection
It misses the initialization procedures in statically linked application DLLs

68
Launch of the Instrumented Process
pin -t pintool.dll -- application.exe
Create (suspended) application process
Attach to the application as a debugger
Run the application process until kernel32.dll
is loaded and initialized
Copy Boot Routine into the application
process and set PC to start of the routine
Detach from the application process
Load and start Pin VMM
Load the instrumentation tool
Instrument and execute the application
All application instructions are executed under Pin control
[Figure: PIN.EXE controls the Application Process through the Debugging API; the Application Process contains APPLICATION.EXE, APPLICATION.DLL, NTDLL.DLL, KERNEL32.DLL, the Pin Boot Routine, PINVM.DLL and PINTOOL.DLL, above the Windows Kernel]

69
Handling System Calls
Pin must manage the execution of system calls
To regain control when the system returns to user mode with a modified
thread context
To monitor and handle some important system events
Loading DLLs, creation and termination of threads and processes, etc.
Pin intercepts system call instructions, not Win32 APIs
Pin instruments all modules in the user space, including system libraries
Some applications use native API (NTDLL interface) directly, bypassing
Win32 API
Win32 API layer is very wide, while system call instructions are easy to
discover
Three steps in managing system calls:
Detect a system call and redirect control to VMM
Execute the system call on behalf of the application
Regain control when the kernel returns to user with a new context
The system may interrupt system call execution by asynchronous calls to
application procedures

70
System Call Interception
Pin detects system call instructions when it generates traces
in the code cache
IA-32: sysenter and int 2E; Intel64: syscall; etc.
This is a static analysis, so the overhead is low
Pin executes system calls in VMM, not in the code cache -
emits jump to VMM instead of the system call instruction
Enables flushing the code cache while a system call blocks in the kernel
VM lock is NOT held during the actual syscall
Some system calls may affect Pin's internal state. To handle
them properly, Pin must know the corresponding system call
numbers
Windows system call numbers are unpublished and potentially change
with each system build
Pin discovers system call numbers dynamically, at an early stage of the
injection process
We trace the corresponding NTDLL functions until a system call instruction is
reached and then read the system call number from the EAX/RAX register

71
System Call Execution
The System Call Emulator executes all known system calls that
may affect the VMM state, e.g. memory mappings, creation and
termination of threads and processes, etc.
The remaining, unknown system calls are forwarded to the
System Gate
Per-thread procedure that transparently executes system calls and regains
control upon return or interruption
Fills/spills original context before/after system calls
Recovers original context (PC) when a system call is interrupted
[Figure: system call flow. Original Code: sysenter. Code Cache: jmp VMM. In the VMM: Notify Pintool before system call, then "Is known system call?". Y: Execute/Emulate in the System Call Emulator. N: System Gate (Switch to the application context, int 2e, Switch to the Pin context), then jmp ReturnFromSystemCall. ReturnFromSystemCall: Notify Pintool after system call.]
System Gate executes system calls blindly, assuming that each of them can arbitrarily
modify context and control flow (if interrupted)

72
User Procedure Calls (UPC)
UPC is a control transfer from the kernel to a user-level
procedure
Asynchronous procedure call (APC)
Asynchronous events: file I/O completion, timer expiration
Thread initialization APC signals start of a new thread
Callback
Asynchronous Windows GUI message
Exception
Access violation, illegal instruction, divide by zero, etc.
Asynchronous events are not delivered immediately, but wait
in queue until the application invokes an interruptible
(alertable) system call
Pin must intercept UPCs to maintain control of the
application and recover the original interruption
context (visible to the application)

73
UPC Interception
The kernel dispatches UPCs through entry points in NTDLL.DLL
To intercept UPCs, Pin overwrites the NTDLL entry points with trampolines that
jump to the Event Dispatcher in Pin
When a UPC is intercepted, Pin recovers original interruption context in the UPC
frame prepared by the kernel
JIT Compiler recovers context of exceptions that occurred in the code cache
System Call Emulator recovers context of interrupted system calls
[Figure: the Windows kernel delivers APCs, callbacks and exceptions through KiUserApcDispatcher, KiUserCallbackDispatcher and KiUserExceptionDispatcher in NTDLL.DLL. These are redirected to the UPC Dispatcher in the Pin VMM, which recovers the original PC in the APC frame (APC), saves the PC of the interrupted system call (Callback), or recovers the original context in the exception frame (Exception). Execution then continues in the translated KiUserApcDispatcher, KiUserCallbackDispatcher or KiUserExceptionDispatcher in the Code Cache.]
Pin intercepts all control transfers from the kernel to the user mode

74
Exceptions (1/2)
Unlike APCs and callbacks that are queued and delivered at the
next alertable system call, exceptions are synchronous events
Exceptions do not necessarily cause abnormal termination of the
process; the application may expect and handle exceptions
Pin must provide exception handlers with the same exception
information that accompanies exceptions in the native
application
Exception context, code and exception-specific parameters
From Pin's perspective, there are three kinds (sources) of
exceptions in Windows applications:
An attempt to fetch an invalid or inaccessible instruction
An attempt to execute a faulting instruction
Software exceptions generated by the application

75
Exceptions (2/2)
Decoder (fetcher) of instructions raises an exception if it
encounters an invalid or inaccessible instruction
When the kernel delivers this exception back to the user mode, Pin skips the
context translation because it sees original PC in the exception context
Other hardware exceptions occur in the code cache
Recovery of the original exception context is nontrivial due to register
allocation
Pin retranslates the interrupted trace to get the virtual-physical register
bindings at the faulting point
Optimization: small cache of register bindings for frequent exceptions
Other hardware exceptions occur in the tool code
Pin APIs for tool to manage its exceptions
Application can generate software exceptions using Win32 API
The exception context represents an original application state
Context translation is not needed
Pin delivers precise exceptions to the application's handlers

76
Multithreading Support
Pin instruments and runs all threads of the application
from the first to the last user-mode instruction
Attaches to a new thread when the system delivers the thread
initialization APC
Maintains control until the thread exits
Intercepts threads created by remote processes
Pin's threading activities are transparent to the
application
Pin VMM serializes some of its operations (e.g. JIT compilation), but
never executes code of the application under Pin locks
Except for initialization phase, Pin never acquires locks in system
libraries, e.g. loader lock or process heap lock
Each thread has a shadow stack that is used by Pin VMM and Pintool

77
Thread-Local State
Key elements of Pin's thread-local state:
Spill area keeps values of spilled virtual registers
JIT-compiled traces need fast access to spilled register values
Pin steals one physical register to point to the spilling area
TEB state keeps original thread-local state of system libraries,
e.g. last Win32 error value, stack limit
C run-time routines may access/modify these values
Need to preserve the original state while running in Pin VMM or
PinTool
System call state contains information about active and
interrupted system calls in the thread
The information is used to restore the original context on return
from the system
Pin steals one TLS slot from the application to enable
fast access to the thread-local data in Pin VMM

78
Thread Suspension and Context Manipulation
A thread can suspend another thread and read/modify its context
SuspendThread(), GetThreadContext(), SetThreadContext()
Pin must emulate the corresponding system calls to avoid
deadlocks and transparency issues
Target thread may hold a Pin lock
The thread context is not original
Suspended traces disable flushing the code cache
Solution: Force a thread to leave the code cache and wait
until the thread reaches a safe point
Safe point = no locks, not in the code cache, accessible original context
Unlink the suspended trace from successors and let it enter VMM
Block the thread in the safe VMM point or in the System Gate
Use thread-local data to store and access the original context
associated with the safe point

79
Linux-specific Challenges (1/3)
Handling system calls
Pin must intercept system calls to regain control of the application
on return from the system
Pin must monitor system calls to notify instrumentation when DLLs
are loaded/unloaded, threads are created/terminated, etc.
Some system calls may behave differently on different Linux
distributions.
Signal handling
Pin must identify whether the signal originated from the application,
the tool or Pin itself.
Pin must not appear to interfere with the application's signal
mask.

80
Injection
Pin relies on the ptrace system call for injection. The ptrace system
call has known bugs in several Linux versions.
Some platforms do not allow tracing a parent application by a child.
In such cases the application is run in the child -> the pid is changed.
Isolation of instrumentation from the application
Instrumentation runs in the same process as the application it is
observing.
Pin and the application share the same physical segment registers. In
probe mode, this restricts the libc versions allowed.

81
GLIBC
Pin must emulate several libc services, for several reasons. For example:
Pin may run before libc is initialized, e.g. during injection.
Pin's libc may get confused. For example: libc's getpid wrapper function incorporates
a cache. The first call to getpid actually calls the getpid system call, but any
subsequent calls will access the cache. Upon a fork, the cache is invalidated and the
next call to getpid will again call the getpid system call. Since the process has two
copies of libc, only the application's cache is invalidated and Pin's copy is stale.
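
The stale-cache scenario can be modeled in plain C++. This is a hypothetical model, not glibc or Pin code: `LibcCopy` stands for one loaded copy of libc with a cached pid, and `Process::ForkChild` models a fork in which only the application's copy has its cache invalidated.

```cpp
#include <cassert>

// Hypothetical model of one loaded copy of libc with a cached pid.
struct LibcCopy {
    int cachedPid = 0;  // 0 means "cache empty"
    int syscalls = 0;   // number of real getpid system calls issued

    int GetPid(int kernelPid) {
        assert(kernelPid > 0);
        if (cachedPid == 0) {      // cache miss: ask the kernel
            cachedPid = kernelPid;
            ++syscalls;
        }
        return cachedPid;
    }
    void InvalidateCache() { cachedPid = 0; }
};

// A process that has two copies of libc loaded: the application's and Pin's.
struct Process {
    int kernelPid;
    LibcCopy appLibc;
    LibcCopy pinLibc;

    // Models the child side of a fork: both caches are inherited, but only
    // the application's copy is invalidated.
    Process ForkChild(int childPid) const {
        Process child = *this;
        child.kernelPid = childPid;
        child.appLibc.InvalidateCache();  // pinLibc is left untouched
        return child;
    }
};
```

After the fork, the application's copy returns the child's pid, while the other copy still returns the parent's pid from its cache. This is the kind of inconsistency that emulating the service avoids.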

82
Handling System Calls
Pin must manage the execution of system calls
Pin must maintain control all the time
System calls are executed inside pin and return to the application
In most cases the system call is executed without the pin VM lock
Certain system calls are emulated by pin (see below)
System call emulation
Pin detects if a system call needs emulation.
Pin needs to know the attributes of each memory page for SMC support
Therefore all system calls related to memory are emulated by pin
Signal related system calls are emulated
Creating of new threads and new child processes
Setting/getting of the TLS segment registers
Thread and process termination

83
Signal Handling
Pin registers its own signal handlers for all signals, and saves
the application's handlers.
Pin must handle both synchronous and asynchronous signals.
Asynchronous signals:
These signals may be delivered at will, so Pin waits for a safe point to deliver
them.
When such a signal arrives, Pin's internal handler registers this signal, unlinks the
current trace and resumes execution from the code cache.
At the trace's exit point, the executing thread jumps to the VM, thus transferring control
over to Pin. The VM checks if there are pending signals and calls the application's
original signal handlers for these signals (jitting them).
Synchronous signals:
These signals must be delivered immediately.
They may originate from the application, the tool or Pin itself.
Pin's internal handler is called; it determines the origin of the signal and propagates the
signal delivery to the tool and application if necessary.
If the signal is delivered to the application, the application's signal handler is jitted.
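
The deferral of asynchronous signals to a safe point can be sketched as a small state machine in plain C++. `SignalModel` and its members are hypothetical names for this illustration and do not correspond to Pin internals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of deferring asynchronous signals to a safe point.
struct SignalModel {
    std::vector<int> pending;    // signals recorded by the internal handler
    std::vector<int> delivered;  // signals handed to the application's handler

    // Internal handler: only record the signal. The thread keeps running
    // jitted code until the current (now unlinked) trace exits.
    void OnAsyncSignal(int signo) {
        assert(signo > 0);
        pending.push_back(signo);
    }

    // Trace exit: control is back in the VM, which is a safe point, so the
    // pending signals are delivered in arrival order.
    void OnTraceExit() {
        for (int signo : pending) delivered.push_back(signo);
        pending.clear();
    }
};
```

Nothing reaches the application's handler between `OnAsyncSignal` and the next `OnTraceExit`; a synchronous signal, by contrast, would have to be handled immediately rather than queued.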

84
Multithreading Support
Pin instruments and runs all threads of the application
from the first to the last user-mode instruction
Attaches to the thread upon the first user-space instruction
Maintains control until the thread exits
Pin's threading activities are transparent to the
application
The Pin VM serializes some of its operations (e.g. JIT compilation),
but never executes code of the application under Pin locks
Each thread has a shadow stack that is used by the Pin VM and the
Pintool
Pin and pintools are prohibited from using the pthread library due to
conflicts with some internal structures. Therefore Pin provides its own
APIs for thread creation and control.

85
Thread-Local Storage (Linux)
JIT mode: segment virtualization
TLS is accessed via the fs (64 bit) or gs (32 bit) segment
register.
Both the application and Pin share this register, but expect
different values.
Pin emulates the application's usage of the fs/gs register, thus
isolating the application's TLS from Pin's.
Probe mode: no TLS usage by Pin
Probe mode does not enable the method described above.
Pin uses its own version of GLIBC which does not use TLS.

86
Isolation/Windows
Pin Tools are compiled to use the static CRT

Pin on Windows does not separate DLLs loaded by
the tool from the application DLLs - it uses the
same system loader.

The tool should not load any DLL that can be shared with
the application.

The tool should avoid static links to any common DLL,
except for those listed in PIN_COMMON_LIBS (see
source\tools\ms.flags file).

87
Isolation/Windows

Pin on Windows guarantees safe usage of C/C++
run-time services in Pin tools, including indirect
calls to Windows API through C run-time library.

Any other use of Windows API in Pin tool is not guaranteed
to be safe

Pin uses some base types that conflict with
Windows types. If you use "windows.h", you may
see compilation errors. So do:

namespace WINDOWS { #include <windows.h> }

88
Isolation/Linux

Pin is injected into the application's address space and has its own
copy of the dynamic loader and runtime libraries
(GLIBC, etc).

Pin uses a small CRT library for direct calls to
system calls.

The process has a single signal table (shared
among all threads); Pin manages an internal signal
table and emulates all the system calls related to
signals.

89
Isolation/Linux
pthread functions cannot be called from an analysis
or replacement routine

Pintools on Linux need to take care when calling
standard C or C++ library routines from analysis or
replacement functions
because the C and C++ libraries linked into Pintools are
not thread-safe

90
Part3: Deeper into Pin API
Agenda
memtrace_simple tool
membuffer_simple tool
branchbuffer_simple tool
Symbols DebugInfo
Probe-Mode
Multi-Threading
Taint analysis

91
memtrace_simple
Tool code collects pairs of {appIP, memAddr} of
memory accessing instructions into a per-thread
buffer.
Process the buffer when there is no more room in it

1. Collect pairs of {appIP, memAddr} into the buffer
2. No more room in the buffer: process the entries in the buffer
3. Reset the collection to start at the beginning of the buffer

92
memtrace_simple
Tool code must
Instrument each memory accessing instruction
Determine where in the buffer the {appIP, memAddr} of the
instruction should be written
Determine when the buffer becomes full

Will instrument instructions at the Trace level, i.e.
not all instructions in the trace will necessarily execute
each time the trace is executed, because of early exits.

Will try to allocate, in the buffer, the maximum space
needed by the trace at the trace start; if there is not enough
space => buffer is full

93
memtrace_simple
Instrumentation code for each memory accessing
instruction in the trace will write its
{appIP, memAddr} pair to a constant offset from
the start of the trace in the buffer.

Empty pairs (those instructions that were NOT executed)
will be denoted by having an appIP==0.

94
memtrace_simple
[Figure: a Trace (non memory access ins, memory access ins each preceded by its instrumentation code, an Early Exit and the Trace Exit) next to the Buffer of {appIP, memAddr} pairs. endOfTraceReg and endOfBufferReg point into the Buffer; the trace occupies TotalSizeOccupiedByTraceInBuffer bytes of it.]
If endOf(Previous)TraceReg + TotalSizeOccupiedByTraceInBuffer > endOfBufferReg
Then Call BufferFull
endOfTraceReg += TotalSizeOccupiedByTraceInBuffer

96
memtrace_simple
Tool will:
iterate thru all INSs of the Trace
Record which ones need to be instrumented (access memory)
Record the ins, the memop, the offset from start of the trace in the buffer
where the {appIP, memAddr} pair of this ins should be written
Get a sum of the TotalSizeOccupiedByTraceInBuffer

Insert the IF-THEN sequence at the beginning of the trace

Insert the update of endOfTraceReg just after the IF-THEN
sequence

iterate thru recorded (memory accessing) INSs of the Trace
Insert the instrumentation code before each recorded memory
accessing instruction
this is the code that writes the {appIP, memAddr} pair into the buffer at the
designated offset (from start of trace) for this INS.

endOfTraceReg and endOfBufferReg are virtual registers allocated
by Pin to the Pin tool.
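
The buffer scheme described by these steps can be modeled in plain C++ without Pin. In this sketch `TraceBuffer` is a hypothetical class, the two "registers" are ordinary indices counted in {appIP, memAddr} slots rather than bytes, and zeroing the buffer after processing (so that unexecuted slots read as appIP == 0) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MemRef { uint64_t appIP; uint64_t memAddr; };

class TraceBuffer {
public:
    explicit TraceBuffer(size_t capacity) : _buf(capacity), _endOfTrace(0) {}

    // Start of a trace: IF there is no room THEN process the buffer; in
    // either case reserve `numRefs` slots for this trace.
    void EnterTrace(size_t numRefs) {
        assert(numRefs <= _buf.size());
        if (_endOfTrace + numRefs > _buf.size()) BufferFull();
        _endOfTrace += numRefs;
    }

    // Before a memory access: write into this instruction's fixed slot,
    // addressed relative to the end of the trace's reservation.
    void Record(size_t traceRefs, size_t slot, uint64_t appIP, uint64_t memAddr) {
        assert(slot < traceRefs && traceRefs <= _endOfTrace);
        _buf[_endOfTrace - traceRefs + slot] = MemRef{appIP, memAddr};
    }

    // Process the used part of the buffer, skipping empty slots
    // (appIP == 0), then restart collection at the beginning of the buffer.
    void BufferFull() {
        for (size_t i = 0; i < _endOfTrace; ++i) {
            if (_buf[i].appIP != 0) _processed.push_back(_buf[i]);
            _buf[i] = MemRef{0, 0};  // assumption: slots are zeroed for reuse
        }
        _endOfTrace = 0;
        ++_timesFull;
    }

    const std::vector<MemRef>& Processed() const { return _processed; }
    size_t TimesFull() const { return _timesFull; }

private:
    std::vector<MemRef> _buf;   // value-initialized, so every appIP starts at 0
    size_t _endOfTrace;         // plays the role of endOfTraceReg
    size_t _timesFull = 0;
    std::vector<MemRef> _processed;
};
```

A trace that reserves three slots but leaves early after its first memory access contributes only one processed entry; the two unwritten slots keep appIP == 0 and are skipped.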

97
memtrace_simple
TLS_KEY appThreadRepresentitiveKey; // Pin TLS key

REG endOfTraceInBufferReg; // Pin virtual Reg that will hold the pointer to the end of the trace data in
// the buffer
REG endOfBufferReg; // Pin virtual Reg that will hold the pointer to the end of the buffer

struct MEMREF {
ADDRINT appIP;
ADDRINT memAddr;
} ; // structure of the {appIP, memAddr} pair of a memory accessing ins in the buffer

int main(int argc, char * argv[])
{
PIN_Init(argc,argv) ;

// Pin TLS slot for holding the object that represents the application thread
appThreadRepresentitiveKey = PIN_CreateThreadDataKey(0);

// get the registers to be used in each thread for managing the per-thread buffer
endOfTraceInBufferReg = PIN_ClaimToolRegister();
endOfBufferReg = PIN_ClaimToolRegister();

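TRACE_AddInstrumentFunction(TraceAnalysisCalls, 0);
PIN_AddThreadStartFunction(ThreadStart, 0); // ThreadStart sets up the per-thread buffer and registers
PIN_AddThreadFiniFunction(ThreadFini, 0);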

PIN_StartProgram();
}

98
memtrace_simple
KNOB<UINT32> KnobNumBytesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_bytes_in_buffer",
"0x100000", "number of bytes in buffer");

APP_THREAD_REPRESENTITVE::APP_THREAD_REPRESENTITVE(THREADID myTid) {
_buffer = new char[KnobNumBytesInBuffer.Value()]; // Allocate the buffer
_numBuffersFilled = 0;
_numElementsProcessed = 0;
_myTid = myTid; }

char * APP_THREAD_REPRESENTITVE::Begin() { return _buffer; }

char * APP_THREAD_REPRESENTITVE:: End() { return _buffer + KnobNumBytesInBuffer.Value(); }

VOID ThreadStart(THREADID tid,
CONTEXT *ctxt,
INT32 flags, VOID *v) // Pin callback on thread creation
{
// There is a new APP_THREAD_REPRESENTITVE object for every thread
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= new APP_THREAD_REPRESENTITVE(tid);

// A thread will need to look up its APP_THREAD_REPRESENTITVE, so save pointer in Pin TLS
PIN_SetThreadData(appThreadRepresentitiveKey, appThreadRepresentitive, tid);

// Initialize endOfTraceInBufferReg to point at beginning of buffer
PIN_SetContextReg(ctxt, endOfTraceInBufferReg,
reinterpret_cast<ADDRINT>(appThreadRepresentitive->Begin()));

// Initialize endOfBufferReg to point at end of buffer
PIN_SetContextReg(ctxt, endOfBufferReg,
reinterpret_cast<ADDRINT>(appThreadRepresentitive->End())); }

99
memtrace_simple / Trace Instrumentation
void TraceAnalysisCalls(TRACE trace, void *) /*TRACE_AddInstrumentFunction(TraceAnalysisCalls, 0)*/ {
    // Go over all BBLs of the trace and for each BBL determine and record the INSs which need
    // to be instrumented - i.e. the ins requires an analysis call
    TRACE_ANALYSIS_CALLS_NEEDED traceAnalysisCallsNeeded;
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
        DetermineBBLAnalysisCalls(bbl, &traceAnalysisCallsNeeded);

    // If No memory accesses in this trace
    if (traceAnalysisCallsNeeded.NumAnalysisCallsNeeded() == 0) return;

    // APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer will determine if there are NOT enough
    // available bytes in the buffer. If there are NOT then it returns TRUE and the BufferFull function is called
    TRACE_InsertIfCall(trace, IPOINT_BEFORE,
        AFUNPTR(APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer),
        IARG_FAST_ANALYSIS_CALL,
        IARG_REG_VALUE, endOfTraceInBufferReg, // previous trace
        IARG_REG_VALUE, endOfBufferReg,
        IARG_UINT32, traceAnalysisCallsNeeded.TotalSizeOccupiedByTraceInBuffer(),
        IARG_END);

    TRACE_InsertThenCall(trace, IPOINT_BEFORE, AFUNPTR(APP_THREAD_REPRESENTITVE::BufferFull),
        IARG_FAST_ANALYSIS_CALL, // BufferFull is declared PIN_FAST_ANALYSIS_CALL
        IARG_REG_VALUE, endOfTraceInBufferReg,
        IARG_THREAD_ID,
        IARG_RETURN_REGS, endOfTraceInBufferReg, IARG_END);

    // The arguments below follow the signature of AllocateSpaceForTraceInBuffer
    TRACE_InsertCall(trace, IPOINT_BEFORE,
        AFUNPTR(APP_THREAD_REPRESENTITVE::AllocateSpaceForTraceInBuffer),
        IARG_FAST_ANALYSIS_CALL,
        IARG_REG_VALUE, endOfTraceInBufferReg,
        IARG_UINT32, traceAnalysisCallsNeeded.TotalSizeOccupiedByTraceInBuffer(),
        IARG_RETURN_REGS, endOfTraceInBufferReg,
        IARG_END);

    // Insert Analysis Calls for each INS on the trace that was recorded as needing one
    traceAnalysisCallsNeeded.InsertAnalysisCalls();
}

Trace level instrumentation
Record all INSs in the trace that access memory
At the start of the trace:
Insert an If call to CheckIfNoSpaceForTraceInBuffer to check if there is NOT enough room in the buffer to insert the {appIP, memAddr} pair for each of the recorded INSs
CheckIfNoSpaceForTraceInBuffer will return 1 if there is NOT enough room in the buffer
In this case the BufferFull function will be called to process all pairs in the buffer and set endOfTraceInBufferReg to point to the beginning of the buffer
Now insert a call to the function AllocateSpaceForTraceInBuffer, which allocates space in the buffer for the instrumentation of each of the memory accessing INSs in the trace to write their {appIP, memAddr} pairs. This is done by adding the size of the space needed to endOfTraceInBufferReg
Finally, iterate over all of the memory accessing INSs in the trace and insert a call to the analysis routine that records the {appIP, memAddr} pair into its allocated space in the buffer

100
memtrace_simple
class ANALYSIS_CALL_INFO {
public:
ANALYSIS_CALL_INFO(INS ins, UINT32 offsetFromTraceStartInBuffer, UINT32 memop) :
_ins(ins),
_offsetFromTraceStartInBuffer(offsetFromTraceStartInBuffer), _memop (memop) {}

void InsertAnalysisCall(INT32 sizeofTraceInBuffer);
private:
INS _ins; INT32 _offsetFromTraceStartInBuffer; UINT32 _memop; };

class TRACE_ANALYSIS_CALLS_NEEDED {
public:
TRACE_ANALYSIS_CALLS_NEEDED() : _numAnalysisCallsNeeded(0), _currentOffsetFromTraceStartInBuffer(0) {}

UINT32 NumAnalysisCallsNeeded() const { return _numAnalysisCallsNeeded; }

UINT32 TotalSizeOccupiedByTraceInBuffer() const { return _currentOffsetFromTraceStartInBuffer; }

void RecordAnalysisCallNeeded(INS ins, UINT32 memop) {
_analysisCalls.push_back(ANALYSIS_CALL_INFO(ins, _currentOffsetFromTraceStartInBuffer,
memop));
_currentOffsetFromTraceStartInBuffer += sizeof(MEMREF);
_numAnalysisCallsNeeded++; }

void InsertAnalysisCalls();
private:
INT32 _currentOffsetFromTraceStartInBuffer;
INT32 _numAnalysisCallsNeeded;
vector<ANALYSIS_CALL_INFO> _analysisCalls; };

void DetermineBBLAnalysisCalls (BBL bbl,
TRACE_ANALYSIS_CALLS_NEEDED * traceAnalysisCallsNeeded) {

for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins)) {
// Iterate over each memory operand of the instruction.
for (UINT32 memOp = 0; memOp < INS_MemoryOperandCount(ins); memOp++)
// Record that an analysis call is needed, along with the info needed to generate the analysis
// call
traceAnalysisCallsNeeded->RecordAnalysisCallNeeded(ins, memOp); } }

ANALYSIS_CALL_INFO: class for recording one memory accessing INS that will be instrumented
TRACE_ANALYSIS_CALLS_NEEDED: class for recording all the memory accessing INSs in the Trace
DetermineBBLAnalysisCalls: called for each BBL in the Trace; records each memory accessing INS in the BBL into the vector of ANALYSIS_CALL_INFO

102
memtrace_simple
static ADDRINT PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::CheckIfNoSpaceForTraceInBuffer ( // Pin will inline this function
char * endOfPreviousTraceInBuffer,
char * bufferEnd,
ADDRINT totalSizeOccupiedByTraceInBuffer)
{
return (endOfPreviousTraceInBuffer + totalSizeOccupiedByTraceInBuffer >= bufferEnd);
}

static char * PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::BufferFull ( // Pin will NOT inline this function
char *endOfTraceInBuffer,
ADDRINT tid)
{
// Get this thread's APP_THREAD_REPRESENTITVE from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>
(PIN_GetThreadData(appThreadRepresentitiveKey, tid));

appThreadRepresentitive->ProcessBuffer(endOfTraceInBuffer);

// After processing the buffer, move the endOfTraceInBuffer back to the beginning of the buffer
endOfTraceInBuffer = appThreadRepresentitive->Begin();
return endOfTraceInBuffer;
}

static char * PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::AllocateSpaceForTraceInBuffer (// Pin will inline this function
char * endOfPreviousTraceInBuffer,
ADDRINT totalSizeOccupiedByTraceInBuffer)
{
return (endOfPreviousTraceInBuffer + totalSizeOccupiedByTraceInBuffer);
}

Analysis functions inserted at the start of each trace:

CheckIfNoSpaceForTraceInBuffer: IF call to determine whether there is NOT enough room in the buffer for the {appIp, memAddr} pairs of all the memory-accessing INSs in the trace.
Inserted and executed at the beginning of each trace.
Inlined by Pin.
Returns 1 if there is NOT enough room, 0 if there is.

BufferFull: THEN call to process the buffer and set endOfTraceInBufferReg to the beginning of the buffer.
Inserted at the beginning of each trace, just AFTER the IF call.
Executed only when the IF function returns 1.

AllocateSpaceForTraceInBuffer: function inserted at the beginning of each trace, just after the THEN function.
Executed each time the trace executes.
Inlined by Pin.
Allocates space in the buffer for the {appIp, memAddr} pairs of all the memory-accessing INSs in the trace.
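
The IF / THEN / allocate sequence can be modelled outside of Pin. The following is a minimal sketch in plain C++, not the memtrace_simple source: `TraceBuffer`, `MemRef` and `ExecuteTrace` are hypothetical names, and "processing" the buffer is reduced to counting records. It only shows the control flow: a cheap check on trace entry, a rare flush, one reservation per trace, and per-access writes addressed relative to the reserved end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the {appIp, memAddr} pair recorded per access.
struct MemRef {
    std::uintptr_t appIp;
    std::uintptr_t memAddr;
};

// Hypothetical model of one thread's buffer; plain C++, no Pin calls.
class TraceBuffer {
public:
    explicit TraceBuffer(std::size_t capacity) : slots_(capacity) {}

    // IF part: true when a trace needing `n` records does NOT fit.
    bool NoSpaceFor(std::size_t n) const { return end_ + n >= slots_.size(); }

    // THEN part: "process" the recorded data, then restart at the beginning.
    void Flush() {
        processed_ += end_;
        ++flushes_;
        end_ = 0;
    }

    // Runs on every trace entry: reserve `n` records in one step.
    void Reserve(std::size_t n) {
        assert(end_ + n <= slots_.size());
        end_ += n;
    }

    // Per-access write; the slot is addressed relative to the reserved end.
    void Record(std::size_t indexInTrace, std::size_t traceSize, MemRef ref) {
        slots_[end_ - traceSize + indexInTrace] = ref;
    }

    std::size_t End() const { return end_; }
    std::size_t Flushes() const { return flushes_; }
    std::size_t Processed() const { return processed_; }

private:
    std::vector<MemRef> slots_;
    std::size_t end_ = 0;
    std::size_t flushes_ = 0;
    std::size_t processed_ = 0;
};

// Models one execution of a trace that contains `n` memory accesses.
inline void ExecuteTrace(TraceBuffer& buf, std::size_t n, std::uintptr_t firstIp) {
    if (buf.NoSpaceFor(n)) buf.Flush();   // IF + THEN
    buf.Reserve(n);                       // allocate space for the whole trace
    for (std::size_t i = 0; i < n; ++i) { // one write per memory access
        MemRef ref;
        ref.appIp = firstIp + i;
        ref.memAddr = 0x1000 + i;
        buf.Record(i, n, ref);
    }
}
```

Because the space check is done once per trace rather than once per access, the per-access code in this model is a single store at a known offset.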

103





104
memtrace_simple
static void PIN_FAST_ANALYSIS_CALL
APP_THREAD_REPRESENTITVE::RecordMEMREFInBuffer ( // Pin will inline this function
char* endOfTraceInBuffer, ADDRINT offsetFromEndOfTrace, ADDRINT appIp, ADDRINT memAddr)
{
*reinterpret_cast<ADDRINT*>(endOfTraceInBuffer+ offsetFromEndOfTrace) = appIp;
*reinterpret_cast<ADDRINT*>(endOfTraceInBuffer+ offsetFromEndOfTrace +sizeof(ADDRINT))
= memAddr;
}

void ANALYSIS_CALL_INFO::InsertAnalysisCall(INT32 sizeofTraceInBuffer)
{
/* The place in the buffer where the {appIp, memAddr} of this _ins should be recorded is
computed by: endOfTraceInBufferReg
- sizeofTraceInBuffer + _offsetFromTraceStartInBuffer (of this _ins) */
INS_InsertCall(_ins, IPOINT_BEFORE,
AFUNPTR(APP_THREAD_REPRESENTITVE::RecordMEMREFInBuffer),
IARG_FAST_ANALYSIS_CALL,
IARG_REG_VALUE, endOfTraceInBufferReg,
IARG_ADDRINT, ADDRINT(_offsetFromTraceStartInBuffer - sizeofTraceInBuffer),
IARG_INST_PTR,
IARG_MEMORYOP_EA, _memop,
IARG_END);
}

void TRACE_ANALYSIS_CALLS_NEEDED::InsertAnalysisCalls()
{ // Iterate over the recorded ANALYSIS_CALL_INFO elements and insert the analysis call for each
for (vector<ANALYSIS_CALL_INFO>::iterator c = _analysisCalls.begin();
c != _analysisCalls.end();
c++)
c->InsertAnalysisCall(TotalSizeOccupiedByTraceInBuffer());
}

105
membuffer_simple
Managing a per-thread buffer is a necessity for a large
class of Pin tools, so Pin provides APIs to make it easier.

The Pin Buffering API abstracts away the need for a Pin tool to
manage per-thread buffers

PIN_DefineTraceBuffer
Define a per-thread buffer that each application trace can write
data to

INS_InsertFillBuffer
Instrumentation code is generated to write the desired data into
the buffer
This code is inlined

Tool-defined BufferFull function: the instrumentation code
causes this function to be called when the buffer becomes full

106
membuffer_simple
The Pin Buffering API actually works somewhat differently
from memtrace_simple
Instrumentation code inserts the data generated by an
INS into the buffer immediately after the data generated
by the previously executed instrumented INS
Better buffer utilization
Requires the instrumentation to update the next buffer location
to write to; this was not required in the memtrace_simple
implementation
All this is invisible to the Pin tool writer

membuffer_simple is a Pin tool that uses the Pin
Buffering API to do the same memory access
recording that memtrace_simple does
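
The packing behaviour described above can be sketched in plain C++. This is only a model under stated assumptions, not Pin's implementation: `PackedBuffer`, `MemRef` and the callback type are hypothetical names. Each record is written right after the previous one, a "next" index is advanced on every write, and a tool-supplied callback runs when the buffer fills.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical record type, mirroring the MEMREF idea from the slides.
struct MemRef {
    std::uintptr_t appIp;
    std::uintptr_t memAddr;
};

// Hypothetical model of a packed per-thread buffer; no Pin calls are used.
class PackedBuffer {
public:
    using FullCallback = std::function<void(const MemRef*, std::size_t)>;

    PackedBuffer(std::size_t capacity, FullCallback onFull)
        : records_(capacity), onFull_(std::move(onFull)) {
        assert(capacity > 0);
    }

    // Write the record immediately after the previous one and advance `next_`.
    // This update of the write position is the extra work that the
    // reserve-per-trace scheme did not need.
    void Append(const MemRef& r) {
        records_[next_++] = r;
        if (next_ == records_.size()) {
            onFull_(records_.data(), next_);  // buffer full: hand it to the tool
            next_ = 0;                        // resume filling from the top
        }
    }

    // Hand over whatever is left, as would happen when a thread exits.
    void Drain() {
        if (next_ != 0) {
            onFull_(records_.data(), next_);
            next_ = 0;
        }
    }

    std::size_t Pending() const { return next_; }

private:
    std::vector<MemRef> records_;
    FullCallback onFull_;
    std::size_t next_ = 0;
};
```

In this model no slot is left unused between flushes, which is the "better buffer utilization" point made on the slide.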

107
membuffer_simple
KNOB<UINT32> KnobNumPagesInBuffer(KNOB_MODE_WRITEONCE, "pintool",
"num_pages_in_buffer", "256", "number of pages in buffer");

// Struct of memory reference written to the buffer
struct MEMREF {
ADDRINT appIP;
ADDRINT memAddr; };

// The buffer ID returned by the one call to PIN_DefineTraceBuffer
BUFFER_ID bufId;

TLS_KEY appThreadRepresentitiveKey;

PIN_Init(argc,argv) ;

// Pin TLS slot for holding the object that represents an application thread

// Define the buffer that will be used; a buffer is allocated to each thread when the thread
// starts running
bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), KnobNumPagesInBuffer,
BufferFull, // This Pin tool function will be called when buffer is full
0);

INS_AddInstrumentFunction(Instruction, 0); // The Instruction function will use the Pin Buffering
// API to insert the instrumentation code that writes
// the MEMREF of a memory accessing INS into the buffer


108
membuffer_simple
/*
* Pin generates code to call this function when a buffer fills up, and executes a callback
to this function when the thread exits
* Pin will NOT inline this function
* @param[in] id buffer handle
* @param[in] tid id of owning thread
* @param[in] ctxt application context
* @param[in] buf actual pointer to buffer
* @param[in] numElements number of records
* @param[in] v callback value

* @return A pointer to the buffer to resume filling.
*/
VOID * BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT *ctxt, VOID *buf,
UINT64 numElements, VOID *v)
{
// retrieve the APP_THREAD_REPRESENTITVE* of this thread from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>( PIN_GetThreadData(
appThreadRepresentitiveKey, tid ) );

appThreadRepresentitive->ProcessBuffer(buf, numElements);

return buf;
}

109
membuffer_simple
VOID Instruction (INS ins, VOID *v)
{
UINT32 numMemOperands = INS_MemoryOperandCount(ins);

for (UINT32 memOp = 0; memOp < numMemOperands ; memOp++)
{ // Add the instrumentation code to write the appIP and memAddr
// of this memory operand into the buffer
// Pin will inline the code that writes to the buffer
INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId,
IARG_INST_PTR, offsetof(struct MEMREF, appIP),
IARG_MEMORYOP_EA, memOp,
offsetof(struct MEMREF, memAddr),
IARG_END);
}
}

110
branchbuffer_simple
Use Pin Buffering API to collect a branch trace:
For each executed branch instruction record:
appIP of the branch instruction
targetAddress of the branch instruction
branchTaken boolean
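
Each value passed to INS_InsertFillBuffer is paired with the byte offset of the field that receives it. The sketch below, in plain C++ with no Pin calls, illustrates what "write this value at this offset of the record" means. `BranchInfo`, `WriteField` and `BuildRecord` are hypothetical names used only for illustration, and standard integer types stand in for ADDRINT and BOOL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the shape of BRANCH_INFO from the slides using standard types.
struct BranchInfo {
    std::uintptr_t appIP;
    std::uintptr_t targetAddress;
    bool branchTaken;
};

// Copy one value into a raw record at the given byte offset.
template <typename T>
void WriteField(unsigned char* record, std::size_t offset, const T& value) {
    std::memcpy(record + offset, &value, sizeof(T));
}

// Fill a raw record field by field, using offsetof to locate each field,
// then view the bytes as a BranchInfo.
inline BranchInfo BuildRecord(std::uintptr_t ip, std::uintptr_t target, bool taken) {
    unsigned char raw[sizeof(BranchInfo)] = {};
    WriteField(raw, offsetof(BranchInfo, appIP), ip);
    WriteField(raw, offsetof(BranchInfo, targetAddress), target);
    WriteField(raw, offsetof(BranchInfo, branchTaken), taken);

    BranchInfo out;
    std::memcpy(&out, raw, sizeof(out));
    return out;
}
```

The record size given when the buffer is defined and the offsets given per field describe the same layout, which is why both are derived from one struct.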

111
branchbuffer_simple
KNOB<UINT32> KnobNumPagesInBuffer(KNOB_MODE_WRITEONCE, "pintool",
"num_pages_in_buffer", "256", "number of pages in buffer");

struct BRANCH_INFO { // This is the structure of the data that will be written into the buffer
ADDRINT appIP;
ADDRINT targetAddress;
BOOL branchTaken;
};

int main(int argc, char *argv[])
{
PIN_Init(argc,argv);

bufId
= PIN_DefineTraceBuffer(sizeof(BRANCH_INFO), KnobNumPagesInBuffer, BufferFull, 0);

// Register function to be called to instrument traces

// Register function to be called when the application exits

// Start the program, never returns
PIN_StartProgram();
}

112
branchbuffer_simple
void Trace(TRACE tr, void* V) // TRACE_AddInstrumentFunction(Trace, 0);

{

for(BBL bbl = TRACE_BblHead(tr); BBL_Valid(bbl); bbl=BBL_Next(bbl))
{
if (INS_IsBranchOrCall(BBL_InsTail(bbl))) // The branch instruction, if it exists, will always
// be the last in the BBL

{
INS_InsertFillBuffer(BBL_InsTail(bbl),
IPOINT_BEFORE,
bufId,
IARG_INST_PTR, offsetof(BRANCH_INFO, appIP),
IARG_BRANCH_TARGET_ADDR, offsetof(BRANCH_INFO, targetAddress),
IARG_BRANCH_TAKEN, offsetof(BRANCH_INFO, branchTaken),
IARG_END);

}
}
}

113
Symbols
PIN_InitSymbols()
Pin will use whatever symbol information is available
Debug info in the app
Pdb files
Export Tables
On Windows uses dbghelp
See PIN_InitSymbolsAlt() for more control over which
symbols will be used

Use symbols to instrument/wrap/replace specific
functions
wrap/replace: see malloc replacement examples in intro

Access application debug information from a Pin
tool

114
Symbols: Instrument malloc and free
int main(int argc, char *argv[])
{
// Initialize pin symbol manager
PIN_InitSymbols();
// See also PIN_InitSymbolsAlt() for more control over which symbols are read

PIN_Init(argc, argv);

// Register the function ImageLoad to be called each time an image is loaded in the process
// This includes the process itself and all shared libraries it loads (implicitly or explicitly)
IMG_AddInstrumentFunction(ImageLoad, 0);

// Never returns
PIN_StartProgram();
}

115
Symbols: Instrument malloc and free
VOID ImageLoad(IMG img, VOID *v) // Pin Callback. IMG_AddInstrumentFunction(ImageLoad, 0);
{
// Instrument the malloc() and free() functions. Print the input argument
// of each malloc() or free(), and the return value of malloc().

RTN mallocRtn = RTN_FindByName(img, "_malloc"); // Find the malloc() function.
if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);
// Instrument malloc() to print the input argument value and the return value.
RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

RTN_Close(mallocRtn);
}

RTN freeRtn = RTN_FindByName(img, "_free"); // Find the free() function.
if (RTN_Valid(freeRtn))
{
RTN_Open(freeRtn);
// Instrument free() to print the input argument value.
RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)FreeBefore,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_Close(freeRtn);
}
}

116
Symbols: Instrument malloc
Handling name-mangling and multiple
symbols at same address
VOID Image(IMG img, VOID *v) // IMG_AddInstrumentFunction(Image, 0);
{
// Walk through the symbols in the symbol table.
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

if (undFuncName == "malloc") // Find the malloc function.
{
RTN mallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

if (RTN_Valid(mallocRtn))
{
RTN_Open(mallocRtn);

// Instrument to print the input argument value and the return value.
RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_END);
RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
IARG_FUNCRET_EXITPOINT_VALUE,
IARG_END);

RTN_Close(mallocRtn);
}
}
}
}

117
Symbols: Accessing Application Debug
Info from a Pin Tool:
Catch a Memory Overwrite
VOID Instruction(INS ins, VOID *v) // INS_AddInstrumentFunction(Instruction, 0);
{
UINT32 numMemOperands = INS_MemoryOperandCount(ins);

for (UINT32 memOp = 0; memOp < numMemOperands; memOp++)
{
if (INS_MemoryOperandIsWritten(ins, memOp))
{ // Insert instrumentation code to catch a memory overwrite
INS_InsertIfCall (ins, IPOINT_BEFORE,
AFUNPTR(AnalyzeMemWrite),
IARG_FAST_ANALYSIS_CALL,
IARG_MEMORYOP_EA, memOp,
IARG_MEMORYWRITE_SIZE,
IARG_END);

INS_InsertThenCall (ins, IPOINT_BEFORE,
AFUNPTR(MemoryOverWriteAt),
IARG_FAST_ANALYSIS_CALL,
IARG_INST_PTR,
IARG_MEMORYOP_EA, memOp,
IARG_MEMORYWRITE_SIZE,
IARG_END);

}
}
}

118
Symbols: Accessing Application Debug
Info from a Pin Tool
KNOB<ADDRINT> KnobMemAddrBeingOverwritten(KNOB_MODE_WRITEONCE, "pintool",
"mem_overwrite_addr", "256", "overwritten memaddr");

static ADDRINT PIN_FAST_ANALYSIS_CALL
AnalyzeMemWrite ( // Pin will inline this function, it is the IF part
ADDRINT memWriteAddr, UINT32 numBytesWritten)
{ // return 1 if this memory write overwrites the address specified by
// KnobMemAddrBeingOverwritten
return (memWriteAddr<= KnobMemAddrBeingOverwritten &&
(memWriteAddr + numBytesWritten) > KnobMemAddrBeingOverwritten);
}

static VOID PIN_FAST_ANALYSIS_CALL
MemoryOverWriteAt ( // Pin will NOT inline this function, it is the THEN part
ADDRINT appIP, ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
INT32 column, lineNum;
string fileName;

// Called from an analysis routine, so hold the Pin client lock around the call
PIN_LockClient();
PIN_GetSourceLocation (appIP, &column, &lineNum, &fileName);
PIN_UnlockClient();

printf ("overwrite of %p from instruction at %p originating from file %s line %d col %d\n",
(void *)KnobMemAddrBeingOverwritten.Value(), (void *)appIP, fileName.c_str(), lineNum, column);
printf (" writing %u bytes starting at %p\n", numBytesWritten, (void *)memWriteAddr);
}
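
The IF part is an interval test: does the write [memWriteAddr, memWriteAddr + numBytesWritten) cover the watched address? A standalone sketch of that test follows. `WriteCoversAddress` is a hypothetical name, and the comparison is rearranged (as an assumption about what is desirable, not something the slides do) so that it cannot wrap around when the write sits at the very top of the address space.

```cpp
#include <cassert>
#include <cstdint>

// True when the write [writeAddr, writeAddr + numBytes) covers watchedAddr.
// Written as a subtraction so that writeAddr + numBytes is never computed
// and therefore cannot overflow.
inline bool WriteCoversAddress(std::uintptr_t writeAddr,
                               std::uint32_t numBytes,
                               std::uintptr_t watchedAddr) {
    return writeAddr <= watchedAddr && (watchedAddr - writeAddr) < numBytes;
}
```

Keeping this check to a couple of comparisons matters because it runs before every memory write, while the expensive source-location lookup runs only on a hit.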

119
Probe Mode
JIT Mode
Pin creates a modified copy of the application on-
the-fly
Original code never executes
More flexible, more common approach
Probe Mode
Pin modifies the original application instructions
Inserts jumps to instrumentation code
(trampolines)
Lower overhead (less flexible) approach

120
Pin Probe-Mode
Probe mode is a method of using Pin to instrument at the
function level only. Wrap, Replace, call Analysis function
before/after.

Replacement or Wrapping function can call the replaced
(original) function.

The application and the replacement routine are run natively
(not Jitted).
Faster than Jit-mode
Puts more responsibility on the tool writer.
Probes can only be placed on RTN boundaries
Must be inserted within the Image load callback.
Pin will automatically remove the probes when an image is unloaded.

Many of the PIN APIs that are available in JIT mode are not
available in Probe mode.

121
A Sample Probe
A probe is a jump instruction that overwrites
original instruction(s) in the application
Instrumentation invoked with probes
Pin copies/translates original bytes so probed (replaced)
functions can be called from the replacement function

Original function entry point:
0x400113d4: push %ebp
0x400113d5: mov %esp,%ebp
0x400113d7: push %edi
0x400113d8: push %esi

Entry point overwritten with probe:
0x400113d4: jmp 0x41481064

Tool wrapper function:
0x41481064: push %ebp // tool wrapper func
::::::::::::::::::::
0x414827fe: call 0x50000004 // call original func

Copy of entry point with original bytes:
0x50000004: push %ebp
0x50000005: mov %esp,%ebp
0x50000007: push %edi
0x50000008: push %esi
0x50000009: jmp 0x400113d9

122
PinProbes Instrumentation
Advantages:
Low overhead: a few percent
Less intrusive: executes original code
Leverages Pin:
API
Instrumentation engine
Disadvantages:
More tool writer responsibility
Routine-level granularity (RTN)

123
Using Probes to Replace/Wrap a
Function
RTN_ReplaceSignatureProbed() redirects all calls
to application routine rtn to the specified
replacementFunction
Can add IARG_* types to be passed to the replacement
routine, including pointer to original function and
IARG_CONTEXT.
Replacement function can call original function.

To use:
Must use PIN_StartProgramProbed()
Application prototype is required
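
The calling convention of a replacement routine (original function pointer first, then the application's arguments) can be illustrated without Pin. This is a sketch in plain C++ under stated assumptions: `OriginalAlloc`, `AllocWrapper`, `ProbedAlloc` and `g_bytesRequested` are hypothetical names, and the redirect that Pin installs with a probe is modelled by an ordinary function call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Signature of the routine being wrapped (the "application prototype").
using AllocFn = void* (*)(std::size_t);

// Hypothetical stand-in for the application's original routine.
void* OriginalAlloc(std::size_t size) { return std::malloc(size); }

// Hypothetical bookkeeping done by the wrapper.
std::size_t g_bytesRequested = 0;

// Shaped like a probe-mode replacement: the original function pointer
// arrives as the first argument, followed by the application's arguments.
void* AllocWrapper(AllocFn original, std::size_t size) {
    assert(original != nullptr);
    g_bytesRequested += size;
    return original(size);  // the wrapper may call the replaced function
}

// Models the redirect: callers that used to reach OriginalAlloc land here.
void* ProbedAlloc(std::size_t size) {
    return AllocWrapper(&OriginalAlloc, size);
}
```

The wrapper has to preserve the behaviour the application expects from the original routine, since the application cannot tell that it has been redirected.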

124
Malloc Replacement Probe-Mode
#include "pin.H"
void * MallocWrapper(AFUNPTR pf_malloc, size_t size)
{
void * res;
return (NULL);
res = pf_malloc(size);
return res; }

VOID ImageLoad (IMG img, VOID *v) {
{

if ( RTN_Valid(mallocRtn) && RTN_IsSafeForProbedReplacement(mallocRtn) )
{

RTN_ReplaceSignatureProbed(mallocRtn,
AFUNPTR(MallocWrapper),
IARG_ORIG_FUNCPTR,
IARG_END);
} }}

int main(int argc, char *argv[]) {
PIN_InitSymbols();
PIN_StartProgramProbed(); }

125
Using Probes to Call Analysis
Functions
RTN_InsertCallProbed() invokes the analysis
routine before or after the specified rtn
Use IPOINT_BEFORE or IPOINT_AFTER
Pin may NOT be able to find all AFTER points on
the function when it is running in Probe-Mode
PIN IARG_TYPEs are used for arguments

To use:
Must use PIN_StartProgramProbed()
Application prototype is required for IPOINT_AFTER

126
Symbols: Instrument malloc
Handling name-mangling and multiple
symbols at same address Probe-Mode
VOID Image(IMG img, VOID *v) // IMG_AddInstrumentFunction(Image, 0);
{
// Walk through the symbols in the symbol table.
for (SYM sym = IMG_RegsymHead(img); SYM_Valid(sym); sym = SYM_Next(sym))
{
string undFuncName = PIN_UndecorateSymbolName(SYM_Name(sym), UNDECORATION_NAME_ONLY);

if (undFuncName == "malloc") // Find the malloc function.
{
RTN mallocRtn = RTN_FindByAddress(IMG_LowAddress(img) + SYM_Value(sym));

if (RTN_Valid(mallocRtn)) {


// Instrument to print the input argument value and the return value.
RTN_InsertCallProbed(mallocRtn, IPOINT_BEFORE, (AFUNPTR)MallocBefore,
IARG_END);
RTN_InsertCallProbed(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
IARG_FUNCRET_EXITPOINT_VALUE,
IARG_END);

RTN_Close(mallocRtn); } }
}
}

127
Tool Writer Responsibilities
No control flow into the instruction space where
probe is placed
6 bytes on IA-32, 7 bytes on Intel64, 1 bundle
on IA64
Branch into replaced instructions will fail
Probes at function entry point only
Thread safety for insertion and deletion of probes
During image load callback is safe
Only loading thread has a handle to the image
Replacement function has same behavior as
original

128
Multi-Threading
We have shown a number of examples of Pin tools
supporting multi-threading

Pin fully supports multi-threading
Application threads execute jitted code including
instrumentation code (inlined and not inlined), without any
serialization introduced by Pin
Instrumentation code can use Pin and/or OS synchronization
constructs to introduce serialization if needed.
Will see examples of this in Part3
System calls require serialized entry to the VM before and after
execution BUT actual execution is NOT serialized
Pin does NOT create any threads of its own

Pin callbacks are serialized
Including the BufferFull callback

Jitting is serialized
Only one application thread can be jitting code at any time

129
Multi-Threading
Pin Tools, in Jit-Mode, can:

Track Threads
ThreadStart, ThreadFini callbacks
IARG_THREAD_ID

Use Pin TLS for thread-specific data

Use Pin Locks to synchronize threads

Create threads to do Pin Tool work
Use Pin provided APIs to do this
Otherwise these threads would be Jitted
Details in Part4

130
Multi-Threading, Locking Guidelines
Basic Rules

If the tool acquires any locks in a Pin call-back, it must
release those locks before returning from that call-back.
If the tool acquires any locks in an analysis routine, it must
release those locks before returning from the analysis
routine.
If the tool calls a Pin API from a call-back, it should not
hold any tool locks when calling the API.
If the tool calls a Pin API from an analysis routine, it may
need to acquire the Pin client lock first (see the
documentation for the API). The tool should not hold any
other locks when calling the API.
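
The second basic rule (release every lock before returning from the analysis routine) is easy to satisfy with a scoped guard. The sketch below is plain C++ and uses `std::mutex` purely as a stand-in for a tool lock; a real Pin tool would use the lock type recommended by the Pin documentation. `CountAccess`, `LockIsFree`, `g_countLock` and `g_count` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

std::mutex g_countLock;     // stand-in for a tool lock
std::uint64_t g_count = 0;  // shared data protected by the lock

// Models an analysis routine: the lock is taken on entry and is always
// released when the function returns, including on the early return.
void CountAccess(std::uint64_t n) {
    std::lock_guard<std::mutex> guard(g_countLock);
    if (n == 0) return;  // guard still releases the lock here
    g_count += n;
}

// Helper for checking that no lock is left held after the routine returns.
bool LockIsFree() {
    if (g_countLock.try_lock()) {
        g_countLock.unlock();
        return true;
    }
    return false;
}
```

The routine calls no other API while holding the lock, which also keeps it clear of the last two rules in the list.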

131
Advanced Rules

If the tool acquires any locks in a Pin call-back, it must
release those locks before returning from that call-back.

If the tool calls a Pin API from a call-back, it should not
hold any tool locks when calling the API.

If the tool calls a Pin API from an analysis routine, it may
need to acquire the Pin client lock first (see the
documentation for the API). If the tool holds a tool lock L
while calling the API, that lock L must obey the following
sub-rule:
The tool must not acquire lock L from any call-back. This
avoids a lock order inversion with respect to the Pin internal
locks.

132
Advanced Rules

If the tool acquires any locks in an analysis routine, it must
release those locks before leaving the trace that contains
the analysis routine. Tools must expect that the trace may
exit early if an application instruction raises an
exception. Any lock L, which the tool might hold when the
application raises an exception, must obey the following
sub-rules:
The tool must establish a call-back that executes when the
application raises an exception, and this call-back must release
lock L if it was acquired at the time of the exception. Tools
can use PIN_AddContextChangeFunction() to establish this
call-back.
The tool must not acquire lock L from any call-back. This
avoids a lock order inversion with respect to the Pin internal
locks.

133
Taint Analysis
For each instruction
Identify source and destination operands
Explicit, Implicit
If SRC is tainted, then mark DEST as tainted
If SRC isn't tainted, then mark DEST as untainted

Sounds simple, right?

134
Taint Analysis
Implicit operands
Partial register taint
Math instructions
Logical instructions
Exchange instructions

135
A simple taint analyzer
[Diagram: taint analyzer loop. Define initial taint, then repeat: fetch next instruction; if src is tainted, set dest tainted; if src is untainted, set dest untainted. State kept by the analyzer: the set of tainted memory addresses (e.g. bffff081, bffff082, b64d4002) and the tainted registers (e.g. EAX, EDX, ESI).]
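
The propagation rule in the diagram can be tried out in isolation. The following is a minimal sketch in plain C++, independent of Pin: `TaintState`, the `Reg` enumeration and the method names are hypothetical, and only whole-register copies are modelled (no partial registers, implicit operands or arithmetic).

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical register ids; a real tool would use Pin's REG values.
enum Reg { R_EAX, R_EBX, R_ECX, R_EDX, R_ESI, R_EDI, R_COUNT };

// State of the analyzer: tainted memory addresses and tainted registers.
struct TaintState {
    std::set<std::uintptr_t> taintedAddrs;
    bool taintedRegs[R_COUNT] = {};

    // mem -> reg: the register takes over the taint of the address.
    void MemToReg(std::uintptr_t addr, Reg dst) {
        taintedRegs[dst] = taintedAddrs.count(addr) != 0;
    }

    // reg -> mem: the address becomes tainted or is cleaned.
    void RegToMem(Reg src, std::uintptr_t addr) {
        if (taintedRegs[src]) taintedAddrs.insert(addr);
        else taintedAddrs.erase(addr);
    }

    // reg -> reg: copy the taint bit.
    void RegToReg(Reg src, Reg dst) { taintedRegs[dst] = taintedRegs[src]; }

    // immediate -> reg: a constant is never tainted.
    void ImmToReg(Reg dst) { taintedRegs[dst] = false; }
};
```

Every copy either spreads taint or removes it, so the destination's state depends only on the source.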

136
#include "pin.H"
#include <iostream>
#include <fstream>
#include <set>
#include <string.h>
#include "xed-iclass-enum.h"

set<ADDRINT> TaintedAddrs; // tainted memory addresses
bool TaintedRegs[REG_LAST]; // tainted registers
std::ofstream out; // output file

KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool",
"o", "taint.out", "specify file name for the output file");

/*!
* Print out help message.
*/
INT32 Usage()
{
cerr << "This tool follows the taint defined by the first argument to " << endl <<
"the instrumented program command line and outputs details to a file" << endl ;
cerr << KNOB_BASE::StringKnobSummary() << endl;
return -1;
}

137
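int main(int argc, char *argv[])
{
// Initialize PIN
PIN_InitSymbols();

if( PIN_Init(argc,argv) )
{
return Usage();
}

// Register function to be called to instrument traces
TRACE_AddInstrumentFunction(Trace, 0);

// Register function to be called to instrument routines
RTN_AddInstrumentFunction(Routine, 0);

// Register function to be called when the application exits
PIN_AddFiniFunction(Fini, 0);
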
// init output file
string fileName = KnobOutputFile.Value();
out.open(fileName.c_str());

// Start the program, never returns
PIN_StartProgram();

return 0;
}

138
/*!
* Routine instrumentation, called for every routine loaded.
* This function adds a call to MainAddTaint on the main function.
*/
VOID Routine(RTN rtn, VOID *v)
{
RTN_Open(rtn);

if (RTN_Name(rtn) == "main") //if this is the main function
{
// Pass main's argc and argv to MainAddTaint
RTN_InsertCall(rtn, IPOINT_BEFORE, (AFUNPTR)MainAddTaint,
IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
IARG_FUNCARG_ENTRYPOINT_VALUE, 1,
IARG_END);
}

RTN_Close(rtn);
}

/*!
* Print out the taint analysis results.
* This function is called when the application exits.
*/
VOID Fini(INT32 code, VOID *v)
{
DumpTaint();
out.close();
}

139
VOID DumpTaint() {
out << "======================================" << endl;
out << "Tainted Memory: " << endl;
set<ADDRINT>::iterator it;
for ( it=TaintedAddrs.begin() ; it != TaintedAddrs.end(); it++ )
{
out << " " << *it;
}
out << endl << "***" << endl << "Tainted Regs:" << endl;

for (int i=0; i < REG_LAST; i++) {
if (TaintedRegs[i]) {
out << REG_StringShort((REG)i);
} }
out << "======================================" << endl;
}

// This function marks the contents of argv[1] as tainted
VOID MainAddTaint(unsigned int argc, char *argv[]) {
if (argc != 2) return;

int n = strlen(argv[1]);
ADDRINT taint = (ADDRINT)argv[1];

for (int i = 0; i < n; i++) TaintedAddrs.insert(taint + i);

DumpTaint();
}

140
// This function represents the case of a register copied to memory
void RegTaintMem(ADDRINT reg_r, ADDRINT mem_w) {
out << REG_StringShort((REG)reg_r) << " --> " << mem_w << endl;

if (TaintedRegs[reg_r]) {
TaintedAddrs.insert(mem_w);
}
else //reg not tainted --> mem not tainted
{
if (TaintedAddrs.count(mem_w)) { // if mem is already not tainted nothing to do
TaintedAddrs.erase(TaintedAddrs.find(mem_w));
}
}
}

// this function represents the case of a memory copied to register
void MemTaintReg(ADDRINT mem_r, ADDRINT reg_w, ADDRINT inst_addr) {
out << mem_r << " --> " << REG_StringShort((REG)reg_w) << endl;

if (TaintedAddrs.count(mem_r)) //count is either 0 or 1 for set
{
TaintedRegs[reg_w] = true;
}
else //mem is clean -> reg is cleaned
{
TaintedRegs[reg_w] = false;
}
}

141
// this function represents the case of a reg copied to another reg
void RegTaintReg(ADDRINT reg_r, ADDRINT reg_w)
{
out << REG_StringShort((REG)reg_r) << " --> " <<
REG_StringShort((REG)reg_w) << endl;

TaintedRegs[reg_w] = TaintedRegs[reg_r];
}

// this function represents the case of an immediate copied to a register
void ImmedCleanReg(ADDRINT reg_w)
{
out << "const --> " << REG_StringShort((REG)reg_w) << endl;

TaintedRegs[reg_w] = false;
}

// this function represents the case of an immediate copied to memory
void ImmedCleanMem(ADDRINT mem_w)
{
out << "const --> " << mem_w << endl;

if (TaintedAddrs.count(mem_w)) //if mem is not tainted nothing to do
{
TaintedAddrs.erase(TaintedAddrs.find(mem_w));
}
}

142
Helpers
// True if the instruction has an immediate operand
// meant to be called only from instrumentation routines
bool INS_has_immed(INS ins);

// returns the full name of the first register operand written
REG INS_get_write_reg(INS ins);

// returns the full name of the first register operand read
REG INS_get_read_reg(INS ins);

143
/*!
* This function checks, for each instruction, whether it does a mov that can potentially
* transfer taint and, if so, adds the appropriate analysis routine to check
* and propagate taint at run-time if needed.
* This function is called every time a new trace is encountered.
*/
VOID Trace(TRACE trace, VOID *v) {
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
for (INS ins = BBL_InsHead(bbl); INS_Valid(ins); ins = INS_Next(ins)) {
if ( (INS_Opcode(ins) >= XED_ICLASS_MOV) &&
(INS_Opcode(ins) <= XED_ICLASS_MOVZX) ) {
if (INS_has_immed(ins)) {
if (INS_IsMemoryWrite(ins)) { //immed -> mem
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ImmedCleanMem,
IARG_MEMORYOP_EA, 0,
IARG_END);
}
else //immed -> reg
{
REG insreg = INS_get_write_reg(ins);
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ImmedCleanReg,
IARG_ADDRINT, (ADDRINT)insreg,
IARG_END);
}
} // end of if INS has immed

144
else if (INS_IsMemoryRead(ins)) { //mem -> reg
//in this case we call MemTaintReg to copy the taint if relevant
REG insreg = INS_get_write_reg(ins);
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)MemTaintReg,
IARG_MEMORYREAD_EA,
IARG_ADDRINT, (ADDRINT)insreg, IARG_INST_PTR,
IARG_END);
}
else if (INS_IsMemoryWrite(ins)) { //reg -> mem
//in this case we call RegTaintMem to copy the taint if relevant
REG insreg = INS_get_read_reg(ins);
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RegTaintMem,
IARG_ADDRINT, (ADDRINT)insreg,
IARG_MEMORYWRITE_EA,
IARG_END);
}
else if (INS_RegR(ins, 0) != REG_INVALID()) { //reg -> reg
//in this case we call RegTaintReg
REG Rreg = INS_get_read_reg(ins);
REG Wreg = INS_get_write_reg(ins);
INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RegTaintReg,
IARG_ADDRINT, (ADDRINT)Rreg,
IARG_ADDRINT, (ADDRINT)Wreg,
IARG_END);
}
else { out << "serious error?!\n" << endl; }
} // IF opcode is a MOV
} // For INS
} // For BBL
} // VOID Trace

145
Part3 Summary
Saw Examples of
Allocating Pin Registers for Pin Tool Use
Pin IF-THEN instrumentation
Changing register values in instrumentation code
Changing register values in CONTEXT
Knobs
Pin TLS
Pin Buffering API
Using Symbol and Debug Info
Probe-Mode
Multi-Threading support

146
Part4: Advanced Pin API
To boldly go where few PinHeads have
gone before
Agenda
membuffer_threadpool tool
Using multiple buffers in the Pin Buffering API
Using Pin Tool Threads
Using Pin and OS locks to synchronize threads
System call instrumentation
Instrumenting a process tree
CONTEXT* and IARG_CONST_CONTEXT, IARG_CONTEXT
Managing Exceptions and Signals
Accessing Decode API
Pin Code-Cache API
Transparent debugging, and extending the debugger

147
membuffer_threadpool
Recall membuffer_simple:

Uses Pin Buffering API

One buffer for each thread

Inlined call to INS_InsertFillBuffer writes instrumentation data
into the buffer
Application threads execute jitted application and instrumentation code

When buffer becomes full the Pin Tool defined BufferFull function
is called (by the application thread)
Process the data in the buffer
After the buffer is processed it is set to be re-filled from the top
Application thread continues executing jitted application and instrumentation
code

148
Improvement: process buffers that become full asynchronously, so that
application code can continue executing while buffers are being processed.

Pin Buffering API supports multiple buffers per-thread
Each application thread will allocate a number of buffers.
The buffers allocated by the thread can only be used by the allocating thread so:
Each application thread will have a buffers-free list, holding all buffers that are not currently
full or being filled.

Pin supports creating Pin Tool threads, these are NOT jitted and can be used to do
Pin Tool work asynchronously.
A number of these threads will be created, their job is:
Process buffers that become full. These will be located on a global full-buffers list.
After processing, return them to the buffers-free list of the application thread that filled them

Application threads execute jitted application code and instrumentation code. The
instrumentation code writes data into the buffers and, when it detects that the buffer
is full, calls the BufferFull callback.

The BufferFull callback function will NOT process the buffer
Remember it is executed by an application thread
It places the buffer on the global full-buffers list
It retrieves a free buffer from this application thread's buffers-free list and returns it as the next
buffer to fill.
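
The hand-off between an application thread and a processing thread can be sketched with standard C++ threads. This is an illustrative model only, not the membuffer_threadpool source: `BlockingList`, `Buffer` and `RunPipeline` are hypothetical names, `std::thread` stands in for a Pin Tool thread, and a null pointer is used as an end-of-work marker.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking list, playing the role of a buffer list manager.
template <typename T>
class BlockingList {
public:
    void Put(T item) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }
    T Get() {  // blocks until an item is available
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
};

struct Buffer { std::vector<int> data; };

// One "application thread" (the caller) fills buffers; one processing
// thread drains them. Returns the number of records processed.
inline long RunPipeline(int numBuffers, int fills, int recordsPerFill) {
    BlockingList<Buffer*> freeList, fullList;
    std::vector<Buffer> pool(static_cast<std::size_t>(numBuffers));
    for (Buffer& b : pool) freeList.Put(&b);

    long processed = 0;
    std::thread worker([&] {
        for (;;) {
            Buffer* b = fullList.Get();
            if (b == nullptr) break;  // end-of-work marker
            processed += static_cast<long>(b->data.size());
            b->data.clear();
            freeList.Put(b);          // return buffer to its owner's free list
        }
    });

    for (int i = 0; i < fills; ++i) {
        Buffer* b = freeList.Get();   // blocks if every buffer is in flight
        b->data.assign(static_cast<std::size_t>(recordsPerFill), i);
        fullList.Put(b);              // "BufferFull": hand off, do not process
    }
    fullList.Put(nullptr);
    worker.join();
    return processed;
}
```

With more than one buffer per application thread, filling and processing overlap; with a single buffer the producer simply waits for it to come back.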

149

[Diagram: each application thread has its own buffers-free list and a buffer being filled. When a buffer becomes full, the BufferFull function is executed and the buffer is placed on the global buffers-full list. A Pin Tool processing thread takes the buffer from that list; when buffer processing finishes, the buffer is returned to its owner's buffers-free list.]

150
int main(int argc, char *argv[]) {

// Pin TLS slot for holding the object that represents an application thread

// Define the buffer that will be used
bufId = PIN_DefineTraceBuffer(sizeof(struct MEMREF), KnobNumPagesInBuffer,
BufferFull, // This Pin tool function will be called when buffer is full
0);

TRACE_AddInstrumentFunction(Trace, 0); // add an instrumentation callback function

// add callbacks
PIN_AddFiniUnlockedFunction(FiniUnlocked, 0); // Used for Pin Tool thread termination

/* It is safe to create internal threads in the tool's main procedure and spawn new
* internal threads from existing ones. All other places, like Pin callbacks and
* analysis routines in application threads, are not safe for creating internal threads. */
// NOTE: These threads are NOT jitted, Need to discuss when the threads actually start running
for (int i=0; i<KnobNumProcessingThreads; i++) {

THREADID threadId; PIN_THREAD_UID threadUid;

threadId
= PIN_SpawnInternalThread (BufferProcessingThread, NULL, 0, &threadUid);
RecordToolThreadCreated(threadUid); /* Used for Pin Tool thread termination */ }

PIN_StartProgram(); /* Start the program, never returns */ }

151
static void RecordToolThreadCreated (PIN_THREAD_UID threadUid)
{ // Record the unique ID of the Pin Tool thread
uidSet.insert(threadUid);
}

// The thread function of Pin Tool threads; this code runs natively: NO jitting
static VOID BufferProcessingThread(VOID * arg)
{
processingThreadRunning = TRUE; // Indicate that thread has started running
THREADID myThreadId = PIN_ThreadId();

while (!doExit)
{
VOID *buf;
UINT64 numElements;
APP_THREAD_REPRESENTITVE *appThreadRepresentitive;
// Get full buffer from the full buffer list
fullBuffersListManager.GetBufferFromList(&buf ,&numElements,
&appThreadRepresentitive, myThreadId);
if (buf == NULL) { // this will happen at process termination time when there are
ASSERTX(doExit); // no buffers left to process
break; }
// Process the full buffer
ProcessBuffer(buf, numElements, appThreadRepresentitive);
// Put the processed buffer back on the free buffer list of the application thread that owns it
appThreadRepresentitive->FreeBufferListManager()
->PutBufferOnList(buf, 0, appThreadRepresentitive, myThreadId);
}
}

152
/*!
* Called by instrumentation code when a buffer fills up, and by Pin when the thread exits, so the buffer
can be processed
* Called in the context of the application thread
* @param[in] id buffer handle
* @param[in] tid id of owning thread
* @param[in] ctxt application context
* @param[in] buf actual pointer to buffer
* @param[in] numElements number of records
* @param[in] v callback value
* @return A pointer to the buffer to resume filling.
*/
VOID * BufferFull(BUFFER_ID id, THREADID tid, const CONTEXT *ctxt, VOID *buf,
UINT64 numElements, VOID *v)
{

// get the APP_THREAD_REPRESENTITVE of this app thread from the Pin TLS
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>( PIN_GetThreadData(
appThreadRepresentitiveKey, tid ) );

// Enqueue the full buffer on the full-buffers list, and get the next buffer to fill from this
// thread's free buffer list
VOID *nextBuffToFill
= appThreadRepresentitive->EnqueFullAndGetNextToFill(buf, numElements);

return (nextBuffToFill);
}

153
VOID * APP_THREAD_REPRESENTITVE::EnqueFullAndGetNextToFill(VOID *fullBuf,
UINT64 numElements) {
// cannot wait for Pin Tool threads to start running since this may cause deadlock
// because this app thread may be holding some OS resource that the Pin Tool
// thread needs to obtain in order to start - e.g. the LoaderLock
if ( !processingThreadRunning) { // process buffer in this app thread
ProcessBuffer(fullBuf, numElements, this);
return fullBuf; }

if (!_buffersAllocated) {
// now allocate the rest of the KnobNumBuffersPerAppThread buffers to be used
for (int i=0; i<KnobNumBuffersPerAppThread-1; i++)
_freeBufferListManager->PutBufferOnList(PIN_AllocateBuffer(bufId), 0, this, _myTid);
_buffersAllocated = TRUE; }

// put the fullBuf on the full buffers list; one of the Pin Tool processing
// threads will pick it up from there, process it, and then put it on this app-thread's free buffer list
fullBuffersListManager.PutBufferOnList(fullBuf, numElements, this, _myTid);

// return the next buffer to fill.
// It is always taken from the free buffers list of this app thread. If the list is empty then this app
// thread will be blocked until one is placed there (by one of the Pin Tool buffer processing threads).
VOID *nextBufToFill; UINT64 numElementsDummy;
APP_THREAD_REPRESENTITVE *appThreadRepresentitiveDummy;
_freeBufferListManager->GetBufferFromList(&nextBufToFill,
&numElementsDummy,
&appThreadRepresentitiveDummy,
_myTid);
ASSERTX(appThreadRepresentitiveDummy == this);
return nextBufToFill; }

154
VOID Instruction (INS ins, VOID *v)
{
// Iterate over the memory operands of the instruction
UINT32 numMemOperands = INS_MemoryOperandCount(ins);
for (UINT32 memOp = 0; memOp < numMemOperands; memOp++)
{ // Add the instrumentation code to write the appIP and memAddr
// of this memory operand into the buffer
// Pin will inline the code that writes to the buffer
INS_InsertFillBuffer(ins, IPOINT_BEFORE, bufId,
IARG_INST_PTR, offsetof(struct MEMREF, appIP),
IARG_MEMORYOP_EA, memOp,
offsetof(struct MEMREF, memAddr),
IARG_END);
}
}

155
class BUFFER_LIST_MANAGER {
public:
    BUFFER_LIST_MANAGER();
    VOID PutBufferOnList (VOID *buf, UINT64 numElements,
                          APP_THREAD_REPRESENTITVE *appThreadRepresentitive, THREADID tid) {
        // build the list element
        BUFFER_LIST_ELEMENT bufferListElement;
        bufferListElement.buf = buf;
        bufferListElement.numElements = numElements;
        bufferListElement.appThreadRepresentitive = appThreadRepresentitive;

        GetLock(&_bufferListLock, tid+1);            // lock the list, using a Pin lock
        _bufferList.push_back(bufferListElement);    // insert the element at the end of the list
        ReleaseLock(&_bufferListLock);               // unlock the list
        WIND::ReleaseSemaphore(_bufferSem, 1, NULL); // signal that there is a buffer on the list
    }

    VOID GetBufferFromList (VOID **buf, UINT64 *numElements,
                            APP_THREAD_REPRESENTITVE **appThreadRepresentitive, THREADID tid) {
        WIND::WaitForSingleObject (_bufferSem, INFINITE); // wait until there is a buffer on the list

        GetLock(&_bufferListLock, tid+1);            // lock the list
        if (_bufferList.empty()) {                   // woken by SignalBufferSem with nothing on the list
            *buf = NULL;                             // the caller treats NULL as "process is exiting"
            ReleaseLock(&_bufferListLock);
            return;
        }
        BUFFER_LIST_ELEMENT &bufferListElement = (_bufferList.front()); // retrieve the first element of the list
        *buf = bufferListElement.buf;
        *numElements = bufferListElement.numElements;
        *appThreadRepresentitive = bufferListElement.appThreadRepresentitive;
        _bufferList.pop_front();                     // remove the first element from the list
        ReleaseLock(&_bufferListLock);               // unlock the list
    }
    VOID SignalBufferSem() { WIND::ReleaseSemaphore(_bufferSem, 1, NULL); }
    UINT32 NumBuffersOnList () { return (_bufferList.size()); }
private:
    struct BUFFER_LIST_ELEMENT { // structure of an element of the buffer list
        VOID *buf;
        UINT64 numElements;
        APP_THREAD_REPRESENTITVE *appThreadRepresentitive; // the application thread that owns this buffer
    };

    WIND::HANDLE _bufferSem;  // counting semaphore: value is # of buffers on the list; value==0 => WaitForSingleObject blocks
    PIN_LOCK _bufferListLock; // Pin Lock
    list<BUFFER_LIST_ELEMENT> _bufferList;
};
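The list manager pairs a lock (protecting the list) with a counting semaphore (blocking consumers while the list is empty). The same hand-off can be sketched in portable C++, with a mutex and a condition variable standing in for the Pin lock and the Windows semaphore. This is a minimal sketch under that assumption, not the tool's code: BufferQueue, BufferEntry, Put, Get, Shutdown and Size are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// One entry on the list: the buffer and how many records it holds.
struct BufferEntry {
    void    *buf;
    uint64_t numElements;
};

// Hypothetical portable stand-in for BUFFER_LIST_MANAGER.
class BufferQueue {
public:
    // Producer side: append a buffer and wake one waiting consumer.
    void Put(void *buf, uint64_t numElements) {
        assert(buf != nullptr);
        {
            std::lock_guard<std::mutex> guard(_lock);
            _entries.push_back(BufferEntry{buf, numElements});
        }
        _notEmpty.notify_one();
    }

    // Consumer side: block until a buffer is available or Shutdown() was called.
    // Returns false only when the queue is empty and shutting down.
    bool Get(BufferEntry *out) {
        std::unique_lock<std::mutex> guard(_lock);
        _notEmpty.wait(guard, [this] { return !_entries.empty() || _shutdown; });
        if (_entries.empty()) return false;
        *out = _entries.front();
        _entries.pop_front();
        return true;
    }

    // Wake every waiting consumer so it can observe the shutdown.
    void Shutdown() {
        {
            std::lock_guard<std::mutex> guard(_lock);
            _shutdown = true;
        }
        _notEmpty.notify_all();
    }

    // Number of buffers currently on the list.
    std::size_t Size() {
        std::lock_guard<std::mutex> guard(_lock);
        return _entries.size();
    }

private:
    std::mutex              _lock;
    std::condition_variable _notEmpty;
    std::deque<BufferEntry> _entries;
    bool                    _shutdown = false;
};
```

In this sketch Shutdown() plays the role of signalling the semaphore at process exit: a consumer blocked in Get() wakes up, finds nothing to process, and returns false.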

156
VOID ThreadFini(THREADID tid, const CONTEXT *ctxt, INT32 code, VOID *v)
{
// get the APP_THREAD_REPRESENTITVE of this app thread from the Pin TLS
APP_THREAD_REPRESENTITVE *appThreadRepresentitive
= static_cast<APP_THREAD_REPRESENTITVE*>(PIN_GetThreadData(
appThreadRepresentitiveKey, tid));

// wait for all my buffers to be processed
while(appThreadRepresentitive->_freeBufferListManager->NumBuffersOnList() !=
KnobNumBuffersPerAppThread-1)
PIN_Sleep(1);

delete appThreadRepresentitive;
PIN_SetThreadData(appThreadRepresentitiveKey, 0, tid);
}

static VOID FiniUnlocked(INT32 code, VOID *v) {

BOOL waitStatus;
INT32 threadExitCode;

doExit = TRUE; // indicate that process is exiting

// signal all the Pin Tool threads to wake up and recognize the exit
for (int i=0; i<KnobNumProcessingThreads; i++)
fullBuffersListManager.SignalBufferSem();

// Wait until all Pin Tool threads exit
for (set<PIN_THREAD_UID>::iterator it = uidSet.begin(); it != uidSet.end(); ++it)
waitStatus = PIN_WaitForThreadTermination(*it, PIN_INFINITE_TIMEOUT, &threadExitCode); }

157
System Call Instrumentation
VOID SyscallEntry(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{

ADDRINT appIP = PIN_GetContextReg(ctxt, REG_INST_PTR);

printf ("syscall# %lu at appIP %p param1 %p param2 %p param3 %p param4 %p param5 %p param6 %p\n",
(unsigned long)PIN_GetSyscallNumber(ctxt, std), (void *)appIP,
(void *)PIN_GetSyscallArgument(ctxt, std, 0), (void *)PIN_GetSyscallArgument(ctxt, std, 1),
(void *)PIN_GetSyscallArgument(ctxt, std, 2), (void *)PIN_GetSyscallArgument(ctxt, std, 3),
(void *)PIN_GetSyscallArgument(ctxt, std, 4), (void *)PIN_GetSyscallArgument(ctxt, std, 5));
}

VOID SyscallExit(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, VOID *v)
{
printf(" returns: %p\n", (void *)PIN_GetSyscallReturn(ctxt, std));
}

{

// Instrument system calls via these Pin Callbacks and not via analysis functions
PIN_AddSyscallEntryFunction (SyscallEntry, 0);
PIN_AddSyscallExitFunction (SyscallExit, 0);

}
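System call numbers, arguments and return values are pointer-sized, so whatever prints them has to cope with 64-bit values. A minimal sketch of formatting one record with streams rather than fixed-width printf conversions follows. It does not use the Pin API: FormatSyscall and its parameters are hypothetical, and uintptr_t stands in for ADDRINT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Format one system-call record as text.
// uintptr_t stands in for Pin's ADDRINT (an assumption of this sketch).
std::string FormatSyscall(uintptr_t number, uintptr_t appIP,
                          const uintptr_t *args, std::size_t numArgs) {
    assert(args != nullptr || numArgs == 0);
    std::ostringstream os;
    // The syscall number is printed in decimal, addresses and arguments in hex.
    os << "syscall# " << std::dec << number
       << " at appIP 0x" << std::hex << appIP;
    for (std::size_t i = 0; i < numArgs; i++) {
        os << " param" << std::dec << (i + 1) << " 0x" << std::hex << args[i];
    }
    return os.str();
}
```

A SyscallEntry callback could collect the values it reads from the context into an array and pass them to a helper of this kind.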

158
Instrumenting a Process Tree
Process A creates Process B
Process B creates Process C and D
And so forth

Can use Pin to instrument all or part of the processes of a
process tree
Use the -follow_execv Pin invocation switch to turn this on
Can use different Pin modes (Jit or Probe) on the different
processes in the process tree.
Can use different Pin Tools on the different processes of a
process tree.
Architecture of processes in the process tree may be
intermixed: e.g. Process A is 32bit, Process B is 64 bit,
Process C is 64 bit, Process D is 32 bit

159
Instrumenting a Process Tree
// If this Pin Callback returns FALSE, then the child process will run Natively
BOOL FollowChild(CHILD_PROCESS childProcess, VOID * userData) {
BOOL res; INT appArgc; CHAR const * const * appArgv;

OS_PROCESS_ID pid = CHILD_PROCESS_GetId(childProcess);

// Get the command line that child process will be Pinned with, these are the Pin invocation switches
// that were specified when this (parent) process was Pinned
CHILD_PROCESS_GetCommandLine(childProcess, &appArgc, &appArgv);

// The Pin invocation switches of the child can be made to order
INT pinArgc = 0;
CHAR const * pinArgv[20];
// ... Put values in pinArgv, and set pinArgc to the number of entries in pinArgv that are to be used

CHILD_PROCESS_SetPinCommandLine(childProcess, pinArgc, pinArgv);

return TRUE; /* Specify Child process is to be Pinned */ }

int main(INT32 argc, CHAR **argv) {
cout << " Process is running on Pin in " << (PIN_IsProbeMode() ? " Probe " : " Jit ") << " mode " << endl;

// The FollowChild Pin Callback will be called when the application being Pinned is about to spawn
// a child process
PIN_AddFollowChildProcessFunction (FollowChild, 0);

if ( PIN_IsProbeMode() )
PIN_StartProgramProbed(); // Never returns
else
PIN_StartProgram(); // Never returns
}

160
CONTEXT*, IARG_CONST_CONTEXT, IARG_CONTEXT
CONTEXT* is a handle to the full register context of the
application at a particular point in the execution

CONTEXT* can NOT be dereferenced. It is a handle to be passed
to Pin API functions

CONTEXT* is passed by default to a number of Pin
Callback functions: e.g.
ThreadStart registered by PIN_AddThreadStartFunction
BufferFull registered by PIN_DefineTraceBuffer
OnContextChange registered by PIN_AddContextChangeFunction

161
CONTEXT*

Pin supplies API functions to get and set registers within the
CONTEXT

Pin also supplies API functions to get and set the FP context

Can request that a CONTEXT* be passed to an analysis function by
requesting IARG_CONTEXT or IARG_CONST_CONTEXT

Requesting IARG_CONTEXT
The analysis function will NOT be inlined
The passing of the CONTEXT* is time consuming

Passing IARG_CONST_CONTEXT is ~4X faster than
IARG_CONTEXT
Contents of CONTEXT* passed for IARG_CONST_CONTEXT can NOT be
changed

162
CONTEXT*

Changes made to the contents of a CONTEXT*

IARG_CONTEXT
Changes made will be visible in subsequent PIN API calls made from
within the nesting of the analysis function
Changes made will NOT be visible in the application context after
return from the analysis function

CONTEXT* passed to Pin Callbacks
Changes made will be visible both in subsequent Pin API calls and in the
application context

163
Function Replacement with register change
#include "pin.H"
void *FunctionReplacer (
CONTEXT * ctxt,
AFUNPTR pf_malloc, size_t size)
{
void * res;
CONTEXT writableContext, * context = ctxt;

if (TimeForRegChange()) {
PIN_SaveContext(ctxt, &writableContext); // need to copy the ctxt into a writable context
context = & writableContext;
PIN_SetContextReg(context , REG_GAX, 1); }
PIN_CallApplicationFunction(context , PIN_ThreadId(),
CALLINGSTD_DEFAULT, pf_malloc,
PIN_PARG(void *), &res, PIN_PARG(size_t), size,
PIN_PARG_END());
return res; }

RTN rtn = RTN_FindByName(img, "Function");

PROTO proto = PROTO_Allocate( PIN_PARG(void *), CALLINGSTD_DEFAULT,
"proto", PIN_PARG(size_t), PIN_PARG_END() );

RTN_ReplaceSignature (rtn, AFUNPTR(FunctionReplacer), IARG_PROTOTYPE, proto,
IARG_CONST_CONTEXT,
IARG_ORIG_FUNCPTR, IARG_FUNCARG_ENTRYPOINT_VALUE, 0, IARG_END);
}

PIN_InitSymbols();

164
memtrace_simple
KNOB<UINT32> KnobNumBytesInBuffer(KNOB_MODE_WRITEONCE, "pintool", "num_bytes_in_buffer",
"0x100000", "number of bytes in buffer");

APP_THREAD_REPRESENTITVE::APP_THREAD_REPRESENTITVE(THREADID myTid) {
_buffer = new char[KnobNumBytesInBuffer.Value()]; // Allocate the buffer
_numBuffersFilled = 0;
_numElementsProcessed = 0;
_myTid = myTid; }

char * APP_THREAD_REPRESENTITVE::Begin() { return _buffer; }

char * APP_THREAD_REPRESENTITVE:: End() { return _buffer + KnobNumBytesInBuffer.Value(); }

VOID ThreadStart(THREADID tid,
CONTEXT *ctxt,
INT32 flags, VOID *v) // Pin callback on thread creation
{
// There is a new APP_THREAD_REPRESENTITVE object for every thread
APP_THREAD_REPRESENTITVE * appThreadRepresentitive
= new APP_THREAD_REPRESENTITVE(tid);

// A thread will need to look up its APP_THREAD_REPRESENTITVE, so save pointer in Pin TLS
PIN_SetThreadData(appThreadRepresentitiveKey, appThreadRepresentitive, tid);

// Initialize endOfTraceInBufferReg to point at beginning of buffer
PIN_SetContextReg(ctxt, endOfTraceInBufferReg,
reinterpret_cast<ADDRINT>(appThreadRepresentitive->Begin()));

// Initialize endOfBufferReg to point at end of buffer
PIN_SetContextReg(ctxt, endOfBufferReg,
reinterpret_cast<ADDRINT>(appThreadRepresentitive->End())); }

165

166
Exceptions
Catch Exceptions that occur in Pin Tool code

Global exception handler
PIN_AddInternalExceptionHandler

Guard code section with exception handler
PIN_TryStart
PIN_TryEnd

167
Exceptions
VOID InstrumentDivide(INS ins, VOID* v)
{
if ((INS_Mnemonic(ins) == "DIV") &&
(INS_OperandIsReg(ins, 0)))
{ // Will Emulate div instruction with register operand
INS_InsertCall(ins,
IPOINT_BEFORE,
AFUNPTR(EmulateIntDivide),
IARG_REG_REFERENCE, REG_GDX,
IARG_REG_REFERENCE, REG_GAX,
IARG_REG_VALUE, REG(INS_OperandReg(ins, 0)),
IARG_CONST_CONTEXT,
IARG_THREAD_ID,
IARG_END);
INS_Delete(ins); // Delete the div instruction
}
}

int main(int argc, char * argv[])
{
INS_AddInstrumentFunction (InstrumentDivide, 0);
PIN_AddInternalExceptionHandler (GlobalHandler, NULL); // Registers a Global Exception Handler
return 0;
}

168
Exceptions
EXCEPT_HANDLING_RESULT DivideHandler (THREADID tid, EXCEPTION_INFO * pExceptInfo,
PHYSICAL_CONTEXT * pPhysCtxt, // The context when the exception
// occurred
VOID *appContextArg // The application context when the
// exception occurred
) {
if(PIN_GetExceptionCode(pExceptInfo) == EXCEPTCODE_INT_DIVIDE_BY_ZERO)
{ // Divide by zero occurred in the code emulating the divide, use PIN_RaiseException to raise this exception
// at the appIP for handling by the application
cout << " DivideHandler : Caught divide by zero." << PIN_ExceptionToString(pExceptInfo) << endl;

// Get the application IP where the exception occurred from the application context
CONTEXT * appCtxt = (CONTEXT *)appContextArg;
ADDRINT faultIp = PIN_GetContextReg (appCtxt, REG_INST_PTR);

// raise the exception at the application IP, so the application can handle it as it wants to
PIN_SetExceptionAddress (pExceptInfo, faultIp);
PIN_RaiseException (appCtxt, tid, pExceptInfo); // never returns
}
return EHR_CONTINUE_SEARCH; }

VOID EmulateIntDivide(ADDRINT * pGdx, ADDRINT * pGax, ADDRINT divisor, CONTEXT * ctxt, THREADID tid) {

PIN_TryStart(tid, DivideHandler, ctxt); // Register a Guard Code Section Exception Handler

UINT64 dividend = *pGdx;
dividend <<= 32;
dividend += *pGax;
*pGax = dividend / divisor;
*pGdx = dividend % divisor;

PIN_TryEnd(tid); /* Guarded Code Section ends */ }
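The emulation above lets the hardware fault on a bad divide and catches the exception with a guarded code section. The arithmetic itself can be sketched without Pin by checking the two fault conditions of an unsigned 32-bit DIV explicitly (zero divisor, or a quotient that does not fit in 32 bits). This swaps explicit checks in for the exception handler; EmulateDiv32 and its parameters are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Emulate a 32-bit unsigned DIV: divide the 64-bit value high:low by divisor.
// Returns false where the hardware instruction would raise a divide error
// (divisor is zero, or the quotient does not fit in 32 bits); the outputs
// are left untouched in that case.
bool EmulateDiv32(uint32_t high, uint32_t low, uint32_t divisor,
                  uint32_t *quotient, uint32_t *remainder) {
    assert(quotient != nullptr && remainder != nullptr);
    if (divisor == 0) return false;
    // Build the 64-bit dividend from the two 32-bit halves (edx:eax).
    uint64_t dividend = (static_cast<uint64_t>(high) << 32) | low;
    uint64_t q = dividend / divisor;
    if (q > UINT32_MAX) return false;
    *quotient  = static_cast<uint32_t>(q);
    *remainder = static_cast<uint32_t>(dividend % divisor);
    return true;
}
```

In a Pin tool, the false return is the point at which the exception would be raised at the application IP for the application to handle.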

169
Exceptions
EXCEPT_HANDLING_RESULT GlobalHandler(THREADID threadIndex, EXCEPTION_INFO * pExceptInfo,
PHYSICAL_CONTEXT * pPhysCtxt, VOID *v)
{ // Any Exception occurring in Pin Tool, or Pin that is not in a Guarded Code Section will cause this function to be
// executed
cout << "GlobalHandler: Caught unexpected exception. " << PIN_ExceptionToString(pExceptInfo) << endl;
return EHR_UNHANDLED;
}

170
Exceptions, Monitoring Application Exceptions
PIN_AddContextChangeFunction
Can monitor and change the application state at
application exceptions

int main(int argc, char **argv)
{

PIN_AddContextChangeFunction(OnContextChange, 0);

PIN_StartProgram();
}

171
Exceptions, Monitoring Application Exceptions
static void OnContextChange (THREADID tid,
CONTEXT_CHANGE_REASON reason,
const CONTEXT *ctxtFrom, // Application's register state at exception point
CONTEXT *ctxtTo, // Application's register state delivered to handler
INT32 info,
VOID *v)
{
if (CONTEXT_CHANGE_REASON_SIGRETURN == reason
|| CONTEXT_CHANGE_REASON_APC == reason
|| CONTEXT_CHANGE_REASON_CALLBACK == reason
|| CONTEXT_CHANGE_REASON_FATALSIGNAL == reason
|| ctxtTo == NULL)
{ // don't want to handle these
return;
}

// CONTEXT_CHANGE_REASON_EXCEPTION
// change some register values in the context that the application will see at the handler
FPSTATE fpContextFromPin;
PIN_GetContextFPState (ctxtFrom, &fpContextFromPin);
// change the bottom 4 bytes of xmm0 to 0xdeadbeef (least significant byte first)
fpContextFromPin.fxsave_legacy._xmm[3] = 0xde;
fpContextFromPin.fxsave_legacy._xmm[2] = 0xad;
fpContextFromPin.fxsave_legacy._xmm[1] = 0xbe;
fpContextFromPin.fxsave_legacy._xmm[0] = 0xef;
PIN_SetContextFPState (ctxtTo, &fpContextFromPin);

// change eax (REG_GAX is eax on IA-32 and rax on Intel64)
PIN_SetContextReg(ctxtTo, REG_GAX, 0xbaadf00d);
}
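Changing the bottom 4 bytes of xmm0 means storing one byte per array element, lowest byte first, since x86 is little-endian. A minimal sketch of that byte placement on a plain byte array follows. It is independent of Pin's FPSTATE layout; SetLow32 and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Store a 32-bit value into the four lowest bytes of a register image,
// least significant byte first (the x86 byte order). Other bytes are untouched.
void SetLow32(uint8_t *regBytes, std::size_t regSize, uint32_t value) {
    assert(regBytes != nullptr && regSize >= 4);
    for (std::size_t i = 0; i < 4; i++) {
        regBytes[i] = static_cast<uint8_t>(value >> (8 * i));
    }
}
```

Called with a 16-byte array and 0xdeadbeef, this leaves 0xef in element 0 and 0xde in element 3.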

172
Signals
Establish an interceptor function for signals
delivered to the application

Tools should never call sigaction() directly to handle
signals.

The interceptor function is called whenever the application receives the
requested signal, regardless of whether the application has
a handler for that signal.

The interceptor function can then decide whether the signal should be
forwarded to the application

173
Signals
A tool can take over ownership of a signal in order
to:
use the signal as an asynchronous communication
mechanism to the outside world.
For example, if a tool intercepts SIGUSR1, a user of the tool could
send this signal and tell the tool to do something. In this usage
model, the tool may call PIN_UnblockSignal() so that it will receive the
signal even if the application attempts to block it.

"squash" certain signals that the application generates.
For example, a tool that forces speculative execution in the application may want to
intercept and squash exceptions generated in the speculative code.

A tool can set only one "intercept" handler for a
particular signal, so a new handler overwrites any
previous handler for the same signal. To disable a
handler, pass a NULL function pointer.
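The registration rules above (one intercept handler per signal, a new handler overwrites the previous one, NULL disables interception) can be modelled in a few lines. This is a standalone sketch of those semantics, not Pin's implementation: InterceptTable, InterceptHandler, Squash and Forward are hypothetical names.

```cpp
#include <cassert>
#include <map>

// Hypothetical handler type: returns true to forward the signal to the application.
typedef bool (*InterceptHandler)(int sig);

class InterceptTable {
public:
    // Install a handler for sig; a later call replaces the earlier handler.
    // Passing nullptr disables interception for sig.
    void Intercept(int sig, InterceptHandler handler) {
        assert(sig > 0);
        if (handler == nullptr) _handlers.erase(sig);
        else _handlers[sig] = handler;
    }

    // Deliver sig. Returns true when the application should receive it:
    // either no tool handler is installed, or the handler asked to forward it.
    bool Deliver(int sig) const {
        std::map<int, InterceptHandler>::const_iterator it = _handlers.find(sig);
        if (it == _handlers.end()) return true;
        return it->second(sig);
    }

private:
    std::map<int, InterceptHandler> _handlers;
};

// Two sample handlers: one squashes the signal, one forwards it.
inline bool Squash(int) { return false; }
inline bool Forward(int) { return true; }
```

Keying the table by signal number is what makes a second registration replace the first one instead of chaining handlers.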

174
Signals
BOOL EnableInstrumentation = FALSE;

BOOL SignalHandler(THREADID, INT32, CONTEXT *, BOOL, const EXCEPTION_INFO *, void *) {
// When tool receives the signal, enable instrumentation. Tool calls
// PIN_RemoveInstrumentation() to remove any existing instrumentation from Pin's code cache.
EnableInstrumentation = TRUE;
PIN_RemoveInstrumentation();

return FALSE; /* Tell Pin NOT to pass the signal to the application. */ }

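VOID Trace(TRACE trace, VOID *) {
if (!EnableInstrumentation)
return;

// Instrument every basic block of the trace
for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl))
BBL_InsertCall(bbl, IPOINT_BEFORE, AFUNPTR(AnalysisFunc), IARG_INST_PTR, IARG_END);}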


PIN_InterceptSignal(SIGUSR1, SignalHandler, 0); // Tool should really determine which signal is NOT in
// use by application
PIN_UnblockSignal(SIGUSR1, TRUE);


175
Accessing the Decode API
The decoder/encoder used is called XED
http://www.pintool.org/docs/24110/Xed/html/

Tool code can use the XED API
E.g. decode an instruction inside an analysis routine.

176
Accessing the Decode API
extern "C" {
#include "xed-interface.h"
}

static VOID PIN_FAST_ANALYSIS_CALL
MemoryOverWriteAt ( // Pin will NOT inline this function, it is the THEN part
ADDRINT appIP, ADDRINT memWriteAddr, UINT32 numBytesWritten)
{
INT32 column, lineNum;
string fileName;

PIN_GetSourceLocation (appIP, &column, &lineNum, &fileName);

static const xed_state_t dstate = { XED_MACHINE_MODE_LEGACY_32, XED_ADDRESS_WIDTH_32b};

xed_decoded_inst_t xedd;
xed_decoded_inst_zero_set_mode (&xedd, &dstate);

xed_error_enum_t xed_code = xed_decode (&xedd, reinterpret_cast<UINT8*>(appIP), 15);
char buf[256];

xed_decoded_inst_dump_intel_format(&xedd, buf, 256, appIP);

printf ("overwrite of %p from instruction at %p %s originating from file %s line %d col %d\n",
(void *)KnobMemAddrBeingOverwritten.Value(), (void *)appIP, buf, fileName.c_str(), lineNum, column);
printf (" writing %u bytes starting at %p\n", numBytesWritten, (void *)memWriteAddr);
}

177
Pin Code-Cache API
The Code-Cache API allows a Pin Tool to:
Inspect Pin's code cache and/or alter the code cache
replacement policy
Assume full control of the code cache
Remove all or selected traces from the code cache
Monitor code cache activity, including start/end of
execution of code in the code cache

178
Pin Code-Cache API
VOID DoSmcCheck(VOID * traceAddr, VOID * traceCopyAddr, USIZE traceSize, CONTEXT * ctxP) {
if (memcmp(traceAddr, traceCopyAddr, traceSize) != 0) /* application code changed */ {
// the jitted trace is no longer valid
free(traceCopyAddr);
CODECACHE_InvalidateTraceAtProgramAddress((ADDRINT)traceAddr);
PIN_ExecuteAt(ctxP); /* Continue jitted execution at this application trace */ } }

VOID InstrumentTrace(TRACE trace, VOID *v) {
VOID * traceAddr; VOID * traceCopyAddr; USIZE traceSize;

traceAddr = (VOID *)TRACE_Address(trace); // The appIP of the start of the trace

traceSize = TRACE_Size(trace); // The size of the original application trace in bytes
traceCopyAddr = malloc(traceSize);

if (traceCopyAddr != 0) {
memcpy(traceCopyAddr, traceAddr, traceSize); // Copy of original application code in trace
// Insert a call to DoSmcCheck before every trace
TRACE_InsertCall(trace, IPOINT_BEFORE, (AFUNPTR)DoSmcCheck,
IARG_PTR, traceAddr,
IARG_PTR, traceCopyAddr,
IARG_UINT32 , traceSize,
IARG_CONTEXT,
IARG_END);
} }

TRACE_AddInstrumentFunction(InstrumentTrace, 0);
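The self-modifying-code check rests on one idea: keep a copy of the original code bytes and compare it with the live bytes before the trace runs. A minimal sketch of that snapshot-and-compare step outside Pin follows. CodeSnapshot, Modified and Refresh are hypothetical names, and a plain byte array stands in for application code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Remembers a copy of a code region so that later modifications can be detected.
class CodeSnapshot {
public:
    CodeSnapshot(const uint8_t *code, std::size_t size)
        : _code(code), _copy(code, code + size) {
        assert(code != nullptr && size > 0);
    }

    // True when the live bytes differ from the bytes captured earlier.
    bool Modified() const {
        return std::memcmp(_code, _copy.data(), _copy.size()) != 0;
    }

    // Re-capture the live bytes, as happens when a trace is jitted again.
    void Refresh() { _copy.assign(_code, _code + _copy.size()); }

private:
    const uint8_t       *_code;  // live code region (not owned)
    std::vector<uint8_t> _copy;  // bytes as they were when captured
};
```

In the tool above, a detected modification leads to invalidating the cached trace and resuming execution so that the changed code is jitted afresh.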

179
Transparent debugging, and extending the debugger
Transparently debug the application while it is
running on Pin + Pin Tool
PinADX: Customizable Debugging with Dynamic
Instrumentation CGO2012

Use Pin Tool to enhance/extend the debugger
capabilities
Watchpoint: is an order of magnitude faster when
implemented using a Pin Tool
See previous Symbols: Accessing Application Debug Info from
a Pin Tool

Which branch is branching to address 0?
Easy to write a Pin Tool that implements this

180
Debug Application while Running on Pin
Useful for Pin-based emulators
User can debug application while emulating

Provide advanced debugging features with Pin:
Stack monitoring features
Buffer overrun detection
Reverse debugging
Write your own debugger extension via Pin

181
Naive Solution Won't Work
Why can't we just debug normally?
Debugger sees Pin state, not application state
Pin recompiles application code
Instructions wrong, registers wrong, PC wrong, ...
[Figure labels: GDB attached to the Pinned process (Pin, Tool, Application), marked with "?"]

182
Pin Debugger Interface
GDB debugs application (not Pin itself)
Leverage GDB remote protocol ABI
[Figure labels: Application, Tool, GDB, Debug Agent, Pin, GDB remote protocol (tcp), Pin process, (unmodified)]

183
Debug the Application with Pin
1. Run Pin with -appdebug

$ pin -appdebug -t tool.so -- ./application
Application stopped until continued from debugger.
Start GDB, then issue this command at the (gdb) prompt:
target remote :1234

2. Start GDB, enter the target remote command

$ gdb ./application
(gdb) target remote :1234

3. Set breakpoints, etc. Continue with cont

(gdb) break main
(gdb) cont

184
Extending the Debugger

Normal debugging with Pin useful but limited

Extending the debugger:
Add GDB commands via a Pin tool
Stop at semantic breakpoint via
instrumentation

185
Pintool 4: Stack Debugger
$ pin -appdebug -t stack-debugger.so -- ./app

$ gdb ./app
(gdb) target remote :1234
(gdb) monitor stackbreak 4000
Break when thread uses 4000 stack bytes.
(gdb) cont
Thread uses 4004 stack bytes.
[...]
(gdb) monitor stats
Maximum stack usage: 8560 bytes.

The monitor commands (stackbreak, stats) are implemented in the Pintool

186
Stack-Debugger Instrumentation
Thread Start:
StackBase = %esp;
MaxStack = 0;

After each stack-changing instruction:
size = StackBase - %esp;
if (size > MaxStack) MaxStack = size;
if (size > StackLimit) TriggerBreakpoint();

Application code:
[...]
sub $0x60, %esp

cmp %esi, %edx
jle <L1>

187
ManualExamples/stack-debugger.cpp
// Instrumentation routine
VOID Instruction(INS ins, VOID *)
{
    // Insert only after instructions that write to %esp
    if (INS_RegWContain(ins, REG_STACK_PTR))
    {
        IPOINT where = (INS_HasFallThrough(ins)) ?
            IPOINT_AFTER : IPOINT_TAKEN_BRANCH;
        INS_InsertCall(ins, where, (AFUNPTR)OnStackChange,
            IARG_REG_VALUE, REG_STACK_PTR,
            IARG_THREAD_ID, IARG_CONST_CONTEXT, IARG_END);
    }
}

// Analysis routine
VOID OnStackChange(ADDRINT sp, THREADID tid, const CONTEXT *ctxt)
{
    size_t size = StackBase - sp;
    if (size > StackMax) StackMax = size;
    if (size > StackLimit) {
        ostringstream os;
        os << "Thread uses " << size << " stack bytes.";
        PIN_ApplicationBreakpoint(ctxt, tid, FALSE, os.str()); // Triggers debugger breakpoint
    }
}
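The bookkeeping done by the analysis routine can be exercised without Pin. The sketch below keeps the same three quantities (base, maximum, limit) in a small struct and returns whether a breakpoint should fire. StackTracker and its members are hypothetical names, and the guard for a stack pointer above the recorded base is an addition of this sketch that the slide code does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Tracks stack usage of one thread; the stack grows toward lower addresses.
struct StackTracker {
    uintptr_t   base;   // stack pointer recorded at thread start
    std::size_t max;    // largest usage seen so far, in bytes
    std::size_t limit;  // usage above this triggers a break (0 = disabled)

    explicit StackTracker(uintptr_t startSp) : base(startSp), max(0), limit(0) {
        assert(startSp != 0);
    }

    // Called with the stack pointer after each stack-changing instruction.
    // Returns true when the debugger breakpoint should be triggered.
    bool OnStackChange(uintptr_t sp) {
        if (sp > base) return false;  // above the recorded base: avoid unsigned underflow
        std::size_t size = static_cast<std::size_t>(base - sp);
        if (size > max) max = size;
        return limit != 0 && size > limit;
    }
};
```

A tool would keep one such tracker per thread and call PIN_ApplicationBreakpoint when OnStackChange returns true.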

188
ManualExamples/stack-debugger.cpp
int main() {
[...]
// Hooks the GDB monitor command. E.g.:
//   (gdb) monitor stats
//   (gdb) monitor stackbreak 4000
PIN_AddDebugInterpreter(HandleDebugCommand, 0);
}
// cmd receives the text after "monitor"; *result is the string written to the GDB session
BOOL HandleDebugCommand(const string &cmd, string *result) {
if (cmd == "stats")
{
ostringstream os;
os << "Maximum stack usage: " << StackMax << " bytes.\n";
*result = os.str();
return TRUE;
}
else if (cmd.find("stackbreak ") == 0) {
StackLimit = /* parse limit */;
ostringstream os;
os << "Break when thread uses " << StackLimit << " stack bytes.";
*result = os.str();
return TRUE; }
return FALSE; // Unknown command
}
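The command handler leaves the parsing of the limit open. One way to fill that step is sketched below as standalone C++, with the tool's globals gathered into a struct so the function can be tried without Pin. StackDebugState, HandleCommand and the strtoul-based parsing are assumptions of this sketch, not the code of stack-debugger.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// State shared with the (not shown) stack instrumentation.
struct StackDebugState {
    unsigned long stackMax;    // maximum stack usage seen, in bytes
    unsigned long stackLimit;  // break when usage exceeds this
};

// Handle the text that follows "monitor" at the GDB prompt.
// Returns true and fills *result when the command is recognized.
bool HandleCommand(StackDebugState *state, const std::string &cmd, std::string *result) {
    assert(state != nullptr && result != nullptr);
    const std::string prefix = "stackbreak ";
    std::ostringstream os;
    if (cmd == "stats") {
        os << "Maximum stack usage: " << state->stackMax << " bytes.\n";
    } else if (cmd.compare(0, prefix.size(), prefix) == 0) {
        const char *begin = cmd.c_str() + prefix.size();
        if (*begin < '0' || *begin > '9') return false;  // must start with a digit
        char *end = nullptr;
        unsigned long limit = std::strtoul(begin, &end, 10);
        if (*end != '\0') return false;                  // trailing junk after the number
        state->stackLimit = limit;
        os << "Break when thread uses " << limit << " stack bytes.";
    } else {
        return false;  // unknown command
    }
    *result = os.str();
    return true;
}
```

Rejecting malformed input by returning false leaves the previous limit in place, so a mistyped command does not change the state.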

189
Other Debugger Tools

Breakpoint on buffer overrun
Debug from a recorded log file
Reverse debugging from a recording
Design your own custom debugger tool

190
Pin Debugger Internals

191
Pin Debugger Interface
[Figure labels: Application, Tool, GDB, Debug Agent, Pin, GDB remote protocol (tcp), Pin process, (unmodified)]

192
Communication Details

Very low level
Symbol processing in GDB
Expression evaluation in GDB
Reusing GDB's remote debugging interface
Many debuggers have an interface like this
[Figure labels: GDB, Debug Agent, Pin]
Commands
Read / write registers, memory
Set breakpoints
Continue, single-step, stop
Notifications
Breakpoint triggered
Caught signal
Application exited

193
Commands / Notifications Virtualized

Commands
Read register -> read virtualized app register
Set breakpoint -> set virtual breakpoint
Single step -> step VM one app instruction

Notifications
Signal notification -> on virtual signal in VM
Breakpoint notification -> on virtual BP in VM

194
Single Step
[Figure: Original code and Code cache (numbered blocks); messages between GDB and Pin: "do single-step", "step complete" notification]
Execution stops in Pin
Waits for GDB to continue

195
Breakpoint
[Figure: Original code and Code cache (numbered blocks, BP marker); messages between GDB and Pin: "set breakpoint at 4", "continue", "breakpoint" notification]
Execution stops in Pin
Waits for GDB to continue

196
Part4 Summary
Boldly went where no pin head has gone before
Lived to tell the tail

membuffer_threadpool tool
Using multiple buffers in the Pin Buffering API
Using Pin Tool Threads
Using Pin and OS locks to synchronize threads
System call instrumentation
Instrumenting a process tree
CONTEXT* and IARG_CONTEXT
Decode API
Pin Code-Cache API
Transparent debugging, and extending the debugger

197
Part5 Performance #s

198
Pin Performance (Windows)

199
Pin Performance (Windows)

200
Scalability of Workloads on Pin
[Figure: two bar charts, POV-Ray and CINEBENCH. Y axis: Speedup (Relative to Single Thread), 0.0X to 4.0X. X axis: 2 threads, 4 threads. Series: No Instrumentation, BBCount, Native, MemTrace, MemError]
Pin VMM serialization does not impact scalability of the application
Execution in the code cache is not serialized
Scalability may drop due to limited memory bandwidth (MemTrace) or
contention for tool private data (MemError)

201
Kernel Interaction Overhead

Slowdown Relative to Native per Kernel Interaction
System Calls   12X
Exceptions     10.5X
APCs           3X
Callbacks      1.8X

Cost of a trip in Pin VMM for each system call is high
~3000 cycles in VMM vs. ~500 cycles for ring crossing
Future work: a faster path in VMM for system calls

Kernel Interaction Counts
                             Illustrator   Excel     CINEBENCH   POV-Ray
System Calls                 1,659,298     658,683   101,700     75,313
Exceptions                   1             0         0           0
APCs                         6             6         24          24
Callbacks                    73,062        68,767    961         7,682
Overhead vs. Total Runtime   3.3%          2.8%      <1%         <1%

Total overhead for handling kernel interactions is relatively low
Kernel interactions are infrequent for majority of applications

202
Overall Summary
Pin is Intel's dynamic binary instrumentation engine

Pin can be used to instrument all user level code
Windows, Linux
IA-32, Intel64, IA64
Product level robustness
Jit-Mode for full instrumentation: Thread, Function, Trace, BBL, Instruction
Probe-Mode for Function Replacement/Wrapping/Instrumentation only.
Pin supports multi-threading, no serialization of jitted application nor of instrumentation code

Pin API makes Pin tools easy to write
Presented many tools, many fit on 1 ppt slide

Pin performance is good
Pin APIs provide for writing efficient Pin tools


Free Download
www.pintool.org
Includes: Detailed user manual, source code for 100s of Pin tools, tutorials

Pin User Group
