
WHAT DO COMPILATION STRATEGIES HAVE TO DO WITH WEB CONTENT DELIVERY?
BY MEHRDAD RESHADI

Instart Logic's unique Software-Defined Application Delivery service can optimize web apps and their content during delivery via our client-cloud architecture. Controlling the end-to-end application delivery path in software gives us the ability to use a wide range of new techniques and algorithms to optimize performance.
In general, the optimization process
involves:
1. Reading and de-serializing content from
file or network into an intermediate
representation (IR) in memory;
2. Applying optimizations and
transformations that change the IR; and
3. Writing and serializing the IR back to
the desired format for disk storage or
network transfer.
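These three steps can be sketched in miniature. The example below uses JSON as a stand-in for the content format (the article's real formats, like JPEG, need external codecs), and the "optimization" of dropping null fields is invented purely for illustration:

```python
import json

def deserialize(raw: bytes) -> dict:
    # Step 1: read and de-serialize content into an in-memory IR.
    return json.loads(raw)

def optimize(ir: dict) -> dict:
    # Step 2: transform the IR (here: drop null fields as a toy optimization).
    return {k: v for k, v in ir.items() if v is not None}

def serialize(ir: dict) -> bytes:
    # Step 3: serialize the IR back to the wire format, minified.
    return json.dumps(ir, separators=(",", ":")).encode()

raw = b'{"title": "home", "debug": null, "body": "hello"}'
out = serialize(optimize(deserialize(raw)))
print(out)  # b'{"title":"home","body":"hello"}'
```

The same read/transform/write shape applies whether the payload is JSON, HTML, or an image; only the IR and the transforms change.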

An example of this process is reading a JPEG image file into an image representation in memory, transcoding and compressing that image, and then writing it back to JPEG or a different file format (e.g. WebP, JPEG XR, and so on). Further examples are the transformations we offer, which we call True Fidelity Image Streaming, Dynamic HTML Streaming, and Flash Application Streaming.
This process of de-serializing content to an IR, transforming the IR, and serializing the IR is identical to what compilers do. You might be more familiar with compilers in the context of programming languages, but many optimization tools follow the same process and can be considered compilers. You can find such compilers at the heart of many applications: a spreadsheet formula engine, a word processor, a database query optimizer, a network router, a web browser, a web server, even the autocorrect in a simple chat application.
In this article I'll talk about the pros and cons of different compilation strategies and where Instart Logic's approach fits into the big picture. Since compiling programming languages is the best-known and most complex type of compilation, I'll explain using that context.

COMPILERS IN PROGRAMMING LANGUAGES


Today, almost all programs are written in some high-level programming language that is then translated to machine code for execution. This is almost always done by a compiler. The main job of a compiler is translating one format to another. More precisely, compilers manipulate graphs, trees, lists, etc. to create new ones. These data structures may have different file representations on disk; serialization/deserialization processes convert the memory representation to/from the file representation.
For example, an Abstract Syntax Tree (AST) can be represented on disk as high-level source code, while a Control Data Flow Graph (CDFG) can be represented as machine assembly or binary code. The compiler converts the program into an AST, transforms the AST into a CDFG, then generates the executable binary.
A typical compiler has three main phases:
1. The Frontend (or Parser), which reads the file representation and converts (de-serializes) it to an in-memory data structure (e.g. C++ source to AST);
2. The Optimizer, which transforms the data structure;
3. The Backend (or Code Generator), which converts (serializes) the data structure to a file or binary representation.
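The three phases can be shown end to end on a toy scale. This sketch (Python 3.9+) borrows Python's own parser as the Frontend, applies constant folding, a classic Optimizer transform, and uses unparsing as the Backend; a real compiler's backend would emit machine code instead of source text:

```python
import ast

# Frontend: de-serialize source text into an AST.
def frontend(src: str) -> ast.AST:
    return ast.parse(src, mode="eval")

# Optimizer: fold constant additions in the AST.
class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, bottom-up
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.op, ast.Add)):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node

# Backend: serialize the transformed AST back out.
def backend(tree: ast.AST) -> str:
    return ast.unparse(tree)

tree = frontend("x + (2 + 3)")
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(backend(tree))  # x + 5
```

The `(2 + 3)` subtree is computed once at compile time, while `x + ...` survives because `x` is unknown until runtime, exactly the kind of decision the Optimizer phase makes.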

In order to work effectively, the Optimizer and the Backend need to run many complex algorithms, and they benefit significantly from knowledge about:
- the way users use the application
- the capabilities of the devices that the program will run on
For example, if the Optimizer knows which code paths are more likely to be executed, it can lay them out better in memory to improve cache performance. And if the Backend knows the type of the target device, it may be able to use specific machine instructions that speed up program execution.
On the other hand, an application itself typically goes through three main phases:
1. Development time, when programmers create, modify, or extend the application
2. Delivery time, when the application is being delivered to users
3. Run time, when the application actually runs on the user's device
Different compilation strategies can be categorized based on which compiler phase runs during which application phase. This in turn determines the pros and cons of each approach.

STATIC COMPILATION
In standard static compilers, all compiler phases happen at development time. The result of the compilation is an executable machine binary that is delivered to and executed on the user's machine. As such:

Pros:
- The Optimizer is not under time constraints and can run very complex algorithms for analyzing and optimizing the code.

Cons:
- The Optimizer does not know the application's runtime behavior and must conservatively compile for all possible cases.
- The Backend does not know the actual target device and must rely on the least-common-denominator set of instructions and characteristics.

Examples of static compilation include the GCC, Visual Studio, and LLVM compilers for languages such as C and C++.

JUST-IN-TIME (JIT) COMPILATION ON VIRTUAL MACHINES (VMS)
Some languages parse the source syntax into a low-level format (usually known as bytecode) and then use a VM at runtime to execute this binary. In such cases, parsing happens at development time, the bytecode is delivered, and then the Optimizer and Backend phases in the VM convert the bytecode into something executable on the user's machine. As such:

Pros:
- The Optimizer can observe the application's behavior under real use and make better optimization decisions.
- The Backend knows the actual target device it is running on and can use all of its special instructions.

Cons:
- The Optimizer runs right before execution and hence is under time pressure to finish quickly. Often only simple and fast optimization algorithms can run, because the runtime of an algorithm must be balanced against the quality of its results.

Examples of JITing and virtual machines include the JVM for Java, Dalvik on Android, and the .NET framework for C# and VB.NET.

INTERPRETATION AND JITING OF SCRIPTING LANGUAGES


While typed languages usually use the VM-plus-bytecode approach, dynamic untyped scripting languages tend to ship and run the source directly on the user's machine. In such cases all phases of the compiler (Frontend, Optimizer, and Backend) run on the target device when the user runs the application. As such:

Pros:
- The Optimizer can observe the application's behavior under real use and make better optimization decisions.
- The Backend knows the actual target device it is running on and can use all of its special instructions.

Cons:
- The Frontend, Optimizer, and Backend run right before execution and hence are under extreme time constraints. In some cases, these runtimes rely on interpretation instead of compilation and suffer a performance loss relative to typed languages, although increasingly many scripting runtimes mix interpretation and JITing to achieve better performance.

Examples of dynamic scripting languages include JavaScript, Python, PHP, and Ruby.
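CPython itself illustrates this mix: when a script runs, its Frontend compiles the source to bytecode on the fly, and the VM then interprets that bytecode. The standard `dis` module lets you inspect what was generated (the exact opcode names vary by CPython version, e.g. `BINARY_ADD` before 3.11 and `BINARY_OP` after):

```python
import dis

def add_one(x):
    return x + 1

# Show the bytecode CPython compiled from the source above.
dis.dis(add_one)

# The addition became a single arithmetic bytecode instruction.
opnames = [i.opname for i in dis.get_instructions(add_one)]
print("BINARY_OP" in opnames or "BINARY_ADD" in opnames)  # True
```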

AHEAD OF TIME (AOT) COMPILATION


Although dynamic compilation approaches can potentially offer better optimizations because of their in-depth knowledge of application and device behavior, in practice they often cannot benefit from this knowledge because they have little time available at runtime to apply the related optimizations.
To address this issue, AOT approaches move optimization and code generation ahead of runtime and apply them at installation time. This approach is usually only applicable to typed languages that deliver fully typed bytecode, rather than source, as a complete package. As such:

Pros:
- The Optimizer and Backend do not run at runtime and can afford to use more complex and time-consuming algorithms.
- The Backend knows about the specific device and can use special features when available.

Cons:
- The Optimizer cannot use the application's runtime behavior to make better optimization decisions.
- Some language features (such as reflection) are usually disabled in AOT mode.

Examples of this approach include NGen for the .NET framework on Windows, ART on Android, and the Xamarin compiler for iOS.

PROFILE-DRIVEN STATIC COMPILATION


Although dynamic approaches can rely on user behavior to better optimize the application, they usually face two problems:
1. There is not enough time to run complex optimization algorithms at runtime. Even in AOT's case, the user's patience is limited at installation time.
2. They have access to only one user's usage of the application, and that knowledge is not always reusable across multiple runs of the same application by the same user.
There are cases where the Optimizer prefers statistically significant insights into the application's usage for better and lower-overhead optimization. For example, to better optimize the two branches of a conditional block, it helps to know the likelihood of each branch being taken. This data, along with knowledge of how the branch predictor works on the target device, can help the Optimizer generate better code. There are also cases where the Optimizer can make speculative decisions but needs to ensure the overall performance of the application is helped.
To address this, profile-driven (a.k.a. profile-guided) static compilation approaches collect a usage profile of the application in different scenarios in order to better optimize it. However:

Pros:
- The Optimizer and Backend can use application behavior from potentially multiple users to optimize the application better.

Cons:
- The Backend still does not know about all possible target devices and cannot use all instructions available on every machine.
- Since the profile-based decisions are hard-coded in the application, if user behavior changes over time, the application cannot adapt to it.
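The profile-guided loop can be sketched in miniature: run the program on representative inputs, count how often each branch is taken, and feed those counts into the next (static) compile to decide branch ordering. The function and case names below are invented for illustration; real toolchains such as GCC's `-fprofile-generate`/`-fprofile-use` automate this collection and feedback:

```python
from collections import Counter

def classify(n, profile=None):
    if n % 100 == 0:       # rare case
        kind = "century"
    else:                  # common case
        kind = "ordinary"
    if profile is not None:
        profile[kind] += 1  # instrumented build records branch outcomes
    return kind

# 1) Profiling run: execute the instrumented build on representative inputs.
profile = Counter()
for n in range(1, 1001):
    classify(n, profile)

# 2) "Recompile": order the cases so the hottest branch is tested first.
hot_first = [kind for kind, _ in profile.most_common()]
print(hot_first)  # ['ordinary', 'century']
```

The hard-coding con above is visible here too: the chosen ordering is frozen into the binary, so if the input mix later shifts toward multiples of 100, the layout is stale until the next profile-and-recompile cycle.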

OPTIMIZATION DURING DELIVERY


In the world of the web and rapidly changing connected applications, the application is constantly being delivered to all users. This opens the door to an interesting new way of running compilers and optimizers: during delivery time.
This is almost like the profile-driven compilation approach, with the difference that the optimizations can be constantly updated based on the set of observed devices and user behaviors. As such:

Pros:
- The Optimizer runs offline on powerful servers (in the cloud), so there is no major time constraint and it can run complex algorithms as it sees fit.
- The Optimizer can collect and deploy many different user behaviors and also quickly adapt to changes it observes in the data.
- The Optimizer and Backend can generate different versions for each device and then deliver the right version to that device.

At Instart Logic, we use this approach to optimize different elements of web apps before delivering the various components of the application (HTML, images, JavaScript, etc.) to the device. In addition to the above benefits, this approach is also the least intrusive for both our customers and their users. Developers at our customers do not need to change their development flow and can work as usual. The end users of these customers also will not notice any change in the way they interact with the web app. Instart Logic's service constantly monitors and optimizes the applications based on user behaviors, browser/device types, etc. Every time the customer updates the application, the optimizations automatically restart and get applied again.
Depending on the type of optimization, the input and output formats may be the same or different. For example, in the case of HTML Streaming, the input is the HTML page that is being served to the user. Our service monitors the contents of the HTML pages and extracts the common parts of each page, which it then caches and sends to users faster. In other words, the service parses the HTML, creates a model of the page, modifies this model, and then generates modified HTML to be served to the user (Frontend + Optimizer + Backend).
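A toy version of this parse-modify-serialize shape over HTML can be written with Python's stdlib parser. The rewrite itself, adding `loading="lazy"` to images, is an invented stand-in for Instart Logic's actual transforms, which this article does not detail:

```python
from html.parser import HTMLParser

# Frontend + Optimizer + Backend over HTML: parse the page, rewrite <img>
# tags during the traversal, and serialize the result back out.
class LazyImageRewriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and "loading" not in dict(attrs):
            attrs = attrs + [("loading", "lazy")]  # the "optimization"
        rendered = "".join(f' {k}="{v}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

page = '<p>Hi</p><img src="a.jpg">'
rw = LazyImageRewriter()
rw.feed(page)
print("".join(rw.out))  # <p>Hi</p><img src="a.jpg" loading="lazy">
```

Because the delivery service sits between origin and browser, a transform like this can run on powerful servers and be updated at any time, without the site's developers changing anything.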
In the case of Image Streaming, the input format might be a bitmap, which the Instart Logic service then converts to progressive JPEG and streams. In this case, input images are decoded, modified in memory, and then encoded into the proper format (again, Frontend + Optimizer + Backend).
Instart Logic's compilation strategy is cloud-based, implicitly crowd-sourced, adaptive based on data analytics and machine learning, and targets web and mobile. When was the last time you worked on a project that touched on so many cool and hot topics?

www.instartlogic.com/blog/
