Abstract
This paper describes why Object-Serialization is not appropriate for providing persistence in Java. With numerous
code examples, Object-Serialization is shown to be easy to work with initially which seduces the developer into
relying on it for persistence within more complex applications. The advanced use of object-serialization requires
significant work from the programmer, something that is not apparent at first. The use of object-serialization together
with static and transient fields and within multi-threaded programs is discussed together with the big inhale
problem: the need to read in the entire object graph before processing over it can commence. The complexity
of using object-serialization within a distributed environment, when evolving classes and when using specialised
classloaders is also discussed. The paper compares the performance of serializing and deserializing a byte array and
binary tree of the same data size to and from an NFS mounted disk and two kinds of local disk. Alternative solutions
to object-persistence in Java are presented at the end of the paper.
1 Introduction
Since 1995, Java has become a very popular programming language, receiving a great deal of attention both in
academia and industry. One package that must be provided by all implementations of Java is the java.io package
[GJS96, pg 113]. When used in a certain way, this package can be used to provide a form of object persistence so
that a specified subset of the state of a program can outlive the lifetime of the process that created it. This is achieved
by serializing an object graph into a file on disk via an in-memory buffer. Serializing a graph of objects is the process
of flattening that graph into a form suitable to be written to disk or passed across a network. The reverse operation,
deserialization, is the act of taking the data from disk or the network and reconstituting a graph of the same shape.
Java's serialization technology can be used for other purposes such as passing object graphs by value in remote
method invocations. This paper only discusses the use of serialization for object persistence.
Serialization and deserialization are provided by the java.io package via the java.io.ObjectOutputStream
and java.io.ObjectInputStream classes. Figure 1 contains the code (see footnote 1) to serialize a single object into a
file.
Line 6 opens a new file object to which the serialized object can be written. Line 7 uses this object when creating
a new ObjectOutputStream. Line 8 initialises the object we wish to serialize, line 9 serializes that object into the file
and line 10 closes the file.
In figure 2 (line 6) we first create an object connected to the underlying file. This is used to create an ObjectInputStream object (line 7) which will perform the deserialization of the objects in the file. Line 8 defines a String
object to hold the result of deserialization. Line 9 calls the readObject method on the ObjectInputStream object which
deserializes the object in the file, creating a copy of it in memory. Line 10 outputs this String to show it has the same
contents as the serialized String. Line 11 closes the file.

[Figure 1: code to serialize a single object into a file. Only the first line of the listing survives:]
    import java.io.*;

[Figure 2: code to deserialize a single object from a file. Surviving lines of the listing; the last four statements are lines 8 to 11:]
    import java.io.*;
    ...
    String s = null;
    s = (String) in.readObject();
    System.out.println(s);
    file.close();
    }
    }

Parts of this technical report have appeared in Java Report for October 2000, www.javareport.com.
1 Exception handling code has been removed from the code fragments in this paper for brevity. It is included only when necessary for the
discussion.
The above code is very concise and easy to write. The programmer only needs three objects to perform object
serialization and deserialization to and from a file. The same code can also be used to make an entire object graph
persistent as the serialization mechanism works by transitive reachability. If the object passed to the writeObject
method contains references to two other objects, for example, the passed object and the two objects reachable from
it will be serialized. If either of these objects contain references themselves, the referenced objects will also be
serialized. This kind of example makes it look very easy to gain simple and cheap (in terms of programmer time)
object persistence.
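The transitive reachability described above can be illustrated with a minimal sketch. It is not one of the paper's figures: the class names ReachabilitySketch and Node are hypothetical, the graph is written to an in-memory byte array rather than a file, and a Java 8 or later compiler is assumed. Only the root is handed to writeObject, yet both referenced nodes are present in the copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch; none of these names come from the paper.
class ReachabilitySketch {

    // A node that may reference two further nodes.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node left;
        Node right;

        Node(String name) {
            this.name = name;
        }
    }

    // Flattens root and everything reachable from it into a byte array.
    static byte[] write(Object root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reconstitutes a graph of the same shape from the byte array.
    static Object read(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the root is passed to writeObject; the two children travel with it.
    static Node roundTrip() {
        Node root = new Node("root");
        root.left = new Node("left");
        root.right = new Node("right");
        return (Node) read(write(root));
    }

    public static void main(String[] args) {
        Node copy = roundTrip();
        System.out.println(copy.name + " -> " + copy.left.name + ", " + copy.right.name);
    }
}
```

The copy is a distinct set of objects with the same shape as the original graph, which is the copying behaviour that later sections examine.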
A programmer may make use of this mechanism as a quick solution to a simple problem that needs object persistence. The problem has been solved quickly and easily; therefore, it is quite likely the code will be kept by the
programmer. However, if this is the case, the programmer has been seduced by the apparent simplicity of the object
serialization system. The implications of relying on object serialization for object persistence are huge and the example above is so simplistic that it hides all the pitfalls that become apparent when using the mechanism on a larger
scale.
The rest of this paper argues that Java's object serialization mechanism should not be used to provide object persistence within production-level code.
1.1 Document Structure
This paper is divided into thirteen main sections. Section 2 discusses in more detail how the serialization and deserialization mechanisms work and how the programmer uses them. Section 3 describes the problem of serializing
and deserializing a file as a single action. Section 4 shows why the copying semantics inherent in the serialization
mechanism can pose problems when using certain data structures, e.g. java.util.Hashtable. Section 5 discusses the
problem of having to reassign static variables. When a graph is sent to disk, a class's static variables are not written
and this state has to be recreated when a deserialization is performed. Section 6 describes why references marked
transient need to be treated in a similar way to statics, as transient variables are also not written by the serialization
mechanism. Section 7 describes the use of serialPersistentFields, an alternative mechanism for specifying which
fields in a class are passed to the serialization mechanism. Section 8 shows that using the serialization mechanism
in a concurrent environment can be problematic. For example, if one thread is used to perform the serialization,
another thread can change the contents of the graph being written, causing inconsistencies at the application-level.
Section 9 discusses the problems that can be raised when using serialization in a distributed environment. Serializing
a graph on the server can cause delay at clients waiting to perform remote method invocations. Section 10 builds
on section 9 to describe the problems that can occur when placing references to remotely invokable objects and the
remotely invokable objects themselves into persistent stores. When a client-side store that contains a reference to
a remotely invokable object is deserialized, the reference to the remote object is automatically re-established. This
will work if the remote object is listening on the port expected, but errors can occur if the remote object is listening
on an alternative port, because, for example, the server crashed and was restarted. Section 11 indicates how using
an application-specific Java classloader can cause problems. Java classes are not serialized to the file, and when the
graph is later deserialized, the required classes need to be accessible by the classloader that was used to load the deserialization code. If the required classes cannot be found, errors will occur. Section 12 describes how class evolution
can be performed and how this impacts instances already on disk when reading them within the context of an evolved,
related class. Section 13 describes some alternative object persistence technologies that do not rely on serialization.
Section 14 concludes this paper.
The code fragments presented in this paper have all been tested with Sun's Java 2 and the discussion of the
serialization system is made with respect to that version of the JDK and language definition. Java Object Serialization
shall be referred to as JOS for the rest of this paper.
2.1 Serialization
Section 2.1 describes how a graph of objects is serialized into a file. Firstly, the model used by the serialization
mechanism is presented, followed by all the classes that are necessary to make use of the mechanism. This section
ends with a discussion of how to perform specialised serialization of a graph.
[Figure 3: object graph with nodes o:O, p:P, q:Q, r:R, s:S, t:T, u:U and v:V; the edges between the nodes were not recovered.]
2.2 Deserialization
This section describes object deserialization: the act of reconstituting a graph of objects from its serialized form. This
section describes the classes necessary to perform deserialization and gives a detailed example of how it is performed
together with a discussion of the run-time requirements and its implications.
 1  ObjectInputStream in = null;
 2  try {
 3      in = new ObjectInputStream(new FileInputStream("./store"));
 4  } catch(java.io.IOException ioe) {
 5      ioe.printStackTrace();
 6      System.exit(1);
 7  }
 8  O o = null;
 9  try {
10      o = (O) in.readObject();
11  } catch(java.io.IOException ioe) {
12      ioe.printStackTrace();
13      System.exit(1);
14  } catch(java.lang.ClassNotFoundException cnfe) {
15      cnfe.printStackTrace();
16      System.exit(2);
17  }
18  System.out.println(o);
Lines 1 to 7 define the ObjectInputStream object that will be used to read the serialized form of the data. The
ObjectInputStream is connected to the underlying file (called ./store) by passing a new FileInputStream to the
constructor of the ObjectInputStream at line 3.
Line 8 defines an object of the correct type to receive the root of the graph. Line 10 reads from the store and
deserializes the graph given on the left-hand side in figure 3, assigning the root to the variable o. Line 18 then prints
out the object to confirm it is the object originally written.
2.3 Comment
Requiring the programmer to specify whether a class implements java.io.Serializable burdens them with the
problem of deciding whether its instances should be capable of being made persistent. Inevitably, the programmer is
going to have to change code when the persistence of previously transient objects is required. This assumes that the
source code or a bytecode rewriting tool is available and that changing either code to add java.io.Serializable
does not change some other aspect of the application. For example, having a class implement
java.io.Serializable allows instances of it to leave the virtual machine that created them and this may not be
realised when introducing Serializable for persistence. This point is further discussed in section 10.2.3 within the
section on combining this form of persistence with distribution.
4 Copying Semantics
The JOS mechanism is based on the copying of objects to and from a stream. In certain circumstances, taking a copy
of an object can break code. This section describes this problem with the aid of the code in figures 6 and 7.
Figure 6 defines two classes, Key and Value. An instance of each will be put into the hash table that is created in
the main method in figure 7.
In main we create a new Hashtable object called table. Single key and value objects, k and v respectively, are
created and v is added to the table, using k as the key. The value stored under k is then retrieved using table.get(k)
and displayed on standard output. As Value overrides the toString method, the string "I am a Value" is displayed.
The table object is then passed to the serialize method, which serializes the object into a file ./test and closes
it. deserialize() opens ./test and reads the root object from the file, passing it back. The code in main casts the
object to a Hashtable, assigning it to new_table. The new_table contains the same information as table, as can be
[Figure 6: definitions of the classes Key and Value in package copying; the listing was not recovered.]
4.1 Comment
This problem has happened because the serialization mechanism copies objects, and the implementation of Hashtable
assumes that keys and values remain in the same memory location. This assumption is false when we consider the
hash table in a persistent environment.
The solution adopted above requires the software engineer to build information into their class that allows instances to be used across multiple program executions. The hashCode and equals methods have to be overridden to
ensure this information is returned so that two objects can be distinguished. This can add considerable complexity to
the code and requires the programmer to consider the identity of certain objects, when the default equals method (as
used by Key in figure 6) should suffice.
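The effect of the copying semantics on a hash table lookup can be reproduced with a short sketch. It is an illustration under stated assumptions, not the paper's listing: IdentityKey, ValueKey and the id field are hypothetical names, the table is copied through an in-memory byte array instead of the file ./test, and Java 8 or later is assumed. IdentityKey relies on the default equals and hashCode, while ValueKey overrides both to compare a value that survives serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Hashtable;

// Hypothetical sketch; the class and field names are not from the paper.
class KeyIdentitySketch {

    // Uses Object's identity-based equals and hashCode.
    static class IdentityKey implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Carries its identity in a field that is written to the stream.
    static class ValueKey implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String id;

        ValueKey(String id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    // Serializes root into memory and immediately deserializes the copy.
    static Object copyOf(Object root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stores a value under key, copies the table, then looks the
    // ORIGINAL key up in the copy.
    static String lookupAfterCopy(Object key) {
        Hashtable<Object, String> table = new Hashtable<>();
        table.put(key, "I am a Value");
        Hashtable<?, ?> copy = (Hashtable<?, ?>) copyOf(table);
        return (String) copy.get(key);
    }

    public static void main(String[] args) {
        System.out.println("Identity key: " + lookupAfterCopy(new IdentityKey()));
        System.out.println("Value key:    " + lookupAfterCopy(new ValueKey("k1")));
    }
}
```

The identity-based key finds nothing in the copied table because the copy holds a different key object, whereas the value-based key still matches. The price is the extra equals and hashCode code discussed above.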
5 Static References
The serialization mechanism does not serialize static or transient fields. If a field is marked as static, the
default behaviour is to write a default value, e.g. 0 for int and null for object references, when the object containing
those fields is written. This is because the static field is associated with the class and not an object of that class. The
serialization mechanism does not write the class to the store and so static fields are not preserved. The meaning given
package copying;

import java.io.*;
import java.util.Hashtable;

public class Test
{
    // Writes the graph rooted at root into the file ./test.
    public static void serialize(Object root)
    {
        ObjectOutputStream out = null;
        FileOutputStream out_file = null;
        try {
            out_file = new FileOutputStream("./test");
            out = new ObjectOutputStream(out_file);
        } catch(java.io.IOException ioe) {
            ioe.printStackTrace();
            System.exit(1);
        }
        try {
            out.writeObject(root);
            out.close();
        } catch(java.io.IOException ioe) {
            ioe.printStackTrace();
            System.exit(2);
        }
    }

    // Reads the root object back from ./test. The method header and the two
    // declarations below were lost in extraction and are reconstructed from
    // the surrounding code and the description in the text.
    public static Object deserialize()
    {
        ObjectInputStream in = null;
        FileInputStream in_file = null;
        try {
            in_file = new FileInputStream("./test");
            in = new ObjectInputStream(in_file);
        } catch(java.io.IOException ioe) {
            ioe.printStackTrace();
            System.exit(1);
        }
        Object o = null;
        try {
            o = in.readObject();
            in.close();
        } catch(java.io.IOException ioe) {
            ioe.printStackTrace();
            System.exit(2);
        } catch(java.lang.ClassNotFoundException cnfe) {
            cnfe.printStackTrace();
            System.exit(3);
        }
        return o;
    }

    public static void main(String argv[])
    {
        Hashtable table = new Hashtable();
        Key k = new Key();
        Value v = new Value();
        table.put(k, v);
        System.out.println("Value: " + table.get(k));
        System.out.println("Writing table: " + table);
        serialize(table);
        Hashtable new_table = (Hashtable) deserialize();
        System.out.println("Reading table: " + new_table);
        System.out.println("Value: " + new_table.get(k));
    }
}
package copying;

public class Key implements java.io.Serializable
{
    int hashcode = 0;

    public Key()
    {
        hashcode = System.identityHashCode(this);
    }

    public String toString()
    {
        return "I am a Key";
    }

    public int hashCode()
    {
        return hashcode;
    }

    public boolean equals(Object o)
    {
        return this.hashcode == ((Key) o).hashcode;
    }
}
Figure 10: Output from Running the Code Fragment in Figure 7 using the Definition of Key from Figure 9
package singleton;

public class Singleton
{
    private static Singleton s = null;

    private Singleton()
    {
    }

    public static void initialise()
    {
        if(s == null)
            s = new Singleton();
    }

    public static Singleton getReference()
    {
        return s;
    }
}
Figure 11: Singleton Class Definition
to transient has been interpreted differently by different groups [NEED]. However, the interpretation used by JOS is
that fields marked in this way are not written to the store.
Consider the following class, Singleton (figure 11), which implements the Singleton pattern [GHJV96] (see footnote 4). A singleton class is one of which only one instance can exist within a virtual machine.
The class contains a private static reference to the single instance, and the constructor is private so that it cannot
be called from code outside this class. It is initialised through the initialise method which tests the instance for
null and only allocates the object if it is. The method getReference returns a copy of the reference to the Singleton
instance.
If a programmer were to make use of the singleton reference in a persistent context, they could do so as depicted
in figure 12.
1  Singleton.initialise();
2  Singleton s_ton = Singleton.getReference();  // line lost in extraction; reconstructed from the description below
3  store.writeObject(s_ton);
Figure 12: Writing the Singleton Reference to a Store
The Singleton is initialised via the static method initialise and a reference to it is passed to the reference s_ton
in line 2. This object is then passed to the serialization mechanism in line 3 and written to the store. If this process is
terminated and re-started, the Singleton object can be read from the store and reassigned to s_ton as in figure 13.
However, because the serialization and deserialization mechanism does not handle static references, it is possible
to call Singleton.initialise again and have two Singleton objects in the same process, breaking the required
semantics. This is possible because when the program deserializes the object from the store the Singleton class is
found and its static references are initialised. This causes the s reference inside the Singleton class to be initialised to
null. A subsequent call to initialise will then reinitialise s with a reference to another object.
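The scenario can be sketched within a single virtual machine. This is a hedged illustration, not the paper's figure 13: the nested Singleton here is declared Serializable so that it can be written at all, forgetInstance is a hypothetical hook that stands in for terminating and restarting the process, the store is an in-memory byte array, and Java 8 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch of the duplicated-singleton problem.
class SingletonSketch {

    static class Singleton implements Serializable {
        private static final long serialVersionUID = 1L;
        private static Singleton s = null;

        private Singleton() {
        }

        static void initialise() {
            if (s == null) {
                s = new Singleton();
            }
        }

        static Singleton getReference() {
            return s;
        }

        // Assumption: stands in for a process restart, after which the
        // static reference starts out as null again.
        static void forgetInstance() {
            s = null;
        }
    }

    static byte[] write(Object root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true when two distinct Singleton objects exist afterwards.
    static boolean restartYieldsSecondInstance() {
        Singleton.initialise();
        byte[] store = write(Singleton.getReference());
        Singleton.forgetInstance();                  // simulated restart
        Singleton s_ton = (Singleton) read(store);   // instance from the store
        Singleton.initialise();                      // allocates another one
        return s_ton != Singleton.getReference();
    }

    public static void main(String[] args) {
        System.out.println("Two instances: " + restartYieldsSecondInstance());
    }
}
```

Deserialization never touches the static field s, so the class has no record of the instance that came back from the store.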
4 The class implements the singleton pattern as described in the May98 edition of JavaWorld
http://www.javaworld.com/javaworld/javatips/jw-javatip52.html.
6 Transient References
If the programmer wants to ensure that a reference is not followed during a serialization then they can mark the
reference in the class as transient. In [GJS96, pg 147] the authors describe that marking a variable as transient
indicates that it will not be part of the persistent state of an object, although the specification does not give details
of the system services that would make an object's state persistent. In JDK 1.2, transient is defined as being not
serializable. If an object reference is marked as transient, it is not written to the stream, so that on deserialization the object
reference is initialised with null. On deserialization, primitive types that are marked transient are initialised to
default values, e.g. integer values are initialised to 0 and single characters to the Unicode value \u0000.
For example, if the programmer marked the variable r in figure 14 as transient and did not provide readObject
or writeObject methods, then none of the objects reachable only through r would be serialized. The same effect can be
achieved by providing readObject and writeObject methods and not passing r to the serialization mechanism
(figure 14).
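import java.io.IOException;

public class P
{
    R r;
    S s;

    // Only s is read back; r is never passed to the serialization mechanism.
    private void readObject(java.io.ObjectInputStream stream)
        throws IOException, ClassNotFoundException
    {
        s = (S) stream.readObject();
    }

    // Only s is written; the sub-graph reachable from r is skipped.
    private void writeObject(java.io.ObjectOutputStream stream)
        throws IOException
    {
        stream.writeObject(s);
    }
}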
In addition, all the objects referred to from r are not serialized. If part of this sub-graph needs serializing,
a different non-transient reference is required to point to the root of the sub-graph. As transient references are
not serialized, any object state that the program relies on needs to be defined when the file is deserialized. As
the programmer has marked a field transient, they have to ensure that any state is reinitialised when the store is
deserialized. Such state can be initialized within the readObject method, for example, to re-open a file or socket
connection.
For all these reasons, use of the transient keyword requires careful thinking.
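Reinitialising transient state inside readObject, as described above, might look like the following sketch. The Counter class and its fields are hypothetical, a StringBuilder stands in for a resource such as a file or socket connection, the store is an in-memory byte array, and Java 8 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch; Counter and its fields are not from the paper.
class TransientSketch {

    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int count;                    // written to the stream
        transient StringBuilder log;  // skipped by the serialization mechanism

        Counter() {
            log = new StringBuilder();
        }

        void increment() {
            count++;
            log.append("inc;");
        }

        // Called by the deserialization mechanism instead of a constructor.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();     // restores count
            log = new StringBuilder();  // re-create the state that was not stored
        }
    }

    // Serializes the counter into memory and returns the deserialized copy.
    static Counter roundTrip(Counter original) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Counter) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        c.increment();
        Counter copy = roundTrip(c);
        System.out.println("count=" + copy.count + ", log length=" + copy.log.length());
    }
}
```

Without the readObject method the log field of the copy would be null, and the first call to increment on the copy would fail with a NullPointerException.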
package fields;

import java.io.ObjectStreamField;

public class A implements java.io.Serializable
{
    private B b;
    private C c;

    private final static
    ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("b", B.class)
    };

    public A()
    {
        b = new B();
        c = new C();
    }
}
ObjectStreamField describing the field to be serialized. The above description says that the field called b of type
B.class should be serialized. As field c is not mentioned, it will not be serialized.
Serializing an instance of A into a file and deserializing from it gives the output in figure 16 when the instance
variables of A are printed.
Before serialization:
I am an A
I am a B
I am a C
After serialization:
I am an A
I am a B
null
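The behaviour shown in figure 16 can be reproduced with a self-contained sketch. It follows the shape of class A above but is an assumption-laden variant: the classes are nested inside a hypothetical PersistentFieldsSketch class, the fields are package-private so that they can be inspected, the store is an in-memory byte array, and Java 8 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch modelled on the paper's class A.
class PersistentFieldsSketch {

    static class B implements Serializable {
        private static final long serialVersionUID = 1L;
        @Override public String toString() { return "I am a B"; }
    }

    static class C implements Serializable {
        private static final long serialVersionUID = 1L;
        @Override public String toString() { return "I am a C"; }
    }

    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        B b = new B();
        C c = new C();

        // Only the fields listed here are passed to the serialization
        // mechanism; c is not mentioned, so it is not written.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("b", B.class)
        };
    }

    // Serializes a fresh A into memory and returns the deserialized copy.
    static A roundTrip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new A());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (A) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        A copy = roundTrip();
        System.out.println(copy.b);
        System.out.println(copy.c);
    }
}
```

The field initialisers of A do not run during deserialization, which is why c is null in the copy rather than a fresh C.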
7.1 Comment
This mechanism is an alternative to using the transient keyword. It provides the same semantics, but the programmer specifies what is to be serialized, whereas transient specifies what is not to be serialized.
serialPersistentFields may also be changed at run-time, whereas transient can only be changed at compile-time. This allows the programmer to defer the decision of transience to run-time, rather than having to make it
statically at compile-time. However, the system has a number of implications.
package fields;
import java.io.ObjectStreamField;
public class A implements java.io.Serializable
{
private B b;
private C c;
private final static
ObjectStreamField[] serialPersistentFields = {
new ObjectStreamField("b", B.class)
};
public A()
{
b = new B_Subtype();
c = new C();
}
}
No null References
If serialPersistentFields contains any null references, the serialization mechanism aborts, throwing a
NullPointerException. In our example where we want to change between only two object references, this is not
a problem. However, if A had more than two fields, being able to leave elements of the array null and have them
skipped over would be very convenient. As this is not possible, the run-time flexibility of this mechanism is severely
limited.
Figure 22: The Result of Changing serialPersistentFields between Serialization and Deserialization
7.2 Summary
8 Concurrency
Consider a program consisting of two threads running at equal priorities: one thread is devoted to periodically serializing the object graph given in figure 27, while the other thread mutates its contents.
Assume each class in the object graph is a subclass of NameAddress where NameAddress defines four strings,
name, addr1, addr2 and postcode as well as an update method that takes four string parameters and assigns them
to the relevant fields (figure 28).
Also assume that each class in figure 27 overrides the update method to call update on its superclass, thus setting
its state, and then calling update on objects it references. For example, type Q is defined in figure 29.
A call to q's update method causes update to be called on t. Every type also contains an overridden toString
method that returns a string representation of the object's fields, together with a string representation of the objects
that are reachable from it. Each string returned starts with the type of the object on which the method has been called.
package fields;
import java.io.ObjectStreamField;
    public A()
    {
        b1 = new B();
        b2 = new B();
        c = new C();
    }
}
kona:huw% od -c test
[Output of od -c on the serialized file. The column layout of the dump was lost in extraction; it begins with the stream header bytes 254 355 \0 005.]
[Figure 27: object graph with nodes o:O, p:P, q:Q, r:R, s:S and t:T; the edges between the nodes were not recovered.]
8.1
In figure 30, the store write thread serializes the graph and writes it to disk.
FileOutputStream   file;
ObjectOutputStream out;
boolean            more = true;

while(more)
{
    try {
        file = new FileOutputStream("./test");
        out = new ObjectOutputStream(file);
        out.writeObject(o);
    } catch(java.io.IOException ioe) {
        ioe.printStackTrace();
        System.exit(2);
    }
    System.out.print(".");
    try {
        Thread.yield();
    } catch(Exception e) {
        e.printStackTrace();
    }
}
8.2
The graph mutation thread changes the four strings in each object by calling update on the root object o, passing in
the same string, generated from a monotonically increasing integer (figure 31).
boolean more = true;
int     i    = 0;
String  i_s  = null;

while(more)
{
    i_s = String.valueOf(++i);
    o.update(i_s, i_s, i_s, i_s);
    System.out.print("|");
}
may be other runnable system threads waiting as well. However, the graph mutation thread will eventually run.
setting its own fields, and then calls update on p and then q. Each object in turn calls update for itself, via its
superclass and then on any reachable objects.
In addition, object q also calls Thread.yield. This is to ensure there is concurrent activity between the mutation
and store write thread to simulate the activity of library code. Consider the call to Thread.yield to be in-between the
call to super.update and t.update so the graph is written to disk while an update has only partly been performed
on it. In code from third parties it would not be possible to know at which point a thread yielded to another, for
example, code may explicitly call Thread.yield or it may implicitly give up control, via a call to Thread.sleep or
by performing I/O activity.
We define a consistent graph to be one that contains the same value in each corresponding field of every object in
the graph. For example, if i_s above contains the string "9", a consistent graph is one in which, for each node o to t,
every field in each object contains the string "9".
8.3
When the two threads are run we see a number of "." and "|" characters written to standard output. This indicates that the two
threads are running, and that they are yielding to each other, ensuring the graph is mutated during a write to disk. The
program is stopped after a while and another program is run over the contents of the store to retrieve the last data
successfully written.
8.4
To deserialize the store we run the code given in figure 32. We create an in stream, define an object o to hold the
result of the deserialization, deserialize the graph with a call to in.readObject, assigning it to o, and then print the
contents of the graph.
ObjectInputStream in   = null;
FileInputStream   file = null;
try {
    file = new FileInputStream("./test");
    in = new ObjectInputStream(file);
} catch(java.io.IOException ioe) {
    ioe.printStackTrace();
    System.exit(1);
}
O o = null;
try {
    o = (O) in.readObject();
} catch(java.io.IOException ioe) {
    ioe.printStackTrace();
    System.exit(2);
} catch(java.lang.ClassNotFoundException cnfe) {
    cnfe.printStackTrace();
    System.exit(3);
}
System.out.println(o);
8.5
In the general case, when integrating multi-threaded code from a number of suppliers, there is no way to tell when
a particular thread will run, relative to the thread that writes the graph of objects to disk. It is possible, as the above
example demonstrates, that the graph may be written while it is being manipulated and in certain circumstances this
can lead to inconsistencies.
In this particular example, as we have control over all the code, we could remove the explicit call to yield as
described above. However, this is not possible in the general case. In order to solve this in a generally applicable way,
we need to lock the graph so that only graph mutation or a write to the store can happen at any one time.
In Java, this can be solved by synchronizing on the root object, o, before calling o.update and
out.writeObject(o) as in figure 34.
// Store Mutation Thread
synchronized(o) {
    o.update(i_s, i_s, i_s, i_s);
}

// Store Write Thread
synchronized(o) {
    out.writeObject(o);
}
Figure 34: Synchronizing on the Root Object to Solve the Inconsistency Problem
The o object above used in the two threads is a reference to the same object, the root of the graph in figure 27.
This code ensures that only the update method or the call to out.writeObject can be called at any one time. If the
update method is being executed, and the store write thread calls synchronized(o), it will block, until the mutation
thread leaves the synchronized block, i.e. the call to o.update completes. This ensures the contents of the serialized
graph are consistent and therefore meets the definition given in section 8.2.
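The locking scheme of figure 34 can be exercised in a self-contained sketch. It is a simplification under stated assumptions: Node is a hypothetical two-field stand-in for the NameAddress hierarchy, the graph has only two nodes, each snapshot is written to an in-memory byte array rather than ./test, and Java 8 or later is assumed. Both threads synchronize on the root, and every snapshot is checked against the definition of a consistent graph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch of locking the whole graph via its root.
class LockedGraphSketch {

    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "0";
        String postcode = "0";
        Node child;

        void update(String value) {
            name = value;
            Thread.yield();  // invite the other thread to run mid-update
            postcode = value;
            if (child != null) {
                child.update(value);
            }
        }

        // True when every field of every node holds the expected value.
        boolean consistentWith(String expected) {
            return name.equals(expected) && postcode.equals(expected)
                && (child == null || child.consistentWith(expected));
        }
    }

    static byte[] write(Object root) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs a mutation thread against a writer and counts bad snapshots.
    static int inconsistentSnapshots(int rounds) {
        final Node root = new Node();
        root.child = new Node();
        Thread mutator = new Thread(() -> {
            for (int i = 1; i <= rounds; i++) {
                synchronized (root) {          // store mutation thread
                    root.update(String.valueOf(i));
                }
            }
        });
        mutator.start();
        int inconsistent = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] snapshot;
            synchronized (root) {              // store write thread
                snapshot = write(root);
            }
            Node copy = (Node) read(snapshot);
            if (!copy.consistentWith(copy.name)) {
                inconsistent++;
            }
        }
        try {
            mutator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return inconsistent;
    }

    public static void main(String[] args) {
        System.out.println("inconsistent snapshots: " + inconsistentSnapshots(200));
    }
}
```

Removing either synchronized block allows a snapshot to be taken between the two assignments in update, which is the inconsistency the example in this section demonstrates.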
8.6 Discussion
There are several problems with the above solution. Firstly, the entire graph must be locked. If the graph is large,
locking a large number of objects may not be acceptable. In addition to locking the entire graph, any code that
changes the state of any objects in the graph must also respect the lock. In our example, it is trivial to ensure that the
graph is locked as mutation only originates from o.update, and so locking o at this level ensures no other changes
will be made to the graph. However, in a large program, changes to the graph will take place at numerous places in
the code, will be performed by different threads and arbitrary objects within the graph will be changed. Ensuring that
the appropriate lock has been acquired would be very difficult and maintaining this code would be time-consuming
and costly.
Locking the entire graph decreases the amount of concurrent processing the application can perform. If the
application has to wait periodically on a lock to ensure the graph is written consistently, the
progress of the application will be significantly reduced as serializing a large graph can be time-consuming (see
Appendix A). In the Java thread model ([GJS96, chptr 17]) it is possible to alter the priority of a thread, favouring the
graph mutation thread, by assigning it a higher priority. However, this increases the overall complexity of the program
and the programmer has to manage multiple threads running at different priorities, which can be complicated in itself.
If the serialization thread is run at a lower priority, serialization will be performed less often. Therefore, the data will
be written to disk less often, thus increasing the likelihood that the state of a computation will be lost in the event of
a failure and would need to be performed again on deserializing the file.
Rather than locking the entire graph, locking on a per object basis may be acceptable in certain circumstances, e.g.
when consistency within a single object is preferred over the consistency of the whole graph. However, this requires
readObject and writeObject methods to be defined on each type that will form part of the graph so
that these methods may acquire a lock on themselves, via this, using the synchronized keyword. This significantly
increases the complexity of the code (deadlocks are also more likely to occur) and requires the programmer to identify
the graph's transitive closure. This solution will only work if the code mutating the graph respects the per-object
locks that have been acquired by readObject and writeObject. The Object Serialization system of Sun's Java 2
does not acquire any locks on application-level Java objects during the serialization and so programmers would have
to write code to do this themselves.
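Per-object locking of this kind might be sketched as follows. The Address class is hypothetical, the snapshots go to an in-memory byte array, and Java 8 or later is assumed. The writeObject method locks this while the object's own fields are written, and update is synchronized on the same object, so each object is internally consistent even though nothing guarantees consistency across several objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch of per-object locking during serialization.
class PerObjectLockSketch {

    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "0";
        String postcode = "0";

        // Mutators must take the same lock that writeObject takes.
        synchronized void update(String value) {
            name = value;
            Thread.yield();
            postcode = value;
        }

        // Invoked by the serialization mechanism for this object only.
        private void writeObject(ObjectOutputStream out) throws IOException {
            synchronized (this) {
                out.defaultWriteObject();
            }
        }
    }

    // Serializes the address into memory and returns the deserialized copy.
    static Address snapshot(Address a) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                 new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Address) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts snapshots in which the two fields of one object disagree.
    static int tornSnapshots(int rounds) {
        final Address shared = new Address();
        Thread mutator = new Thread(() -> {
            for (int i = 1; i <= rounds; i++) {
                shared.update(String.valueOf(i));
            }
        });
        mutator.start();
        int torn = 0;
        for (int i = 0; i < rounds; i++) {
            Address copy = snapshot(shared);
            if (!copy.name.equals(copy.postcode)) {
                torn++;
            }
        }
        try {
            mutator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn snapshots: " + tornSnapshots(200));
    }
}
```

Every class in the graph would need the same treatment, and any mutator that bypasses the lock reintroduces the problem, which is the maintenance burden described above.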
9 Distribution
Distributed programs are partly characterised by their ability to support genuine process concurrency. This is possible
because these programs make use of more than one host and therefore have access to more than one processor. A
collection of distributed objects in Java communicate by sending messages across a network. These messages take a
finite time to propagate from one host to another and the receiving object must be listening and ready to accept them.
In addition, distributed systems are subject to partial failure. The receiving process may have failed and will no longer
be listening, causing an exception to be raised in the client.
9.1
Consider a multi-threaded server process consisting of two threads, running at the same priority: one thread is responsible for serializing a store to disk as in the previous section; and the other thread, the main thread, creates a remotely
invokable object that offers a single method for distributed invocation, called invoke, which outputs a string at the
server. Another process, the RMI registry, which holds named references to remotely invokable objects, is listening on a
well-known port number. The registry is used by the third process, the client, to obtain a copy of a reference to the
remotely invokable object. The client then calls invoke once per second in an infinite loop.
The server is started, the remotely invokable object is created and its reference is placed into the registry so the
client can obtain a copy of it; then the serialization thread is started. The client process is started, it obtains a copy of
the reference to the remote object from the registry, enters its infinite loop and starts to call invoke.
Whenever the server process serializes a graph to a store, the request from the client is blocked. This behaviour
is due to the equal priority of the two threads in the server and the implementation of JOS and the RMI mechanism. The
request will only be serviced when the serialization has completed. If the serialization is sufficiently complex and takes a long
time, the semantics of the client code may be affected. For example, the client may not be able to tolerate the delay
imposed by the server-side serialization. Once again, it is possible to lower the priority of the serialization thread, but
this requires the programmer to use thread priorities. Although the remotely invokable object is associated with its
own thread, this information is hidden from the programmer and, therefore, the priority of the remotely invokable
object's thread cannot be altered.
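Lowering the priority of the serialization thread could be sketched as follows. The class and method names are hypothetical, an in-memory buffer stands in for the store, and thread priorities are only a hint to the scheduler, so this does not guarantee that remote invocations are serviced promptly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;

class BackgroundSerializer {
    // Creates (but does not start) a minimum-priority thread that serializes
    // the given graph into the supplied in-memory buffer.
    static Thread lowPriorityWriter(Serializable graph, ByteArrayOutputStream sink) {
        Thread writer = new Thread(() -> {
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                out.writeObject(graph);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "serializer");
        writer.setPriority(Thread.MIN_PRIORITY);
        return writer;
    }

    public static void main(String[] args) throws InterruptedException {
        ArrayList<String> store = new ArrayList<>();
        store.add("example");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Thread writer = lowPriorityWriter(store, sink);
        writer.start();
        writer.join();
        System.out.println("bytes written: " + sink.size());
    }
}
```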
9.2
It is possible to serialize a remotely invokable object into a store which can provide a remotely invokable object with
some resilience against failures. For example, assume that the server serializes the remotely invokable object to disk
and then crashes. When the server is re-started, it retrieves its last known state from the store and re-registers the
remotely invokable object reference with the registry. When the server process crashes, the client side will receive a
java.rmi.RemoteException when it next uses the remote reference. If, when this happens, the client periodically
attempts to get another copy of the reference to the remote object, it will eventually retrieve a copy of the value last
placed into the registry when the server process restarted. The server has to re-register a new reference because the
server object is likely to be listening on a different port from before, even though the object, at the Java language
level, is logically the same.
When rebooting such a system it is necessary to read the entire store before the remotely invokable object becomes
available. If the store is very complex, it may be quite a while before a client can access the remote object. The
programmer has to be aware that this behaviour is likely and needs to add code to handle it. For example, the client
could attempt to get a copy of the reference from the registry, use it, and, if it fails, pause and try again some time
later. However, the client side is affected by the behaviour of the server. This could have a knock-on effect throughout
the whole system. If a client has to periodically retry a server that is reinitialising itself from a store, the client may
not be able to deal with incoming calls to its own remotely invokable objects. Behaviour such as this increases the
complexity of the code at the client side, whereas it is usually preferable to have the complexity inside the server.
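The pause-and-retry behaviour described above could be sketched as follows. RetryingClient and callWithRetry are hypothetical names; in a real client the supplied action would look up the reference in the registry and call invoke, whereas here a simulated action fails twice before succeeding.

```java
import java.rmi.RemoteException;
import java.util.concurrent.Callable;

class RetryingClient {
    // Runs remoteCall until it succeeds or maxAttempts is reached, sleeping
    // pauseMillis after each failed attempt except the last.
    static <T> T callWithRetry(Callable<T> remoteCall, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RemoteException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return remoteCall.call();
            } catch (RemoteException e) {
                lastFailure = e;   // the server may still be reading its store
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote call: fails on the first two attempts.
        String reply = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RemoteException("server still reading its store");
            }
            return "invoked";
        }, 5, 10L);
        System.out.println(reply + " after " + calls[0] + " attempts");
    }
}
```

While the calling thread sleeps inside callWithRetry it does no other work, which illustrates the knock-on effect on the client discussed above.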
All the problems described above are also applicable to three-tier systems. However, the problems there are worse
as a pair of client-server interactions are performed within such systems.
10.1
Combining distribution and persistence is a hard problem to solve as the two mechanisms can work against each
other when used in conjunction. It is possible for references to build up between processes running over JOS stores
as a client store can contain a reference to a remote object (see section 10.3.2). If such references are allowed to
build up, they increase the inter-dependencies between stores and reduce their autonomy, making the stores harder to develop and evolve.
Furthermore, distributed computation in Java requires that method parameters and results are either passed by copy
or as references to remote objects. If passed by copy, then, for the duration of a remote method, two copies of the
parameters are present in the system, one at the client and one in the server. If a reference to a remote object is
passed, this does not cause consistency problems, but rather performance problems as the network has to be traversed
for each method invocation to the remote object passed as a reference. Problems can also arise if the values passed
between client and server become part of the persistent state on either side. In the case of the transient server, a
value may be passed to the server, where it is used, a result is generated and sent back to the client, with the server
discarding its copy and the client retaining it. However, if the server retains the value passed and makes it part of the
persistent state, consistency problems can occur as replicating information in this way may break the semantics of the
application. Stores can also become very large and if a parameter is passed to a remote object that is close to the root
of persistence, a large part of a store can be passed across the network. This can raise performance issues as well as
semantic issues such as consistency and sharing.
These problems exist whenever distribution and persistence are combined and are not peculiar to using the store
mechanisms provided by Java object serialization. For a more detailed description of the problems encountered when
combining distribution and persistence, see [SA97, Spe97, Spe99]. The rest of this section describes some problems
that are specific to using object serialization in a distributed system.
10.2
Java allows a programmer to control how an object is serialized by providing readObject and writeObject methods
for the class of the object. However, these same methods are called whether the object is being serialized to pass it
across the network or serialized to write it into a file. In certain circumstances, an object may need to be serialized
differently if it is being passed to another virtual machine rather than if it is being written to disk. However, Java
requires the programmer to use the same mechanism for both.
For example, consider the class in figure 35 that is involved in a distributed application that uses serialization
based persistence. The class represents an employee in a company and it consists of: an id, used to uniquely identify
that person; the person's job title; their salary; and a number of appraisal reports, detailing the employee's history
with the company.
When serializing an object of this type all the information should be saved and this can be accomplished using
the default behaviour. However, the appraisal history of the employee is deemed sensitive and only of local interest
and should not, therefore, be transferred outside the virtual machine that contains the object. We assume that storing
the appraisals on a local disk is secure.
It is not enough to make the appHistory field transient as this will stop the information from being serialized
to disk. There are many partial solutions to this problem, each with widely varying implications.
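The effect of marking the field transient can be seen in a minimal sketch. Only the field name appHistory comes from the text; the other field names, the constructor and the in-memory round trip are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical reconstruction of an employee class with a transient history.
class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    int id;
    String jobTitle;
    double salary;
    transient String[] appHistory;   // skipped by every serialization, disk or network

    Employee(int id, String jobTitle, double salary, String[] appHistory) {
        this.id = id;
        this.jobTitle = jobTitle;
        this.salary = salary;
        this.appHistory = appHistory;
    }
}

class TransientFieldDemo {
    // Serializes the employee into a byte array and reads a copy back.
    static Employee roundTrip(Employee e) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(e);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Employee) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Employee before = new Employee(7, "Engineer", 30000.0, new String[] {"1998: good"});
        Employee after = roundTrip(before);
        System.out.println("id kept: " + (after.id == before.id));
        System.out.println("history lost: " + (after.appHistory == null));
    }
}
```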
10.2.3 Discussion
This problem arises because the serialization system gives the programmer one mechanism for dealing with objects
in two different contexts. As this section has shown, distribution and persistence can require an object to be dealt
with differently and Java forces the programmer to code around it. A more flexible solution would be to provide the
programmer with two pairs of methods, one used for serialization to disk, and one for network-based serialization.
This would require the programmer to understand two more methods but it would allow them to distinguish between
the two different cases. The stream used to contain the serialized data would then have to be associated with one of
two modes, which the programmer would have to set before using it. This ties the serialized stream to only one
use, or else the programmer would have to be very careful when re-using a stream in the other context to ensure the
correct pair of methods is called.
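Within the existing API, something close to a stream mode can be approximated by a marker subclass of the stream. The sketch below is an assumption-laden workaround, not the two pairs of methods proposed above: DiskObjectOutputStream, AppraisedEmployee and StreamModeDemo are hypothetical names, and the class writes a boolean flag so that the reader knows whether the appraisal history follows.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Hypothetical marker type: only streams of this class are treated as "disk".
class DiskObjectOutputStream extends ObjectOutputStream {
    DiskObjectOutputStream(OutputStream out) throws IOException {
        super(out);
    }
}

class AppraisedEmployee implements Serializable {
    private static final long serialVersionUID = 1L;
    int id;
    transient String[] appHistory;   // written by hand, and only to disk streams

    AppraisedEmployee(int id, String[] appHistory) {
        this.id = id;
        this.appHistory = appHistory;
    }

    private void writeObject(ObjectOutputStream stream) throws IOException {
        stream.defaultWriteObject();
        boolean toDisk = stream instanceof DiskObjectOutputStream;
        stream.writeBoolean(toDisk);          // tells the reader what follows
        if (toDisk) {
            stream.writeObject(appHistory);
        }
    }

    private void readObject(ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        stream.defaultReadObject();
        if (stream.readBoolean()) {
            appHistory = (String[]) stream.readObject();
        }
    }
}

class StreamModeDemo {
    // Serializes through either the marker stream or a plain stream.
    static byte[] write(AppraisedEmployee e, boolean toDisk) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = toDisk
                 ? new DiskObjectOutputStream(bytes)
                 : new ObjectOutputStream(bytes)) {
            out.writeObject(e);
        }
        return bytes.toByteArray();
    }

    static AppraisedEmployee read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (AppraisedEmployee) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        AppraisedEmployee e = new AppraisedEmployee(7, new String[] {"1998: good"});
        System.out.println("disk keeps history: " + (read(write(e, true)).appHistory != null));
        System.out.println("other streams drop it: " + (read(write(e, false)).appHistory == null));
    }
}
```

The approach still relies on every piece of code that writes to disk remembering to use the marker stream, so it shares the fragility noted above.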
10.3
Further problems can arise if references to remotely invokable objects become persistent on both the client and server
sides.
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;
public class Wrapper implements Serializable
{
    rem_object obj = null;
    // Write the remote reference as the only state of this wrapper.
    private void writeObject(ObjectOutputStream stream) throws IOException
    {
        stream.writeObject(obj);
    }
    // Read the remote reference back; a failed rebind leaves obj as null.
    private void readObject(ObjectInputStream stream) throws IOException,
        ClassNotFoundException
    {
        try {
            obj = (rem_object) stream.readObject();
        } catch(java.rmi.ConnectException ce) {
            System.err.println("Error connecting to remote side");
        }
    }
}
Figure 36: Handling a Rebind Failure
The class Wrapper contains the reference to the remote object of type rem_object. When this container object
is read from the store, its readObject method is called by the serialization mechanism. This code attempts to read the
reference to the remote object from the stream and assign it to the field called obj. If the server is listening on the
same port, the read will be successful and the client will have been successfully bound to the server. However, if the
read is not successful as the server is not available as expected, a java.rmi.ConnectException will be thrown. The
programmer can then arrange for the field to be assigned at a later stage and the rest of the deserialization can take
place.
The serialization and distribution mechanisms in Java have forced the programmer to handle this situation using
the readObject and writeObject methods because, if an error is encountered and it is not handled, the entire
deserialization is aborted. Another solution would be to only try the rebind when the reference to the remote object
is used. When the reference is read from the store at the client side there may be very good reasons why the server
is not available, for example, it may have been taken down for maintenance. It is only when the client tries to use
the reference that dealing with the failure is important. This would allow the failure code to be placed as close as
possible to the use of the reference. In the solution above, the failure handling code is artificially grouped with the
deserialization code.
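The alternative of rebinding only when the reference is used could be sketched as below. This is an illustration under stated assumptions, not a feature of RMI: LazyRemoteRef and RemoteLookup are hypothetical, the holder persists a registry name rather than the remote reference itself, and the lookup action (which in a real client would consult the registry) is supplied by the caller at the point of use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical lookup action; a real one would ask the RMI registry.
interface RemoteLookup<T> {
    T find(String name) throws Exception;
}

// Persists only the name; the reference itself is rebuilt on first use.
class LazyRemoteRef<T> implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private transient T target;

    LazyRemoteRef(String name) {
        this.name = name;
    }

    // A failed lookup surfaces here, next to the code that uses the reference,
    // instead of aborting the deserialization of the whole store.
    synchronized T get(RemoteLookup<T> lookup) throws Exception {
        if (target == null) {
            target = lookup.find(name);
        }
        return target;
    }

    synchronized boolean isBound() {
        return target != null;
    }
}

class LazyRebindDemo {
    @SuppressWarnings("unchecked")
    static <T> LazyRemoteRef<T> roundTrip(LazyRemoteRef<T> ref)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(ref);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LazyRemoteRef<T>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        LazyRemoteRef<String> ref = roundTrip(new LazyRemoteRef<String>("server"));
        System.out.println("bound after read: " + ref.isBound());
        System.out.println("resolved: " + ref.get(name -> "stub-for-" + name));
    }
}
```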
11 Classloaders
Some applications require the use of their own classloaders to retrieve classes in a specialized way that is not supported
by the default Java classloader. Java classes are typically stored as files on disk and they are loaded lazily by the virtual
machine on first usage by the program.
However, in some circumstances, the required classes may reside in a remote location or they may require additional
processing, such as decrypting. The default classloader cannot be used in these circumstances and an application
specific classloader is required.
This section describes how the use of an application-specific classloader can interfere with deserialization. The
delegation model of class loading is not used in these examples; however, what is discussed below is applicable to
that model.
Consider again the Singleton example given in figure 13. It reads a graph from a store, rooted at an object of type
Singleton.
Singleton s = null;
s = (Singleton) store.readObject();
Figure 37: Reading a Store Rooted at a Singleton Object
When the store.readObject line of code is executed, the object graph is deserialized and the root object is
passed back to the program. The object is cast from the Object return type defined by readObject, to the type
Singleton. If this test is successful, the root of the graph is assigned to the variable s. However, for the test
to succeed, not only does the cast have to be performed correctly, but class Singleton has to be available to the
classloader. If the class cannot be found, the deserialization is aborted with a java.lang.ClassNotFoundException.
During deserialization the class may not be available. This is because the class may have been loaded from a
different virtual machine when the object was first created and written into the store, or the class may have moved
location on a locally available disk to a directory not accessible via the CLASSPATH. If the process is terminated,
restarted and the store is read, the new virtual machine needs to be able to find the class when the object is read
from the store. The class needs to be found again because only information about a class is written into the store,
the classes themselves are not written. It must be possible for the virtual machine to find the class when the cast is
performed. If the class is only available remotely, the programmer must use a classloader capable of loading it, either
an application specific classloader, or one already provided such as the URLClassLoader, RMIClassLoader or the
AppletClassLoader.
In addition to these requirements, the class that contains the code in figure 37 must be loaded by the application specific classloader. If the code was loaded by the default classloader, when the virtual machine attempts
the cast, the default classloader will be used to try to find Singleton. The default classloader will not be able to
find it and a ClassNotFoundException will be generated. Unfortunately, it is not sufficient to explicitly catch the
ClassNotFoundException when readObject fails, find the class using an application-specific classloader and then
try to read from the store again. This is because the virtual machine uses the classloader that was used to load the
class that contains the cast code, and in this case that classloader was the default classloader. This is the reason the
class that contains the code in figure 37 must be read in using the application-specific classloader. If this is the case,
when the cast to Singleton is performed, the application-specific classloader will be used to find the Singleton
class, it will succeed and the cast will succeed.
Therefore, to combine this type of persistence with the ability to download classes from remote virtual machines
(or to find them locally, should their location change) requires an application-specific classloader. This assumes that
objects instantiated from the remotely loaded classes are written to the store. If, when the store is deserialized, all
classes can be found locally by the default classloader, the use of an application specific classloader is not necessary.
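One way to make the stream itself consult a particular classloader is to override resolveClass in a subclass of ObjectInputStream. The sketch below assumes the hypothetical names LoaderAwareObjectInputStream and LoaderAwareDemo; it does not remove the requirement discussed above that the code performing the cast must see the same class as the one the stream resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;

// Resolves classes named in the stream through a caller-supplied classloader.
class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            return super.resolveClass(desc);   // fall back to the default lookup
        }
    }
}

class LoaderAwareDemo {
    static byte[] write(Object graph) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        return bytes.toByteArray();
    }

    static Object readWith(byte[] data, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new LoaderAwareObjectInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> store = new ArrayList<>();
        store.add("root");
        Object copy = readWith(write(store), LoaderAwareDemo.class.getClassLoader());
        System.out.println("read back: " + copy);
    }
}
```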
12.1 Evolution Example
Consider the code in figure 38. We define three classes, Point, Point3D and ColourPoint3D. Class Point captures
a single point in a two dimensional space, using two integers x and y. Class Point3D extends Point adding support
for the third dimension using integer z. ColourPoint3D extends Point3D and defines a colour for the point, using a
String colour. Each class defines two constructors, a no-arg constructor and one that initialises all of its state. Each
class also overrides the toString method to output the object's state.
Note that x and y are defined as protected in Point as the author of this class anticipated sub-types being
defined, but that the author of Point3D defined z to be private and no methods are provided to set or get it.
We have two other classes (Serialize and Deserialize, which are not shown) that are used to serialize
and deserialize an instance of ColourPoint3D to and from a file called test. Class Serialize creates an instance of ColourPoint3D with the values x=5, y=66, z=34, colour="Black" and serializes it to test. Class
Deserialize opens test and connects an ObjectInputStream to it. This object is used to retrieve the contents of
test and assign it to an object of type ColourPoint3D. However, in between writing test and reading from it, we
will evolve Point, Point3D and ColourPoint3D.
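A compact sketch of the scenario before any evolution is given below. The field visibilities, constructors and example values follow the description above; the exact toString text of Point and Point3D, the class EvolutionRoundTrip, the omission of the package declaration and the use of an in-memory buffer in place of the file test are assumptions made so that the example fits in a single source file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Point implements Serializable {
    protected int x, y;                       // protected: sub-types anticipated
    public Point() { x = 0; y = 0; }
    public Point(int x, int y) { this.x = x; this.y = y; }
    public String toString() { return "x: " + x + " y: " + y; }
}

class Point3D extends Point {
    private int z;                            // private, with no accessors
    public Point3D() { z = 0; }
    public Point3D(int x, int y, int z) { super(x, y); this.z = z; }
    public String toString() { return super.toString() + " z: " + z; }
}

class ColourPoint3D extends Point3D {
    String colour;
    public ColourPoint3D() { colour = "Not Defined"; }
    public ColourPoint3D(int x, int y, int z, String colour) {
        super(x, y, z);
        this.colour = colour;
    }
    public String toString() { return super.toString() + " Colour: " + colour; }
}

class EvolutionRoundTrip {
    // Plays the role of class Serialize, writing to memory instead of a file.
    static byte[] serialize(Object graph) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        return bytes.toByteArray();
    }

    // Plays the role of class Deserialize.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ColourPoint3D original = new ColourPoint3D(5, 66, 34, "Black");
        ColourPoint3D copy = (ColourPoint3D) deserialize(serialize(original));
        System.out.println(copy);
    }
}
```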
package evolution;
package evolution;
public Point()
{
    x = 0;
    y = 0;
}
public Point3D()
{
    z = 0;
}
package evolution;
public class ColourPoint3D extends Point3D
{
    String colour;
    public ColourPoint3D()
    {
        colour = "Not Defined";
    }
    public ColourPoint3D(int x, int y, int z, String colour)
    {
        super(x, y, z);
        this.colour = colour;
    }
    public String toString()
    {
        return super.toString() + " Colour: " + colour;
    }
}
package evolution;
import java.util.Date;
import java.io.IOException;
public class ColourPoint3D extends Point3D
{
    String colour;
    Date date;
    public ColourPoint3D()
    {
        colour = "Not Defined";
        date = new Date();
    }
    public ColourPoint3D(int x, int y, int z, String colour)
    {
        super(x, y, z);
        this.colour = colour;
        date = new Date();
    }
    public String toString()
    {
        return super.toString() +
            " Colour: " + colour + " Date: " + date;
    }
}
An incompatibility in the stream has been encountered. The object serialization mechanism has noticed that
the class of the instance in the stream is different from the evolved class that we are now trying to use. This
incompatibility is detected because the serialization mechanism calculates a hash value for the two classes. The
serialVersionUID of ColourPoint3D when the stream was written was -1421736715104280864 and that of the evolved
class is -5252289778578619139.
This value is the hash of the original ColourPoint3D class. The number is the same as in the exception in figure 40
for the instance in the stream (stream classdesc above). To complete our definition of the evolved ColourPoint3D
class we add this line of code to the class, giving the completed class shown in figure 41.
package evolution;
import java.util.Date;
import java.io.IOException;
public class ColourPoint3D extends Point3D
{
    String colour;
    Date date;
    static final long serialVersionUID = -1421736715104280864L;
    public ColourPoint3D()
    {
        colour = "Not Defined";
        date = new Date();
    }
    public ColourPoint3D(int x, int y, int z, String colour)
    {
        super(x, y, z);
        this.colour = colour;
        date = new Date();
    }
    public String toString()
    {
        return super.toString() +
            " Colour: " + colour + " Date: " + date;
    }
}
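The value that the serialization mechanism associates with a class can also be inspected programmatically through java.io.ObjectStreamClass. The following is a minimal sketch; the classes Pinned, Unpinned and SerialVersionProbe are hypothetical, and the computed value for Unpinned is not shown because it depends on the compiled class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Declares its version explicitly, as the evolved ColourPoint3D does.
class Pinned implements Serializable {
    static final long serialVersionUID = 42L;
    int value;
}

// Declares nothing, so the mechanism computes a hash of the class.
class Unpinned implements Serializable {
    int value;
}

class SerialVersionProbe {
    // Returns the serialVersionUID that serialization would use for the type.
    static long uidOf(Class<?> type) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        if (desc == null) {
            throw new IllegalArgumentException(type + " is not serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("declared: " + uidOf(Pinned.class));
        System.out.println("computed: " + uidOf(Unpinned.class));
    }
}
```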
12.1.6 Comment
The contents of the serialized object have been successfully read and assigned to an object whose class is different to
that used when the object was first written. Therefore, Java object serialization provides some support for transferring
the state of objects between different versions of the same class. However, the solution adopted has a number of
implications.
Date is null
The Date value output in figure 43 is null. This is because the serialized object was instantiated from the original
definition of ColourPoint3D which did not define a Date field. Therefore, the object serialization mechanism cannot
assign anything meaningful to this field, so the object reference default of null is used.
However, the date field was added to this object to capture information on when the ColourPoint3D object
was created. In our evolved system, when the pre-evolution object is read from the file, the ColourPoint3D object
is created. We would like to be able to represent this fact in our object. This is not possible as neither of the two
constructors is executed. The object is created by the object-serialization mechanism and its state is initialised from
the contents of the stream. To solve this problem, the programmer could add a pair of readObject and writeObject
methods and create a new date object in the readObject method. This does, however, increase the complexity of
the code. Another solution would be to add a setDate method on the evolved ColourPoint3D class and call this for
every ColourPoint3D object read from the file. This, however, requires the entire graph of objects to be traversed,
looking for objects of the right type to call the method.
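The first of these solutions could be sketched as follows. StampedColour and StampedColourDemo are hypothetical names, and a pre-evolution stream is simulated by serializing an instance whose date is null, which is the value that default deserialization leaves in a field absent from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;

class StampedColour implements Serializable {
    private static final long serialVersionUID = 1L;
    String colour;
    Date date;

    StampedColour(String colour, Date date) {
        this.colour = colour;
        this.date = date;
    }

    // Runs instead of a constructor when an instance is read from a stream.
    private void readObject(ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        stream.defaultReadObject();
        if (date == null) {
            date = new Date();   // record when the old object was (re)created
        }
    }
}

class StampedColourDemo {
    static StampedColour roundTrip(StampedColour value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (StampedColour) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StampedColour old = new StampedColour("Black", null);   // stands in for old state
        System.out.println("date filled in: " + (roundTrip(old).date != null));
    }
}
```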
which captures the changes embodied in T2 plus the changes that T2 did not capture over T. It may not be possible,
or desirable, to express the evolution of T3 solely in terms of T2. Given the use of a single serialVersionUID, this
kind of multiple ancestor versioning cannot be expressed. This is further enforced because serialVersionUID is
defined as final and so cannot be redefined in a sub-type, effectively restricting its meaning to exactly the type it is
defined in.
Implicit Compatibility
The Java Object Serialization Specification [JOS97] in section 5.1 claims that the versioning mechanism provides
automatic handling of classes that evolve by adding fields and classes to the inheritance hierarchy and that the
serialization mechanism will handle versioning without requiring class-specific methods to be implemented for each version.
This assumes that the serialVersionUID has been added to the evolved class.
This means that, for those changes that are defined as compatible, the programmer does not need to provide class-specific code. The serialization mechanism implicitly deserializes an object from the stream, initialising a compatible
version with the retrieved state. This is appropriate because the programmer does not need to provide any code to
perform the initialisation of the new object. However, it is also inadequate for the same reason: the programmer has
little control over how the initialisation is performed. Section 12.4 proposes a different mechanism for performing
the update of objects which is motivated by the change described in the next section.
12.1.9 Solution
One possible solution is to provide a pair of readObject and writeObject methods to handle the conversion between
the two types, as shown in figure 46.
When the stream is deserialized and the original version of ColourPoint3D is found, the stream object is passed
to the readObject method defined on the latest version of ColourPoint3D given in figure 46. The first line in this
method then reads an object from the stream, casting it to what we know it to be, a String. The hashCode method is
then called on the object to generate a value (an int, widened to long on assignment) that we have decided will be the new representation for this colour.
In addition, we also create a new Date object, assigning it to date.
This solution works well. However, [JOS97, section 5.6.2] states that if the version of a class reading the stream
defines a readObject method, readObject should first call stream.defaultReadObject. Therefore, the code
above is flawed as it does not conform to the specification. If we do call stream.defaultReadObject the code
breaks and we are back where we started, as the String value will automatically be assigned to the long, throwing a
ClassCastException.
package evolution;
import java.util.Date;
import java.io.IOException;
public class ColourPoint3D extends Point3D
{
    long colour;
    Date date;
    static final long serialVersionUID = -1421736715104280864L;
    public ColourPoint3D()
    {
        colour = 0;
        date = new Date();
    }
    public ColourPoint3D(int x, int y, int z, String colour)
    {
        super(x, y, z);
        this.colour = colour.hashCode();
        date = new Date();
    }
    public String toString()
    {
        return super.toString() + " Colour: " + colour + " Date: " + date;
    }
}
Figure 45: Output from Failed Deserialization after Second Evolution Step
package evolution;
import java.util.Date;
import java.io.IOException;
public class ColourPoint3D extends Point3D
{
    long colour;
    Date date;
    static final long serialVersionUID = -1421736715104280864L;
    public ColourPoint3D()
    {
        colour = 0;
        date = new Date();
    }
    public ColourPoint3D(int x, int y, int z, String colour)
    {
        super(x, y, z);
        this.colour = colour.hashCode();
        date = new Date();
    }
    public String toString()
    {
        return super.toString() + " Colour: " + colour + " Date: " + date;
    }
    private void writeObject(java.io.ObjectOutputStream stream) throws IOException
    {
        stream.writeLong(colour);
        stream.writeObject(date);
    }
    private void readObject(java.io.ObjectInputStream stream) throws IOException,
        ClassNotFoundException
    {
        colour = ((String) stream.readObject()).hashCode();
        date = new Date();
    }
}
12.2
Comment
In this section we wanted to change the type of colour from String to long. This approach caused a
ClassCastException to be thrown. One possible solution was to provide readObject and writeObject methods to
specialise reading of the stream. However, such a solution confuses two object-serialization issues. The readObject
and writeObject methods are used to customise copying of the state of an object to and from the serialized stream.
In figure 46, readObject is being used to specialise reading from the stream for the purposes of evolution. The
implementation of the readObject method is tied to the contents of the original stream, in effect, restricting it to the
version of the original class. The writeObject method writes the contents of the evolved class to the stream, which
is the normal use for this method.
A problem arises when we want to provide a readObject method so that the evolved ColourPoint3D can read an
instance of itself from a stream in a specialised way, which may be unrelated to the question of evolution. However,
if we want to be able to provide the above functionality as well, we have to combine the code in the readObject
method. This requires code to distinguish between two streams, which is not easy as the hash code describing the
class and its compatibility is not available via a method call.
As the readObject and writeObject methods are defined on the evolved class the programmer is forced to
combine code in this way. A better solution would be to separate the two concerns into two classes: the evolved class
and a special class used to perform the translation between the original and new class, generating an instance of the
new, evolved class. This approach is described in more detail in section 12.4.
12.3
The third kind of change we examine is to modify the class hierarchy. During the development of trees of object-oriented classes, it is common to move a class within the hierarchy. In our example, we swap the positions of
ColourPoint3D and Point3D, so that the functionality for colour is introduced before support for three dimensions.
This means that ColourPoint3D extends Point and Point3D extends ColourPoint3D (see footnote 10). The modified code is
shown in figure 48.
The main changes are to the constructors Point3D and ColourPoint3D, as they call different super-types; also
Point3D now defines a serialVersionUID to ensure compatibility with the previous definition of Point3D given in
section 12.1. Point has not been changed.
Deserialize reads the stream containing the serialized Point state, casting the result to a ColourPoint3D
object. The current definition of ColourPoint3D as shown in figure 48 is a direct descendant of Point, so it does not
define a z field. Therefore, when the stream is deserialized and the ColourPoint3D is created, it only contains x, y,
colour and date values (figure 49). Thus, the evolution has caused information to be lost.
10 The class names are retained for purposes of explanation. In reality, they would also be changed to reflect the new semantics.
package evolution;
package evolution;
import java.util.Date;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream.GetField;
public ColourPoint3D()
{
    colour = 0;
    date = new Date();
}
12.4
This section briefly outlines an alternative design for combining the support for evolution with the basic object-serialization mechanism. The code presented in this system is valid Java; however, it will not run as the necessary
support from the underlying serialization mechanism does not exist and GetField would need to be re-implemented.
This section has shown that the support for versioning in the object-serialization mechanism is closely coupled
with the normal serialization and deserialization mechanism. This is mainly because readObject and writeObject
are used for expressing conversion of the state between differing versions of the same type and for expressing non-default object serialization and deserialization. This design proposes separating these two concerns. readObject
and writeObject are only to be used for specialising how an object is serialized and deserialized at that version.
Support for translating between two versions of the same type is handled by a different object that is registered with
the ObjectInputStream object. When an object instantiated from the old type is encountered in the stream, the
stream is passed to the conversion routine. This method then: gains access to all the fields of the old class; converts
them; and then generates an instance of the new type, returning this as a result to the object-serialization mechanism,
which makes this new object part of the deserialized graph of objects.
package evolution;
import java.io.ObjectInputStream;
import java.io.IOException;
public interface JOS_ConvertToCallback
{
    public Object read(ObjectInputStream stream) throws IOException, ClassNotFoundException;
}
here if Java defined a more powerful meta-level architecture, rather than the currently defined reflection mechanism.
A system more akin to that supported by CLOS [KdRB91] would provide access to the necessary internals of object-serialization at the Java-level to facilitate such an approach.
12.5 Evolution Summary
This section has presented the support for evolution between versions of compatible classes in some detail (sections
12.1 to 12.1.5). Some limitations of using the system have been presented: the use of serialVersionUID is inherently linear, multiple ancestor versioning cannot be expressed and it encourages the programmer to keep only two
versions of any one class (sections 12.1.6 and 12.3.1); the system supports implicit change, which requires less code
from the programmer if the two versions of a class's fields are compatible; however, such automatic assignment
takes away from the programmer a lot of control and as such makes it difficult to access the serialization and deserialization mechanism to specialise it (section 12.1.7); the readObject and writeObject methods are used to describe
both the specialised manipulation of an object's state to and from the stream as well as to convert the state of an
object from the stream to the current object definition, if the change cannot be handled automatically because it is not
compatible (section 12.2); some changes to the class hierarchy can result in information loss that can be difficult to
recover after an evolution as the evolved types essentially provide a restricted view onto the underlying pre-evolution
stream (section 12.3.1); an outline of a design for an alternative versioning system was proposed that separated the
issues of using readObject and writeObject for the specialised manipulation of object state to and from the stream
from the manipulation of the contents of the stream for evolution between versions (section 12.4).
Using the object serialization system's versioning mechanism would be easier if additional tool support was
available. If a tool was able to tell the programmer which changes lead to incompatibilities and which do not, moving
between versions of the same class would be much easier.
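A first step towards such a tool could be sketched with java.io.ObjectStreamClass, which exposes the serializable fields of a class. The sketch below only lists the differences between two definitions and leaves the judgement of compatibility to the rules in [JOS97]. SerialFieldDiff is a hypothetical name, and the two versions are modelled as two separately named classes (ColourAsString and ColourAsLong) because two versions of one class cannot coexist in a single source file.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-ins for the old and the evolved definition of one class.
class ColourAsString implements Serializable {
    String colour;
}

class ColourAsLong implements Serializable {
    long colour;
    Date date;
}

class SerialFieldDiff {
    // Maps each serializable field name of the class to its declared type.
    static Map<String, Class<?>> fieldsOf(Class<?> type) {
        Map<String, Class<?>> result = new LinkedHashMap<>();
        for (ObjectStreamField field : ObjectStreamClass.lookup(type).getFields()) {
            result.put(field.getName(), field.getType());
        }
        return result;
    }

    // Reports removed and retyped fields first, then added fields.
    static List<String> compare(Class<?> oldType, Class<?> newType) {
        Map<String, Class<?>> before = fieldsOf(oldType);
        Map<String, Class<?>> after = fieldsOf(newType);
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, Class<?>> entry : before.entrySet()) {
            Class<?> now = after.get(entry.getKey());
            if (now == null) {
                report.add("removed: " + entry.getKey());
            } else if (!now.equals(entry.getValue())) {
                report.add("type changed: " + entry.getKey());
            }
        }
        for (String name : after.keySet()) {
            if (!before.containsKey(name)) {
                report.add("added: " + name);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : compare(ColourAsString.class, ColourAsLong.class)) {
            System.out.println(line);
        }
    }
}
```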
The support for versioning of classes in Sun's Java 2 makes the transition of state from an old class definition
to a new definition easy if the evolution can be expressed in terms of additions to the old class. If the evolution is
more complicated, such as removing fields, more work is required from the programmer and more sophisticated tool
support would be appropriate.
13.1 Orthogonal Persistence
An object-oriented orthogonally persistent programming language [AM95] seamlessly integrates the language with
the technology for saving and restoring objects and their code to and from stable storage. Such languages are referred
to as orthogonally persistent as any object, regardless of type, may be made persistent; the issues of typing and
persistence are orthogonal to each other. This means that the programmer does not have to make a decision about
which types are candidates for persistence and which are not.
One implementation of an orthogonally persistent Java is the PJama project [ADJ+96, PAD+97], a collaborative
project between Glasgow University and Sun Microsystems (see footnote 12).
In this system the programmer's classes do not inherit from a Persistence class and they do not indicate their
suitability for persistence by implementing an interface such as java.io.Serializable. Rather, an object is identified to the persistence technology by making it reachable from a named root of persistence, an approach known
as persistence by reachability. This is something that can be decided at run-time, on an object by object basis, rather
than at compile time on a per class basis that affects all instances of a type. The object by object, run-time approach
is inherently more flexible than using java.io.Serializable.
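The per-class, compile-time nature of java.io.Serializable can be seen in a minimal sketch. The classes Car and Engine are hypothetical. The sketch assumes only the standard java.io classes and shows that a graph cannot be written if any reachable object belongs to a class that did not opt in.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes used only for illustration.
class Engine {                      // does NOT implement Serializable
    int horsepower = 90;
}

class Car implements Serializable { // opted in at compile time
    private static final long serialVersionUID = 1L;
    String model = "demo";
    Engine engine = new Engine();   // reachable from Car, but not serializable
}

class MarkerInterfaceDemo {
    // Returns true if the graph rooted at 'root' can be serialized.
    static boolean canSerialize(Object root) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(root);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when a reachable object's class lacks the marker interface.
            return false;
        }
    }
}
```

Under persistence by reachability the Engine instance would simply be made persistent along with the Car that refers to it.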
The object graph is written to the store either when the programmer calls the org.opj.PJStore.stabilizeAll method
or when the program exits with a successful status, System.exit(0). Object serialization requires the programmer to
explicitly write the graph of objects to a file, so gaining the same effect as PJama's write on successful exit requires
the programmer to coordinate the termination of their program. In a multi-threaded environment, this can be quite
complex and is prone to error.
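One way a programmer might approximate a write on exit with object serialization is sketched below. The class ExitTimeStore and the choice of a HashMap as the root are assumptions made for illustration. Note that a shutdown hook runs on any orderly shutdown, not only on a successful exit status, and that the lock taken here protects only the map itself, not the objects reachable from it. The coordination problem described above therefore remains.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical helper used only for illustration.
class ExitTimeStore {
    // Writes the whole graph reachable from 'root' to 'file'.
    static void save(HashMap<String, Object> root, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            synchronized (root) {   // callers must also lock 'root' when mutating it
                out.writeObject(root);
            }
        }
    }

    // Reads the whole graph back in one step.
    @SuppressWarnings("unchecked")
    static HashMap<String, Object> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (HashMap<String, Object>) in.readObject();
        }
    }

    // Registers a hook that saves 'root' when the virtual machine shuts down.
    static Thread saveOnExit(HashMap<String, Object> root, File file) {
        Thread hook = new Thread(() -> {
            try {
                save(root, file);
            } catch (IOException e) {
                e.printStackTrace();   // little else can be done this late
            }
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```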
When starting a PJama-based Java program, objects are brought in from the store when they are first referenced.
In order to load an object, its class must also be available and this information is held in the store in the form of class
objects together with the associated bytecode. This differs from object serialization, where only information about
the class is written to disk. PJama's approach has an implication for evolving classes: if the programmer changes
the class, there are now two versions: one that is resident in the store and the new version which lies outside it. If
the programmer wants to use the new version, the store must be evolved, the old class replaced and the state of all
instances converted to work with the new version. Such a tool exists for PJama; it is called opjsubst and is described
in [Dmi98].
PJama defines a version of RMI that supports distributed programming within an orthogonally persistent system.
In a system that supports persistence by reachability, if an object is reachable from a persistent root, it should be
made persistent. In the client/server model supported by RMI, any process can simultaneously be a server and a
client. Supporting persistence by reachability in a system of clients and servers, with objects remotely reachable from
other address spaces, raises a number of issues. For example, if part of the client's persistent state is passed as the
parameters of a remote method invocation and they become reachable from a persistent root in the server, then two
copies exist, one in the client and one in the server. This may be intentional; however, there may be some requirement
at the application level to keep the replicated state of these two groups consistent. In addition, the graph of persistent
objects is likely to be quite deep and consist of many objects of widely differing sizes. If an object near the
root of persistence is passed as a copied parameter to a remote method, a significant part of the persistent store will
have to be passed across the network with detrimental effects on performance. These issues and others are discussed
in the work on distribution support in PJama, called PJRMI [SA97, Spe97, Spe99].
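The cost of passing an object near the root by value can be illustrated with a minimal sketch. The class Folder is hypothetical and the sketch measures only the serialized size in memory; it does not use RMI. It assumes that a by-value parameter is marshalled with object serialization, so the number of bytes produced approximates what would cross the network.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type used only for illustration.
class Folder implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<Folder> children = new ArrayList<>();
    Folder(String name) { this.name = name; }
}

class CopySizeDemo {
    // Number of bytes needed to pass 'argument' by value.
    static int serializedSize(Object argument) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(argument);
        }
        return bytes.size();
    }

    // Builds a root with 'n' leaf children.
    static Folder rootWithChildren(int n) {
        Folder root = new Folder("root");
        for (int i = 0; i < n; i++) {
            root.children.add(new Folder("child-" + i));
        }
        return root;
    }
}
```

Passing a single leaf copies only that leaf, whereas passing the root copies every object reachable from it.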
Footnote 12: http://www.dcs.gla.ac.uk/pjama/ and http://www.sunlabs.com/research/forest/.
13.2 Object-Oriented Databases
13.2.1 GemStone/J
GemStone/J [Gem98] is an enterprise-level Java application server that is intended for building and deploying large,
mission-critical Java applications. It is based on a three-tier client/server architecture together with support for the
World Wide Web.
The programming model offered by GemStone/J [BOS+98] is similar to that for PJama. Objects are identified to
the persistence mechanism via reachability from a named root of persistence. However, the main difference between
PJama and GemStone/J is the environment in which they are expected to work. GemStone/J is intended to function as
the middle tier of a three-tier architecture made up of web browsers and Java/Corba clients as the first tier, GemStone/J
in the middle, and RDBMSs and mainframes as the third tier. GemStone/J presents an object-oriented view to the first
tier and provides the necessary integration to support legacy systems in the third tier. PJama, on the other hand, is
a research project conducting an investigation into providing efficient support for orthogonal persistence in Java. As
such there is less focus on integration with legacy systems and multiple users.
13.2.2 O2
O2 [BDK92] is similar in many ways to GemStone/J. O2 is an object database system that enables developers to
build industrial-strength database applications within an open environment. The main difference between O2 and
GemStone/J is that at the centre of O2 there is a database engine that can be queried using OQL, the object query
language [CBB+97]. A programmer may also use C++ and Java to communicate with the database. O2 also supports
persistent roots which can be used in a similar manner to those of PJama and GemStone/J.
13.3 Relational Databases
One way to connect a Java program to a relational database is to use a Java-based product that implements the Java
Database Connectivity (JDBC) specification [WH98]. The paradigm for communicating with a relational database
using JDBC is to: load the necessary JDBC driver for the particular database; connect to it, getting back a
Connection object; use this object to generate a Statement object, which has an SQL statement embedded within it
as a String; send this statement to the database; receive a ResultSet object in return; and then use this object
to extract the result of the query.
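The sequence of steps above can be sketched with the standard java.sql classes. The class JdbcSketch, the method name and any URL or SQL text passed to it are assumptions for illustration; no particular database product is implied. Drivers that follow JDBC 4.0 or later are located automatically from the class path, so the explicit driver-loading step is omitted here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper used only for illustration.
class JdbcSketch {
    // Runs one query and copies the first column of every row into a list.
    static List<String> firstColumn(String url, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);   // connect
             Statement stmt = conn.createStatement();              // create a statement
             ResultSet rs = stmt.executeQuery(sql)) {              // send the SQL string
            while (rs.next()) {                                    // walk the result rows
                values.add(rs.getString(1));
            }
        }
        return values;
    }
}
```

The rows come back as column values rather than as application objects, so any mapping to the program's own classes is left to the programmer.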
In this approach the data is stored in a relational database that is outside the Java virtual machine. The relational
tables that are retrieved are presented to the programmer in terms of objects and JDBC defines many classes to
conveniently access the rows and columns in results. The client side of the architecture is inherently transient and the
programmer is responsible for mapping their data from the object-oriented model supported by Java into one that is
suitable for a relational database. Such a mapping can be expensive to develop and maintain and so the advantages
of an orthogonal model as promoted by PJama are lost. However, JDBC does give convenient access to relational
legacy data from a Java program.
13.4 Distribution-based Solutions
Our last category allows Java programmers to store their objects in distributed, stable storage. This section divides
these systems into two groups: commercially supported systems and research systems.
JavaSpaces
JavaSpaces [Wal98] is a product from Sun Microsystems that supports concurrent, distributed, persistent programming
using a model very similar to that of Linda [Gel92]. A JavaSpace is a Java virtual machine that provides a
persistent storage area for application objects from remote clients. Distribution is handled by RMI and persistence
is provided using object-serialization. The main operations on the JavaSpace are to: write objects; read a copy,
leaving the original intact in the space; and take an object from the space, removing the original. The focus of
JavaSpaces is on providing a platform to make designing and implementing distributed computing systems easier
than it currently is, rather than on providing a sophisticated persistence solution.
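The difference between the read and take operations can be illustrated with a toy, in-memory stand-in. The class ToySpace below is hypothetical and is NOT the JavaSpaces API: the real operations take further arguments (such as a template to match against), which are omitted here. The sketch assumes only that entries are held as serialized copies, as the text describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class; it is NOT the JavaSpaces API.
class ToySpace {
    private final List<byte[]> entries = new ArrayList<>();

    // write: store a serialized copy of the object in the space.
    synchronized void write(Serializable entry) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(entry);
        }
        entries.add(bytes.toByteArray());
    }

    // read: return a copy of the oldest entry, leaving the original in the space.
    synchronized Object read() throws IOException, ClassNotFoundException {
        return entries.isEmpty() ? null : decode(entries.get(0));
    }

    // take: return a copy of the oldest entry and remove it from the space.
    synchronized Object take() throws IOException, ClassNotFoundException {
        return entries.isEmpty() ? null : decode(entries.remove(0));
    }

    synchronized int size() {
        return entries.size();
    }

    private static Object decode(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```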
JavaBeans
JavaBeans is Sun Microsystems' component architecture for Java [Ham97]. A JavaBean is a Java class that: supports
introspection, so that other beans and code can analyze how it works; supports customization, so that the bean's appearance
and behaviour can be changed; generates and receives events, in order to communicate with other beans; provides a set
of properties that help to describe the bean to other code; and provides support for persistence so that any state may
be saved and retrieved later. The default mechanism for persistence in JavaBeans is provided by object-serialization.
Indeed, as specified in [Ham97, section 5.1], all beans must support serialization or externalization.
PerDis
PerDis [FSB+98] is a persistent distributed store intended for large-scale object sharing. Its architecture is very
similar to that of Thor, and PerDis also defines persistence in terms of reachability. A machine in a PerDis system
is made up of: an application; an API onto the PerDis user-level library (ULL); the user-level library itself; and a
PerDis daemon which interacts with the ULL. The PerDis daemon is responsible for logging and moving objects to
and from disk and the ULL deals with application-level memory mapping and the management of clusters of objects,
locks and transactions. A cluster is an abstraction for the physical grouping of logically-related objects. Programs
allocate an object in a specific cluster; clusters have names and attributes, and they are persistent. The cluster is the
user-visible unit of naming (based on URLs), storage and security. PerDis focuses on object caching and coherency
and the management of its clusters with transactions. As such, the comment made above on PJama and Thor also
applies to PerDis.
14 Conclusions
This paper has shown, with numerous supporting examples, that using Java's object-serialization mechanism to provide object persistence is inappropriate. The system appears simple on the surface (section 1) but there are many
implications from relying on it as a persistence technology. The programmer must state the types that are candidates
for persistence at compile time, whereas making this decision at run-time, on a per-object basis, is more appropriate
(section 2). The serialization mechanism suffers from the big-inhale problem where the whole graph must be read
before it can be used (section 3); loading objects on demand is more efficient, reducing delay in starting an application. The serialization mechanism creates copies of objects that it writes and reads. This can break some code
that makes assumptions about the hash code of an object (section 4), requiring fundamental methods such as hashCode and equals to be overridden. Static values are not written to the store by default, requiring the programmer
to manage this explicitly, using readObject and writeObject methods (section 5). The fields of a class that are
marked transient are not followed by the serialization mechanism, thus the programmer has the same problem as
with statics (section 6). Another mechanism (serialPersistentFields) for specifying which fields of a class are
to be serialized was given in section 7. This mechanism uses an array with a well-known signature for specifying
the fields that will be serialized. Although this allows the choice of which objects will be serialized to be deferred to
run-time, its implementation severely limits its usefulness. Section 8 showed that concurrent activity over the graph
while it is being serialized can introduce inconsistencies at the application level. Problems with remote invocations
to server processes while the processes were using object serialization for persistence were discussed in section 9 and
it was shown that this can affect client code. Section 10 showed that the methods readObject and writeObject
are being used for two purposes: specialised serialization for persistence and specialised serialization for distribution.
This expands to three purposes if we consider using these methods for type translation during evolution as well. This
forces the programmer to add code for more than one purpose to these methods, making their implementation and
maintenance very difficult. Section 11 highlighted a problem that required the use of a specialised classloader if an
instance of a remotely loaded class was written to the store. Section 12 discussed the versioning support available in
object serialization and it was shown that moving state between instances in a stream and instances of new versions
of the same type is implicit and automatic. While this is convenient for the programmer, the migration of state is,
without extra code, not under their control. The implications of using the versioning system of Java
object serialization were also discussed in this section. Section 13 showed that there are many other solutions to the problem of providing object persistence for Java and that they have widely varying degrees of sophistication and impact
on an application.
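Three of the points summarised above (copying semantics, static values and transient fields) can be observed in one minimal sketch. The class Session and its fields are hypothetical and are not one of the paper's own examples. The sketch assumes only the default behaviour of the standard java.io serialization classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for illustration.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    static int liveSessions = 0;        // static: never written to the stream
    transient String cachedToken;       // transient: skipped by default
    String user;                        // ordinary field: written and restored

    Session(String user, String token) {
        this.user = user;
        this.cachedToken = token;
    }
}

class CopySemanticsDemo {
    // Serializes 'graph' into a byte array.
    static byte[] toBytes(Object graph) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a Session from bytes produced by toBytes.
    static Session fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Session) in.readObject();
        }
    }
}
```

After a round trip the result is a distinct object, its transient field is null, and the static field holds whatever value the running program currently has rather than the value at the time of writing. Because Session does not override equals and hashCode, the copy is also not equal to the original.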
15 Acknowledgements
The author would like to thank Susan Spence for interesting discussions about the implementation and use of the
object-serialization mechanism, for suggesting a discussion of the copying semantics in section 4 and for reading a draft
of this paper. The author also thanks John Hagemeister and Tony Printezis for commenting on an earlier draft.
References
[Abe97] Steven T. Abell. Using Java With PSE. Netscape Communication Corporation, 1997.
[ADJ+96] M. P. Atkinson, L. Daynes, M. J. Jordan, T. Printezis, and S. Spence. An orthogonally persistent Java. ACM SIGMOD Record, 25(4):68-75, December 1996.
[AM95] M. P. Atkinson and R. Morrison. Orthogonally persistent object systems. VLDB Journal, 4(3):319-401, 1995.
[BDK92] F. Bancilhon, C. Delobel, and P. Kanellakis, editors. Building an Object-Oriented Database System: The Story of O2. Morgan Kaufmann, San Mateo, 1992.
[BOS+98] Bob Bretl, Allen Otis, Marc San Soucie, Bruce Schuchardt, and R Venkatesh. Persistent Java Objects in 3 tier Architectures. In Malcolm Atkinson and Mick Jordan, editors, The Third Persistence and Java Workshop, Tiburon, California, September 1st to 3rd 1998.
[CBB+97] R. G. G. Cattell, D. Barry, D. Bartels, M. Berler, J. Eastman, S. Gamerman, D. Jordan, A. Springer, H. Strickland, and D. Wade. The Object Database Standard: ODMG 2.0. Morgan Kaufmann Publishers, Los Altos (CA), USA, 1997.
[Dmi98] Misha Dmitriev. The First Experience of Class Evolution Support in PJama. In Malcolm Atkinson and Mick Jordan, editors, The Third Persistence and Java Workshop, Tiburon, California, September 1st to 3rd 1998.
[FSB+98] Paulo Ferreira, Marc Shapiro, Xavier Blondel, Olivier Fambon, Joao Garcia, Sytse Kloosterman, Nicolas Richer, Marcus Roberts, Fadi Sandakly, George Coulouris, Jean Dollimore, Paulo Guedes, Daniel Hagimont, and Sacha Krakowiak. PerdiS: Design, implementation, and use of a PERsistent DIstributed store. Technical Report RR-3525, Inria, Institut National de Recherche en Informatique et en Automatique, 1998.
[Gel92] D. Gelernter. Current research on Linda. In Jean Pierre Banatre and Daniel Le Metayer, editors, Proceedings of Research Directions in High-Level Parallel Programming Languages, volume 574 of LNCS, pages 74-76, Berlin, Germany, June 1992. Springer.
[Gem98] GemStone Systems, Inc. GemStone/J Programming Guide, 1.1 edition, March 1998.
[GHJV96] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-oriented Software. Addison Wesley, Reading, 1996.
[GJS96] James Gosling, Bill Joy, and Guy L. Steele. The Java Language Specification. The Java Series. Addison-Wesley, Reading, MA, USA, 1996.
[Ham97] Graham Hamilton. JavaBeans API Specification. Sun Microsystems, v1.01 edition, July 1997.
[JOS97] Java Object Serialization Specification. Sun Microsystems Technical Specification, Revision 1.2, JDK 1.1 FCS, February 10, 1997.
[KdRB91] Gregor Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, 1991.
[KPT96] Jan Kleindienst, Frantisek Plasil, and Petr Tuma. Lessons learned from implementing the CORBA persistent object service. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications, volume 31(10) of ACM SIGPLAN Notices, pages 150-167, New York, October 6-10 1996. ACM Press.
[LCSA99] Barbara Liskov, Miguel Castro, Liuba Shrira, and Atul Adya. Providing Persistent Objects in Distributed Systems. In Rachid Guerraoui, editor, Proceedings of the European Conference on Object-Oriented Programming (ECOOP '99), volume 1628 of LNCS, pages 230-257, Lisbon, Portugal, June 1999. Springer.
[NEED]
[Obj95] Object Management Group. The Common Object Request Broker: Architecture and Specification, July 1995. Version 2.0.
[Obj97]
[PAD+97] Tony Printezis, Malcolm P. Atkinson, Laurent Daynès, Susan Spence, and Pete Bailey. The design of a new persistent object store for PJama. In Proceedings of the Second International Workshop on Persistence and Java (PJW2), Half Moon Bay, CA, USA, August 1997.
[PC98] Patrick Chan, Rosanna Lee, and Doug Kramer. The Java Class Libraries, Second Edition: Volume 1: java.io, java.lang, java.math, java.net, java.text, java.util. The Java Series. Addison-Wesley, Reading, MA, USA, 1998.
[SA97] S. Spence and M. Atkinson. A Scalable Model of Distribution Promoting Autonomy of and Cooperation Between PJava Object Stores. In Proceedings of the Thirtieth Hawaii International Conference on System Sciences, Hawaii, USA, January 1997.
[Spe97]
[Spe99] S. Spence. PJRMI: Remote Method Invocation for a Persistent System. In Proceedings of the International Symposium on Distributed Objects and Applications (DOA'99), Edinburgh, Scotland, September 1999. IEEE Press.
[Wal98] J. Waldo. Javaspace specification - 1.0. Technical report, Sun Microsystems, July 1998.
[WH98] Seth White and Mark Hapner. JDBC 2.0 API Specification. Sun Microsystems, 1.0 edition, May 1998.
In order to calculate the cost of serializing and deserializing an object graph, a three-part experiment
was devised. The first two parts measure the cost of serialization, each
writing one kind of data to the disk. The first kind is an n-ary tree initialised as a binary tree consisting of between
1000 and 14,000 elements, increasing in units of 1000. The node (TestNode) in the tree is an object consisting of
a single field, of one byte. No methods are defined. This node is wrapped inside another node, an MNode, which is
actually inserted into the tree. The MNode consists of a string key, so the node can be found, and an array of the n
children the node refers to. To ensure the key is unique it is generated from a monotonically increasing integer which is
reset for each tree size. A new BTree was created for each of the 14 serializations and the current time in milliseconds
was taken immediately before and after the serialization of the graph using System.currentTimeMillis. The sizes
of the 14 stores were recorded and used in the second part of the experiment.
The second part of the experiment serialized a single byte array to disk. This was performed 14 times, once for
each of the sizes gained from the first experiment. This experiment gives a feel for the base cost of moving the number
of bytes to disk that are being written in the first experiment. Comparison of the two figures then shows how much
overhead there is in serializing the more complex binary tree object graph. The sizes of the stores are 69,662 bytes
for the 1000 node binary tree, increasing to 1,024,766 bytes for the 14,000 element tree. An increase of 1000 nodes
increases the store size by approximately 73KB.
The third part deserialized the 14 stores. The current time in milliseconds was captured before and after the
deserialization was performed.
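The shape of the first two parts of the experiment can be sketched as follows. This is not the harness used for the measurements: the class KeyedNode is a hypothetical stand-in for the TestNode/MNode pair, the tree is a plain balanced binary tree, and the data is written to memory rather than to disk. Only the timing technique (System.currentTimeMillis before and after the write) and the reuse of the store size for the byte array follow the description above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for the tree nodes described in the text.
class KeyedNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String key;          // unique key generated from an increasing integer
    byte payload = 1;          // the single byte of user data
    KeyedNode left, right;
    KeyedNode(String key) { this.key = key; }
}

class SerializationTiming {
    // Builds a balanced binary tree holding keys lo..hi (inclusive).
    static KeyedNode build(int lo, int hi) {
        if (lo > hi) return null;
        int mid = (lo + hi) >>> 1;
        KeyedNode node = new KeyedNode(Integer.toString(mid));
        node.left = build(lo, mid - 1);
        node.right = build(mid + 1, hi);
        return node;
    }

    // Serializes 'graph' to memory and returns the bytes produced.
    static byte[] serialize(Object graph) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        return bytes.toByteArray();
    }

    // Returns {tree milliseconds, raw byte array milliseconds, store size in bytes}.
    static long[] compare(int nodes) throws IOException {
        KeyedNode tree = build(1, nodes);
        long t0 = System.currentTimeMillis();
        byte[] store = serialize(tree);
        long t1 = System.currentTimeMillis();
        serialize(new byte[store.length]);   // same number of bytes, no graph to walk
        long t2 = System.currentTimeMillis();
        return new long[] { t1 - t0, t2 - t1, store.length };
    }
}
```

A single call such as SerializationTiming.compare(14000) gives one data point; as in the experiment, repeated runs would be needed for a meaningful average.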
A.1 Experimental Setup
All the experiments were performed on a lightly loaded Sparc Ultra-4 (called mars) running Solaris 2.5.1 with 1.5Gb
of main memory. Each experiment was conducted several times, once for each combination of disk configuration and use of
just-in-time (JIT) compilation. The Java virtual machine was run with all values defaulted.
Three disk configurations were used, one NFS-based and two local. The NFS disk was served from a machine
(called bathurst) running Digital UNIX V3.2C, with 196Mb of main memory. mars is connected to the departmental
gigabit network via a 100Mb/s network card. bathurst is connected to the same network via a 10Mb/s network card.
The two disks local to mars were /tmp, the machine's swap space, and /extra, a directory on a local, non-swap disk.
As /tmp is the swap space for the machine, the disk is memory resident, thus no actual writes to a disk will be
performed. Timing the write for the NFS mounted disk gives times very similar to those for /tmp. These writes are
actually writes to the local cache on the machine that initiates the writing of the data.
The experiments were run using version 1.2 of Sun's Java JDK and each experiment was run once with and
once without just-in-time compilation selected. Each experiment was conducted ten times and an average of the elapsed
times was taken.
A.2 Results
Section A.2.1 compares the amount of time taken to serialize the binary tree to an NFS mounted disk and the equivalent amount of raw bytes from a byte array. Section A.2.2 describes the difference between using /tmp, /extra and
the NFS mounted disk for reading and writing the binary tree and the raw byte array.
A.2.1
The time to serialize a graph of objects to disk is dominated by the manipulation of the graph. The overhead of
moving the data to and from the disk is, in comparison, very small. This is illustrated in figure 53 which shows, using
a log scale, the time to write to the NFS disk.
The top curve represents the amount of time taken to serialize the BTrees of varying sizes. The bottom curve is
the time taken to serialize the same number of bytes to disk when they are stored as a simple byte array. Serializing
the raw data to disk is very fast. For example, it takes approximately 50ms to save 1Mb of data to disk. By contrast it
takes approximately 5000ms to save the same number of bytes from a binary tree consisting of 14,000 elements. The
anomalous first value for the lower curve arises because Java is using unoptimised bytecodes. This is not noticeable in
the upper curve as the time to optimise the bytecodes is dominated by the time to process the tree.
A.2.2
This section divides in two the results of using the three different disk configurations. First, figures for reading the
raw byte array data and the binary tree from disk are given, then figures for writing the data are given.
[Figure 53 plot not reproduced. Logarithmic vertical axis; horizontal axis: Number Bytes Written (x 1000).]
[Figure 54 plot not reproduced. Horizontal axis: Number Bytes Read (x 1000).]
Figure 54: Reading the Binary Tree from the Three Disks
Figure 55 shows a similar result to figure 54, namely that using NFS is quicker for larger files. The initial
read is expensive as unoptimised Java bytecodes are being used.
Figure 56 shows figures for reading the binary tree data from the three kinds of disk with and without JIT selected.
In this graph there is no clear separation of curves based on using JIT (cf. figure 57). The dominant effect is whether
an NFS mounted disk is being used as the two lower curves are for reads from the NFS disk when JIT is on and JIT
is off. The use of JIT together with NFS gives the quickest read.
[Figure 55 plot not reproduced. Title: Disk Read Comparison for Raw Data over NFS, /tmp and /extra (jit on). Series: NFS Raw Read, /tmp Raw Read, /extra Raw Read. Horizontal axis: Number Bytes Read (x 1000).]
Figure 55: Reading the Raw Byte Array from the Three Disks
[Figure 56 plot not reproduced. Title: Disk Read Comparison of using JIT for Tree over NFS, /tmp and /extra. Series: NFS, /tmp and /extra Tree READ, each with jit on and jit off. Horizontal axis as labelled in the original: Number Bytes Written (x 1000).]
Figure 56: Reading the Tree from the Three Disks with and without JIT
Writing
Figure 57 shows the elapsed times for writing the binary tree to the serialized stores of various sizes. Here, no disk
configuration is superior.
[Figure 57 plot not reproduced. Title: Disk Write Comparison for Tree over NFS, /tmp and /extra (jit on). Series: NFS Tree Write, /tmp Tree Write, /extra Tree Write. Horizontal axis: Number Bytes Written (x 1000).]
[Figure 58 plot not reproduced. Horizontal axis: Number Bytes Written (x 1000).]
Figure 58: Writing the Raw Byte Array to the Three Disks
Figure 59 shows performance figures for writing the binary tree data to the three kinds of disk with and without
JIT selected. When writing the data there is a clear advantage to having just-in-time compilation selected. For the
1024KB store, the write using JIT is approximately twice as fast as that without it.
[Figure 59 plot not reproduced. Title: Disk Write Comparison of using JIT for Tree over NFS, /tmp and /extra. Series: NFS, /tmp and /extra Tree Write, each with jit on and jit off. Horizontal axis: Number Bytes Written (x 1000).]
Figure 59: Writing the Tree to the Three Disks with and without JIT
A.3
The experiments above were originally conducted with Sun's Java JDK 1.1.6 with no just-in-time compilation installed. Without using JIT technology at this version of the JDK, the time spent serializing the String key value is
significant for larger trees. For example, 4.3s can be saved when serializing the largest store. This section describes
this optimization.
During the course of the serialization experiment a further experiment was conducted to calculate the elapsed
time of running the BTree tests when the String-based key was made transient. As the trees grow, the key Strings
become longer and more numerous. Therefore, the curve for the BTree starts to diverge from the optimised
case (figure 60) as the BTree is forcing the keys to be serialized. This shows that it is costly to serialize Strings in
Java. For example, for the 10,000 to 14,000 element trees, the difference in store size and time to serialize them is
given in table 1.
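The effect of making the key transient on the size of the store can be reproduced in miniature. The two node classes below are hypothetical and are held in a flat array rather than in the BTree used for the measurements. The sketch compares only the number of bytes written, not the elapsed time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical nodes standing in for the tree nodes of the experiment.
class StoredKeyNode implements Serializable {
    private static final long serialVersionUID = 1L;
    String key;                 // written to the stream
    byte payload;
    StoredKeyNode(int n) { key = Integer.toString(n); }
}

class TransientKeyNode implements Serializable {
    private static final long serialVersionUID = 1L;
    transient String key;       // skipped by the serialization mechanism
    byte payload;
    TransientKeyNode(int n) { key = Integer.toString(n); }
}

class TransientKeySizes {
    // Number of bytes produced when 'graph' is serialized.
    static int sizeOf(Object graph) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(graph);
        }
        return bytes.size();
    }

    // Store size for 'n' nodes whose keys are serialized.
    static int storedKeySize(int n) throws IOException {
        StoredKeyNode[] nodes = new StoredKeyNode[n];
        for (int i = 0; i < n; i++) nodes[i] = new StoredKeyNode(i);
        return sizeOf(nodes);
    }

    // Store size for 'n' nodes whose keys are transient.
    static int transientKeySize(int n) throws IOException {
        TransientKeyNode[] nodes = new TransientKeyNode[n];
        for (int i = 0; i < n; i++) nodes[i] = new TransientKeyNode(i);
        return sizeOf(nodes);
    }
}
```

The saving comes at a price: after deserialization the transient keys are null, so the program would have to regenerate them.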
Figure 60 also contains the results for the third experiment, the elapsed time to deserialize each of the 14 stores.
As can be seen, deserializing a store is significantly quicker than serializing it. When serializing a graph a number of
tests have to be made that are not applicable when deserialization is performed. For example, serialization requires
that cycles are dealt with correctly to ensure infinite recursion does not occur. During deserializing, each node can
simply be read from disk and the pre-serialization shape reassembled; a test for infinite recursion is not necessary.
For large stores the difference is quite substantial. For the largest, consisting of 1,024,706 bytes, the time to serialize
is 1.81 minutes (109328ms), whereas the time to deserialize is 39.26s (39269 ms).
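That cycles are handled by the serialization mechanism can be checked with a two-node ring. The class RingNode is hypothetical; the sketch assumes only the default behaviour of ObjectOutputStream and ObjectInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node type used only for illustration.
class RingNode implements Serializable {
    private static final long serialVersionUID = 1L;
    final String label;
    RingNode next;
    RingNode(String label) { this.label = label; }
}

class CycleDemo {
    // Serializes the graph reachable from 'start' and reads it back.
    static RingNode copyOf(RingNode start) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(start);   // terminates even though the graph is cyclic
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RingNode) in.readObject();
        }
    }
}
```

The copy has the same shape as the original: following next twice from the copied start node leads back to that same copied node.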
The figures for deserializing the BTree are improved by the small number of classes it is made up from. The
BTree is implemented using only four classes: BTree implements the BTree functionality; MNode handles the n
children a node may have; Entry objects contain the particular user level node; and TestMNode is the node used by
the experiment containing the single byte. Therefore, when the store is deserialized the virtual machine has to load in
only four classes in order to read in the entire graph. For a more complex store, made up of numerous object types,
the virtual machine would have to temporarily suspend deserialization to load the classes the store is made up of, thus
increasing the overall read time. A class only has to be located and read the first time an object of that type is found
in the store; subsequent objects of the same type can just be read in.
The serialization and deserialization systems behave in quite different ways as the size of the store increases. The
time to deserialize grows approximately linearly, whereas the time to serialize more closely fits a gentle quadratic curve.
If the store were to grow significantly beyond 1Mb, the time to serialize it would become
very expensive, tens to hundreds of seconds.
No. Elements    Unoptimised Store    Optimised Store    Time Saved
(thousands)     Size (bytes)         Bytes Saved        (ms)
10              739,114              68,921             3280.6
11              810,887              76,921             4192.2
12              888,327              84,921             4232.2
13              957,882              92,921             3901.5
14              1,024,706            100,921            4374.0
[Figure 60 plot not reproduced. Series: Unoptimised BTree, Optimised BTree, Read Unoptimised BTree. Horizontal axis: Run Number.]
A.4 Conclusions
The experiments above allow us to draw four conclusions: the absolute amount of time to read and write a store is
large; reading a store is much slower than writing a store; if an application is likely to exhibit more reading than
writing, an NFS mounted disk should be used; and the use of JIT technology significantly increases the speed of using
Java object serialization.
A.4.1 Absolute Time
Using NFS, it takes 1.30s to read the smallest binary tree store of 69KB and 18.04s to read the largest (1024KB).
An increase of 1000 nodes (or 73KB) increases the read time by approximately 1.28s. Writing the smallest binary
tree store takes 0.03s, writing the largest takes 4.94s and an increase of 1000 nodes increases the write time by
approximately 0.03s.
For the raw byte array, the absolute times are much smaller. For example, writing the 1024KB store to disk using
NFS takes 48ms, and reading it takes 38ms.
Therefore, the time to move data to and from disk for Java object serialization is dominated by the manipulation
of the graph.
A.4.2
Reading the largest binary tree store from the NFS disk takes 18.04s, whereas writing the same amount of data to the
same disk takes only 4.94s. Reading is likely to be slower as more objects have to be created as the file is deserialized
and the graph is constructed. For stores that have a large number of different types, this value may be made worse by
the dynamic loading of the class file for the newly read object. Each time a new type is encountered, the class file for
it must be loaded; this could slow down the deserialization of the graph quite considerably.
A.4.3
If the application exhibits more reads than writes of the serialized data, using an NFS mounted disk will improve the
performance of the application.
A.4.4 JIT Technology
Comparing the results in section A.2 with those in section A.3 we can see that the use of JIT technology and the
move from Sun's Java JDK 1.1.6 to 1.2 significantly increases the speed of using object serialization. For the largest
store in the no-JIT experiment using Java JDK 1.1.6 over NFS the time to write the 1024KB store is 1.81 minutes
(109328ms), and the time to read it is 39.26s (39269ms). This can be contrasted with a time of 4.9s to write the same
amount of data from an equivalent tree using Java JDK 1.2 with JIT and 18.04s to read the same amount of data.
Thus, without JIT, writing is significantly slower than reading, a situation that is reversed when using JIT.