Abstract
Disk access latency and transfer times are often considered to have a major and detrimental impact on software running times. Developers are therefore advised to favour in-memory operations and minimise disk access. Furthermore, diskless computer architectures are being studied and designed to remove this bottleneck altogether and improve application performance in areas such as High Performance Computing, Big Data, and Business Intelligence. In this paper we use code inspired by real, production software to show that in-memory operations are not always a guarantee of high performance, and may actually cause a considerable slow-down. We also show how small code changes can have dramatic effects on running times. We argue that a combination of system-level improvements and better developer awareness and coding practices is necessary to ensure in-memory computing can achieve its full potential.
1. Introduction
The prevalence of application domains such as High Performance Computing, Big Data, and Business Intelligence has drawn special attention to reducing software running times. However, high software performance is of interest in nearly all application domains. Many factors determine software performance, and disk access is among them. Traditional software development wisdom has considered frequent disk access a source of performance loss. Disks, whether mechanical or SSD, have orders of magnitude higher latency and transfer times than main memory (RAM). Even casual computer users know that a noticeable slow-down occurs when disk swapping is triggered on their computers. As a result, considerable effort has been made to minimise disk access, using methods such as caching [2]. Falling main memory prices have allowed moving further in this direction. Work is in progress on devising algorithms that perform only, or mainly, in-memory operations [1,5], and some databases store their data in main memory as much as possible, either as an option or by default [3,6].
In most cases an in-memory operation is indeed faster than an equivalent one involving disk access. Progress in disk manufacturing and software management has mitigated the problem to a certain extent, but has not removed it completely. We expect clever algorithms to continue to appear that lessen the reliance on disk access, and as the price of RAM drops, we see such algorithms applied to ever bigger datasets. This avenue is so promising that there are major efforts to design computing systems, such as HP's The Machine [7], that have no disks and rely only on volatile or non-volatile [4] main memory for all their needs.
In this paper we show that removing or lessening disk access does not necessarily result in increased software performance, which we define simply as the amount of time it takes a piece of software to finish running. In fact, the simple examples we use in Section 2 run much faster when frequent disk access is performed than when running in-memory. In Section 3 we argue that achieving the performance goals promised by in-memory computing and diskless computers may require a re-examination of relevant system-level algorithms, as well as better training of software practitioners so they are aware of potential pitfalls. We conclude the paper in Section 4.
[Figures 1 and 2: disk speedup versus added string length, on logarithmic axes; Figure 2 is titled "Disk speedup (Linux)". The plots themselves are not reproduced here.]
As can be seen from Figures 1 and 2, the same phenomenon is observed in both Windows and Linux. This
could be influenced by the Java Virtual Machine, which creates a layer of uniformity in both cases.
2.2 Python Experiment
To test the case where no virtual machine is intervening, we tried the Python code in Appendix 2. Table 3
shows the Linux results, where Python 2.6.6 was used to run the code.
Added String Length    String Concatenation     Single Write to Disk     Writes to Disk
in bytes               Time (ST1, in-memory)    Time (DT1, in-memory)    Time (DT2, disk-only)
1                      78.057                   0.00179                  0.393
10                     20.480                   0.00077                  0.0379
1,000                  0.343                    0.00145                  0.0041
1,000,000              0.0                      0.00168                  0.0022
Table 3. Average Python running times in seconds, measured under Linux.
The Windows results, generated using Python 2.7.6, appear in Table 4.
Added String Length    String Concatenation     Single Write to Disk     Writes to Disk
in bytes               Time (ST1, in-memory)    Time (DT1, in-memory)    Time (DT2, disk-only)
1                      113.537                  0.00547                  0.205
10                     11.925                   0.00565                  0.0254
1,000                  0.1238                   0.0126                   0.0055
1,000,000              0.0                      0.00587                  0.00489
Table 4. Average Python running times in seconds, measured under Windows.
The change in relative performance is not as linear as in the Java case, but with Python we observe the same phenomenon under both Windows and Linux: for shorter concatenated strings the in-memory computation is hundreds of times slower than the disk-only case. As a side note, Python seems to be more efficient than Java in our string tests, but we are only interested in the relative performance of the in-memory and disk-only approaches. The point to consider is that even though the two languages and run-time systems are different, the general performance trend is comparable between Java and Python, pointing to a systemic, language-independent phenomenon.
2.3 Explaining the Results
It is easy to explain the results: In high-level languages such as Java and Python, a seemingly benign
statement such as concatString += addString may actually involve executing many extra cycles behind
the scenes. To concatenate two strings in a language such as C, if there is not enough space to expand
the concatString to the size it needs to be to hold the additional bytes from addString, then the
developer has to explicitly allocate new space with enough storage for the sum of the sizes of the two
strings and copy concatString to the new location, and then finally perform the concatenation. In Java
and Python strings are immutable, and any assignment will result in the creation of a new object and
possibly copy operations, hence the overhead of the string operations. The disk-only code, although
apparently writing to the disk excessively, is only triggering an actual write when operating system
buffers are full. In other words, the operating system already lessons disk access times. A developer
familiar with the language and system internals readily notices the causes of this observed behaviour,
but this behaviour may be easily missed, as indicated by examining similar cases in production code.
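The C-style procedure just described can be sketched in a few lines (a simplified model that uses a Python list as the raw buffer; the function and variable names are ours, not the paper's):

```python
def c_style_concat(buf, capacity, length, add):
    """Append add to buf, reallocating and copying if the buffer is full.

    This mimics manual C string handling: when there is not enough
    space, a new buffer is allocated and the old contents are copied
    across -- the hidden cost behind a one-line concatString += addString.
    """
    needed = length + len(add)
    if needed > capacity:
        capacity = needed                 # a real allocator would over-allocate here
        new_buf = [None] * capacity       # explicit new allocation
        new_buf[:length] = buf[:length]   # copy the existing contents
        buf = new_buf
    buf[length:needed] = list(add)        # write the new bytes after the old ones
    return buf, capacity, needed

buf, cap, n = [None] * 4, 4, 0
for piece in ("ab", "cd", "ef"):
    buf, cap, n = c_style_concat(buf, cap, n, piece)
print("".join(buf[:n]))  # prints "abcdef"
```

The copy step is what a one-line concatenation in Java or Python hides, and it is what dominates the in-memory running times above.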
The above explanation applies to any data structure that must be stored contiguously and grows in size, or is immutable. It is possible to improve on the slow execution observed above. For a mutable string, one can allocate more storage than is immediately needed for a concatenation operation. Also, if the data structure, a string in this case, is stored on the stack, concatenation may be performed very efficiently by placing the added string at the top of the stack and simply adjusting the stack pointer. Since the stack can be grown very cheaply, and no copying of the whole string is required, in this case an in-memory operation will be efficient.
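In Python, the usual way to get this amortized behaviour is a mutable intermediary that over-allocates, such as a list of pieces joined once at the end (a suggested idiom of ours, not part of the paper's experiments):

```python
import timeit

def naive_prepend(pieces):
    s = ""
    for p in pieces:
        s = p + s            # every prepend copies the whole of s
    return s

def buffered_prepend(pieces):
    parts = []
    for p in pieces:
        parts.append(p)      # amortized O(1): the list over-allocates
    parts.reverse()          # apply the prepend order once
    return "".join(parts)    # single copy of the data

pieces = ["x" * 10] * 10000
t0 = timeit.default_timer(); naive_prepend(pieces); t_naive = timeit.default_timer() - t0
t0 = timeit.default_timer(); buffered_prepend(pieces); t_buf = timeit.default_timer() - t0
print("naive %.4fs, buffered %.4fs" % (t_naive, t_buf))
```

The buffered version copies each byte a bounded number of times instead of once per concatenation, which is the same amortization idea as over-allocating a mutable string.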
With our Python code both Python installations were slow in string concatenation, but that does not necessarily mean they store the strings in heap space. This is because the concatenation statement in Appendix 2, concatString = addString + concatString, places the addString contents at the beginning of the target string (rather than at the end), so concatString must be moved, and a stack allocation does not help performance. To verify this, we re-ordered the statement to concatString = concatString + addString and ran the code on both Windows and Linux again. The slow Windows concatenation results did not change, while on Linux the modified in-memory version was faster than the disk-only version, hinting at a stack allocation scheme. Table 5 provides the Linux running times for the modified code, where the in-memory version's performance improved dramatically once string concatenations were no longer accompanied by copy operations.
Added String Length    String Concatenation     Single Write to Disk     Writes to Disk
in bytes               Time (ST1, in-memory)    Time (DT1, in-memory)    Time (DT2, disk-only)
1                      0.284                    0.00155                  0.423
10                     0.0276                   0.000135                 0.0405
1,000                  0.00109                  0.00144                  0.00283
1,000,000              0.0                      0.00160                  0.0023
Table 5. Average modified Python running times in seconds, measured under Linux.
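The order effect can be checked directly: in the append form the existing bytes can stay where they are, while the prepend form must move them (a sketch with our own function names; absolute times vary by interpreter and platform, as the Windows/Linux split above suggests):

```python
import timeit

def prepend(add, n):
    s = ""
    for _ in range(n):
        s = add + s    # new bytes go at the front: the old bytes must move
    return s

def append(add, n):
    s = ""
    for _ in range(n):
        s = s + add    # new bytes go at the end: the old bytes can stay put
    return s

n, add = 20000, "x" * 10
t0 = timeit.default_timer(); prepend(add, n); tp = timeit.default_timer() - t0
t0 = timeit.default_timer(); append(add, n); ta = timeit.default_timer() - t0
print("prepend %.4fs  append %.4fs" % (tp, ta))
```

Both functions build the same string when the added piece is repeated, so any timing gap is due purely to where the new bytes are placed.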
As a last test, we declared concatString to be a global variable, which forces a heap allocation scheme so that the variable is accessible from other scopes. With the modified concatenation code, the in-memory execution times increased dramatically, back to the values reported in Table 3, confirming our explanation. In other words, a global scope guarantees slow in-memory execution of both the original and the modified Python code.
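The scope effect can also be reproduced in isolation (a sketch using our own function names; we only measure the observable difference between a global and a local accumulator, without presuming the allocator's internals):

```python
import timeit

concat_global = ""

def concat_with_global(add, n):
    # Accumulate into a module-level (heap-visible) variable.
    global concat_global
    concat_global = ""
    for _ in range(n):
        concat_global = concat_global + add
    return concat_global

def concat_with_local(add, n):
    # Accumulate into a function-local variable.
    s = ""
    for _ in range(n):
        s = s + add
    return s

n, add = 20000, "x" * 10
t0 = timeit.default_timer(); concat_with_global(add, n); tg = timeit.default_timer() - t0
t0 = timeit.default_timer(); concat_with_local(add, n); tl = timeit.default_timer() - t0
print("global %.4fs  local %.4fs" % (tg, tl))
```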
Java performance numbers did not change when the concatenation order was reversed in the code in
Appendix 1. However, using a mutable data type such as StringBuilder or StringBuffer dramatically
improved the results.
It is important to emphasize that the in-memory performance problem is not caused by heap versus stack memory allocation as such, as evidenced by the poor in-memory results of the Python code in Appendix 2. The problem is caused by data copy operations in main memory, whether in heap or stack space. We also see that immutable strings are not inherently a problem, as evidenced by Python's much better performance with the modified code.
3. Discussion
These widely varying performance results were obtained through small changes in the code, and understanding the reasons required a closer look at the system-level execution environment. Much recent code development effort concerns higher-level domains such as web development. Software practitioners in these domains often work with multiple layers of abstraction, well away from the operating system and language run-time levels. In the examples of inefficient code that inspired this paper, and we suspect in many other similar cases, the developers did what they had been trained to do, carefully reducing disk access, yet the approach clearly failed.
We feel that with the push towards computing systems with huge amounts of memory, and ultimately to
diskless systems which rely only on main memory, there is a necessity to have a holistic approach to
ensuring high software performance. To fully utilize the emerging hardware, we need to re-examine how
operating systems and language run times manage and utilize memory. We also need to make sure
current and future developers are familiar with the fundamental concepts and principles that impact
software performance.
4. Conclusion
Although in numerous cases in-memory computing is faster than an equivalent algorithm that accesses the disk, the real-life-inspired counter-examples presented in this paper show this is not always the case. We argued that in-memory computation alone cannot guarantee high software performance; careful examination of the code, along with knowledge of hidden factors such as system and language library routines and operating system internals, plays an important role in the achieved performance, or lack of it. More specifically, in our case memory management caused a significant slow-down for in-memory computation. Many of the factors affecting performance are outside developers' care or control, and they may not even be aware of the underlying algorithms and their implications. This justifies our emphasis on 1) re-examining system-level algorithms with in-memory operations in mind, and 2) better training to make developers familiar with system-level software intricacies. Doing so would help in-memory computing better deliver on its potential and promises.
References
[1] Gill, J., "Shifting the BI Paradigm with In-Memory Database Technologies", Business Intelligence Journal, volume 12 (2): 58-62, 2007
[2] Karedla, R., Love, J.S., and Wherry, B.G., "Caching strategies to improve disk system performance", Computer, volume 27 (3): 38-46, 1994
[3] Kreibich, J.A., Using SQLite, O'Reilly Media, Inc., 2010
[4] Lacaze, P.C. and Lacroix, J.C., Non-volatile Memories, John Wiley & Sons, 2014
[5] Narayanan, D. and Hodson, O., "Whole-system Persistence with Non-volatile Memories", Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2012), London, England, UK, March 3-7, 2012
[6] Tiwari, S., Professional NoSQL, John Wiley & Sons, 2011
[7] http://www.businessinsider.com/hp-shows-off-new-kind-of-computer-2014-6
Appendix 1. Java code (excerpt)

        fileMean2 += fileTimes2[i];
    }
    fileMean2 /= fileTimes2.length;
    System.err.println("In-memory mean: string time " + stringMean);
    System.err.println("In-memory mean: file time " + fileMean1);
    System.err.println("Disk-only mean: file time " + fileMean2);
  }
}
Appendix 2. Python code (excerpt)

f = open('test.txt', 'w')
start = timeit.default_timer()
for i in range(0, numIter):
    f.write(addString)
f.flush()
f.close()
stop = timeit.default_timer()
fileTime = stop - start
print "disk-only: file took " + str(fileTime)