Contents
Introduction
Usage
The tests
Write
Read
Secondary Read
Database Implementation
The GUI
Measurements
Serialization
Introduction
Database Benchmark is an open source tool designed to stress test database
indexing technologies with large data flows. The application performs two main
test scenarios:
Write speed: the insert or update speed of all generated records (with
sequential or random keys);
Read speed: the read speed of all inserted records ordered by their key.
Every tested database system must be capable of performing these simple tests:
insert the generated records and read them twice, ordered by their keys.
To perform the tests, each database can use all of its capabilities: local settings,
caches, special optimizations, etc.
We do not claim that the currently provided implementations are the best
possible. If you can propose better ones or if you want to add another database,
contribute to our project on GitHub.
Usage
The tests
For our test suite we have chosen the simplest tests for a database: write
and read. Our intent is to test the storage engines themselves under high loads, not
the different capabilities of the database solutions (for example transactions,
sharding, scalability, etc.).
Write
Read
Secondary Read
Write
The write test inserts the generated data into the selected databases. Depending
on the number of flows selected in the GUI, a corresponding number of threads is
used to insert the data into a single database collection (for example, a table).
The multi-threaded insert allows databases such as MongoDB, MySQL and others
to benefit from their multi-client insert capabilities, which can drastically improve
insert performance.
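The one-thread-per-flow insert described above can be sketched roughly as follows. This is only an illustration, not the benchmark's actual code: ParallelWriteSketch, its Collection field and the string record type are hypothetical stand-ins, and a ConcurrentDictionary plays the role of the shared database collection.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ParallelWriteSketch
{
    // Hypothetical stand-in for the single database collection shared by all flows.
    public static readonly ConcurrentDictionary<long, string> Collection =
        new ConcurrentDictionary<long, string>();

    // Writes one flow; the indexer replaces the record if the key already exists.
    public static void Write(int flowID, IEnumerable<KeyValuePair<long, string>> flow)
    {
        foreach (var kv in flow)
            Collection[kv.Key] = kv.Value;
    }

    // Starts one thread per flow and waits until all of them have finished.
    public static void WriteAll(IEnumerable<IEnumerable<KeyValuePair<long, string>>> flows)
    {
        List<Thread> threads = flows
            .Select((flow, id) => new Thread(() => Write(id, flow)))
            .ToList();
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());
    }
}
```

A real implementation would replace the dictionary with calls to the database client; the threading pattern stays the same.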
Each of the inserting threads tries to add data into the database collection. In
order for the tests to be correct we have included the following rule when
implementing databases:
Read
The read test reads all of the records inserted into the database, ordered by key.
Only one thread is used to read data from the database.
If a database does not return the records ordered by their keys, the test engine
detects that and logs an error, which makes the test invalid.
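A key-order check of this kind could look roughly like the sketch below. It is an assumption about how such a check might be written, not the test engine's real code: OrderCheckSketch and EnsureAscending are hypothetical names, and the sketch throws an exception where the actual engine logs an error.

```csharp
using System;
using System.Collections.Generic;

public static class OrderCheckSketch
{
    // Passes records through unchanged and throws as soon as a key is not
    // strictly greater than the previous one (out of order or duplicated).
    public static IEnumerable<KeyValuePair<long, T>> EnsureAscending<T>(
        IEnumerable<KeyValuePair<long, T>> records)
    {
        bool first = true;
        long previous = 0;
        foreach (var kv in records)
        {
            if (!first && kv.Key <= previous)
                throw new InvalidOperationException(
                    $"Key {kv.Key} was returned after key {previous}.");
            first = false;
            previous = kv.Key;
            yield return kv;
        }
    }
}
```

Because the method is a lazy iterator, the check runs while the records are being read and adds no extra pass over the data.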
Secondary Read
The reason we have included a secondary read is simple: some of the databases
that work asynchronously need some time or at least one full read of the records
before the data is finally indexed and they can use their full potential.
In order to provide objective tests that are more than a simple synthetic
benchmark of the database, we have spent a considerable amount of time working
on our data generation algorithms. The aim is for the generated data to be as
close to real-life data as possible, so that the database analyst has objective
results to work with.
The number of records varies from hundreds to a few billion. The keys are of type
long (8 bytes) and the records are of type Tick (50-60 bytes, depending on the
generated data).
The record type that we are currently using is inspired by a real data model
used in stock exchanges. The records are generated using a random-walk algorithm,
which produces values close to real-life ones.
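As an illustration of the random-walk idea, the sketch below generates a price series in which each value differs from the previous one by a small random step. The names RandomWalkSketch and Prices, the parameters and the uniform step distribution are assumptions made for this example; the actual Tick generator and its fields are not shown in this document.

```csharp
using System;
using System.Collections.Generic;

public static class RandomWalkSketch
{
    // Yields `count` prices; each one is the previous price plus a random step.
    public static IEnumerable<double> Prices(int count, double start, double maxStep, int seed)
    {
        var random = new Random(seed);
        double price = start;
        for (int i = 0; i < count; i++)
        {
            // Step is uniformly distributed in [-maxStep, +maxStep).
            price += (random.NextDouble() * 2.0 - 1.0) * maxStep;
            yield return price;
        }
    }
}
```

Consecutive values stay close to each other, which is what makes the series resemble a traded price more than independent random numbers would.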
Implementing a database into the benchmark
/// <summary>
/// Represents a single database instance.
/// This interface is implemented by all databases which participate in the benchmark.
/// </summary>
public interface IDatabase
{
#region Database Description
/// <summary>
/// Name of the database. It must be a valid directory Name (see DataDirectory for details).
/// </summary>
string Name { get; set; }
/// <summary>
/// The category of the database (SQL, NoSQL, NoSQL\\Key-Value, etc.)
/// </summary>
string Category { get; set; }
/// <summary>
/// A description of the database. Usually the name and version.
/// </summary>
string Description { get; set; }
/// <summary>
/// The official website of the database.
/// </summary>
string Website { get; set; }
/// <summary>
/// The color of the database used for the charts.
/// </summary>
Color Color { get; set; }
/// <summary>
/// Different requirements, for example used libraries, implementation details, etc.
/// </summary>
string[] Requirements { get; set; }
/// <summary>
/// Name of the database collection (table, document, etc.).
/// </summary>
string CollectionName { get; set; }
/// <summary>
/// Each database has its own data directory. This is the place where the database stores its data.
/// This property is initialized with Application.StartupPath\Databases\IDatabase.Name value.
/// </summary>
string DataDirectory { get; set; }
/// <summary>
/// A connection string if the database requires a remote connection.
/// </summary>
string ConnectionString { get; set; }
#endregion
/// <summary>
/// Initialize and create the database: create configuration files, engines, etc.
/// </summary>
void Init(int flowCount, long flowRecordCount);
/// <summary>
/// Begin writing records into the database. Multiple threads invoke this method (one for each flow).
/// </summary>
void Write(int flowID, IEnumerable<KeyValuePair<long, Tick>> flow);
/// <summary>
/// Begin reading the records from the database in a single thread. The tick flow must be returned in ascending key order.
/// </summary>
IEnumerable<KeyValuePair<long, Tick>> Read();
/// <summary>
/// Close the database.
/// </summary>
void Finish();
/// <summary>
/// Returns the size of the database in bytes.
/// The size is calculated as the total number of bytes of all files contained in the working directory.
/// If the data files cannot be stored in DataDirectory, you should override the Size property (see the MongoDB case).
/// </summary>
long Size { get; }
}
In order to implement a database, the database class has to inherit the abstract
class Database.
[XmlIgnore]
public Color Color { get; set; }
public string[] Requirements { get; set; }
[Browsable(false)]
public virtual long Size
{
    get
    {
        // Sum the lengths of all files under DataDirectory, including subdirectories.
        return Directory.GetFiles(DataDirectory, "*.*", SearchOption.AllDirectories)
            .Sum(x => new FileInfo(x).Length);
    }
}
}
Init() is where you prepare your database with the test parameters:
the number of flows that will be inserted and the number of records in each flow.
Write() takes a generated flow of rows (key/record) and has to store it in the
database. The flow may contain duplicate keys; in that case the records that
correspond to the duplicate keys must replace the original ones in the
database. Write() is invoked in parallel for multiple data flows, each with its own
ID (name). Each invocation runs in a separate thread and stores its flow in an
appropriate database structure (table, collection, etc.).
Read() has to return the rows for the specified flow ID. Within a flow the records
must be returned ordered by their keys. Also, Read() must return unique keys
only. The benchmark checks the records returned from the data source for proper
key order.
Finish() is invoked after Write() and Read() complete. Here you can close your
tables, database, etc.
DatabaseName is the name of the database. This name is shown in the TreeView
control of the main window. The property must be a valid directory name. It
must be set in the constructor of the Database implementation.
DatabaseDirectory is the place where the database will store its data. This
property is automatically initialized with Application.StartupPath\DatabaseName
after the Database instance is created.
Size is the size of the database in bytes after the insert and read tests complete.
The Size value must include all files on disk that the database uses to perform the
test: tables, indexes, collections, etc. By default the application calculates
this property as the total size of all files created in the DatabaseDirectory. For
server and remote databases, however, the property has to be overridden with an
appropriate query (see the MongoDB and MySQL cases).
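The contract described above (duplicate keys replace earlier records, reads return unique keys in ascending order) can be illustrated with a minimal in-memory sketch. InMemoryStoreSketch is a hypothetical class that only mirrors the method names of IDatabase; it does not inherit from Database, and it uses string records as a placeholder because the Tick type is not shown in this document.

```csharp
using System;
using System.Collections.Generic;

public class InMemoryStoreSketch
{
    private readonly object syncRoot = new object();
    private SortedDictionary<long, string> table;

    // Prepare an empty table; the test parameters are unused in this sketch.
    public void Init(int flowCount, long flowRecordCount)
    {
        table = new SortedDictionary<long, string>();
    }

    // May be called from several threads at once, so every insert is locked.
    // The indexer overwrites the record when the key already exists.
    public void Write(int flowID, IEnumerable<KeyValuePair<long, string>> flow)
    {
        foreach (var kv in flow)
        {
            lock (syncRoot)
                table[kv.Key] = kv.Value;
        }
    }

    // SortedDictionary enumerates its entries in ascending key order.
    public IEnumerable<KeyValuePair<long, string>> Read()
    {
        return table;
    }

    // Release the table once the tests have completed.
    public void Finish()
    {
        table = null;
    }
}
```

A per-record lock is the simplest way to stay correct under parallel writes; a real storage engine would normally offer something faster, such as batched or per-flow commits.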
The Benchmark class adds a few parameters needed by the GUI for proper test
visualization. These properties must be set in the constructor of the Benchmark
successor.
Color is the color of the database in the application speed & size charts.
Database Implementation
namespace Benchmarking.Databases
{
public class STSdb4Database: Database
{
}
}
public STSdb4Benchmark()
{
SyncRoot = new object();
Step 3: After we have completed those steps, let's start with the main methods.
We have to implement the Init(), Write(), Read() and Finish() methods.
The next one is the Write() method. We have to implement the method
according to the database API. We open a table for the specified flow and insert the
records for that flow in it. Note that if the user makes a parallel insert test, then
multiple threads can invoke this method (one for each data flow), so the database
engine.Commit();
}
}
In the Read() method we have to return the records for the specified flow,
ordered by their keys.
In the case of STSdb 4.0 we just return the Forward() method enumerator of the
relevant table.
Once we have added the new STSdb4Database class to the project, it will be
automatically shown in the TreeView control and we can select it for a test.
The GUI
The application interface is simple and easy to use. There is a TreeView control
containing a list of the currently implemented databases. The databases are
categorized by their storage model: Object, Key-Value store, Graph, etc.
To start a benchmark you have to choose the databases that will participate in
the current test. You can also choose:
The number of data flows that will be inserted (Tasks). The default is 1.
The number of generated records in each data flow (Records). The default is
100 000.
The type of the generated keys for all record flows (Keys). A value of 0% means
that all of the generated keys are sequential; as the percentage increases, the
generated flow contains more random keys. 100% means that all keys are
random. The default is 100% key randomness.
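One way to picture the Keys percentage is the sketch below, where each key is random with the given probability and sequential otherwise. This is an assumption made for illustration only: KeySketch, Keys and the way sequential and random keys are mixed are hypothetical, and the application's real key generator may work differently.

```csharp
using System;
using System.Collections.Generic;

public static class KeySketch
{
    // randomPercent = 0 yields 0, 1, 2, ...; randomPercent = 100 yields only random keys.
    public static IEnumerable<long> Keys(int count, int randomPercent, int seed)
    {
        var random = new Random(seed);
        var buffer = new byte[8];
        long next = 0;
        for (int i = 0; i < count; i++)
        {
            if (random.Next(100) < randomPercent)
            {
                // Build a random 8-byte key from random bytes.
                random.NextBytes(buffer);
                yield return BitConverter.ToInt64(buffer, 0);
            }
            else
            {
                yield return next++;
            }
        }
    }
}
```

Random keys may collide with each other or with sequential ones, which is consistent with the rule that a flow may contain duplicate keys.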
After you start the benchmark with the Start button, the application automatically
executes the Write, Read and Secondary Read tests of the currently selected databases.
During the tests, the results are visible in the line charts. You can choose
between an average chart, showing the changes of the average speed, and a moment
chart, showing the speed at the current moment. You can also see the moment and
average memory usage.
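The difference between the two charts can be made concrete with a small sketch. The formulas below are an assumption about what "average" and "moment" speed mean (records per second since the start versus records per second over the last sampling interval); SpeedSketch and its methods are hypothetical names, not part of the application.

```csharp
using System;

public static class SpeedSketch
{
    // Average speed: all records processed so far divided by the total elapsed time.
    public static double Average(long totalRecords, TimeSpan elapsed)
    {
        return totalRecords / elapsed.TotalSeconds;
    }

    // Moment speed: records processed during the last interval divided by its length.
    public static double Moment(long recordsNow, long recordsBefore, TimeSpan interval)
    {
        return (recordsNow - recordsBefore) / interval.TotalSeconds;
    }
}
```

For example, 1000 records after 2 seconds give an average of 500 records per second, while 600 of them arriving in the last half second give a moment speed of 1200 records per second.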
After the tests are completed, a bar chart report is generated, showing the
Speed (average), the Time and the Size for each database.
If you want further analysis, you can export the results as a .csv, .json or
.pdf file.
Application menu
File
Edit
Expand All: expands all TreeView nodes.
View
Right
Top
Bottom
Logarithmic
Log Window
Tools
Measurements
Input/Output usage
IO Data Bytes/sec shows the rate, in bytes per second, at which the process
is reading and writing bytes in I/O operations. It counts all I/O activity
generated by the process, including file, network, and device I/Os.
Memory usage
Page File Bytes Peak shows the maximum amount of virtual memory, in bytes,
that a process has reserved for use in the paging file(s).
Working Set Peak shows the maximum size, in bytes, of the working set of this
process. The working set is the set of memory pages that were touched recently
by the threads in the process.
Virtual Bytes Peak shows the maximum size, in bytes, of virtual address space
that the process has used at any one time.
CPU usage
% User Time shows the percentage of time that the processor spent
executing code in user mode. Applications, environment subsystems, and
integral subsystems execute in user mode.
% Processor Time shows the percentage of time that the processor spent
executing a non-idle thread.
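Several of these values have close counterparts on the .NET System.Diagnostics.Process class, which the sketch below prints for the current process. How the application itself collects its measurements is not stated in this document, so treat this only as one possible way to look at similar numbers; ProcessCountersSketch is a hypothetical name, and the processor times are cumulative durations rather than percentages.

```csharp
using System;
using System.Diagnostics;

public static class ProcessCountersSketch
{
    // Prints peak memory figures and processor times of the current process.
    public static void Print()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            Console.WriteLine($"Working set peak:     {process.PeakWorkingSet64} bytes");
            Console.WriteLine($"Virtual bytes peak:   {process.PeakVirtualMemorySize64} bytes");
            Console.WriteLine($"Page file bytes peak: {process.PeakPagedMemorySize64} bytes");
            Console.WriteLine($"User time:            {process.UserProcessorTime}");
            Console.WriteLine($"Total processor time: {process.TotalProcessorTime}");
        }
    }
}
```

On non-Windows platforms some of the peak values may be reported as zero.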
Operating System
Database Storage
Additional Information
Figure 2 Report Results Form
Serialization
Database Benchmark now has the ability to create projects. Each project stores
all of the selected databases, their individual settings, and the settings used for
the actual tests.
This provides a repeatable way of creating test suites that represent different
real-life scenarios, which helps with database evaluation.
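As a rough idea of what storing a project might involve, the sketch below saves and loads a small settings object with the standard XmlSerializer. The project file format is not described in this document, so the use of XML, the ProjectSketch class and all of its properties are assumptions made purely for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical project contents: selected databases plus the test settings.
public class ProjectSketch
{
    public string[] Databases { get; set; }
    public int FlowCount { get; set; }
    public long RecordCount { get; set; }
    public int KeyRandomness { get; set; }
}

public static class ProjectFileSketch
{
    // Writes the project to an XML file, overwriting any existing file.
    public static void Save(ProjectSketch project, string path)
    {
        var serializer = new XmlSerializer(typeof(ProjectSketch));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, project);
    }

    // Reads a project back from an XML file written by Save().
    public static ProjectSketch Load(string path)
    {
        var serializer = new XmlSerializer(typeof(ProjectSketch));
        using (var stream = File.OpenRead(path))
            return (ProjectSketch)serializer.Deserialize(stream);
    }
}
```

Loading the file again restores the same selection and settings, which is what makes a saved test scenario repeatable.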