
Developers Guide

Contents

Introduction
Usage
The tests
Write
Read
Secondary Read
Generating the test data
Implementing a database into the benchmark
The basic interfaces
Database Implementation
The GUI
Measurements
Serialization

Introduction

Database Benchmark is an open source tool designed to stress test database
indexing technologies with large data flows. The application performs two main
test scenarios:

Insertion of a large number of randomly generated records with sequential or
random keys;

Read of the inserted records, ordered by their keys.

Database Benchmark measures the following basic parameters:

Write speed: the insert or update speed of all generated records (with
sequential or random keys);

Read speed: the read speed of all inserted records ordered by their key;

Size: the database size after insert and read complete.

Every tested database system must be capable of performing these simple tests:
insert the generated records and read them twice, ordered by their keys.

We chose this test scenario because it is a widespread task in many real-time
systems: telecoms, stock exchanges, SCADA/HMI, large observation and modeling
systems, etc.

To perform the tests each database can use all of its capabilities: local settings,
caches, special optimizations, etc.

Currently we have implemented over 25 databases: SQL, NoSQL, embedded and
server solutions. Note that we have no intent to compare SQL vs. NoSQL
databases; we just want to show how each of the indexing technologies used in them
can be used to solve the defined task.

We do not claim that the currently provided implementations are the best
possible. If you can propose better ones or if you want to add another database,
contribute to our project on GitHub.

Usage

Database Benchmark can be used as an additional viewpoint when research
engineers or software architects assess the appropriate background storage engine
for their mission-critical systems. The application contains all of the necessary tools
that a database engineer needs to make a detailed performance analysis.

The tests

For our test suite we have chosen to use the simplest tests for a database: write
and read. Our intent is to test the storage engines themselves under high loads, not
the different capabilities of the database solutions (for example transactions,
sharding, scalability, etc.).

The test engine of Database Benchmark performs three tests:

Write

Read

Secondary Read

Write
The write test inserts the generated data into the selected databases. Note that,
depending on the number of flows selected in the GUI, a corresponding number of
threads will be used to insert the data into a single database collection (for
example a table).

The multi-thread insert allows databases such as MongoDB, MySQL and others
to benefit from their multi-client insert capabilities which drastically improve the insert
performance.

Each of the inserting threads tries to add data into the database collection. In
order for the tests to be correct we have included the following rule when
implementing databases:

If the database encounters a duplicate key, it has to replace it or update the
record with the new value. Skipping the duplicate keys will make the results invalid.

Read
The read test reads all of the records inserted into the database, ordered by key.
Only one thread is used to read data from the database.

If the databases do not return the records ordered by their keys, the test engine
will detect that and log an error, making the tests invalid.
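
The order check described above can be sketched as a wrapper around the record stream. This is only an illustration, not the test engine's actual code: the ReadOrderCheck class and its EnsureAscending method are hypothetical names, and the sketch throws an exception where the real engine is described as logging an error.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOrderCheck
{
    // Passes records through unchanged and fails as soon as a key is not
    // strictly greater than the previous one. A repeated key is rejected too.
    public static IEnumerable<KeyValuePair<long, TRecord>> EnsureAscending<TRecord>(
        IEnumerable<KeyValuePair<long, TRecord>> records)
    {
        bool hasPrevious = false;
        long previousKey = 0;

        foreach (var kv in records)
        {
            if (hasPrevious && kv.Key <= previousKey)
                throw new InvalidOperationException(
                    $"Key {kv.Key} was returned after key {previousKey}.");

            previousKey = kv.Key;
            hasPrevious = true;
            yield return kv;
        }
    }
}
```

Because the method is an iterator, the check runs lazily while the caller enumerates the records, so it adds no extra pass over the data.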

Secondary Read
The reason we have included a secondary read is simple: some of the databases
that work asynchronously need some time or at least one full read of the records
before the data is finally indexed and they can use their full potential.

Generating the test data

In order to provide objective tests rather than a simple synthetic benchmark of
the database, we have spent a considerable amount of time working on our data
generation algorithms. This ensures that the generated data is as close to real-life
data as possible and gives the database analyst objective results to work with.

The number of records varies from hundreds to a few billion. The keys are of type
long (8 bytes) and the records are of type Tick (50-60 bytes, depending on the
generated data).

public class Tick
{
    public string Symbol { get; set; }
    public DateTime Timestamp { get; set; }
    public double Bid { get; set; }
    public double Ask { get; set; }
    public int BidSize { get; set; }
    public int AskSize { get; set; }
    public string Provider { get; set; }
}

The record type that we are currently using is inspired by a real data model
used in stock exchanges. The records are generated using a random-walk algorithm,
which provides close to real-time values.
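
To illustrate the idea of a random walk, here is a minimal sketch of a tick generator. It is not the benchmark's actual generator: the TickGenerator class, the starting price, the step size, the spread range and the symbol and provider strings are all assumptions chosen for the example.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Tick record above, repeated to keep the sketch self-contained.
public class Tick
{
    public string Symbol { get; set; }
    public DateTime Timestamp { get; set; }
    public double Bid { get; set; }
    public double Ask { get; set; }
    public int BidSize { get; set; }
    public int AskSize { get; set; }
    public string Provider { get; set; }
}

public static class TickGenerator
{
    // Produces 'count' ticks with sequential keys. Each price is the previous
    // price plus a small random step, which is what makes it a random walk.
    public static IEnumerable<KeyValuePair<long, Tick>> Generate(int count, int seed)
    {
        var random = new Random(seed);
        double bid = 1.3;  // hypothetical starting price
        var timestamp = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        for (long key = 0; key < count; key++)
        {
            bid += (random.NextDouble() - 0.5) * 0.001;  // step up or down
            bid = Math.Max(bid, 0.0001);                 // keep the price positive
            double spread = 0.0001 + random.NextDouble() * 0.0004;
            timestamp = timestamp.AddMilliseconds(random.Next(1, 1000));

            yield return new KeyValuePair<long, Tick>(key, new Tick
            {
                Symbol = "EURUSD",                 // hypothetical symbol
                Timestamp = timestamp,
                Bid = Math.Round(bid, 5),
                Ask = Math.Round(bid + spread, 5),
                BidSize = random.Next(1, 100) * 1000,
                AskSize = random.Next(1, 100) * 1000,
                Provider = "Sample Provider"       // hypothetical provider
            });
        }
    }
}
```

Using a fixed seed makes the generated flow reproducible, which is convenient when the same data has to be written to several databases.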

Implementing a database into the benchmark

The basic interfaces


To make it easier to implement the tests for different databases, we have
created two basic structures:

using System.Collections.Generic;
using System.Drawing;

/// <summary>
/// Represents a single database instance.
/// This interface is implemented by all databases which participate in the benchmark.
/// </summary>
public interface IDatabase
{
    #region Database Description

    /// <summary>
    /// Name of the database. It must be a valid directory name (see DataDirectory for details).
    /// </summary>
    string Name { get; set; }

    /// <summary>
    /// The category of the database (SQL, NoSQL, NoSQL\Key-Value, etc.)
    /// </summary>
    string Category { get; set; }

    /// <summary>
    /// A description of the database. Usually the name and version.
    /// </summary>
    string Description { get; set; }

    /// <summary>
    /// The official website of the database.
    /// </summary>
    string Website { get; set; }

    /// <summary>
    /// The color of the database used for the charts.
    /// </summary>
    Color Color { get; set; }

    /// <summary>
    /// Different requirements - for example used libraries, implementation details, etc.
    /// </summary>
    string[] Requirements { get; set; }

    /// <summary>
    /// Name of the database collection (table, document, etc.).
    /// </summary>
    string CollectionName { get; set; }

    /// <summary>
    /// Each database has its own data directory. This is the place where the database stores its data.
    /// This property is initialized with Application.StartupPath\Databases\IDatabase.Name value.
    /// </summary>
    string DataDirectory { get; set; }

    /// <summary>
    /// A connection string if the database requires a remote connection.
    /// </summary>
    string ConnectionString { get; set; }

    #endregion

    #region Database Methods

    /// <summary>
    /// Initialize and create the database - create configuration files, engines, etc.
    /// </summary>
    void Init(int flowCount, long flowRecordCount);

    /// <summary>
    /// Begin writing records into the database. Multiple threads invoke this method
    /// (one for each flow).
    /// </summary>
    void Write(int flowID, IEnumerable<KeyValuePair<long, Tick>> flow);

    /// <summary>
    /// Begin reading the records from the database in a single thread. The tick flow
    /// must be returned in ascending key order.
    /// </summary>
    IEnumerable<KeyValuePair<long, Tick>> Read();

    /// <summary>
    /// Close the database.
    /// </summary>
    void Finish();

    /// <summary>
    /// Returns the size of the database in bytes.
    /// The size is calculated as the total number of bytes of all files contained
    /// in the working directory.
    /// If the data files cannot be stored in DataDirectory, you should override
    /// the Size property (see the MongoDB case).
    /// </summary>
    long Size { get; }

    #endregion
}

In order to implement a database, the database class has to inherit the abstract
class Database.

using System.Collections.Generic;
using System.ComponentModel;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

public abstract class Database : IDatabase
{
    protected object SyncRoot { get; set; }

    public string Name { get; set; }
    public string CollectionName { get; set; }
    public string DataDirectory { get; set; }
    public string ConnectionString { get; set; }

    public string Category { get; set; }
    public string Description { get; set; }
    public string Website { get; set; }

    [XmlIgnore]
    public Color Color { get; set; }
    public string[] Requirements { get; set; }

    public abstract void Init(int flowCount, long flowRecordCount);
    public abstract void Write(int flowID, IEnumerable<KeyValuePair<long, Tick>> flow);
    public abstract IEnumerable<KeyValuePair<long, Tick>> Read();
    public abstract void Finish();

    // Default size: total length in bytes of all files under DataDirectory,
    // including subdirectories.
    [Browsable(false)]
    public virtual long Size
    {
        get
        {
            return Directory.GetFiles(DataDirectory, "*.*", SearchOption.AllDirectories)
                .Sum(x => new FileInfo(x).Length);
        }
    }
}

There are four basic methods:

Init() prepares the database with the test parameters: the number of flows that
will be inserted and the number of records in each flow.

Write() takes a generated flow of rows (key/record) and has to store it in the
database. The flow may contain duplicate keys (in this case the records that
correspond to the duplicate keys must replace the original ones in the
database). Write() is invoked in parallel for multiple data flows, each with its
own ID. Each flow is written from a separate thread, and all flows are stored in
one database structure (table, collection, etc.).

Read() has to return the stored records ordered by their keys, in ascending
order. Read() must also return unique keys only. The returned records are
monitored for proper key order.

Finish() is invoked after Write() and Read() complete. Here you can close your
tables, database, etc.
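
The contract of the four methods can be illustrated with a small in-memory sketch. It does not inherit from the Database class, so that it stays self-contained, and it is not one of the benchmark's implementations: the InMemoryDatabaseSketch name and the choice of SortedDictionary as storage are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Tick record, repeated to keep the sketch self-contained.
public class Tick
{
    public string Symbol { get; set; }
    public DateTime Timestamp { get; set; }
    public double Bid { get; set; }
    public double Ask { get; set; }
    public int BidSize { get; set; }
    public int AskSize { get; set; }
    public string Provider { get; set; }
}

public class InMemoryDatabaseSketch
{
    private readonly object syncRoot = new object();
    private SortedDictionary<long, Tick> table;

    public void Init(int flowCount, long flowRecordCount)
    {
        table = new SortedDictionary<long, Tick>();
    }

    public void Write(int flowID, IEnumerable<KeyValuePair<long, Tick>> flow)
    {
        // One lock for the whole flow keeps the sketch simple, at the cost of
        // serializing the writer threads.
        lock (syncRoot)
        {
            foreach (var kv in flow)
                table[kv.Key] = kv.Value;  // the indexer replaces a duplicate key
        }
    }

    public IEnumerable<KeyValuePair<long, Tick>> Read()
    {
        // SortedDictionary enumerates in ascending key order, and each key is unique.
        return table;
    }

    public void Finish()
    {
        table = null;
    }
}
```

Assigning through the indexer rather than calling Add() is what satisfies the duplicate key rule: Add() would throw on an existing key, while the indexer overwrites the stored record.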

The following properties are also available:

SyncRoot (optional) is an object that can be used for thread synchronization.

Name is the name of the database. This name is shown in the TreeView control of
the main window. The property must be a valid directory name. It must be set in
the constructor of the Database implementation.

CollectionName contains the name of the structure where data is stored.

DataDirectory is the place where the database will store its data. This
property is automatically initialized with Application.StartupPath\Databases\Name
after the Database instance is created.

ConnectionString is used to provide additional settings to the database. Usually
it is used by the SQL databases to provide connection parameters: host, port,
user permissions, etc.

Size is the size of the database in bytes after insert and read complete. The Size
value must accumulate all files on the disk used by the database to perform the
test: tables, indexes, collections, etc. By default the application calculates
this property as the total size of all files in DataDirectory. However, for
server and remote databases the property has to be overridden with an
appropriate query (see the MongoDB and MySQL cases).

It is possible to add additional properties to a database simply by declaring
them with the public accessor. Such a property will appear automatically in the
properties of the database.
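
How public properties can be picked up automatically is sketched below with plain reflection. The guide does not say how the application discovers them, so treat this as an assumption; SampleDatabase and PropertyLister are hypothetical names, while InMemoryDatabase and CacheSize mirror the extra settings used by the STSdb 4.0 example in this guide.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Reflection;

public class SampleDatabase
{
    public string Name { get; set; }

    // Additional settings, exposed simply by being public.
    public bool InMemoryDatabase { get; set; }
    public int CacheSize { get; set; }

    // Hidden from property editors, like Size in the Database class.
    [Browsable(false)]
    public long Size { get { return 0; } }
}

public static class PropertyLister
{
    // Returns the names of the public instance properties of a type that are
    // not marked with [Browsable(false)], sorted by name.
    public static List<string> GetVisibleProperties(Type type)
    {
        return type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p =>
            {
                var browsable = p.GetCustomAttribute<BrowsableAttribute>();
                return browsable == null || browsable.Browsable;
            })
            .Select(p => p.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```

For SampleDatabase the method returns CacheSize, InMemoryDatabase and Name; Size is skipped because of the Browsable(false) attribute.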

The Database class also has a few properties needed by the GUI for proper test
visualization. These properties must be set in the constructor of the Database
successor.

Category is the category of the database. The category can contain \ to define
subcategories. The string is used by the application to categorize the databases
in the TreeView control.

Description (optional) can be used to show some additional information in the
charts for the benchmark test.

Website (optional) is the official web address of the database.

Color is the color of the database in the application speed & size charts.

Requirements (optional) contains a list of the library files that represent
the database engine (mostly for embedded databases).

Database Implementation

For an example database test implementation we have chosen STSdb 4.0
(because its API is very intuitive and easy to use).

Step 1: To start, we create an STSdb4Database class, a successor of the base
Database class. (Create an STSdb4Database.cs file in the Databases folder.)

namespace Benchmarking.Databases
{
    public class STSdb4Database : Database
    {
    }
}

Step 2: Now we set up the constructor:

public STSdb4Database()
{
    SyncRoot = new object();

    Name = "STSdb 4.0";
    CollectionName = "table1";
    Category = @"NoSQL\Key-Value Store";
    Description = "STSdb 4.0";
    Website = "http://www.stsdb.com/";
    Color = Color.CornflowerBlue;

    Requirements = new string[]
    {
        "STSdb4.dll"
    };
}

Step 3: After we have completed those steps, let's start with the main methods.
We have to implement the Init(), Write(), Read() and Finish() methods.

The Init() method for STSdb 4.0 looks like this:

// engine and table are fields of STSdb4Database. InMemoryDatabase and CacheSize
// are not part of the base Database class, so they must be declared in
// STSdb4Database as well.
public override void Init(int flowCount, long flowRecordCount)
{
    engine = InMemoryDatabase
        ? STSdb4.Database.STSdb.FromMemory()
        : STSdb4.Database.STSdb.FromFile(Path.Combine(DataDirectory, "test.stsdb4"));
    ((StorageEngine)engine).CacheSize = CacheSize;

    table = engine.OpenXTable<long, Tick>(CollectionName);
}

The next one is the Write() method. We have to implement the method
according to the database API. We insert the records of the specified flow into the
table opened in Init(). Note that if the user makes a parallel insert test, then
multiple threads can invoke this method (one for each data flow), so the database
operations have to be thread-safe.

public override void Write(int flowID, IEnumerable<KeyValuePair<long, Tick>> flow)
{
    // Only one flow writes at a time; assigning by key replaces duplicates.
    lock (SyncRoot)
    {
        foreach (var kv in flow)
            table[kv.Key] = kv.Value;

        engine.Commit();
    }
}

In the Read() method we have to return the records ordered by their keys.

public override IEnumerable<KeyValuePair<long, Tick>> Read()
{
    return engine.OpenXTable<long, Tick>(CollectionName).Forward();
}

In the case of STSdb 4.0 we just return the Forward() method enumerator of the
relevant table.

In the Finish() method the database must be closed.

public override void Finish()
{
    engine.Close();
}

Once we have added the new STSdb4Database class to the project, it will be
automatically shown in the TreeView control and we can select it for a test.

The GUI
The application interface is simple and easy to use. There is a TreeView control
containing a list of the currently implemented databases. The databases are
categorized by their storage model: Object, Key-Value store, Graph, etc.

To start a benchmark you have to choose the databases that will participate in
the current test. You can also choose:

The number of data flows that will be inserted (Tasks). The default is 1.

The number of generated records in each data flow (Records). The default is
100 000.

The type of the generated keys for all record flows (Keys). A value of 0% means
that all of the generated keys are sequential; as the percentage increases, the
generated flow contains a growing share of random keys. 100% means
that all keys are random. The default is 100% key randomness.
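
One way to picture the Keys setting is the sketch below, where each key is random with a probability equal to the chosen percentage and sequential otherwise. This mixing rule is an assumption made for illustration, as the guide does not describe the actual algorithm; KeyGenerator and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class KeyGenerator
{
    // randomness is a fraction: 0.0 yields only sequential keys, 1.0 only random keys.
    public static IEnumerable<long> Generate(int count, double randomness, int seed)
    {
        var random = new Random(seed);
        var buffer = new byte[8];
        long nextSequential = 0;

        for (int i = 0; i < count; i++)
        {
            if (random.NextDouble() < randomness)
            {
                // Random 8-byte key with the sign bit cleared, so it is never negative.
                random.NextBytes(buffer);
                yield return BitConverter.ToInt64(buffer, 0) & long.MaxValue;
            }
            else
            {
                yield return nextSequential++;
            }
        }
    }
}
```

Since NextDouble() returns a value in the range [0, 1), a randomness of 1.0 always takes the random branch and a randomness of 0.0 never does. Random keys may repeat, which is one reason the duplicate key rule for Write matters.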

After you start the benchmark with the Start button, the application automatically
executes the Write, Read and Secondary Read tests on the currently selected databases.

Figure 1 User Interface

During the tests, the results are visible in the line charts. You can choose
between an average chart, showing the changes of the average speed, and a moment
chart, showing the current speed. You can also see the moment and average
memory usage.

After the tests are completed, a bar chart report is generated, showing the
Speed (average), the Time and the Size for each database.

If you want further analysis, we have included the option to export the results
as a CSV, JSON or PDF file.

Application menu

File

New Project        Restore all initial settings.
Open Project       Load user settings from a chosen directory and docking
                   settings from the application config folder.
Save Project       Save all modified settings of controls. User settings
                   are saved in a predetermined directory.
Save Project As    Save all modified settings of controls in a user-selected
                   directory.
Exit               Close the application.

Edit

For a selected database:

Clone              Create a clone of the selected database.
Rename             Rename the selected database.
Delete             Delete the selected database.
Restore Default    Restore the default database properties.
Expand All         Expand all TreeView nodes.
Collapse All       Collapse all TreeView nodes.

For a selected database type:

Delete             Delete the selected database.
Restore Default    Restore the default database properties.
Expand All         Expand all TreeView nodes.
Collapse All       Collapse all TreeView nodes.

View

Databases Windows        Open the window that contains the databases tree view.
Write Window             Open the Write window.
Read Window              Open the Read window.
Secondary Read Window    Open the Secondary Read window.
Auto Navigate Windows    Navigate to the current process automatically.
Show Legend              Show the legend of the selected chart.
Legend Position          Left, Right, Top, Bottom
Logarithmic
Show Bar Charts          Speed: open the Speed window.
                         Time: open the Time window.
                         Size: open the Size window.
                         CPU: open the CPU window.
Log Window
Reset Window Layout      Return the windows to their default positions.
Properties               Open the properties for the selected database.

Tools

Export results to CSV       Summary Report: contains settings, computer
                            specification and speed information.
                            Detailed Report: extensive report of results in a
                            .CSV file.
Export results to JSON      Summary Report: contains settings, computer
                            specification and speed information.
                            Detailed Report: extensive report of results in a
                            .JSON file.
Export results to PDF       Summary Report: contains settings, computer
                            specification and speed information.
                            Detailed Report: extensive report of results in a
                            .PDF file.
Online report               Send test results to dedicated servers, which are
                            available for users.
About Database Benchmark    Open a window with information about the application.

Measurements

Input/Output usage

IO Data Bytes/sec shows the rate, in bytes per second, at which the process
is reading and writing bytes in I/O operations. It counts all I/O activity generated
by the process, including file, network, and device I/Os.

Memory usage

Page File Bytes Peak shows the maximum amount of virtual memory, in bytes,
that a process has reserved for use in the paging file(s).

Working Set Peak shows the maximum size, in bytes, of the working set of this
process. The working set is the set of memory pages that were touched recently
by the threads in the process.

Virtual Bytes Peak shows the maximum size, in bytes, of virtual address space
that the process has used at any one time.

CPU usage

% User Time shows the percentage of time that the processor spent
executing code in user mode. Applications, environment subsystems, and
integral subsystems execute in user mode.

% Processor Time shows the percentage of time that the processor spent
executing a non-idle thread.

% Privileged Time shows the percentage of non-idle processor time spent
executing code in privileged mode. Privileged mode is a processing mode
designed for operating system components and hardware-manipulating drivers.

The Process performance object consists of counters that monitor running
application programs and system processes. All the threads in a process share the
same address space and have access to the same data.
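
Several of these values can also be read from .NET code through the System.Diagnostics.Process class, whose peak memory and processor time properties roughly correspond to the counters listed above. The sketch below is an illustration only and is not how Database Benchmark collects its measurements; ProcessSnapshot and ProcessMonitor are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public class ProcessSnapshot
{
    public long PeakWorkingSetBytes { get; set; }  // compare: Working Set Peak
    public long PeakVirtualBytes { get; set; }     // compare: Virtual Bytes Peak
    public long PeakPagedBytes { get; set; }       // compare: Page File Bytes Peak
    public TimeSpan UserTime { get; set; }
    public TimeSpan PrivilegedTime { get; set; }
    public TimeSpan TotalProcessorTime { get; set; }
}

public static class ProcessMonitor
{
    // Captures memory peaks and accumulated processor times of the current process.
    public static ProcessSnapshot Capture()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            return new ProcessSnapshot
            {
                PeakWorkingSetBytes = process.PeakWorkingSet64,
                PeakVirtualBytes = process.PeakVirtualMemorySize64,
                PeakPagedBytes = process.PeakPagedMemorySize64,
                UserTime = process.UserProcessorTime,
                PrivilegedTime = process.PrivilegedProcessorTime,
                TotalProcessorTime = process.TotalProcessorTime
            };
        }
    }

    // Converts the processor time used between two snapshots into a percentage
    // of the elapsed wall-clock time, averaged over the given number of processors.
    public static double CpuPercent(TimeSpan cpuBefore, TimeSpan cpuAfter,
                                    TimeSpan elapsed, int processorCount)
    {
        if (elapsed <= TimeSpan.Zero || processorCount <= 0)
            return 0.0;

        double used = (cpuAfter - cpuBefore).TotalMilliseconds;
        return used / (elapsed.TotalMilliseconds * processorCount) * 100.0;
    }
}
```

Taking a snapshot before and after a test and passing the two UserTime or PrivilegedTime values to CpuPercent gives figures comparable to % User Time and % Privileged Time. The peak memory properties may not be populated on every operating system.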

The Report Result form (Figure 2) contains the characteristics of the current
machine. The form is divided into four sections:

Operating System

Database Storage

CPU and Memory

Additional Information

Figure 2 Report Results Form

The data in this form is obtained through Windows Management Instrumentation
(WMI). WMI is the infrastructure for management data and operations on
Windows-based operating systems.

More information about WMI here.

Serialization

Database Benchmark now has the ability to create projects. Each project stores
all of the selected databases, their individual settings as well as the settings used for
the actual tests.

This provides a very robust way of creating test suites that represent different
real-life scenarios and helps with the database evaluation.
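
As a rough illustration of what storing a project can look like, the sketch below round-trips a settings object through XML with XmlSerializer. The guide does not specify the project file format, so XML is an assumption here (suggested only by the XmlIgnore attribute on the Database class); ProjectSettings, DatabaseSettings and ProjectSerializer are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class DatabaseSettings
{
    public string Name { get; set; }
    public string CollectionName { get; set; }
    public string ConnectionString { get; set; }

    // Skipped by the serializer, like Color in the Database class.
    [XmlIgnore]
    public string RuntimeOnly { get; set; }
}

public class ProjectSettings
{
    public int FlowCount { get; set; }
    public long RecordCount { get; set; }
    public double KeyRandomness { get; set; }
    public List<DatabaseSettings> Databases { get; set; } = new List<DatabaseSettings>();
}

public static class ProjectSerializer
{
    // Serializes the project settings to an XML string.
    public static string ToXml(ProjectSettings project)
    {
        var serializer = new XmlSerializer(typeof(ProjectSettings));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, project);
            return writer.ToString();
        }
    }

    // Restores the project settings from an XML string.
    public static ProjectSettings FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(ProjectSettings));
        using (var reader = new StringReader(xml))
            return (ProjectSettings)serializer.Deserialize(reader);
    }
}
```

Properties marked with XmlIgnore are left out of the file and come back with their default values after loading.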
