Vous êtes sur la page 1sur 10

NoSQL Database Topics

This page should be general concepts not specifics of a single NoSQL technology.
Originally "No SQL" was used in the sense of "database technologies that are nonrelational, and/or do not offer an SQL compatible or SQL-like query language", that
were either Column oriented, Key-Value Store oriented, or Document oriented KeyValue Stores. With Couchbase 4.0 set to add an SQL-like language and query service
to its technology, while retaining its document oriented non-relational, and nonschema-enforced design, some are suggesting the term "noSQL" be understood as
an acronym "Not Only SQL".
An introductory whitepaper is attached to this page as a PDF: Why No SQL?
A seminal presentation on Polyglot Persistence, by Martin Fowler, considered a key
NoSQL luminary:

http://martinfowler.com/articles/nosql-intro-original.pdf

Some Ramsoft notes:

The fundamental approach in a Relational database is to give schema


enforcement job, and the determination of your query plans, as jobs
belonging to the database engine. A worklist feature is engineered via
relational models. While users might tune their SQL for performance after
looking at query plans, it's more common in the NoSQL world for developers
to engineer their systems via distributed techniques like Map and Reduce. It's
a fundamental mental shift.

In a NoSQL document database or key-value store, or big table database, the


system is schemaless. Or rather your application still needs a schema but
that schema exists only as a perceived schema, enforced only by whatever
code you build to enforce schema, within your web-services layers.

The system is designed from the ground up to be scalable, and to be


distributed.

Memory based caching is commonly used, making a NoSQL database very


fast for certain kinds of data fetches, even when a lot of rows are returned.

Clustering, sharding, replication, and other scalability and uptime enhancing


features are primary design drivers, instead of features you bolt on
afterwards.

Working with a cloud based architecture requires a smarter client. Clients for
NoSQL are often much smarter than conventional SQL clients.

Working with a NoSQL database.


o

In a key-value store system, there is a single key used to access any


object. The normal and lowest cost way to access any object is by
knowing its key, which is its object unique identifier.

In a document database, it is possible to very quickly find a particular


entity, even without any query engine, by walking a table, and because
this walk is all done without disk access, it can be much faster than in
a conventional SQL database. Sometimes, such as when you need full
text search on a document, or search on a wide set of fields, the end
results are dramatically faster than any SQL RDBMS.

Some NoSQL databases may have an expression language or even an


SQL-like mini language to make writing queries easier. It might be the
wrong approach, or it might be the right approach, to continue to be
"query-oriented" in our development. We need to establish rules for
when to use queries and when to use other techniques. Just because
queries were your only tool in the SQL world, don't let your thinking be
constrained. When all you have is a hammer, everything looks like a
nail.

Because of the schemaless nature of the system, everything in the


database is effectively optional, over time. This needs to be reflected
in our webservice design by having schema validation occur as part of
our webservice. For example, if a user provides a value equivalent to a
Foreign Key, the Foreign Key checking is done by the webservice,
which should raise an exception back to the client if an invalid foreign
key is provided. Strictly speaking the concept of a Foreign Key does not
exist in NoSQL, but it could exist as an element of our WebService
Application Service Design.

TODO: Schema design tips for Document databases.

Couchbase
This page is a bootstrap page to help developers understand Couchbase and start
using it. Couchbase is a document oriented NoSQL Database utilizing JSON as its
native document format.
Couchbase has slightly different capabilities and features than other products that
are also considered NoSQL, such as Hadoop or MongoDB. To understand the
differences, it helps to understand couchbase's architecture, and the problems it is
meant to solve, and how that differs from other tools like Hadoop or MongoDB.
See notes on Couchbase-to-RAMSOFT communications at the
Couchbase_Meeting_May27_2015 page.

First of all, do not confuse Apache CouchDB with Couchbase. They are
different systems, with some historical links, and some of the original
CouchDB developers are now working on Couchbase. Both use an underlying
memory cache server called memcached as one of their cluster node's
available services. See CouchBase vs CouchDB

Current Version of Couchbase as of May 2015, is version 3.0. Version 4.0 is


planned for release mid 2015. Version 1 of couchbase was a merged product
based on Membase and CouchOne Inc. JSON store was introduced in 2.0.

Features in 4.0, including N1QL, which is pronounced Nickel


o

N1QL Tutorial: http://query.pub.couchbase.com/tutorial/

N1QL Cheatsheet: http://docs.couchbase.com/files/Couchbase-N1QLCheatSheet.pdf

Developer Preview 4 of 4.0 is being used by RAMSOFT to evaluate Couchbase


technology. Currently the DP4 release is only available on Linux and MacOS.
No Windows Server exists yet.

Database Schema Design, Replication and Document Oriented Databases.


o

https://www.youtube.com/watch?v=Mj4PKKidREU : Couchbase 103


Data Modeling

TODO: COuchbase Server Setup

TODO: Couchbase Client SDK Setups

TODO: Webservice design with COuchbase; How to move the schema


enforcement into our webservice layers.

TODO: Explain VIEWS, INDEXING and COLLATION.

Using Couchbase Admin Console


Admin consoles are accessed via your web browser, for example:
http://couchbase1.ramsoft.biz:8091
Login is Administrator. Ask any developer for the current Couchbase cluster Admin
password.

Optimistic Locking with CAS


Optimistic locking via Compare and Swap is one technique for consistency, without
transactional locking. This is the shipping 3.0 docs on the subject:
http://docs.couchbase.com/developer/dev-guide-3.0/update-info.html

get, cas, set handles success

one possible failure case we could code up involves short random delay, retry
(get, set again).

The underlying CAS model is the same in 4.0 but the client has been upgraded to
involve less client side checking, that is to say, when doing a replace, I believe in
4.0 you simply perform your operations, and must handle Exceptions raised in the
client when inconsistency is detected.

Using the N1QL Client (cbq) to interactively query and create indexes.
You need to be logged into the couchbase server machine to use cbq, at least as of
DP4. That means you need an ssh client on your windows PC, you can get putty
from the network at \\mainstorage\shared\Downloads\putty
You can ssh into couchbase1.ramsoft.biz, use devuser account as logon, ask Jasmine
or Warren for current password.
The primary unit of document aggregation is the bucket. To query a bucket it must
have at least one index. After you have read the tutorials and documents above,
you will know how to create an index. That is a command executed from cbq.

Using Couchbase From Visual Studio in C#

Demos in svn: http://svn.ramsoft.biz/core/misc/demos/couchbase%20c


%23%20demos

C#/.Net Couchbase SDK Documentation

Hello World Example (Visual Studio 2013 or higher)


o

Create new C# console application

Use NuGet package manager console and type Install-Package


CouchbaseNetClient

Use this code in your Program.cs:

o
o

using System;

using Couchbase.Core;

using Couchbase.Configuration.Client;

o
o

namespace Couchbase.HelloCouchbase

class Program

o
o

static void Main(string[] args)

o
o
o

ClientConfiguration _config = new ClientConfiguration();


_config.Servers.Add(new
Uri("http://couchbase1.ramsoft.biz:8091/pools"));

//_config.BucketConfigs.Add(...);

var _cluster = new Cluster(_config);

o
o

using (var bucket = _cluster.OpenBucket() )

var document = new Document<dynamic>

Id = "Hello",

Content = new

{
Name = "Couchbase"

o
o

};

o
o

var upsert = bucket.Upsert(document);

if (upsert.Success)

var get = bucket.GetDocument<dynamic>(document.Id);

document = get.Document;

var msg = string.Format("{0} {1}!", document.Id,


document.Content.Name);
Console.WriteLine(msg);

o
o
o

Console.WriteLine("couchbase communications success.


press enter to continue.");
} else {

Console.WriteLine("upsert failed. press enter to

continue.");
}

o
o

Console.Read();

o
o

Schema Requirements and Performance Requirements for NoSQL

Definitions

RIS (Radiology Information System) = A software system enabling a team of


office workers to manage a Radiology office's daily operations, most
importantly the scheduling of appointments, and the tracking of all study
information, billing and insurance information, and any other information
related to the operation of a medical office.

PACS (Picture Archiving and Communication System) = A software system


enabling a physician (whether a Radiologist or a general practitioner) to view
a Diagnostic Image (such as an X-Ray), and to view related previous images

for the same patient ('relevant priors') and to view diagnostic reports, or to
dictate or transcribe a diagnostic report.

Worklist = The heart of a RIS and the heart of a PACS. A query consisting of at
least, Study date and time, Patient identification, Workflow Status of a study.
Some example worklists:
o

Dave has a scheduler worklist that lets him see referrals and put them
onto a calendar.

Steve has a worklist that lets him see studies that need updates to
their insurance information.

Janet has a worklist that shows which X-Ray studies need to be read by
her today.

Architecture Diagram and User Types

Performance and Hardware Requirements

We don't necessarily want to deploy a 3 node cluster on every site. How is


the performance of 1, 2 or 3 node cluster using any given NoSQL Solution
(say Couchbase) compare to the same hardware running MS SQL Server? (1
node should be okay, but you need to deploy backup.)

Do you need a 3 node cluster (say 3x some 2015 model of dell rackmount
server) to equal the single node performance of MS SQL Server or some other
conventional SQL solution that has features that Firebird lacks, like
clustering? (No but you lose durability, and you have backup/data loss
concerns now.)

Architecture and Features

Is N1QL useful for updating or just for querying? Answer: 4.0 release is readonly (SELECT). DML features (update/modify) post 4.0, planned.

Are there transactional update features in N1QL in 4.0, or planned? What are
lightweight transactions? Answer: Post 4.0.

How does the introduction of N1QL change how we would design our keys or
our document data? (Not sure yet.)

Is .NET platform client SDK a first class SDK? Why didn't the .net client
support the N1QL in DP4? (Will be updated soon.)

Is there any reason why we can't have hundreds of buckets? We read that's
not recommended, why? (Server Memory usage. Plan to use less than 10
buckets.)

Schema Requirements

audit log

A simple log which is appended to, and read, the table can grow very large
over time, no transactional or locking complications. Up to 40% of our largest
site's data, over time, might be composed of audit log entries.

Exam Image datasets

Exam image datasets consist of medical images that are composed of two
parts
o

Header that contains information about the patient i.e. name, date of
birth, exam description etc. The size of the header is typically < 10 KB

Pixel data that contains the actual image. Compressed Pixel data can
vary from 10 KB to > 4 GB, especially for multi-frame pixel data.

Couchbase as a document size limit of 20 MB and best-practices state to


keep it < 1 MB. This should be enough to store the header of an image minus
pixel data and any large tags that would not be useful for querying anyways.

Currently, we store selected header fields within our RDBMS database along
with a file name pointer to refer to the image on disk.

Image datasets are immutable. They are inserted only and never updated,
though they can be deleted.

scheduler

atomic reservation of time slots demands certain transactional or atomic or


ACID-like capabilities. User A and User B are going to click "Reserve 8 AM slot
on Machine 123" at the same time. One of them needs to win. One of them
needs to get an error message.

schedules are a kind of worklist.

schedules have a calendar.

schedules have a set of resources, that are grouped into rooms, which are
part of facilities. (a natural relational master-detail, three levels, in a
relational model).

resources have rules and other properties like

what times they are available

what times they are unavailable

blocked time can be specified


o

in single block time entries

as part of recurrences or series groups, which affect multiple resources,


or all resources on a particular site or location.

appointment objects are linked to studies, but are additional state for a study.
they connect a study to a machine, at a particular time, and are considered a
"contract" with a client.

eventual consistency issues would be a deal-breaker in the scheduler, you


can't have two sites scheduling the same resource, and not seeing each
other's data instantly (ideally) or at very least they can create provisional
bookings, and must then receive some confirmation or some error that their
booking succeeded at the root level where-ever the booking becomes final.
(round trip ACID, in a distributed environment).

worklist and studies

define worklist : a worklist is a view into the pending, in-progress, and


completed studies

a study is the single most important object in any PACS DB or RISPACS DB.

a worklist is a filtered view of the studies a particular user of our system


needs to see. Currently relies heavily on relational database capabilities to
create our worklist queries, and our worklist queries define the worklist
element of the system.

transforming our studies, and our worklist into a document architecture,


represents the largest sea-change in our design, and it is also the most
complex and multi-layered set of requirements of any area of our database.

the rest of RIS - patients, doctors, insurers, providers, referrers, etc

define our insurance and study and patient models.

define all referring and other healthcare provider models.

discuss an approach to the RIS side of the database that maintains all prior
states of every object, having auditing and undo, and the inability to destroy
or overwrite data, but rather to have it be superceded by newer copies of the
same data, as a design element.

design for audit, design for immutability; A patient doesn't so much have a
current set of insurance information, but rather has a latest set of information
(almost like the tip of a version control system) and all historical states for
that information, whether it is insurance, allergies, or whatever, is still
retrievable.

Vous aimerez peut-être aussi