Data storage methods in blockchain:

1. Storing everything in the blockchain itself: Storing everything in the blockchain is the
simplest solution. Currently, most simple decentralized applications work exactly this way.
However, this approach has significant drawbacks. First, blockchain transactions are slow
to confirm. This may seem fast enough for a money transfer (anyone can wait a minute),
but it is extremely slow for the data flow of a rich application, which may require many
thousands of transactions per second. Second, the blockchain is immutable. Immutability
is a strength that gives the blockchain its robustness, but it is a weakness for data
storage. Users may change their profile or replace their photo, yet all the previous data
will sit in the blockchain forever and can be seen by anyone. Immutability leads to one
more drawback: capacity. If all applications kept their data in the blockchain, its size
would grow rapidly and exceed the capacity of commonly available hard drives. Full nodes
could then require special hardware, which may result in dangerous centralization of the
blockchain. That is why storing data only in the blockchain is not a good option for a
rich decentralized application.
2. Peer-to-peer file systems, such as the InterPlanetary File System (IPFS): IPFS lets users
share files stored on client computers and unites them in a global file system. The
technology builds on ideas from the BitTorrent protocol and a Distributed Hash Table
(DHT). It has several advantages. It is truly peer to peer: to share anything, you first
put it on your own computer, and it is downloaded only if someone needs it. It is content
addressable, so it is impossible to forge the content behind a given address. Popular
files can be downloaded very quickly because many peers can serve them, as in BitTorrent.
However, it also has some drawbacks. You must stay online if you want to share your
files, at least until someone becomes interested and downloads them from you. It serves
only static files, which cannot be modified or removed once uploaded. And, of course, you
cannot search these files by their meaningful content.
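The content-addressing property described above can be illustrated with a minimal sketch. It assumes a plain SHA-256 hex digest as the address, which is a simplification and not the actual IPFS address format; the `ContentStore` class and its methods are hypothetical names used only for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address of a blob is the hash of its bytes.

    ASSUMPTION: a plain SHA-256 hex digest stands in for a real content identifier.
    """

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, not chosen by the uploader.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on retrieval so that tampered bytes are detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr_v1 = store.put(b"profile photo v1")
addr_v2 = store.put(b"profile photo v2")  # "editing" a file yields a new address
```

Because the address depends on the bytes, a modified file can only be published under a different address, which is why such a system serves static content only.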
3. Decentralized cloud file storage: There are also decentralized cloud file storage systems
that lift some of the IPFS limitations. From the user's point of view, they look like
ordinary cloud storage such as Dropbox. The difference is that the content is hosted on
the computers of users who rent out their hard drive space, rather than in datacenters.
There are plenty of such projects nowadays, for example Sia, Storj, and Ethereum Swarm.
You no longer need to stay online to share your files: just upload a file and it is
available in the cloud. These systems are highly reliable, fast enough, and have enormous
capacity. Still, they serve static files only, offer no content search and, since they
are built on rented hardware, they are not free.
4. Distributed databases: Since we need to store structured data and want advanced query
capabilities, we may look at distributed NoSQL databases. Why NoSQL? Because strictly
transactional SQL databases cannot be truly distributed due to the restrictions of the
CAP theorem. To make a database distributed, we must sacrifice either consistency or
availability. Many NoSQL databases choose availability over strict consistency, replacing
the latter with so-called “eventual consistency”, where all the database nodes in the
network become consistent some time later. There are many mature implementations of such
databases, for example MongoDB, Apache Cassandra, RethinkDB and so on. They are very
good: fast, scalable, fault tolerant, with a rich query language. But they still have a
fatal drawback for our application: they are not Byzantine-proof. All the nodes of the
cluster fully trust each other, so any malicious node can destroy the whole database.
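To make the last two points concrete, here is a minimal sketch of eventual consistency and of why mutual trust between nodes is a problem. It assumes a simple last-write-wins merge by timestamp, which is only one possible conflict-resolution rule and is not taken from any of the databases named above; the `Replica` class is hypothetical.

```python
class Replica:
    """Toy replica holding key -> (timestamp, value), merged by last-write-wins."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep the entry with the newest timestamp.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def sync_from(self, other):
        # Replicas trust each other: incoming entries are accepted unverified.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("balance", 100, timestamp=1)
b.write("balance", 120, timestamp=2)

# The replicas disagree at first, then converge after exchanging state.
a.sync_from(b)
b.sync_from(a)
converged = a.data == b.data

# A malicious replica forges a far-future timestamp and wins every merge.
evil = Replica()
evil.write("balance", 0, timestamp=10**9)
a.sync_from(evil)
b.sync_from(a)
```

After the last two calls both honest replicas hold the forged value, and nothing in the merge rule lets them tell it apart from a legitimate write.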
5. BigChainDB: There is another project called BigChainDB that claims to solve the data
storage and transaction speed problem. It is also a blockchain, but with enormous data
capacity and really fast transactions. Let us see how this is possible. BigChainDB is
built upon a RethinkDB cluster; I mentioned this NoSQL database on the previous slide.
BigChainDB uses it to store all the blocks and transactions. That is why it shows such
high throughput: it is the throughput of the underlying NoSQL database. All the
BigChainDB nodes (denoted BDB on the slide) are connected to the cluster and have full
write access to the database. Here comes the problem: BigChainDB as a whole is not
Byzantine-proof! Any malicious BDB node can destroy the RethinkDB cluster. The
BigChainDB team is aware of this problem and promises to solve it sometime in the future.
However, it is the cornerstone of the architecture, and changing it may not be possible.
Anyway, BigChainDB may be good for a private blockchain. But in my opinion, to avoid
confusion it should have been named BigPrivateBlockchain. It is not an option for public
storage.
6. TiesDB: None of the currently available options could serve as a good public database.
The closest to the ideal are the NoSQL databases; the only thing they lack is Byzantine
fault tolerance. The Ties.Network database (TiesDB) is a deep modification of the
Cassandra database and offers a preferable solution: it inherits the majority of its
features from the underlying NoSQL database and adds Byzantine fault tolerance and
incentives. With these features it can become a public database and enable feature-rich
applications on Ethereum and other blockchains with smart contracts. The database is
writable by any user, but users are identified by their public keys and all requests are
signed. Once created, a record remembers its creator, who becomes the owner of the
record. After that, the record can be modified only by its owner. Everyone can read all
records, because the database is public. All permissions are checked on request and on
replication. Additional permissions can be managed via a smart contract.
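The ownership rule described above can be sketched as follows. This is only an illustration of the rule as stated in the text, not the TiesDB implementation: the `OwnedTable` class, the `PermissionDenied` exception and the key strings are hypothetical, and signature verification is assumed to have happened before `write` is called.

```python
class PermissionDenied(Exception):
    """Raised when a non-owner tries to modify an existing record."""


class OwnedTable:
    """Toy public table where each record remembers the key that created it."""

    def __init__(self):
        self._records = {}  # record_id -> (owner_key, value)

    def write(self, signer_key, record_id, value):
        # ASSUMPTION: the request signature was already verified upstream,
        # so signer_key is the authenticated public key of the requester.
        existing = self._records.get(record_id)
        if existing is not None and existing[0] != signer_key:
            raise PermissionDenied(f"{record_id} is owned by another key")
        # The first writer becomes the owner; the owner may update freely.
        self._records[record_id] = (signer_key, value)

    def read(self, record_id):
        # Reads are open to everyone because the database is public.
        return self._records[record_id][1]


table = OwnedTable()
table.write("alice-pubkey", "profile:1", "photo-v1")  # Alice creates the record
table.write("alice-pubkey", "profile:1", "photo-v2")  # the owner may modify it

try:
    table.write("bob-pubkey", "profile:1", "defaced")
    denied = False
except PermissionDenied:
    denied = True
```

In a real deployment the same check would also have to run when records are replicated between nodes, since the text states that permissions are verified on both request and replication.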

Methods for meaningful data extraction

• Lexical/morphological analysis examines the characteristics of an individual word
(including prefixes, suffixes, roots, and parts of speech such as noun, verb, or
adjective). This information contributes to understanding what the word means in the
context of the text provided. Lexical analysis depends on a dictionary, thesaurus, or
any list of words that provides information about those words.
• Syntactic analysis uses grammatical structure to dissect the text and put individual words
into context. Here you are widening your gaze from a single word to the phrase or the full
sentence. This step might diagram the relationship between words (the grammar) or look for
sequences of words that form correct sentences or for sequences of numbers that represent
dates or monetary values.
• Semantic analysis determines the possible meanings of a sentence. This can include
examining word order and sentence structure and disambiguating words by relating the
syntax found in the phrases, sentences, and paragraphs.
• Discourse-level analysis attempts to determine the meaning of text beyond the sentence
level.
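
As a small illustration of the pattern-matching part of syntactic analysis mentioned in the list above (finding sequences of numbers that represent dates or monetary values), here is a minimal sketch. The date format (YYYY-MM-DD), the dollar-only amount format and the `extract` function are assumptions made for the example; real text would need far more patterns.

```python
import re

# ASSUMPTION: dates look like 2021-03-15 and amounts like $1200.50 or $300.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
MONEY = re.compile(r"\$\d+(?:\.\d{2})?")


def extract(text):
    """Return the date-like and money-like token sequences found in text."""
    return {"dates": DATE.findall(text), "amounts": MONEY.findall(text)}


sample = "Invoice dated 2021-03-15: total $1200.50, deposit $300 due 2021-04-01."
result = extract(sample)
# result["dates"]   -> ['2021-03-15', '2021-04-01']
# result["amounts"] -> ['$1200.50', '$300']
```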

Data Semantics

The semantic data model is a method of structuring data in order to represent it in a specific logical
way. It is a conceptual data model that includes semantic information that adds a basic meaning to
the data and the relationships that lie between them. This approach to data modeling and data
organization allows for the easy development of application programs and also for the easy
maintenance of data consistency when data is updated.

Abstractions used in a semantic data model:

• Classification - "instance_of" relations

• Aggregation - "has_a" relations

• Generalization - "is_a" relations

A semantic data model may be illustrated graphically through an abstraction hierarchy diagram,
which shows data types as boxes and their relationships as lines. This is done hierarchically so that
types that reference other types are always listed above the types that they are referencing, which
makes it easier to read and understand.
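
The three abstractions listed earlier (classification, aggregation, generalization) map naturally onto object-oriented constructs. The following minimal sketch shows one such mapping; the `Vehicle`, `Car` and `Engine` types are hypothetical examples and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    power_kw: int


@dataclass
class Vehicle:
    name: str


@dataclass
class Car(Vehicle):   # Generalization: Car "is_a" Vehicle
    engine: Engine    # Aggregation: Car "has_a" Engine


# Classification: my_car is an "instance_of" Car
my_car = Car(name="family car", engine=Engine(power_kw=90))
```

In an abstraction hierarchy diagram of this example, `Car` would be drawn above `Engine`, because `Car` references `Engine`.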

Data Dimensions


Data View


Data Extraction Methods






Smart Home Architecture