
White Paper


SQL Server 2012 Database Engine


Key features and enhancements
Sandeep Kalra, Manoj Chandran Nair, Phaneendra Babu Subnivis

Abstract
Microsoft has come up with yet another fascinating release of SQL Server, with a set of new features as well as significant
enhancements to the existing feature set. While the AlwaysOn feature addresses critical high availability and disaster
recovery requirements, columnstore indexes drastically boost query performance in warehousing scenarios. With SQL Azure
gaining momentum, it is important to develop code that is compatible across on-premise and cloud deployments, and SQL
Server Data Tools helps to a great extent in achieving this. Further, there are improvements in Full Text Search, the
introduction of Semantic Search, which strengthens SQL Server's search capabilities on unstructured data, FileTable, and
many more features. This document aims to give designers and architects an overview of these key features and help them
in building solutions.

www.infosys.com

Contents

Introduction
Always On
Contained databases
Columnstore indexes
Spatial data enhancements
Search features & enhancements
SQL Server Data Tools (SSDT)
File Table
Conclusion
References
Acknowledgements


Introduction
Microsoft SQL Server established itself as a prominent database product with the release of SQL Server 2005. From there on, it has evolved
significantly as a Business Intelligence (BI) suite, with key enhancements in Integration Services, Analysis Services and Reporting Services
along with a strengthening of its database management capabilities. In SQL Server 2008 R2, the inclusion of concepts like Self-Service BI and
features like PowerPivot, along with integration of the BI suite with SharePoint, established it as an enterprise-class product in this space.
SQL Server 2012 RTM is now out in the market with an exciting feature set and enhancements.
With every release, Microsoft comes up with themes based on which the overall feature set of the release is identified. In SQL Server
2012, there are three themes: Mission Critical Confidence, Breakthrough Insights, and Cloud on Your Terms. While Breakthrough
Insights covers mostly BI-related features, this document covers features falling under the Mission Critical Confidence and
Cloud on Your Terms themes.
The Mission Critical Confidence theme has a set of features that address high availability, performance, maintainability,
accessibility, and search enhancements. The following table maps the key features to the themes and briefly describes each of them.

Theme: Mission Critical Confidence

Always On: A feature which enables setting up both high availability and disaster recovery for the database environment. Further, database backups and read-only access are allowed on the secondary copies of the databases configured for failover, increasing the ROI on the customer's investment.

Columnar Indexes: A relational database engine enhancement which drastically improves data retrieval performance in warehouse environments. However, there are certain limitations when this feature is enabled.

Search: Significant enhancements have been made to the existing Full Text Search (FTS), such as the NEAR operator, which lets users specify more flexible search criteria. Further, a new feature named Semantic Search has been introduced, which builds key phrases ranked by a score based on their occurrence and enables users to search for content based on its meaning.

Spatial data: Improvements to the existing spatial data capability, such as representing the full globe where required. Support for representing circular shapes has also been included. Further, there are improvements to the indexes as well as to the precision used to store longitude and latitude information.

File Table: An enhancement to the FILESTREAM data type, wherein the data is stored across the database and the file system (NTFS) in an integrated fashion. It also allows the data to be accessed directly from the file system, offering extensive flexibility to access data both from the database and from the application side.

Theme: Cloud on Your Terms

Contained Databases: A new concept which helps the movement of databases across different on-premise servers and even to SQL Azure. In the current release only partial containment is supported, while the end goal is to achieve full containment.

SQL Server Data Tools: An integrated environment to develop database as well as Business Intelligence (BI) applications. It also facilitates development targeting a specific platform, on-premise or off-premise, with IntelliSense support.


Let us look at each of the above features in more detail and get a feel for their capabilities and usage.

Always On
This is one of the key features planned for SQL Server 2012; it was initially code-named HADRON and later publicized as AlwaysOn. Before getting
into the details, let us have a quick look at the limitations of the current high availability options, i.e., failover clustering and database mirroring:

Failover clustering:

It must be implemented at the instance level; hence, for databases of huge sizes, maintenance can be a problem.

The second node is unusable in an active-passive implementation of failover clustering. The passive node cannot be used for
data access or backup purposes.

Database mirroring:

It is painful to implement for applications accessing multiple databases (for example, large SharePoint installations).

For disaster recovery, a combination of the above two solutions has to be implemented, with or without Log Shipping added to either of
the two choices.
With AlwaysOn, Microsoft provides a solution that addresses both high availability and disaster recovery, since it can be
implemented across multiple sites and the secondary nodes can be used to offload reporting as well as backup requirements.

Improvements in the Failover Cluster Instance (FCI):

Failover clustering across multiple subnets: From SQL Server 2008 and Windows Server 2008 onwards, geo-clustering (clustering across data
centers/servers in disparate geographic locations) was possible. The downside, however, was that the data centers had to be connected
through a VLAN to meet the basic requirement that all the nodes in the cluster belong to the same subnet. The figure below
depicts this architecture:

Figure 1: Failover Clustering


In SQL Server 2012, multi-subnet failover has been introduced, which removes the prerequisite that the nodes participating
in the cluster be part of the same VLAN. The figure below depicts this architecture:

Figure 2: Multi-subnet failover architecture

In a multi-subnet failover cluster configuration, failover across subnets takes relatively longer than within a normal
cluster. Hence it is advisable to have a failover cluster at each location in the multi-subnet scenario. This way, in case
of local failures, the failover happens within the same cluster, which is faster. Failover across subnets happens only in
case of a disaster (the entire local cluster goes down).

Robust failure detection and management of the servers:

In the current version, the cluster health check is based on the state of the SQL Server service. As long as the service is running, the instance is considered
to be up/live; when the service stops or does not respond, it is treated as a node failure, which subsequently initiates a failover.
In SQL Server 2012, this has been made more robust. It is done in three ways:

Monitor health status: There are three ways in which SQL Server health is monitored:

State of the SQL Server service: The Windows Server Failover Clustering (WSFC) service monitors the SQL Server service on the active node and
detects when the service stops.

Responsiveness of the SQL Server instance: WSFC runs a separate thread that exclusively monitors the responsiveness of the
instance. The server response is checked against the value configured for the HealthCheckTimeout property.

SQL Server component diagnostics: The system stored procedure sp_server_diagnostics periodically collects diagnostics on the server
instance. The diagnostics are collected for the components system, resource, query_processing, io_subsystem and events. The first three
components are used for failure detection; the remaining two are used for diagnostic purposes only. This information is written to the SQL Server
failover cluster diagnostics log.


Determine failures: The SQL Server database engine resource DLL determines whether the detected health status is a condition for failure
using the FailureConditionLevel property. This property defines which detected health states cause a failover or restart. There are multiple
levels based on which the failure condition is determined. Refer to http://msdn.microsoft.com/en-us/library/ff878664(v=sql.110).aspx#respond
for more details.

Respond to failures: WSFC responds depending on the failure conditions detected. An attempt to restart or fail over the failed node is
made depending on whether the FCI retains the WSFC quorum. If the FCI loses the WSFC quorum, the entire FCI is brought offline, which means it
has lost its high availability.
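These health-detection knobs are exposed through T-SQL. The sketch below is illustrative only; the timeout and level values shown are not recommendations, and the statements are assumed to run on the clustered instance:

-- Adjust the FCI health-check timeout (value in milliseconds)
ALTER SERVER CONFIGURATION
    SET FAILOVER CLUSTER PROPERTY HealthCheckTimeout = 30000;

-- Set the failure condition level (3 is the default level)
ALTER SERVER CONFIGURATION
    SET FAILOVER CLUSTER PROPERTY FailureConditionLevel = 3;

-- Run the diagnostics collection once on demand (0 = single snapshot)
EXEC sp_server_diagnostics @repeat_interval = 0;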

Availability groups:
An availability group is a feature that provides a failover environment for a discrete set of user databases that fail over together. The database
set is hosted by an availability replica. There are two types of replicas: a primary replica and secondary replicas. The primary replica enables
applications to connect and perform read-write operations. A secondary replica serves as a potential failover target for the availability group. The
transaction log of the primary replica is applied on the secondary replicas (based on the configured synchronization mode) to replicate the data across
the environment and keep it ready for handling a failover. In an environment, there can be only one primary replica and one to four
secondary replicas. The secondary replicas can be configured for read-only operations so that reporting requirements can be offloaded to them. Any of the
secondary nodes can be configured for database backup operations as well. Another important point is that each secondary replica
should reside on a separate node of the same WSFC cluster.
Data synchronization between the primary and secondary replicas happens in two ways, synchronous and asynchronous (the same as in database
mirroring). In a synchronous setup, a transaction on the primary replica completes only after it has been hardened on the secondary replicas. In an
asynchronous setup, the transaction commits on the primary replica first and the changes are moved to the secondary replicas subsequently. The latter
is preferred when we do not want the primary replica to be slowed down by data synchronization.
An availability group can thus be defined as a group of databases put together for the purpose of high availability. For example, consider an application
which depends on several databases (say five) to access and store data; while designing high availability for this application, one
should ensure that all five databases are part of the availability group. Earlier, this type of requirement was addressed by configuring
database mirroring for each database individually (when clustering was not an option because of its cost). With the availability groups feature, all
these databases are defined together as one availability group, and the high availability configuration ensures that whenever failover
happens, all five databases fail over together.
The basic requirement for an availability group setup is WSFC. When an availability group is set up, one SQL Server instance hosts the group
in the primary role. The primary replica has read-write access and is the server with which the application interacts to perform the required
operations on the databases. Along with this, there are replicas in the secondary role, which serve as failover copies. One to
four secondary replicas can be configured.
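As an illustration, an availability group with a synchronous primary and one read-only asynchronous secondary could be created roughly as follows. The server, database and endpoint names below are assumptions, and WSFC plus the AlwaysOn feature are presumed to be already enabled on both instances:

-- Run on the primary instance
CREATE AVAILABILITY GROUP SalesAG
FOR DATABASE SalesDB, OrdersDB
REPLICA ON
'SQLNODE1' WITH (
    ENDPOINT_URL = 'TCP://sqlnode1.contoso.com:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = AUTOMATIC),
'SQLNODE2' WITH (
    ENDPOINT_URL = 'TCP://sqlnode2.contoso.com:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

-- On the secondary instance, after restoring the databases WITH NORECOVERY:
-- ALTER AVAILABILITY GROUP SalesAG JOIN;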

Contained databases
Contained databases are a new feature in SQL Server 2012 which supports the concept of containment and aligns the SQL Server (on-premise)
and SQL Azure (cloud) technologies. Here, the database is treated as an application package which can easily be moved
between SQL Server instances. All the database settings and metadata required to define the database reside inside the database itself, and there is no
dependency on the server instance.
A database can be fully contained, partially contained or uncontained. Full containment is provided by SQL Azure databases, which
can easily be moved to other SQL Server instances. As of now, only partial containment is supported for on-premise SQL Servers. The two
prominent features in SQL Server 2012 that support containment are:


Collation: The collation information for the database, which used to be stored in the master database on the server instance, is now stored inside
the database itself. The database will retain its collation even when it is moved onto another server with a different collation. The same
collation is also applied to the temporary tables created from inside the database.

Contained Authentication: There is no longer a need to create SQL login objects to authenticate users at
the server instance level. With contained user objects, a user can log directly into the contained database in SQL Server Management Studio (SSMS) and start working.
The user remains a guest on the server instance and cannot access other databases on it.

The option for creating contained databases needs to be enabled on the SQL Server 2012 server instance, as shown below. Once this
setting is enabled, the database containment type can be set to Partial or None in Database Properties --> Options.
The idea of containment is to realize the database as a package which can be moved between SQL Servers. By categorizing database
objects as contained and uncontained, the constraint of an application boundary is imposed. The task of deploying
the contained database to another database server still needs to be carried out; DACPAC helps to move databases between environments easily.

Figure 3: Contain DB Settings
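A minimal T-SQL sketch of the setting shown above and of a contained user follows; the database, user and password names are purely illustrative:

-- Enable contained database authentication at the instance level
EXEC sp_configure 'contained database authentication', 1;
RECONFIGURE;
GO
-- Create a partially contained database and a contained user (no server login)
CREATE DATABASE DemoContainedDB CONTAINMENT = PARTIAL;
GO
USE DemoContainedDB;
CREATE USER AppUser WITH PASSWORD = 'Str0ng!Pa55word';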

Relevance of DACPAC to contained databases

A DACPAC, or data-tier application package, is a feature introduced in SQL Server 2008 R2 whereby a SQL Server database can be implemented as a Visual
Studio project and built into a deployment package. This package can subsequently be used to deploy the database across
environments. As part of SQL Server 2012, and specifically SQL Server Data Tools, some enhancements have been introduced for creating
data-tier applications (for more information, refer to the section on SQL Server Data Tools, code name Juneau).
A SQL database, either contained or uncontained, can be registered as a data-tier application by right-clicking on the database and selecting
Tasks --> Register as Data-tier Application. This creates a DAC entry in the Management folder on the server instance, as shown below. The
validation check during DAC creation makes sure that no uncontained objects or features are implemented in the database, thus aligning
it to SQL Azure.

Figure 4: Data Tier Application


Once the new DAC is created, it can be used, through Visual Studio, to upgrade an existing DAC by right-clicking it in the Management folder and
selecting Upgrade Data-tier Application.
Contained databases and DACs are aligned with SQL Azure and improve the movement of databases between on-premise and cloud
versions of SQL Server. Though containment is a nice feature to start with, there are some limitations that users currently face:
Replication, CDC and change tracking cannot be used
Numbered procedures are not supported
Schema-bound objects that depend on built-in functions with collation changes are not supported
Databases cannot be accessed in the same security context; contained authentication has to be used
Going forward, more features will be supported in SQL Azure, and new features will be released in both SQL Server versions to align the
technologies, which will help achieve full containment and further improve the seamless portability of databases between on-premise and
cloud. Also, by realizing your database as a deployment package using data-tier applications, the issues of maintaining different versions
of databases are reduced and deployment becomes easier.

Columnstore indexes
Columnstore indexes, introduced in SQL Server 2012, are a new
way of storing and accessing the data residing in SQL tables. They
provide improved query processing and speed up data analysis
for business users. This is hugely helpful in data warehouse
scenarios where a significant amount of time is spent creating
aggregations, summary tables or views in order to provide
business value to the end user, or while running MIS reports on
large relational databases. Using columnstore indexes, it is now
possible to query data warehouse tables containing millions of
records directly and analyze the results quickly.
When a columnstore index is created on a fact table, each column
of the table is stored in separate disk pages, rather than storing
rows in a disk page as traditional indexes do. The figure below
shows the arrangement of columns in disk pages.
The benefit of such an arrangement is that only the columns
needed to execute the query are read, which reduces I/O. These
frequently accessed columns also remain in memory, which further
improves query performance. In addition, because data is stored
column-wise, the values can be highly compressed due to the
similarity of values within a column. SQL Server makes use of the
VertiPaq (now called xVelocity) technology, which provides high
compression and better buffer hit rates. The same technology is
used in PowerPivot and the newly introduced Business Intelligence
Semantic Model (BISM) tabular models in Analysis Services.

Figure 5: Columnstore Index Creation

The syntax for creating a columnstore index is as follows:

CREATE NONCLUSTERED COLUMNSTORE INDEX NewNCCSI ON DemoTable (Column1, Column2, Column3)

Alternatively, you can create it in Management Studio by right-clicking the Indexes folder and selecting New Index --> Non-Clustered
Columnstore Index. A window opens where you can add the columns to be included as part of the index.


Note that only specific data types can be added as part of the index (int, bigint, float, char, varchar, nvarchar, decimal, money, bit,
date, time, etc.). Also, there can be only one nonclustered columnstore index per table, and it cannot contain SPARSE columns.
The above screenshots show that with the columnstore index, the query hardly
took a second although the fact table contains around 17 million records. Columnstore
indexes can cater to more complex scenarios as well; if the query is properly
structured, it is bound to provide huge performance benefits. Note, however,
that even though the index is created, it is up to the optimizer to decide
when to use it. The optimizer may still use the traditional clustered/
nonclustered indexes if it deems them fit for a particular scenario.
Columnstore indexes will benefit scenarios where there is a need to
quickly analyze large amounts of data residing in a data warehouse and
obtain useful information, instead of maintaining dimensional models and
aggregations. This in turn reduces development cost and increases
business value through rapid exploration. However, columnstore indexes cannot
be thought of as a replacement for UDM and cubes, which still exist
and remain the first choice for data warehouse scenarios. Also, as mentioned
above, a columnstore index is used only where the query optimizer finds
it apt; it will not be used if the query uses a seek operation. Another point
worth noting is that a table with a columnstore index cannot be updated; the
index needs to be dropped and rebuilt after each incremental update.
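A sketch of the resulting load pattern for a read-only columnstore fact table, reusing the illustrative names from the earlier CREATE INDEX statement (the staging table is an assumption):

-- Drop the columnstore index, load the increment, then recreate the index
DROP INDEX NewNCCSI ON DemoTable;

INSERT INTO DemoTable (Column1, Column2, Column3)
SELECT Column1, Column2, Column3 FROM StagingTable;

CREATE NONCLUSTERED COLUMNSTORE INDEX NewNCCSI
ON DemoTable (Column1, Column2, Column3);

For large tables, switching a freshly loaded partition into the indexed table is another documented way to avoid rebuilding the whole index.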

Figure 6: Columnstore Index Execution Plan

Spatial Data enhancements


Spatial data types were introduced in SQL Server 2008 to provide the ability to store location information (geospatial data) in SQL databases.
Along with the data types came a host of T-SQL methods and functions which can be used for analyzing and querying the location information stored
in these data types. Additionally, there were provisions to improve the performance of these queries using spatial indexes.

As part of SQL Server 2012, there have been some major upgrades to the spatial data types (a short T-SQL sketch follows this list):

Introduction of circular arcs: Circular arc objects can now be created in SQL Server alongside line strings and polygons. A collection of zero
or more contiguous arc segments is termed a CircularString. A geometry collection object can be created to include CircularString and
LineString objects; this collection is termed a CompoundCurve. Similarly, a collection of CircularStrings, LineStrings and CompoundCurves
having at least four points and the same X and Y coordinates for the start and end points can be termed a CurvePolygon. Using circular arcs, locations
on the earth such as curved roads or water bodies can be represented more accurately. There are also new extended methods on geometry
instances like BufferWithCurves() and CurveToLineWithTolerance(), and new methods like STNumCurves(), STCurveN() and STCurveToLine() which
can be used with circular arc segments.

Improved precision: The precision for storing longitude and latitude values in spatial data objects has been improved to 48 bits, compared
to 27 bits in earlier versions. This helps in plotting locations on a map with more accuracy and when using spatial methods during analysis.

Full globe support: The restriction that a spatial object should be no larger than a single hemisphere has been removed. In SQL Server
2012, objects can be created which represent the complete earth as a single object, using the keyword FULLGLOBE
during object creation. This helps in representing large objects which earlier had to be implemented as a combination of different spatial
objects.

Spatial index improvements: In SQL Server 2012, during spatial index creation, there is now an option to set the appropriate grid system
to be used. The two options available are auto grid and manual grid. While manual grid stands for the traditional grid system (with 4
grid levels) used in SQL Server 2008, auto grid has been introduced in SQL Server 2012 with 8 grid levels for better approximation of
objects. The grid can be specified based on the spatial data type: GEOGRAPHY_AUTO_GRID or GEOMETRY_AUTO_GRID.
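The sketch below illustrates these additions with made-up coordinates and object names; it is not tied to any particular schema:

-- A circular arc segment and the new curve-aware methods
DECLARE @arc geometry = geometry::Parse('CIRCULARSTRING(0 0, 1 1, 2 0)');
SELECT @arc.STNumCurves() AS NumCurves, @arc.STLength() AS ArcLength;

-- The whole earth as a single geography object
DECLARE @globe geography = geography::STGeomFromText('FULLGLOBE', 4326);
SELECT @globe.STArea() AS EarthSurfaceArea;

-- A spatial index using the new auto grid (table and column names are assumed)
CREATE SPATIAL INDEX SIdx_Locations_Geo
ON dbo.Locations (GeoColumn)
USING GEOGRAPHY_AUTO_GRID;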


Another performance-related feature is the SPATIAL_WINDOW_MAX_CELLS query hint, which can be used during spatial
analysis as follows:

SELECT *
FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS = 1024)
WHERE t.geom.STIntersects(@window) = 1;

This query hint helps in controlling the tradeoff between performance and index efficiency. The default values for the geometry and geography
data types are 512 and 768 respectively, but this can be changed to suit specific needs. Compression is also now supported for spatial indexes.
For DBAs, two special helper stored procedures are available to evaluate the histogram:

sp_help_spatial_geography_histogram: Helps determine how many cells a given feature intersects. It outputs a grid at a given level and
resolution and is useful for performance tuning.

sp_help_spatial_geometry_histogram: Helps determine how geometry index cells are created and, in turn, how many
features intersect each cell.

Both of these procedures help in visually analyzing and tuning indexes.


Overall, there have been significant improvements for spatial data in SQL Server 2012 and this in turn opens up a lot of options for creating
spatial applications. The support for curve objects, full globe and performance improvements in spatial indexes provide opportunities to
build complex spatial queries which otherwise had to be implemented on the application front. Also with improved precision, location
information can be accurately represented and plotted onto maps and visualizations to provide informed value and help is making better
decisions.

Search features & enhancements


Traditionally, the Full Text Search (FTS) feature is used to search objects which are stored as BLOBs in the database.
It has a separate search engine which performs the search based on catalogs and indexes that are built when the
objects are created. Until SQL Server 2005, the FTS engine sat outside the core SQL Server database engine, which raised certain concerns around
scalability, security and so on. This was addressed in the SQL Server 2008 release, where FTS was fully integrated with the
database engine and the functionality was further enhanced. While there were quite a few enhancements to this feature in the SQL Server 2008 R2
release addressing some performance-related issues, SQL Server 2012 introduces semantic search alongside FTS to cater to the
requirements of searching data stored in BLOBs. The section below describes the enhancements made to FTS in the SQL Server 2012 release and
also provides an overview of the semantic search capabilities.

Full Text Search (FTS):


Following are some of the key enhancements made to FTS as part of the SQL Server 2012 release; a runnable sketch of the new NEAR operator follows this list:

Query performance has been optimized while index updates happen in parallel. Previously, an index update took a shared lock on the
schema, which caused a bottleneck. There has been a significant boost in performance (10x in most cases, as published by
Microsoft) without having to change any structures.

Support for property-scoped searches has been enhanced, whereby users can issue queries against documents stored in a table with FTS
enabled. Users can query document properties, such as the author of a document, using the CONTAINS clause without maintaining the author
as a separate column.

A custom NEAR operator has been introduced, which gives users the following options:

Search for words with a maximum number of words between them. For example, to identify documents that contain the terms "SQL
Server" and "expertise" with at most 5 words between them, the condition can be written as WHERE CONTAINS(Resume, 'NEAR(("SQL Server",
"expertise"), 5, FALSE)'). Here, the order of "SQL Server" and "expertise" is not mandatory for a record to be part of the result set.

If it is written as WHERE CONTAINS(Resume, 'NEAR(("SQL Server", "expertise"), 5, TRUE)'), the order of words is mandatory:
SQL Server returns only content in which "SQL Server" appears before "expertise".
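Put together, a hedged, runnable form of the example above might look like this (the Candidates table and its columns are assumptions; a full-text index on the Resume column is presumed to exist):

SELECT CandidateId, CandidateName
FROM dbo.Candidates
WHERE CONTAINS(Resume, 'NEAR(("SQL Server", "expertise"), 5, TRUE)');
-- TRUE enforces the word order; FALSE (the default) matches either order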

For more information on these enhancements, go through the links given in the References section.


Semantic search:
FTS helps users search for documents that meet the search criteria. However, it has a limitation: it cannot fetch documents related
to the criteria set. This is where semantic search comes to the rescue. Statistical Semantic Search goes one step further and enables
users to discover statistically relevant insights through efficient keyword searches. Semantic search extracts key phrases across the
documents, ranked by a score based on their occurrence, which can then be used to carry out further search operations and return
documents relevant to the criteria set.
Currently, three functions are provided to accomplish semantic search; an illustrative query follows the note below:

SEMANTICKEYPHRASETABLE: Used to find key phrases in a document.

SEMANTICSIMILARITYTABLE: Used to find similar or related documents.

SEMANTICSIMILARITYDETAILSTABLE: Returns a table containing the key phrases common across two documents whose content is
semantically similar.

Note:

Semantic search needs a full text index to be defined.

Documents should be stored as part of a FileTable in order to work with semantic search. For more information, refer to the FileTable section in
this document.
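As an illustration, the key-phrase function can be queried like any table-valued function. The Documents FileTable and its file_stream column below are assumptions, and the semantic language statistics database plus a full-text index created with the STATISTICAL_SEMANTICS option are presumed to be in place:

-- Return the ten highest-scoring key phrases extracted from the indexed column
SELECT TOP (10) kp.document_key, kp.keyphrase, kp.score
FROM SEMANTICKEYPHRASETABLE(dbo.Documents, file_stream) AS kp
ORDER BY kp.score DESC;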

SQL Server Data Tools (SSDT)


SQL Server Data Tools (code name: Juneau) is a new integrated tool for database development, to be made available in the next version of Visual
Studio. This tool brings developers the familiar Visual Studio tooling along with standard database features like IntelliSense, a T-SQL editor, a visual
table designer and a lot more. With this new tool, database development becomes the same experience as application development.
People familiar with previous versions of database projects will find this tool an evolution of the existing project templates. It also has a SQL
Server Management Studio (SSMS)-like browsing structure for hardcore database developers.

Core Features

Figure 7: SQL Server Data Tools (Source: TechEd 2011)


SSDT is aimed at professional database and application developers as well as administrators. Some of the core features are:
Online database development with an SSMS-like tree view in Server Explorer
Import from an existing database, snapshot or DACPAC project
IntelliSense support across all the database object files
New designer tool for a visual experience
Refactoring of table names, field names, etc. without losing data
Navigation tools for offline development in the T-SQL editor
Errors immediately show up in the Error List pane and are platform specific
Preview of database updates with an option to publish to the target database
Schema comparison between the current project and the existing state of the database
Option to sync the changes once the schema comparison is reviewed

Online vs. Offline Project Development


Online Mode
SSDT provides an online development platform in the same way as Management Studio (SSMS): you can connect to an
instance of SQL Server through Server Explorer. You will see the same hierarchical tree structure as in SSMS, along with the
T-SQL editor and IntelliSense support.
For troubleshooting, SSDT allows editing in either the T-SQL editor or the table designer. Errors immediately show up in the
error pane, which lets you follow them for further troubleshooting.

Offline Mode
You can also create a database project using SSDT and import an existing database for offline development. This imports the schemas
and database objects into the solution. While working in offline mode, the same T-SQL editor and designer tools still work. This means you
get the Management Studio kind of interface to work with different databases across different versions of SQL Server (including SQL
Azure).
The T-SQL editor for offline development also provides two useful navigation tools for Visual Studio developers: Go To Definition and
Find All References. Right-clicking a particular table in a stored procedure or function gives you the option to find all references to the table in the
project. You can also right-click on the table name and click Go To Definition, which opens the T-SQL for the table definition.

Project Deployment on a Target Platform, Code Analysis

Using the project properties dialog box in Visual Studio, one can change the target platform on which the project needs to be deployed.
Currently supported platforms are SQL Server 2005, SQL Server 2008/2008 R2, SQL Server 2012 (code name Denali) and SQL Azure. You can also
produce a DACPAC file or a SQL script file as output.
Right-clicking on the project and running Code Analysis reports errors in scripts based on the platform you selected. The same happens
when you build the project.
Once all the errors are resolved, you can publish the project. You also have options to define your publishing preferences, which can then
be saved in a profile to be reused in future deployments.

Figure 8: SSDT Project Deployment


SQL Compare and Sync

This is a roll-forward of the Visual Studio Database Edition power tools, which have now been integrated into the main product. The
schema compare dialog provides out-of-the-box functionality to visually compare two databases (right-click on the solution -> Schema Compare).
You also have the option to update the databases with the changes to get them in sync, or to generate a script to be executed at a later point.
Overall, SQL Server Data Tools provides functionality for app-tier as well as data-tier applications, where developers can carry out their
development work against any SQL platform from within Visual Studio. The tool can also be used by database administrators to script
out objects to be moved across different servers, to store the versions in TFS, and as a utility to compare the changes done by
developers with the existing database before applying them to the actual environment.

Figure 9: SSDT Schema Compare

File Table
Most of us are aware of the FILESTREAM feature introduced in SQL Server 2008, which integrates the relational database engine with the Windows
file system to provide efficient storage and management of data. FILESTREAM was one of the major features introduced in SQL Server for storing unstructured
data while maintaining transactional consistency between structured and unstructured data.
FileTable is a new feature introduced in SQL Server 2012 which builds on the existing FILESTREAM functionality. FileTable
takes the FILESTREAM concept one step further and enables non-transactional access to FileTable data. This means the FileTable
data can be accessed through SQL in a transactional way, as well as by Windows APIs as if it were an ordinary file share. A FileTable essentially
surfaces a SQL table as a folder which can be accessed through Windows Explorer. The directory structure and the file attributes are stored
in the table as columns. Files can be bulk loaded, updated and managed in T-SQL like any other column, and SQL Server also supports backup
and restore for them.

Figure 10: File Table (table columns such as stream_id, file_stream and name surfaced as an NTFS folder)


FileTable is a great step toward maintaining, in SQL Server, unstructured data which currently resides as files on file servers. Enterprises
and other applications can move their file sources into FileTables, which enables integration with the administration capabilities provided out
of the box in SQL Server. At the same time, existing Windows applications which access these files through the file system will continue to function as
before, through non-transactional access to the FileTable data. A minimal creation sketch is shown below.
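The sketch uses illustrative table and directory names, and assumes FILESTREAM and non-transactional access are already enabled on the instance and the database:

-- Create a FileTable backed by a directory named DocumentStore
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH (
    FILETABLE_DIRECTORY = 'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default);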

File Table Schema

A FileTable represents a hierarchy of directories and files. The path_locator and parent_path_locator columns are used to represent the files and the
hierarchy they belong to. The is_directory attribute identifies whether a row represents a file or a directory.
A FileTable has a fixed schema which cannot be modified. Every row contains the following items:

Figure 11: Pre-defined schema of a File Table

File Table Internals

A FileTable is a specialized user table with a pre-defined and fixed schema. The only options a FileTable lets you
specify are the FileTable directory name and the file name collation. Additional objects are created automatically when a FileTable is created. To get a
list of these objects, run this query:

SELECT object_name(object_id) AS [Object Name],
       object_name(parent_object_id) AS [File Table]
FROM sys.filetable_system_defined_objects;

Loading files into a FileTable is as simple as dragging and dropping files into the FileTable directory in Windows Explorer. You can also use command-line
options such as XCopy, RoboCopy, etc. to achieve the same. SQL Server intercepts the Windows API calls and automatically
inserts/deletes rows as files are added to or deleted from the FileTable directory.
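For example, the full UNC path of each file stored in the illustrative DocumentStore FileTable can be built from the FileTableRootPath() and GetFileNamespacePath() helpers:

SELECT name,
       file_stream.GetFileNamespacePath() AS RelativePath,
       FileTableRootPath() + file_stream.GetFileNamespacePath() AS FullUncPath
FROM dbo.DocumentStore
WHERE is_directory = 0;  -- files only, not directories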
Overall, FileTable looks like a powerful tool which brings together the Windows APIs and file data stored in SQL Server. FileTable lets
an application integrate its storage components with the database, and provides integrated SQL Server services which can be used for data
management and analysis of unstructured data and metadata.


Conclusion

As always, we will soon witness yet another exciting SQL Server release, with many prominent features that will enable the user/developer
community and customers in many ways, not only from the database engine and storage perspective but also with respect to the BI and cloud
flavors of its offerings. We hope this document, which is an output of the experience we gained while working with the early bits, was
successful in giving readers an initial handle on the new capabilities and enhancements of SQL Server 2012. We will continue to share our
experiences with the BI components of SQL Server 2012 in another article.

References

Spatial Search
http://msdn.microsoft.com/en-us/library/ff929187.aspx

SSDT
http://www.jamesserra.com/archive/2011/07/sql-server-%E2%80%9Cdenali%E2%80%9D-sql-server-developer-toolscodename-juneau/
http://msdn.microsoft.com/en-us/data/gg427686

Full Text Search and Semantic search


http://technet.microsoft.com/en-gb/library/cc721269(SQL.100).aspx#_Toc202506226
http://blogs.msdn.com/b/sqlfts/archive/2011/04/12/sql-server-2008-r2-fulltext-search-fix-for-improving-queriesperformance-during-concurrent-index-updates-http-support-microsoft-com-kb-958947.aspx
http://msdn.microsoft.com/en-us/library/hh213079(v=SQL.110).aspx
http://blogs.msdn.com/b/sqlfts/archive/2011/06/02/fulltext-search-improvements-in-sql-server-denali-ctp1.aspx
http://msdn.microsoft.com/en-us/library/ms187787(v=SQL.110).aspx
http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-servercodename-denali-release.aspx
http://msdn.microsoft.com/en-us/library/ms143544.aspx
http://msdn.microsoft.com/en-us/library/gg492075.aspx

File Table
http://www.infosys.com/microsoft/resource-center/Documents/SQLServer-FILESTREAM-BLOBs.pdf
http://msdn.microsoft.com/en-us/library/gg492084(v=SQL.110).aspx
http://ozamora.com/2010/11/denali-the-next-release-sql-server/
http://coolthingoftheday.blogspot.com/2011/07/sql-server-denali-filetables-feature.html
http://lennilobel.wordpress.com/2011/09/11/its-a-file-system-its-a-database-table-its-sql-server-denali-filetable/


Acknowledgements
We would like to thank Vinod Kumar, Pinal Dave and Vikram Rajkondawar from Microsoft for taking time out of their busy schedules to review
our paper and for providing thoughtful comments which helped us improve its content. We would also like to thank our manager
Naveen Kumar for his kind support and for providing us with the resources to implement and try out the new features of SQL Server.
