Analysis Services has three data modelling modes:
1) tabular model
2) multidimensional model
3) Power Pivot for SharePoint
Dimension tables have surrogate keys, whereas fact tables hold measures.
Open SSDT, create an Analysis Services project, and provide the path where you
want to create the project.
Give the project and the solution proper names.
Once the wizard opens, you can see the following items:
data sources - the impersonation information tells the SSAS engine how to
connect to the databases through your data sources.
cube
dimensions
mining structures
assemblies
miscellaneous
SQL queries:
What is Collation?
-----------------------------------------------------------------------------------
-------------
Collations in SQL Server provide sorting rules, case, and accent sensitivity
properties for your data. Collations that are used with character data types such
as char and varchar dictate the code page and corresponding characters that can be
represented for that data type. Whether you are installing a new instance of SQL
Server, restoring a database backup, or connecting server to client databases, it
is important that you understand the locale requirements, sorting order, and case
and accent sensitivity of the data that you are working with.
What is Database?
-----------------------------------------------------------------------------------
-------------
1. Hash:
With a hash index, data is accessed through an in-memory hash table. Hash
indexes consume a fixed amount of memory, which is a function of the bucket count.
2. Memory-optimized nonclustered:
For memory-optimized nonclustered indexes, memory consumption is a function
of the row count and the size of the index key columns
3. Clustered:
A clustered index sorts and stores the data rows of the table or view in
order based on the clustered index key. The clustered index is implemented as a
B-tree index structure that supports fast retrieval of the rows, based on their
clustered index key values.
4. Nonclustered:
A nonclustered index can be defined on a table or view with a clustered index
or on a heap. Each index row in the nonclustered index contains the nonclustered
key value and a row locator. This locator points to the data row in the
clustered index or heap having the key value. The rows in the index are stored in
the order of the index key values, but the data rows are not guaranteed to be in
any particular order unless a clustered index is created on the table.
5. Unique:
A unique index ensures that the index key contains no duplicate values and
therefore every row in the table or view is in some way unique.
6. Columnstore:
Columnstore indexes work well for data warehousing workloads that primarily
perform bulk loads and read-only queries. Use the columnstore index to achieve up
to 10x query performance gains over traditional row-oriented storage, and up
to 7x data compression over the uncompressed data size.
10. Spatial:
A spatial index provides the ability to perform certain operations more
efficiently on spatial objects (spatial data) in a column of the geometry data
type. The spatial index reduces the number of objects on which relatively
costly spatial operations need to be applied.
11. XML:
A shredded, and persisted, representation of the XML binary large objects
(BLOBs) in the xml data type column.
12. Full-text:
A special type of token-based functional index that is built and maintained
by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for
sophisticated word searches in character string data.
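As a rough illustration of item 1 (hash index memory is a function of the bucket count), the sketch below estimates the size of the bucket array. It assumes that each bucket is an 8-byte pointer and that the requested bucket count is rounded up to the next power of two. The function names are hypothetical and exist only for this example.

```python
BUCKET_BYTES = 8  # assumption: each hash bucket is an 8-byte pointer


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("bucket count must be at least 1")
    return 1 << (n - 1).bit_length()


def hash_index_bytes(requested_bucket_count: int) -> int:
    """Estimated memory used by the bucket array of one hash index."""
    return next_power_of_two(requested_bucket_count) * BUCKET_BYTES
```

Under these assumptions, asking for 1,000,000 buckets yields 1,048,576 buckets, or 8,388,608 bytes, regardless of how many rows the table holds.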
The SQL Server Database Engine automatically modifies indexes whenever insert,
update, or delete operations are made to the underlying data. Over time these
modifications can cause the information in the index to become scattered in the
database (fragmented). Fragmentation exists when indexes have pages in which the
logical ordering, based on the key value, does not match the physical ordering
inside the data file. Heavily fragmented indexes can degrade query performance and
cause your application to respond slowly, especially scan operations. You can
remedy index fragmentation by reorganizing or rebuilding an index.
Rebuilding an index drops and re-creates the index. This removes fragmentation,
reclaims disk space by compacting the pages based on the specified or existing fill
factor setting, and reorders the index rows in contiguous pages. When ALL is
specified, all indexes on the table are dropped and rebuilt in a single
transaction.
Reorganizing an index uses minimal system resources. It defragments the leaf level
of clustered and nonclustered indexes on tables and views by physically reordering
the leaf-level pages to match the logical, left to right, order of the leaf nodes.
Reorganizing also compacts the index pages. Compaction is based on the existing
fill factor value.
The fill-factor option is provided for fine-tuning index data storage and
performance. When an index is created or rebuilt, the fill-factor value determines
the percentage of space on each leaf-level page to be filled with data, reserving
the remainder on each page as free space for future growth. For example, specifying
a fill-factor value of 80 means that 20 percent of each leaf-level page will be
left empty, providing space for index expansion as data is added to the underlying
table. The empty space is reserved between the index rows rather than at the end of
the index.
The fill-factor value is a percentage from 1 to 100, and the server-wide default is
0 which means that the leaf-level pages are filled to capacity.
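The effect of the fill factor on page density can be estimated with a little arithmetic. The sketch below is a simplification, not the storage engine's exact algorithm. It assumes 8,096 usable bytes per 8 KB page and a 2-byte slot entry per row, and it ignores per-row overhead beyond the row size you pass in. The function name is hypothetical.

```python
PAGE_DATA_BYTES = 8096  # assumption: 8 KB page minus the 96-byte page header
SLOT_BYTES = 2          # assumption: one row-offset entry per row


def rows_per_leaf_page(row_bytes: int, fill_factor: int = 0) -> int:
    """Approximate number of rows placed on each leaf page at build time."""
    if not 0 <= fill_factor <= 100:
        raise ValueError("fill factor must be between 0 and 100")
    # 0 is the server-wide default and means pages are filled to capacity.
    effective = 100 if fill_factor == 0 else fill_factor
    budget = PAGE_DATA_BYTES * effective // 100
    # At least one row is always stored on a page.
    return max(1, budget // (row_bytes + SLOT_BYTES))
```

With 98-byte rows this gives 80 rows per page at the default and 64 rows per page at a fill factor of 80, so the index occupies more pages but has room for inserts before pages need to split.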
What is Filestream?
-----------------------------------------------------------------------------------
-----------------------------
FILESTREAM is not automatically enabled when you install or upgrade SQL Server. You
must enable FILESTREAM by using SQL Server Configuration Manager and SQL Server
Management Studio. To use FILESTREAM, you must create or modify a database to
contain a special type of filegroup. Then, create or modify a table so that it
contains a varbinary(max) column with the FILESTREAM attribute. After you complete
these tasks, you can use Transact-SQL and Win32 to manage the FILESTREAM data.
What is Sequence?
-----------------------------------------------------------------------------------
-----------------------------
https://docs.microsoft.com/en-us/sql/relational-databases/sequence-numbers/sequence-numbers?view=sql-server-2017
https://docs.microsoft.com/en-us/sql/relational-databases/spatial/spatial-data-sql-server?view=sql-server-2017
Spatial data represents information about the physical location and shape of
geometric objects. These objects can be point locations or more complex objects
such as countries, roads, or lakes.
SQL Server supports two spatial data types: the geometry data type and the
geography data type.
The geometry type represents data in a Euclidean (flat) coordinate system.
The geography type represents data in a round-earth coordinate system.
Both data types are implemented as .NET common language runtime (CLR) data types in
SQL Server.
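The difference between the two coordinate systems shows up as soon as you measure a distance. The sketch below contrasts a flat (Euclidean) distance with a round-earth distance. It is only an illustration: the round-earth version uses the haversine formula on a sphere with an assumed mean radius and does not reproduce SQL Server's own calculations. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: mean Earth radius, spherical model


def planar_distance(p, q):
    """Flat-plane distance between two (x, y) points, in coordinate units."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Round-earth distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

In the flat model one degree of longitude always has the same length. In the round-earth model it is about half as long at latitude 60 as at the equator, which is why the choice of data type matters for real-world locations.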
Tables are database objects that contain all the data in a database. In tables,
data is logically organized in a row-and-column format similar to a spreadsheet.
Each row represents a unique record, and each column represents a field in the
record.
2. Temporary table:
Temporary tables are stored in tempdb. There are two types of temporary tables:
local and global. They differ from each other in their names, their visibility, and
their availability. Local temporary tables have a single number sign (#) as the
first character of their names; they are visible only to the current connection for
the user, and they are deleted when the user disconnects from the instance of SQL
Server. Global temporary tables have two number signs (##) as the first characters
of their names; they are visible to any user after they are created, and they are
deleted when all users referencing the table disconnect from the instance of SQL
Server.
3. System Table:
SQL Server stores the data that defines the configuration of the server and all its
tables in a special set of tables known as system tables. Users cannot directly
query or update the system tables. The information in the system tables is made
available through the system views.
4. Wide Table:
https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables?view=sql-server-2017
Wide tables use sparse columns to increase the total number of columns that a table
can have to 30,000. Sparse columns are ordinary columns that have optimized storage
for null values. Sparse columns reduce the space requirements for null values at
the cost of more overhead to retrieve nonnull values. A wide table defines a
column set, which is an untyped XML representation that combines all the sparse
columns of a table into a structured output. The number of indexes and statistics
is also increased to 1,000 and 30,000, respectively. The maximum size of a wide
table row is 8,019 bytes. Therefore, most of the data in any particular row should
be NULL. The maximum number of nonsparse columns plus computed columns in a wide
table remains 1,024.
5. Partitioned table:
https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-2017
Partitioned tables are tables whose data is horizontally divided into units which
may be spread across more than one filegroup in a database. Partitioning makes
large tables or indexes more manageable by letting you access or manage subsets of
data quickly and efficiently, while maintaining the integrity of the overall
collection. By default, SQL Server 2017 supports up to 15,000 partitions.
SQL Server supports table and index partitioning. The data of partitioned tables
and indexes is divided into units that can be spread across more than one filegroup
in a database. The data is partitioned horizontally, so that groups of rows are
mapped into individual partitions. All partitions of a single index or table must
reside in the same database. The table or index is treated as a single logical
entity when queries or updates are performed on the data.
Benefits of partition:
Partitioning large tables or indexes can have the following manageability and
performance benefits.
You can transfer or access subsets of data quickly and efficiently, while
maintaining the integrity of a data collection. For example, an operation such as
loading data from an OLTP to an OLAP system takes only seconds, instead of the
minutes and hours the operation takes when the data is not partitioned.
You can perform maintenance operations on one or more partitions more quickly. The
operations are more efficient because they target only these data subsets, instead
of the whole table. For example, you can choose to compress data in one or more
partitions or rebuild one or more partitions of an index.
You may improve query performance, based on the types of queries you frequently run
and on your hardware configuration. For example, the query optimizer can process
equi-join queries between two or more partitioned tables faster when the
partitioning columns in the tables are the same, because the partitions themselves
can be joined.
When SQL Server performs data sorting for I/O operations, it sorts the data first
by partition. SQL Server accesses one drive at a time, and this might reduce
performance. To improve data sorting performance, stripe the data files of your
partitions across more than one disk by setting up a RAID. In this way, although
SQL Server still sorts data by partition, it can access all the drives of each
partition at the same time.
In addition, you can improve performance by enabling lock escalation at the
partition level instead of a whole table. This can reduce lock contention on the
table. To reduce lock contention by allowing lock escalation to the partition, set
the LOCK_ESCALATION option of the ALTER TABLE statement to AUTO.
Components of partition:
Partition function:
A database object that defines how the rows of a table or index are mapped to a set
of partitions based on the values of a certain column, called a partitioning column.
That is, the partition function defines the number of partitions that the table
will have and how the boundaries of the partitions are defined. For example, given
a table that contains sales order data, you may want to partition the table into
twelve (monthly) partitions based on a datetime column such as a sales date.
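The mapping a partition function performs can be modelled as a binary search over the boundary values. The sketch below is a toy model of the monthly example above, not SQL Server's implementation. The year 2017, the use of dates instead of datetime values, and the function name are assumptions for illustration. The range_right flag models whether a boundary value belongs to the partition on its right (RANGE RIGHT) or on its left (RANGE LEFT).

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Eleven boundary values split one year into twelve monthly partitions.
BOUNDARIES = [date(2017, month, 1) for month in range(2, 13)]


def partition_number(value, boundaries=BOUNDARIES, range_right=True):
    """Return the 1-based partition number that a column value maps to."""
    if range_right:
        # A boundary value belongs to the partition on its right.
        return bisect_right(boundaries, value) + 1
    # A boundary value belongs to the partition on its left.
    return bisect_left(boundaries, value) + 1
```

For a sales date of 15 January the function returns partition 1. A sales date of exactly 1 February maps to partition 2 with range_right=True and to partition 1 with range_right=False.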
Partition scheme:
A database object that maps the partitions of a partition function to a set of
filegroups.
Partitioning column
The column of a table or index that a partition function uses to partition the
table or index. Computed columns that participate in a partition function must be
explicitly marked PERSISTED. All data types that are valid for use as index columns
can be used as a partitioning column, except timestamp. The ntext, text, image,
xml, varchar(max), nvarchar(max), or varbinary(max) data types cannot be specified.
Also, Microsoft .NET Framework common language runtime (CLR) user-defined type and
alias data type columns cannot be specified.
Aligned index
An index that is built on the same partition scheme as its corresponding table.
When a table and its indexes are in alignment, SQL Server can switch partitions
quickly and efficiently while maintaining the partition structure of both the table
and its indexes. An index does not have to participate in the same named partition
function to be aligned with its base table. However, the partition function of the
index and the base table must be essentially the same, in that:
The arguments of the partition functions have the same data type.
They define the same number of partitions.
They define the same boundary values for partitions.
Partitioning Clustered Indexes
When partitioning a clustered index, the clustering key must contain the
partitioning column. When partitioning a nonunique clustered index, and the
partitioning column is not explicitly specified in the clustering key, SQL Server
adds the partitioning column by default to the list of clustered index keys. If the
clustered index is unique, you must explicitly specify that the clustered index key
contain the partitioning column.
Partitioning NonClustered Indexes
When partitioning a unique nonclustered index, the index key must contain the
partitioning column. When partitioning a nonunique, nonclustered index, SQL Server
adds the partitioning column by default as a nonkey (included) column of the index
to make sure the index is aligned with the base table. SQL Server does not add the
partitioning column to the index if it is already present in the index.
Non-aligned index
An index partitioned independently from its corresponding table. That is, the index
has a different partition scheme or is placed on a separate filegroup from the base
table. Designing a non-aligned partitioned index can be useful in the following
cases:
The base table has not been partitioned.
The index key is unique and it does not contain the partitioning column of the
table.
You want the base table to participate in collocated joins with more tables using
different join columns.
Partition elimination
The process by which the query optimizer accesses only the relevant partitions to
satisfy the filter criteria of the query.
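Partition elimination can be pictured as working out which partitions could hold rows that match a range filter and skipping the rest. The sketch below is a simplified model, not the optimizer's logic. It assumes that a boundary value belongs to the partition on its right, and the function name is hypothetical.

```python
from bisect import bisect_right


def partitions_to_scan(boundaries, low, high):
    """1-based partition numbers that can contain values in [low, high]."""
    if low > high:
        return []  # an empty range touches no partition
    first = bisect_right(boundaries, low) + 1
    last = bisect_right(boundaries, high) + 1
    return list(range(first, last + 1))
```

With boundaries 100, 200 and 300 (four partitions), a filter between 120 and 180 needs only partition 2, while a filter between 150 and 250 needs partitions 2 and 3.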
https://docs.microsoft.com/en-us/sql/relational-databases/tables/use-sparse-columns?view=sql-server-2017
Sparse columns are ordinary columns that have an optimized storage for null values.
Sparse columns reduce the space requirements for null values at the cost of more
overhead to retrieve nonnull values. Consider using sparse columns when the space
saved is at least 20 percent to 40 percent. Sparse columns and column sets are
defined by using the CREATE TABLE or ALTER TABLE statements.
Sparse columns can be used with column sets and filtered indexes.
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/create-filtered-indexes?view=sql-server-2017
https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/track-data-changes-sql-server?view=sql-server-2017
SQL Server 2017 provides two features that track changes to data in a database:
change data capture and change tracking. These features enable applications to
determine the DML changes (insert, update, and delete operations) that were made to
user tables in a database. Change data capture and change tracking can be enabled
on the same database; no special considerations are required.
What is the difference between change data capture and change tracking?
-----------------------------------------------------------------------------------
---------------------
The following table lists the feature differences between change data capture and
change tracking. The tracking mechanism in change data capture involves an
asynchronous capture of changes from the transaction log so that changes are
available after the DML operation. In change tracking, the tracking mechanism
involves synchronous tracking of changes in line with DML operations so that change
information is available immediately.
Feature                       Change data capture   Change tracking
Tracked changes
  DML changes                 Yes                   Yes
Tracked information
  Historical data             Yes                   No
  Whether column was changed  Yes                   Yes
  DML type                    Yes                   Yes
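The two tracking mechanisms can be contrasted with a small in-memory model. This is a toy sketch under stated assumptions, not how SQL Server implements either feature. The class, its attributes, and the capture() method are hypothetical; a list stands in for the transaction log; and change tracking is reduced to the latest DML type for each key.

```python
class TrackedTable:
    """Toy table that records changes in both styles."""

    def __init__(self):
        self.rows = {}             # key -> current column values
        self.log = []              # stand-in for the transaction log
        self.change_tracking = {}  # key -> latest DML type, no historical data
        self.capture_table = []    # historical rows, filled later by capture()
        self._captured = 0         # log position already read by capture()

    def _apply(self, op, key, values):
        self.log.append((op, key, dict(values)))
        # Synchronous: recorded in line with the DML operation itself.
        self.change_tracking[key] = op

    def insert(self, key, values):
        self.rows[key] = dict(values)
        self._apply("I", key, values)

    def update(self, key, values):
        self.rows[key].update(values)
        self._apply("U", key, self.rows[key])

    def delete(self, key):
        self._apply("D", key, self.rows.pop(key))

    def capture(self):
        """Asynchronous job: copy unread log entries into the capture table."""
        new_entries = self.log[self._captured:]
        self.capture_table.extend(new_entries)
        self._captured = len(self.log)
        return len(new_entries)
```

After an insert and an update of the same key, change_tracking already reports the key as updated, while capture_table stays empty until capture() runs. Once it runs, the capture table holds both versions of the row, which is the historical data that change tracking does not keep.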