
Q: Explain inner and outer SQL joins?

A: Joins allow database users to combine data from one table with data from one or more
other tables (or views, or synonyms). Tables are joined two at a time: conceptually, the join
forms every combination of rows from the two tables and keeps only the combinations that
satisfy the join condition.

Inner joins: Return only the rows for which the join condition is satisfied in both
tables. The example below displays only the employees who are executives as well.

    SELECT emp.firstname, exec.surname
    FROM employees emp, executives exec
    WHERE emp.id = exec.id;

Left outer joins: An inner join returns only the rows that match between the two tables.
A left outer join returns every row from the left table and fills the right table's columns
with NULL where there is no match. The example below shows all the employees, with NULL
in the executive columns for employees who are not executives.

    SELECT emp.firstname, exec.surname
    FROM employees emp LEFT JOIN executives exec ON emp.id = exec.id;

Right outer join: The mirror image of the left outer join. Every row from the right table
is returned, and the left table's columns are NULL where there is no match. The example
below shows all the executives, with NULL in the employee columns where no employee matches.

    SELECT emp.firstname, exec.surname
    FROM employees emp RIGHT JOIN executives exec ON emp.id = exec.id;

Full outer join: Returns the rows from both sides of the join, matched where possible and
filled with NULL where one side has no match.

    SELECT emp.firstname, exec.surname
    FROM employees emp FULL JOIN executives exec ON emp.id = exec.id;
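The difference between the inner and the left outer join is easiest to see on a small data
set. Here is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory
database. The sample rows are made up, and the alias ex is used instead of exec because
exec is a keyword on some database servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees  (id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE executives (id INTEGER PRIMARY KEY, surname TEXT);
    INSERT INTO employees  VALUES (1, 'Peter'), (2, 'John'), (3, 'Amanda');
    INSERT INTO executives VALUES (1, 'Smith');
""")

# Inner join: only employees that have a matching executive row.
inner_rows = conn.execute("""
    SELECT emp.firstname, ex.surname
    FROM employees emp INNER JOIN executives ex ON emp.id = ex.id
    ORDER BY emp.id
""").fetchall()

# Left outer join: every employee, with NULL (None) where no executive matches.
left_rows = conn.execute("""
    SELECT emp.firstname, ex.surname
    FROM employees emp LEFT JOIN executives ex ON emp.id = ex.id
    ORDER BY emp.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Only Peter appears in the inner join result, whereas the left outer join lists all three
employees and leaves the surname empty for John and Amanda.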

Self join: A self-join is a join of a table to itself. If you want to find all the employees
who live in the same city as employees whose first name starts with "Peter", then one
way is to use a sub-query against the same table as shown below:

    SELECT emp.firstname, emp.surname
    FROM employees emp
    WHERE emp.city IN (SELECT city FROM employees WHERE firstname LIKE 'Peter%');

Q. What is a sub-query? How does a sub-query affect performance?

A. It is possible to embed one SQL statement within another. When this is done in the
WHERE or the HAVING clause, we have a sub-query construct.

Q. What is a sub-query useful for?

A. It is used to relate tables, and there are cases where the only way to correlate two tables
is through a sub-query.

    SELECT emp.firstname, emp.surname
    FROM employees emp
    WHERE emp.id NOT IN (SELECT id FROM executives);

Sub-queries can cause performance problems, and NOT IN behaves unexpectedly when the
sub-query returns NULL values: if any id in executives is NULL, the query above returns no
rows at all. The above sub-query can be re-written as shown below using a correlated
sub-query with NOT EXISTS.

    SELECT emp.firstname, emp.surname
    FROM employees emp
    WHERE NOT EXISTS (SELECT 1 FROM executives exec WHERE exec.id = emp.id);

The above query can also be re-written as an outer join, which is often faster, as shown
below:

    SELECT emp.firstname, emp.surname
    FROM employees emp LEFT JOIN executives exec ON emp.id = exec.id
    WHERE exec.id IS NULL;

On many database servers the outer join runs faster because it eliminates the sub-query, but
check the execution plan, as the optimizer may treat both forms alike.
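The NULL problem with NOT IN can be reproduced with a few rows. Here is a minimal sketch,
assuming Python's built-in sqlite3 module; the sample data, including the NULL id in
executives, is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees  (id INTEGER, firstname TEXT);
    CREATE TABLE executives (id INTEGER);
    INSERT INTO employees  VALUES (1, 'Peter'), (2, 'John'), (3, 'Amanda');
    INSERT INTO executives VALUES (1), (NULL);
""")

def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

# NOT IN: the NULL in the sub-query makes every comparison unknown.
not_in = names("""
    SELECT firstname FROM employees
    WHERE id NOT IN (SELECT id FROM executives)
    ORDER BY id""")

# NOT EXISTS with a correlated sub-query is not affected by the NULL.
not_exists = names("""
    SELECT emp.firstname FROM employees emp
    WHERE NOT EXISTS (SELECT 1 FROM executives ex WHERE ex.id = emp.id)
    ORDER BY emp.id""")

# Outer join that keeps only the employees without a match.
anti_join = names("""
    SELECT emp.firstname
    FROM employees emp LEFT JOIN executives ex ON emp.id = ex.id
    WHERE ex.id IS NULL
    ORDER BY emp.id""")

print(not_in, not_exists, anti_join)
```

The NOT IN version returns nothing, while the other two forms return John and Amanda.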

Q. Can you give SQL examples for the following scenarios?

A.

Scenario 1: Retrieve the first name and the sum of order quantity, grouped by first name,
for the groups whose order sum is greater than 25.

    SELECT FIRSTNAME, SUM(QTY)
    FROM orders
    GROUP BY FIRSTNAME
    HAVING SUM(QTY) > 25;

Scenario 2: Retrieve all employees whose first name contains the string "au".

    SELECT *
    FROM employees emp
    WHERE emp.firstname LIKE '%au%';

Scenario 3: Select the account number and adviser code for a given adviser code, but restrict
the returned rows to a supplied minimum and maximum limit. For example, record 1 to record 10,
record 11 to record 20, etc.

The SQL for the above scenario needs database-specific features. The example below uses
ROWNUM, the Oracle pseudocolumn that numbers the rows as they are returned. The nested
query shown below limits the returned results based on a lower and an upper limit. The
innermost query must sort on a real column (ACCOUNT_NO here) so that the pages are stable.

    SELECT * FROM
      (SELECT a.ACCOUNT_NO, a.ADVISER_CODE, ROWNUM rnum FROM
         (SELECT * FROM accounts WHERE ADVISER_CODE = :advCode ORDER BY ACCOUNT_NO) a
       WHERE ROWNUM <= :max_row)
    WHERE rnum >= :min_row
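ROWNUM is specific to Oracle, and other databases page their results differently. As a hedged
illustration of the same min/max windowing, here is a sketch that swaps in SQLite's LIMIT and
OFFSET clauses through Python's built-in sqlite3 module. The fetch_page helper and the sample
rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (ACCOUNT_NO INTEGER, ADVISER_CODE TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [(n, "ADV1") for n in range(1, 26)])

def fetch_page(adv_code, min_row, max_row):
    """Return rows min_row..max_row (1-based, inclusive) for one adviser."""
    return conn.execute(
        """SELECT ACCOUNT_NO, ADVISER_CODE FROM accounts
           WHERE ADVISER_CODE = ?
           ORDER BY ACCOUNT_NO
           LIMIT ? OFFSET ?""",
        (adv_code, max_row - min_row + 1, min_row - 1),
    ).fetchall()

second_page = fetch_page("ADV1", 11, 20)
print(second_page[0], second_page[-1], len(second_page))
```

The ORDER BY matters in both variants: without a deterministic sort order, the same row can
show up on two different pages.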

Q: In your experience, what are some of the common mistakes developers make?
A:

1. Cartesian joins

SQL joins are used to relate information in different tables. A join condition is the part of
the SQL query that determines which rows from two or more tables are combined. A join
condition can be written in the WHERE clause of SELECT, UPDATE, and DELETE statements.

The syntax for joining two tables is:

    SELECT col1, col2, col3
    FROM table_name1, table_name2
    WHERE table_name1.col2 = table_name2.col1;

If the join condition is omitted as shown below

    SELECT col1, col2, col3
    FROM table_name1, table_name2

or if the condition is invalid, then the join operation will result in a Cartesian product.
The Cartesian product returns a number of rows equal to the product of the row counts of all
the tables being joined. For example, if the first table has 20 rows and the second table has 10
rows, the result will be 20 * 10, or 200 rows. On large tables such a query takes a long time
to execute.
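The 20 * 10 arithmetic can be checked directly. Here is a small sketch, assuming Python's
built-in sqlite3 module; the tables t1 and t2 and their contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER)")
conn.execute("CREATE TABLE t2 (col2 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(20)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(10)])

# No join condition: every row of t1 is paired with every row of t2.
cross_count = conn.execute("SELECT COUNT(*) FROM t1, t2").fetchone()[0]

# With a join condition: only the matching pairs remain.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM t1, t2 WHERE t1.col1 = t2.col2").fetchone()[0]

print(cross_count, joined_count)
```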

2. Use of SELECT *

For example, a common misuse of SELECT * is to extract a set of all employees and to
insert them into another table called Contractors with the same structure

    INSERT INTO Contractors
    SELECT * FROM Employees WHERE emp_type = 'C';

The above query does the job. However, one day business requirements change and two
new columns are added to the Employees table:

    ALTER TABLE Employees
    ADD effective_start_date DATETIME, effective_end_date DATETIME;

All of a sudden, the query that extracts from the Employees table and inserts records into the
Contractors table results in an error.

"Insert Error: Column name or number of supplied values does not match table
definition."

The fix is to explicitly list the column names in the query:

    INSERT INTO Contractors (emp_id, emp_name)
    SELECT emp_id, emp_name FROM Employees WHERE emp_type = 'C';
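The failure is easy to reproduce. The sketch below assumes Python's built-in sqlite3 module;
SQLite adds one column per ALTER TABLE statement and reports its own error message, and the
three-column table layout is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees   (emp_id INTEGER, emp_name TEXT, emp_type TEXT);
    CREATE TABLE Contractors (emp_id INTEGER, emp_name TEXT, emp_type TEXT);
    INSERT INTO Employees VALUES (1, 'Peter', 'C'), (2, 'John', 'P');
    ALTER TABLE Employees ADD COLUMN effective_start_date TEXT;
    ALTER TABLE Employees ADD COLUMN effective_end_date TEXT;
""")

# SELECT * now supplies 5 values to a 3-column table and fails.
try:
    conn.execute(
        "INSERT INTO Contractors SELECT * FROM Employees WHERE emp_type = 'C'")
    select_star_failed = False
except sqlite3.OperationalError:
    select_star_failed = True

# Listing the columns keeps working after the schema change.
conn.execute("""
    INSERT INTO Contractors (emp_id, emp_name, emp_type)
    SELECT emp_id, emp_name, emp_type FROM Employees WHERE emp_type = 'C'
""")
contractors = conn.execute("SELECT emp_id, emp_name FROM Contractors").fetchall()
print(select_star_failed, contractors)
```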

3. Embedding User Interface (UI) layer logic into the data layer via SQL. For example

    SELECT '<a href="http://www.blogger.com/...">' + name + '</a>'

The above code is bad practice because it tightly couples your UI layer with the data
layer.

4. Not using prepared statements. Prepared statements are more secure and more efficient
than ordinary statements. Because the parameter values are passed separately from the SQL
text, prepared statements prevent SQL injection attacks.
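Here is a minimal sketch of the difference, assuming Python's built-in sqlite3 module, where a
"?" placeholder plays the role of the prepared statement parameter. The table contents and the
hostile input string are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, firstname TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "Peter"), (2, "John")])

user_input = "x' OR '1'='1"  # hostile value typed into a search box

# Unsafe: the input is spliced into the SQL text and changes the query.
unsafe_sql = ("SELECT firstname FROM employees WHERE firstname = '"
              + user_input + "'")
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a parameter and treated purely as data.
safe_rows = conn.execute(
    "SELECT firstname FROM employees WHERE firstname = ?",
    (user_input,)).fetchall()

print(len(unsafe_rows), safe_rows)
```

The concatenated query returns every employee, whereas the parameterized query finds no
employee with that odd first name.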

5. Using the predicate "LIKE" on indexed columns. A "LIKE" pattern that begins with a
wildcard (e.g. '%au%') typically cannot use the index, so the search loses the normal
performance benefit of indexes. Where an exact comparison will do, using '=' instead of
"LIKE" will improve performance. Also be aware that case sensitivity (e.g., 'A' versus 'a')
may differ depending on the database server or its configuration.

6. Overuse of cursors in stored procedures. If possible, avoid using cursors in stored
procedures. They generally use a lot of server resources and reduce the performance and
scalability of your applications. If you need to perform row-by-row operations, try to find
another method to perform the task.

Here are some alternatives to using a cursor:

• Use WHILE loops
• Use temp tables
• Use materialized views (allowing you to pre-join complex views and pre-compute
  summaries for super-fast response time)
• Use derived tables
• Perform multiple queries
• Use correlated sub-queries
• Use the CASE statement

Q. Can you give some database performance tuning tips based on your experience?
A.

1. Materialized views are one of the important SQL tuning tools in Oracle. Instead of
the entire company accessing a single database server, user load can be distributed across
multiple database servers with the help of materialized views. Through the use
of multi-tier materialized views, you can create materialized views based on other
materialized views, which enables you to distribute user load to an even greater extent
because clients can access materialized view sites instead of master sites. To decrease the
amount of data that is replicated, a materialized view can be a subset of a master table or
master materialized view.

• Materialized views are schema objects that can be used to summarize, precompute,
  replicate, and distribute data.
• They allow you to pre-join complex views and pre-compute summaries for super-fast
  response times. Unlike an ordinary view, which does not take up any storage
  space or contain any data, a materialized view provides indirect access to table
  data by storing the results of a query in a separate schema object.
• You can define a materialized view on a base table, partitioned table or view, and
  you can define indexes on a materialized view.
• A materialized view can be stored in the same database as its base tables
  (improves query performance through query rewrite) or in a different database.

It is also worth noting that materialized views may not be the best fit for tables that change
very frequently, as in online transaction processing (OLTP) environments. In other databases,
equivalent functionality can be achieved through triggers on the base tables that update or
insert into aggregate or dimension tables, so that queries can be executed against these
aggregate or dimension tables as opposed to the base tables.

2. As a rule of thumb, every table should have a clustered index. Generally, but not
always, the clustered index should be on a column that increases in one direction (i.e.
monotonically), such as an identity column, or some other column where the value is
increasing and is unique. In many cases, the primary key is the ideal column for a
clustered index. Create an index on any column that is a foreign key. If you know the values
will be unique, set the flag to force the index to be unique.

3. Avoid temp tables as much as you can, but if you need a temp table, create it explicitly
using Create Table #temp.

4. Only return the columns and the rows you need.

5. Avoid full table scans where possible. A full table scan can be caused by

• No WHERE condition.
• No index on any of the columns used in the WHERE clause.
• A NOT IN predicate, which is easier to write (replace NOT IN with a left outer join).
• WHERE clauses like column_name is not null, condition 1 or condition 2,
  column_name between ... and ..., and not-equal comparisons.
• Use of the SQL LIKE operator to find a string within a large table column
  (e.g. VARCHAR(2000), CLOB, BLOB).
• DISTINCT, ANY, and ALL.

SQL interview questions and answers - scenarios based

Q. How will you go about identifying duplicate records in a table?

A. The following SQL query will do the trick

    SELECT code, user_name, COUNT(user_name) AS NumOccurrences
    FROM tbl_user
    GROUP BY code, user_name
    HAVING ( COUNT(user_name) > 1 )

Q. How would you go about deleting the duplicate records?

A. You could do it in a number of steps as shown below.

• Create a temporary table.
• Insert the unique records into the temporary table.
• Delete the records from the original table.
• Insert the saved single records from the temporary table back into the original table.
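The steps above can be sketched end to end. This is a minimal illustration, assuming Python's
built-in sqlite3 module; the temporary table name tmp_user and the sample rows are
hypothetical, and a real job would also need to consider locking and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_user (code TEXT, user_name TEXT);
    INSERT INTO tbl_user VALUES ('A1', 'peter'), ('A1', 'peter'), ('B2', 'john');
""")

# Identify the duplicates first.
duplicates = conn.execute("""
    SELECT code, user_name, COUNT(*) AS NumOccurrences
    FROM tbl_user
    GROUP BY code, user_name
    HAVING COUNT(*) > 1
""").fetchall()

# Steps 1 and 2: create a temporary table holding one copy of each record.
conn.execute("""CREATE TEMP TABLE tmp_user AS
                SELECT DISTINCT code, user_name FROM tbl_user""")
# Step 3: delete the records from the original table.
conn.execute("DELETE FROM tbl_user")
# Step 4: copy the unique records back.
conn.execute("INSERT INTO tbl_user SELECT code, user_name FROM tmp_user")
conn.execute("DROP TABLE tmp_user")
conn.commit()

remaining = conn.execute(
    "SELECT code, user_name FROM tbl_user ORDER BY code").fetchall()
print(duplicates, remaining)
```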

Q. How will you go about searching for table and column names when you don't know
where they are? For example, searching for a column name to find out which tables
contain it.

A. You need to query the database system tables. For example, in Sybase, you can query
them as shown below.

    select a.name, b.name
    from sysobjects a, syscolumns b
    where a.id = b.id
    and b.name like '%split_income%'

Q. How will you go about writing an SQL query for the following scenario?

The valuation table has the columns portfolioid, accountid, balance, inactiveflag,
valuationdttm, and typecd. The portfolio table has the columns portfolioid and
portfoliocd. The account table has the columns accountid and accountcd.

Write an SQL query to extract the accountcd and the corresponding balance for a
given portfoliocd and valuationdttm. Please note that there will be multiple balance
records for each account, and your query must extract only a single balance record
per account based on the rule 'extract the record with minimum value for typecd'.

A. As you can see in the sample answer below, inner joins are used to join the
relevant tables. A correlated sub-query is used to calculate the min(typecd) for each
account in order to extract the record with the minimum value for typecd.

    select acc.accountcd, val.balance
    from valuation val
    inner join portfolio pf on pf.portfolioid = val.portfolioid
    inner join account acc on acc.accountid = val.accountid
    where pf.portfoliocd = 'AR30'
    and val.valuationdttm = '28 Dec 2012'
    and val.inactiveflag = 'N'
    and acc.inactiveflag = 'N'
    and val.typecd = (select min(val2.typecd) from valuation val2
                      where val2.valuationdttm = val.valuationdttm
                      and val2.inactiveflag = 'N'
                      and val2.accountid = val.accountid
                      group by val2.accountid)
    order by acc.accountcd
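
To see the "minimum typecd per account" rule in action, here is a small sketch that assumes
Python's built-in sqlite3 module. The sample rows, the ISO date strings, and the integer
typecd values are made up, and the filter on the account table's inactiveflag is left out
because that column is not listed in the scenario.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE portfolio (portfolioid INTEGER, portfoliocd TEXT);
    CREATE TABLE account   (accountid INTEGER, accountcd TEXT);
    CREATE TABLE valuation (portfolioid INTEGER, accountid INTEGER, balance REAL,
                            inactiveflag TEXT, valuationdttm TEXT, typecd INTEGER);
    INSERT INTO portfolio VALUES (1, 'AR30');
    INSERT INTO account   VALUES (10, 'ACC-A'), (20, 'ACC-B');
    INSERT INTO valuation VALUES
        (1, 10, 500.0, 'N', '2012-12-28', 2),
        (1, 10, 450.0, 'N', '2012-12-28', 1),
        (1, 20, 900.0, 'N', '2012-12-28', 3),
        (1, 20, 800.0, 'Y', '2012-12-28', 1);
""")

balances = conn.execute("""
    SELECT acc.accountcd, val.balance
    FROM valuation val
    INNER JOIN portfolio pf ON pf.portfolioid = val.portfolioid
    INNER JOIN account acc ON acc.accountid = val.accountid
    WHERE pf.portfoliocd = ?
      AND val.valuationdttm = ?
      AND val.inactiveflag = 'N'
      -- correlated sub-query: lowest active typecd for this account and date
      AND val.typecd = (SELECT MIN(val2.typecd) FROM valuation val2
                        WHERE val2.valuationdttm = val.valuationdttm
                          AND val2.inactiveflag = 'N'
                          AND val2.accountid = val.accountid)
    ORDER BY acc.accountcd
""", ("AR30", "2012-12-28")).fetchall()
print(balances)
```

ACC-B shows why the sub-query repeats the inactiveflag filter: its typecd 1 record is
inactive, so the active typecd 3 record is the one returned.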

Q. If you need to map actual values retrieved from the database to some other value and
then sort by these translated values as well, how will you go about accomplishing this in
your SQL code?

For example, StatusCd is a column in the Portfolio table, and it can have the values
New and Processed. But the SQL query should return a status of 'Excluded' if the
ExcludedFlag column is set to yes, and 'Sent' if the SentDateTime is not null. If none of
the above conditions are met, then return the StatusCd as stored in the database. The sorting
needs to be carried out in the order of 'New', 'Processed', 'Sent', and then 'Excluded'.

A. This can be achieved with a CASE expression, the SQL counterpart of a switch/case
statement. The syntax of the CASE expression can vary among databases. Here is a sample
SQL based on the Sybase database server.

    SELECT PortfolioCd, SentDateTime, ExcludedFlag, StatusCd AS ActualStatusCd,
           CASE WHEN p.ExcludedFlag = 'Y' THEN 'Excluded'
                ELSE CASE WHEN p.SentDateTime IS NULL THEN p.StatusCd
                          ELSE 'Sent'
                     END
           END AS EvaluatedStatusCd
    FROM Portfolio p
    WHERE valuationdttm > '09 Jan 2013' AND InActiveFlag = 'N'
    ORDER BY CASE WHEN p.ExcludedFlag = 'Y' THEN '4'
                  ELSE CASE WHEN p.SentDateTime IS NOT NULL THEN '3'
                            ELSE CASE WHEN p.StatusCd = 'New' THEN '1'
                                      WHEN p.StatusCd = 'Processed' THEN '2'
                                 END
                       END
             END,
             PortfolioCd
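
The mapping and the custom sort order can be tried on a few rows. Here is a hedged sketch,
assuming Python's built-in sqlite3 module; it flattens the nested CASE into a single CASE
with several WHEN branches, which gives the same result, and the sample portfolios are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Portfolio (PortfolioCd TEXT, StatusCd TEXT,
                            ExcludedFlag TEXT, SentDateTime TEXT);
    INSERT INTO Portfolio VALUES
        ('P1', 'Processed', 'N', NULL),
        ('P2', 'New',       'Y', NULL),
        ('P3', 'New',       'N', '2013-01-10 09:00:00'),
        ('P4', 'New',       'N', NULL);
""")

rows = conn.execute("""
    SELECT PortfolioCd,
           CASE WHEN ExcludedFlag = 'Y'       THEN 'Excluded'
                WHEN SentDateTime IS NOT NULL THEN 'Sent'
                ELSE StatusCd
           END AS EvaluatedStatusCd
    FROM Portfolio
    -- the same conditions mapped to sort keys: New, Processed, Sent, Excluded
    ORDER BY CASE WHEN ExcludedFlag = 'Y'       THEN 4
                  WHEN SentDateTime IS NOT NULL THEN 3
                  WHEN StatusCd = 'New'         THEN 1
                  WHEN StatusCd = 'Processed'   THEN 2
             END,
             PortfolioCd
""").fetchall()
print(rows)
```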

Q. How would you retrieve a date time column converted to a string and formatted as
dd/mm/yy hh:mm:ss?
A. You can use the conversion functions provided by your database server. These functions are
specific to the database server you are using, hence your code cannot be ported to other
database servers. Here is an example in Sybase.

    SELECT PortfolioCd,
           convert(char(11), p.SentDateTime, 103) + convert(char(12), p.SentDateTime, 108) AS SentDateTime
    FROM Portfolio p
    WHERE valuationdttm > '09 Jan 2013' AND InActiveFlag = 'N'

In the above example, the convert function is used to convert the date time field to char.
In Sybase, style 103 produces the date as dd/mm/yyyy (style 3 gives the two-digit year,
dd/mm/yy), and style 108 produces the time as hh:mm:ss.
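
To illustrate how database specific this is, the same formatting in SQLite uses the strftime
function instead of convert. A minimal sketch, assuming Python's built-in sqlite3 module and a
made-up timestamp:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite's strftime takes a format string rather than a numeric style code.
formatted = conn.execute(
    "SELECT strftime('%d/%m/%Y %H:%M:%S', ?)",
    ("2013-01-09 14:30:05",),
).fetchone()[0]
print(formatted)
```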

Q. How will you go about tuning your SQL and stored procedures?
A. You can use tools like DB Artisan, TOAD, etc to analyze the query plan. The code (in
Sybase) below gives you the elapsed time.

    DECLARE @start datetime, @stop datetime
    SET @start = GETDATE()

    exec MY_PROC 'AC345', '02 Jan 2013', null, 'N'

    SET @stop = GETDATE()
    SELECT datediff(ms, @start, @stop)

Proper indexing is key to getting good performance out of your SQL queries.

Q. What are all the different types of indexes?

A. There are three types of indexes

Unique Index: does not allow duplicate values in the indexed column. A unique index is
created automatically when a primary key is defined.

Clustered Index: stores the rows of the table physically in the order of the index key and
searches by that key. Each table can have only one clustered index.

NonClustered Index: does not alter the physical order of the table; it maintains a separate
logical ordering of the data. The maximum number per table depends on the database server
(for example, SQL Server 2008 and later allow up to 999 non-clustered indexes per table).

SQL Interview Questions and Answers on deleting records


The following is a very popular SQL job interview question.

Q. What is the difference between "Truncate" and "Delete" commands?

A.

• TRUNCATE TABLE table_name locks the table and its pages but not each row,
  whereas a DELETE statement typically uses row locks: each row in the table is
  locked for deletion.
• Truncate removes all the records in the table, whereas Delete can be used with a
  WHERE clause to remove records conditionally, that is, to remove only a handful
  of records.
• Truncate is much faster than Delete, as its logging is minimal, whereas
  the Delete command logs every deleted record.
• Truncate does not retain the identity, whereas the DELETE command retains the
  identity. When you use Truncate on a table that contains an identity column, the
  counter for that column is reset to the seed value that is defined for the column.
• Truncate cleans up the object statistics and releases the allocated space, whereas
  Delete retains the object statistics and allocated space.
• TRUNCATE is a DDL (Data Definition Language) command and DELETE is a DML (Data
  Manipulation Language) command.
• Data removed by the TRUNCATE command generally cannot be rolled back unless
  the database server specifically supports it. A DELETE can be rolled back as part
  of a transaction.
• The TRUNCATE command does not fire any triggers, whereas the DELETE
  command fires any delete triggers defined on the table. Such triggers can, for example,
  keep an audit trail by inserting the deleted records into an audit table.

Q. When will you use a truncate command?

A. TRUNCATE is useful for purging a table with a huge amount of data. Alternatively, you
can drop the table and recreate it, if that makes sense. Firing a delete command instead of a
truncate command to empty a table with millions of records can result in locking the
whole table, can take a long time to complete, and at times can cause the machine to
hang.

The truncate command is executed as shown below.

    TRUNCATE TABLE table_name

Q. Which command will you use to periodically purge data from your tables as part of a
housekeeping job?
A. Use a DELETE command within a transaction with a WHERE clause to remove data
that is older than the retention period, for example 7 years. Remove large amounts of data
in batches as opposed to in a single transaction.
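
Batching can be sketched as a loop that deletes a limited number of rows and commits after
each round. This is only an illustration, assuming Python's built-in sqlite3 module; the
audit_log table, the purge_older_than helper, the cutoff date, and the batch size are all
hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany("INSERT INTO audit_log (created) VALUES (?)",
                 [("2001-01-01",)] * 2500 + [("2024-01-01",)] * 10)
conn.commit()

def purge_older_than(cutoff, batch_size=1000):
    """Delete rows older than cutoff in batches; return the number of batches."""
    batches = 0
    while True:
        cur = conn.execute(
            """DELETE FROM audit_log WHERE id IN
               (SELECT id FROM audit_log WHERE created < ? LIMIT ?)""",
            (cutoff, batch_size))
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break
        batches += 1
    return batches

batches_run = purge_older_than("2017-01-01")
rows_left = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(batches_run, rows_left)
```

Short transactions keep locks and the transaction log small, at the cost of the purge not
being all-or-nothing.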

Q. How will you delete a few records from a single table?
A.

    DELETE FROM parent p WHERE p.parent_name = 'Peter'

Q. How will you delete a few records from parent and child tables where the parent table
has parent_name = 'Peter'?
A.

Firstly, you need to delete the child records because the integrity constraint won't let you
delete the parent record when there are child records.

    DELETE child
    FROM parent p, child c
    WHERE p.parent_id = c.parent_id
    AND p.parent_name = 'Peter'

Now, the parent records can be deleted as shown below

    DELETE FROM parent p WHERE p.parent_name = 'Peter'

Note: Please note the difference in syntax when you make a join with the child. When
there is only a single table involved, it is "DELETE FROM table_name", but when there
is a join (in Sybase and SQL Server), it is "DELETE table_name" and then the "FROM" with
the join clauses.
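
The ordering constraint can be demonstrated with a foreign key. Here is a hedged sketch,
assuming Python's built-in sqlite3 module; SQLite has no "DELETE ... FROM" join syntax, so the
child rows are selected with a sub-query instead, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, parent_name TEXT);
    CREATE TABLE child  (child_id INTEGER PRIMARY KEY,
                         parent_id INTEGER REFERENCES parent(parent_id));
    INSERT INTO parent VALUES (1, 'Peter'), (2, 'John');
    INSERT INTO child  VALUES (10, 1), (11, 1), (12, 2);
""")

# Deleting the parent first violates the integrity constraint.
try:
    conn.execute("DELETE FROM parent WHERE parent_name = 'Peter'")
    parent_first_failed = False
except sqlite3.IntegrityError:
    parent_first_failed = True

# Child records first, then the parent record.
conn.execute("""DELETE FROM child WHERE parent_id IN
                (SELECT parent_id FROM parent WHERE parent_name = 'Peter')""")
conn.execute("DELETE FROM parent WHERE parent_name = 'Peter'")
conn.commit()

parents = conn.execute("SELECT parent_id, parent_name FROM parent").fetchall()
children = conn.execute("SELECT child_id, parent_id FROM child").fetchall()
print(parent_first_failed, parents, children)
```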

Q. What do you do with the PURGE command?

A. The purge command is used to clear the recycle bin (an Oracle feature). It is generally
used with the DROP command. For example,

    drop table tablename purge;

The above command will clear the table from the database as well as from the
recycle bin. After executing the purge command, you cannot retrieve the table
using a flashback query.

SQL Subquery interview questions and answers


Here are some beginner to intermediate level SQL interview questions and answers.

Q. What is a subquery?
A. A subquery (also called an inner query or nested query) is a query within a query. A
subquery is usually added in the WHERE clause of the SQL statement. A subquery can be
nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery.
Subqueries are an alternative way of returning data from multiple tables.

Q. Can you create a subquery in a From clause?

A. Yes. Subqueries can be used in From, Where and Having clauses. For example, in
Sybase

    select *
    from
    (
      select 'A' as colVal
      union
      select 'B' as colVal
    ) data

Returns:

    colVal
    ------
    A
    B

Joining virtual tables is one of the most powerful features of subqueries. Virtual in this
context means the result set you are joining is built on the fly. Here is a more advanced
example:

    declare @clientId varchar(30),
            @reportDate date

    set nocount on

    select reportId from
        Report_Status s,
        ReportKey k,
        ReportGroupKey gk,
        --subquery in from clause
        (select max(s.createddttm) as maxdate, k1.clientId from
            Report_Status s,
            ReportKey k1,
            ReportGroupKey gk
         where k1.InactiveFlag = 'N'
         and gk.InactiveFlag = 'N'
         and gk.KeyId = k1.Id
         and gk.Id = s.GroupKeyId
         group by k1.clientId
        ) maxdates
    where k.InactiveFlag = 'N'
    and gk.InactiveFlag = 'N'
    and gk.KeyId = k.Id
    and gk.Id = s.GroupKeyId
    and s.CreatedDtTm = maxdates.maxdate
    and k.ClientId = @clientId
    and maxdates.ClientId = k.ClientId
    and k.reportDate = @reportDate

Q. What is a correlated subquery?

A. A subquery is called a correlated subquery when the inner query refers to columns of the
outer query, so the inner query cannot be evaluated on its own. Conceptually, the inner
query is executed once for every row processed by the outer query.

    SELECT p.product_name FROM product p
    WHERE p.product_id IN (SELECT oi.product_id FROM order_items oi
                           WHERE oi.product_id = p.product_id);

If a subquery is not dependent on the outer query it is called a non-correlated subquery.


Q. What are the advantages and disadvantages of using a subquery?
A.

Advantages:

• Subqueries allow you to use the results of another query in the outer query.
• Subqueries in some complex SQL queries can simplify coding and improve
  maintainability by breaking down the complex query into a series of logical steps.
• In some cases, subqueries are easier to understand than complex joins and unions.

Disadvantages:

• When a subquery is used, the query optimizer of the database server may have to
  perform additional steps like sorting the results, etc. Hence, in some cases
  subqueries can be less efficient than joins. So, favor joins over subqueries where the
  two are equivalent.

SQL Interview Questions and Answers: storing a tree structure in a database table

Q. How will you represent a hierarchical structure shown below in a relational database?
Or, how will you store a tree data structure in DB tables?
A. Hierarchical data is an example of the composite design pattern. Entity
relationship diagrams (aka ER diagrams) are used to represent logical and physical
relationships between the database tables. The diagram below shows how the table can be
designed to store tree data by maintaining the adjacency information via
superior_emp_id.

As you can see, the "superior_emp_id" is a foreign key that points to the emp_id in the
same table. So, Peter has null as he has no superiors. John and Amanda point to Peter,
who is their manager or superior, and so on.

The above table can be created using SQL DDL (Data Definition Language) as shown
below.

    CREATE TABLE employee (
        emp_id          NUMBER (4) CONSTRAINT emp_pk PRIMARY KEY,
        emp_name        VARCHAR2 (40) NOT NULL,
        title           VARCHAR2 (40),
        dept_id         NUMBER (2) NOT NULL,
        superior_emp_id NUMBER (4) CONSTRAINT emp_fk REFERENCES employee(emp_id)
    )

This can be represented as an object model to map relational data as shown below

    import java.util.Set;

    public class Employee {

        private Long id;
        private String name;
        private String title;
        private Employee superior;
        private Set<Employee> subordinates;

        // getters and setters are omitted
    }

Q. How will you find out the superior for an employee?

A. You can use a self-join to find the manager of an employee

    SELECT s.emp_id, s.emp_name, s.title
    FROM employee e, employee s
    WHERE e.superior_emp_id = s.emp_id
    AND e.emp_id = 3

This should return

    1, Peter, cio
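
The self-join can be run against a cut-down version of the table. A minimal sketch, assuming
Python's built-in sqlite3 module; the emp_ids of Peter and Amanda and Peter's title follow the
text above, whereas John's emp_id and the "manager" titles are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id          INTEGER PRIMARY KEY,
        emp_name        TEXT NOT NULL,
        title           TEXT,
        superior_emp_id INTEGER REFERENCES employee(emp_id));
    INSERT INTO employee VALUES (1, 'Peter',  'cio',     NULL),
                                (2, 'John',   'manager', 1),
                                (3, 'Amanda', 'manager', 1);
""")

# e is the employee we start from, s is the same table playing the superior role.
superior = conn.execute("""
    SELECT s.emp_id, s.emp_name, s.title
    FROM employee e JOIN employee s ON e.superior_emp_id = s.emp_id
    WHERE e.emp_id = ?
""", (3,)).fetchone()
print(superior)
```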

Q. Is there any other way to store a tree structure in a relational database?

A. Yes, it can be done using the "modified preorder tree traversal" as described below.
As shown in the diagram above, each node is marked with left and right numbers using
a modified preorder traversal. This can be represented in a database
table as shown below.

As you can see, the numbers indicate the relationship between each node. All nodes with left
values greater than 6 and right values less than 11 are descendants of 6-11 (i.e. Id: 3 Amanda).
Now if you want to extract the 6-11 sub-tree for Amanda, you can write the SQL as
follows

    SELECT * FROM employee WHERE left_val BETWEEN 6 AND 11 ORDER BY left_val ASC;

Which will return Amanda, Ralph, and Jeanne.

If you want to get the ancestors of a given node, say 7-8 Ralph, you can write the SQL as
follows

    SELECT * FROM employee WHERE left_val < 7 AND right_val > 8 ORDER BY left_val ASC;

Which will return: Peter and Amanda.

If you want to find out the number of descendants for a node, all you need is the left_val
and right_val of that node. The formula is

    No. of descendants = (right_val - left_val - 1) / 2

So, for 6-11 Amanda, (11 - 6 - 1) / 2 = 2 descendants,
for 1-12 Peter, (12 - 1 - 1) / 2 = 5 descendants, and
for 3-4 Mary, (4 - 3 - 1) / 2 = 0, which means it is a leaf node and has no descendants.
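
The sub-tree query, the ancestor query, and the formula can all be checked against a small
table. Here is a sketch assuming Python's built-in sqlite3 module. The left and right values
for Peter, Mary, Amanda, and Ralph come from the text; the values for John (2-5) and Jeanne
(9-10) are inferred and should be treated as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_name TEXT, left_val INTEGER, right_val INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Peter", 1, 12), ("John", 2, 5), ("Mary", 3, 4),
    ("Amanda", 6, 11), ("Ralph", 7, 8), ("Jeanne", 9, 10),
])

def names(sql, params):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]

# Sub-tree rooted at Amanda (6-11).
subtree = names(
    "SELECT emp_name FROM employee "
    "WHERE left_val BETWEEN ? AND ? ORDER BY left_val", (6, 11))

# Ancestors of Ralph (7-8).
ancestors = names(
    "SELECT emp_name FROM employee "
    "WHERE left_val < ? AND right_val > ? ORDER BY left_val", (7, 8))

def descendant_count(left_val, right_val):
    """Number of descendants computed from a node's own left/right values."""
    return (right_val - left_val - 1) // 2

print(subtree, ancestors, descendant_count(6, 11))
```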

The modified preorder traversal is a little more complicated to understand, but is very
useful.

Excel spreadsheet to generate SQL

When you have some data in a tabular format (e.g. an Excel spreadsheet) and would like to
insert it into a database table, you need to write an SQL insert query. Manually writing the
SQL query for multiple records can be cumbersome. This is where an Excel spreadsheet comes in
handy, as demonstrated below. A single formula can be copied down, and the row numbers in
its cell references are incremented as it is copied.

The Excel concatenation character & is used to achieve this. The $ means fix. $A1 means
fix the Excel column A. When you copy the formula down, the row numbers will be incremented
like 2, 3, 4, etc., but the column will remain fixed to A. In the example below

• $A$1 = first_name
• $B$1 = surname
• $C$1 = age

Note: Both column and row are prefixed with $, which means both are fixed.

The Excel formula is

    ="insert into person ("&$A$1&", "&$B$1&", "&$C$1&") values ('"&$A2&"','"&$B2&"',"&$C2&")"

The above Excel expression is easier to understand if broken down as shown below,
where the concatenation character & plays a major role in combining static text within
quotes with cell references like $A$1.

    "insert into person ("
    & $A$1
    & ", "
    & $B$1
    & ", "
    & $C$1
    & ") values ('"
    & $A2
    & "','"
    & $B2
    & "',"
    & $C2
    & ")"

The generated SQL statement will be

    insert into person (first_name, surname, age) values ('Peter','Smith',35)

This formula can be copied down in Excel to get the SQL for all rows

    insert into person (first_name, surname, age) values ('Peter','Smith',35)
    insert into person (first_name, surname, age) values ('John','Smith',12)
    insert into person (first_name, surname, age) values ('Eddy','Wayne',32)

You can create other SQL statements from a table of data using the above technique for
better productivity.

If you have a date column, then use the following formula to convert it to text.

    TEXT($D2,"dd/mm/yyyy")

Database interview questions and answers

Q. What do you understand by the terms clustered index and non-clustered index?
A. When you create a clustered index on a table, all the rows in the table are stored in the
order of the clustered index key. So, there can be only one clustered index per table.
Non-clustered indexes have their own storage space separate from the table data storage.
Clustered and non-clustered indexes are stored as B-tree (balanced tree) structures, which
keep the data sorted and give O(log n) performance for deletes, inserts, and searches. In a
non-clustered index, the leaf level nodes hold the index key and its row locator for faster
retrieval.

Q. What is the difference between primary key and unique key?

A. Both primary key and unique key enforce uniqueness of the column on which they are
defined. But by default (in SQL Server, for example), a primary key creates a clustered index
on the column, whereas a unique key creates a non-clustered index. Another major difference
is that a primary key doesn't allow NULL values, whereas a unique key does allow NULL
(only a single NULL in SQL Server; other database servers differ).

Q. What are the pros and cons of an index?

A.

PROS

• If an index does not exist on a table, a table scan must be performed for each table
  referenced in a database query. The larger the table, the longer a table scan takes
  because a table scan requires each table row to be accessed sequentially. So,
  indexes can improve search performance, especially for reporting
  requirements.

CONS

• Excessive non-clustered indexes can consume additional storage space.
• Excessive non-clustered indexes can adversely impact the performance of
  INSERT, UPDATE, and DELETE statements, as the indexes need to be updated
  after each of these operations.

So, it is essential to have the right balance based on the usage pattern.


Q. What are the pros and cons of stored procedures?
A.

PROS

• Pre-compiled, with fewer network trips, for faster performance.
• Less susceptible to SQL injection attacks.
• More precise control over transactions and locking.
• Can abstract complex data processing from the application by acting as a facade
  layer.

CONS

• There is a chance of larger chunks of business logic and duplication creeping
  into stored procedures and causing maintenance issues. Writing and maintaining
  stored procedures is most often a specialized skill set that not all developers
  possess. This situation may introduce bottlenecks in the project development
  schedule.
• Less portable. Stored procedures are specific to a particular database.
• Scaling a database is much harder than scaling an application.
• Application performance can often be improved instead by caching the relevant data to
  reduce the network trips.

So, when should stored procedures be used?

Stored procedures are ideal when there is a complex piece of business logic that needs
complex data logic to be performed involving a lot of database operations. If this logic is
required in many different places, then a stored procedure makes even more sense. For
example, batch jobs and complex report generation that perform lots of database
operations.

So, when shouldn't stored procedures be used?

When you are performing basic CRUD (Create, Read, Update, and Delete) operations.
For example, in a Web application a user creates some data, reads the created data, and
then updates or deletes some of the created data.

Q. How would you go about writing a stored procedure that needs to loop through a
number of selected rows?
A. You need to use a cursor. A cursor is basically a pointer for row-by-row operations. For
example, you can create a cursor by selecting a number of records into it. Then, you can
fetch one row at a time and perform some operations like invoking another stored proc
by passing the selected row value as an argument, etc. Once you have looped through all
the records, you need to close and deallocate the cursor. For example, the stored
procedure below, written in Sybase, demonstrates the use of a cursor.

Apply to the database "mydatabase"

    use mydatabase
    go

Drop the stored procedure if it already exists

    IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
    BEGIN
        DROP PROCEDURE dbo.temp_sp
        IF OBJECT_ID('dbo.temp_sp') IS NOT NULL
            PRINT '<<< FAILED DROPPING PROCEDURE dbo.temp_sp >>>'
        ELSE
            PRINT '<<< DROPPED PROCEDURE dbo.temp_sp >>>'
    END
    go
10

Create the stored procedure that uses a cursor

    create proc temp_sp
    as
        DECLARE @ADVISERID char(10)

        -- select adviser_ids starting with 'Z'
        DECLARE advisers_cur cursor
        for select adviser_id FROM tbl_advisers where adviser_id like 'Z%'
        for read only

        open advisers_cur                        -- open the cursor
        FETCH advisers_cur INTO @ADVISERID       -- store the first row's value in the declared variable

        -- @@sqlstatus is a Sybase implicit variable that returns the
        -- success/failure status of the previous fetch (0 means success)
        WHILE (@@sqlstatus = 0)
        BEGIN
            SELECT @ADVISERID                    -- select the adviser_id stored in @ADVISERID
            FETCH advisers_cur INTO @ADVISERID   -- store the next row's value in the declared variable
        END

        close advisers_cur
        deallocate cursor advisers_cur
    go

Execute the stored procedure that uses a cursor

    exec mydatabase..temp_sp
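
For readers without a Sybase server at hand, the same fetch loop can be sketched with Python's built-in sqlite3 module. This is only an analogy, not the procedure above: SQLite has no stored procedures, the helper name adviser_ids_starting_with and the sample rows are made up for illustration, and the "is there another row" test is played by fetchone() returning None rather than by @@sqlstatus.

```python
import sqlite3

def adviser_ids_starting_with(conn, prefix):
    """Walk a result set one row at a time, as the Sybase cursor does."""
    cur = conn.cursor()                      # roughly: DECLARE ... CURSOR + OPEN
    cur.execute(
        "SELECT adviser_id FROM tbl_advisers "
        "WHERE adviser_id LIKE ? ORDER BY adviser_id",
        (prefix + "%",),
    )
    seen = []
    row = cur.fetchone()                     # first FETCH
    while row is not None:                   # plays the role of @@sqlstatus = 0
        seen.append(row[0])                  # per-row work would go here
        row = cur.fetchone()                 # next FETCH
    cur.close()                              # release the cursor when done
    return seen

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_advisers (adviser_id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO tbl_advisers VALUES (?)",
    [("Z001",), ("A100",), ("Z002",)],
)
result = adviser_ids_starting_with(conn, "Z")
```

Closing the cursor at the end mirrors the close and deallocate steps of the stored procedure.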

Q. Why should you deallocate the cursors?


A. You need to deallocate the cursor to free the memory it occupies, so that the
space becomes available for other use.

Q. How would you go about copying bulk data in and out of a database?
A. The process is known as bulk copy, and the tools used for this are database specific.
For example, Sybase and SQL Server provide a utility called "bcp", which allows you to
export bulk data into comma-delimited files, and then import the data in CSV or any other
delimited format into a different database or table. In an Oracle database, you achieve
this via SQL*Loader. The DB2 database has the IMPORT and LOAD commands to achieve
the same.
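
The export-then-import round trip can be illustrated in miniature with Python's csv and sqlite3 modules. This is a sketch of the idea only, not a substitute for bcp, SQL*Loader, or LOAD: the helper names export_csv and import_csv, the employees table, and its rows are assumptions made for the example, and real bulk loaders are far faster because they bypass row-by-row inserts.

```python
import csv
import io
import sqlite3

def export_csv(conn, table, out):
    """Write every row of `table` to the file-like object `out` as CSV."""
    cur = conn.execute(f"SELECT * FROM {table}")          # table name is trusted here
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row with column names
    writer.writerows(cur)

def import_csv(conn, table, src):
    """Load CSV rows (with a header row) into `table`; return the row count."""
    reader = csv.reader(src)
    header = next(reader)
    placeholders = ", ".join("?" for _ in header)
    sql = f"INSERT INTO {table} ({', '.join(header)}) VALUES ({placeholders})"
    with conn:                                            # one transaction for the batch
        cur = conn.executemany(sql, reader)
    return cur.rowcount

# Hypothetical source and target databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE employees (id INTEGER, firstname TEXT)")
source.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Peter"), (2, "Mary")])

buffer = io.StringIO(newline="")                          # stands in for a .csv file
export_csv(source, "employees", buffer)
buffer.seek(0)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE employees (id INTEGER, firstname TEXT)")
loaded = import_csv(target, "employees", buffer)
copied = target.execute("SELECT id, firstname FROM employees ORDER BY id").fetchall()
```

Wrapping the whole import in a single transaction is the main design choice: committing once per batch rather than once per row is what makes bulk loads practical.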

Q. What are triggers? What are the different types of triggers?

A. Triggers are stored procedures that are stored in the database and implicitly run, or
fired, when an event such as an INSERT, UPDATE, or DELETE happens to the table they
are defined on. There are 3 types of DML triggers (INSERT, UPDATE, and DELETE), which
fire before or after the event. There could be other database-specific triggers.

Q. When should you not use a trigger, and when is it appropriate to use one?

A.

When not to use a trigger?

Database triggers need to be used very judiciously, as they are executed every time an
event like an insert, update, or delete occurs. Don't use a trigger where
• database constraints such as unique, not null, primary key, or check constraints
can be used to check data validity.
• the trigger would fire recursively.

Where to use a trigger?

• Maintaining complex integrity constraints (referential integrity) or business rules
where other types of constraints cannot be used. Because triggers are executed as
part of the SQL statement (and its containing transaction) causing the row change
event, and because the trigger code has direct access to the changed row, you
could in theory use them to correct or reject invalid data.
• Auditing information in a table by recording the changes. Some tables are
required to be audited as part of the non-functional requirements.
• Automatically signaling other programs that action needs to take place when
changes are made to a table.
• Collecting and maintaining aggregate or statistical data.
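
The auditing use case above can be sketched with an AFTER UPDATE trigger. The example uses SQLite through Python's sqlite3 module purely so that it can be run anywhere, and trigger syntax differs between databases. The accounts and accounts_audit tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);

-- one audit row per change, with the old and new values
CREATE TABLE accounts_audit (
    account_id  INTEGER,
    old_balance INTEGER,
    new_balance INTEGER,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
);

-- fires implicitly after every update of accounts.balance
CREATE TRIGGER accounts_audit_upd
AFTER UPDATE OF balance ON accounts
FOR EACH ROW
BEGIN
    INSERT INTO accounts_audit (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 250 WHERE id = 1")

audit_rows = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM accounts_audit"
).fetchall()
```

The application never writes to accounts_audit itself. The trigger records the change as part of the same statement that modified the row.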

Q. If one of your goals is to reduce network loads, how will you go about achieving it?
A.
• Use materialized views to distribute your load from a master site to other
regional sites. Instead of the entire company accessing a single database server,
user load is distributed across multiple database servers with the help of multi-tier
materialized views. This enables you to distribute the load to materialized view
sites instead of master sites. To decrease the amount of data that is replicated, a
materialized view can be a subset of a master table or master materialized view.

• Write stored procedures to minimize network round trips.

• Carefully craft your SQL to return only the required data. For example, don't do
select * from tbl_mytable. Instead, specify the columns you are interested in, for
example, select firstname, surname from tbl_mytable.

• Set the fetch size to an appropriate value to get the right balance between
data size and the number of network trips made.
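
The last two points can be sketched with the Python DB-API, where the fetch size corresponds to the cursor's arraysize attribute and fetchmany(). The example runs against an in-memory SQLite database, so no network is actually involved. With a client/server driver the batch size typically determines how many rows travel per round trip. The helper name fetch_in_batches and the sample rows are assumptions for illustration.

```python
import sqlite3

def fetch_in_batches(conn, sql, batch_size):
    """Yield lists of at most `batch_size` rows from the query."""
    cur = conn.cursor()
    cur.arraysize = batch_size            # the DB-API "fetch size"
    cur.execute(sql)
    while True:
        batch = cur.fetchmany()           # fetches up to cur.arraysize rows
        if not batch:
            break
        yield batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_mytable (firstname TEXT, surname TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO tbl_mytable VALUES (?, ?, ?)",
    [(f"first{i}", f"last{i}", "x" * 1000) for i in range(5)],
)

# Name only the columns that are needed instead of SELECT *,
# so the large notes column is never transferred.
batches = list(
    fetch_in_batches(conn, "SELECT firstname, surname FROM tbl_mytable", 2)
)
batch_sizes = [len(batch) for batch in batches]
```

A larger batch size means fewer trips but more memory per trip, which is the balance the answer above refers to.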

Q. What are the other uses of materialized views?


A.
• Materialized views are one of the key SQL tuning approaches for improving
performance, because they allow you to pre-join complex views and pre-compute
summaries for very fast response times.
• Materialized views are schema objects that can be used to summarize,
precompute, replicate, and distribute data, e.g. to construct a data warehouse or
for reporting. A materialized view can be read-only, updatable, or writable.
Users cannot perform data manipulation language (DML) statements on read-only
materialized views, but they can perform DML on updatable and writable
materialized views.

• A materialized view provides indirect access to table data by storing the results of
a query in a separate schema object, unlike an ordinary view, which does not take
up any storage space or contain any data. You can define a materialized view on a
base table, partitioned table, or view, and you can define indexes on a materialized
view.
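
The difference between an ordinary view and a stored result can be sketched as follows. SQLite has no materialized views, so this example emulates one with a snapshot table that is rebuilt on demand. Databases with real materialized views provide their own creation and refresh syntax, which is not shown here. The sales table, the view and table names, and the refresh_snapshot helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('north', 10), ('north', 20), ('south', 5);

-- ordinary view: stores no data, recomputed on every query
CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def refresh_snapshot(conn):
    """Rebuild the stored summary (the emulated materialized view)."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS mv_sales_by_region")
        conn.execute(
            "CREATE TABLE mv_sales_by_region AS "
            "SELECT region, total FROM v_sales_by_region"
        )
        # a stored result can be indexed, unlike an ordinary view
        conn.execute("CREATE INDEX idx_mv_region ON mv_sales_by_region (region)")

def totals(conn, relation):
    return dict(conn.execute(f"SELECT region, total FROM {relation}"))

refresh_snapshot(conn)
conn.execute("INSERT INTO sales VALUES ('south', 7)")

live = totals(conn, "v_sales_by_region")     # reflects the new row immediately
stale = totals(conn, "mv_sales_by_region")   # still the pre-computed result
refresh_snapshot(conn)
fresh = totals(conn, "mv_sales_by_region")   # up to date after the refresh
```

The snapshot answers queries without re-aggregating the base table, at the cost of being out of date until it is refreshed.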

Q. If you are working with a legacy application, and some of the database tables are not
properly designed with the appropriate constraints, how will you go about rectifying the
situation?
A. One possible solution is to write triggers to perform the appropriate validation. Here is
an example of an insert trigger.

    CREATE TRIGGER TableA_itrig
    ON TableA FOR INSERT
    AS
    BEGIN
        -- nothing to validate if no rows were inserted
        IF @@rowcount = 0
            RETURN

        -- reject the insert when no inserted row has a matching code in TableB
        IF NOT EXISTS
        (
            SELECT *
            FROM inserted ins, TableB ol
            WHERE ins.code = ol.code
        )
        BEGIN
            RAISERROR 20001, "The associated object is not found"
            ROLLBACK TRAN
            RETURN
        END
    END

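
A runnable variant of the same validation idea, written for SQLite and driven from Python, is sketched below. It is an adaptation, not a translation: SQLite triggers are row-level, so the check uses NEW.code instead of the inserted pseudo-table, and RAISE(ABORT, ...) takes the place of RAISERROR plus ROLLBACK TRAN. The sample codes and the try_insert helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableB (code TEXT);
CREATE TABLE TableA (id INTEGER PRIMARY KEY, code TEXT);
INSERT INTO TableB VALUES ('OK1');

-- reject any new TableA row whose code has no match in TableB
CREATE TRIGGER TableA_itrig
BEFORE INSERT ON TableA
FOR EACH ROW
WHEN NOT EXISTS (SELECT 1 FROM TableB WHERE code = NEW.code)
BEGIN
    SELECT RAISE(ABORT, 'The associated object is not found');
END;
""")

def try_insert(conn, code):
    """Return True if the row was accepted, False if the trigger rejected it."""
    try:
        with conn:                       # commits on success, rolls back on error
            conn.execute("INSERT INTO TableA (code) VALUES (?)", (code,))
        return True
    except sqlite3.DatabaseError:        # raised when the trigger aborts the insert
        return False

accepted = try_insert(conn, "OK1")
rejected = try_insert(conn, "MISSING")
stored_codes = [row[0] for row in conn.execute("SELECT code FROM TableA")]
```

The legacy table definition stays untouched. The trigger supplies the referential check that a foreign key constraint would normally provide.
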
Q. If you are working on a new application that has stringent auditing requirements,
how would you go about achieving it?
A. Since it is a new application, there are a number of options as listed below.

• The application is designed from the beginning so that all changes are logged
either synchronously or asynchronously. Asynchronously means publishing the
auditing messages to a queue or topic, with a separate process receiving these
messages and writing them to a database or flat file. All data changes go through a
data access layer of the application, which logs all changes.

• The database is constructed in such a way that logging information is included in
each table, perhaps set via a trigger. This approach may adversely impact
performance when inserts and updates are very frequent.
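
The asynchronous option can be sketched in a single process with Python's queue and threading modules. This is a toy model under stated assumptions: an in-process queue stands in for a message queue or topic, a list stands in for the audit database or flat file, and the AuditedAccounts class is a hypothetical data access layer.

```python
import queue
import threading

class AuditedAccounts:
    """Hypothetical data access layer: every change is published for auditing."""

    def __init__(self, audit_queue):
        self._balances = {}
        self._audit_queue = audit_queue

    def set_balance(self, account_id, new_balance):
        old_balance = self._balances.get(account_id)
        self._balances[account_id] = new_balance
        # publish the audit message; the caller does not wait for it to be written
        self._audit_queue.put((account_id, old_balance, new_balance))

def audit_writer(audit_queue, sink):
    """Separate consumer that receives audit messages and persists them."""
    while True:
        event = audit_queue.get()
        if event is None:                # sentinel: no more messages
            break
        sink.append("account=%s old=%s new=%s" % event)

audit_queue = queue.Queue()
audit_log = []                           # stands in for a database table or flat file
writer = threading.Thread(target=audit_writer, args=(audit_queue, audit_log))
writer.start()

accounts = AuditedAccounts(audit_queue)
accounts.set_balance(1, 100)
accounts.set_balance(1, 250)

audit_queue.put(None)                    # tell the writer to stop
writer.join()
```

Because every change goes through set_balance, no code path can modify data without producing an audit message, which is the point of routing all changes through one data access layer.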

Q. What if you have to work with an existing legacy application?


A. Use triggers.
