
BUSINESS INTELLIGENCE LAB REPORT

(A Report Submitted in Partial Fulfillment of the Requirements for the Degree of


Master of Business Administration (Banking Technology) in Pondicherry University)

Submitted By

Ms. A.LALITHA
Registration No. : 15381033

Under the Guidance of

Associate Professor Dr. S. JANAKIRAMAN


and
Professor Dr. Usha Varatharajan

MBA: BANKING TECHNOLOGY

DEPARTMENT OF BANKING TECHNOLOGY


PONDICHERRY UNIVERSITY
PONDICHERRY - 605014

DEPARTMENT OF BANKING TECHNOLOGY


SCHOOL OF MANAGEMENT
PONDICHERRY UNIVERSITY
PUDUCHERRY 605 014

NAME

A. LALITHA

REG. NO.

15381033

SUBJECT

BUSINESS INTELLIGENCE LAB REPORT

Project In-charge

Head of the Department

Submitted for the Viva-Voce Examination held on

INTERNAL EXAMINER

EXTERNAL EXAMINER

1. Introduction:

1.1. Business Intelligence


Business intelligence (BI) refers to computer-based techniques used in
identifying, extracting, and analyzing business data, such as sales revenue by
products and/or departments, or by associated costs and incomes. BI
technologies provide historical, current and predictive views of business
operations. Common functions of business intelligence technologies include
reporting, online analytical processing, analytics, data mining, business
performance management, benchmarking, text mining, and predictive
analytics.
Business intelligence aims to support better business decision-making.
Thus a BI system can be called a decision support system (DSS). BI uses
technologies, processes, and applications to analyze mostly internal, structured
data and business processes while competitive intelligence gathers, analyzes
and disseminates information with a topical focus on company competitors.
Understood broadly, business intelligence can include competitive
intelligence as a subset.

1.2. Data Warehousing
A data warehouse (DW) is a database used for reporting. The data is
offloaded from the operational systems for reporting. The data may pass
through an operational data store for additional operations before it is used in
the DW for reporting. A data warehouse maintains its functions in three layers:
staging, integration, and access. Staging is used to store raw data for use by
developers (analysis and support). The integration layer is used to integrate
data and to have a level of abstraction from users. The access layer is for
getting data out for users.
This definition of the data warehouse focuses on data storage. The main
source of the data is cleaned, transformed, catalogued and made available for
use by managers and other business professionals for data mining, online
analytical processing, market research and decision support (Marakas & O'Brien,
2009). However, the means to retrieve and analyze data, to extract, transform
and load data, and to manage the data dictionary are also considered essential
components of a data warehousing system. Many references to data
warehousing use this broader context. Thus, an expanded definition for data

warehousing includes business intelligence tools, tools to extract, transform and


load data into the repository, and tools to manage and retrieve metadata.

1.3. Microsoft SQL Server 2008 R2


Microsoft SQL Server is a relational model database server produced by
Microsoft. Its primary query languages are T-SQL and ANSI SQL. SQL Server 2008
R2 adds several features to SQL Server 2008, including a master data
management system branded as Master Data Services for central management
of master data entities and hierarchies, and Multi Server Management, a
centralized console to manage multiple SQL Server 2008 instances and services,
including relational databases, Reporting Services, Analysis Services and
Integration Services.

1.4. Business Intelligence Development Studio


Business Intelligence Development Studio (BIDS) is the IDE from Microsoft
used for developing data analysis and Business Intelligence solutions utilizing
the Microsoft SQL Server Analysis Services, Reporting Services and Integration
Services. It is based on the Microsoft Visual Studio development environment
but is customized with SQL Server services-specific extensions and project
types, including tools, controls and projects for reports, ETL data flows, OLAP
cubes and data mining structures.

2. Personal Loans
Personal loans are unsecured loans which people can use for a variety of
purposes, such as paying tax bills, covering school tuition, or making car repairs.
Many banks and other lenders offer personal loans to people with good credit
records who can demonstrate an ability to repay them. This type of loan is often
touted as a useful tool for consolidating debt, for people who have multiple
outstanding accounts which are difficult to manage. By using a single loan to
pay off debt, people can consolidate their debt into one monthly payment, and
they may also achieve a lower interest rate, which is a distinct benefit.
Consolidating debt also tends to increase one's credit rating.
There are two types of personal loans. A closed-end loan is a one-time
loan of a set amount, with a fixed rate and repayment schedule. This type of
loan often has a repayment period of one to two years, depending on the
amount which is borrowed, and borrowers can choose to make additional
payments to pay the loan off more quickly. For one-time expenses, a closed-end
loan can be very useful.
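The fixed rate and repayment schedule of a closed-end loan determine a constant monthly installment (EMI). Below is a minimal sketch of the standard EMI formula; the principal, rate, and term are hypothetical:

```python
def emi(principal, annual_rate_pct, months):
    """Equated monthly installment for a fixed-rate, closed-end loan."""
    r = annual_rate_pct / 12 / 100   # monthly rate as a fraction
    if r == 0:
        return principal / months    # interest-free edge case
    factor = (1 + r) ** months
    return principal * r * factor / (factor - 1)

# Hypothetical example: 100,000 borrowed at 12% p.a. over 12 months
payment = emi(100_000, 12, 12)
```

Making additional payments, as noted above, shortens the effective term and so reduces the total interest paid.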

3. Problem Definition
The objective is to perform Extract, Transform & Load (ETL) operations on
the set of input files containing the details of Personal Loans of a particular bank.
Each input file has a specific format, such as XML, TXT, or CSV. The first part of an
ETL process involves extracting the data from these sources and carrying out
transformations on these data.
The load phase loads the data into the end target, usually the data
warehouse (DW). As the load phase interacts with a database, the constraints
defined in the database schema as well as in triggers activated upon data
load apply (for example, uniqueness, referential integrity, mandatory fields),
which also contribute to the overall data quality performance of the ETL process.
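The schema constraints described above can be illustrated with Python's built-in sqlite3 standing in for SQL Server; the table layout here is a simplified assumption, not the actual warehouse schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # enable referential-integrity checks

con.execute("""CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,  -- uniqueness constraint
    Name       TEXT NOT NULL         -- mandatory field
)""")
con.execute("""CREATE TABLE Allocations (
    AllocationId INTEGER PRIMARY KEY,
    CustomerId   INTEGER NOT NULL REFERENCES Customers(CustomerId),
    Amount       REAL NOT NULL
)""")

con.execute("INSERT INTO Customers VALUES (1, 'A. Customer')")
con.execute("INSERT INTO Allocations VALUES (10, 1, 50000.0)")

# A row referencing a non-existent customer is rejected during the load phase
try:
    con.execute("INSERT INTO Allocations VALUES (11, 99, 1000.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```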

4. Stage I: Building The Warehouse


For the ETL process, the inputs are the following files:
1. allocation.csv (contains the details of the loans allocated)
2. customers.txt (contains the details of the customers who have availed
loans)
3. employees.txt (contains the organizational details of the employees)
4. payments.xml (contains the payments or dues made by the customers)
Below is the structure of each of the input files:
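Since each file arrives in a different format, the extract step needs one reader per format. Here is a minimal sketch using Python's standard csv and xml.etree modules; the sample records and column names are assumptions for illustration, not the actual file layouts:

```python
import csv, io
import xml.etree.ElementTree as ET

# allocation.csv — comma-separated (columns assumed)
alloc_csv = "AllocationId,CustomerId,Amount\n10,1,50000\n"
allocations = list(csv.DictReader(io.StringIO(alloc_csv)))

# customers.txt — tab-delimited (columns assumed)
cust_txt = "CustomerId\tName\n1\tA. Customer\n"
customers = list(csv.DictReader(io.StringIO(cust_txt), delimiter="\t"))

# payments.xml — one element per payment (layout assumed)
pay_xml = ("<Payments><Payment><AllocationId>10</AllocationId>"
           "<PayAmount>4500</PayAmount></Payment></Payments>")
payments = [{child.tag: child.text for child in p} for p in ET.fromstring(pay_xml)]
```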

4.1. Creating ETL Flows


1. Open SQL Server Business Intelligence Development Studio from
the start menu and go to: File --> New --> Project --> Integration Services
Project
2. From the Toolbox go to Control Flow Items and add a Data Flow Task
in the Control Flow Tab (figure shown below).

3. Click on the Data Flow Task and switch to the Data Flow tab.
4. From the Toolbox go to Data Flow Sources and add a Flat File Source,
from the Data Flow Transformations add a Derived Column and from
the Data Flow Destinations add an OLE DB Destination (Data Flow
Diagram) given below.

Note: use the green arrows to do the mapping and the red arrows for the
rejected rows.
5. Double click the Flat File Source and click the New button and browse
for the File name (figure shown below).

6. Switch to the Columns page to check whether the data is segmented as
expected. Use the Column delimiter option to select the appropriate
delimiter. (In this example it is \t, a tab, that separates the fields.)

7. Switch to the Advanced page and change the column names as desired. Check
the Preview tab to see whether the data is in the format as expected
(figure shown below). Click OK.

8. Double click on the Derived Column and Expand the Columns tree. Drag
and drop all the items to the Derived Column Name in the below
window. Make sure to use Replace for the appropriate columns and click OK
(figure shown below). This is the stage where any transformations can be
carried out using the functions given in the right pane.

9. Double click the OLE DB Destination and press New to create a new
connection to the database.


10. Choose the Data access mode as Table or View - fast load and
uncheck the Table Lock constraint. If the table is not yet created from the
SQL Management Studio then create a new table by clicking the New
button in the Name of the table or view to create a new table (figure
shown below).

11. Go to the mappings and choose the mappings from the input
columns to the appropriate columns in the table.


Press F5 or go to Debug --> Start Debugging to debug the project. If


everything goes well, the customers' data will be loaded into the tables.
The ETL flows for the Personal Loan project are shown below;
Control Flow:

Employees Data Flow:


Customers Data Flow:

Payments Data Flow:

Allocations Data Flow:


When the project is run, all the flows are executed at one go and the details are
loaded into the database (figure given below). When the stages are shown in
green color, the loading has been completed successfully.

5. Stage II: Building Cubes and Reports


5.1. Creating Cubes
1. Open SQL Server Business Intelligence Development Studio from the
start menu and go to: File --> New --> Project --> Analysis Services Project
2. In the Solution Explorer, create a new Data Source and Data Source Views
by right-clicking on the appropriate icons and selecting New Data Source and
New Data Source View.
3. While creating the Data Source View, add all the tables that need to be
included in creating the cube.
4. The resulting Data Source View will be something like the one below.


5. Next right click on Cubes in the Solution Explorer and select the New
Cube menu. In the cube wizard, select the Use existing tables option
and choose the tables that need to be imported to form the cube. Below is
the figure depicting the cube structure.

5.2. Building the Reports


1. Open SQL Server Business Intelligence Development Studio from the
start menu and go to: File --> New --> Project --> Report Server Project.
2. In the Solution Explorer, create a new Shared Data Source using the
wizard (use the server name localhost).
3. Right click on Reports in Solution Explorer, click Add New Report.
4. In the wizard, use the created Shared Data Source.
5. Select the Query Builder button and in the new window, select Add Table
and choose the tables that need to be queried to produce reports (figure
given below).


6. Select the fields that need to be chosen for querying. Apply filters, sorting,
or grouping if necessary. The query will be built automatically in the lower
pane. Click OK.
Here are the queries executed:
Query 1: BASED ON DATE CREATION AND THE NAME
SELECT Customers.DateCreated, Employees.CompanyName
FROM Allocations INNER JOIN
Customers ON Allocations.CustomerId = Customers.CustomerId INNER JOIN
Employees ON Allocations.EmployeeId = Employees.EmployeeId INNER JOIN
Payments ON Allocations.AllocationId = Payments.AllocationId
WHERE (Customers.DateCreated > '2010-12-01')


QUERY 2: REGION WISE PAYMENT RECORD FOR COMPANIES


SELECT
dbo.Customers.Region, SUM(dbo.Allocations.Amount) AS [Total Loan
Allotted], SUM(dbo.Payments.PayAmount * dbo.Allocations.Period)
AS [Total Payments Received], dbo.Employees.CompanyName,
SUM(DISTINCT dbo.Payments.PayAmount * dbo.Allocations.Period)
- SUM(DISTINCT dbo.Allocations.Amount) AS [Interest Paid]
FROM
dbo.Customers INNER JOIN
dbo.Allocations ON dbo.Customers.CustomerId =
dbo.Allocations.CustomerId INNER JOIN
dbo.Payments ON dbo.Customers.CustomerId =
dbo.Payments.CustomerId INNER JOIN
dbo.Employees ON dbo.Customers.EmployeeId =
dbo.Employees.EmployeeId
GROUP BY dbo.Customers.Region, dbo.Employees.CompanyName
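The [Interest Paid] column above is simply total repayments minus the total principal allotted. A quick arithmetic check of that logic on invented figures:

```python
# Hypothetical allocation: 50,000 principal repaid in 10 payments of 5,500
pay_amount, period, principal = 5_500.0, 10, 50_000.0

total_received = pay_amount * period          # [Total Payments Received]
interest_paid = total_received - principal    # [Interest Paid]
```

The DISTINCT inside the SUMs in the query appears to guard against double counting from join fan-out; this only works reliably when the underlying amounts are distinct within each group.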


QUERY 3: COMPANY REPAYMENT RECORD BASED ON REPAYMENT PERIODS


SELECT
dbo.Employees.CompanyName, SUM(dbo.Allocations.Amount) AS [Total
Loan Amount], SUM(dbo.Allocations.Emi * dbo.Allocations.Period)
AS [Total Repayment Amount], SUM(dbo.Payments.PayAmount)
AS [Amount Repaid], dbo.Allocations.InterestRate,
dbo.Allocations.Period
FROM
dbo.Allocations INNER JOIN
dbo.Employees ON dbo.Allocations.EmployeeId =
dbo.Employees.EmployeeId INNER JOIN
dbo.Payments ON dbo.Allocations.AllocationId =
dbo.Payments.AllocationId
GROUP BY dbo.Employees.CompanyName, dbo.Allocations.InterestRate,
dbo.Allocations.Period
ORDER BY dbo.Employees.CompanyName, COUNT(DISTINCT
dbo.Payments.PaymentDate)


QUERY 4: COMPANIES REPAYMENT RECORD IN RECESSION PERIOD

SELECT
dbo.Employees.CompanyName, SUM(dbo.Allocations.Amount) AS [Total
Loan Amount], SUM(dbo.Allocations.Emi) * dbo.Allocations.Period AS [Total Emi To Pay],
SUM(DISTINCT dbo.Payments.PayAmount) AS [Amount Repaid],
dbo.Allocations.InterestRate, dbo.Allocations.Period
FROM
dbo.Allocations INNER JOIN
dbo.Employees ON dbo.Allocations.EmployeeId =
dbo.Employees.EmployeeId INNER JOIN
dbo.Payments ON dbo.Allocations.AllocationId =
dbo.Payments.AllocationId
GROUP BY dbo.Employees.CompanyName, dbo.Allocations.InterestRate,
dbo.Allocations.Period
ORDER BY dbo.Employees.CompanyName, COUNT(DISTINCT
dbo.Payments.PaymentDate)


QUERY 5: REPAYMENT RECORD OF COMPANIES BASED ON GENDER

SELECT
dbo.Customers.Gender, SUM(dbo.Allocations.Amount) AS [Total Loan
Amount Sanctioned], SUM(dbo.Allocations.EMI * dbo.Allocations.Period)
AS [Total Amount To Be Paid], SUM(dbo.Payments.PayAmount) AS [Total
Amount Repaid], dbo.Employees.CompanyName,
SUM(dbo.Allocations.EMI * dbo.Allocations.Period) - SUM(dbo.Payments.PayAmount) AS [Total Payments Remaining]
FROM
dbo.Allocations INNER JOIN
dbo.Customers ON dbo.Allocations.CustomerId =
dbo.Customers.CustomerId INNER JOIN
dbo.Payments ON dbo.Allocations.AllocationId =
dbo.Payments.AllocationId INNER JOIN
dbo.Employees ON dbo.Allocations.EmployeeId =
dbo.Employees.EmployeeId
GROUP BY dbo.Customers.Gender, dbo.Employees.CompanyName


QUERY 6: LIST OF LOAN DEFAULTERS

QUERY 7: PERSONAL LOANS ISSUED DURING RECESSION PERIOD


SELECT Allocations.CustomerId, MAX(DISTINCT Customers.Name) AS CustomerName,
Allocations.Amount, Allocations.EMI, Allocations.InterestRate, Customers.DateCreated
FROM
Allocations INNER JOIN
Customers ON Allocations.CustomerId = Customers.CustomerId INNER JOIN
Employees ON Allocations.EmployeeId = Employees.EmployeeId INNER JOIN
Payments ON Allocations.AllocationId = Payments.AllocationId
GROUP BY Allocations.CustomerId, Allocations.Amount, Allocations.EMI,
Allocations.InterestRate, Customers.DateCreated


Query 8: REPAYMENT OF LOANS BASED ON INTEREST RATE

SELECT
dbo.Customers.Address, SUM(dbo.Allocations.InterestRate) AS [Total Loan
Allotted], SUM(dbo.Payments.PayAmount * dbo.Allocations.Period)
AS [Total Payments Received], dbo.Employees.CompanyName,
SUM(DISTINCT dbo.Payments.PayAmount * dbo.Allocations.Period)
- SUM(DISTINCT dbo.Allocations.InterestRate) AS [Interest Paid]
FROM
dbo.Customers INNER JOIN
dbo.Allocations ON dbo.Customers.CustomerId =
dbo.Allocations.CustomerId INNER JOIN
dbo.Payments ON dbo.Customers.CustomerId =
dbo.Payments.CustomerId INNER JOIN
dbo.Employees ON dbo.Customers.EmployeeId =
dbo.Employees.EmployeeId
GROUP BY dbo.Customers.Address, dbo.Employees.CompanyName


Query 9: REPAYMENT OF LOANS BASED ON MONTHLY SALARY AND EMI


SELECT
dbo.Employees.MonthlySalary, SUM(dbo.Allocations.Amount) AS [Total Loan
Amount], SUM(dbo.Allocations.EMI) * dbo.Allocations.Period AS [Total EMI To Pay],
SUM(DISTINCT dbo.Payments.PayAmount) AS [Amount Repaid],
dbo.Allocations.InterestRate, dbo.Allocations.Period
FROM
dbo.Allocations INNER JOIN
dbo.Employees ON dbo.Allocations.EmployeeId =
dbo.Employees.EmployeeId INNER JOIN
dbo.Payments ON dbo.Allocations.AllocationId =
dbo.Payments.AllocationId
GROUP BY dbo.Employees.MonthlySalary, dbo.Allocations.InterestRate,
dbo.Allocations.Period
ORDER BY dbo.Employees.MonthlySalary, COUNT(DISTINCT
dbo.Payments.PaymentDate)


Query 10: MINIMUM PERSONAL LOAN REPAYMENTS DURING RECESSION PERIOD


SELECT
dbo.Employees.CompanyName, SUM(dbo.Allocations.Amount) AS [Total
Loan Amount], SUM(dbo.Allocations.EMI) * dbo.Allocations.Period AS [Total EMI To Pay],
MIN(DISTINCT dbo.Payments.PayAmount) AS [Amount Repaid],
dbo.Allocations.InterestRate, dbo.Allocations.Period
FROM
dbo.Allocations INNER JOIN
dbo.Employees ON dbo.Allocations.EmployeeId =
dbo.Employees.EmployeeId INNER JOIN
dbo.Payments ON dbo.Allocations.AllocationId = dbo.Payments.AllocationId
GROUP BY dbo.Employees.CompanyName, dbo.Allocations.InterestRate,
dbo.Allocations.Period
ORDER BY dbo.Employees.CompanyName, COUNT(DISTINCT
dbo.Payments.PaymentDate)


CASE ANALYSIS

6. Usage of Banking Services
In day-to-day life we come across many banking services, such as customers
using ATMs, debit cards, credit cards, current accounts, loan accounts,
etc. By integrating on the account number, we are able to know whether a
customer uses these services daily, weekly, monthly, twice a month, and so on.
At the same time, by integrating the loan account with the customer details, we
learn how many customers use debit cards, how many use Internet Banking,
how many use Mobile Banking, how many of them have loans, and how many of
the customers repaid correctly or did not repay.
By integrating these data we are able to assess the customers' loyalty towards
the bank and which banking services are most widely used.
7. Stage I: Building The Warehouse
For the ETL process, the inputs are the following files:
1. BankDetails.csv (contains the bank name and loan id)
2. ProductDetails.txt (contains the details of loan amount and balance)
3. Region.txt (contains the details of the services offered by the bank)
Below is the structure of each of the input files:


1. Open SQL Server Business Intelligence Development Studio from


the start menu and go to: File --> New --> Project --> Integration Services
Project
2. From the Toolbox, go to Control Flow Items and
add a Data Flow Task in the Control Flow Tab
(figure shown below).


Control Flow Processes for Bank Details

Data Flow Processes for Bank Details


Extracting Data from the Database


Defining the Columns


After the integration process, select New Project and create an Analysis Services project.


Load the database.

Open the Data Source Wizard; when the Impersonation Information dialog box
opens, choose Use the service account.


Click OK and complete the wizard.

After completing the Data Source Wizard, create a cube using existing tables.
The process of creating cubes is shown below.


The cube is deployed successfully.

After generating the cube, go to a new project and select Report Server Project.


In the data source, choose Microsoft SQL Server (SqlClient) with server name
(local), and test the connection; if the test is successful, go to the next step.

Select the existing data source.


In the Query Builder, add the existing tables and establish a logical relationship
between them.

Click OK, then choose the report type as Tabular and click Next.


Based on the query, choose the available fields.

Click Preview Report and Finish; the query will be executed successfully.


Query 1: State-wise loan repayment

use sample
SELECT Status, Product, State, LoanId
FROM ProductDetails
WHERE LoanId <= 25000


Query 2: Loans with LoanId of 25,000 or more

use sample
SELECT Status, Product, State, LoanId
FROM ProductDetails
WHERE LoanId >= 25000


Query 3: Average loan usage by customers of different states

use sample
SELECT Status, Product, State, LoanId
FROM ProductDetails
WHERE LoanId BETWEEN 8000 AND 20000
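The three report queries above are plain range filters over ProductDetails. They can be checked against a few invented rows using Python's built-in sqlite3 (the rows are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ProductDetails "
            "(Status TEXT, Product TEXT, State TEXT, LoanId INTEGER)")
con.executemany("INSERT INTO ProductDetails VALUES (?, ?, ?, ?)", [
    ("open",   "PersonalLoan", "TN", 9000),
    ("closed", "PersonalLoan", "KL", 25000),
    ("open",   "PersonalLoan", "AP", 40000),
])

low  = con.execute("SELECT LoanId FROM ProductDetails WHERE LoanId <= 25000").fetchall()
high = con.execute("SELECT LoanId FROM ProductDetails WHERE LoanId >= 25000").fetchall()
mid  = con.execute("SELECT LoanId FROM ProductDetails "
                   "WHERE LoanId BETWEEN 8000 AND 20000").fetchall()
```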


Data Mining
Introduction to Weka, the Data Mining Tool
Weka consists of the following components:
1. Explorer
2. Experimenter
3. Knowledge Flow
4. Workbench

For the above case, we mined the state customers to find how many of them
use ATM cards, credit cards, debit cards, Internet Banking, etc., using the Weka
tool.
1. To start preprocessing, we have to export the CSV file to an ARFF file and
define the class attribute.
2. The Explorer can read CSV files directly by selecting the file type as CSV in
the file selection menu.
3. Select Preprocess, click Open file, and select the file type to open the file.
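The CSV-to-ARFF export in step 1 can also be scripted. Below is a naive sketch that treats every column as a nominal attribute (the sample services file is invented for illustration; Weka's own converters infer attribute types properly):

```python
import csv, io

def csv_to_arff(text, relation):
    """Naive CSV -> ARFF: treat every column as a nominal attribute."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}"]
    for i, name in enumerate(header):
        values = sorted({row[i] for row in data})   # observed nominal values
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines.append("@data")
    lines += [",".join(row) for row in data]
    return "\n".join(lines)

sample = "Service,Frequency\nATM,daily\nInternetBanking,weekly\n"
arff = csv_to_arff(sample, "bank_services")
```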


By clicking the various attributes, we can also see a graphical representation of
this process, and we can fill in the missing values in the table fields.


4. We can mine or classify the database using various algorithms; here we
have chosen the Naïve Bayes algorithm.
Select the Classifier tab.
Select the appropriate test options. Here I am choosing cross-validation.

After clicking Start, the result will be displayed after a short delay.
We can use the result view to
o Load / save models
o Save the results buffer
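Cross-validation, as selected in the test options, partitions the records into k folds, training on k-1 folds and testing on the held-out fold, k times. A sketch of the fold split itself:

```python
def k_folds(n_items, k):
    """Partition record indices 0..n_items-1 into k near-equal folds."""
    indices = list(range(n_items))
    return [indices[i::k] for i in range(k)]

# e.g. 10 records, 3-fold cross-validation:
# each round tests on folds[i] and trains on the remaining folds
folds = k_folds(10, 3)
```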

5. After classifying, we can cluster our database and visualize the clusters.
Scattered clustering
We can access the various clustering algorithms from the Cluster tab and
change the parameters to suit our needs.
I have chosen the k-means clustering algorithm with k = 3.
We can visualize the data, easily change which attributes are plotted against the
axes, and adjust the noise (jitter) as well.
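k-means (here with k = 3) alternates between assigning each point to its nearest centroid and recomputing each centroid as its cluster's mean. A small pure-Python sketch on one-dimensional toy payment amounts (data invented for illustration):

```python
def kmeans_1d(points, k, iters=20):
    """Plain k-means on 1-D data: assign to nearest centroid, then re-average."""
    step = max(1, len(points) // k)
    centroids = sorted(points)[::step][:k]       # spread-out initial seeds
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1, 2, 3, 50, 52, 54, 100, 104]   # hypothetical payment amounts
centroids, clusters = kmeans_1d(data, 3)
```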


Here is a general cluster, and we can see the scattered clusters across various
attributes.

This cluster is based on Pay status.


The cluster below is based on pay status on a weekly basis.


Linear Clustering

