
BUSINESS INTELLIGENCE AND APPLICATIONS QUESTION PAPER - 2009
Prepared by: Simranjeet (019), Gunjan Garg (042), Aanchal Garg (066)

Q1 (a) What are the hardware and software requirements to build the decision support system? Differentiate between MIS and DSS. Ans: A Decision Support System (DSS) is a collection of integrated software applications and hardware that form the backbone of an organization's decision-making process. Companies across all industries rely on decision support tools, techniques, and models to help them assess and resolve everyday business questions. The decision support system is data-driven, as the entire process feeds off the collection and availability of data to analyze. Business Intelligence (BI) reporting tools, processes, and methodologies are key components of any decision support system and provide end users with rich reporting, monitoring, and data analysis. High-level Decision Support System Requirements:

Data collection from multiple sources (sales data, inventory data, supplier data, market research data, etc.)

Data formatting and collation

A suitable database location and format built for decision support-based reporting and analysis

Robust tools and applications to report, monitor, and analyze the data

Decision support systems have become critical and ubiquitous across all types of business. In today's global marketplace, it is imperative that companies respond quickly to market changes. Companies with comprehensive decision support systems have a significant competitive advantage.

Decision Support Systems delivered by MicroStrategy Business Intelligence: MicroStrategy provides companies with a unified reporting, analytical, and monitoring platform that forms the core of any Decision Support System. The software exemplifies all of the important characteristics of an ideal Decision Support System:

Supports individual and group decision making: MicroStrategy provides a single platform that allows all users to access the same information and the same version of the truth, while providing autonomy to individual users and development groups to design reporting content locally.

Easy to Develop and Deploy: MicroStrategy delivers an interactive, scalable platform for rapidly developing and deploying projects. Multiple projects can be created within a single shared metadata. Within each project, development teams create a wide variety of re-usable metadata objects. As decision support system deployment expands within an organization, the MicroStrategy platform effortlessly supports an increasing concurrent user base.

Comprehensive Data Access: MicroStrategy software allows users to access data from different sources concurrently, leaving organizations the freedom to choose the data warehouse that best suits their unique requirements and preferences.

Integrated software: MicroStrategy's integrated platform enables administrators and IT professionals to develop data models, perform sophisticated analysis, generate analytical reports, and deliver these reports to end users via different channels (Web, email, file, print and mobile devices). This eliminates the need for companies to spend countless hours purchasing and integrating disparate software products in an attempt to deliver a consistent user experience.

Flexibility: The MicroStrategy SDK (Software Development Kit) exposes its vast functionality through an extensive library of APIs. MicroStrategy customers can choose to leverage the power of the software's flexible APIs to design and deploy solutions tailored to their unique business needs.

MIS:
1) A Management Information System focuses on operational efficiency, i.e. it concentrates on doing things in the right manner.
2) It allows communication across managers from different areas in a business organization.
3) It allows the flow of information in both upward and downward directions.
4) MIS is the original form of management information system.

DSS:
1) A Decision Support System helps in making effective decisions, as it concentrates on doing only the right things.
2) It is concerned with leadership and senior management in an organization, providing support for effective judgment.
3) Its information flows only in the upward direction.
4) DSS is actually an advancement of MIS.

(b) Explain the term groupware. How is it linked to the term group decision support system? Ans: Groupware refers to programs that help people work together collectively while located remotely from each other. Programs that enable real-time collaboration are called synchronous groupware. Groupware services can include the sharing of calendars, collective writing, e-mail handling, shared database access, electronic meetings with each person able to see and display information to others, and other activities. Sometimes called collaborative software, groupware is an integral component of a field of study known as Computer-Supported Cooperative Work, or CSCW. Groupware is often broken down into categories describing whether or not work group members collaborate in real time (synchronous groupware and asynchronous groupware). Some product examples of groupware include Lotus Notes and Microsoft Exchange, both of which facilitate calendar sharing, e-mail handling, and the replication of files across a distributed system so that all users can view the same information.

Electronic "face-to-face" meetings are facilitated by CU-SeeMe and Microsoft NetMeeting. Groupware or group support systems (GSS) have evolved over time. One definition available in the literature is that GSS are computer-based information systems used to support intellectual, collaborative work (Jessup and Valacich, 1993). This definition is too broad for one discussion, because it does not specifically address the role of groups. Another definition emerges as "tools designed to support communications among members of a collaborative work group" (Hosseini, 1995, p. 368). Another way to describe a GSS is as "the collective of computer-assisted technologies used to aid group efforts directed at identifying and addressing problems, opportunities and issues" (Huber, Valacich, and Jessup, 1993, p. 256). Groupware exists to facilitate the movement of messages or documents so as to enhance the quality of communication among individuals in remote locations. It provides access to shared databases, document handling, electronic messaging, workflow management, and conferencing. In fact, groupware can be thought of as a development environment in which cooperative applications, including decision support, can be built. Groupware achieves this through the integration of eight distinct technologies: messaging, conferencing, group document handling, workflow, utilities/development tools, frameworks, services, and vertical market applications. Hence, it provides the foundation for the easy exchange of data and information among individuals located far apart. Although no currently available product has an integrated and complete set of these capabilities, groupware collectively provides the communication and coordination foundation on which group decision support systems (GDSS) are built: a GDSS adds decision models and analysis tools on top of groupware's shared workspace so that a dispersed group can evaluate alternatives and reach a decision together.

Q2 (a) Differentiate among structured decisions, semi-structured decisions and unstructured decisions. What type of decisions are supported by DSS and why? Ans: STRUCTURED DECISIONS. Many analysts categorize decisions according to the degree of structure involved in the decision-making activity. Business analysts describe a structured decision as one in which all three components of a decision (the data, the process, and the evaluation) are determined. Since structured decisions are made on a regular basis in business environments, it makes sense to place a comparatively rigid framework around the decision and the people making it. Structured decision support systems may simply use a checklist or form to ensure that all necessary data is collected and that the decision-making process is not skewed by the absence of necessary data. If the choice is made to also support the procedural or process component of the decision, it is quite possible to develop a program, either as part of the checklist or form or separately. In fact, it is also possible and desirable to develop computer programs that collect and combine the data, thus giving the process a high degree of consistency or structure. When there is a desire to make a decision more structured, the support system for that decision is designed to ensure consistency. Many firms that hire individuals without a great deal of experience provide them with detailed guidelines on their decision-making activities and support them by giving them little flexibility. One interesting consequence of making a decision more structured is that the liability for inappropriate decisions is shifted from individual decision makers to the larger company or organization.

UNSTRUCTURED DECISIONS. At the other end of the continuum are unstructured decisions. While these decisions have the same components as structured ones (data, process, and evaluation), there is little agreement on their nature. With unstructured decisions, for example, each decision maker may use different data and processes to reach a conclusion. In addition, because of the nature of the decision, there may be only a limited number of people within the organization who are even qualified to evaluate the decision. Generally, unstructured decisions are made in instances in which all elements of the business environment (customer expectations, competitor response, cost of securing raw materials, etc.) are not completely understood; new product and marketing strategy decisions commonly fit into this category.

Unstructured decision systems typically focus on the individual or team that will make the decision. These decision makers are usually entrusted with decisions that are unstructured because of their experience or expertise, and therefore it is their individual ability that is of value. One approach to support systems in this area is to construct a program that simulates the process used by a particular individual. In essence, these systems, commonly referred to as "expert systems", prompt the user with a series of questions regarding a decision situation. "Once the expert system has sufficient information about the decision scenario, it uses an inference engine which draws upon a database of expertise in this decision area to provide the manager with the best possible alternative for the problem," explained Jatinder N. D. Gupta and Thomas M. Harris in the Journal of Systems Management. "The purported advantage of this decision aid is that it allows the manager the use of the collective knowledge of experts in this decision realm. Some of the current DSS applications have included long-range and strategic planning, policy setting, new product planning, market planning, cash flow management, operational planning and budgeting, and portfolio management." Another approach is to monitor and document the process that was used so that the decision maker(s) can readily review what has already been examined and concluded. An even more novel approach used to support these decisions is to provide environments that are specially designed to give these decision makers an atmosphere that is conducive to their particular tastes. The key to supporting unstructured decisions is to understand the role that individuals' experience or expertise plays in the decision and to allow for individual approaches.

SEMI-STRUCTURED DECISIONS. In the middle of the continuum are semi-structured decisions, and this is where most of what are considered to be true decision support systems are focused. Decisions of this type are characterized as having some agreement on the data, process, and/or evaluation to be used, but are also typified by efforts to retain some level of human judgement in the decision-making process. An initial step in analyzing which support system is required is to understand where the limitations of the decision maker may be manifested (i.e., the data acquisition portion, the process component, or the evaluation of outcomes). Grappling with the latter two types of decisions, unstructured and semi-structured, can be particularly problematic for small businesses, which often have limited technological or workforce resources.

As Gupta and Harris indicated, "many decision situations faced by executives in small business are one-of-a-kind, one-shot occurrences requiring specifically tailored solution approaches without the benefit of any previously available rules or procedures. The unstructured or semi-structured nature of these decision situations aggravates the problem of limited resources and staff expertise available to a small business executive to analyze important decisions appropriately. Faced with this difficulty, an executive in a small business must seek tools and techniques that do not demand too much of his time and resources and are useful to make his life easier." Consequently, small businesses have increasingly turned to DSS to provide them with assistance in business guidance and management.

Key DSS Functions: Gupta and Harris observed that DSS is predicated on the effective performance of three functions: information management, data quantification, and model manipulation. "Information management refers to the storage, retrieval, and reporting of information in a structured format convenient to the user. Data quantification is the process by which large amounts of information are condensed and analytically manipulated into a few core indicators that extract the essence of the data. Model manipulation refers to the construction and resolution of various scenarios to answer 'what if' questions. It includes the processes of model formulation, alternatives generation and solution of the proposed models, often through the use of several operations research/management science approaches."

Entrepreneurs and owners of established enterprises are urged to make certain that their business needs a DSS before buying the various computer systems and software necessary to create one. Some small businesses, of course, have no need of a DSS. The owner of a car washing establishment, for instance, would be highly unlikely to make such an investment. But for those business owners who are guiding a complex operation, a decision support system can be a valuable tool. Another key consideration is whether the business's key personnel will ensure that the necessary time and effort is spent to incorporate DSS into the establishment's operations. After all, even the best decision support system is of little use if the business does not possess the training and knowledge necessary to use it effectively. If, after careful study of questions of DSS utility, the small business owner decides that DSS can help his or her company, the necessary investment can be made, and the key managers of the business can begin the process of developing their own DSS applications using available spreadsheet software.

(b) What is the role of checkpoint in SQL Server? Explain the syntax of CREATE TABLE. Ans: SQL Server checkpoint: A SQL Server checkpoint is the process of writing all dirty data file pages out to disk. A dirty page is a page that has changed in memory (the buffer cache) since it was read from disk or since the last checkpoint. This is done regardless of the transaction that made the change. SQL Server uses a protocol called Write-Ahead Logging (WAL); it is this process that writes all log records describing a change to a data page to disk before the actual page is written to disk. Checkpoints can occur concurrently on any number of databases on an instance. How do database checkpoints occur?

Before a backup, the database engine automatically performs a checkpoint; this ensures that all database changes are contained in the backup.

When you issue a manual CHECKPOINT command, a checkpoint is then run against the database in use.

When SQL Server is shut down. If the checkpoint is skipped (SHUTDOWN WITH NOWAIT), the restart will take much longer.

When ALTER DATABASE is used to add or remove a database file.

When you change the recovery model from bulk-logged to full, or from full to the simple recovery model.

If your database is in the full or bulk-logged recovery model, checkpoints are run periodically as specified by the recovery interval server setting.

In the simple recovery model, checkpoints are run when the log becomes 70% full or based on the recovery interval setting, whichever comes first.

The CREATE TABLE statement is used to create a table in a database. SQL CREATE TABLE syntax:

CREATE TABLE table_name
(
    column_name1 data_type,
    column_name2 data_type,
    column_name3 data_type,
    ....
);
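As a worked illustration of both statements (the table and column names are assumed here, anticipating the employee table referred to in the next question):

CREATE TABLE employee
(
    empno    INT PRIMARY KEY,   -- employee number
    ename    VARCHAR(50),       -- employee name
    deptno   INT,               -- department number
    sal      DECIMAL(10,2),     -- salary
    hiredate DATETIME           -- date of joining
);

-- Force a manual checkpoint on the current database
CHECKPOINT;

-- Optionally request a target duration (in seconds) for the checkpoint
CHECKPOINT 10;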

Q3 (a) Describe the data types supported by SQL Server? Ans: Numeric: stores numeric values. Monetary: stores numeric values with decimal places; it is used especially for currency values. Date and Time: stores date and time information. Character: supports character-based values of varying lengths. Binary: stores data in strict binary (0 or 1) representation. Special purpose: SQL Server contains complex data types to handle XML documents, globally unique identifiers, etc. Table: used to hold a result set for subsequent processing. This data type cannot be used for a column; the only time you use it is when declaring table variables in triggers, stored procedures, and functions. Xml: stores an XML document of up to 2 GB in size. You can specify options to force only well-formed documents to be stored in the column. (b) Consider the table: i) Find the average salary for each dept no. ii) Display the employees who work in dept no. 1 and 20. iii) Display the employee names who were hired in the year 2007. iv) Add a new record to the employee table.

Ans:
a) SELECT deptno, AVG(sal) AS avg_sal FROM employee GROUP BY deptno;
b) SELECT * FROM employee WHERE deptno IN (10, 20);
c) SELECT ename FROM employee WHERE YEAR(hiredate) = 2007;
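d) For part iv), a new record is added with an INSERT statement. The table definition is not shown in the paper, so the column list and values below are purely illustrative:

INSERT INTO employee (empno, ename, deptno, sal, hiredate)
VALUES (101, 'Rahul', 20, 45000.00, '2007-03-15');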

Q4 (a) Draw and explain the data warehouse architecture for banking sector? Ans:

(b) What are the advantages of partitioning the fact table in a data warehouse? Ans: Data partitioning is the process of logically and/or physically partitioning data into segments that are more easily maintained or accessed. Current RDBMS systems provide this kind of distribution functionality. Partitioning of data helps in performance and utility processing.

Data partitioning can be of great help in facilitating the efficient and effective management of a highly available relational data warehouse. But data partitioning can be a complex process; several factors affect partitioning strategy and design, implementation, and management considerations in a data warehousing environment.

A data warehouse powered by a relational database management system can provide a comprehensive source of data and an infrastructure for building Business Intelligence (BI) solutions. Typically, an implementation of a relational data warehouse involves the creation and management of dimension tables and fact tables. A dimension table is usually smaller in size compared to a fact table, but both provide details about the attributes used to describe or explain business facts. Some examples of dimensions include item, store, and time. On the other hand, a fact table represents a business recording, such as item sales information for all the stores. All fact tables need to be periodically updated using the data most recently collected from the various data sources.

Since data warehouses need to manage and handle high volumes of data updated regularly, careful long-term planning is beneficial. Some of the factors to be considered for long-term planning of a data warehouse include data volume, data loading window, index maintenance window, workload characteristics, data aging strategy, archive and backup strategy, and hardware characteristics.

There are two approaches to implementing a relational data warehouse: monolithic approach and partitioned approach. The monolithic approach may contain huge fact tables which can be difficult to manage.

There are many benefits to implementing a relational data warehouse using the data partitioning approach. The single biggest benefit of a data partitioning approach is easy yet efficient maintenance. As an organization grows, so will the data in the database. The need for high availability of critical data, while accommodating the need for a small database maintenance window, becomes indispensable. Data partitioning can answer the need for a small database maintenance window in a very large business organization. With data partitioning, big issues pertaining to supporting large tables can be addressed by having the database decompose large chunks of data into smaller partitions, thereby resulting in better management. Data partitioning also results in faster data loading, easy monitoring of aging data and efficient data retrieval.

Data partitioning in a relational data warehouse can be implemented by partitioning database objects: base tables, clustered and non-clustered indexes, and indexed views. Range partitions refer to table partitions which are defined by a customizable range of data. The end user or database administrator defines a partition function with boundary values, a partition scheme with file group mappings, and a table which is mapped to the partition scheme.
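As a sketch of how such a range partition might be declared in SQL Server (the boundary dates, file group choice and fact table layout are illustrative assumptions, not part of the question paper):

-- Partition function: one partition per calendar year
CREATE PARTITION FUNCTION pf_SalesByYear (DATETIME)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01');

-- Partition scheme: map every partition to the PRIMARY file group for simplicity
CREATE PARTITION SCHEME ps_SalesByYear
AS PARTITION pf_SalesByYear ALL TO ([PRIMARY]);

-- Fact table created on the partition scheme, partitioned by sale date
CREATE TABLE fact_sales
(
    sale_date DATETIME NOT NULL,
    item_id   INT      NOT NULL,
    store_id  INT      NOT NULL,
    amount    DECIMAL(12,2)
) ON ps_SalesByYear (sale_date);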

Q5 (a) What are the roles of summary tables and metadata in the data warehouse? Ans: Summary tables hold pre-computed aggregations of the detailed fact data (for example, sales totalled by month and by region) so that frequently asked queries can be answered without scanning the full detail; they trade extra storage and refresh effort for much faster query response. The first image most people have of the data warehouse is a large collection of historical, integrated data. While that image is correct in many regards, there is another very important element of the data warehouse that is vital: metadata. Metadata is data about data. Metadata has been around as long as there have been programs and data for the programs to operate on. Figure 1 shows metadata in a simple form. While metadata is not new, the role of metadata and its importance in the face of the data warehouse certainly is new. For years the information technology professional has worked in the same environment as metadata, but in many ways has paid little attention to it. The information professional has spent a life dedicated to process and functional analysis, user requirements, maintenance, architectures, and the like. The role of metadata has been passive at best in this milieu.

But metadata plays a very different role in the data warehouse. Relegating metadata to a backwater, passive role in the data warehouse environment is to defeat the purpose of the data warehouse. Metadata plays a very active and important part in the data warehouse environment. The reason why metadata plays such an important and active role becomes apparent when contrasting the operational environment with the data warehouse environment insofar as the user community is concerned.

The information technology professional is the primary community involved in the usage of operational development and maintenance facilities. It is expected that the information technology community is computer literate and able to find its way around systems. The community served by the data warehouse is a very different community. The data warehouse serves the DSS analysis community. It is anticipated that the DSS analysis community is not computer literate. Instead, the expectation is that the DSS analysis community is a businessperson community first and a technology community second.

Simply from the standpoint of who needs help the most in terms of finding one's way around data and systems, it is assumed the DSS analysis community requires a much more formal and intensive level of support than the information technology community.

For this reason alone, the formal establishment of and ongoing support of metadata becomes important in the data warehouse environment.

But there is a secondary, yet important, reason why metadata plays an important role in the data warehouse environment. In the data warehouse environment, the first thing the DSS analyst needs to know in order to do his/her job is what data is available and where it is in the data warehouse. In other words, when the DSS analyst receives an assignment, the first thing the DSS analyst needs to know is what data there is that might be useful in fulfilling the assignment. To this end the metadata for the warehouse is vital to the preparatory work done by the DSS analyst. Contrast the importance of the metadata to the DSS analyst to the importance of metadata to the information technology professional. The information technology professional has been doing his/her job for many years while treating metadata passively.
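One simple way to picture warehouse metadata is as a small catalog table that the DSS analyst can query before touching the warehouse itself. The table and columns below are only an illustrative sketch, not a standard structure:

CREATE TABLE dw_metadata
(
    table_name     VARCHAR(60),   -- warehouse table being described
    column_name    VARCHAR(60),   -- column being described
    source_system  VARCHAR(60),   -- operational system the data came from
    business_rule  VARCHAR(200),  -- how the value is derived or transformed
    last_load_date DATETIME       -- when the data was last refreshed
);

-- An analyst starting an assignment can first ask what customer-related data exists
SELECT table_name, column_name, source_system, business_rule
FROM dw_metadata
WHERE table_name LIKE '%customer%' OR business_rule LIKE '%customer%';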

(b) Why is the starflake schema desired in a data warehouse? Differentiate between factual data and reference data. Ans: The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. A star schema consists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business: the information being queried. This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows. Dimension tables are usually smaller and hold descriptive (reference) data that reflects the dimensions, or attributes, of a business. SQL queries then use joins between fact and dimension tables and constraints on the data to return selected information. Fact and dimension tables differ from each other only in their use within a schema. Their physical structure and the SQL syntax used to create the tables are the same. In a complex schema, a given table can act as a fact table under some conditions and as a dimension table under others.

The way in which a table is referred to in a query determines whether a table behaves as a fact table or a dimension table. Even though they are physically the same type of table, it is important to understand the difference between fact and dimension tables from a logical point of view.

A single Fact table (center of the star) surrounded by multiple dimensional tables (the points of the star).
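A typical star-schema query joins the central fact table to its dimension tables, constrains on dimension attributes and aggregates the additive measures. The table and column names in this sketch are assumed for illustration:

SELECT d_time.year,
       d_store.region,
       SUM(f.sales_amount) AS total_sales        -- additive measure from the fact table
FROM fact_sales f
     JOIN dim_time  d_time  ON f.time_id  = d_time.time_id
     JOIN dim_store d_store ON f.store_id = d_store.store_id
     JOIN dim_item  d_item  ON f.item_id  = d_item.item_id
WHERE d_item.category = 'Beverages'              -- constraint on a dimension attribute
GROUP BY d_time.year, d_store.region;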

Advantages:

Simplest data warehouse schema
Easy to understand
Easy to navigate between the tables due to the smaller number of joins
Most suitable for query processing

Disadvantages:

Occupies more space
Highly denormalized

Q6 Describe the techniques of data mining? Ans: Association: Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction; that is the reason why the association technique is also known as the relation technique. The association technique is used in market basket analysis to identify a set of products that customers frequently purchase together. Retailers use the association technique to research customers' buying habits. Based on historical sales data, retailers might find out that customers always buy crisps when they buy beer, and therefore they can put beer and crisps next to each other to save time for the customer and increase sales.

Classification: Classification is a classic data mining technique based on machine learning. Basically, classification is used to classify each item in a set of data into one of a predefined set of classes or groups. The classification method makes use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we develop software that can learn how to classify data items into groups. For example, we can apply classification in an application that, given all records of employees who left the company, predicts who will probably leave the company in a future period. In this case, we divide the records of employees into two groups named "leave" and "stay", and then ask our data mining software to classify the employees into the separate groups.

Clustering: Clustering is a data mining technique that makes meaningful or useful clusters of objects which have similar characteristics using an automatic technique. The clustering technique defines the classes and puts objects in each class, while in the classification technique, objects are assigned into predefined classes.

To make the concept clearer, we can take book management in a library as an example. In a library, there is a wide range of books on various topics. The challenge is how to keep those books in a way that readers can take several books on a particular topic without hassle. By using the clustering technique, we can keep books that have some kind of similarity in one cluster or on one shelf and label it with a meaningful name. If readers want to grab books on that topic, they only have to go to that shelf instead of searching the entire library.

Prediction: Prediction, as its name implies, is a data mining technique that discovers the relationship between independent variables and the relationship between dependent and independent variables. For instance, the prediction analysis technique can be used in sales to predict future profit: if we consider sales an independent variable, profit could be a dependent variable. Then, based on the historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.

Sequential patterns: Sequential pattern analysis is a data mining technique that seeks to discover or identify similar patterns, regular events or trends in transaction data over a business period. In sales, with historical transaction data, businesses can identify a set of items that customers buy together at different times in a year. Businesses can then use this information to recommend those items with better deals, based on customers' purchasing frequency in the past.

Decision trees: The decision tree is one of the most used data mining techniques because its model is easy for users to understand. In the decision tree technique, the root of the decision tree is a simple question or condition that has multiple answers. Each answer then leads to a further set of questions or conditions that help us narrow down the data so that we can make the final decision based on it.
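The association (market basket) idea can be sketched directly in SQL: given a line-item table, a self-join counts how often two products appear in the same transaction. The table, columns and support threshold below are assumptions made only for this illustration:

SELECT a.product_id AS product_a,
       b.product_id AS product_b,
       COUNT(*)     AS times_bought_together
FROM sales_line a
     JOIN sales_line b
       ON  a.transaction_id = b.transaction_id   -- items from the same basket
       AND a.product_id     < b.product_id       -- count each pair only once
GROUP BY a.product_id, b.product_id
HAVING COUNT(*) >= 50                            -- minimum support threshold
ORDER BY times_bought_together DESC;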

Q7 (a) Discuss the applications of data mining in the telecommunication sector? Ans: The telecommunications industry was one of the first to adopt data mining technology. This is most likely because telecommunication companies routinely generate and store enormous amounts of high-quality data, have a very large customer base, and operate in a rapidly changing and highly competitive environment. Telecommunication companies utilize data mining to improve their marketing efforts, identify fraud, and better manage their telecommunication networks. However, these companies also face a number of data mining challenges due to the enormous size of their data sets, the sequential and temporal aspects of their data, and the need to predict very rare events, such as customer fraud and network failures, in real time. The popularity of data mining in the telecommunications industry can be viewed as an extension of the use of expert systems in that industry (Liebowitz, 1988). These systems were developed to address the complexity associated with maintaining a huge network infrastructure and the need to maximize network reliability while minimizing labor costs. The problem with these expert systems is that they are expensive to develop, because it is both difficult and time consuming to elicit the requisite domain knowledge from experts. Data mining can be viewed as a means of automatically generating some of this knowledge directly from the data. Telecommunications is one of the most data-intensive industries in the world, and a great opportunity exists for telecom managers to analyze the large amounts of data that have been collected in their network databases in order to improve the short-term and long-term operations of their organizations. One highly effective tool to aid in this data analysis is the proven process of data mining. Data mining has been used for years to analyze data in two or more fields within a relational database. Typically, specialized software is needed, but it does not have to be costly; moderately priced software is available that can get almost anyone started. Data mining enables the user to view data from many different perspectives, categorize the data in new ways and summarize the resulting relationships between seemingly incongruous pieces of data that the software has identified.

Careful analysis of these relationships can provide managers with the ability to optimize internal network operations and better manage external customer-facing activities such as churn and marketing. Ironically, with such an enviable wealth and diversity of data at their fingertips, many telecom managers have been reluctant to use data mining to their advantage. Arguments against it can run the gamut from "too expensive" to "not enough time" to "lack of upper-management commitment". However, raising arguments like these will result, quite simply, in lost opportunities for many telcos. Conversely, if data mining is undertaken in a controlled fashion, many new opportunities will surface that will enable a company to become more profitable and competitive. And after all, the data already exists, and managers need to make better use of it. An important point to remember when considering data mining is that, first and foremost, data mining is always a business activity. In order to arrive at meaningful results, a carrier needs to align data mining with the goals and objectives of its business. Otherwise the results will be irrelevant and lacking context. Data mining should also be focused on exploring different hypotheses, taking into account disparate data, where the end result of the data mining exercise could potentially drive building a better customer experience, creating new operational processes and improving the overall bottom line. Telecom data about customers, such as call detail and customer information, can be profitably data mined. Mining these data types can help determine customer behavior and identify opportunities to support the goals of expanding the customer base and reducing customer churn. Customer churn is an area worth mining, as it is becoming ever more important to retain customers and improve wallet share. Mining churn rates tied to the number of trouble tickets issued in a 12-month period might uncover a correlation between the two. Perhaps after three trouble tickets the customer leaves the network, rather than after two tickets. This analysis might prompt the carrier to flag all customers with two trouble tickets and inaugurate some action whose purpose is to retain those customers. Following those remedial actions, the carrier should mine once again and determine how many customers at possible risk remained with the carrier.
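The trouble-ticket and churn analysis described above could begin with an aggregation along these lines; the customer and trouble_ticket tables and their columns are hypothetical:

SELECT t.ticket_count,
       COUNT(*) AS customers,
       SUM(CASE WHEN c.churned = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS churn_rate_pct
FROM customer c
     JOIN (SELECT customer_id, COUNT(*) AS ticket_count
           FROM trouble_ticket
           WHERE opened_date >= DATEADD(month, -12, GETDATE())   -- tickets in the last 12 months
           GROUP BY customer_id) t
       ON c.customer_id = t.customer_id
GROUP BY t.ticket_count
ORDER BY t.ticket_count;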

Data mining can also be invaluable in the development of marketing programs. For instance, a carrier might have a goal to increase the number of subscribers who pay their bills online, thereby reducing the cost of paper, printing, postage and handling. Mining the carrier's database of online bill-paying subscribers and the database of active users purchasing content on the telco portal might determine that a low percentage of subscribers using the content portal pay their bills online. A decision could then be made to place a link on a content portal page that says "pay your bill here". If an upswing in online payments occurs, the carrier could mine again and determine the amount that billing costs were reduced, if at all, and whether the goal of decreased billing costs was achieved. If a carrier has a business goal of increased revenue from advertisers, data mining can be used to analyze the carrier's data on portal usage by time of day. If usage is consistently up at one or two times a day, every day, then the carrier might want to consider charging advertisers higher fees for the privilege of advertising during those times. And let's not forget about all the data mining that can be done around networks and equipment. Data mining of mean time between failures (MTBF) might reveal a correlation between MTBF and other pieces of CPE. Vendor servers might be deployed in a network, and over a period of time it is determined that 30 percent of these servers consistently crash after 12.5 months. The decision can then be made to either replace the vendor servers every 12.2 months or find more reliable servers from another vendor. Above all else, once a carrier begins data mining, the tool should never be abandoned. Data mining should consistently be used as a means to proactively achieve the carrier's business goals and objectives. There is a wealth of information in every carrier's database just waiting to be mined, and a wealth of new ways to improve every carrier's business as a result of data mining. The data is there. It is now up to every carrier to have the forethought to analyze it, not think of it as a resource only after business goals are not met and revenue or a competitive advantage has been lost.

(b) How are the terms knowledge discovery in databases and data mining linked? Explain the phases of KDD.

Ans: The two terms are closely linked: Knowledge Discovery in Databases (KDD) is the overall process of turning raw data into useful knowledge, while data mining is the specific step within that process in which algorithms are applied to extract patterns from the prepared data. The KDD process is commonly described in phases: selection of the relevant data, preprocessing (cleaning), transformation into a suitable representation, data mining itself, and finally interpretation and evaluation of the discovered patterns. Knowledge Discovery in Databases brings together current research on the exciting problem of discovering useful and interesting knowledge in databases. It spans many different approaches to discovery, including inductive learning, Bayesian statistics, semantic query optimization, knowledge acquisition for expert systems, information theory, and fuzzy sets. The rapid growth in the number and size of databases creates a need for tools and techniques for intelligent data understanding. Relationships and patterns in data may enable a manufacturer to discover the cause of a persistent disk failure or the reason for consumer complaints. But today's databases hide their secrets beneath a cover of overwhelming detail. The task of uncovering these secrets is called "discovery in databases." This loosely defined subfield of machine learning is concerned with discovery from large amounts of possibly uncertain data. Its techniques range from statistics to the use of domain knowledge to control search. Following an overview of knowledge discovery in databases, thirty technical chapters are grouped in seven parts which cover discovery of quantitative laws, discovery of qualitative laws, using knowledge in discovery, data summarization, domain-specific discovery methods, integrated and multi-paradigm systems, and methodology and application issues. An important thread running through the collection is reliance on domain knowledge, starting with general methods and progressing to specialized methods where domain knowledge is built in.

Q8 (a) Explain the terms knowledge synthesis, knowledge storage and knowledge maps? Ans: Knowledge synthesis: This is referred to as 'second-generation knowledge'. Knowledge synthesis is a social as well as an individual process. Sharing tacit knowledge requires individuals to share their personal beliefs about a situation with others. At that point of sharing, justification becomes public. Each individual is faced with the tremendous challenge of justifying his or her beliefs in front of others, and it is this need for justification, explanation, persuasion and human connection that makes knowledge synthesis a highly fragile process. To bring personal knowledge into a social context, within which it can be amplified or further synthesized, it is necessary to have a field that provides a place in which individual perspectives are articulated and conflicts are resolved in the formation of higher-level concepts. In a typical organization, the field for interaction is often provided in the form of an autonomous, self-directed work team made up of members from different functional units. It is a critical matter for an organization to decide when and how to establish such a team of interaction in which individuals can meet and interact. This team triggers organizational knowledge synthesis mainly through several steps. First, it facilitates the building of mutual trust among members and accelerates the creation of an implicit perspective shared by members as tacit knowledge. The key factor for this step is sharing experience among members. Second, the shared implicit perspective is conceptualized through continuous dialogue among members. The dominant mode of knowledge conversion here is externalization. Tacit, field-specific perspectives are converted into explicit concepts that can be shared beyond the boundary of the team. Dialogue directly facilitates this process by activating externalization at the individual level. It is a process in which one builds concepts in cooperation with others. It provides the opportunity for one's hypothesis or assumption to be tested. As Markova and Foppa (1990) argue, social intercourse is one of the most powerful media for verifying one's own ideas. As such, participants in the dialogue can engage in the mutual co-development of ideas. Next comes the step of justification, which is the process of convergence and screening that determines the extent to which the knowledge created within the team is truly worthwhile for the organization. Typically, an individual justifies the truthfulness of his or her beliefs based on observations of the situation; these observations, in turn, depend on a unique viewpoint, personal sensibility, and individual experience.

When someone creates knowledge, he or she makes sense out of a new situation by holding justified beliefs and committing to them. Under this definition, knowledge is a construction of reality rather than something that is true in any abstract or universal way. The creation of knowledge is not simply a compilation of facts but a uniquely human process that cannot be reduced or easily replicated. It can involve feelings and belief systems of which we may not even be conscious. Nevertheless, justification must involve the evaluation standards for judging truthfulness. There might also be value premises that transcend factual or pragmatic considerations. The inducements to initiate a convergence of knowledge may be multiple and qualitative rather than simple and quantitative standards. Finally, we arrive at the stage of cross-leveling knowledge. During this stage, the concept that has been created and justified is integrated into the knowledge base of the organization, which comprises a whole network of organizational knowledge.

Knowledge maps: A knowledge map portrays a perspective of the players, sources, flows, constraints and sinks of knowledge within an organization. It is a navigation aid to both explicit (codified) information and tacit knowledge, showing the importance of, and the relationships between, knowledge stores and their dynamics. The final 'map' can take multiple forms, from a pictorial display, to a yellow-pages directory, to a linked topic or concept map, to inventory lists or a matrix of assets against key business processes. Knowledge maps are needed:

to encourage re-use and prevent re-invention, saving search time and acquisition costs
to highlight islands of expertise and suggest ways to build bridges to increase knowledge sharing and exchange
to discover effective and emergent communities of practice where informal learning is happening
to provide baseline data for measuring progress with KM projects and justifying expenditures

to reduce the burden on experts by helping staff to find critical solutions and information quickly
to improve customer response, decision making and problem solving by providing access to applicable information and to internal and external experts
to highlight opportunities for learning and leverage of knowledge through distinguishing the unique meaning of 'knowledge' within that organization
to provide an inventory and evaluation of intellectual and intangible assets and assess competitive advantage
to supply research for designing a knowledge architecture, making key strategic choices, selecting suitable software or building a corporate memory
to garner support for new knowledge initiatives designed to improve the knowledge assets
to find key sources, opportunities and constraints to knowledge creation and flows

Knowledge Storage

Data warehouses are a main component of the KM infrastructure. Organizations store data in a number of databases. The data warehousing process extracts data captured by multiple business applications and organizes it in a way that provides meaningful knowledge to the business, which can be accessed for future reference. For example, data warehouses could act as a central storage area for an organization's transaction data. Data warehouses differ from traditional transaction databases in that they are designed to support decision-making and data processing and analysis, rather than simply efficiently capturing transaction data. Knowledge warehouses are another type of data warehouse, but are aimed more at providing qualitative data than the kind of quantitative data typical of data warehouses. Knowledge warehouses store the knowledge generated from a wide range of sources including data warehouses, work processes, news articles, external databases, web pages and people (documents, etc.). Thus, knowledge warehouses are likely to be virtual warehouses where knowledge is dispersed across a number of servers. Databases and knowledge bases can be distinguished by the type and characteristics of the data stored.

While data in a database has to be represented in explicit form (generally speaking, the information can only be extracted as it is stored in the system), knowledge-based systems support the generation of knowledge that does not explicitly exist in the database. In this way, the data in knowledge bases can be incomplete, fuzzy, and include a factor of uncertainty. The knowledge in a knowledge base is stored as rules, allowing a computer to draw conclusions such as: if all vegetables are plants, and if a tomato is a vegetable, then a tomato is also a plant. In this way it is not necessary to store a list of all plants, or all vegetables, in order to get the answer to a question (a minimal SQL sketch of this style of inference is given at the end of this answer). Data marts represent specific database systems on a much smaller scale: a structured, searchable database system organised according to the users' needs. For example, a supermarket chain may wish to analyse a small, specific piece of information, such as what quantity and type of beer is most consumed during the summer. In this case it is not necessary to process all data about all products in order to undertake this analysis. A data repository is a database used primarily as an information storage facility, with minimal analysis or querying functionality. Content and Document Management Systems represent the convergence of full-text retrieval, document management, and publishing applications. They support the unstructured data management requirements of knowledge management (KM) initiatives through a process that involves capture, storage, access, selection, and document publication. Content management tools enable users to organize information at an object level rather than in large binary objects or full documents.
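The 'tomato is a plant' style of inference mentioned above can be sketched in SQL: only the direct facts are stored, and a recursive query derives the rest. The tables and values here are purely illustrative:

-- Store only the direct facts: (child, parent) pairs
CREATE TABLE is_a (child VARCHAR(40), parent VARCHAR(40));
INSERT INTO is_a VALUES ('tomato', 'vegetable');
INSERT INTO is_a VALUES ('vegetable', 'plant');

-- Derive every category a tomato belongs to, without storing the derived facts
WITH categories (name) AS
(
    SELECT parent FROM is_a WHERE child = 'tomato'
    UNION ALL
    SELECT i.parent
    FROM is_a i
         JOIN categories c ON i.child = c.name
)
SELECT name FROM categories;   -- returns 'vegetable' and 'plant'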
