
BI Architecture Drivers to Capitalize on New Technologies

Hadoop:

The goal is to design and build a data warehouse / business intelligence (BI) architecture that
provides a flexible, multi-faceted analytical ecosystem for each unique organization.

In a traditional BI architecture, analytical processing first passes through a data warehouse.

In the new, modern BI architecture, data reaches users through a multiplicity of organizational data
structures, each tailored to the type of content it contains and the type of user who wants to
consume it.

The data revolution (big and small data sets) provides significant improvements. New tools like
Hadoop allow organizations to cost-effectively consume and analyze large volumes of semi-
structured data. In addition, the new architecture complements traditional top-down data delivery methods with more
flexible, bottom-up approaches that promote predictive or exploration analytics and rapid
application development.

In the above diagram, the objects in blue represent traditional data architecture. Objects in pink
represent the new modern BI architecture, which includes Hadoop, NoSQL databases, high-
performance analytical engines (e.g. analytical appliances, MPP databases, in-memory databases),
and interactive, in-memory visualization tools.

Most source data now flows through Hadoop, which primarily acts as a staging area and online
archive. This is especially true for semi-structured data, such as log files and machine-generated
data, but also for some structured data that cannot be cost-effectively stored and processed in SQL
engines (e.g. call center records).

From Hadoop, data is fed into a data warehousing hub, which often distributes data to downstream
systems, such as data marts, operational data stores, and analytical sandboxes of various types,
where users can query the data using familiar SQL-based reporting and analysis tools.

Today, data scientists analyze raw data inside Hadoop by writing MapReduce programs in Java and
other languages. In the future, users will be able to query and process Hadoop data using familiar
SQL-based data integration and query tools.
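For concreteness, here is a minimal MapReduce sketch of the kind of job a data scientist might write against raw log files staged in Hadoop--counting hits per URL. The class name, the tab-delimited record layout, and the field positions are illustrative assumptions, not a description of any particular feed.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

    // Map: parse each raw log line and emit (url, 1).
    public static class HitMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text url = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t"); // assumes tab-delimited log records
            if (fields.length > 2) {
                url.set(fields[2]);                         // assumes the URL sits in the third field
                context.write(url, ONE);
            }
        }
    }

    // Reduce: sum the counts for each URL.
    public static class HitReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text url, Iterable<LongWritable> counts, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable c : counts) {
                total += c.get();
            }
            context.write(url, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "page hit count");
        job.setJarByClass(PageHitCount.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(HitReducer.class);
        job.setReducerClass(HitReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // raw log directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```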

The modern BI architecture can analyze large volumes and new sources of data and is a significantly
better platform for data alignment, consistency and flexible predictive analytics.

Thus, the new BI architecture provides a modern analytical ecosystem featuring both top-down and
bottom-up data flows that meet all requirements for reporting and analysis.

In the top-down world, source data is processed, refined, and stamped with a predefined data
structure--typically a dimensional model--and then consumed by casual users using SQL-based
reporting and analysis tools. In this domain, IT developers create data and semantic models so
business users can get answers to known questions and executives can track performance of
predefined metrics. Here, design precedes access. The top-down world also takes great pains to
align data along conformed dimensions and deliver clean, accurate data. The goal is to deliver a
consistent view of the business entities so users can spend their time making decisions instead of
arguing about the origins and validity of data artifacts.

Creating a uniform view of the business from heterogeneous sets of data is not easy. It takes time,
money, and patience, often more than most departmental heads and business analysts are willing to
tolerate. They often abandon the top-down world for the underworld of spreadmarts and data
shadow systems. Using whatever tools are readily available and cheap, these data-hungry users
create their own views of the business. Eventually, they spend more time collecting and integrating
data than analyzing it, undermining their productivity and a consistent view of business information.

The bottom-up world is a different story. The modern BI architecture creates an analytical ecosystem
that brings prodigal data users back into the fold. It allows an organization to perform true ad hoc
exploration (predictive or exploratory analytics) and promotes the rapid development of analytical
applications using in-memory departmental tools. In a bottom-up environment, users can't
anticipate the questions they will ask on a daily or weekly basis or the data they'll need to answer
those questions. Often, the data they need doesn't yet exist in the data warehouse.

The modern BI architecture creates analytical sandboxes that let power users explore corporate and
local data on their own terms. These sandboxes include Hadoop, virtual partitions inside a data
warehouse, and specialized analytical databases that offload data or analytical processing from the
data warehouse or handle new untapped sources of data, such as Web logs or machine data. The
new environment also gives department heads the ability to create and consume dashboards built
with in-memory visualization tools that point both to a corporate data warehouse and other
independent sources.

Combining the top-down and bottom-up worlds is challenging but doable with determined commitment.
BI professionals need to guard data semantics while opening access to data. Business users need to
commit to adhering to data standards. Further, well-designed data governance programs are an
absolute requirement.



Part IV - Architecting for Analytics

The prior article in this series discussed the human side of analytics. It explained
how companies need to have the right culture, people, and organization to succeed
with analytics. The flip side is the "hard stuff"--the architecture, platforms, tools, and
data--that makes analytics possible. Although analytical technology gets the
lion's share of attention in the trade press--perhaps more than it deserves for the value
it delivers--it nonetheless forms the bedrock of all analytical initiatives. This article
examines the architecture, platforms, tools, and data needed to deliver robust
analytical solutions.

Architecture

The term "analytical architecture" is an oxymoron. In most organizations, business


analysts are left to their own devices to access, integrate, and analyze data. By
necessity, they create their own data sets and reports outside the purview and
approval of corporate IT. By definition, there is no analytical architecture in most
organizations--just a hodge-podge of analytical silos and spreadmarts, each with
conflicting business rules and data definitions.
Analytical sandboxes. Fortunately, with the advent of specialized analytical
platforms (discussed below), BI architects have more options for bringing business
analysts into the corporate BI fold. They can use these high-powered database
platforms to create analytical sandboxes for the explicit use of business analysts.
These sandboxes, when designed properly, give analysts the flexibility they need to
access corporate data at a granular level, combine it with data that they've sourced
themselves, and conduct analyses to answer pressing business questions. With
analytical sandboxes, BI teams can transform business analysts from data pariahs to
full-fledged members of the BI community.

There are four types of analytical sandboxes:

Staging Sandbox. This is a staging area for a data warehouse that contains raw,
non-integrated data from multiple source systems. Analysts generally prefer to query
a staging area that contains all the raw data than each source system individually.
Hadoop is a staging area for large volumes of unstructured data that a growing
number of companies are adding to their BI ecosystems.

Virtual Sandbox. A virtual sandbox is a set of tables inside a data warehouse
assigned to individual analysts. Analysts can upload data into the sandbox and
combine it with data from the data warehouse, giving them one place to go to do all
their analyses. The BI team needs to carefully allocate compute resources so
analysts have enough horsepower to run ad hoc queries without interfering with other
workloads running on the data warehouse.

Free-standing sandbox. A free-standing sandbox is a separate database server
that sits alongside a data warehouse and contains its own data. It's often used to
offload complex, ad hoc queries from an enterprise data warehouse and give
business analysts their own space to play. In some cases, these sandboxes contain
a replica of data in the data warehouse, while in others, they support entirely new
data sets that don't fit in a data warehouse or run faster on an analytical platform.

In-memory BI sandbox. Some desktop BI tools maintain a local data store, either in
memory or on disk, to support interactive dashboards and queries. Analysts love
these types of sandboxes because they connect to virtually any data source and
enable analysts to model data, apply filters, and visually interact with the data
without IT intervention.

Next-Generation BI Architecture. Figure 1 depicts a BI architecture with the four
analytical sandboxes colored in green. The top half of the diagram represents a
classic top-down, data warehousing architecture that primarily delivers interactive
reports and dashboards to casual users (although the streaming/complex event
processing (CEP) engine is new.) The bottom half of the diagram depicts a bottom-
up analytical architecture with analytical sandboxes along with new types of data
sources. This next-generation BI architecture better accommodates the needs of
business analysts and data scientists, making them full-fledged members of the
corporate BI ecosystem.

Figure 1. The New BI Architecture

The next-generation BI architecture is more analytical, giving power users greater
options to access and mix corporate data with their own data via various types of
analytical sandboxes. It also brings unstructured and semi-structured data fully into
the mix using Hadoop and nonrelational databases.

Analytical Platforms

Since the beginning of the data warehousing movement in the early 1990s,
organizations have used general-purpose data management systems to implement
data warehouses and, occasionally, multidimensional databases (i.e., "cubes") to
support subject-specific data marts, especially for financial analytics. General-
purpose data management systems were designed for transaction processing (i.e.,
rapid, secure, synchronized updates against small data sets) and only later modified
to handle analytical processing (i.e., complex queries against large data sets.) In
contrast, analytical platforms focus entirely on analytical processing at the expense
of transaction processing.

The analytical platform movement. In 2002, Netezza (now owned by IBM)
introduced a specialized analytical appliance, a tightly integrated, hardware-software
database management system designed explicitly to run ad hoc queries against
large volumes of data at blindingly fast speeds. Netezza's success spawned a host
of competitors, and there are now more than two dozen players in the market. (see
Table 1).
Table 1. Types of Analytical Platforms

Today, the technology behind analytical platforms is diverse: appliances, columnar
databases, in-memory databases, massively parallel processing (MPP) databases,
file-based systems, nonrelational databases and analytical services. What they all
have in common, however, is that they provide significant improvements in price-
performance, availability, load times and manageability compared with general-
purpose relational database management systems. Every analytical platform
customer I've interviewed has cited order-of-magnitude performance gains that
most customers initially don't believe.

Moreover, many of these analytical platforms contain built-in analytical functions that
make life easier for business analysts. These functions range from fuzzy matching
algorithms and text analytics to data preparation and data mining functions. By
putting functions in the database, analysts no longer have to craft complex, custom
SQL or offboard data to analytical workstations, which limits the amount of data they
can analyze and model.
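As a rough illustration of pushing the work to the data, the JDBC sketch below invokes a hypothetical in-database fuzzy_match() function so the matching runs inside the analytical platform and only scored candidate pairs come back to the analyst. The function name, connection string, credentials, and table layout are all assumptions; the actual syntax varies by platform.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class InDatabaseMatching {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders for whatever analytical platform is in use.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:analytics://dw-host:5432/sandbox", "analyst", "secret")) {

            // The query pushes the matching work into the database via a hypothetical
            // fuzzy_match() function, so only candidate pairs come back to the
            // workstation -- not the full customer table.
            String sql =
                "SELECT c.customer_id, p.prospect_id, " +
                "       fuzzy_match(c.full_name, p.full_name) AS score " +
                "FROM customers c JOIN prospects p ON c.zip = p.zip " +
                "WHERE fuzzy_match(c.full_name, p.full_name) > ?";

            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setDouble(1, 0.85);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%s ~ %s (score %.2f)%n",
                                rs.getString("customer_id"),
                                rs.getString("prospect_id"),
                                rs.getDouble("score"));
                    }
                }
            }
        }
    }
}
```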

Companies use analytical platforms to support free-standing sandboxes (described
above) or as replacements for data warehouses running on MySQL and SQL Server,
and occasionally major OLTP databases from Oracle and IBM. They also improve
query performance for ad hoc analytical tools, especially those that connect directly
to databases to run queries (versus those that download data to a local cache.)

Analytical Tools

In 2010, vendors turned their attention to meeting the needs of power users after
ten years of enhancing reporting and dashboard solutions for casual users. As a
result, the number of analytical tools on the market has exploded.

Analytical tools come in all shapes and sizes. Analysts generally need one of every
type of tool. Just as you wouldn't hire a carpenter to build an addition to your house
with just one tool, you don't want to restrict an analyst to just one analytical tool. Like
a carpenter, an analyst needs a different tool for every type of job they do. For
instance, a typical analyst might need the following tools:

Excel to extract data from various sources, including local files, create reports, and
share them with others via a corporate portal or server (managed Excel).
BI Search tools to issue ad hoc queries against a BI tool's metadata.
Planning tools (including Excel) to create strategic and tactical plans, each
containing multiple scenarios.
Mashboards and ad hoc reporting tools to create ad hoc dashboards and reports
on behalf of departmental colleagues
Visual discovery tools to explore data in one or more sources of data and create
interactive dashboards on behalf of departmental colleagues
Multidimensional OLAP (MOLAP) tools to explore small and medium sets of data
dimensionally at the speed of thought and run complex dimensional calculations.
Relational OLAP tools to explore large sets of data dimensionally and run complex
calculations
Text analytics tools to parse text data and put it in a relational structure for analysis.
Data mining tools to create descriptive and predictive models.
Hadoop and MapReduce to process large volumes of unstructured and semi-
structured data in a parallel environment.

Figure 2. Types of Analytical Tools

Figure 2 plots these tools on a graph where the x axis represents calculation
complexity and the y axis represents data volumes. Ad hoc analytical tools for casual
users (or more realistically super users) are clustered in the bottom left corner of the
graph, while ad hoc tools for power users are clustered slightly above and to the
right. Planning and scenario modeling tools cluster further to the right, offering
slightly more calculation complexity against small volumes of data. High-powered
analytical tools, which generally rely on machine learning algorithms and specialized
analytical databases, cluster in the upper right quadrant.

Data

Business analysts function like one-man IT shops. They must access, integrate,
clean and analyze data, and then present it to other users. Figure 3 depicts the
typical workflow of a business analyst. If an organization doesn't have a mature data
warehouse that contains cross-functional data at a granular level, they often spend
an inordinate amount of time sourcing, cleaning, and integrating data. (Steps 1 and 2
in the analyst workflow.) They then create a multiplicity of analytical silos (step 5)
when they publish data, much to the chagrin of the IT department.

Figure 3. Analyst Workflow

In the absence of a data warehouse that contains all the data they need, business
analysts must function as one-man IT shops where they spend an inordinate amount
of time iterating between collecting, integrating, and analyzing data. They run into
trouble when they distribute their hand-crafted data sets broadly.

Data Warehouse. The most important way that organizations can improve the
productivity and effectiveness of business analysts is to maintain a robust data
warehousing environment that contains most of the data that analysts need to
perform their work. This can take many years. In a fast-moving market where the
company adds new products and features continuously, the data warehouse may
never catch up. But, nonetheless, it's important for organizations to continuously add
new subject areas to the data warehouse, otherwise business analysts have to
spend hours or days gathering and integrating this data themselves.

Atomic Data. The data warehouse also needs to house atomic data, or data at the
lowest level of transactional detail, not summary data. Analysts generally want the
raw data because they can repurpose it in many different ways depending on the
nature of the business questions they're addressing. This is the reason that highly
skilled analysts like to access data directly from source systems or a data warehouse
staging area. At the same time, less skilled analysts appreciate the heavy lifting done
by the IT group to clean and integrate disparate data sets using common metrics,
dimensions, and attributes. This base level of data standardization expedites their
work.

Once a BI team integrates a sufficient number of subject areas in a data warehouse
at an atomic level of data, business analysts can have a field day. Instead of
downloading data to an analytical workstation, which limits the amount of data they
can analyze and process, they can now run calculations and models against the
entire data warehouse using analytical functions built into the database or that
they've created using database development toolkits. This improves the accuracy of
their analyses and models and saves them considerable time.

Summary

The technical side of analytics is daunting. There are many moving parts that all
have to work synergistically together. However, the most important part of the
technical equation is the data. The old adage holds true: "garbage in, garbage out."
Analysts can't deliver accurate insights if they don't have access to good quality
data. And it's a waste of their time to spend days trying to prepare the data for
analysis. A good analytics program is built on a solid data warehousing foundation
that embeds analytical sandboxes tailored to the requirements of individual analysts.

POSTED NOVEMBER 15, 2011 7:44 AM



Part III - The Human Side of Analytics

Advanced analytics promises to unlock hidden potential in organizational data. If
that's the case, why have so few organizations embraced advanced analytics in a
serious way? Most organizations have dabbled with advanced analytics, but outside
of credit card companies, online retailers, and government intelligence agencies, few
have invested sufficient resources to turn analytics into a core competency.

Advanced analytics refers to the use of machine learning algorithms to unearth
patterns and relationships in large volumes of complex data. It's best applied to
overcome various resource constraints (e.g., time, money, labor) where the output
justifies the investment of time and money. (See "What is Analytics and Why Should
You Care?" and "Advanced Analytics: Where Do You Start?")

Once an organization decides to invest in advanced analytics, it faces many
challenges. To succeed with advanced analytics, organizations must have the right
culture, people, organization, architecture, and data. (See figure 1.) This is a tall
task. This article examines the "soft stuff" required to implement analytics--the
culture, people, and organization--the first three dimensions of the analytical
framework in figure 1. A subsequent article examines the "hard stuff"--the
architecture, tools, and data.
Figure 1. Framework for Implementing Advanced Analytics

The Right Culture

Culture refers to the rules--both written and unwritten--for how things get done in an
organization. These rules emanate from two places: 1) the words and actions of top
executives and 2) organizational inertia and behavioral norms of middle
management and their subordinates (i.e., "the way we've always done it.") Analytics,
like any new information technology, requires executives and middle managers to
make conscious choices about how work gets done.

Executives. For advanced analytics to succeed, top executives must first establish a
fact-based decision making culture and then adhere to it themselves. Executives
must consciously change the way they make decisions. Rather than rely on gut feel
alone, executives must make decisions based on facts or intuition validated by data.
They must designate authorized data sources for decision making and establish
common metrics for measuring performance. They must also hold individuals
accountable for outcomes at all levels of the organization.

Executives also need to evangelize the value and importance of fact-based decision
making and the need for a performance-driven culture. They need to recruit like-
minded executives and continuously reinforce the message that the organization
"runs on data." Most importantly, they not only must "talk the talk," they must "walk
the walk." They need to hold themselves accountable for performance outcomes and
use certifiable information sources, not resort to their trusted analyst to deliver the
data view they desire. Executives who don't follow their own rules send a cultural
signal that this analytics fad will pass and so it's "business as usual."

Managers and Organizational Inertia. Mid-level managers often pose the biggest
obstacles to implementing new information technologies because their authority and
influence stems from their ability to control the flow of information, both up and down
organizational ladders. Mid-level managers have to buy into new ways of capturing
and using information for the program to succeed. If they don't, they, too, will send
the wrong signals to lower level workers. To overcome organizational inertia,
executives need to establish new incentives for mid-level managers and hold them
accountable for performance metrics aligned with strategic goals around the decision
making and the use of information.

The Right People

It's impossible to do advanced analytics without analysts. That's obvious. But hiring
the right analysts and creating an environment for them to thrive is not easy.
Analysts are a rare breed. They are critical thinkers who need to understand a
business process inside and out and the data that supports it. They also must be
computer-literate and know how to use various data access, analysis, and
presentation tools to do their jobs. Compared to other employees, they are
generally more passionate about what they do, more committed to the success of
the organization, more curious about how things work, and more eager to tackle new
challenges.

But not all analysts do the same kind of work, and it's important to know the
differences. There are four major types of analysts:

Super Users. These are tech-savvy business users who gravitate to reporting and
analysis tools deployed by the business intelligence (BI) team. These analysts
quickly become the "go to" people in each department to get an ad hoc report or
dashboard, if you don't want to wait for the BI team. While super users don't normally
do advanced analytics, they play an important role because they offload ad hoc
reporting requirements from more skilled analysts.

Business Analysts. These are Excel jockeys whom executives and managers call on
to create and evaluate plans, crunch numbers, and generally answer any question an
executive or manager might have that can't be addressed by a standard report or
dashboard. With training, they can also create analytical models.

Analytical Modelers. These analysts have formal training in statistics and a data
mining workbench, such as those from IBM (i.e., SPSS) or SAS. They build
descriptive and predictive models that are the heart and soul of advanced analytics.

Data Scientists. These analysts specialize in analyzing unstructured data, such as
Web traffic and social media. They write Java and other programs to run against
Hadoop and NoSQL databases and know how to write efficient MapReduce jobs
that run in "big data" environments.
Where You Find Them. Most organizations struggle to find skilled analysts. Many
super users and business analysts are self-taught Excel jockeys, essentially tech-
savvy business people who aren't afraid to learn new software tools to do their jobs.
Many business school graduates fill this role, often as a stepping stone to
management positions. Conversely, a few business-savvy technologists can grow
into this role, including data analysts and report developers who have a proclivity
toward business and working with business people.

Analytical modelers and data scientists require more training and skills. These
analysts generally have a background in statistics or number crunching. Statisticians
with business knowledge or social scientists with computer skills tend to excel in
these roles. Given advances in data mining workbenches, it's not critical that
analytical modelers know how to write SQL or code in C, as in the past. However,
data scientists aren't so lucky. Since Hadoop is an early stage technology, data
scientists need to know the basics of parallel processing and how to write Java and
other programs in MapReduce. As such, they are in high demand right now.

The Right Organization

Business analysts play a key role in any advanced analytics initiative. Given the
skills required to build predictive models, analysts are not cheap to hire or easy to
retain. Thus, building the right analytical organization is key to attracting and
retaining skilled analysts.

Today, most analysts are hired by department heads (e.g., finance, marketing, sales,
or operations) and labor away in isolation at the departmental level. Unless given
enough new challenges and opportunities for advancement, analysts are easy
targets for recruiters.

Analytics Center of Excellence. The best way to attract and retain analysts is to
create an Analytics Center of Excellence. This is a corporate group that oversees
and manages all business analysts in an organization. The Center of Excellence
provides a sense of community among analysts and enables them to regularly
exchange ideas and knowledge. The Center also provides a career path for analysts
so they are less tempted to look elsewhere to advance their careers. Finally, the
Center pairs new analysts with veterans who can give them the mentoring and
training they need to excel in their new position.

The key with an Analytics Center of Excellence is to balance central management
with process expertise. Nearly all analysts should be embedded in departments and
work side by side with business people on a daily basis. This enables analysts to
learn business processes and data at a granular level while immersing the business
in analytical techniques and approaches. At the same time, the analyst needs to
work closely with other analysts in the organization to reinforce the notion that they
are part of a larger analytical community.

The best way to accommodate these twin needs is by creating a matrixed analytical
team. Analysts should report directly to department heads and indirectly to a
corporate director of analytics or vice versa. In either case, the analyst should
physically reside in his assigned department most or all days of the week, while
participating in daily "stand up" meetings with other analysts so they can share ideas
and issues as well as regular off-site meetings to build camaraderie and develop
plans. The corporate director of analytics needs to work closely with department
heads to balance local and enterprise analytical requirements.

Summary

Advanced analytics is a technical discipline. Yet, some of the keys to its success
involve non-technical facets, such as culture, people, and organization. For an
analytics initiative to thrive in an organization, executives must create a fact-based
decision making culture, hire the right people, and create an analytics center of
excellence that attracts, trains, and retains skilled analysts.

POSTED NOVEMBER 7, 2011 9:45 AM



BI Ecosystem of the Future

Business intelligence is changing. I've argued in several reports that there is no longer just one
intelligence--i.e., business intelligence--but multiple intelligences, each supporting a unique
architecture, design framework, end-users, and tools. But all these intelligences are still designed to
help business users leverage information to make smarter decisions and support the creation of either
reporting or analysis applications.

The four intelligences are:

1. Business Intelligence. Addresses the needs of "casual users," delivering reports,
dashboards, and scorecards tailored to each user's role, populated with metrics aligned with
strategic objectives and powered by a classic data warehousing architecture.

2. Analytics Intelligence. Addresses the needs of "power users," providing ad hoc access to
any data inside or outside the enterprise to answer business questions that can't be identified
in advance using spreadsheets, desktop databases, OLAP tools, data mining tools and visual
analysis tools.

3. Continuous Intelligence. Collects, monitors, and analyzes large volumes of fast-changing
data to support operational processes. It ranges from near real-time delivery of information
(i.e., hours to minutes) in a data warehouse to complex event processing and streaming
systems that trigger alerts.

4. Content Intelligence. Gives business users the ability to analyze information contained in
documents, Web pages, email messages, social media sites and other unstructured content
using NoSQL and semantic technology.

You may wonder how all these intelligences fit together architecturally. They do, but it's not the clean,
neat architecture that you may have seen in data warehousing books of yore. Figure 1 below depicts
a generalized architecture that supports the four intelligences.

Figure 1. BI Ecosystem of the Future

The top half of the diagram represents the classic top-down, data warehousing architecture that
primarily delivers interactive reports and dashboards to casual users (although the streaming/complex
event processing (CEP) engine is new.) The bottom half of the diagram adds new architectural
elements and data sources that better accommodate the needs of business analysts and data
scientists and make them full-fledged members of the corporate data environment.

A recent report I wrote describes the components of this architecture in some detail and provides
market research on the adoption of analytic platforms (e.g. DW appliances and columnar and MPP
databases), among other things. The report is titled: "Big Data Analytics: Profiling the Use of
Analytical Platforms in User Organizations." You can download it for free at Bitpipe by clicking on the
hyperlink in the previous sentence.

Since "Multiple Intelligences" framework and BI ecosystem that supports it represent what I think the
future holds for BI, I'd love to get your feedback.

The Next Wave in Big Data Analytics: Exploiting Multi-core Chips and SMP
Machines

As companies grapple with the gargantuan task of processing and analyzing "big data," certain technologies
have captured the industry limelight, namely massively parallel processing (MPP) databases, such as those from
Aster Data and Greenplum; data warehousing appliances, such as those from Teradata, Netezza, and Oracle;
and, most recently, Hadoop, an open source distributed file system that uses the MapReduce programming
model to process key-value data in parallel across large numbers of commodity servers.

SMP Machines. Missing in action from this list is the venerable symmetric multiprocessing (SMP) machine that
parallelizes operations across multiple CPUs (or cores). The industry today seems to favor "scale out" parallel
processing approaches (where processes run across commodity servers) rather than "scale up" approaches
(where processes run on a single server.) However, with the advent of multi-core servers that today can pack
upwards of 48 cores in a single CPU, the traditional SMP approach is worth a second look for processing big
data analytics jobs.

The benefits of applying parallel processing within a single server versus multiple servers are obvious: reduced
processing complexity and a smaller server footprint. Why buy 40 servers when one will do? MPP systems
require more boxes, which require more space, cooling, and electricity. Also, distributing data across multiple
nodes chews up valuable processing time, and overcoming node failures--which are more common when you
string together dozens, hundreds, or even thousands of servers into a single, coordinated system--adds
overhead, reducing performance.

Multi-Core CPUs. Moreover, since chipmakers maxed out the processing frequency of individual CPUs in 2004,
the only way they can deliver improved performance is by packing more cores into a single chip. Chipmakers
started with two-core chips, then quad-cores, and now eight- and 16-core chips are becoming commonplace.

Unfortunately, few software programs that can benefit from parallelizing operations have been redesigned to
exploit the tremendous amount of power and memory available within multi-core servers. Big data analytics
applications are especially good candidates for thread-level parallel processing. As developers recognize the
untold power lurking within their commodity servers, I suspect next year that SMP processing will gain an
equivalent share of attention among big data analytic proselytizers.
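To make the scale-up idea concrete, here is a minimal sketch of thread-level parallelism in plain Java: a parallel-stream aggregation that fans the work out across the cores of a single SMP machine. It is a generic illustration rather than DataRush code, and the file name and column layout are assumptions.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MultiCoreAggregation {
    public static void main(String[] args) throws Exception {
        // Stream a large delimited file and aggregate revenue by product, letting the
        // common fork/join pool spread the work across all available cores on a single
        // SMP box. The file name and field positions are illustrative assumptions.
        try (Stream<String> lines = Files.lines(Paths.get("sales.csv"))) {
            Map<String, Double> revenueByProduct = lines
                    .skip(1)                        // skip the header row
                    .parallel()                     // enable thread-level parallelism
                    .map(line -> line.split(","))
                    .collect(Collectors.groupingBy(
                            fields -> fields[0],    // product id in the first column
                            Collectors.summingDouble(
                                    fields -> Double.parseDouble(fields[2])))); // revenue column

            revenueByProduct.forEach((product, revenue) ->
                    System.out.printf("%s: %.2f%n", product, revenue));
        }
    }
}
```

As more cores become available, the same code simply uses them; no recompilation or re-partitioning of data across servers is required, which is the appeal of the scale-up approach described above.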

Pervasive DataRush

One company that is on the forefront of exploiting multi-core chips for analytics is Pervasive Software, a $50
million software company that is best known for its Pervasive Integration ETL software (which it acquired from
Data Junction) and Pervasive PSQL, its embedded database (a.k.a. Btrieve.)

In 2009, Pervasive released a new product, called Pervasive DataRush, a parallel dataflow platform designed to
accelerate performance for data preparation and analytics tasks. It fully leverages the parallel processing
capabilities of multi-core processors and SMP machines, making it unnecessary to implement clusters (or MPP
grids) to achieve suitable performance when processing and analyzing moderate to heavy volumes of data.

Sweet Spot. As a parallel data flow engine, Pervasive DataRush is often used today to power batch processing
jobs, and is particularly well suited to running data preparation tasks (e.g. sorting, deduplicating, aggregating,
cleansing, joining, loading, validating) and machine learning programs, such as fuzzy matching algorithms.

Today, DataRush will outperform Hadoop on complex processing jobs that address data volumes ranging from
500GB to tens of terabytes. It is not yet geared to handling hundreds of terabytes to petabytes of data, which
is the territory for MPP systems and Hadoop. However, as chipmakers continue to add more cores to chips and
as Pervasive releases DataRush 5.0 later this year, which supports small clusters, DataRush's high-end
scalability will continue to increase.

Architecture. DataRush is not a database; it's a development environment and execution engine that runs in a
Java Virtual Machine. Its Eclipse-based development environment provides a library of parallel operators for
developers to create parallel dataflow programs. Although developers need to understand the basics of parallel
operations--such as when it makes sense to partition data and/or processes based on the nature of their
application-- DataRush handles all the underlying details of managing threads and processes across one or more
cores to maximize utilization and performance. As you add cores, DataRush automatically readjusts the
underlying parallelism without forcing the developer to recompile the application.

Versus Hadoop. To run DataRush, you feed the execution engine formatted flat files or database records and it
executes the various steps in the dataflow and spits out a data set. As such, it's more flexible than Hadoop, which
requires data to be structured as key-value pairs and partitioned across servers, and MapReduce, which forces
developers to use one type of programming model for executing programs. DataRush also doesn't have the
overhead of Hadoop, which requires each data element to be duplicated in multiple nodes for failover purposes
and requires lots of processing to support data movement and exchange across nodes. But like Hadoop, it's
focused on running predefined programs in batch jobs, not ad hoc queries.

Competitors. Perhaps the closest competitors to Pervasive DataRush are Ab Initio, a parallelizable ETL tool,
and Syncsort, a high-speed sorting engine. But these tools were developed before the advent of multi-core
processing and don't exploit it to the same degree as DataRush. Plus, DataRush is not focused just on back-end
processing, but can handle front-end analytic processing as well. Its data flow development environment and
engine are generic. DataRush actually makes a good complement to MPP databases, which often suffer from a
data loading bottleneck. When used as a transformation and loading engine, DataRush can achieve 2TB/hour
throughput, according to company officials.

Despite all the current hype about MPP and scale-out architectures, it could be that scale-up architectures that
fully exploit multi-core chips and SMP machines will win the race for mainstream analytics computing. Although
you can't apply DataRush to existing analytic applications (you have to rewrite them), it will make a lot of sense to
employ it for most new big data analytics applications.

POSTED JANUARY 4, 2011 11:17 AM



Fathoming NoSQL: Membase Tackles Data Intensive Web Applications

I recently spoke with James Phillips, co-founder and senior vice president of products, at Membase, an emerging
NoSQL provider that powers many highly visible Web applications, such as Zynga's Farmville and AOL's ad
targeting applications. James helped clarify for me the role of NoSQL in today's big data architectures.

Membase, like many of its NoSQL brethren, is an open source, key-value database. Membase was designed to
run on clusters of commodity servers so it could "solve transaction problems at scale," says Phillips. Because of
its transactional focus, Membase is not technology that I would normally talk about in the business intelligence
(BI) sphere.

Same Challenges, Similar Solutions

However, today the transaction community is grappling with many of the same technical challenges as the BI
community--namely, accessing and crunching large volumes of data in a fast, affordable way. Not coincidentally,
the transactional community is coming up with many of the same solutions--namely, distributing data and
processing across multiple nodes of commodity servers linked via high-speed interconnects. In other words, low-
cost parallel processing.

Key-Value Pairs. But the NoSQL community differs in one major way from a majority of analytics vendors
chasing large-scale parallel processing architectures: it relinquishes the relational framework in favor of key-value
pair data structures. For data-intensive, Web-based applications that must dish up data to millions of concurrent
online users in the blink of an eye, key-value pairs are a fast, flexible, and inexpensive approach. For example,
you just pair a cookie with its ID, slam it into a file with millions of other key-value pairs, and distribute the files
across multiple nodes in a cluster. A read works in reverse: the database finds the node with the right key-value
pair to fulfill an application request and sends it along.
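That read/write path can be sketched in a few lines of Java. The toy store below is purely illustrative--a stand-in for a clustered key-value database, not Membase's implementation or client API--and the class, method, and key names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// A toy key-value "cluster": each node is just an in-memory map, and a key is routed
// to a node by hashing it. Real systems add replication, persistence, and rebalancing,
// but the read/write path follows the same shape.
public class ToyKeyValueStore {

    private final List<ConcurrentHashMap<String, byte[]>> nodes = new ArrayList<>();

    public ToyKeyValueStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new ConcurrentHashMap<>());
        }
    }

    // Pick the node that owns this key.
    private ConcurrentHashMap<String, byte[]> nodeFor(String key) {
        int idx = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(idx);
    }

    // Write: pair the cookie ID with its value and drop it on the owning node.
    public void put(String cookieId, String profileJson) {
        nodeFor(cookieId).put(cookieId, profileJson.getBytes(StandardCharsets.UTF_8));
    }

    // Read: find the owning node and hand the value back to the application.
    public String get(String cookieId) {
        byte[] value = nodeFor(cookieId).get(cookieId);
        return value == null ? null : new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyKeyValueStore store = new ToyKeyValueStore(4);
        store.put("cookie-8675309", "{\"segment\":\"gamer\",\"lastSeen\":\"2011-01-04\"}");
        System.out.println(store.get("cookie-8675309"));
    }
}
```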

The beauty of NoSQL, according to Phillips, is that you don't have to put data into a table structure or use SQL
to manipulate it. "With NoSQL, you put the data in first and then figure out how to manipulate it," Phillips says.
"You can continue to change the kinds of data you store without having to change schemas or rebuild indexes
and aggregates." Thus, the NoSQL mantra is "store first, design later." This makes NoSQL systems highly
flexible but programmatically intensive since you have to build programs to access the data. But since most
NoSQL advocates are application developers (i.e. programmers), this model aligns with their strengths.

In contrast, most analytics-oriented database vendors and SQL-oriented BI professionals haven't given up on the
relational model, although they are pushing it to new heights to ensure adequate scalability and performance
when processing large volumes of data. Relational database vendors are embracing techniques, such as
columnar storage, storage-level intelligence, built-in analytics, hardware-software appliances, and, of course,
parallel processing across clusters of commodity servers. BI professionals are purchasing these purpose-built
analytical platforms to address performance and availability problems first and foremost and data scalability
issues secondarily. And that's where Hadoop comes in.

Hadoop. Hadoop is an open source analytics architecture for processing massively large volumes of structured
and unstructured data in a cost-effective manner. Like its NoSQL brethren, Hadoop abandons the relational
model in favor of a file-based, programmatic approach based on Java. And like Membase, Hadoop uses a
scale-out architecture that runs on commodity servers and requires no predefined schema or query language.
Many Internet companies today use Hadoop to ingest and pre-process large volumes of clickstream data which
are then fed to a data warehouse for reporting and analysis. (However, many companies are also starting to run
reports and queries directly against Hadoop.)

Membase has a strong partnership with Cloudera, one of the leading distributors of open source Hadoop
software. Membase wants to create bidirectional interfaces with Hadoop to easily move data between the two
systems.

Membase Technology

Membase's secret sauce--the thing that differentiates it from its NoSQL competitors, such as Cassandra,
MongoDB, CouchDB, and Redis--is that it incorporates Memcache, an open source, caching technology.
Memcache is used by many companies to provide reliable, ultra-fast performance for data-intensive Web
applications that dish out data to millions of concurrent users. Today, many customers manually integrate
Memcache with a relational database that stores cached data on disk to store transactions or activity for future
use.

Membase, on the other hand, does that integration upfront. It ties Memcache to a MySQL database which stores
transactions to disk in a secure, reliable, and highly performant way. Membase then keeps the cache populated
with working data that it pulls rapidly from disk in response to application requests. Because Membase distributes
data across a cluster of commodity servers, it offers blazingly fast and reliable read/write performance required
by the largest and most demanding Web applications.
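Conceptually, what Membase packages up front resembles the cache-aside pattern that teams otherwise wire together by hand: check an in-memory cache, fall back to the disk-backed relational store on a miss, and repopulate the cache so the next request is fast. The sketch below shows that generic pattern in Java; it is not Membase code, and the JDBC URL, credentials, and table layout are placeholder assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside read path: serve from the in-memory cache when possible and fall back
// to the relational store on a miss, repopulating the cache along the way. A generic
// sketch of the pattern described above, not Membase's internals.
public class CacheAsideReader {

    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final String jdbcUrl = "jdbc:mysql://db-host:3306/sessions"; // placeholder

    public String getProfile(String cookieId) throws Exception {
        // 1. Try the cache first -- the common, fast path.
        String cached = cache.get(cookieId);
        if (cached != null) {
            return cached;
        }
        // 2. On a miss, pull the record from disk-backed storage ...
        try (Connection conn = DriverManager.getConnection(jdbcUrl, "app", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT profile_json FROM profiles WHERE cookie_id = ?")) {
            stmt.setString(1, cookieId);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    String profile = rs.getString(1);
                    // 3. ... and keep the working set hot in memory for the next request.
                    cache.put(cookieId, profile);
                    return profile;
                }
            }
        }
        return null;
    }
}
```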

Document Store. Membase will soon transform itself from a pure key-value database to a document store (a la
MongoDB.) This will give developers the ability to write functions that manipulate data inside data objects stored
in predefined formats (e.g. JSON, Avro, or Protocol Buffers.) Today, Membase can't "look inside" data objects to
query, insert, or append information that the objects contain; it largely just dumps object values into an
application.

Phillips said the purpose of the new document architecture is to support predefined queries within transactional
applications. He made it clear that the goal isn't to support ad hoc queries or compete with analytics vendors:
"Our customers aren't asking for ad hoc queries or analytics; they just want super-fast performance for pre-
defined application queries."

Pricing. Customers can download a free community edition of Membase or purchase an
annual subscription that provides support, packaging, and quality assurance testing. Pricing starts at $999 per
node.

The optimal BI organization adopts a federated structure, which blends a mix of
centralized and decentralized features. I discussed the basic features of a federated BI
organization in my last blog entry and the journey most organizations take to get there. (See
"Organizing the BICC Part I: Move to the Middle"). This blog discusses the division of
responsibilities between a corporate BI team and embedded BI teams in a federated BI
organization.
Organizational federation is not a new or unique concept. For instance, most countries have
a federated government. In the United States, federal, state, and municipal governments
share civic responsibility. The federal government funds and manages issues of national
concern, such as defense, immigration, and interstate commerce, while state and
municipal governments fund and oversee local functions, such as policing, transportation,
and trash disposal. However, there is a fuzzy boundary between jurisdictions, requiring
federal, state, and local officials to collaborate to meet the needs of citizens.
The same is true in the world of BI. Corporate and departmental BI teams focus on different
tasks but ultimately share responsibility for the delivery of BI solutions and must work in
concert to meet business needs. However, there is a reasonably clear division of
responsibility in two areas: data management and report development.

Data Management
The corporate BI team is chiefly responsible for managing data that is shared among two or
more business units. In other words, the corporate BI team builds and maintains an
enterprise data warehouse (EDW) and dependent data marts geared to individual
departments. The corporate BI team also facilitates data governance processes to create
standard definitions for commonly used metrics, dimensions, and hierarchies. It publishes
these definitions into a data dictionary shared by all business units and embeds them into a
business model within the enterprise BI tool.
Report Management

On the reporting side, departments assume the lion's share of responsibility for creating
departmental dashboards and scorecards. The BI Reporting Framework (see figure 2)
depicts this division of responsibility. The left half of the circle shows the reporting
responsibilities of the corporate BI team and the right half shows the reporting responsibilities of
business units. These responsibilities are divided by the other axis, which represents "top
down BI" and "bottom up" BI.
Figure 2. BI Reporting Framework

Top Down and Bottom Up. In top-down BI, the corporate BI team delivers standard reports
and dashboards to casual users. The team gathers requirements, models and sources the
data, and then loads it into the data warehouse. Developers then build reports and
dashboards that dynamically query data warehouse data. In contrast, in bottom-up BI, power
users query the data warehouse directly, in an ad hoc fashion. They explore, analyze, and
integrate data from various systems and then create reports based on their findings, which
they often publish to departmental colleagues and executives. Basically, top-down BI
delivers standard reports to casual users, and bottom-up BI enables power users to
explore and analyze data and create ad hoc reports.
Enterprise BI. Thus, the upper left quadrant in figure 2 represents the intersection of
enterprise BI and top-down BI. This is where corporate BI developers (i.e. enterprise) create
production reports and dashboards (i.e. top-down) that would be difficult for any single
division to create on their own. In the bottom left quadrant, corporate statisticians or data
scientists (i.e. enterprise) who are aligned with individual divisions but not collocated, explore
and query data in an ad hoc fashion to create predictive learning models.
Divisional BI. In the bottom-right quadrant, business unit analysts in each division use self-
service BI tools to analyze data and create ad hoc reports to display their insights. If
executives and managers want to continue seeing these reports on a regular basis, the
business unit analysts turn them over to the collocated BI professionals who convert them
into production reports (top-right quadrant.) This handoff between analysts and embedded BI
staff is critically important, but rarely happens in most organizations. Too often, business unit
analysts publish reports and then end up perpetually maintaining them. This is a job they
aren't trained or paid to do and keeps them from spending time on more value-added tasks,
like analyzing the business.
To succeed, there needs to be a bidirectional flow of information between each of the
sectors as depicted in figure 3.
Figure 3. Handoffs between BI Sectors

Going around the diagram, we can describe the handoffs that occur between each group in
a BI Center of Excellence:

Enterprise BI Team-->Departmental BI Team (Top left to top right): The enterprise
BI team has data and reporting professionals who specialize in specific tools.
Besides creating complex, cross-functional reports and dashboards, they provide first
line-of-support to embedded BI professionals who are more business oriented than
tools oriented.
Departmental BI Team-->Enterprise BI Team (top right to top left): Conversely, the
embedded BI professionals help gather requirements for complex cross-functional
applications and communicate them to enterprise BI reporting specialists.

Departmental BI Team-->Departmental Analysts (top right to bottom right):
Embedded BI professionals provide first line of support to business analysts who
need to learn how to use self-service BI tools to explore data and create ad hoc
reports.
Departmental Analysts-->Departmental BI Team (bottom right to top right):
Conversely, analysts hand over their ad hoc reports to
embedded BI professionals to convert them into production reports.

Departmental Analysts-->Data Scientists (Bottom right to bottom left):
Departmental analysts submit requests to data scientists for more complex analyses
than they can perform and work with them to gather requirements and data sets.
Data Scientists-->Departmental Analysts (Bottom left to bottom right): Conversely,
data scientists aligned with a department provide insights to business analysts based
on data models they've developed.
Data Scientists-->Enterprise BI Team (Bottom left to top left): Data scientists
deliver model scores to the enterprise BI team, which incorporates them into complex
reports. (Ditto with departmental reports.)
Enterprise BI Team-->Data Scientists (Top left to bottom left): The enterprise BI
team delivers data and requirements to the data scientists who use them to create
analytical models.

Bridging the disparate worlds of top-down and bottom-up BI and enterprise and divisional BI
is not easy. It requires a lot of communication and collaboration. The structure that glues
together these BI sectors requires both matrixed reporting relationships and a multi-
layered governance body. This is the topic of my next blog.
POSTED OCTOBER 17, 2013 10:54 AM

Organizing the BICC Part I: Move to the Middle


The origin of every business intelligence (BI) team is quite simple. A company starts doing
business and then recognizes it needs to track how it's doing. So business unit heads hire
analysts who create a bevy of reports using whatever low-cost tools they can find, usually
Excel and Access. Although managers get answers to their questions, it comes at a high
cost: analysts spend more time integrating data than analyzing it and create a bevy of
redundant data silos that are costly to maintain and prevent executives from getting a single,
integrated view of the business. When the CEO can't get an answer to a simple question,
such as "How many customers do we have?" or the CFO sees a red balance sheet caused
by a proliferation of redundant people and systems, they take action.
Usually, executives decide to move reporting, analysis and data management out of the
business units and into a shared corporate service. By centralizing the BI function,
executives pull BI-related professionals out of the business units and put them onto an
enterprise BI team that is charged with building an enterprise data warehouse and creating
all reports and dashboards for business units. The goal of the new group is to align the
business with uniform data and deliver cost savings through economies of scale. This
reorganization swings the pendulum from a decentralized approach to managing BI to a
centralized one.
Problems
All goes well until business units start to grumble that the new corporate BI team isn't
meeting their information needs in a timely manner. Whereas in the past, a business
person could initiate a project by talking with an analyst in the next cubicle, now she needs
to submit a proposal to the corporate BI team, participate in a scoping process, and then
wait for the BI team to decide whether or not to do the project. And this is all before a single
line of code gets written. Once a project finally starts, the BI team takes longer to build the
solution because it's now a step removed from the business unit and doesn't know its
processes, data or people well enough to work efficiently or effectively. Given these
circumstances, some business unit heads decide they can't wait for corporate BI to meet
their needs and hire analysts to build reports for their group, effectively replacing the ones
that the corporate BI team "stole" from them.
Finding the Middle
By swinging from one end of the organizational spectrum to the other--from decentralized to
centralized--the BI program becomes a bottleneck for getting things done. At this point,
some executives throw up their hands and slash the BI budget or outsource it. Yet
enlightened executives seek counsel to find a more nuanced way to structure the BI program
so that it marries the best of both worlds. In essence, they find a middle ground that I call
a federated BI organization, which delivers both the nimbleness and agility of a decentralized
approach and the standards and consistency of a centralized approach. (See figure
below.)
Figure 1. Evolution of BI Organizations

A federated BI team maintains a unified BI architecture that delivers consistent, accurate,
timely, and relevant data that makes the CEO happy. And it delivers economies of scale that
makes the CFO happy. Moreover, because a federated BI organization embeds BI
professionals in the divisions, it delivers BI solutions quickly, which makes business unit
heads happy. Finally, through cross-training and support provided by collocated BI
professionals, business analysts finally become proficient with self-service BI tools, which
makes them happy.
So there is a lot to like with a federated BI organization, and very little to dislike. In essence,
this approach creates a common charter that impels the business and IT to collaborate at a
deep and more productive level. The only real challenge is managing the web of matrixed
relationships and co-managed teams that maintain the proper balance between business
and IT. We'll discuss these relationships in a future blog post.

POSTED OCTOBER 17, 2013 10:39 AM



Classifying Business Users


In my last blog, I made the case for both classifying and certifying the analytical capabilities
of power users. In short, classifying power users helps business intelligence (BI) teams
better understand and serve the information needs of power users and shows executives
where and how to beef up their organization's analytical talent. And certifying power users
motivates them to upgrade their analytical capabilities to achieve greater status, pay, and
responsibility. (See "Classifying and Certifying BI Users").
As I mentioned last time, every BI team should classify their users, either using my scheme
below or creating one of their own. This takes time but pays big dividends. Tailoring BI
functionality and report design to individual information requirements increases the likelihood
that users will adopt BI tools. Some BI vendors enforce this discipline by offering named user
licenses based on functionality, but this can be restrictive since most users play multiple
roles during the course of a day. It's better to tailor BI functionality in the administrative
console.
More importantly, business users should be cognizant of their classification when using BI
tools and reports. They should see their status (e.g. "Class I: Viewer") in the heading of each
report they use and be able to view its description by hovering their mouse over the text.
Ideally, users should be able to click on a button that changes their report interface from
Class I to Class II or Class III and back again. BI tools that expose and hide functionality on
demand drive higher levels of BI adoption. If anyone has done anything remotely similar to
this, let me know!
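To make this concrete, here is a minimal sketch in Python of how an administrative console might expose or hide BI features by user class and let a user switch classes on demand. The class names and feature flags are my own illustrative assumptions, not any vendor's actual API.

# Hypothetical sketch: expose or hide BI features based on a user's class.
# The class names and feature lists are assumptions for illustration only.

FEATURES_BY_CLASS = {
    "Class I: Viewer": {"view_report", "export_pdf"},
    "Class II: Navigator": {"view_report", "export_pdf", "drill_down",
                            "pivot", "sort_rank", "custom_groups"},
    "Class III: Explorer": {"view_report", "export_pdf", "drill_down",
                            "pivot", "sort_rank", "custom_groups",
                            "ad_hoc_query", "publish_dashboard"},
}

class ReportSession:
    """Tracks the classification shown in the report heading."""

    def __init__(self, user, user_class):
        self.user = user
        self.user_class = user_class

    def heading(self):
        # The classification appears in the heading of every report.
        return f"{self.user} ({self.user_class})"

    def is_enabled(self, feature):
        return feature in FEATURES_BY_CLASS[self.user_class]

    def switch_class(self, new_class):
        # One click switches the interface from Class I to II or III and back.
        if new_class in FEATURES_BY_CLASS:
            self.user_class = new_class

session = ReportSession("J. Smith", "Class I: Viewer")
print(session.heading())                  # "J. Smith (Class I: Viewer)"
print(session.is_enabled("drill_down"))   # False
session.switch_class("Class II: Navigator")
print(session.is_enabled("drill_down"))   # True

The point of the sketch is simply that functionality is a property of the session, not the license, so one person can move between classes during the day.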
Casual User Classifications
Casual users are business people who use information to do their jobs. They mostly
consume information artifacts that others create (i.e. power users.) Figure 1 presents three
levels of casual users: Class I Viewer, Class II Navigator, and Class III Explorer.
Figure 1. Casual User Classifications

Class I: Viewer. A Viewer is an executive, salesperson, or front-line worker who views
information displayed on a screen and rarely interacts with it. Executives and salespeople
may not have the inclination to interact with the display, while front-line workers simply don't
have the time. A Viewer is often reared on spreadsheets and prefers a tabular view of data
with all information on a single page. If a Viewer has a question about the data, he'll pick up
the phone and call an analyst (especially old-school executives). And he prefers receiving
reports via email, although many Viewers are entering the digital age with tablet computers
and are receptive to viewing information on these devices.
Class II: Navigator. A Navigator is typically a manager or knowledge worker who needs to
monitor and manage the performance of a team and present the results to executives. Thus,
a Navigator is more inclined to drill into the data to view more detail about an issue. She may
also pivot dimensions or sort, rank, or add columns in a table or create custom groups (if it's
a one-click function) and perform "what-if analyses" (ditto.) She may still call an analyst if
she gets hung up after four or five clicks or can't find what she's looking for. A Navigator
typically wants to interact with charts and view tabular data when examining detail. She
prefers a browser-based interface and is increasingly using tablets as the interactivity of
mobile BI displays improves.
Class III: Explorer. An Explorer is the archetypal BI user: a business user who uses a BI
tool not only to view and interact with predefined reports and dashboards but also to explore
data presented in a BI semantic layer and create simple reports and dashboards for themselves
and colleagues. I've called these folks "super users" in the past: business users who
gravitate to a BI tool, become proficient with it, and become the "go to" person in their
department for a custom report. In essence, Explorers are bona fide analysts. In fact, a
Class III Explorer (casual user) is the same as a Class I Explorer (power user).
Power User Classifications
Like casual users, there are three classes of power users: Class I: Explorer, Class II:
Analyst, and Class III: Data Scientist. (See figure 2.) It's perhaps more important to classify
power users because, unlike casual users, they access data, not reports, and generate data
that others consume. Therefore, it's critical to assess their data, analysis, and publishing
capabilities. They are the eyes and ears of the BI team in the business units and must be
trusted to accurately gather and display information upon which executives, managers, and
others make critical decisions.
Figure 2. Power User Classifications

Class I: Explorer. As mentioned above, a Class I Explorer (power user) is identical to a
Class III Explorer (casual user). An Explorer is really a super user who uses a BI tool not
only to view and interact with predefined reports and dashboards but also to explore data
presented in a BI semantic layer and create simple reports and dashboards for themselves
and colleagues. An Explorer has at least basic knowledge of the business and can use a BI
tool to create custom groups and hierarchies and assemble and publish dashboards from
predefined objects. In other words, an Explorer knows how to use a BI tool's ad hoc query
and publishing capabilities. If motivated, he can easily become a Class II Analyst and do
analytical work full time.
Class II: Analyst. An Analyst explores and combines data at a deeper level than the
Explorer. An Analyst queries the data warehouse directly, combining the data with local files
via custom joins and data scrubbing functions. An Analyst has greater knowledge of the
business than an Explorer, having spent three to five years in the industry and one to two
years in a specific department learning its people, processes, data, and applications. An
Analyst can perform more complex analyses, knows basic statistics, and is familiar with
statistical or machine learning tools. The Analyst can create dashboards from ad hoc queries
and custom views and publish them to various groups.
Class III: Data Scientist. The Data Scientist is the ultimate power user whom you entrust to
access data in its raw form in a staging area or source system and create accurate queries,
joins, reports, and models from the data. They have deep knowledge of the business, its
processes, applications, and data with three to five years of experience in both the industry
and an individual business unit. They know how to integrate and transform complex data and
use statistical and machine learning tools to create complex analytical models. The best
ones can also program queries in a variety of languages to access non-relational data (e.g.
Hadoop) and display the results using data visualization software.
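To give a flavor of what programming against Hadoop data can look like at its simplest, here is a minimal Hadoop Streaming-style sketch in Python that counts call center records by status code. The tab-delimited record layout and the sample data are assumptions for illustration; a real job would be submitted with Hadoop's streaming utility against actual HDFS files.

# Minimal sketch of a Hadoop Streaming-style job in Python.
# The record layout (tab-delimited, status code in the third column) is an assumption.

from itertools import groupby

def mapper(lines):
    # Map phase: emit (status_code, 1) for each call center record.
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 2:
            yield fields[2], 1

def reducer(pairs):
    # Reduce phase: sum counts per status code. Input must be sorted by key,
    # which Hadoop's shuffle-and-sort step guarantees between the two phases.
    for status, group in groupby(pairs, key=lambda kv: kv[0]):
        yield status, sum(count for _, count in group)

if __name__ == "__main__":
    sample = [
        "2013-09-01\tACME\tRESOLVED\t312",
        "2013-09-01\tACME\tESCALATED\t87",
        "2013-09-02\tACME\tRESOLVED\t120",
    ]
    for status, total in reducer(sorted(mapper(sample))):
        print(status, total)   # ESCALATED 1, then RESOLVED 2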
Summary. Hopefully, these classifications will inspire you to create a similar scheme for
your organization's users. Knowing your users is the first step toward
delivering BI services that users want and use. And when you publicize these classifications,
it may inspire business users to upgrade their data and analytical capabilities, which will reap
dividends for the individuals, your BI program, and the organization as a whole.

POSTED SEPTEMBER 24, 2013 11:25 AM



Classifying and Certifying BI Users


Are you getting the level of BI adoption you promised executives? Do you see an initial spike
of BI activity when you deploy a new BI tool or report, only to watch it trail off? Do you fear
that executives will give your BI program (or your job) the axe next fiscal year because the BI
program isn't delivering enough bang for the buck?
If so, welcome aboard. You are not alone. But that doesn't make the situation any less dire.
Your rate of active user adoption directly indicates the success of your BI program. Low
adoption not only means you get fewer dollars at budget time, it also likely means that
business users have abandoned your BI tools and data in favor of some "non-standard"
environment. Worse yet, it may indicate that users have given up entirely and no longer seek
to use data to make decisions.
Know your Audience
Although myriad problems contribute to low user adoption, the primary one for BI
professionals is that they are woefully ill-informed about the business people they support.
Most BI professionals have a mass-market mentality: they think all users have the same
information needs and requirements. As a result, they provide everyone the same
homogeneous stew of data, views, and tools. And then they're perplexed why so few
business people use the BI tools and reports they provide. But the reason is obvious.
Any good novelist, painter, screenwriter or marketer knows that to connect with an audience
you have to know what makes them tick. Many create a mental or visual profile of their target
viewer and keep it front and center while creating their work of art or message. This helps
them put the reader or viewer front and center so that whatever they create resonates more
deeply and profoundly with their audience. Every artist knows that unless you connect with
your audience, your beautiful creation will be ignored at best and lampooned at worst.
Like artists and marketers, BI professionals must know their audience, but even more so.
That's because BI users are a diverse lot. There are many gradations of information
requirements. And most business people switch roles multiple times a day. A business
person who needs a simple static dashboard to manage one part of his job may need to
combine a local Excel file with raw data from the warehouse in another part. Keeping track of
user roles is a full time job, but one that is critical to the success of any BI program.
Classifying Business Users
For years, I've written and spoken about two camps of BI users: casual users and power
users. This is the most basic classification scheme, but it adheres to the 80/20 rule.
Understanding the differences between casual and power users delivers 80% of the benefit
when rolling out BI tools and reports. The basic difference is that casual users require
structured access to predefined sets of data through interactive reports and dashboards
tailored to individual roles, while power users explore data in a variety of systems in an ad
hoc fashion.
Although most BI managers understand the differences between casual and power users,
most don't act on their knowledge. Their biggest blunder is trying to gather requirements for
power users. Ha! That never works because power users will simply say, "Give me all the
data." Nonetheless, many BI programs continue to bang their head against that
requirements wall.
But this is beside the point. Even if an organization understands and acts on the differences
between casual and power users, they still may not achieve a high degree of user adoption
because they've failed to comprehend the remaining nuances of the way their customers
consume information. Consequently, they never achieve the final 20% of benefits from their
BI initiatives that result in high levels of adoption and user satisfaction.
Classifying Power Users
To help BI professionals create a more nuanced view of their audience, I've created a
classification scheme for one of their key groups: power users. (In future blogs, I'll present
classification schemes for casual users, BI professionals, report writers, and ETL
developers.) Traditionally, I classify power users by their business role: super user, business
analyst, statistician, and data scientist. That's a fair classification but I've never elaborated
on the nuances of how each type uses information.
Figure 1 defines three classes of power users by four dimensions: business knowledge,
analytical skills, data integration skills, and publishing skills. This is a good start to a formal
classification scheme, but it needs further refinement to be useful. Please send me your
feedback and perhaps we can create an industry standard scheme that benefits everyone.
Figure 1. Power User Classifications
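As one strawman encoding of the scheme in Figure 1, here is a minimal sketch in Python that captures the four dimensions as a data structure and applies a naive rule to assign a class. The 1-to-5 scales and the thresholds are my own assumptions, not part of the figure.

# Hypothetical encoding of the power user scheme: four dimensions scored 1-5.
# Scales and thresholds are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class PowerUserProfile:
    business_knowledge: int       # 1 = new to the business, 5 = deep domain expert
    analytical_skills: int        # 1 = basic charts, 5 = statistical/ML modeling
    data_integration_skills: int  # 1 = uses the semantic layer, 5 = works with raw staging data
    publishing_skills: int        # 1 = shares spreadsheets, 5 = publishes dashboards broadly

    def classify(self):
        average = (self.business_knowledge + self.analytical_skills +
                   self.data_integration_skills + self.publishing_skills) / 4
        if average >= 4:
            return "Class III: Data Scientist"
        if average >= 2.5:
            return "Class II: Analyst"
        return "Class I: Explorer"

print(PowerUserProfile(3, 2, 2, 3).classify())   # "Class II: Analyst"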

Certifying Power Users


More important than the content of the classification scheme is how organizations use it. My
hope is that BI programs will collaborate with their human resources departments to create a
formal certification program based on this (or a similar) classification. In a certification
program, each power user receives a rating or classification based on some formal
yardstick, such as a training class they've taken, test scores they've achieved, a real-world
project they've managed, or some combination of the three.
Power users receive a certificate or badge when they achieve a new level in the
classification scheme. Call this the "gamification" of BI, but I think it will provide greater
clarity around power user skills and requirements as well as motivate analysts and their
managers to upgrade their analytical capabilities. And who doesn't want a badge to wear or
display in their office showing their professional accomplishment and status with the
organization?
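As an illustration only, a certification rule could be as simple as the following Python sketch, which combines the three yardsticks above into a single level. The point weights, cutoffs, and badge names are assumptions I've made for the example, not a recommended standard.

# Hypothetical certification rule combining training, test scores, and projects.
# Weights and cutoffs are illustrative assumptions, not a recommended standard.

def certify(training_classes_completed, test_score_pct, projects_led):
    points = (training_classes_completed * 10 +
              test_score_pct * 0.5 +
              projects_led * 20)
    if points >= 120:
        return "Class III badge"
    if points >= 70:
        return "Class II badge"
    if points >= 30:
        return "Class I badge"
    return "Not yet certified"

print(certify(training_classes_completed=2, test_score_pct=85, projects_led=1))
# -> "Class II badge" (2*10 + 85*0.5 + 1*20 = 82.5 points)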
Although establishing a certification program may seem like a lot of work, it offers
numerous benefits:

1. Customer Knowledge. BI teams can understand the types of power users in their
organization, better anticipate their needs, and tailor access to their requirements.
2. Departmental Deficits. Executives know which departments or business units lack
the power users required to support various types of analytical initiatives.
3. Self Knowledge. Business analysts understand where they stand in the spectrum of
analytical capabilities and it gives them a concrete set of objectives to pursue to
move up the ranks.
4. Training. Executives will be motivated to establish and fund training programs and
career paths to improve the analytical capabilities of the organization.

A formal classification scheme becomes a palpable way to accelerate adoption of your BI
environment and increase your organization's analytical maturity. By delineating types of
users at a granular level and formalizing their existence through a certification program, your
BI staff will better serve the information needs of its audience. Moreover, a certification
program will encourage analysts to upgrade their analytical skills and show executives when
and how to invest in upgrading their organization's analytical capabilities.
Moreover, a certification program will encourage analysts to upgrade their analytical skills
and show executives when and how to invest in upgrading their organization's analytical
capabilities.

POSTED SEPTEMBER 23, 2013 9:39 AM



Self-Service BI Tips
One of the biggest paradoxes in business intelligence (BI) is that self-service BI requires a
lot of hand holding to succeed. That was the predominant sentiment voiced in a recent
Webcast panel I conducted with Laura Madsen, a healthcare BI consultant at Lancet,
Russell Lobban, director of BI and customer analytics at Build.com, and Brad Peters, CEO
and co-founder of the multi-faceted BI vendor, Birst. (The Webcast will soon air on
SearchBusinessAnalytics; date TBD.)
Training required. The panelists reiterated the need for significant training and support to
ensure the success of a self-service BI initiative. Madsen said companies should implement
multi-modal training (i.e., Web, classroom, self-paced) on a continuous basis. Peters said
many of his customers are effectively using social media to grease the wheels of self-service BI;
specifically, discussion forums that enable users to share experiences and answer each
other's questions. Lobban said it's critical to offer an integrated data dictionary that defines
data elements used in the BI tool.
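As an illustration of what an "integrated data dictionary" can boil down to, here is a minimal sketch in Python: business definitions keyed by the data element names that appear in reports, surfaced wherever the element is displayed. The element names, definitions, owners, and sources are hypothetical.

# Hypothetical data dictionary: definitions surfaced inside the BI tool,
# e.g., as hover text next to each data element. Content is illustrative only.

DATA_DICTIONARY = {
    "net_revenue": {
        "definition": "Gross revenue minus returns, discounts, and allowances.",
        "owner": "Finance",
        "source": "data warehouse fact_sales",
    },
    "active_customer": {
        "definition": "A customer with at least one order in the last 12 months.",
        "owner": "Marketing",
        "source": "data warehouse dim_customer",
    },
}

def describe(element):
    entry = DATA_DICTIONARY.get(element)
    if entry is None:
        return f"No definition on file for '{element}'."
    return f"{element}: {entry['definition']} (owner: {entry['owner']})"

print(describe("net_revenue"))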
Know your audience. Another critical success factor is knowing the audience for self-
service BI. Peters, for example, said there are two types of self-service: "data self service"
for power users and "business self service" for casual users. Power users require ad hoc
access to data using visualization tools that enable them to explore application data and
local files. Casual users, on the other hand, need structured access to data via a semantic
layer that adds business context. Although a semantic layer requires time and effort to build,
all three panelists said it is critical to the success of any self-service BI program and helps
ensure that users make accurate decisions.
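For the "business self service" side, a semantic layer essentially maps business terms to the physical SQL beneath them so casual users never touch the tables directly. Here is a minimal sketch in Python, assuming hypothetical table and column names, of that mapping and the query it would generate.

# Hypothetical semantic layer: business terms mapped to physical columns,
# so casual users pick terms instead of writing SQL. Names are illustrative only.

SEMANTIC_LAYER = {
    "Revenue": "SUM(f.sales_amount)",
    "Region":  "d.region_name",
}

def build_query(measures, dimensions):
    select_items = [f"{SEMANTIC_LAYER[d]} AS {d}" for d in dimensions]
    select_items += [f"{SEMANTIC_LAYER[m]} AS {m}" for m in measures]
    group_by = ", ".join(SEMANTIC_LAYER[d] for d in dimensions)
    return ("SELECT " + ", ".join(select_items) +
            " FROM fact_sales f JOIN dim_geography d ON f.geo_key = d.geo_key" +
            " GROUP BY " + group_by)

print(build_query(measures=["Revenue"], dimensions=["Region"]))
# SELECT d.region_name AS Region, SUM(f.sales_amount) AS Revenue
#   FROM fact_sales f JOIN dim_geography d ON f.geo_key = d.geo_key
#   GROUP BY d.region_name

The effort the panelists describe goes into agreeing on the mappings and definitions, not the query generation itself.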
Lobban emphasized the importance of starting small and iterating quickly. He said users
often get discouraged when they don't find the data they need. Thus, it's critical for the BI
team to add new data quickly to keep up with user requirements. The other major challenge
Lobban sees with self-service BI is that business users often misinterpret the data in reports
and dashboards. This is especially true for new hires or transfers from other departments.
Thus, it's critical that new hires and transfers are mentored by experienced BI users, either in
person or virtually via help desks, training classes, or online forums.
Which tools? The panel also spent a lot of time discussing the types of tools that are best
suited to self-service BI. Most thought the new generation of in-memory visualization tools
is great for power users who can navigate their way through existing databases and
applications, but inadequate for casual users who need more structured access to data. The
panel also discussed the benefits of traditional OLAP tools versus the new visualization
tools. The consensus is that OLAP tools still play an important role in BI tool portfolios
because they provide robust dimensional views and calculations that new visualization tools
don't support.
Finally, the panel discussed the tradeoffs between best-of-breed tools and all-in-one BI suites.
Best-of-breed tools provide the best functionality available or satisfy the parochial needs of
individual workgroups or departments, while BI suites provide an integrated experience and
architecture that addresses the entire spectrum of BI needs in an organization and is thus
easier to administer. Ultimately, the panel agreed that each organization needs to decide
which approach best fits its requirements and culture.
Summary. Self-service BI is the holy grail for BI professionals, but it has been difficult to
achieve. BI practitioners and business users expect self-service BI to be easy when it's not.
It requires clean, comprehensive data, integrated metadata, and continuous training and
support. Ultimately, self-service BI is the true test of a BI program's overall maturity.

Cognos
