
Week 2: Alteryx + Tableau = Magic!

There was no slowing down this week. Our project was predominantly Alteryx-based, a piece of software we had never used before, so this was a real uphill battle! The theme was Education and the task was to find education data on the Internet, use Alteryx to prepare and collate it, then use Tableau to create a viz.

Fortunately, the almighty Alteryx Ace Chris Love provided us with the ammunition to combat the challenge by teaching us about connecting to data, dataset manipulation, data parsing, data blending, web scraping and where to look for help. I am always impressed by experts in a subject who are able to pass on their knowledge so well!

As usual, Andy had lined up another prominent figure to speak at the Data School; yes, we are becoming a spoiled bunch! This week Tableau CMO Elissa Fink talked to us about how Tableau is marketed. Elissa was informative, encouraging and simply magnificent. The more I hear about how Tableau operates, the more I fall in love with it. Tableau's success has been built on putting us customers and the community first, at the heart of everything they do. And no doubt they use Tableau themselves in the marketing team too!

During the week we were also fortunate to have Laszlo Zsom talk us through the differences in table joins between Alteryx, Tableau and SQL, as well as consultants Mike Lowe and Robin Kennedy, who further prepared us for the Tableau Desktop Qualified Associate exam by delivering some more Tableau training.

Back to Alteryx. My aim for this project was to derive more insight from The Complete University Guide's university ranking tables. I wanted to import all nine years' worth of league tables for the 67 subjects, along with the overall league table for every year, meaning I needed to import 9 x 68 = 612 league tables. Now imagine going to each and every one of those 612 web pages, copying and pasting the tables, removing the column headers, making sure all the data was in the right format and so on. The whole process would have been boring and repetitive and would have taken days if not weeks to complete! Luckily, I was able to do it in no time once I figured out how to use Alteryx.

To begin with, I noticed a pattern across all the URLs:

Overall table for 2016 (no year at the end):

o http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?

Overall tables by year (year at the end):

o http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?y=2015

Subject tables for 2016 (no year at the end):

o http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?s=Accounting+%26+Finance

Subject tables by year (year at the end):

o http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?s=Accounting+%26+Finance&y=2015
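As a side note, the %26 in those subject URLs is simply the percent-encoded ampersand. One way to sanity-check the pattern is to reproduce the query strings with Python's standard library:

```python
from urllib.parse import urlencode

# "Accounting & Finance" plus a year, encoded the way the site expects:
# spaces become "+" and the literal "&" inside the subject becomes "%26".
query = urlencode({"s": "Accounting & Finance", "y": 2015})
print(query)  # s=Accounting+%26+Finance&y=2015
```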

My first task, therefore, was to generate all 612 URLs using the first part of the URL, which they all had in common, and the XML elements for all 67 subjects. I got the XML elements by right-clicking on the web page that listed the hyperlinks to the subject league tables and selecting "View source".

Below is the workflow I created in Alteryx to generate all 612 URLs.
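Outside Alteryx, the same cross join of subjects and years can be sketched in a few lines of Python. The subject list below is illustrative only; the real workflow used all 67 subjects scraped from the page source.

```python
BASE = "http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?"
subjects = ["Accounting+%26+Finance", "Economics"]  # illustrative; the real list has all 67
years = range(2008, 2017)  # nine years of tables; 2016 is current, so it gets no y= suffix

urls = []
for year in years:
    y = "" if year == 2016 else f"y={year}"
    urls.append(BASE + y)  # overall league table for that year
    for s in subjects:
        urls.append(BASE + f"s={s}" + ("&" + y if y else ""))  # one table per subject

# With all 67 subjects this yields 9 * (67 + 1) = 612 URLs.
```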


I used import.io to extract the 612 league tables of data. My next task was to extract the subject names and years from the URLs. I was thinking one step ahead here, as I knew I needed a Subject column and a Year column so the end user could filter the league tables by these two variables. This led me to create the workflow below.
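For readers without Alteryx, the parsing step amounts to pulling the s= and y= query parameters out of each URL, defaulting to the overall 2016 table when they are absent. A minimal Python sketch:

```python
from urllib.parse import urlparse, parse_qs

def subject_and_year(url):
    # parse_qs decodes the percent-encoding, so "Accounting+%26+Finance"
    # comes back as "Accounting & Finance".
    qs = parse_qs(urlparse(url).query)
    subject = qs["s"][0] if "s" in qs else "Overall"
    year = int(qs["y"][0]) if "y" in qs else 2016  # no y= means the current (2016) table
    return subject, year

print(subject_and_year(
    "http://www.thecompleteuniversityguide.co.uk/league-tables/rankings?"
    "s=Accounting+%26+Finance&y=2015"
))  # ('Accounting & Finance', 2015)
```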

The rest was straightforward. I used Tableau to create my viz. Click here to interact with it on Tableau Public. It lets you see the number of universities you specify, ranked and plotted on a map by year, subject and one of four indicators, explained here on their website.

I'm quite astonished I was able to do this! I previously thought you had to be a hardcore programmer to do things like web scraping! I'm excited by the possibilities of using Alteryx in combination with Tableau, especially since we accomplished so much in so little time using Alteryx!


About the Behavior Detail Fields Tool


The Behavior Detail Fields tool returns detailed field information at the Cluster or Group level, specific to the Profile.

Configuration Properties
1. Choose a Dataset: Select the dataset to use. Each dataset has its own Profiles and Profile Sets that are specific to the selected clustering system. These datasets require a current subscription and license. Please contact your Alteryx account representative for more information regarding compatible datasets.

o Note: For best results, keep your datasets consistent with each Behavior tool.
Choosing "Most Recent Vintage" rather than a specific dataset ensures the most
current dataset is used and won't require updating your workflow. You can easily
specify the dataset in multiple tools at once through Workflow Dependencies to
ensure consistency throughout your workflow.

o Note: You can specify the default dataset from User Settings. Go to Tools --> User
Settings and click on the Dataset Defaults tab.

2. Analyze: Select the data field that represents the Profile to use for analysis.

3. Using (Optional): Select the data field that represents the Profile to compare the Analyze Profile against for the fields selected below.

4. Show By: Record level of the data. Choices include Cluster or Group. Clusters are rolled up into Groups.

5. Demographic: Classification level the Profile was built with. Choices include Auto-Detect, Household, Person, or Adult, but will vary by vendor.

6. Select Output Fields: Use the checkboxes to select the desired output fields. The All and Clear buttons can help with multiple selections.

o Use Long Names: When checked, field names are returned as they appear below; when unchecked, shorter, less descriptive field names are used. The table below lists the available output fields.

Field Name                                 Short Name

Analyze - % Base Count                     A_BASE_PER
Analyze - % Count                          A_USERS_PER
Analyze - Average Volumetric Value         A_AVG_VOL
Analyze - Base Count                       A_BASE
Analyze - Count                            A_USERS
Analyze - Index                            A_PENI_A_TOT
Analyze - Penetration                      A_PEN
Analyze - Total Volumetric Value           A_VOL
Analyze Scaled to Country - Base Count     A_BASE_US
Analyze Scaled to Country - Count          A_USERS_US
Cluster - Description                      DESC
Cluster - Global Mosaic Group              UG
Cluster - Index                            PEN_IDX
Cluster - Mosaic Group                     MF
Cluster - Penetration                      CLUSTER_PER
Cluster - Volume Index                     VOL_PEN_IDX
Cluster - Volume Penetration               VOL_PEN
Cluster Number                             CLUSTER_NUM
Demographic                                Demographic
Market Potential - % Count                 MKPOT_PER
Market Potential - % Volumetric            MKPOT_VOL_PER
Market Potential - Count                   MKPOT
Market Potential - Volumetric              MKPOT_VOL
Using - % Base Count                       U_BASE_PER
Using - % Count                            U_USERS_PER
Using - Average Volumetric Value           U_AVG_VOL
Using - Base Count                         U_BASE
Using - Count                              U_USERS
Using - Index                              U_PENI_U_TOT
Using - Penetration                        U_PEN
Using - Total Volumetric Value             U_VOL
Using Scaled to Country - Base Count       U_BASE_US
Using Scaled to Country - Count            U_USERS_US
Click Apply to accept the configuration.

Note: For information regarding Input, Output, Annotation and Error Properties, see Tool
Properties.



Data Integrity refers to the accuracy and consistency of data stored in a database, data warehouse, data mart or other construct, and it is a fundamental component of any analytic workflow. In Alteryx, creating a macro to compare expected values to actual values in your data is quite simple and provides a quality-control check before producing a visual report. Let me show you how to build this.

The two inputs represent the actual and expected values in your data. These data streams are passed through a Record ID tool to preserve positional integrity, then through a Transpose tool to create two columns: the first contains the field names and the second the values within each field. The data is then passed to a join, matching on Record ID and the Name of the field, in order to compare each value. Finally, if an actual value does not match its expected value, a custom message appears in the results messages alerting the user to where the mismatch occurred within the dataset. The image below shows the error message produced when values differ across datasets.
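For comparison, the same logic can be sketched in pandas (a rough equivalent, not the macro itself): add a record ID, transpose both tables into name/value pairs, join on record ID plus field name, and report every mismatching cell.

```python
import pandas as pd

def integrity_errors(actual: pd.DataFrame, expected: pd.DataFrame) -> list[str]:
    """Return one message per cell where actual and expected disagree."""
    def long_form(df):
        # Record ID tool + Transpose tool: one (RecordID, Name, Value) row per cell.
        return (df.assign(RecordID=range(1, len(df) + 1))
                  .melt(id_vars="RecordID", var_name="Name", value_name="Value"))

    # Join on RecordID and field Name to line up each cell with its counterpart.
    merged = long_form(actual).merge(long_form(expected),
                                     on=["RecordID", "Name"],
                                     suffixes=("_actual", "_expected"))
    bad = merged[merged["Value_actual"] != merged["Value_expected"]]
    return [f"Record {r.RecordID}, field '{r.Name}': "
            f"expected {r.Value_expected!r}, got {r.Value_actual!r}"
            for r in bad.itertuples()]
```

A small usage example: comparing two two-row tables that differ in a single cell yields exactly one message pointing at that record and field.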
