Vous êtes sur la page 1sur 12

22/5/2014 Conformed Dimension Question

http://forum.kimballgroup.com/t2915-conformed-dimension-question 1/12
Kimball Forum
Conformed Dimension Question
Conformed Dimension Question
BICorvette on Tue Apr 08, 2014 7:29 pm
Hello,
I am fairly new to dimensional modeling using the Kimball method and would like to
obtain your opinions on a specific modeling scenario I am encountering.
I have 2 business processes that I have derived from the enterprise business
requirement findings stage of my project which correlate to the publishing of policy
premium related data for automobile insurance policies.
These 2 business processes are listed below.
1. Policy Premium Transactions
2. Policy Premium Periodic Snapshot
With this said, the underlying data source that I will be referencing to build this DW/BI
solution consists of a flat file derived from a mainframe system.
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 2/12
This flat file (+300 million records) is at the transaction level of granularity and contains
a raw dump of each transaction that occurs on a policy as it happens.
This would imply that each record contained within this flat file would contain the
following data:
1. The amount of the policy transaction (total premium)
2. The type of transaction (cancellation, renewal, change, etc..)
3. Many dimension attributes tied to the transaction such as:
a. Policy holder information (age, gender, marital status, etc..)
b. The insurance program
c. Coverage limits information
d. Insured vehicle information such as the make, model, and year of the vehicle.
e. Location of the primary insured and vehicle
It would be important to note that no "customer number" exists from the source
system, and so one would not be able to truly identify if a "John Smith" from policy
"A" written during CY 2005 for example is the same "John Smith" that canceled coverage
on this policy and then later returned as a policy holder to policy "B" during
2007.
The source system simply "dumps" all dimensional attributes that are present on a policy
to this file as transactions are processed during the lifecycle of a policy.
As such, if there are 4 different transactions that are processed during the lifecycle of a
policy it would be possible for the majority of all dimensional attributes
present within each of the 4 transactions to be the same.
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 3/12
This can be illustrated below:
-------------------------------------------------------------------------------------------------
------------------------------------------------------------------
|Policy # | Insured Name | Insured State | Term Eff Date | Term Exp Date |
Coverage | Coverage Limit | Trans Type | Trans Amount| Trans Date
-------------------------------------------------------------------------------------------------
------------------------------------------------------------------
ABC123 John Smith CA 1/1/2012 1/1/2013 Collision
$40,000.00 Renewal $93.12 1/1/2012
ABC123 John Smith CA 1/1/2012 1/1/2013 Collision
$50,000.00 Change $50.23 1/9/2012
ABC123 John Smith CA 1/1/2012 1/1/2013 Collision
$80,000.00 Change $70.23 8/5/2012
ABC123 John Smith CA 1/1/2012 1/1/2013 Collision
$40,000.00 Change ($30.23) 9/8/2012
As shown above, there were 4 transactions that were issued.
Transaction #1 was a renewal transaction where the insured renewed his policy.
Transaction #2 entailed the insured increasing the policy limits of their collision
coverage to $50,000.00
Transaction #3 entailed the insured increasing the policy limits of their collision
coverage to $80,000.00
Transaction #4 entailed the insured decreasing the policy limits of their collision
coverage back to $40,000.00
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 4/12
As illustrated within the abbreviated example above, no unique customer number is
brought in from the source system. Again, it would seem that the source system
simply "dumps" all attributes and facts tied to each transaction into a single transaction
record as illustrated above. These attributes are then repeated again and
again for each transaction regardless if they do not change...
With the information above in mind, I ask myself how I would construct conformed
dimensions using this source transactional data.
I also ask how SCDs would be handled in this scenario such as when the insured moves,
changes marital status, or other events that would trigger changes in
dimensional attributes.
With this said, I am proposing the following approach used to construct and maintain
conformed dimensions based on this transactional source data:
1. Each conformed dimension will be built and maintained by selecting all distinct
combinations of attributes from the source transactional flat file.
a. For example, for the [Policy Holder] dimension, I would select all distinct
combinations of attributes that make up the dimension from the flat file and use these
distinct values to build the dimension.
b. I would then assign a SHAH512 hash value to each row within the dimension
that would be built off of each attribute value within the dimension.
c. When new data arrives, I would simply compare the SHAH512 hash value
derived from the new incoming data to the existing data contained within the dimension
and add or discard the record as needed.
d. I am thinking that type 2 SCDs would not be applicable in this scenario as if the
customer changes his/her address or gets married for example, the updated attribute
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 5/12
values would constitute a new dimension record within the [Policy Holder] dimension,
with the new dimension record being associated to the new fact record.
Is this an acceptable/correct approach to take when modeling conformed
dimensions based on raw data derived from a transaction grained flat file?
Am I correct in assuming that due to the transactional nature of the data, no
type 2 SCD scenarios would exist as they would be handled with each
subsequent
transaction that is processed?
I greatly appreciate your thoughts on this.
Take care,
Re: Conformed Dimension Question
nick_white on Wed Apr 09, 2014 1:25 am
It sounds like the transactional system not having a customer number is a bit of a flaw in
it's design - do you really not know if the same person owns multiple policies? Assuming
redesigning your source system is out of scope for this discussion ...
Is it correct to assume that the owner of a policy cannot change? i.e. if a policy was
owned by person A it can't subsequently be owned by person B? If this is the case then
you can identify individuals via their policy number and use this as part of your SCD
BICorvette
Posts: 4
Join date: 2012-03-04
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 6/12
logic. Obviously this doesn't help you de-dupe your policyholders across policies.
However, if your business cannot identify individuals who hold multiple policies then it
might be worth asking the question why you need the policyholder dimension at all. Is
there actually a requirement to know a policy is held by a 'John Smith' when you have
1000s of John Smiths in your system with no way of knowing which refer to the same
person - what useful reports can this information generate?. Why not pick up the
attributes that do give you useful, reportable, data and put them into relevant
dimension(s) e.g. title, gender, marital status, zipcode, city, state, etc.
Re: Conformed Dimension Question
BICorvette on Wed Apr 09, 2014 2:27 pm
Thank you for your reply Nick.
I agree that the transactional system is flawed by not having a customer number.
I can also agree with the question pertaining to the true business value that an insured
name would provide for business intelligence purposes. Perhaps this sort of lookup would
be better left for the point of entry transactional OLTP system.
I still would like to validate that my approach for building conformed dimensions is
correct.
nick_white
Posts: 103
Join date: 2014-01-06
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 7/12
For example, please consider the following [Coverage] dimension I am proposing:
[Coverage]
----------------------------
Coverage Code
Coverage Description
Coverage Limit 1
Coverage Limit 2
Deductible Amount
Endorsement Code
Endorsement Description
----------------------------
Should the dimension above be built based on selecting all unique value
combinations of these attributes from the source transactional data that would
encompass the dimension using the approach I outlined earlier?
Or should I have one row of data contained within the dimension per policy
number knowing that duplicate attribute values would exist within the
dimension?
I have outlined both of these approaches below:
Source Data:
Based on this data above, one can see that the same exact attribute values are repeated
for different, unique policy numbers.
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 8/12
Approach #1: Selecting All Unique Attribute Values
Using this approach, I would select a distinct list of attribute values from the source data
and condense the values into a unique list of dimension members as shown below:
As shown above, the duplicate values from the source data are condensed into single
unique rows within the dimension table.
Type 2 SCD scenarios would be implicitly handled due to the transactional system
feeding the dimension updated, unique values which would get propagated into the
dimension, and used against new facts.
Approach #2: Duplicate Values Using Business Key
This approach would establish a natural/business key off of the [Policy Number] and
[Vehicle Number] columns and would maintain one record within the resulting
[Coverage] dimension per unique [Policy Number] and [Vehicle Number] value
combination.
This approach is illustrated below:
This approach does duplicate dimensional attribute values across many rows within the
dimension and would grow very large if many policies and vehicles write the same
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 9/12
coverages. I would also assume that type 2 SCDs would be difficult if not impossible to
implement using this approach as if a new coverage is instituted in the source system
with new [Coverage Limit] values for instance, a new record would get inserted into the
[Coverage Dimension] , leaving existing rows "as is".
I'm thinking my first approach above would prove beneficial as the dimension itself
would remain compact and would truly reflect distinct values that could be tied to facts.
Unless I am mistaken, I would also think that SCDs would be non existent (at least for
this dimension) due to any new dimensional attribute values being found within new
transaction records being inserted into the [Coverage Dimension] as new rows.
Which approach would you recommend for modeling this "Coverage" oriented dimension,
as well as other dimensions I would create off of this unstructured transactional data?
I greatly appreciate your input.
Thanks!
Re: Conformed Dimension Question
nick_white on Thu Apr 10, 2014 9:49 am
As a set of principles, Dimension values should be unique and a Dimension should be a
self-contained list of attributes => Approach 1 is probably best: create a record for each
unique combination of attributes and don't start including things like policy keys in it.
BICorvette
Posts: 4
Join date: 2012-03-04
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 10/12
Remember - Dims are linked via Fact tables not directly to each other (ignoring bridge
tables, snowflakes and all the other exceptions to this rule).
Things you might want to consider (not necessarily for this particular case, more general
points):
- if the number of records in this Dim is too large (and it's your judgement call of what
too large means) then probably some of the attributes are not correlated to each other
and you should split them into separate Dims containing only those attributes that do
have a correlation
- if you can pre-determine all the possible combinations of attributes then the best
approach is probably to populate the Dim in a one-off load (plus any subsequent
maintenance that is required). But if this results in a large number of records with a
significant proportion that will never get used (because they are a theoretical
combination rather than one that would occur in the real world) you may want to
reconsider and use the approach in the next point
- if you can't pre-determine the combinations you will have to load the Dim as part of
your transactional load (does the combination of attributes on this record match a record
I already have in the Dim? Y - use that one; N - create a new record in the Dim). This has
performance implications and obviously you have to create the Dim record, and know its
Surrogate Key, before you can load the fact record - so it all takes a bit more thought and
planning
Hope this helps your thought processes a bit more
nick_white
Posts: 103
Join date: 2014-01-06
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 11/12
Re: Conformed Dimension Question
BoxesAndLines on Thu Apr 10, 2014 2:05 pm
Typically, Policies and Coverages have a large number of attributes which make a junk
dimension (distinct rows approach) not feasible. If your data is limited to this example
then your approach should work fine.
Re: Conformed Dimension Question
BICorvette on Thu Apr 10, 2014 8:25 pm
Thank you Nick and BoxesAndLines for your quick and informative replies, it is very much
appreciated.
I will proceed with my first outlined approach (distinct rows approach) keeping the
design concepts and recommendations you outlined in mind.
Thanks again!
BoxesAndLines
Posts: 1046
Join date: 2009-02-03
Location: USA
BICorvette
Posts: 4
22/5/2014 Conformed Dimension Question
http://forum.kimballgroup.com/t2915-conformed-dimension-question 12/12
Home Create free forum | phpBB | Free forum support | Report an abuse | Free forums
Similar topics
question regarding databases and RoR
172 Skyhawk question
Conformed dimension or Degenerate dimension?
Conformed Dimension for Transaction Fact and Accumulating Snapshot Fact Table
Hot swappable dimension and conformed dimension usage (bank/ credit union)
PERMISSIONS IN THIS FORUM:
You cannot reply to topics in this forum
Join date: 2012-03-04

Vous aimerez peut-être aussi