The Purple Trap

Data modeling is a methodology used to logically represent the data you store about your business and is an important early step in any Data Warehouse project. The quality and grain of the data identified in the data model is a key step to building a Data Warehouse that can provide insight into your company’s customers, products and business operations.  Determining the correct level of detail or grain of data about your business is a difficult but critical step to success.

As with any IT project, limits of time and resources dictate the quality of your project. In the case of a Data Warehouse project, cost and available resources impact the level of detail data you ultimately capture in the Data Warehouse.  Determining the level of detail is part art and part science. Making the best decision starts with how well you understand your business.  Choosing the correct level of detail will directly impact the quality of analysis and business value the Data Warehouse project provides your company.  But remember, once you have chosen the level of detail for the Data Warehouse it is very expensive and difficult to change.  Essentially your early decisions can trap you with less than optimal results.

Deciding on the appropriate level of detail data to capture is probably one of the most important decisions made in data modeling when trying to provide flexible reporting to the business.  Do you capture the red and blue data separately or capture the myriad shades of purple knowing you can estimate the amount of red in purple.  The answer really depends on the business perspective. If I have a car dealership, I may only be interested in the shades of purple I can offer. On the other hand, if I’m a paint store I may keep the red and blue separate along with the business rules needed to get the right combination of red and blue to create the shade of purple needed.

Using purple as an example, having detail about the shades of purple is perfectly ok if you’re a car dealership but what about a paint store?

Say two paint stores open in your town, Tom’s Purple Paints understands the current market and that only one shade of purple is popular and they get a great deal on purple paint by signing a long term contract to buy that one shade. The Kaleidoscope Paint store opens with the same understanding of the market that the current demand is for one shade of purple. The  Kaleidoscope Paint store makes the strategic decision to sign long term deals to buy red and blue paint because they realize that with minimal effort the popular shade of purple can be mixed from the red and blue therefore Kaleidoscope Paint has the flexibility to react to market changes.

Choosing the grain of data to store in a database is not unlike the decisions these two paint stores made.  They each made strategic decisions about the shade of paint they needed for their businesses. It should be obvious that the Kaleidoscope Paint store is positioned to respond to market changes much faster than Tom’s Purple Paints.  In addition to being able to react to market changes, Kaleidoscope Paints can drive market changes by experimenting with different shades of purple and analyzing customer’s reactions to expand their business.

Data is no different.  It’s just much harder to visualize what you lose when you choose the wrong grain of data to store. Remember that Databases and Data Warehouse projects are long term investments!  You don’t get many opportunities to determine the level of detail to capture.  Make every opportunity to capture the right detail of data.  Choosing the right level of detail in your data is critical to your project’s success and possibly your business’s success. You don’t want to be like Tom’s Purple Paints!

Considerations for choosing the best grain of data for your business:

  1. Understand the essence of your business and business strategy
  2. Avoid over reliance on industry data standardization patterns
  3. Remember patterns are generic so use them for your generic data (e.g. Address)
  4. Evaluate industry standard patterns in the context of your business
  5. Find the data that is specific for your business’ success and own it
  6. Define your data thoroughly so it’s clear what the data means for your business
  7. Clearly indicate in your definitions what grain of data you’re storing
  8. Don’t limit your thinking to the current level of data captured by your business
  9. Listen for what data the business uses for making decisions
  10. Challenge the current understanding of data about your business

Deciding not to track certain facts about your business or generalize too much will limit the value of your Data Warehouse.  Data detail is just like color, you need to understand the grain. The number of colors you can store is limited but storing the base colors creates limitless flexibility because with new formulas you can add new colors with ease. That’s the flexibility a well designed Data Warehouse can provide your business!

illustration by Paul Kulikowski
This entry was posted in Data, Solution Architecture. Bookmark the permalink.