Shippers Foretelling Future Deliveries!

In 2016 it was reported that customers do more of their shopping online than in stores. Forrester estimates that Amazon accounted for 60% of total US online sales growth in 2016 (1). With these changes in how we shop, the delivery of goods to our houses has become more frequent and commonplace. One of our readers found a really good example of data quality relating to these changes in our lives, so let's take a look at it in detail.

This time we'll look at Validity and how to use it to maintain your credibility with customers. In the first example, provided by one of the blog's readers, we see that a new Wi-Fi router was purchased online and the customer is tracking the delivery progress. It shipped from Harrisburg, Pennsylvania successfully on April 24th, and the expectant customer is checking the status at 7:16AM on April 25th (Blue Arrow: see the local PC's date/time in the bottom right corner).

In the green font we can see that the latest update was the arrival of the package in Poughkeepsie, New York at 7:28AM (Green Arrow), but how can that be if the latest update time was 6:15AM, as stated in the header (Red Arrow)? One likely explanation is that the system projects an estimated arrival time from some earlier update along the trip and incorrectly reports that estimate as an Arrival Scan. Or maybe the scanner is broken and recorded the wrong time?

So in terms of the Conformed Dimensions of Data Quality, we can say that the Timeliness of the information provided is about how soon the information is available relative to when it is needed, an Underlying Concept called Time Expectation for Availability. In this case the information was updated at 6:15AM (based on the header). That update is not itself the concern; the concern is that the Arrival Scan says 7:28AM. If the system had a validity check programmed into it, the check would have caught this violation of the business rule that an update cannot carry a date-time later than the current system date-time. This is an example of the Validity dimension and its Underlying Concept, Values Conform to Business Rule.
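The business rule described above can be sketched as a simple comparison. This is a minimal illustration, not the shipper's actual logic: the function name is hypothetical, the year is a placeholder (the example gives only the month and day), and all three times are assumed to be in the same time zone.

```python
from datetime import datetime


def conforms_to_no_future_rule(event_time: datetime, system_time: datetime) -> bool:
    """Hypothetical validity check: an update may not be dated later
    than the current system date-time."""
    return event_time <= system_time


# Placeholder year: the source states only April 24th/25th.
YEAR = 2017

now = datetime(YEAR, 4, 25, 7, 16)           # PC clock (Blue Arrow)
last_update = datetime(YEAR, 4, 25, 6, 15)   # header time (Red Arrow)
arrival_scan = datetime(YEAR, 4, 25, 7, 28)  # Arrival Scan (Green Arrow)

for label, ts in [("Last update", last_update), ("Arrival Scan", arrival_scan)]:
    verdict = "valid" if conforms_to_no_future_rule(ts, now) else "INVALID (future-dated)"
    print(f"{label} at {ts:%H:%M}: {verdict}")
```

With the times from the screenshot, the header update passes the rule and the Arrival Scan fails it, which is exactly the anomaly the customer noticed.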

This brings us to another important question. What if a data quality anomaly could be classified under two or even three dimensions and associated underlying concepts? To answer this it's best to provide an example with associated perspectives. Using the example above, let's list a few ways to classify this anomaly.

| # | Dimension | Underlying Concept | Note of Explanation |
|---|-----------|--------------------|---------------------|
| 1 | Validity | Values Conform to Business Rule | As described above |
| 2 | Consistency | Logical Consistency | Measures whether two related data points are in logical agreement, or not, so if the current time is only 7:16AM, how can the Arrival Scan be for some time in the future? These data points are logically inconsistent. |
| 3 | Accuracy | Agree with Real-World | Clearly based on the information provided one of these data points is wrong due to the inconsistency explained above. Either the current date time on the PC is incorrect, or the future dated Arrival Scan is incorrect, or both. |
| 4 | Accuracy | Match to Agreed Source | It is possible that if we assume accuracy is defined by the system of record (aka the hand-held scanner at the facility which likely recorded the Arrival Scan) then this data point is correct and the PC's system date time is incorrect. |

Above I have provided a few different ways that one could choose to report this issue. Which is best? That depends on your customer.

First, a little root cause analysis is required to resolve this to one explanation of the true cause, but until that point you may want to report it to different customers in the terms each of them prefers. For instance, the supply chain operations team may want to know if their scanners are failing, and thus prefer methods 3 and 4, which focus on what they need to do: fix the scanner. The Web development team may prefer to see this in the context of the fact that their system is reporting something logically impossible, and that if they had a programmatic exception rule in place, they could have notified the upstream source system and reported a more benign message to the end-user, such as "under review".
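A programmatic exception rule of the kind the Web team might want could look roughly like the sketch below. Everything here is an assumption made for illustration: the `TrackingEvent` class, the `customer_message` function, and the wording of the issue note are hypothetical, and the year is a placeholder because the example gives only the month and day.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class TrackingEvent:
    """Hypothetical shape of one tracking update."""
    description: str
    location: str
    timestamp: datetime


def customer_message(event: TrackingEvent, now: datetime) -> Tuple[str, Optional[str]]:
    """Return (text shown to the customer, issue note for the upstream system or None)."""
    if event.timestamp > now:
        # Logically impossible: the event is dated after the current time.
        issue = (
            f"Future-dated event '{event.description}' at {event.timestamp:%H:%M} "
            f"received at system time {now:%H:%M}"
        )
        return "Under review", issue
    return f"{event.description}: {event.location} at {event.timestamp:%H:%M}", None


# Placeholder year; times and location are taken from the example.
scan = TrackingEvent("Arrival Scan", "Poughkeepsie, NY", datetime(2017, 4, 25, 7, 28))
message, issue = customer_message(scan, now=datetime(2017, 4, 25, 7, 16))
print(message)
print(issue)
```

The design choice is to separate what the end-user sees from what gets reported upstream, so the customer reads a benign status while the source system still receives the details needed for root cause analysis.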

Personally, I think the word "Accuracy" is overused, and I encourage data consumers to use the word "correct" or "incorrect" to refer to whether something is right or wrong, and to use "Accuracy" only in the context of the dimensions of data quality and its two Underlying Concepts (Agree with Real-World, and Match to Agreed Source).

1. Madeline Farber, "Consumers Are Now Doing Most of Their Shopping Online", June 8, 2016. http://fortune.com/2016/06/08/online-shopping-increases