Fall Harvest Time Reveals Geospatial Data Quality Example

During a family visit to a local pumpkin patch (Uesugi Farms) this fall, I was looking for a gas station. My car's GPS identified the corner just across from the farm (yellow arrow below) as a gas station, but it was only a dirt parking lot, as Google's satellite imagery in the image below also shows. There was, however, a Valero gas station just down the street, yet even that didn't show up in Google's list of local gas stations (blue arrow).

image

So this got me thinking about spatial data quality, which we haven't yet discussed in this blog in relation to the Conformed Dimensions of Data Quality (CDDQ). On a side note, my wife works as a GIS Manager for Santa Clara County (SCC), and I knew the County had maps covering this area, so I tried SCCMap. Searching for the nearby cross streets (E Middle Ave and Monterey Rd), I found the Valero gas station. Using the identify tool, I then identified the parcel where the gas station is located and its associated address (red arrow below).

image

To validate that this is indeed the Valero gas station, I then searched the SCC Assessor's map for the name of the business at this location, using the APN collected from the prior site (82506024). I couldn't find it, so I decided to use Google Maps Street View instead and clicked the link provided on the page.

http://maps.google.com/?q=14660+++MONTEREY+RD+%2c+MORGAN+HILL+95037-0000+CA&apnValue=825-06-024

Note that I have added the red font in the URL above to help distinguish the data variables sent to Google for the query.
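
For readers who want to see those variables pulled apart, here is a minimal sketch using Python's standard `urllib.parse` module. The variable names (`params`, `address`, `apn`) are my own for illustration. The whitespace and dash clean-up steps are assumptions about how one might normalize the values, not anything the County's page or Google is known to do.

```python
from urllib.parse import urlsplit, parse_qs

# The link from the Assessor's page, copied from above.
url = ("http://maps.google.com/?q=14660+++MONTEREY+RD+%2c+MORGAN+HILL"
       "+95037-0000+CA&apnValue=825-06-024")

# parse_qs decodes '+' to a space and '%2c' to a comma.
params = parse_qs(urlsplit(url).query)

# Collapse the repeated spaces left over from the '+++' padding.
address = " ".join(params["q"][0].split())

# Strip the dashes so the APN can be compared with the 8-digit form.
apn = params["apnValue"][0].replace("-", "")

print(address)
print(apn)
```

The dash-free APN should match the 82506024 value collected from the SCC map, which confirms the link is built from the same parcel record.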

As seen below, according to Google, the address 14660 Monterey Road is a residence (and Google placed it near the corner of John Wilson Way and Monterey Rd; see the green arrow in the screenshot below). Something is wrong here. What is at 14660? A residence or the Valero gas station? After using Google Street View to navigate down Monterey Rd, I found the Valero at 14631 Monterey Hwy, which is almost half a mile away (0.42 miles).

image

Now we can categorize this with the Conformed Dimensions of Data Quality's Accuracy Dimension and Underlying Concept of Agree with Real-world, which says, "Degree that data factually represents its associated real-world object, event, or concept," but let's find out how this may have happened.

Addresses can be stored in a map in many ways, but typically streets are represented as line segments, each with a beginning and an end and with left/right attributes assigned to it. As seen in the screenshot of the SCC Interactive map viewer below, Santa Clara County's map has a line segment in front of the Valero parcel showing the address range. This verifies that 14660 is on this segment, between the address from-value of 14650 and the to-value of 14798.

image
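
The range check described above can be sketched in a few lines of Python. This is a simplified illustration, not the County's actual data model: the `StreetSegment` class and its field names are hypothetical, and it keeps a single from/to range per segment, whereas real centerline data usually stores separate ranges for the left and right sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreetSegment:
    """Simplified street segment with one address range (hypothetical model)."""
    from_addr: int
    to_addr: int

    def contains(self, house_number: int) -> bool:
        # Sort the endpoints so the check works whichever way the range runs.
        low, high = sorted((self.from_addr, self.to_addr))
        return low <= house_number <= high


# Range read from the SCC map for the segment in front of the Valero parcel.
valero_segment = StreetSegment(from_addr=14650, to_addr=14798)

print(valero_segment.contains(14660))  # True: the parcel address is on this segment
print(valero_segment.contains(14631))  # False: the Street View address is not
```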

But where in the world is the 14631 address that Google Street View provided for the Valero? It turns out that it is actually in the next line segment (past the Middle Ave junction), as shown below. It is also worth noting that odd addresses are on the left side of the street (note the callout boxes) and even addresses are on the right-hand side. According to Google Street View, however, the Valero's address is 14631 (an odd number), so shouldn't it be on the left side of the street rather than the right? It seems that Google Street View (at least in this scenario) does not correlate the direction the user is viewing with the address assignment.

image
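
The parity mismatch can be expressed as a simple check. The sketch below assumes the odd-left, even-right rule read from the callout boxes on the SCC map for this stretch of road. The function name `expected_side` is hypothetical, and the "observed" side is simply what I saw in Street View, not a value taken from any dataset.

```python
def expected_side(house_number: int) -> str:
    """Side of the street implied by address parity (odd = left, even = right)."""
    return "left" if house_number % 2 == 1 else "right"


# Street View showed the Valero on the right side, labeled 14631.
street_view_number = 14631
observed_side = "right"

mismatch = expected_side(street_view_number) != observed_side

print(expected_side(14660))  # right: consistent with the parcel address
print(expected_side(14631))  # left: conflicts with where the Valero sits
print(mismatch)              # True
```

A check like this is one way to flag candidate Accuracy problems automatically, although it cannot say which source is wrong.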

As we have seen, simple, quirky observations about the world around us and about data quality in our everyday lives (the focus of this blog) reveal excellent examples of how to communicate data quality using the Conformed Dimensions of Data Quality (CDDQ).

As always, if you have additional information about what you think the root cause of this issue is, please reach out to me at dan[at]dqmatters{dot}com or on LinkedIn.