Recently, when flying Southwest, I needed to reset my password. I wouldn’t have thought data quality applied to such a simple task, but of course they rejected my new password. I always use a password generator to create strong passwords: by default, every new password I create is a decent length and includes all three categories (letters in mixed case, numbers, and special characters). So when I submitted my new password, I was shown the following error.
First, I have to commend Southwest for listing the special characters it does accept. But when it comes to troubleshooting data quality issues, exception handling is usually best done by reporting the offending value, not the domain of valid values. I had no way of knowing which character in my password was causing the error. My recommendation to organizations in this situation is to identify the special characters your system does not allow and list those in the error message instead.
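To make that recommendation concrete, here is a minimal Python sketch of a check that names the offending characters rather than listing the allowed ones. It assumes the disallowed set is just `$` and `#` (the same assumption this post uses further down); the function names are hypothetical and are not Southwest’s code.

```python
from typing import List, Optional

# Assumed disallowed set ($ and #), not any real site's rule.
DISALLOWED = frozenset("$#")


def offending_characters(password: str, disallowed=DISALLOWED) -> List[str]:
    """Return each disallowed character in the password once, in order of first appearance."""
    found: List[str] = []
    for ch in password:
        if ch in disallowed and ch not in found:
            found.append(ch)
    return found


def password_error_message(password: str, disallowed=DISALLOWED) -> Optional[str]:
    """Name the characters that caused the rejection, or return None if none were found."""
    bad = offending_characters(password, disallowed)
    if not bad:
        return None
    return "Your password contains characters we do not allow: " + " ".join(bad)


print(password_error_message("Tr0ub#dor$3#"))
# Your password contains characters we do not allow: # $
```

The message only ever echoes characters from the disallowed set, so it tells the user exactly what to change without revealing anything else about the password.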
I’d program the app to record the number of new-password attempts that fail and the character(s) that caused each failure. I’d then use this frequency analysis to order (from left to right) the list of disallowed characters shown, most frequent first. This should help people identify the character causing the issue faster. Note that this ordering only needs to be worked out once and revisited occasionally, not continuously. I’ll go a little further into how I’d count these failures securely.
You do NOT want to save the password the user submits, for obvious security and architectural reasons. Instead, count the number of times each disallowed character is used and store only that result (see below). Note that the first column is the value entered; it is used only for the real-time calculation and is never stored.
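A minimal sketch of that idea in Python, assuming the `$`/`#` disallowed set from the assumption below and using an in-memory `collections.Counter` as a hypothetical stand-in for whatever store you actually persist to. Each rejected password is read once, and only the per-character tallies survive; the sample passwords are made up.

```python
from collections import Counter

# Assumed disallowed set ($ and #).
DISALLOWED = frozenset("$#")

# The only state that is kept: disallowed character -> number of times it was used.
disallowed_counts: Counter = Counter()


def record_rejected_attempt(password: str, counts: Counter, disallowed=DISALLOWED) -> None:
    """Add every occurrence of a disallowed character to the running tallies.

    The password itself is never written anywhere; it is only read here and
    then goes out of scope.
    """
    counts.update(ch for ch in password if ch in disallowed)


# Hypothetical rejected submissions, used only to exercise the function.
for attempt in ["pa$$word#1", "Hello#World", "ok#"]:
    record_rejected_attempt(attempt, disallowed_counts)

print(disallowed_counts.most_common())  # [('#', 3), ('$', 2)]
```

Here "not stored" means the password is never persisted or logged; only `disallowed_counts` would be written out.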
Assumption: the disallowed characters are $ and # (read this on why some characters are disallowed and why the old reasons for disallowing certain characters are less relevant these days).
The first question to ask yourself is: how do you want to use this data? The answer tells you how to model it. For instance, do I want one row per new-password attempt? If so, a single row may hold up to as many rejections as there are disallowed characters, so to store the count by character I’d have to add a column per character (e.g. one column for the “$” count, one for the “#” count, etc.).
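For comparison, here is a sketch of the one-row-per-attempt shape just described, assuming the same `$`/`#` set; the column names `dollar_count` and `hash_count` are hypothetical. Each rejected attempt becomes a single record with one count column per disallowed character.

```python
from collections import Counter

# Hypothetical column name for each assumed disallowed character.
COLUMN_FOR = {"$": "dollar_count", "#": "hash_count"}


def wide_row(password: str) -> dict:
    """Summarize one rejected attempt as a single row with a count column per disallowed character."""
    counts = Counter(ch for ch in password if ch in COLUMN_FOR)
    return {column: counts[ch] for ch, column in COLUMN_FOR.items()}


print(wide_row("pa$$word#1"))  # {'dollar_count': 2, 'hash_count': 1}
```

The drawback of this shape is that disallowing a third character means adding a third column, i.e. a schema change.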
For the example above, I prefer to loop over the password and insert a new row for each disallowed character found. If for some reason I want to tie those rows together, I add a rejection ID column holding a unique ID for each time a user attempts to create a new password and fails. For extra credit, use this or another tool to identify what type of hash I used for my ID here. This data structure lets me count the offending characters for a specific period very easily (and even identify which characters are most frequently repeated within a single password).
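Below is a minimal sketch of the row-per-character layout using Python’s built-in `sqlite3` module with an in-memory database. The table and column names are hypothetical, the dates and passwords are made-up sample data, and the rejection ID is a random UUID (`uuid.uuid4().hex`) rather than whatever hash the original ID used. Only the offending characters are written, never the password.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Assumed disallowed set ($ and #).
DISALLOWED = frozenset("$#")

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE password_rejection (
           rejection_id TEXT NOT NULL,  -- shared by every row from one failed attempt
           bad_char     TEXT NOT NULL,  -- one offending character; the password is never stored
           rejected_at  TEXT NOT NULL   -- UTC timestamp in ISO-8601 form
       )"""
)


def log_rejection(conn, password, when=None, disallowed=DISALLOWED):
    """Insert one row per disallowed-character occurrence, all tied together by one rejection ID."""
    offending = [ch for ch in password if ch in disallowed]
    if not offending:
        return None
    rejection_id = uuid.uuid4().hex  # hypothetical ID scheme
    moment = when if when is not None else datetime.now(timezone.utc)
    stamp = moment.isoformat(timespec="seconds")
    conn.executemany(
        "INSERT INTO password_rejection VALUES (?, ?, ?)",
        [(rejection_id, ch, stamp) for ch in offending],
    )
    return rejection_id


def counts_between(conn, start, end):
    """Count offending characters rejected in [start, end), most frequent first."""
    rows = conn.execute(
        """SELECT bad_char, COUNT(*) FROM password_rejection
           WHERE rejected_at >= ? AND rejected_at < ?
           GROUP BY bad_char
           ORDER BY COUNT(*) DESC, bad_char""",
        (start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")),
    )
    return dict(rows.fetchall())


def repeats_within_attempts(conn):
    """For each character, count the attempts in which it appeared more than once."""
    rows = conn.execute(
        """SELECT bad_char, COUNT(*) FROM (
               SELECT rejection_id, bad_char FROM password_rejection
               GROUP BY rejection_id, bad_char
               HAVING COUNT(*) > 1
           ) AS per_attempt
           GROUP BY bad_char"""
    )
    return dict(rows.fetchall())


# Made-up sample data.
sample_time = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
for attempt in ["pa$$word#1", "Hello#World", "ok#"]:
    log_rejection(conn, attempt, when=sample_time)

january = counts_between(
    conn,
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
print(january)                        # {'#': 3, '$': 2}
print(repeats_within_attempts(conn))  # {'$': 1}
```

Because every row carries its own timestamp and rejection ID, period counts and within-password repeats are both plain `GROUP BY` queries, and disallowing a new character needs no schema change.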
Given the data above, I’d list the disallowed characters with the hash first and then the dollar sign (# $), because the hash is used more frequently.
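The ordering step itself is a one-line sort. A minimal sketch, assuming the tallies come from wherever you stored the counts (the `Counter` below holds hypothetical numbers in which the hash outnumbers the dollar sign); characters never seen get a count of zero and fall to the end, with ties broken by code point so the list is stable.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def ordered_disallowed(disallowed: Iterable[str], counts: Mapping[str, int]) -> List[str]:
    """Order disallowed characters most-used first; ties are broken by code point."""
    return sorted(disallowed, key=lambda ch: (-counts.get(ch, 0), ch))


# Hypothetical tallies in which the hash is used more often than the dollar sign.
counts = Counter({"#": 3, "$": 2})
print("Special characters not allowed: " + " ".join(ordered_disallowed({"$", "#"}, counts)))
# Special characters not allowed: # $
```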
In Conclusion
The right level of communication is extremely important when dealing with data quality. If you’re concerned about keeping the UI simple, let users drill into more information (e.g. click a link to see the full list of disallowed characters, and even an explanation of why the constraint exists). This builds trust with users and reduces the number of help desk tickets. Also, list the exceptions: at least alongside the allowed values, but never the allowed values alone. Generally speaking, the exception list is smaller than the list of allowed values.