We’ve all had this experience: Your password is required to be changed every so often, and now you’ve got to come up with a new one. While security-conscious people often use password managers that generate completely random passwords, the average person does not use a different password for every site. You wrack your brain trying to come up with a new combination of letters, numbers, punctuation, and ancient Sanskrit to get past the password complexity requirements. After several tries, you modify the final character of your original password from “1” to “2” and submit successfully, and get on with your life.
While this behavior is far from ideal, we all know someone who does this, and have probably even done it ourselves once or twice. Because users have a tendency to reuse or just slightly modify old passwords, knowing a previous password makes guessing the new one much easier for an attacker. This is especially true when attackers analyze massive password leaks and perform statistical analysis to learn our password hygiene habits. Armed with the common ways we manipulate our old passwords, password cracking software re-applies these transformations to dictionaries and lists of previously used passwords to test millions of combinations.
Password reset pages in websites have implemented status bars or indicators to show how “strong” a user’s password is, and whether they’ve adhered to the password complexity standard for each character class (i.e., symbols, letters, numbers). While examining password reset functionality during a recent pentest, we noticed that it is not common for password-changing mechanisms to enforce a minimum criteria for how different a candidate’s new password must be from the original. We propose in this article that the concept of Hamming Distance, which measures password differential, can effectively be used alongside other complexity evaluation mechanisms to easily and efficiently ensure that a newly set password is significantly different from the previous one.
The concept of a Hamming Distance was invented by Richard Hamming in 1950 in his paper “Error detecting and error correcting codes” [https://zoo.cs.yale.edu/classes/cs323/doc/Hamming.pdf]. The original purpose of this new metric was to detect errors between arrays of bits (0’s and 1’s) during transmission. Conceptually, the process is simple: compare two pieces of data, bit by bit, and count the number of differences.
In terms of computational efficiency, this approach is fairly inexpensive. In theory, it requires only a single pass through both strings/arrays, comparing the values at each position and iterating a counter if the values differ. While it’s possible to complicate this process with additional conditions, this is its most basic form.
In relation to passwords, we are at an advantage for implementation given that our checks will happen on the client-side, because in order to change the password for most modern websites, the current password is required. This makes it available and part of the current page context to be used in comparisons, as opposed to requiring a separate database call for retrieval and/or decoding. The user will have already entered the value in plaintext and it will be part of the current evaluation steps on the page.
An additional consideration is what an acceptable minimum threshold for Hamming Distance might be. There doesn’t seem to be an accepted standard for this, but NIST recommends not reusing passwords for at least two years, and that none of the last eight passwords should be reused (https://www-s.nist.gov/NCNR-IMS/passwordTips.jsp). Additionally, they recommend no character being repeated more than 4 times in an 8-character minimum password. Using this as a starting point, we can consider 4 characters as a base score for an acceptable password change.
While this seems deceptively straightforward, there are some issues which may arise: The first is where an individual might use the opposite capitalization case for a position compared to the original password. For example, if the original password is “SalamanderSweetLife23”, they might use a new password: “salamandersweetlife23”. From a pure Hamming Distance perspective, these two passwords differ by a value of 3 and may actually pass a threshold for an acceptable change. However, from a common-sense perspective, this is essentially the same password. One possible solution might be to consider both values the same, regardless of the case. This would be a more aggressive option. Another possibility would be to assign a partial value, such as a 0.5 score, to positions where the characters are the same but the case differs. This would be a less aggressive, and perhaps more acceptable, solution.
Another issue that should be considered is the use of semantic changes within the context of the string itself. An example of this is the use of common password forms like “Summer2023!” and “Winter2023!”. A change of the month subsection of the password would register as a significant difference in terms of Hamming Distance. However, semantically, they’re very similar. This is an issue that our proposal isn’t likely to resolve without a much more complex solution.
Finally, there is the issue of substrings and partial passwords. For example, if a user had an initial password of “SandwichParty123”, but changed it to “wichParty123”, a naive implementation of Hamming Distance would generate a significant score and would result in the password being accepted. The minor shift of text in the second password causes a misalignment that Hamming Distance does not detect.
Typically, Hamming Distance requires that the strings being compared are of the same length. A workaround for this is padding of the smaller string to match; however, where to pad the string and with what presents some difficulty. This is a complex issue and as a result, it may be worthwhile to pad only at the end of the string and accept this as a limitation of the implementation. Using this approach, the comparison is no worse than not doing Hamming Distance checks at all, and in most cases, presents a better solution.
More involved implementations are left as an exercise for the reader.
def HammingDistance(item1, item2):
item1_size = len(item1)
item2_size = len(item2)
size_diff = abs(item1_size - item2_size)
format_str = "{:<" + str(size_diff) + "}"
if item1_size > item2_size:
# pad item 2
format_str.format(item2)
elif item2_size > item1_size:
# pad item 1
format_str.format(item1)
else:
# same size, continue
pass
hamming_distance = 0
for i in range(item1_size):
if item1[i] != item2[i]:
hamming_distance += 1
return hamming_distance
if __name__ == "__main__":
print("Hamming Distance password assessment:")
print("Checking simple change:")
orig_pass = "HotdogLife123"
new_pass = "HotdogLife124"
print("Original password: " + orig_pass)
print("New password: " + new_pass)
distance = HammingDistance(orig_pass, new_pass)
print("Hamming distance is: " + str(distance))
if (distance > 4):
print("Password is acceptable.")
else:
print("Password is unacceptable.")
print("Checking complex change:")
orig_pass = "HotdogLife123"
new_pass = "NowItIsDifferent45%"
print("Original password: " + orig_pass)
print("New password: " + new_pass)
distance = HammingDistance(orig_pass, new_pass)
print("Hamming distance is: " + str(distance))
if (distance > 4):
print("Password is acceptable.")
else:
print("Password is unacceptable.")
% python3 hamming.py
Hamming Distance password assessment:
Checking simple change:
Original password: HotdogLife123
New password: HotdogLife124
Hamming distance is: 1
Password is unacceptable.
Checking complex change:
Original password: HotdogLife123
New password: NowItIsDifferent45%
Hamming distance is: 12
Password is acceptable.
While the use of Hamming Distance for password evaluation is not groundbreaking research, the author had noted a lack of similar mechanisms in modern password complexity check functionality. Hamming Distance could provide a simple and easy-to-implement method for developers to increase the security evaluation of new passwords created by users of a website or application. Organizations should also consider a re-alignment of their password hygiene requirements to better align with modern research on effective password habits, as provided by NIST.
Interested in learning more about applications of the Hamming Distance? Consider reading our Blog on Screenshotting Tools like GoWitness, which you can use to help manage your network inventory and detect malicious or unapproved devices connected to your network. These tools use image compression techniques with Hamming Distance to identify anomalies.