PrecisionCalc
inspector
text
Do Anything with Text in Formulas

 

itFUZZYCOMPARE

Compares two text strings. Returns the string they have in common, or a fuzzy comparison score.

You can also use itFUZZYCOMPARE to find which items in a list most closely resemble a target item. See the last example below.
 

How It Works:

itFUZZYCOMPARE starts by making a first pass through the two input text strings, finding in-order matching characters. Then, it assigns a score based on the proportion of the length of the matching text string to the longest input text string, or the shortest input text string, or the average of the two, depending on what is specified in return_type.

Then, unless include_out_of_order_matches is set to False, it makes another pass through the unmatched portions of the two input text strings, finding matching characters that were out of order in the previous pass, but are now in order after removing the original in-order matching characters. Then, it  adds to the first pass' score in the same manner as in the first pass, except this pass' contribution to the final score is reduced according to out_of_order_match_strength.

Then it makes as many passes as required until there are no more matching characters, reducing the contribution of each pass to the final score compared to the pass before it by the amount in out_of_order_match_strength.

itFUZZYCOMPARE can return the final score, or it can return the matching string found in the first pass, or it can return the length of the text string found in the first pass.

 

Syntax

itFUZZYCOMPARE(fuzzy_string_1,fuzzy_string_2,,return_type,case_sensitive,include_out_of_order_matches,out_of_order_match_strength)

fuzzy_string_1 Required. One of the two strings to compare.
fuzzy_string_2 Required. The other of the two strings to compare.
return_type Optional. Determines what kind of information is returned about the comparison result. 1 by default.

 

return_type value

Result

1 The matched string. Does not include out of order matches. Default if return_type is not specified.
2 The length of the matched string. Does not include out of order matches.
3 Match strength, long. The percent of the length of the match to the longer input string's length.

If the input strings are 8 characters long and 12 characters long, and the match found is 6 characters long, the return value is 6/12 = 0.5.

If include_out_of_order_matches is True or omitted, out of order matches may increase the fuzzy match score. See include_out_of_order_matches.

4 Match strength, short. The percent of the length of the match to the shorter input string's length.

If the input strings are 8 characters long and 12 characters long, and the match found is 6 characters long, the return value is 6/8 = 0.75.

If include_out_of_order_matches is True or omitted, out of order matches may increase the fuzzy match score. See include_out_of_order_matches.

5 Match strength, average. The percent of the length of the match to the average of the two input strings' lengths.

If the input strings are 8 characters long and 12 characters long, and the match found is 6 characters long, the return value is 6/((8+12)/2) = 6/(20/2) = 6/10 = 0.6.

If include_out_of_order_matches is True or omitted, out of order matches may increase the fuzzy match score. See include_out_of_order_matches.

case_sensitive Optional. Determines whether to distinguish between lower case and upper case. Set to False to ignore case differences. True by default.

If case_sensitive is False and return_type is 1 or omitted, the return value is in all capital characters.

include_out_of_order_matches Optional. itFUZZYCOMPARE makes one pass through the two original input strings, finding matches. Then, by default, it makes more passes, assigning each successive pass a lower fuzzy match score factor (see out_of_order_match_strength), making as many passes as needed before there are no more characters in common.

You may prefer to restrict the search to a single pass only. To do so, set include_out_of_order_matches to False.

Consider the input strings abbccee and eebbccf. The first pass finds bbcc. The second pass finds ee. The second pass' contribution to the final matching score is reduced by out_of_order_match_strength. Then there are no remaining matching characters.

True by default.

out_of_order_match_strength Optional. If specified, must be a number from 0 to 1.

0.75 by default.

If include_out_of_order_matches is True or omitted, itFUZZYCOMPARE makes multiple passes searching for matching characters, assigning each successive pass a lower fuzzy match score factor, making as many passes as needed before there are no more characters in common.

To make each successive pass' match count less toward the final score per matching character than the pass before it, each is multiplied by out_of_order_match_strength, and out_of_order_match_strength is itself multiplied by the previous pass' out_of_order_match_strength.

For example, if 5 passes are required before all matching characters are found:

  • The first pass counts in full toward the final score. out_of_order_match_strength is effectively 1 for the first pass. If, for example, the first pass has a fuzzy match score of 0.6, then the entire 0.6 counts toward the final score.
  • The second pass' score is reduced by multiplying it by out_of_order_match_strength. If out_of_order_match_strength is 0.75 (the default), and if the second pass has a raw score of 0.2, then the second pass contributes 0.2 x 0.75 = 0.15 to the final score.
  • The third pass' out_of_order_match_strength is reduced by multiplying it by the previous pass' out_of_order_match_strength, for an effective out_of_order_match_strength of (by default) 0.75 x 0.75 = 0.5625. If the third pass has a raw score of 0.1, then the third pass contributes 0.1 x 0.5625 = 0.05625 to the final score.
  • The fourth pass has an effective out_of_order_match_strength of (by default) 0.5625 x 0.75 = 0.421875.
  • The fifth pass has an effective out_of_order_match_strength of (by default) 0.421875 x 0.75 = 0.316406.

Because each pass' score is a percentage of the input strings' length as determined by return_type, it is impossible for the final score to exceed 1.

To make each pass count as much as the first pass, set include_out_of_order_matches to 1.

You may wish to format the result as a percentage, so that a final score of e.g. 0.85 is displayed as 85%.

 

Examples

Formula Description Result
=itFUZZYCOMPARE("abc","abc") Find and return the matching string between "abc" and "abc". abc
=itFUZZYCOMPARE("abc","bcd") Find and return the matching string between "abc" and "bcd". bc
=itFUZZYCOMPARE("abc","bcd",2) Find the matching string between "abc" and "bcd", and return the length of the matching string. 2
=itFUZZYCOMPARE("abcdefgh","cdefghijklmn",3) Find the matching string between "abcdefghi" and "cdefghijklmn", and return the percentage of the match length to the length of the longer of the two inputs. 0.5
(The matching string is "cdefgh", which is 6 characters long. The longer of the two input strings, "cdefjhijklmn", is 12 characters long. 6/12 = 0.5.)
=itFUZZYCOMPARE("abcdefgh","cdefghijklmn",4) Find the matching string between "abcdefghi" and "cdefghijklmn", and return the percentage of the match length to the length of the shorter of the two inputs. 0.75
(The matching string is "cdefgh", which is 6 characters long. The shorter of the two input strings, "abcdefjh", is 8 characters long. 6/8 = 0.75.)
=itFUZZYCOMPARE("abcdefgh","cdefghijklmn",5) Find the matching string between "abcdefghi" and "cdefghijklmn", and return the percentage of the match length to the average lengths of the two inputs. 0.6
(The matching string is "cdefgh", which is 6 characters long. The average length of the two inputs is (8+12)/2 = 10. 6/10 = 0.6.)
=itFUZZYCOMPARE("ABC","bcd") Find and return the matching string between "AB" and "bc". (empty string)
(itFUZZYCOMPARE is case-sensitive by default, so there are no matches between "AB" and "bc".)
=itFUZZYCOMPARE("ABc","bcd",,FALSE) Find and return the matching string between "AB" and "bc", ignoring case. BC
(If case_sensitive is False and return_type is 1 or omitted, the return value is in all capital characters.)
=itFUZZYCOMPARE("abc123","123abc",3) Find and return the matching score between "abc123" and "123abc", allowing out of order matches, and allowing the out of order match strength to default to 0.75. 0.875
(The first pass results in a score of 3/6 = 0.5. The second pass results in a score of (3/6) x 0.75 = 0.375. 0.5 + 0.375 = 0.875.)
=itFUZZYCOMPARE("abc123","123abc",3,,FALSE) Find and return the matching score between "abc123" and "123abc", but do NOT allow out of order matches. 0.5
=itFUZZYCOMPARE("abc123","123abc",3,,,0.9) Find and return the matching score between "abc123" and "123abc", allowing out of order matches, and specifying a custom out of order match strength of 0.9. 0.95
(The first pass results in a score of 3/6 = 0.5. The second pass results in a score of (3/6) x 0.9 = 0.45. 0.5 + 0.45 = 0.95.)
=itFUZZYCOMPARE(A1,A2,A3,A4,A5,A6)
 
  A
1  abc123
2  123abc
3  3
4  FALSE
5  TRUE
6  0.9
Find and return the matching score between "abc123" and "123abc", returning the percentage of the match length to the length of the longer of the two inputs (cell A3), not case-sensitive (cell A4), allowing out of order matches (cell A5), and specifying a custom out of order match strength of 0.9 (cell A6). 0.95

Find which items in a list most closely resemble a target item:

Formulas Description Result
  A B
1  aaabbbcdddeee  
2    
3  aaabbbcccdddeee  =itFUZZYCOMPARE(A3,$A$1,3)
4  aabbccddee  =itFUZZYCOMPARE(A4,$A$1,3)
5  aaabbbccdddee  =itFUZZYCOMPARE(A5,$A$1,3)
6  aaabbbeeedddccc  =itFUZZYCOMPARE(A6,$A$1,3)
7  eeedddcccbbbaaa  =itFUZZYCOMPARE(A7,$A$1,3)
8  aaabbbcccdddfff  =itFUZZYCOMPARE(A8,$A$1,3)
9  aaabbbcccfffggg  =itFUZZYCOMPARE(A9,$A$1,3)
10  aaabbbfffggghhh  =itFUZZYCOMPARE(A10,$A$1,3)
11  aaafffggghhhiii  =itFUZZYCOMPARE(A11,$A$1,3)
12  fffggghhhiiijjj  =itFUZZYCOMPARE(A12,$A$1,3)
Find which of the items in the range A3:A12 most closely resemble the item in cell A1, by finding and returning the matching scores between cell A1 and each of the items in the range A3:A12.

Then, sort the range A3:B12 by column B to easily see which items most closely resemble cell A1.

To save time entering the formulas, enter the formula in B3 as shown here, then use Excel's "fill down" feature to fill in the formulas in cells B4 through B12, by grabbing the "fill handle" on the lower right corner of cell B3 and either double-clicking it or dragging down.

  A B
1  aaabbbcdddeee  
2    
3  aaabbbcccdddeee  0.875
4  aabbccddee  0.714285714
5  aaabbbccdddee  0.928571429
6  aaabbbeeedddccc  0.80078125
7  eeedddcccbbbaaa  0.564208984
8  aaabbbcccdddfff  0.6875
9  aaabbbcccfffggg  0.5
10  aaabbbfffggghhh  0.4375
11  aaafffggghhhiii  0.25
12  fffggghhhiiijjj  0.0625

Then, sort the range A3:B12 by column B, in descending order:

  A B
1  aaabbbcdddeee  
2    
3  aaabbbccdddee  0.928571429
4  aaabbbcccdddeee  0.875
5  aaabbbeeedddccc  0.80078125
6  aabbccddee  0.714285714
7  aaabbbcccdddfff  0.6875
8  eeedddcccbbbaaa  0.564208984
9  aaabbbcccfffggg  0.5
10  aaabbbfffggghhh  0.4375
11  aaafffggghhhiii  0.25
12  fffggghhhiiijjj  0.0625

 

 

PrecisionCalc Home Page