I have a scenario where fuzzy matching should apply only to spaces and special characters such as ( ) , . - and never to alphanumeric characters (letters or digits), because I am trying to match against an exact list of alphanumeric strings.
Real-life scenario: I have an entity list of California Vehicle Codes (traffic offense numbers, e.g. VC 21100.3). There are several hundred of them. I have had to turn off fuzzy matching for that entity, because it is very important to get an exact match on the user's query. However, I would turn fuzzy matching back ON for that entity if I could restrict it to spaces and special characters only, so that it is never fuzzy about letters or digits.
For example, "VC21452a", "VC 21452a", "VC21452(a)", "VC21452 (a)" and "VC 21452 (a)" should all fuzzy-match "VC21452a", because they are identical once spaces and special characters are stripped. However, the fuzzy match should not match a similar number, or the same number with a different suffix: it should NOT match "VC 21352e" to "VC 21352a", and it should NOT match "VC 21352 (e)" to either "VC 21352 (a)" or "VC 21352".
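The rule in these examples amounts to normalization followed by exact lookup: remove every character that is not a letter or digit, then require what remains to equal an entry in the list exactly. The Python sketch below illustrates that idea outside Watson Assistant; it is not how Watson Assistant's fuzzy matching works internally, and the function names (`normalize`, `build_index`, `match`) are hypothetical. It assumes that "special characters" means any non-alphanumeric ASCII character and that letter case should be ignored (drop the `.upper()` call if case must match too).

```python
import re
from typing import Dict, Iterable, Optional

# Assumption: any character that is not an ASCII letter or digit is a
# "special character" (spaces, parentheses, commas, periods, hyphens, ...).
_NON_ALNUM = re.compile(r"[^0-9A-Za-z]")


def normalize(code: str) -> str:
    """Strip spaces and punctuation; keep letters and digits in order.

    Upper-casing is an assumption so that "vc" matches "VC"; remove it
    if letter case must also match exactly.
    """
    return _NON_ALNUM.sub("", code).upper()


def build_index(canonical_codes: Iterable[str]) -> Dict[str, str]:
    """Map each normalized key to its canonical code, rejecting collisions."""
    index: Dict[str, str] = {}
    for code in canonical_codes:
        key = normalize(code)
        if key in index and index[key] != code:
            raise ValueError(f"{code!r} and {index[key]!r} both normalize to {key!r}")
        index[key] = code
    return index


def match(query: str, index: Dict[str, str]) -> Optional[str]:
    """Return the canonical code only if the normalized query is an exact key."""
    return index.get(normalize(query))


index = build_index(["VC21452a", "VC21352a", "VC21352", "VC21100.3"])
print(match("VC 21452 (a)", index))  # VC21452a
print(match("VC21452(a)", index))    # VC21452a
print(match("VC 21352e", index))     # None: suffix "e" is not in the list
print(match("VC 21352 (e)", index))  # None: no fallback to VC21352 or VC21352a
```

Because the lookup is exact after normalization, "VC 21352 (e)" can never drift to "VC21352a" or "VC21352". The remaining risk is two distinct codes collapsing to the same key (for example "VC 21100.3" and a hypothetical "VC 2110.03"), so the sketch refuses to build an index with such a collision rather than silently keeping one of them.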
Why is it useful?
Who would benefit from this IDEA?
While programming my Watson Assistant, this would save me many hours of tedious, frustrating work adding acceptable aliases to my entity list of vehicle codes. I expect that banks and insurance companies would also benefit from this.
How should it work?