IBM Data and AI

Welcome to the IBM Data and AI Ideas Portal for Clients!

We welcome and appreciate your feedback on IBM Data and AI Products to help make them even better than they are today!
Before you submit an idea, please search first, as a similar idea may already have been reported in the portal. If a related idea is not yet listed, please create a new idea with a description that covers the expected behavior, why the feature would improve the service, and how it would address your use case.
IBM Employees:
Clients:
  • Our team welcomes any feedback and suggestions you have for improving our offerings / products! This forum allows us to connect your offering / product improvement ideas with IBM product and engineering teams.

  • If you have not registered on this portal please click on the following link and register. To complete registration you will need to open the email you will receive from Aha to confirm your identity. http://ibm.biz/IBM-Data-and-AI-Portal-Register

Additional Information:
  • The shorter URL for this site is: https://ibm.biz/IBM-Data-and-AI-Ideas

  • To view our roadmaps: http://ibm.biz/Data-and-AI-Roadmaps

  • Reminder: This is not the place to submit defects or support needs. Please use the normal support channels for these cases.

  • Please do not use the Ideas Portal for reporting bugs - we ask that you report bugs or issues with the product by contacting IBM support.

Ability to disable or customize stemming

Watson Discovery uses stemming to identify valid matches. Based on our experience, stemming appears to be applied in both Natural Language Queries and Discovery Language Queries. It also seems to be enforced in phrase searching. This is sometimes useful, but in other cases it makes it very difficult to find relevant matches. For example, in one of our collections we have several documents with the term "DCS" and several with the term "DC". These are not related terms and there is no reason for our users to get both in a single query. However, if you search for either DCS or DC, you will get both, and there is no way to filter out the undesired matches because WDS seems to interpret them as the exact same term.

We think WDS should not enforce stemming when using phrase searching, as this type of search is primarily used to specify exact matches such as titles or excerpts of a document. Alternatively, we would like to have some control over when stemming should be applied or what words or terms should not be stemmed.
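The behavior described above can be illustrated with a minimal sketch. It assumes a toy stemmer that only strips a trailing "s"; this is not Watson Discovery's actual analyzer, and the names `toy_stem`, `search`, `stem`, and `protected` are hypothetical, chosen only to show the two options requested here (no stemming for an exact search, or a list of terms that are never stemmed).

```python
from __future__ import annotations

import re


def toy_stem(token: str) -> str:
    """Toy stemmer (assumption): lowercase and strip one trailing 's'."""
    token = token.lower()
    if len(token) > 2 and token.endswith("s"):
        return token[:-1]
    return token


def tokenize(text: str) -> list[str]:
    """Split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def search(docs: dict[str, str], term: str, stem: bool = True,
           protected: frozenset[str] = frozenset()) -> list[str]:
    """Return ids of documents containing `term` after normalization.

    stem=False models an exact (unstemmed) search; `protected` models a
    hypothetical list of terms that must never be stemmed.
    """
    def norm(tok: str) -> str:
        low = tok.lower()
        if not stem or low in protected:
            return low
        return toy_stem(tok)

    wanted = norm(term)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if wanted in {norm(t) for t in tokenize(text)}
    )


DOCS = {
    "doc1": "Configuring the DCS controller",
    "doc2": "DC power supply ratings",
}

print(search(DOCS, "DCS"))                               # ['doc1', 'doc2']
print(search(DOCS, "DCS", stem=False))                   # ['doc1']
print(search(DOCS, "DC", protected=frozenset({"dcs"})))  # ['doc2']
```

With stemming on, "DCS" and "DC" collapse to the same index term, so no query-side filter can tell them apart. Either of the two switches restores the distinction.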

  • Guest
  • Jun 26 2019
  • Future consideration
Who would benefit from this IDEA? "As a customer I would like to be able to do searches that only retrieve exact matches."
  • Admin
    Phil Anderson commented
    24 Apr 06:34pm

    I'm sorry for the confusion. It appears you are right and double quotes do not disable stemming.

  • Guest commented
    24 Apr 04:47pm

    Hello Phil,

    After reading your post I tried this Support Search, title:"integrator", from CSP after enabling "Advanced Query Syntax". The search results include items such as this title: 'Integration for Application Integration'. The search did not return only items with "integrator" in the title.

    Perhaps I misunderstood your post and the CSP Playbook instructions. Or maybe this is just an implementation hurdle that gives me and many Support Team members an unnecessary amount of grief and wasted effort.

    NLQ should not be the default search option for our Support Team research as it returns a huge number of false positive results. NLQ greatly devalues the benefits and increases the filtering time.

    Thank you for your attention to this RFE.

  • Guest commented
    24 Apr 12:56pm

    Hello Phil, that doesn't work from Support Search with Watson which is why Eduardo opened this request.

    If you're saying a "quoted" search should disable stemming, it sounds like SSwW is doing something wrong when it submits a query.


    I'll try to find out what a query looks like from SSwW when we use its Advanced Query Syntax option and perform a quoted search.

    Thanks very much.

  • Admin
    Phil Anderson commented
    24 Apr 12:35pm

    Hi Kevin - you can already disable stemming today via double quotes; just use the Query parameter instead of the Natural Language Query parameter.

  • Guest commented
    23 Apr 09:52pm

    Hello Phil,


    Thanks for the prompt feedback.

    The need to prioritize this requirement has already been covered by the wording of the original submission and subsequent comments from Support Search with Watson users.

    These users are IBM Support professionals that use SSW on a daily basis to search data from a variety of sources, mainly for the purpose of finding information used to solve hardware and software problems submitted to IBM Support by our clients.

    It would be almost impossible to quantify, but I can say with certainty that we spend a significant amount of time trying to "see the wood for the trees" because stemming delivers additional results that we have no interest in using.

    The argument that "relevance" sorting of results can overcome these problems doesn't help when we often use other sort orders to display results.

    Knowing that a "quoted" search will only return results relating to the string inside the quotes would fix this problem. That could be done either by not enforcing stemming for quoted searches or by giving us an option in the query submission to indicate whether stemming is required.

    I could go "vote hunting" by way of a blanket bombing email run of 17,000-ish users and invite them to add comments, but these comments would most likely be "variations on a theme".

    You suggested I could... "provide the Offering Managers revenue or other info to help with the prioritization".

    What "revenue" are you referring to?
    Other than the information already provided here, and/or collecting votes, what else should I be looking for?

    Would it help if I got some Support Executives involved?

    Regards, Kevin

  • Guest commented
    23 Apr 09:34pm

    Phil, this is one of the basic things being asked for in this request. We use very exact words in a product's vocabulary. Thousands of IBM support engineers rely on the search function every day. If you add up the extra cycles each one of them has to spend because the basic search function is lacking, it makes a no-brainer case. In my support org alone, we have 350+ engineers dependent on this.
    Thanks, Puneet

  • Admin
    Phil Anderson commented
    23 Apr 09:00pm

    Hi Kevin,

    Future Consideration means "it's a good idea, we will consider it for our roadmap". High voted features like this we consider each quarter based on a number of factors. If you want to accelerate a feature getting on our roadmap you can provide the Offering Managers revenue or other info to help with the prioritization.

    -Phil

  • Guest commented
    23 Apr 08:49pm

    This request appears to have a status of "Future Consideration".
    What exactly does that mean?

    Are we far enough away from the time when that status was set to now be in the future for it to be considered?

    How much longer will it be before users of "Support Search with Watson" have the luxury of seeing results that reflect the search criteria we've used without suffering from the pain introduced by stemming?

    Is there a need to go vote hunting in the IBM Support Organisation to get eyes on this requirement?
    Please let me know so I can do that if it's a necessary evil.

  • Guest commented
    13 Jan 07:44am

    Stemming should be replaced with lemmatization in Discovery, as lemmatization takes into account the part of speech of the original word.
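The distinction drawn in this comment can be sketched as follows. This is only an illustration under stated assumptions: the suffix rules and the `LEMMAS` lookup table are hypothetical toys, not the behavior of Discovery or of any real stemmer or lemmatizer.

```python
# Hypothetical lookup table: (surface form, part of speech) -> lemma.
LEMMAS = {
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
    ("integrated", "VERB"): "integrate",
}


def toy_stem(word: str) -> str:
    """Toy stemmer: strip the first matching suffix, ignoring part of speech."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str) -> str:
    """Toy lemmatizer: look up the lemma for a (word, part-of-speech) pair."""
    return LEMMAS.get((word.lower(), pos), word.lower())


print(toy_stem("meeting"))               # meet
print(toy_lemmatize("meeting", "NOUN"))  # meeting
print(toy_lemmatize("meeting", "VERB"))  # meet
```

The stemmer maps "meeting" to "meet" in every context, while the lemmatizer keeps the noun intact and reduces only the verb form.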

  • Guest commented
    12 Dec, 2019 05:14pm

    We need a support search feature that works natively, as most common search tools do. The current Support Search tool returns a vast number of "false positives" due to stemming and lemmatization. When we search with the word "integrator" we do not want results that contain the words "integration" or "integrated" but not our original search term.

  • Guest commented
    12 Dec, 2019 05:12pm

    We need a support search feature that works natively, as most common search tools do. The current Support Search tool returns a vast number of "false positives" due to stemming and lemmatization. When we search with the keyword "integrator" we do not want results that lack that keyword but contain the words "integration" or "integrated".

  • Guest commented
    12 Dec, 2019 04:53pm

    Yes, that is not an idea, that is a requirement.

    When I search for "efs" I don't want to see results matching "EF", because then the actual matches I should get are lost among the matches I don't expect.

    We are support and are looking for exact matches on technical things, not approximate random results.

  • Guest commented
    27 Jun, 2019 04:36pm

    Do NOT solve this with a list of words or terms that should not be stemmed. Using quotes (phrase searching) should be able to stop stemming on any word you choose. This is standard practice and anything else would likely lead to unexpected results in the future. Keep in mind that counter-intuitive search engine results can quickly make IBM support look "stupid" because, say, we don't even know about our own documentation.

NOTICE TO EU RESIDENTS: per EU Data Protection Policy, if you wish to remove your personal information from the IBM ideas portal, please log in to the ideas portal using your previously registered information, then change your email to "anonymous@euprivacy.out", your first name to "anonymous" and your last name to "anonymous". This will ensure that IBM will not send you any emails about idea submissions.