IBM Data and AI


Defect in parsing of webhose news site scraping

Webhose provides articles scraped from news sites, and the parsed text of these articles is used for Watson NLU enrichments.

Some URLs contain additional articles that are extraneous to the main article. The parser includes this extraneous text as part of the main article text. This contaminated text is then enriched within Watson Discovery, and the NLU results include entities from the extraneous text. As a result, an article can be tagged as highly relevant to an entity that is not related in any way to the main article at that URL, which produces a false positive match for a query. In addition, these extraneous articles can change over time, so the extraneous articles present at the time of scraping may no longer be present on later requests to the main article URL.
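To illustrate the effect described above, here is a minimal sketch in Python. It assumes the main article text can be obtained separately from the full parsed text, which the current parser does not do. The function name `split_entities` and the sample strings are hypothetical, invented for illustration only. They are not taken from the law.com page or from any Webhose or Watson API.

```python
def split_entities(main_text, parsed_text, entities):
    """Partition entity names by where they are mentioned.

    Returns (relevant, suspect): names found in the main article text,
    and names found only in the rest of the parsed text.
    """
    main_lower = main_text.lower()
    parsed_lower = parsed_text.lower()
    relevant, suspect = [], []
    for name in entities:
        needle = name.lower()
        if needle in main_lower:
            relevant.append(name)   # mentioned in the main article
        elif needle in parsed_lower:
            suspect.append(name)    # mentioned only in extraneous text
    return relevant, suspect


# Hypothetical sample text: a main article plus an unrelated teaser
# that the parser has appended to it.
main_text = "A trial lawyer describes a 14-hour hearing held over Zoom."
parsed_text = main_text + " Related: IBM and Zillow patent dispute."

relevant, suspect = split_entities(main_text, parsed_text, ["Zoom", "IBM", "Zillow"])
print(relevant)
print(suspect)
```

In this sketch, entities in `suspect` are the ones that would produce false positive matches, because they come only from text outside the main article.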

Here is a specific example: query for articles with IBM and Zillow as entities and the keyword "patent". The results include articles that are not relevant to these entities.

URL of returned article queried per above: https://www.law.com/2020/08/31/how-a-trial-lawyer-survived-a-14-hour-zoom-hearing/?slreturn=20200810142110

  • Guest
  • Sep 11 2020
  • Under Review
Who would benefit from this IDEA? All users of WDN, as false positives will be reduced.