Query expansion via synonyms is a powerful yet easy-to-configure feature. However, the current limit of 500 entries across all expanded_terms arrays is too low for most use cases. For enterprise corpora covering a multitude of topics, we estimate that a limit in the range of 10k to 30k synonyms per collection would be required in order to apply the available thesauri. For scientific applications (for example diseases, drugs, and proteins), an even higher limit might be required. A workaround via enrichment would of course also be an acceptable solution.
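To gauge how far an existing thesaurus is from the limit, the entries can be counted before upload. The following is a minimal sketch in Python. It assumes an expansions document with a top-level `expansions` list whose items carry `expanded_terms` arrays; the helper names `count_expanded_terms` and `fits_limit` and the sample data are hypothetical, and the value 500 is simply the limit quoted above.

```python
import json

# Limit quoted in the idea text; an assumption about the service, not a verified constant.
MAX_EXPANDED_TERMS = 500


def count_expanded_terms(expansions_doc):
    """Return the total number of entries across all expanded_terms arrays."""
    return sum(
        len(entry.get("expanded_terms", []))
        for entry in expansions_doc.get("expansions", [])
    )


def fits_limit(expansions_doc, limit=MAX_EXPANDED_TERMS):
    """Return True if the document stays within the given entry limit."""
    return count_expanded_terms(expansions_doc) <= limit


# Hypothetical sample document in the assumed layout.
sample = json.loads("""
{
  "expansions": [
    {"input_terms": ["car"], "expanded_terms": ["car", "automobile", "vehicle"]},
    {"expanded_terms": ["aspirin", "acetylsalicylic acid"]}
  ]
}
""")

total = count_expanded_terms(sample)
print(f"{total} entries, within limit: {fits_limit(sample)}")
```

Running the same count over a full enterprise or scientific thesaurus shows directly how many entries would have to be dropped to stay under the current limit.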
Who would benefit from this IDEA? As a user of a search engine, I want to obtain search results with high recall without having to think of all possible synonyms, so that I do not miss any relevant documents or passages.