IBM Data & AI


Can we make the File connector read files only in directories whose names match the wildcard pattern?

  1. The File connector does not read the expected partition data: it is expected to read data from all partitions except part_num=0, but it reads data only from part_num=1, part_num=10, part_num=11, part_num=12 and part_num=2.

      Expected Format as per UNIX HDFS - #AllHdfsProjectDirectories.$EDM_HUB_PROCESSED_BASE_HDFS_DIR##jp_HDFS_DIR#/#jpTableName#/part_num=[1-9]*/*
  2. The File connector duplicates data when reading with the “read multiple files” option: data from all partitions is expected once, but each partition's data is returned 4, 8 or 12 times, depending on the CORES in the APT file.
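
To make the expected behaviour in item 1 concrete, here is a minimal sketch of how a shell-style wildcard such as part_num=[1-9]* selects partition directories. It uses Python's standard fnmatch module rather than the File connector itself. The directory names are hypothetical and only for illustration. It does not show how the connector resolves patterns internally.

```python
from fnmatch import fnmatchcase

# Hypothetical partition directory names, for illustration only.
partition_dirs = [f"part_num={n}" for n in range(0, 13)]

# The last directory component of the pattern from the request.
PATTERN = "part_num=[1-9]*"


def matching_partitions(dirs, pattern=PATTERN):
    """Return the directory names that match a shell-style wildcard pattern."""
    return [d for d in dirs if fnmatchcase(d, pattern)]


selected = matching_partitions(partition_dirs)
print(selected)
```

Under these assumptions, part_num=0 is excluded and every other partition, including the two-digit ones, is selected exactly once.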
  • Anil Daniyala
  • Jan 8 2020
  • Need More Information
Why is it useful?
Who would benefit from this idea? We still have a few heavy-load jobs with notably long execution times. This fix would be a potential remedy for us, allowing us to extract ORC file data directly in parallel from each partition of a table.

Idea Priority Medium
Priority Justification This will help bring batch time down
Customer Name New York Life Insurance