IBM Data and AI Ideas Portal for Customers


This portal is for opening public enhancement requests against products and services offered by the IBM Data & AI organization. To view all of the ideas you have submitted to IBM, create and manage groups of ideas, or create an idea that is explicitly visible to everyone (public) or only to you and IBM (private), use the IBM Unified Ideas Portal (https://ideas.ibm.com).


Shape the future of IBM!

We invite you to shape the future of IBM, including product roadmaps, by submitting ideas that matter to you the most. Here's how it works:


Search existing ideas

Start by searching and reviewing ideas and requests to enhance a product or service. Take a look at ideas others have posted, and add a comment, vote, or subscribe to updates on them if they matter to you. If you can't find what you are looking for, post your own idea.


Post your ideas

Post ideas and requests to enhance a product or service. Take a look at ideas others have posted and upvote them if they matter to you:

  1. Post an idea

  2. Upvote ideas that matter most to you

  3. Get feedback from the IBM team to refine your idea


Specific links you will want to bookmark for future use

Welcome to the IBM Ideas Portal (https://www.ibm.com/ideas) - Use this site to find out additional information and details about the IBM Ideas process and statuses.

IBM Unified Ideas Portal (https://ideas.ibm.com) - Use this site to view all of your ideas, create new ideas for any IBM product, or search for ideas across all of IBM.

ideasibm@us.ibm.com - Use this email to suggest enhancements to the Ideas process or request help from IBM for submitting your Ideas.

IBM employees should enter ideas at https://ideas.ibm.com


Status: Delivered
Workspace: Spectrum LSF
Created by: Guest
Created on: Apr 24, 2019

Extend the time period over which historical run time is tracked for fairshare

We use historical run time as the main component of fairshare to determine job priority and ensure that users receive the appropriate amount of run time on our clusters.

Currently, LSF resets a group's historical run time whenever the group is updated (touched) or mbatchd is restarted. After the reset, the value includes only the run time from jobs that are still in mbatchd memory (see TS002071715). We sync group membership every hour, so in practice the window of jobs that count toward historical run time is set by how long those jobs stay in mbatchd memory. Two parameters control that: CLEAN_PERIOD_DONE for jobs that complete successfully, and CLEAN_PERIOD for all other jobs. CLEAN_PERIOD can be set arbitrarily long. CLEAN_PERIOD_DONE, however, governs most of the jobs on the system and is capped at 1 week.
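The sketch below is a simplified model of the reset behavior described above. It is not LSF's implementation. It assumes three things: each finished job contributes `run_hours * 0.1 ** (age_hours / HIST_HOURS)`; CLEAN_PERIOD and CLEAN_PERIOD_DONE are given in seconds; and a rebuild counts only jobs whose clean period has not yet expired. `Job`, `in_mbatchd_memory`, and `rebuilt_hist_run_time` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    run_hours: float    # run time the job contributed, in hours
    age_seconds: float  # seconds since the job finished
    done: bool          # True if the job completed successfully (DONE)


def in_mbatchd_memory(job, clean_period, clean_period_done):
    """Simplified model: a finished job stays in memory until its clean period expires."""
    limit = clean_period_done if job.done else clean_period
    return job.age_seconds < limit


def rebuilt_hist_run_time(jobs, hist_hours, clean_period, clean_period_done):
    """Historical run time as rebuilt after a group touch (assumed exponential decay).

    Only jobs still in memory are counted, each weighted by 0.1 ** (age / HIST_HOURS).
    """
    total = 0.0
    for job in jobs:
        if in_mbatchd_memory(job, clean_period, clean_period_done):
            age_hours = job.age_seconds / 3600.0
            total += job.run_hours * 0.1 ** (age_hours / hist_hours)
    return total
```

Take HIST_HOURS=504 (3 weeks) with CLEAN_PERIOD_DONE at the 1-week cap. In this model, a successfully completed job that finished 10 days ago drops out entirely at the next hourly group sync. The decay formula alone would still give that job roughly a third of its original weight.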

We would like historical run time to consider jobs much older than 1 week. On our Slurm clusters, for example, PriorityDecayHalfLife (the time to decay to 50% of the original value) is 1 week. That is approximately equivalent to an LSF HIST_HOURS (the time to decay to 10% of the original value) of 3 weeks. Ideally, we would let a job's contribution to priority decay to about 1% of its original value before cutting it off. With HIST_HOURS at 3 weeks, decaying from 10% to 1% takes another 3 weeks. That would currently require keeping 6 weeks' worth of jobs in mbatchd memory, which is probably not feasible. I did notice that a job's contribution to historical run time is not lost immediately when the job is purged from mbatchd memory. It does not change until the group is touched (e.g., with 'bconf set …'). This implies that historical run time is already tracked somewhere separately from the full job record.
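The figures above can be checked with a short calculation. It assumes both schedulers decay usage exponentially: Slurm as `0.5 ** (t / half_life)` and LSF as `0.1 ** (t / HIST_HOURS)`. The function names are hypothetical. Under these assumptions, a 1-week half-life corresponds to a HIST_HOURS of about 3.3 weeks. The 3-week figure above is therefore a round-down. With HIST_HOURS set to 3 weeks, the 1% cutoff falls at 2 * HIST_HOURS, or 6 weeks.

```python
import math

HOURS_PER_WEEK = 7 * 24


def hist_hours_from_half_life(half_life_hours):
    """HIST_HOURS-style horizon (decay to 10%) equivalent to a half-life (decay to 50%).

    Solving 0.5 ** (t / h) == 0.1 ** (t / H) for H gives H = h * ln(10) / ln(2).
    """
    return half_life_hours * math.log(10) / math.log(2)


def hours_to_reach(fraction, hist_hours):
    """Hours until a contribution decays to `fraction`, assuming 0.1 ** (t / HIST_HOURS)."""
    return hist_hours * math.log10(1.0 / fraction)


slurm_half_life = 1 * HOURS_PER_WEEK
equiv = hist_hours_from_half_life(slurm_half_life)
print(f"equivalent HIST_HOURS: {equiv:.0f} h ({equiv / HOURS_PER_WEEK:.2f} weeks)")

hist_hours = 3 * HOURS_PER_WEEK
cutoff = hours_to_reach(0.01, hist_hours)
print(f"1% cutoff with HIST_HOURS={hist_hours}: {cutoff / HOURS_PER_WEEK:.0f} weeks")
```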

Our specific requests are:

(1) That historical run time persist independently of the underlying job record, e.g., by making historical run time persist (and decay) for 2*HIST_HOURS.

(2) Because request (1) would be a major change to LSF behavior and might require significant coding effort, we would also like to be able to set CLEAN_PERIOD_DONE to at least 3 weeks as a stopgap measure.

(3) That mbatchd warn about these incompatible time cutoffs on 'badmin reconfig' and 'badmin mbdrestart'. If it had told us that HIST_HOURS would be meaningless when set much larger than CLEAN_PERIOD, and that CLEAN_PERIOD_DONE was being automatically set to a lower value than CLEAN_PERIOD, we could have started addressing these issues much sooner.
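The warnings in request (3) amount to a few comparisons between parameters. The sketch below shows them. The function name, the message wording, and the hard-coded 1-week cap (taken from the description in this request) are assumptions, not LSF behavior. It also assumes CLEAN_PERIOD and CLEAN_PERIOD_DONE are in seconds and HIST_HOURS is in hours.

```python
SECONDS_PER_HOUR = 3600
# Assumption: the 1-week CLEAN_PERIOD_DONE cap described in this request.
CLEAN_PERIOD_DONE_CAP = 7 * 24 * SECONDS_PER_HOUR


def check_time_cutoffs(hist_hours, clean_period, clean_period_done):
    """Return warnings about conflicting fairshare and cleanup cutoffs (sketch only)."""
    warnings = []
    hist_seconds = hist_hours * SECONDS_PER_HOUR
    effective_done = min(clean_period_done, CLEAN_PERIOD_DONE_CAP)
    if effective_done < clean_period_done:
        warnings.append(
            f"CLEAN_PERIOD_DONE lowered from {clean_period_done}s to {effective_done}s")
    # Jobs purged before HIST_HOURS elapses stop counting at the next group touch.
    for name, period in (("CLEAN_PERIOD", clean_period),
                         ("CLEAN_PERIOD_DONE", effective_done)):
        if hist_seconds > period:
            warnings.append(
                f"HIST_HOURS={hist_hours} ({hist_seconds}s) exceeds {name}={period}s; "
                "jobs are purged before their run time decays to 10%")
    return warnings


for w in check_time_cutoffs(hist_hours=504, clean_period=3628800, clean_period_done=1814400):
    print("WARNING:", w)
```

With HIST_HOURS set to 3 weeks and CLEAN_PERIOD_DONE requested at 3 weeks, this check reports two problems. First, the cap lowers CLEAN_PERIOD_DONE to 1 week. Second, that lowered value is shorter than HIST_HOURS.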