IBM Data and AI Ideas Portal for Customers




Status Not under consideration
Workspace Spectrum LSF
Created by Guest
Created on Dec 2, 2014

Cgroups CPU subsystem support

We want to ensure that jobs get CPU time in proportion to the number of slots requested, especially on nodes shared by multiple jobs. This has always been an important goal in HPC environments.

One way to achieve this is CPU affinity; however, affinity necessarily means CPU pinning (cpusets). Enforcing a basic 1 slot = 1 core affinity on every job would leave gaps whenever jobs with more restrictive affinity settings (same, cpubind, distribute, etc.) come into play. We do not want to enforce CPU pinning on all jobs unless the user explicitly requests it.

This can instead be achieved with the cgroups cpu subsystem:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html

In principle, it should be as simple as setting cpu.shares=slots_reserved_on_this_node for every job, because that value is relative to the total number of shares. This would also make it easy to manage oversubscribed nodes, where the number of jobs exceeds the number of cores, and would prevent the process starvation caused by greedy processes.

  • Guest | Sep 8, 2016

    Closed by the request of the client.

  • Guest | May 9, 2016

    Thanks for the detailed further explanation of the problem.

    The truth is we evaluated this option before implementing a default CPU binding (affinity). Affinity is, for obvious reasons, not compatible with cpu.shares, since each job is allocated an exclusive set of physical cores and jobs therefore do not compete with each other.

    Our idea was about the simplest case: an oversubscribing job could use more than the CPU resources requested/allocated while those resources are free or unused at that moment. However, as soon as another job is scheduled, processing capacity would be redistributed according to the cpu.shares=allocated_slots_on_this_node formula.

    This RFE can be closed since we are not pursuing this feature anymore. Thanks again.

  • Guest | Dec 15, 2014

    Prior to LSF supporting cgroups, a number of clients had implemented this themselves by adjusting cpu.shares in the pre- and post-exec scripts, also taking other factors into account, such as queue priority.

    During the LSF-cgroup implementation work we looked at enabling a mix of shares and affinity - which resulted in some very strange behaviour and interactions - some of these were clearly bugs in the cgroup subsystem.

    Likewise, if you take the case of a single-core CPU-intensive job and put a highly threaded interactive job on the other 7 cores, then the CPU-intensive job gets throttled.

    As per the first paragraph, it could be argued that jobs in higher-priority queues should get more cpu.shares at runtime, so cpu.shares=allocated_slots_on_this_node*queue_pri might be the right weighting; but then someone else would argue that individual job priority and SLAs should also be accounted for.

    It is something we may reconsider in the future, but it is not currently in plan.
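The weighting debate in the comment above can be made concrete with a small arithmetic sketch. The slot counts and queue priorities here are hypothetical, and the percentages apply only when both jobs are runnable and competing:

```shell
#!/bin/sh
# Hypothetical weighting: cpu.shares = slots * queue_priority.
# Under contention, each job's CPU fraction is its shares divided
# by the sum of shares of all runnable jobs on the node.
jobA_shares=$((2 * 50))    # 2 slots in a priority-50 queue  -> 100
jobB_shares=$((1 * 200))   # 1 slot in a priority-200 queue  -> 200
total=$((jobA_shares + jobB_shares))

echo "jobA gets $((100 * jobA_shares / total))% of the CPU"   # 33%
echo "jobB gets $((100 * jobB_shares / total))% of the CPU"   # 66%
```

This illustrates the objection: a low-priority job with many slots and a high-priority job with few slots can end up with whatever split the chosen weighting implies, and reasonable people disagree on which factors belong in the formula.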

  • Guest | Dec 3, 2014

    Creating a new RFE based on Community RFE #62791 in product Platform LSF.