I. Job Control
Jobs will be controlled through the batch system using Torque and Moab.

  1. Node sharing: none. Nodes are not shared between jobs.
  2. Allocations: handled through the regular CHPC allocation committee. Allocations on owner nodes are at the direction of the node owners.
  3. A best effort is made to allocate nodes of the same CPU speed to a job.
  4. The maximum time limit for jobs is as outlined in the QOS definitions below.
  5. Scheduling is based on the current highest priority calculated for every job, excluding freecycle jobs, which are scheduled in backfill (best-fit) mode.
  6. Fairshare: a boost in priority at the user level; a minimal boost intended to help users who have not been running recently. The fairshare window is two weeks.
  7. Expansion factor: a small boost in priority as queue time increases, based on the ratio of requested wall time to eligible queue time.
  8. Reward for parallelism, set at the global level.
  9. The maximum number of idle jobs in the queue per user is 5. This does not limit the number of jobs a user may submit; only the top 5 jobs are eligible to run and accrue queue-time priority. (A sketch of how these priority factors map to Moab scheduler parameters appears at the end of this page.)
  10. Standing Reservations (an illustrative Moab configuration sketch follows the table):

    Reservation        | Access            | Accounts        | Node/core count        | Node specification        | Reservation Acceptance Criteria
    -------------------|-------------------|-----------------|------------------------|---------------------------|--------------------------------
    General_67         | general           | <pi>            | 67/804                 | em075-em140 & em144       | qos=general, freecycle
    General_GPUdevel_2 | by request (GPUs) | general-gpu     | 2/24 + 2 GPUs per node | em001-em002               | Account=general-gpu, walltime 1 hour or less, M-F 8am-8pm
    General_6          | by request (GPUs) | general-gpu     | 6/72 + 2 GPUs per node | em001-em006               | Account=general-gpu
    Cheatham_6         | restricted        | cheatham-em-gpu | 6/72 + 2 GPUs per node | em007-em012               | Account=cheatham-em-gpu
    Kaplan_14          | restricted        | kaplan-em       | 14/168                 | em013-em014 & em059-em070 | Account=kaplan-em, oguest
    Bolton_12          | restricted        | bolton-em       | 12/144                 | em015-em018 & em051-em058 | Account=bolton-em, oguest
    Baron_4            | restricted        | baron-em        | 4/48                   | em019-em022               | Account=baron-em, oguest
    Yandell_10         | restricted        | yandell-em      | 10/120                 | em023-em032               | Account=yandell-em, oguest
    Zpu_4              | restricted        | zpu-em          | 4/48                   | em033-em036               | Account=zpu-em, oguest
    Voelk_14           | restricted        | voelk-em        | 14/168                 | em037-em050               | Account=voelk-em, oguest
    Avey_2             | restricted        | avey-em         | 2/24                   | em071-em072               | Account=avey-em, oguest
    Gregg_2            | restricted        | gregg-em        | 2/24                   | em073-em074               | Account=gregg-em, oguest
    Facelli_1          | restricted        | facelli-em      | 1/12                   | em395                     | Account=facelli-em, oguest
    Total              |                   |                 | 142/1704               |                           |

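    In Moab, standing reservations like those above are typically defined with SRCFG entries in moab.cfg. The fragment below is a minimal, illustrative sketch for two of the reservations, assuming Moab's standard SRCFG attributes (PERIOD, DAYS, STARTTIME/ENDTIME, HOSTLIST, ACCOUNTLIST, QOSLIST, TIMELIMIT) and host-range syntax; it is not the production configuration.

      # Illustrative moab.cfg fragment: two of the standing reservations above.

      # General_67: the general node pool, accessible to jobs running under the
      # general or freecycle QOS.
      SRCFG[General_67]          PERIOD=INFINITY
      SRCFG[General_67]          HOSTLIST=em[075-140],em144
      SRCFG[General_67]          QOSLIST=general,freecycle

      # General_GPUdevel_2: two GPU nodes for short development jobs (walltime of
      # 1 hour or less) charged to the general-gpu account, Monday-Friday 8am-8pm.
      SRCFG[General_GPUdevel_2]  PERIOD=DAY DAYS=Mon,Tue,Wed,Thu,Fri
      SRCFG[General_GPUdevel_2]  STARTTIME=8:00:00 ENDTIME=20:00:00
      SRCFG[General_GPUdevel_2]  HOSTLIST=em001,em002
      SRCFG[General_GPUdevel_2]  ACCOUNTLIST=general-gpu
      SRCFG[General_GPUdevel_2]  TIMELIMIT=1:00:00
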
  11. Job Priorities

The majority of a job's priority will be set based on its quality of service (QOS) definition. The following initial QOSs are to be defined; each entry below lists the QOS name, its associated reservation, its relative priority, and its restrictions (an illustrative Moab QOS configuration sketch follows the table):

general (reservation General_67; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 72 hours

bolton (reservation Bolton_12; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

kaplan (reservation Kaplan_14; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

zpu (reservation Zpu_4; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

gregg (reservation Gregg_2; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

avey (reservation Avey_2; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

yandell (reservation Yandell_10; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

baron (reservation Baron_4; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

voelk (reservation Voelk_14; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

facelli (reservation Facelli_1; priority ++)
  • Allocation required
  • Preemptor status
  • MAX runtime 14 days

long (reservation General_67; priority ++)
  • Allocation required
  • Special permission required
  • Limited to 2 nodes at any time
  • MAX runtime 14 days

cGPU (reservation Cheatham_6; priority ++)
  • Not under allocation control
  • Restricted to the cheatham group
  • Preemptor status
  • MAX runtime 14 days

gGPU (reservation General_6; priority ++)
  • Not under allocation control
  • Requires the user to request the general-gpu account (email issues to request access)
  • MAX runtime 24 hours

freecycle (preemptee, general reserved nodes only; reservation General_67; priority flat or zero)
  • Out-of-allocation status required
  • Preemptee status
  • Jobs are killed when a preemptor wants to run
  • MAX runtime 72 hours (same as general)
  • Runs in backfill only (not by priority)
  • Limited to 25% of the freecycle nodes (12 nodes) per user when there is competition

oguest (owner-guest, preemptee; reservations: all listed above except General_67 and the GPU nodes; priority flat or zero)
  • Requires the user to request the owner-guest account
  • All charges go against the owner-guest account
  • Preemptee status, but treated as allocated work (reservations respected)
  • Jobs are killed when a preemptor wants to run
  • MAX runtime 72 hours

oGPU (reservation Cheatham_6; priority flat or zero)
  • Preemptee status, but treated as allocated work (reservations respected)
  • Jobs are killed when a preemptor wants to run
  • MAX runtime 72 hours
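
These QOS rules are policy descriptions; in Moab they would typically be encoded as QOSCFG entries in moab.cfg. The fragment below is a minimal sketch of a few of them, assuming Moab's standard QOSCFG attributes (PRIORITY, QFLAGS, the per-job MAX.WCLIMIT limit, and the MAXNODE throttling limit) and the CANCEL preemption policy; the priority values are placeholders and the fragment is not the production configuration.

    # Illustrative moab.cfg fragment: a few of the QOS definitions above
    # (priority values are placeholders).
    PREEMPTPOLICY        CANCEL           # preemptee jobs are cancelled when a preemptor needs the nodes

    QOSCFG[general]      PRIORITY=1000 QFLAGS=PREEMPTOR MAX.WCLIMIT=72:00:00
    QOSCFG[long]         PRIORITY=1000 MAX.WCLIMIT=14:00:00:00 MAXNODE=2
    QOSCFG[gGPU]         PRIORITY=1000 MAX.WCLIMIT=24:00:00
    QOSCFG[cGPU]         PRIORITY=1000 QFLAGS=PREEMPTOR MAX.WCLIMIT=14:00:00:00
    QOSCFG[freecycle]    PRIORITY=0    QFLAGS=PREEMPTEE MAX.WCLIMIT=72:00:00
    QOSCFG[oguest]       PRIORITY=0    QFLAGS=PREEMPTEE MAX.WCLIMIT=72:00:00   # owner-guest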
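
The priority factors in the Job Control list (fairshare, expansion factor, reward for parallelism, backfill for freecycle, and the per-user idle-job limit) likewise map onto standard Moab scheduling parameters. The sketch below uses parameter names from the Moab documentation, but the weights and window lengths are illustrative values only, not the production settings.

    # Illustrative moab.cfg fragment: priority factors and per-user throttling
    # (weights and intervals are placeholders).
    BACKFILLPOLICY       BESTFIT          # freecycle work is scheduled in backfill (best fit), not by priority

    # Fairshare: minimal user-level boost over roughly a two-week window
    FSPOLICY             DEDICATEDPS
    FSINTERVAL           24:00:00         # one-day fairshare windows...
    FSDEPTH              14               # ...fourteen of them, i.e. two weeks
    FSWEIGHT             1
    FSUSERWEIGHT         10

    # Expansion factor: small boost as eligible queue time grows relative to requested wall time
    XFACTORWEIGHT        10

    # Reward for parallelism, applied at the global level
    RESWEIGHT            1
    PROCWEIGHT           5

    # At most 5 idle (eligible) jobs per user accrue queue-time priority
    USERCFG[DEFAULT]     MAXIJOB=5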