Center for HPC Cluster Resource Management and Scheduling

Maui Trace File Format, version 310


    Maui supports two trace formats: one for workload (jobs) and one for resources (nodes).

16.3  Workload Traces

    Workload traces fully describe all scheduling-relevant aspects of batch jobs, including the resources requested and utilized, the times of all major scheduling events (e.g., submission time, start time), the job credentials used, and the job execution environment.  Each job trace is a single line consisting of 44 whitespace-delimited fields, as shown in the table below.
 
Field Name Field Index Data Format Default Value Details
JobID 1 <STRING> [NO DEFAULT] Name of job, must be unique
Nodes Requested 2 <INTEGER> 0 Number of nodes requested (0 = no node request count specified)
Tasks Requested 3 <INTEGER> 1 Number of tasks requested
User Name 4 <STRING> [NO DEFAULT] Name of user submitting job
Group Name 5 <STRING> [NO DEFAULT] Primary group of user submitting job
Wallclock Limit 6 <INTEGER> 1 Maximum allowed job duration in seconds
Job Completion State 7 <STRING> Completed One of Completed, Removed, NotRun
Required Class 8 <STRING> [DEFAULT:1] Class/queue required by job, specified as a square bracket list of <QUEUE>[:<QUEUE INSTANCE>] requirements (e.g., [batch:1])
Submission Time 9 <INTEGER> 0 Epoch time when job was submitted
Dispatch Time 10 <INTEGER> 0 Epoch time when scheduler requested job begin executing
Start Time 11 <INTEGER> 0 Epoch time when job began executing  (NOTE:  usually identical to 'Dispatch Time')
Completion Time 12 <INTEGER> 0 Epoch time when job completed execution
Required Network Adapter 13 <STRING> [NONE] Name of required network adapter if specified
Required Node Architecture 14 <STRING> [NONE] Required node architecture if specified
Required Node Operating System 15 <STRING> [NONE] Required node operating system if specified
Required Node Memory Comparison 16 one of >, >=, =, <=, < >= Comparison for determining compliance with required node memory
Required Node Memory 17 <INTEGER> 0 Amount of required configured RAM (in MB) on each node
Required Node Disk Comparison 18 one of >, >=, =, <=, < >= Comparison for determining compliance with required node disk
Required Node Disk 19 <INTEGER> 0 Amount of required configured local disk (in MB) on each node 
Required Node Attributes 20 <STRING> [NONE] Square bracket enclosed list of node features required by job if specified (e.g., '[fast][ethernet]')
System Queue Time 21 <INTEGER> 0 Epoch time when job met all fairness policies
Tasks Allocated 22 <INTEGER> <TASKS REQUESTED> Number of tasks actually allocated to job  (NOTE:  in most cases, this field is identical to field #3, Tasks Requested)
Required Tasks Per Node 23 <INTEGER> -1 Number of Tasks Per Node required by job or '-1' if no requirement specified
QOS 24 <STRING>[:<STRING>] [NONE] QOS requested/delivered using the format <QOS_REQUESTED>[:<QOS_DELIVERED>]  (e.g., 'hipriority:bottomfeeder')
JobFlags 25 <STRING>[:<STRING>]... [NONE] Square bracket delimited list of job attributes (e.g., [BACKFILL][BENCHMARK][PREEMPTEE])
Account Name 26 <STRING> [NONE] Name of account associated with job if specified
Executable 27 <STRING> [NONE] Name of job executable if specified
Comment 28 <STRING> [NONE] Resource manager specific list of job attributes if specified.  See the Resource Manager Extension Overview for more info.
Bypass Count 29 <INTEGER> -1 Number of times job was bypassed by lower priority jobs via backfill or '-1' if not specified
ProcSeconds Utilized 30 <DOUBLE> 0 Number of processor seconds actually utilized by job
Partition Name 31 <STRING> [DEFAULT] Name of partition in which job ran
Dedicated Processors per Task 32 <INTEGER> 1 Number of processors required per task
Dedicated Memory per Task 33 <INTEGER> 0 Amount of RAM (in MB) required per task
Dedicated Disk per Task 34 <INTEGER> 0 Amount of local disk (in MB) required per task
Dedicated Swap per Task 35 <INTEGER> 0 Amount of virtual memory (in MB) required per task
Start Date 36 <INTEGER> 0 Epoch time indicating earliest time job can start
End Date 37 <INTEGER> 0 Epoch time indicating latest time by which job must complete 
Allocated Host List 38 <STRING>[:<STRING>]... [NONE] Colon delimited list of hosts allocated to job (e.g., node001:node004)
Resource Manager Name 39 <STRING> [NONE] Name of resource manager if specified
Required Host Mask 40 <STRING>[<STRING>]... [NONE] List of hosts required by job.  (If taskcount > #hosts, scheduler must use these nodes in addition to others; if taskcount < #hosts, scheduler must select needed hosts from this list.)
Reservation 41 <STRING> [NONE] Name of reservation required by job if specified
Set Description 42 <STRING>:<STRING>[:<STRING>] [NONE] Set constraints required by node in the form <SetConstraint>:<SetType>[:<SetList>] where SetConstraint is one of ONEOF, FIRSTOF, or ANYOF, SetType is one of PROCSPEED, FEATURE, or NETWORK, and SetList is an optional colon delimited list of allowed set attributes (e.g., 'ONEOF:PROCSPEED:350:450:500')
Application Simulator Data 43 <STRING>[:<STRING>] [NONE] Name of application simulator module and associated configuration data (e.g., 'HSM:IN=infile.txt:140000;OUT=outfile.txt:500000')
RESERVED FIELD 1 44 <STRING> [NONE] RESERVED FOR FUTURE USE
NOTE:  if no applicable value is specified, the exact string '[NONE]' should be entered.
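
Several fields in the table above hold list values: square bracket lists (Required Node Attributes, JobFlags, Required Class) and colon delimited lists (Allocated Host List), with the literal '[NONE]' standing in for an empty value. The following Python sketch shows one way a trace reader might decode such fields. The helper names (parse_bracket_list, parse_colon_list, parse_class_list) are hypothetical and are not part of Maui; treating '[NONE]' as an empty list is an assumption made for convenience.

```python
import re

# Placeholder the trace format uses when no applicable value is specified.
NONE_TOKEN = "[NONE]"


def parse_bracket_list(value):
    """Split a field such as '[fast][ethernet]' into ['fast', 'ethernet'].

    The literal '[NONE]' is treated as an empty list (an assumption of
    this sketch, not something the format mandates).
    """
    if value == NONE_TOKEN:
        return []
    return re.findall(r"\[([^\[\]]*)\]", value)


def parse_colon_list(value):
    """Split a field such as 'node001:node004' into ['node001', 'node004']."""
    if value == NONE_TOKEN:
        return []
    return value.split(":")


def parse_class_list(value):
    """Split a class field such as '[batch:1]' into [('batch', 1)].

    The part after the colon is optional in the job trace format, so it
    is returned as None when absent.
    """
    classes = []
    for item in parse_bracket_list(value):
        name, _, count = item.partition(":")
        classes.append((name, int(count) if count else None))
    return classes


print(parse_bracket_list("[fast][ethernet]"))
print(parse_colon_list("node001:node004"))
print(parse_class_list("[batch:1]"))
```

Keeping the '[NONE]' check in one place means the rest of a reader can work with ordinary Python lists and never compare against the placeholder string.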
 

Sample Workload Trace:

    'SP02.2343.0 20  20  570  519  86400  Removed  [batch:1]  887343658  889585185  889585185  889585411  ethernet  R6000  AIX43  >=  256  >=  0  [NONE] 889584538  20  0 0  2  0  test.cmd 1001  6  678.08  0  1  0  0  0  0  0  [NONE]  0  [NONE]  [NONE]  [NONE]  [NONE]  [NONE]'
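
As an illustration of how the 44 fields line up, the following Python sketch splits the sample trace above on whitespace, attaches a name to each field index, and derives the queue wait and run time from the epoch timestamps. The dictionary keys and the function name parse_workload_trace are hypothetical labels chosen for this example, not identifiers defined by Maui, and the sketch assumes every integer field actually holds an integer.

```python
# Hypothetical short names for field indexes 1..44, in table order.
WORKLOAD_FIELDS = [
    "job_id", "nodes_requested", "tasks_requested", "user", "group",
    "wallclock_limit", "completion_state", "required_class",
    "submission_time", "dispatch_time", "start_time", "completion_time",
    "network_adapter", "node_arch", "node_os", "node_mem_cmp", "node_mem",
    "node_disk_cmp", "node_disk", "node_attributes", "system_queue_time",
    "tasks_allocated", "tasks_per_node", "qos", "job_flags", "account",
    "executable", "comment", "bypass_count", "proc_seconds_utilized",
    "partition", "procs_per_task", "mem_per_task", "disk_per_task",
    "swap_per_task", "start_date", "end_date", "allocated_hosts",
    "resource_manager", "required_host_mask", "reservation",
    "set_description", "app_simulator_data", "reserved_1",
]

# Fields the table documents as <INTEGER>.
INT_FIELDS = {
    "nodes_requested", "tasks_requested", "wallclock_limit",
    "submission_time", "dispatch_time", "start_time", "completion_time",
    "node_mem", "node_disk", "system_queue_time", "tasks_allocated",
    "tasks_per_node", "bypass_count", "procs_per_task", "mem_per_task",
    "disk_per_task", "swap_per_task", "start_date", "end_date",
}


def parse_workload_trace(line):
    """Turn one workload trace line into a dict keyed by field name."""
    tokens = line.split()  # any run of whitespace separates fields
    if len(tokens) != len(WORKLOAD_FIELDS):
        raise ValueError(
            f"expected {len(WORKLOAD_FIELDS)} fields, got {len(tokens)}")
    job = dict(zip(WORKLOAD_FIELDS, tokens))
    for name in INT_FIELDS:
        job[name] = int(job[name])
    job["proc_seconds_utilized"] = float(job["proc_seconds_utilized"])
    return job


# The sample trace from this section, without the surrounding quotes.
sample = (
    "SP02.2343.0 20 20 570 519 86400 Removed [batch:1] 887343658 "
    "889585185 889585185 889585411 ethernet R6000 AIX43 >= 256 >= 0 "
    "[NONE] 889584538 20 0 0 2 0 test.cmd 1001 6 678.08 0 1 0 0 0 0 0 "
    "[NONE] 0 [NONE] [NONE] [NONE] [NONE] [NONE]"
)

job = parse_workload_trace(sample)
queue_wait = job["start_time"] - job["submission_time"]  # seconds queued
run_time = job["completion_time"] - job["start_time"]    # seconds running
print(job["job_id"], job["completion_state"], queue_wait, run_time)
# prints: SP02.2343.0 Removed 2241527 226
```

Checking the field count before converting anything catches truncated or merged lines early, which is the most common way a whitespace-delimited trace goes wrong.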



16.2   Resource Traces

    Resource traces fully describe all scheduling-relevant aspects of a batch system's compute resources.  In most cases, each resource trace describes a single compute node, providing information about configured resources, node location, supported classes and queues, etc.  Each resource trace consists of a single line composed of 21 whitespace-delimited fields.  Each field is described in detail in the table below.
 
Field Name Field Index Data Format Default Value Details
Resource Type 1 one of COMPUTENODE COMPUTENODE currently the only legal value is 'COMPUTENODE'
Event Type 2 one of AVAILABLE, DEFINED, or DRAINED [NONE] When AVAILABLE, DEFINED, or DRAINED is specified, the node will start in the state Idle, Down, or Drained, respectively.  NOTE:  node state can be modified using the nodectl command.
Event Time 3 <EPOCHTIME> 1 time event occurred.  (currently ignored)
Resource ID 4 <STRING> N/A for 'COMPUTENODE' resources, this should be the name of the node.
Resource Manager Name 5 <STRING> [NONE] name of resource manager resource is associated with
Configured Swap 6 <INTEGER> 1 amount of virtual memory (in MB) configured on node
Configured Memory 7 <INTEGER> 1 amount of real memory (in MB) configured on node (i.e. RAM)
Configured Disk 8 <INTEGER> 1 amount of local disk (in MB) on node available to batch jobs
Configured Processors 9 <INTEGER> 1 number of processors configured on node
Resource Frame Location 10 <INTEGER> 1 number of frame containing node (SP2 only)
Resource Slot Location 11 <INTEGER> 1 Number of first frame slot used by node (SP2 only)
Resource Slot Use Count 12 <INTEGER> 1 Number of frame slots used by node (SP2 only)
Node Operating System 13 <STRING> [NONE] node operating system
Node Architecture 14 <STRING> [NONE] node architecture
Configured Node Features 15 <STRING> [NONE] square bracket delimited list of node features/attributes (e.g., '[amd][s1200]')
Configured Run Classes 16 <STRING> [batch:1] square bracket delimited list of CLASSNAME:CLASSCOUNT pairs.
Configured Network Adapters 17 <STRING> [NONE] square bracket delimited list of configured network adapters (e.g., '[atm][fddi][ethernet]')
Relative Resource Speed 18 <DOUBLE> 1.0 relative machine speed value
RESERVED FIELD 1 19 <STRING> [NONE] [NONE]
RESERVED FIELD 2 20 <STRING> [NONE] [NONE]
RESERVED FIELD 3 21 <STRING> [NONE] [NONE]
NOTE:  if no applicable value is specified, the exact string '[NONE]' should be entered.
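
When generating resource traces for a simulation, the default values in the table above can be used to fill any field that is not explicitly set. The following Python sketch builds a 21-field line that way. The function name format_resource_trace and the keyword names are hypothetical, chosen only for this example; the defaults are copied from the table, and Resource ID is treated as required because the table lists no usable default for it.

```python
# (hypothetical keyword name, default from the table) for fields 1..21.
RESOURCE_DEFAULTS = [
    ("resource_type", "COMPUTENODE"),
    ("event_type", "[NONE]"),
    ("event_time", 1),
    ("resource_id", None),  # N/A in the table: must be supplied
    ("resource_manager", "[NONE]"),
    ("swap_mb", 1),
    ("memory_mb", 1),
    ("disk_mb", 1),
    ("processors", 1),
    ("frame", 1),
    ("slot", 1),
    ("slot_count", 1),
    ("os", "[NONE]"),
    ("arch", "[NONE]"),
    ("features", "[NONE]"),
    ("classes", "[batch:1]"),
    ("network_adapters", "[NONE]"),
    ("speed", 1.0),
    ("reserved_1", "[NONE]"),
    ("reserved_2", "[NONE]"),
    ("reserved_3", "[NONE]"),
]


def format_resource_trace(**values):
    """Build one resource trace line, using table defaults for unset fields."""
    known = {name for name, _ in RESOURCE_DEFAULTS}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    fields = []
    for name, default in RESOURCE_DEFAULTS:
        value = values.get(name, default)
        if value is None:
            raise ValueError(f"{name} is required")
        text = str(value)
        # Fields are whitespace delimited, so a value may not be empty
        # or contain whitespace itself.
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"invalid value for {name}: {text!r}")
        fields.append(text)
    return " ".join(fields)


line = format_resource_trace(
    resource_id="node001", event_type="AVAILABLE", processors=2)
print(line)
```

Rejecting empty values and embedded whitespace at write time keeps every generated line at exactly 21 fields, so a reader that splits on whitespace sees the same columns the writer intended.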

Sample Resource Trace:

    'COMPUTENODE AVAILABLE 0 cluster008 PBS1 423132 256 7140 2 -1 -1 1 LINUX62 AthlonK7 [s950][compute]  [batch:2]  [ethernet][atm] 1.67  [NONE]  [NONE]  [NONE]'
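
The sample resource trace above can be decoded in a few lines. The following Python sketch splits it into the 21 documented fields, converts the numeric ones, expands the square bracket lists, and maps the event type to the initial node state described in the table (AVAILABLE to Idle, DEFINED to Down, DRAINED to Drained). The field keys and the function name parse_resource_trace are hypothetical names for this example and are not defined by Maui.

```python
import re

# Hypothetical short names for field indexes 1..21, in table order.
RESOURCE_FIELDS = [
    "resource_type", "event_type", "event_time", "resource_id",
    "resource_manager", "swap_mb", "memory_mb", "disk_mb", "processors",
    "frame", "slot", "slot_count", "os", "arch", "features", "classes",
    "network_adapters", "speed", "reserved_1", "reserved_2", "reserved_3",
]

INT_FIELDS = ("event_time", "swap_mb", "memory_mb", "disk_mb",
              "processors", "frame", "slot", "slot_count")
LIST_FIELDS = ("features", "classes", "network_adapters")

# Initial node state implied by the event type, per the table.
INITIAL_STATE = {"AVAILABLE": "Idle", "DEFINED": "Down", "DRAINED": "Drained"}


def parse_resource_trace(line):
    """Turn one resource trace line into a dict keyed by field name."""
    tokens = line.split()
    if len(tokens) != len(RESOURCE_FIELDS):
        raise ValueError(
            f"expected {len(RESOURCE_FIELDS)} fields, got {len(tokens)}")
    node = dict(zip(RESOURCE_FIELDS, tokens))
    for name in INT_FIELDS:
        node[name] = int(node[name])
    node["speed"] = float(node["speed"])
    for name in LIST_FIELDS:
        # '[a][b]' -> ['a', 'b']; '[NONE]' -> [] (assumption of this sketch)
        raw = node[name]
        node[name] = [] if raw == "[NONE]" else re.findall(r"\[([^\[\]]*)\]", raw)
    # None when the event type is '[NONE]' or otherwise unrecognized.
    node["initial_state"] = INITIAL_STATE.get(node["event_type"])
    return node


# The sample trace from this section, without the surrounding quotes.
sample = (
    "COMPUTENODE AVAILABLE 0 cluster008 PBS1 423132 256 7140 2 -1 -1 1 "
    "LINUX62 AthlonK7 [s950][compute] [batch:2] [ethernet][atm] 1.67 "
    "[NONE] [NONE] [NONE]"
)

node = parse_resource_trace(sample)
print(node["resource_id"], node["initial_state"],
      node["processors"], node["memory_mb"], node["features"])
# prints: cluster008 Idle 2 256 ['s950', 'compute']
```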