Job Management Systems for High Performance Compute Clusters
Today’s typical high performance computing (HPC) systems are clusters of high-performance servers, tied together by a high-speed network fabric (e.g. InfiniBand) and a software stack that enables convenient and efficient management and use of the cluster.
A core component of the system is the job scheduler. It lets users define computation jobs that run unattended on the system as background batch jobs, and it lets system administrators manage, plan, and schedule these jobs for optimal use of the system.
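As a rough illustration of what a job scheduler does, the sketch below queues batch jobs and starts each one once enough cores are free. It is a toy model only and does not reflect how PBS Professional or Microsoft HPC Pack work internally; the names `Job`, `Cluster`, and `schedule`, as well as the job data, are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    cores: int  # number of cores the job requests


@dataclass
class Cluster:
    total_cores: int
    running: list = field(default_factory=list)

    @property
    def free_cores(self) -> int:
        # Cores not currently claimed by a running job
        return self.total_cores - sum(job.cores for job in self.running)


def schedule(cluster: Cluster, queue: deque) -> list:
    """Start queued jobs in submission order while the head job fits.

    Strict first-come-first-served: stop at the first job that does not
    fit, so later jobs never overtake it.
    """
    started = []
    while queue and queue[0].cores <= cluster.free_cores:
        job = queue.popleft()
        cluster.running.append(job)
        started.append(job.name)
    return started


cluster = Cluster(total_cores=16)
queue = deque([Job("cfd", 8), Job("fem", 6), Job("post", 4), Job("small", 1)])

# First pass: start as many jobs as fit, in order
first_pass = schedule(cluster, queue)

# A running job finishes and releases its cores, then the scheduler runs again
cluster.running = [job for job in cluster.running if job.name != "cfd"]
second_pass = schedule(cluster, queue)
```

Production schedulers add policies on top of this loop, such as priorities, multiple queues, and preemption, but the basic cycle of matching queued requests against free resources is the same.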
For Windows HPC clusters, Microsoft offers the HPC Pack to create, manage, and run HPC applications. The latest version of the HPC Pack is Microsoft HPC Pack 2012 R2 Update 1. The Microsoft HPC Pack supports only Microsoft Windows-based systems, while other job management systems are available for multiple platforms, including Windows and Linux.
Altair’s PBS Professional is the trusted leader in high-performance computing (HPC) workload management. Proven for over 20 years at thousands of global sites, PBS Professional efficiently schedules HPC workloads across all forms of computing infrastructure. Easily scaling to support hundreds of thousands of processors – from clusters to the largest HPC systems – PBS Professional ensures you receive the maximum value from your computing investments.
PBS Works by Altair
The PBS Works suite of HPC workload management and cloud enablement products allows HPC users and enterprises of any size to maximize ROI on existing hardware and software resources.
PBS Works includes:
- PBS Professional®, Altair’s market-leading HPC workload manager and job scheduler
- Compute Manager, a web-based portal for simplified job submission and management
- Display Manager, a web-based portal for remote visualization of data and applications
- PBS Analytics, an easy-to-use job accounting and reporting product
PBS Works is a complete package for users of HPC clusters and it includes:
- A modern web interface
- Remote desktop capability
- Strict user access controls
- Many new and improved features in version 13, including power management and provisioning
PBS Works – an Award Winning Solution
Named “#1 HPC Software” by HPCwire readers, Altair’s PBS Works workload management suite is used by thousands of organizations worldwide to simplify the administration and use of HPC clusters, clouds, and supercomputing environments.
PBS Works was awarded:
“Best HPC Software or Technology” 2014 Annual HPCwire Readers’ Choice Awards
Job Management with PBS Professional
PBS Professional delivers key benefits to users in a wide range of industries by providing fast, powerful scheduling and simplified workload management.
- Increase ROI on hardware and software investments by driving up application performance, reducing support costs, and keeping system utilization in the 90-99% range
- Simplify management of HPC systems with fast, easy job submission, job arrays, user-friendly GUIs, plus integration with the leading suite of portal-based management products
- Save time with faster job execution, automated tasks and simplified management
- Improve job completion rates with smart scheduling to prevent job failures, and ensure workload requirements are met
- Maximize application performance with higher scalability and throughput, optimized and topology-aware job placement, and increased system availability and uptime
- Meet complex requirements via intelligent workload management, policy-based scheduling and a flexible, easy-to-use plugin framework for customizations
- Minimize risk with the industry leader in HPC workload management security (offering EAL3+ certification and Red Hat SELinux cross-domain security support)
- Ensure solution longevity with global support in 20 countries from Altair’s experts, the leaders in HPC workload management client satisfaction
- Support of Windows and Linux platforms
- More job ordering and job priority options
- More queuing options
- Time slot allocation and job classes
- Professional support provided
- Ease of use
- Efficient and adaptable job scheduling on large clusters, supercomputers and cloud based systems
- Full technical support
What’s new in version 13?
PBS Professional version 13 is architected for Exascale with the following key features:
- Million-core scalability – tested to 50,000+ nodes
- Fast, reliable startup of huge MPI jobs – tested at tens of thousands of MPI ranks; minimizes delays caused by faulty nodes
- Fast throughput – supports 1,000,000+ jobs per day
- Cgroups support** eliminates resource contention so jobs run faster and don’t interfere with each other or the OS
- Comprehensive health check framework monitors your health check script behavior – either checks run or node is marked down
- Expanded hook events for even more plugin extensibility and customization
- Expanded scheduling priority formula with full math functions (e.g., sqrt(), ceil(), …), conditional expressions, and a threshold for job start eligibility
- Fine-grained targeting for preemption, configurable at the queue level (admin controlled)
- General fairshare formula enables accruals by queue, license use, time of day, power use, even combinations of these
- Expanded Windows support: Intel MPI and MPICH2; UNC paths for stdin, stdout, and file staging
- Support for SLES 12 and RHEL 7
- Custom resources can be created directly using qmgr, without the need to restart the server
- Long job and reservation names supported
** Limited availability, ask Altair for more info.
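To make the expanded scheduling priority formula from the version 13 feature list more concrete, the following Python sketch evaluates a made-up formula that combines math functions, a conditional expression, and a start-eligibility threshold. It is not PBS Professional formula syntax; the function `job_priority`, its terms and weights, and the `START_THRESHOLD` value are all assumptions chosen for illustration.

```python
import math


def job_priority(queue_priority: float, eligible_hours: float, ncpus: int) -> float:
    """Hypothetical priority formula; terms and weights are illustrative only."""
    # Conditional expression: wide jobs receive a fixed bonus
    wide_bonus = 10.0 if ncpus >= 64 else 0.0
    # sqrt() dampens the effect of long waits; ceil() counts 16-core units
    return (
        queue_priority
        + 5.0 * math.sqrt(eligible_hours)
        + math.ceil(ncpus / 16)
        + wide_bonus
    )


# Assumed cutoff: jobs scoring below it are not yet eligible to start
START_THRESHOLD = 16.5

jobs = {
    "a": job_priority(queue_priority=10, eligible_hours=4.0, ncpus=128),
    "b": job_priority(queue_priority=10, eligible_hours=1.0, ncpus=8),
    "c": job_priority(queue_priority=0, eligible_hours=9.0, ncpus=32),
}

# Keep only jobs at or above the threshold, highest priority first
eligible = sorted(
    (name for name, score in jobs.items() if score >= START_THRESHOLD),
    key=lambda name: jobs[name],
    reverse=True,
)
```

With these sample values, job `b` falls below the threshold and stays queued, while `a` is ordered ahead of `c`.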
PBS Professional vs. Microsoft HPC Pack
On the Microsoft Windows platform, PBS Professional offers many features that are not supported by Microsoft HPC Pack.
Some of the important advantages of PBS Professional are:
A comprehensive comparison between PBS Professional and MS HPC Pack is provided below.
|Features||Sub-Features||MS HPC||PBS Pro|
|The comprehensive table is available as a downloadable PDF.|
|HIGH AVAILABILITY/FAILOVER SUPPORT||Supported?|
|Built-in support?||Supported through Windows HA functionality|
|HPC Server 2008 R2||Supported only on the Datacenter and Enterprise editions, which are expensive||Supported with all editions|
|HPC Server 2012||Supported with all editions||Supported with all editions|
|EXTREME SCALABILITY/PERFORMANCE||Scales to hundreds of thousands of cores||No numbers published|
|NETWORK TOPOLOGY SUPPORT||Supported?|
|Flexible?||No. Only 5 fixed configurations are supported.|
|Priority based FCFS|
|Round Robin Queue order|
|By Queue order|
|Job Sort Formula|
|Multiple Queues and Queue Priorities|
|Sorting jobs on one or more keys e.g. resource|
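Several of the job ordering policies named in the table can be illustrated with a short Python sketch. It shows one plausible reading of priority-based FCFS, sorting on multiple keys, and round-robin queue order; the job tuples and queue names are invented, and the code does not reproduce how either product implements these policies.

```python
from itertools import zip_longest

# (name, queue, priority, submit_order, ncpus) -- illustrative data only
jobs = [
    ("j1", "short", 5, 1, 4),
    ("j2", "short", 9, 2, 32),
    ("j3", "long", 9, 3, 8),
    ("j4", "short", 5, 4, 16),
    ("j5", "long", 5, 5, 2),
]

# Priority-based FCFS: highest priority first, ties broken by submission order
priority_fcfs = [j[0] for j in sorted(jobs, key=lambda j: (-j[2], j[3]))]

# Sorting on one or more keys: here fewest CPUs first, then submission order
by_ncpus = [j[0] for j in sorted(jobs, key=lambda j: (j[4], j[3]))]

# Round-robin queue order: take one job from each queue in turn
queues = {}
for job in jobs:
    queues.setdefault(job[1], []).append(job[0])
round_robin = [
    name
    for group in zip_longest(*queues.values())
    for name in group
    if name is not None
]
```

The same five jobs come out in three different orders, which is why the choice of ordering policy matters when tuning a cluster for throughput or fairness.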
Summary – PBS Professional
PBS Professional is an excellent solution as a job management system for Windows and Linux platforms.
PBS Professional provides:
PBS Professional can be installed on existing Windows and Linux HPC clusters and replace legacy job management systems with a modern, fully supported system.