RENCI at UNC-Chapel Hill

 LSF (Load Sharing Facility)


Overview

We use LSF (Load Sharing Facility) from Platform Computing, Inc. for job management. It helps us balance the workload on our central computational servers while giving you access to the software and hardware you need to get your work done regardless of where you are logged in.

LSF does load sharing within a cluster, or group of hosts. The hosts in the three LSF clusters include large SMP servers such as Cypress, which can run resource-intensive applications like Gaussian, as well as sets of smaller hosts, such as many of those in the Emerald cluster.

Other hosts in the cluster are clients, such as the Isis login nodes or even personal workstations. Clients do not execute LSF jobs, but they provide all of the LSF commands you need to submit jobs and monitor their progress remotely. These client commands are available on the servers as well. To verify that the LSF client commands are available on your system, run the lsid command. If lsid works, then the other LSF commands (e.g., bsub, bjobs) should work as well, and the LSF man pages should be available.
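The lsid check described above can be sketched as a small shell function. This is only an illustration: the function name check_lsf and the messages it prints are made up for this example and are not part of LSF.

```shell
#!/bin/sh
# Report whether the LSF client commands appear usable on this host.
# check_lsf is a hypothetical helper name, not an LSF command.
check_lsf() {
    # lsid must be on the PATH and must run successfully.
    if command -v lsid >/dev/null 2>&1 && lsid >/dev/null 2>&1; then
        echo "LSF client commands are available"
    else
        echo "LSF client commands are not available"
        return 1
    fi
}

check_lsf || true
```

If the function reports that the commands are not available, the host is probably not configured as an LSF client or server.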

Jobs are programs or commands you [ http://help.unc.edu/?id=4488 ] submit to a [ http://help.unc.edu/?id=4485 ] queue for scheduling and execution in an LSF [ http://help.unc.edu/?id=4487 ] cluster. You can [ http://help.unc.edu/?id=4489 ] monitor your jobs while they are in the queue. All LSF jobs run in queues, even interactive programs. A queue is associated with one or more servers, and has various limits defined, such as the number of jobs that can run at the same time. The bqueues command lists all of the queues currently defined for your cluster.
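As a rough sketch of the submit-and-monitor cycle described above, the following script lists the queues, submits a job, and checks its status. The queue name week and the program ./my_program are placeholders chosen for illustration, so substitute a queue reported by bqueues on your cluster. The run helper is also hypothetical: it only prints each command when LSF is not installed, so the script can be tried safely on any machine.

```shell
#!/bin/sh
# Run a command if LSF is installed; otherwise just print it (dry run).
# run is a hypothetical helper for this example, not an LSF command.
run() {
    if command -v lsid >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

QUEUE=${QUEUE:-week}        # placeholder queue name; pick one from bqueues
JOB=${JOB:-./my_program}    # placeholder program to run

run bqueues                              # list the queues defined for the cluster
run bsub -q "$QUEUE" -o out.%J "$JOB"    # submit the job; %J expands to the job ID
run bjobs                                # show the status of your jobs
```

The -o option writes the job output to a file rather than sending it by email, which is the subject of one of the FAQs listed further down this page.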

Each server also has resources associated with it, such as the amount of memory, CPU type and speed, or type of operating system. An example would be a Sun Fire 15K with 24 CPUs at 1.05 GHz and 48 GB of memory running Solaris 9. Another type of resource might be a specific software application such as SAS or Stata. The lshosts command displays the characteristics of the hosts, including which resources each host has. To see the characteristics of a single host, use lshosts [host_name]. For example, on Emerald: lshosts bc13-n12

You can give LSF a particular set of resource requirements and let it find the best server on which to run your job. If more than one server meets your criteria, it will run your job on the server that has the lightest load. Use the lshosts command to see what resources are available in the cluster.
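To make the idea of resource requirements concrete, here is a minimal sketch that asks LSF for a host with a given amount of available memory, using the -R option of bsub. The threshold of 2000 (the mem index is typically reported in MB), the helper name mem_req, and the program ./my_program are all assumptions for illustration. Check the lshosts output and the bsub man page on your cluster for the resource names it actually defines.

```shell
#!/bin/sh
# Build an LSF resource requirement string asking for more than $1 units
# of available memory. mem_req is a hypothetical helper name.
mem_req() {
    printf 'select[mem>%s]' "$1"
}

REQ=$(mem_req 2000)         # illustrative threshold
JOB=${JOB:-./my_program}    # placeholder program to run

if command -v bsub >/dev/null 2>&1; then
    lshosts                  # review the resources each host offers
    bsub -R "$REQ" "$JOB"    # LSF picks a matching host for the job
else
    # LSF is not installed here, so only show the command.
    echo "would run: bsub -R \"$REQ\" $JOB"
fi
```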

On the conifers cluster (cedar/cypress) there is no AFS service at all. On the coral cluster (Emerald), AFS files are available to the hosts and the Emerald login nodes, but LSF does not automatically renew tokens. The clusters differ in this respect, so be aware of which one you are using.

Learn more

Click on the following links to learn more about LSF.

FAQs

Please read the [ http://help.unc.edu/?id=6273 ] LSF Common FAQs, which answer the following questions and are useful for users new to this service.

  • What is fairshare scheduling?

  • How do I tell LSF to put my job output in a file instead of sending it to me by email?

  • How many LSF jobs can I run at the same time?

  • How can I access a larger temp directory space for jobs submitted with LSF?

Additional help

[ http://its.unc.edu/research-computing.html ] Research Computing home page

RENCI @ UNC-Chapel Hill | ITS Manning | Chapel Hill, North Carolina 27117