Project Set-up

Project will be completed on UW’s Klone server

  • Login: ssh xx@klone.hyak.uw.edu with username replacing xx, provide password, and confirm DUO push notification

Roberts lab information on Klone node: https://robertslab.github.io/resources/klone_Data-Storage-and-System-Organization/

Server structure:

[xx@klone-login03 ~]$ cd ../../
[xx@klone-login03 mmfs1]$ ls
admin  apsearch  data  encrypted  gscratch  home  slurmdata  ssg  sw

## path to our working space on srlab: /mmfs1/gscratch/srlab/<shelly-UW_NetID>

Storage:

  • Login node (/mmfs1/home/<UW_NetID>): 10 GB
  • Roberts lab has 1 TB in srlab folder (/gscratch/srlab/)
  • Temporary storage (“Scrubbed”): 200 TB in user folder (/gscratch/scrubbed/<UW_NetID>)

To see space and file utilization: hyakstorage

Data management resources

Steven Roberts lab handbook: https://robertslab.github.io/resources/Data-Management/
UW’s system: https://hyak.uw.edu/systems
UW’s how to run job: https://hyak.uw.edu/docs/hyak101/basics/jobs

Tips and tricks

squeue -A srlab: check what jobs are being run on srlab
squeue -A srlab -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %R %c %m": check jobs and show CPUs and memory used by each job
hyakalloc -g srlab: check how many resources are in use and free for use
hyakalloc -p ckpt: check what resources are currently available on all of hyak
squeue | grep <username>: check what jobs you are running scontrol write batch_script <job id>: create a .sh file in the current directory showing the job that is currently running. This is helpful to see what step the pipeline is on and which sample it is processing. You can also see which work directory subfolder the job is being run in. These subfolders contain the file .command.run which are equivalen to the file created by the scontrol command. The file .command.sh is a bash script with the pipeline step that gets called within the .command.run slurm script. The file .command.log shows the screen output from job, for instance with the bismark align command you can see how many sequences are being processed.

Hyak info

https://robertslab.github.io/resources/klone_Running-a-Job/ Total GB: 450
Total cpu: 32
Partition to use: -p cpu-g2-mem2x

Shelly added config file for nextflow /gscratch/srlab/strigg/bin/uw_hyak_srlab.config that will use other nodes if nobody is using them. Below is an excerpt from the .config file:

process {
    executor = 'slurm'
    queue = { task.attempt == 1 ? 'ckpt' : 'cpu-g2-mem2x' }
    maxRetries = 1
    clusterOptions = { "-A srlab" }
    scratch = '/gscratch/scrubbed/srlab/'
    resourceLimits = [
                cpus: 16,
                memory: '150.GB',
                time: '72.h'
        ]
}

the queue variable first attempts to use the ckpt partition. If available the task will run and have an R when you squeue | grep <username>, otherwise it will have PD. Once the task is launched it has ~ 5 hours to run because the ckpt partition has a time limitation for jobs. If the job doesn’t complete in time, it will be reported as FAILED in the pipeline_trace.txt file that gets created in the outdir, and then the job will get submitted again through a new slurm script to be run on the cpu-g2-mem2x partition.This code block also includues resourceLimits set so that no job (slurm script for an individual sample) will run longer than 72 hours.