This is a description of the nf-core/denovotranscript pipeline for de novo transcriptome assembly of paired-end short reads from bulk RNA-seq.

  1. Read QC of raw reads (FastQC)
  2. Adapter and quality trimming (fastp)
  3. Read QC of trimmed reads (FastQC)
  4. Remove rRNA or mitochondrial DNA (optional) (SortMeRNA)
  5. Transcriptome assembly using any combination of the following:

  6. Redundancy reduction with Evidential Gene tr2aacds. A transcript-to-gene mapping is produced from Evidential Gene’s outputs using gawk.
  7. Assembly completeness QC (BUSCO)
  8. Other assembly quality metrics (rnaQUAST)
  9. Transcriptome quality assessment with TransRate, which also uses the reads to evaluate the assembly. This step is skipped if the profile is set to conda or mamba.
  10. Pseudo-alignment and quantification (Salmon)
  11. HTML report for raw reads, trimmed reads, BUSCO, and Salmon (MultiQC)

1. Set up samplesheet

Prepare a samplesheet with your input data, in which each row represents one pair of FASTQ files (paired-end reads). It should look as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,AEG588A4_S4_L003_R2_001.fastq.gz
TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,AEG588A5_S5_L003_R2_001.fastq.gz
TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,AEG588A6_S6_L003_R2_001.fastq.gz
TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,AEG588A6_S6_L004_R2_001.fastq.gz
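Before launching a run it can help to sanity-check the samplesheet. The following is a minimal Python sketch, not part of the pipeline: the function name `read_samplesheet` and the specific checks are my own assumptions. It confirms the three-column header shown above, rejects rows with empty fields, and groups FASTQ pairs by sample name so that samples listed on more than one row (such as TREATMENT_REP3 above, sequenced on lanes L003 and L004) are easy to spot. How the pipeline itself treats repeated sample names is not stated in these notes, so the sketch only reports them.

```python
import csv
import io
from collections import defaultdict

# Column names taken from the samplesheet header shown above.
REQUIRED_COLUMNS = ["sample", "fastq_1", "fastq_2"]


def read_samplesheet(handle):
    """Parse a samplesheet and group FASTQ pairs by sample name.

    Hypothetical helper for illustration; the pipeline does its own validation.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(
            f"expected header {REQUIRED_COLUMNS}, got {reader.fieldnames}"
        )

    pairs_by_sample = defaultdict(list)
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        sample, r1, r2 = row["sample"], row["fastq_1"], row["fastq_2"]
        if not sample or not r1 or not r2:
            raise ValueError(f"line {line_number}: empty or missing field")
        if r1 == r2:
            raise ValueError(f"line {line_number}: fastq_1 and fastq_2 are identical")
        pairs_by_sample[sample].append((r1, r2))
    return dict(pairs_by_sample)


# Small excerpt of the samplesheet above, held in memory for the example.
example = """sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,AEG588A6_S6_L003_R2_001.fastq.gz
TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,AEG588A6_S6_L004_R2_001.fastq.gz
"""

groups = read_samplesheet(io.StringIO(example))
for sample, pairs in groups.items():
    print(sample, len(pairs))
```

For a real file, pass an open handle instead of the in-memory string, for example `read_samplesheet(open("samplesheet.csv", newline=""))`.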

2. Set up config file with parameter choices

  • Run rnaSPAdes using all 3 filters
  • Skip quantification, because it will be done with the nf-core RNAseq pipeline

3. Create Slurm script
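
As a starting point, the submission script could be generated with a short Python helper like the one below. This is only a sketch under assumptions: the function name `render_sbatch_script`, the file names `samplesheet.csv` and `params.yaml`, the `results` output directory, the `singularity` profile, and all resource values are placeholders to adapt to the cluster. The `#SBATCH` directives and the Nextflow options `-profile` and `-params-file` are standard, and `--input` and `--outdir` are the usual nf-core parameters. The pipeline-specific parameter choices from step 2 are expected to live in the params file and are not spelled out here.

```python
def render_sbatch_script(
    samplesheet,
    outdir,
    params_file,
    profile="singularity",  # assumed container profile; change to suit the cluster
    cpus=2,                 # placeholder resources for the job running Nextflow
    mem_gb=8,
    hours=48,
):
    """Return the text of an sbatch script that launches the pipeline."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=denovotranscript",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={hours}:00:00",
        "#SBATCH --output=denovotranscript_%j.log",  # %j expands to the job ID
        "",
        "set -euo pipefail",
        "",
        "nextflow run nf-core/denovotranscript \\",
        f"    -profile {profile} \\",
        f"    -params-file {params_file} \\",
        f"    --input {samplesheet} \\",
        f"    --outdir {outdir}",
    ]
    return "\n".join(lines) + "\n"


script = render_sbatch_script("samplesheet.csv", "results", "params.yaml")
print(script)
```

Writing the returned text to a file (for example `run_denovotranscript.sh`, a hypothetical name) and submitting it with `sbatch` would start the run. Any module loads or environment activation needed for Nextflow on the cluster would have to be added before the `nextflow run` line.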

4. Assess the transcriptomes and decide which is best to use as the reference in the reference-based RNAseq pipeline

The plan is to start with the Day 30 RNAseq datasets from Roberto’s 2023 thermotolerance study: https://doi.org/10.1016/j.cbd.2023.101089