salmon

Salmon is an alignment-free, k-mer-based method for quantifying expression. It can be used for both reference-based and de novo assembly-based expression quantification. In the literature and in my experience, Salmon gives essentially identical results to Kallisto, but Salmon is much less likely to fail. I recommend Salmon for standard differential expression analysis. (gribskov)

Requirements

Reference transcripts in FASTA format (not the reference genome)
Read files for each sample in FASTQ format
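
Before running anything, it can help to confirm that every R1 file has a matching R2 file. The following is only a minimal sketch: the function name check_pairs is hypothetical, and it assumes the paired files differ only in their suffix (for example _r1.fastq.gz and _r2.fastq.gz).

```shell
#!/bin/bash
# Hypothetical helper (not part of Salmon): report R1 files with no matching R2.
# Arguments: data directory, R1 suffix, R2 suffix.
check_pairs() {
    local data=$1 r1suffix=$2 r2suffix=$3
    local r1 r2 missing=0
    for r1 in "$data"/*"$r1suffix"; do
        [ -e "$r1" ] || continue            # the pattern matched nothing
        r2="${r1%"$r1suffix"}$r2suffix"     # swap the R1 suffix for the R2 suffix
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]                    # succeed only if every R1 has a mate
}
```

For example, check_pairs ../split _r1.fastq.gz _r2.fastq.gz prints one line per unpaired R1 file and returns a non-zero status if any are found.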

Installing using conda

The script below assumes a locally installed copy of Salmon in an anaconda environment called salmon. You may be able to use my environment, but installation is easy enough that you should probably just install your own copy.

$ module load anaconda
$ conda config --add channels conda-forge
$ conda config --add channels bioconda
$ conda create -n salmon salmon
$ conda activate salmon    # only needed for interactive use and, of course, in the script

Notes

  • if your data is recent, you probably have a stranded library, that is, all the r1 and r2 reads have a consistent orientation with respect to the coding sequence: r1 > < r2
  • if you see very low read mapping, one possibility is that you have an unstranded library, so the orientation is random
  • the -l option specifies the library type; -l ISF means an inward-facing (I), stranded (S) library with read 1 from the forward strand (F) (the most common case)
  • Salmon can be run in mapping mode, which is very fast, or in alignment mode. My reading of the manual is that alignment mode is slower (because the reads must first be aligned to the transcriptome sequences) but is more accurate and yields higher counts. (gribskov)
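
The notes above mention alignment mode without showing a command. As a rough sketch, assuming the reads have already been aligned to the transcript sequences (not the genome) with an external aligner, the call might look like the following. The BAM and output names are hypothetical, and -l A asks Salmon to infer the library type instead of fixing it to ISF. Like the main script on this page, the sketch only prints the command unless told to run it.

```shell
#!/bin/bash
# Sketch of Salmon alignment-based mode. File names below are hypothetical.
reference_fasta=../potato.trinity.fasta   # transcripts the reads were aligned to
bam=sample1.transcriptome.bam             # hypothetical BAM from an external aligner
out=sample1.salmon_aln                    # hypothetical output directory
libtype=A                                 # A = let Salmon infer the library type

# -t gives the transcript FASTA, -a the alignment file; no Salmon index is used
command="salmon quant -t $reference_fasta -l $libtype -a $bam -p 16 -o $out"
echo "command: $command"

# Set run=true to execute; by default the command is only printed.
run=false
if $run; then
    $command
fi
```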

Salmon mapping mode

#!/bin/bash
#SBATCH --job-name salmon
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --account standby
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --time=4:00:00

#--------------------------------------------------------------------------------------------------
# Salmon - Count RNA reads without mapping
# Runs salmon installed in /depot/mgribsko/apps
#
#   sbatch salmon.slurm         # run on cluster
#   salmon.slurm                # generate commands for examination before running
#--------------------------------------------------------------------------------------------------
data="../split"
r1suffix=_r1.fastq.gz
r2suffix=_r2.fastq.gz
target=$data/C*$r1suffix
#C16T72R2_r1.fastq.gz

index=false
#index=true     # uncomment to build the index before quantifying
reference_fasta=../potato.trinity.fasta
reference_index=trinity.idx
#exe=/depot/mgribsko/apps/bin
exe=/scratch/bell/mgribsko/salmon-1.9.0_linux_x86_64/bin
#--------------------------------------------------------------------------------------------------
# begin script - hopefully nothing below needs to be changed
#--------------------------------------------------------------------------------------------------
echo -e "#--------------------------------------------------------------------------------------------------"
echo "Salmon read quantification"
echo "reference: $reference_fasta"
echo "target sequences:$target"
echo -e "#--------------------------------------------------------------------------------------------------"

shopt -s nullglob

debug=true
if [ -n "$SLURM_SUBMIT_HOST" ]; then
    echo -e "#--------------------------------------------------------------------------------------------------"
    echo -e "# SLURM detected - Commands will be executed"
    echo -e "#--------------------------------------------------------------------------------------------------"
    echo -e "Setting working directory to $SLURM_SUBMIT_DIR\n"
    cd "$SLURM_SUBMIT_DIR"
    debug=false
    # this is for a version of salmon i installed locally in my directory
    module load anaconda
#    source activate salmon
else
    echo -e "#--------------------------------------------------------------------------------------------------"
    echo -e "# Debug Mode - Commands will NOT be executed"
    echo -e "#--------------------------------------------------------------------------------------------------"
fi

#--------------------------------------------------------------------------------------------------
# Create index
# salmon index -t athal.fa.gz -i athal_index
#--------------------------------------------------------------------------------------------------

shopt -s nullglob

if $index; then
    com_i="$exe/salmon index -t $reference_fasta -i $reference_index"
    date
    echo -e "Building Salmon index $reference_index from $reference_fasta"
    echo -e  "$com_i\n"
    if ! $debug; then
        $com_i
    fi
else
    echo -e "Using existing index $reference_index\n"
fi

#--------------------------------------------------------------------------------------------------
# process all samples -- in this case all R1 files matching $target
# make a list of samples based on the R1 files, R2 file names are created automatically
#      --incompatPrior 1 \
#      --validateMappings \
#--------------------------------------------------------------------------------------------------
echo -e "Counting $target reads with Salmon\n"
for r1 in $target; do

    # generate r2 name from r1
    r2="${r1/$r1suffix/$r2suffix}"

    # generate  output file name from r1 by removing directory and suffix
    out=${r1##*/}
    out=${out%$r1suffix*}
    out="$out.salmon"

    command="$exe/salmon quant --index $reference_index \
      -p 16 \
      -l ISF \
      -1 $r1 \
      -2 $r2 \
      -o $out"

    date
    echo -e "command: $command\n"
    if ! $debug; then
        #echo 'submitting'
        $command &
    fi

done
wait

date

#--------------------------------------------------------------------------------------------------
#
# salmon v1.0.0
#
# Usage:  salmon -h|--help or
#         salmon -v|--version or
#         salmon -c|--cite or
#         salmon [--no-version-check] <COMMAND> [-h | options]
#
# Commands:
#      index Create a salmon index
#      quant Quantify a sample
#      alevin single cell analysis
#      swim  Perform super-secret operation
#      quantmerge Merge multiple quantifications into a single file
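
The file name handling in the sample loop relies on bash parameter expansion, which is easy to misread. The short sketch below shows the derivation on its own, using the example file name from the comment near the top of the script. It uses ${r1##*/}, which strips any leading directory whether or not the path starts with a dot.

```shell
#!/bin/bash
# Derive the R2 file name and the output directory name from an R1 file name.
r1suffix=_r1.fastq.gz
r2suffix=_r2.fastq.gz
r1=../split/C16T72R2_r1.fastq.gz

r2="${r1/$r1suffix/$r2suffix}"   # replace the R1 suffix with the R2 suffix
out=${r1##*/}                    # drop everything up to the last slash
out=${out%$r1suffix}             # drop the R1 suffix
out="$out.salmon"                # output directory for this sample

echo "$r2"     # prints ../split/C16T72R2_r2.fastq.gz
echo "$out"    # prints C16T72R2.salmon
```

Running this by hand on one file name is a quick way to check the suffix settings before submitting the job.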