Salmon quantifies expression with a k-mer-based method rather than full read alignment, and it can be used with both reference transcriptomes and de novo assemblies. In the literature and in my experience, Salmon gives essentially identical results to Kallisto, but Salmon is much less likely to fail. I recommend Salmon for standard differential expression analysis. gribskov
Requirements
Reference transcripts in Fasta format – not the reference genome
Read files for each sample in Fastq format
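Before running anything, it can help to confirm that both requirements are met. The sketch below is only an illustration: missing_mates and count_fasta_records are hypothetical helper names, and the _r1/_r2 suffixes are assumptions taken from the script further down. A transcript file usually has many thousands of records, while a genome file typically has far fewer, so the record count is a rough sanity check, not a guarantee.

```shell
#!/bin/bash
# Hypothetical helper: print every R1 file in a directory that has no matching R2 file.
# The default suffixes are assumptions; change them to match your file names.
missing_mates() {
    local dir="$1"
    local r1suffix="${2:-_r1.fastq.gz}"
    local r2suffix="${3:-_r2.fastq.gz}"
    local r1 r2
    for r1 in "$dir"/*"$r1suffix"; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%"$r1suffix"}$r2suffix"     # swap the R1 suffix for the R2 suffix
        [ -e "$r2" ] || echo "$r1"
    done
}

# Hypothetical helper: count the records (header lines) in an uncompressed Fasta file.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

For example, missing_mates ../split prints nothing when every R1 file has a mate, and count_fasta_records ../potato.trinity.fasta reports how many transcripts are in the reference.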
Installing using conda
My script below assumes that you are using a copy of Salmon installed locally in an anaconda environment called salmon. You may be able to use my environment, but the installation is so easy that you should probably just install your own copy.
$ module load anaconda
$ conda config --add channels conda-forge
$ conda config --add channels bioconda
$ conda create -n salmon salmon
$ conda activate salmon    # only needed when working interactively and, of course, in the script
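A quick way to confirm that the environment is active is to ask the shell where salmon lives and which version it reports. This is a minimal sketch; check_salmon is a hypothetical function name, and the only Salmon option it uses is --version.

```shell
#!/bin/bash
# Hypothetical helper: report whether a salmon executable is on PATH,
# and print its version if it is.
check_salmon() {
    if command -v salmon >/dev/null 2>&1; then
        echo "salmon found: $(command -v salmon)"
        salmon --version
    else
        echo "salmon not found; did you run 'conda activate salmon'?"
        return 1
    fi
}

check_salmon || true    # do not abort the shell if salmon is missing
```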
Notes
- If your data is recent, you probably have a stranded library; that is, all r1 and r2 reads have a consistent orientation with respect to the coding sequence (r1 > < r2).
- If you see very low read mapping, one possibility is that you have an unstranded library, so the orientation is random.
- The -l option specifies the library type. -l ISF means an inward-facing, stranded library with read 1 from the forward strand. Many stranded kits (dUTP-based ones, for example) produce ISR instead; if you are unsure, -l A asks Salmon to infer the library type from the reads.
- Salmon can be run in mapping mode, which is very fast, or in alignment mode. My reading of the manual is that alignment mode is slower (because you first have to map the reads to the transcriptome sequences) but is more accurate and yields higher counts. gribskov
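To make the difference between the two modes concrete, the sketch below builds both commands and prints them without running them, in the same spirit as the debug mode of the script that follows. All file names are hypothetical. In alignment mode the BAM file must contain reads aligned to the transcripts (not the genome), produced beforehand with an aligner of your choice.

```shell
#!/bin/bash
# Hypothetical file names, for illustration only
reference_fasta=transcripts.fa
reference_index=transcripts.idx
bam=sample.transcriptome.bam     # reads aligned to the transcript sequences
libtype=ISF                      # in mapping mode, A asks salmon to infer the type

# mapping mode: raw reads are mapped against a prebuilt salmon index
mapping_cmd="salmon quant --index $reference_index -l $libtype -1 sample_r1.fastq.gz -2 sample_r2.fastq.gz -o sample.salmon"

# alignment mode: salmon reads existing alignments (-a) and the transcript Fasta (-t)
alignment_cmd="salmon quant -t $reference_fasta -l $libtype -a $bam -o sample.salmon_aln"

# print the commands for inspection; nothing is executed
echo "$mapping_cmd"
echo "$alignment_cmd"
```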
Salmon mapping mode
#!/bin/bash
#SBATCH --job-name salmon
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --account standby
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --time=4:00:00
#--------------------------------------------------------------------------------------------------
# Salmon - Count RNA reads without mapping
# Runs salmon installed in /depot/mgribsko/apps
#
# sbatch salmon.slurm # run on cluster
# salmon.slurm # generate commands for examination before running
#--------------------------------------------------------------------------------------------------
data="../split"
r1suffix=_r1.fastq.gz
r2suffix=_r2.fastq.gz
target=$data/C*$r1suffix
#C16T72R2_r1.fastq.gz
index=false
#index=true    # use this line instead to build the index before counting
reference_fasta=../potato.trinity.fasta
reference_index=trinity.idx
#exe=/depot/mgribsko/apps/bin
exe=/scratch/bell/mgribsko/salmon-1.9.0_linux_x86_64/bin
#--------------------------------------------------------------------------------------------------
# begin script - hopefully nothing below needs to be changed
#--------------------------------------------------------------------------------------------------
echo -e "#--------------------------------------------------------------------------------------------------"
echo "Salmon read quantification"
echo "reference: $reference_fasta"
echo "target sequences:$target"
echo -e "#--------------------------------------------------------------------------------------------------"
shopt -s nullglob
debug=true
if [ -n "$SLURM_SUBMIT_HOST" ]; then
    echo -e "#--------------------------------------------------------------------------------------------------"
    echo -e "# SLURM detected - Commands will be executed"
    echo -e "#--------------------------------------------------------------------------------------------------"
    echo -e "Setting working directory to $SLURM_SUBMIT_DIR\n"
    cd "$SLURM_SUBMIT_DIR"
    debug=false
    # this is for a version of salmon I installed locally in my directory
    module load anaconda
    # source activate salmon
else
    echo -e "#--------------------------------------------------------------------------------------------------"
    echo -e "# Debug Mode - Commands will NOT be executed"
    echo -e "#--------------------------------------------------------------------------------------------------"
fi
#--------------------------------------------------------------------------------------------------
# Create index
# salmon index -t athal.fa.gz -i athal_index
#--------------------------------------------------------------------------------------------------
if $index; then
    com_i="$exe/salmon index -t $reference_fasta -i $reference_index"
    date
    echo -e "Building Salmon index $reference_index from $reference_fasta"
    echo -e "$com_i\n"
    if ! $debug; then
        $com_i
    fi
else
    echo -e "Using existing index $reference_index\n"
fi
#--------------------------------------------------------------------------------------------------
# process all samples -- in this case all files matching $target
# make a list of samples based on the R1 files, R2 file names are created automatically
# optional flags that can be added to the quant command:
# --incompatPrior 1 \
# --validateMappings \
#--------------------------------------------------------------------------------------------------
echo -e "Counting $target reads with Salmon\n"
for r1 in $target; do
    # generate r2 name from r1
    r2="${r1/$r1suffix/$r2suffix}"
    # generate output file name from r1 by removing directory and suffix
    out=${r1##*/}
    out=${out%$r1suffix*}
    out="$out.salmon"
    command="$exe/salmon quant --index $reference_index \
        -p 16 \
        -l ISF \
        -1 $r1 \
        -2 $r2 \
        -o $out"
    date
    echo -e "command: $command\n"
    if ! $debug; then
        # every sample is started in the background at once, 16 threads each;
        # with 128 cores that is 8 samples before the node is oversubscribed
        $command &
    fi
done
wait
date
#--------------------------------------------------------------------------------------------------
#
# salmon v1.0.0
#
# Usage: salmon -h|--help or
# salmon -v|--version or
# salmon -c|--cite or
# salmon [--no-version-check] <COMMAND> [-h | options]
#
# Commands:
# index Create a salmon index
# quant Quantify a sample
# alevin single cell analysis
# swim Perform super-secret operation
# quantmerge Merge multiple quantifications into a single file
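Once the jobs finish, it is worth checking the mapping rate of each sample, since a very low rate is the symptom described in the notes above. As far as I know, each Salmon output directory contains aux_info/meta_info.json with a percent_mapped field; the sketch below simply pulls that number out with sed. mapping_rate is a hypothetical helper name, and the *.salmon pattern assumes the output naming used by the script.

```shell
#!/bin/bash
# Hypothetical helper: print the percent_mapped value recorded in
# <outdir>/aux_info/meta_info.json (assumed layout of a salmon output directory).
mapping_rate() {
    sed -n 's/.*"percent_mapped"[^0-9]*\([0-9.]*\).*/\1/p' "$1/aux_info/meta_info.json"
}

# Report the mapping rate for every output directory in the current directory.
for dir in *.salmon; do
    [ -d "$dir" ] || continue        # skip if the pattern matched nothing
    echo "$dir $(mapping_rate "$dir")"
done
```

If most samples map well and one or two do not, I would look at those read files first; if every sample maps poorly, the library type given with -l is a likely suspect.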