  • Problems with mapping ScriptSeq RNA-seq data

    Hello everyone,

    I have been trying to process some RNA-seq data prepared using the ScriptSeq protocol and sequenced on a HiSeq machine. My pipeline was to remove adaptors with Trimmomatic, align with bowtie2 against the transcriptome, and then use eXpress to quantify the transcripts.

    However, when I ran eXpress I got a warning message: "The observed alignments appear disproportionately in the forward-reverse order". I have been trying to understand what could cause this with paired-end data. After aligning each read of the pair individually against the transcriptome, I noticed that the first read aligns to the forward strand most of the time, but the second read seems to align to both strands. See below:

    pair_1
    **********************************************
    Stats for BAM file(s):
    **********************************************

    Total reads: 5975216
    Mapped reads: 2530358 (42.3476%)
    Forward strand: 5836104 (97.6719%)
    Reverse strand: 139112 (2.32815%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 0 (0%)

    pair_2
    **********************************************
    Stats for BAM file(s):
    **********************************************

    Total reads: 5994394
    Mapped reads: 2543964 (42.4391%)
    Forward strand: 3587426 (59.8463%)
    Reverse strand: 2406968 (40.1536%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 0 (0%)

    Shouldn't reads in the second file always align to the reverse strand, or am I getting this wrong? And if so, what could have caused this to happen? I am just puzzled by the data, since the pairs are always supposed to be forward-reverse, right?
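One arithmetic check on the two stats blocks above may be worth doing before drawing conclusions. In both blocks, Forward strand + Reverse strand equals Total reads, not Mapped reads. That suggests (an assumption, not something confirmed in this thread) that the stats tool counts unmapped reads as forward-strand simply because their reverse-strand flag is unset. The sketch below recomputes the split over mapped reads only, under that assumption; the function name is made up for illustration.

```python
def mapped_strand_split(total, mapped, forward, reverse):
    """Return (forward, reverse) fractions among mapped reads only.

    Assumption: every unmapped read was counted as 'forward' by the
    stats tool, so it is subtracted from the forward count here.
    """
    unmapped = total - mapped
    forward_mapped = forward - unmapped
    return forward_mapped / mapped, reverse / mapped


# Numbers copied from the pair_1 and pair_2 blocks above.
pair_1 = mapped_strand_split(5975216, 2530358, 5836104, 139112)
pair_2 = mapped_strand_split(5994394, 2543964, 3587426, 2406968)

print("pair_1 forward/reverse among mapped: %.4f / %.4f" % pair_1)
print("pair_2 forward/reverse among mapped: %.4f / %.4f" % pair_2)
```

If the assumption holds, roughly 94.5% of mapped pair_1 reads are forward and roughly 94.6% of mapped pair_2 reads are reverse, which is the pattern a strand-specific library mapped to a transcriptome would be expected to show.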

  • #2
    Originally posted by Bacms View Post
    Shouldn't reads in the second file always align to the reverse strand, or am I getting this wrong? And if so, what could have caused this to happen? I am just puzzled by the data, since the pairs are always supposed to be forward-reverse, right?
    No, each read has a 50% chance of aligning to either strand. The bias for read 1 is very strange. Perhaps you could share your command lines, which may be helpful.

    Also, what organism is it? And is there a reason you are mapping with bowtie2 rather than an RNA-seq aligner, and mapping reads as single-ended rather than paired? Also, posting the FastQC report may help.


    • #3
      Originally posted by Brian Bushnell View Post
      No, each read has a 50% chance of aligning to either strand. The bias for read 1 is very strange. Perhaps you could share your command lines, which may be helpful.
      But with the new Illumina protocol you are supposed to get strand-specific reads, right? Or do you expect to get 50% even with a strand-specific protocol?

      Originally posted by Brian Bushnell View Post
      Also, what organism is it? And is there a reason you are mapping with bowtie2 rather than an RNA-seq aligner, and mapping reads as single-ended rather than paired? Also, posting the FastQC report may help.
      This is Chlamydomonas reinhardtii, and the reason for using bowtie2 is that, as far as I can tell, there is no way of using TopHat/Cufflinks output as input to eXpress, but I may be wrong.
      What is the best way to attach the FastQC report? The attachment size limit is too small for it.


      • #4
        Originally posted by Bacms View Post
        But with the new Illumina protocol you are supposed to get strand-specific reads, right? Or do you expect to get 50% even with a strand-specific protocol?
        My mistake, I did not notice you were mapping to the transcriptome. When mapping to the genome you would expect 50-50 because half the transcripts should be on each strand, but transcriptome mapping with a strand-specific protocol indeed should have read 1 map almost entirely to one strand and read 2 to the other, since they are all presented in the sense orientation.
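The expectation described here can be checked directly on a paired alignment by tallying mate number against strand from the SAM FLAG field. The following is a minimal sketch that reads plain SAM text lines and uses the standard FLAG bits (0x4 unmapped, 0x10 reverse strand, 0x40 first in pair, 0x80 second in pair, 0x100 secondary, 0x800 supplementary). The function name and the toy records are invented for illustration; they are not data from this thread.

```python
from collections import Counter


def tally_mate_strand(sam_lines):
    """Count primary mapped alignments by (mate, strand)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.split("\t")[1])
        # Skip unmapped reads and secondary/supplementary alignments so
        # that each mapped read is counted once.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        if flag & 0x40:
            mate = "read1"
        elif flag & 0x80:
            mate = "read2"
        else:
            mate = "unpaired"
        strand = "reverse" if flag & 0x10 else "forward"
        counts[(mate, strand)] += 1
    return counts


# Toy SAM records (hypothetical): only the FLAG column matters here.
template = "r{0}\t{1}\ttx1\t1\t42\t4M\t=\t1\t0\tACGT\tIIII"
flags = [99, 147, 99, 147, 83, 163, 77, 141]
toy_sam = ["@HD\tVN:1.0"] + [template.format(i, f) for i, f in enumerate(flags)]

counts = tally_mate_strand(toy_sam)
for key in sorted(counts):
    print(key, counts[key])
```

For a strand-specific library mapped to sense-oriented transcripts, nearly all counts would be expected in two of the four cells, for example (read1, forward) and (read2, reverse). Note that unmapped reads are excluded explicitly, since their strand bit carries no information.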

        What is the best way to attach the FastQC report? The attachment size limit is too small for it.
        Hmmm... I think you can output it as a pdf which appears to have a 19MB size limit. Otherwise, just post the most relevant images individually, like base content, quality, and anything it fails.

        P.S. And I still recommend you post your mapping command line; you should perform the mapping on both reads at once and it's not clear to me if you are doing that.


        • #5
          For ScriptSeq libraries, use the fr-secondstrand library type (in TopHat/Cufflinks, --library-type fr-secondstrand). fr-secondstrand means that the strand being synthesized on the sequencer is the sense strand for Read 1.

          Olaf
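Since the pipeline in this thread quantifies with eXpress and not Cufflinks, the equivalent setting there would be a strand option on the eXpress command line. As far as I know, eXpress accepts --fr-stranded (read 1 on the forward target sequence, read 2 on the reverse complement) and --rf-stranded for the opposite layout. The sketch below only builds the argument list, in the spirit of driving the tools from a Python script; the helper name, the FASTA file name, and the output directory are hypothetical, and the flags should be checked against the manual for the installed eXpress version.

```python
import shlex


def express_command(targets_fasta, alignments, out_dir, read1_sense=True):
    """Build an eXpress argument list for a strand-specific library.

    read1_sense=True corresponds to fr-secondstrand-style data, where
    read 1 matches the transcript (sense) strand.
    """
    strand_flag = "--fr-stranded" if read1_sense else "--rf-stranded"
    return ["express", strand_flag, "-o", out_dir, targets_fasta, alignments]


# Hypothetical file names, for illustration only.
cmd = express_command("transcripts.fa", "cDNA.bowtie", "express_out")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd, check=True) once the paths are adjusted; nothing is executed by the sketch itself.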


          • #6
            Originally posted by Brian Bushnell View Post
            My mistake, I did not notice you were mapping to the transcriptome. When mapping to the genome you would expect 50-50 because half the transcripts should be on each strand, but transcriptome mapping with a strand-specific protocol indeed should have read 1 map almost entirely to one strand and read 2 to the other, since they are all presented in the sense orientation.

            OK, yes, I am aware that when aligning to the genome you should get roughly 50/50.


            Originally posted by Brian Bushnell View Post
            Hmmm... I think you can output it as a pdf which appears to have a 19MB size limit. Otherwise, just post the most relevant images individually, like base content, quality, and anything it fails.
            I will double-check FastQC for an option to output a PDF.

            Originally posted by Brian Bushnell View Post
            P.S. And I still recommend you post your mapping command line; you should perform the mapping on both reads at once and it's not clear to me if you are doing that.
            Will do, but since I am using a Python script to run the system commands, I didn't have the individual commands for all steps at hand. Here they are now:
            #Run fastqc (v0.11.2)
            fastqc --outdir=../results/140526_I453_FCC4LT4ACXX_L1_Index1/ ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_1.fq ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_2.fq

            #Running trimmomatic
            java -jar trimmomatic-0.32.jar PE -threads 24 -trimlog trim_log.txt ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_1.fq ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_2.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1P.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1U.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2P.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2U.fq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:40

            #Align against the transcriptome using bowtie2
            bowtie2 -k 100 -p 22 --phred64 --un-conc ../results/140526_I453_FCC4LT4ACXX_L1_Index1/unmapped.fq -x Creinhardtii_281_v5.5.transcript -1 ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1P.fq -2 ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2P.fq -S ../results/140526_I453_FCC4LT4ACXX_L1_Index1/cDNA.bowtie
