  • Problems with mapping of scriptSeq of RNASeq

    Hello everyone,

    I have been trying to process some RNA-seq data prepared using the ScriptSeq protocol and sequenced on a HiSeq machine. My pipeline was to remove adaptors with Trimmomatic, align against the transcriptome with bowtie2, and then use eXpress to quantify the transcripts.

    However, when I ran eXpress I got a warning message: "The observed alignments appear disproportionately in the forward-reverse order". I have been trying to understand what could cause this with paired-end data. After aligning each read file individually against the transcriptome, I noticed that the first read aligns to the forward strand most of the time, but the second read seems to align to both strands. See below:

    pair_1
    **********************************************
    Stats for BAM file(s):
    **********************************************

    Total reads: 5975216
    Mapped reads: 2530358 (42.3476%)
    Forward strand: 5836104 (97.6719%)
    Reverse strand: 139112 (2.32815%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 0 (0%)

    pair_2
    **********************************************
    Stats for BAM file(s):
    **********************************************

    Total reads: 5994394
    Mapped reads: 2543964 (42.4391%)
    Forward strand: 3587426 (59.8463%)
    Reverse strand: 2406968 (40.1536%)
    Failed QC: 0 (0%)
    Duplicates: 0 (0%)
    Paired-end reads: 0 (0%)

    Shouldn't reads in the second file always align to the reverse strand, or am I getting this wrong? And if so, what could have caused this? I am puzzled by the data, since the pairs are always supposed to be forward-reverse, right?

  • #2
    Originally posted by Bacms View Post
    Shouldn't reads in the second file always align to the reverse strand, or am I getting this wrong? And if so, what could have caused this? I am puzzled by the data, since the pairs are always supposed to be forward-reverse, right?
    No, each read has a 50% chance of aligning to either strand. The bias for read 1 is very strange. Perhaps you could share your command lines, which may be helpful.

    Also, what organism is it? And is there a reason you are mapping with bowtie2 rather than an RNA-seq aligner, and mapping reads as single-ended rather than paired? Also, posting the FastQC report may help.



    • #3
      Originally posted by Brian Bushnell View Post
      No, each read has a 50% chance of aligning to either strand. The bias for read 1 is very strange. Perhaps you could share your command lines, which may be helpful.
      But with the new Illumina protocol you are supposed to get strand-specific reads, right? Or do you expect to get 50% even with a strand-specific library?

      Originally posted by Brian Bushnell View Post
      Also, what organism is it? And is there a reason you are mapping with bowtie2 rather than an RNA-seq aligner, and mapping reads as single-ended rather than paired? Also, posting the FastQC report may help.
      This is Chlamydomonas reinhardtii. The reason for using bowtie2 is that, as far as I can tell, there is no way to use TopHat/Cufflinks output as input to eXpress, but I may be wrong.
      What is the best way to attach the FastQC report? The attachment size limit is too small for it.



      • #4
        Originally posted by Bacms View Post
        But with the new Illumina protocol you are supposed to get strand-specific reads, right? Or do you expect to get 50% even with a strand-specific library?
        My mistake, I did not notice you were mapping to the transcriptome. When mapping to the genome you would expect 50-50 because half the transcripts should be on each strand, but transcriptome mapping with a strand-specific protocol indeed should have read 1 map almost entirely to one strand and read 2 to the other, since they are all presented in the sense orientation.
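The counts in the first post can be checked against this expectation with a little arithmetic. In both stats blocks, forward plus reverse equals the total number of reads, not the number of mapped reads, which suggests that unmapped reads were counted as "forward" because their reverse-strand flag is unset. The sketch below assumes that interpretation (it is an assumption about the stats tool, not something confirmed in the thread) and recomputes the strand split among mapped reads only. The function name is hypothetical.

```python
def mapped_strand_fractions(total, mapped, forward, reverse):
    """Return (forward, reverse) fractions among mapped reads only.

    Assumption: every unmapped read was counted as 'forward' by the
    stats tool, so the unmapped count is subtracted from 'forward'.
    """
    unmapped = total - mapped
    forward_mapped = forward - unmapped
    return forward_mapped / mapped, reverse / mapped


# Numbers copied from the two stats blocks in the first post.
read1 = mapped_strand_fractions(5975216, 2530358, 5836104, 139112)
read2 = mapped_strand_fractions(5994394, 2543964, 3587426, 2406968)

for name, (fwd, rev) in (("pair_1", read1), ("pair_2", read2)):
    print(f"{name}: forward {fwd:.1%}, reverse {rev:.1%} of mapped reads")
```

Under that assumption, read 1 is roughly 94.5% forward and read 2 is roughly 94.6% reverse among mapped reads, which would be consistent with a strand-specific library aligned to sense-orientation transcripts.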

        What is the best way to attach the FastQC report? The attachment size limit is too small for it.
        Hmmm... I think you can output it as a pdf which appears to have a 19MB size limit. Otherwise, just post the most relevant images individually, like base content, quality, and anything it fails.

        P.S. And I still recommend you post your mapping command line; you should perform the mapping on both reads at once and it's not clear to me if you are doing that.



        • #5
          For ScriptSeq libraries, use fr-secondstrand (as in the TopHat/Cufflinks option --library-type fr-secondstrand). fr-secondstrand means that the strand being synthesized on the sequencer is the sense strand for Read 1.

          Olaf
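To make the library-type convention concrete, here is a minimal sketch of the strand each mate is expected to hit when the reference is a set of transcripts given in the sense orientation. The library-type names follow the TopHat/Cufflinks vocabulary; the lookup table and function name are illustrative, not part of any tool.

```python
# Expected alignment strand against sense-orientation transcripts.
# "forward" means the read matches the transcript sequence as given.
EXPECTED_STRAND = {
    ("fr-secondstrand", 1): "forward",  # read 1 is the sense strand
    ("fr-secondstrand", 2): "reverse",
    ("fr-firststrand", 1): "reverse",   # read 1 is the antisense strand
    ("fr-firststrand", 2): "forward",
}


def expected_strand(library_type, mate):
    """Return 'forward' or 'reverse' for mate 1 or 2 of a stranded library."""
    try:
        return EXPECTED_STRAND[(library_type, mate)]
    except KeyError:
        raise ValueError(f"unknown combination: {library_type!r}, mate {mate}")


print(expected_strand("fr-secondstrand", 1))
print(expected_strand("fr-secondstrand", 2))
```

For a ScriptSeq library treated as fr-secondstrand, this gives forward for read 1 and reverse for read 2, the pattern described in post #4.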



          • #6
            Originally posted by Brian Bushnell View Post
            My mistake, I did not notice you were mapping to the transcriptome. When mapping to the genome you would expect 50-50 because half the transcripts should be on each strand, but transcriptome mapping with a strand-specific protocol indeed should have read 1 map almost entirely to one strand and read 2 to the other, since they are all presented in the sense orientation.
            OK, yes, I am aware that when aligning to the genome you should get roughly 50/50.


            Originally posted by Brian Bushnell View Post
            Hmmm... I think you can output it as a pdf which appears to have a 19MB size limit. Otherwise, just post the most relevant images individually, like base content, quality, and anything it fails.
            I will double-check FastQC for an option to output a PDF.

            Originally posted by Brian Bushnell View Post
            P.S. And I still recommend you post your mapping command line; you should perform the mapping on both reads at once and it's not clear to me if you are doing that.
            Will do. Since I am using a Python script to run the system commands, I didn't have the individual commands for all steps at hand. Here they are now:
            #Run fastqc
            Running fastqc v0.11.2
            fastqc --outdir=../results/140526_I453_FCC4LT4ACXX_L1_Index1/ ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_1.fq ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_2.fq

            #Running trimmomatic
            java -jar trimmomatic-0.32.jar PE -threads 24 -trimlog trim_log.txt ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_1.fq ../fastq/140526_I453_FCC4LT4ACXX_L1_Index1_2.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1P.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1U.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2P.fq ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2U.fq ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:40

            #Align against the transcriptome using bowtie
            bowtie2 -k 100 -p 22 --phred64 --un-conc ../results/140526_I453_FCC4LT4ACXX_L1_Index1/unmapped.fq -x Creinhardtii_281_v5.5.transcript -1 ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_1P.fq -2 ../results/140526_I453_FCC4LT4ACXX_L1_Index1/140526_I453_FCC4LT4ACXX_L1_Index1_2P.fq -S ../results/140526_I453_FCC4LT4ACXX_L1_Index1/cDNA.bowtie
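Since the bowtie2 command above aligns both mates together and writes SAM output, the strand split per mate can be tallied directly from that file using the standard SAM flag bits (0x4 unmapped, 0x10 reverse strand, 0x40 first in pair, 0x80 second in pair). The following is a minimal sketch with a hypothetical function name; note that with -k 100 a read can have many alignment records, so this counts alignments, not reads.

```python
from collections import Counter


def tally_mate_strands(sam_lines):
    """Count mapped alignments by (mate, strand) from SAM-format lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:      # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:            # unmapped: strand is meaningless
            continue
        if flag & 0x40:
            mate = "read1"
        elif flag & 0x80:
            mate = "read2"
        else:
            mate = "unpaired"
        strand = "reverse" if flag & 0x10 else "forward"
        counts[(mate, strand)] += 1
    return counts


# Tiny made-up example; real input would be the lines of the SAM file.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t99\ttx1\t10\t42\t5M\t=\t60\t55\tACGTA\tIIIII",
    "r1\t147\ttx1\t60\t42\t5M\t=\t10\t-55\tTTGCA\tIIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]
counts = tally_mate_strands(example)
for key in sorted(counts):
    print(key, counts[key])
```

On a real run, the file would be opened and passed in line by line, for example `with open(path) as handle: tally_mate_strands(handle)`, where `path` points at the SAM output of the alignment step.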
