Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Paired end inconsistency in tophat SAM

    Hi there,

    Apologies if I missing something obvious here...
    In the SAM output of tophat (accepted_hits.sam) I found read pairs where the two reads mapped to different chromosomes (3rd column). However the column MRNM (mate reference sequence, 7th column) has '=' meaning both reads mapped to the same reference.

    Here's an example:

    Code:
    EBRI093151_0001:8:100:268:457#NNNNNN	145	10	65535404	255	35M	=	85418059	0	ATTTGCACTATTACACTTAAATTGTTATCCTTTTT	_aaa`b^baaaa_b_baabaaababababbbabba	NM:i:2
    EBRI093151_0001:8:100:268:457#NNNNNN	97	5	85418059	255	35M	=	65535404	0	ATTTTTACATCAGTCCCTTTAACACAAATCCATAT	aabaabaabaaaa^a_a`a`^``a_a_aa_^]`aa	NM:i:0
    I discovered this while running HTSeq to count reads mapping to GTF features. I get the warning:
    Code:
    Warning: Incorrect 'proper_pair' flag value for read pair EBRI093151_0001:8:100:269:1653#NNNNNN
    For info, this is the tophat command I used:
    Code:
    tophat \
      --output-dir /exports/.../20100122_RNAseq_CTRL/ \
      --mate-inner-dist 130 \
      --mate-std-dev 30 \
      --solexa1.3-quals \
      --GFF /exports/.../Sus_scrofa.Sscrofa9.56.gff3 \
      /exports/.../bowtie/current/indexes/Sscrofa9.56_plus_Human_ribosomal_DNA_complete_repeating_unit \
      /exports/.../RNAseq_CTRL_1_35bp.fastq \
      /exports/.../RNAseq_CTRL_2_35bp.fastq
    Is this a bug in tophat (tophat-1.0.13)? Does anyone know whether such pairs are reliable?

    Many thanks

    Dario

  • #2
    Originally posted by dariober View Post
    Hi there,

    Apologies if I missing something obvious here...
    In the SAM output of tophat (accepted_hits.sam) I found read pairs where the two reads mapped to different chromosomes (3rd column). However the column MRNM (mate reference sequence, 7th column) has '=' meaning both reads mapped to the same reference.

    Here's an example:

    Code:
    EBRI093151_0001:8:100:268:457#NNNNNN	145	10	65535404	255	35M	=	85418059	0	ATTTGCACTATTACACTTAAATTGTTATCCTTTTT	_aaa`b^baaaa_b_baabaaababababbbabba	NM:i:2
    EBRI093151_0001:8:100:268:457#NNNNNN	97	5	85418059	255	35M	=	65535404	0	ATTTTTACATCAGTCCCTTTAACACAAATCCATAT	aabaabaabaaaa^a_a`a`^``a_a_aa_^]`aa	NM:i:0
    I discovered this while running HTSeq to count reads mapping to GTF features. I get the warning:
    Code:
    Warning: Incorrect 'proper_pair' flag value for read pair EBRI093151_0001:8:100:269:1653#NNNNNN
    For info, this is the tophat command I used:
    Code:
    tophat \
      --output-dir /exports/.../20100122_RNAseq_CTRL/ \
      --mate-inner-dist 130 \
      --mate-std-dev 30 \
      --solexa1.3-quals \
      --GFF /exports/.../Sus_scrofa.Sscrofa9.56.gff3 \
      /exports/.../bowtie/current/indexes/Sscrofa9.56_plus_Human_ribosomal_DNA_complete_repeating_unit \
      /exports/.../RNAseq_CTRL_1_35bp.fastq \
      /exports/.../RNAseq_CTRL_2_35bp.fastq
    Is this a bug in tophat (tophat-1.0.13)? Does anyone know whether such pairs are reliable?

    Many thanks

    Dario
    The current version (1.1.2) checks the distance between the paired reads. If the distance is longer than #mate-inner-dist + mate-std-dist#, it doesn't output such a pair. So, I think it's probably a bug in the version (1.0.13) you use. Can you please run it again with 1.1.2 or we'll release 1.1.3 soon with some fix in strand-specific RNA-Seq and some additions.

    Comment


    • #3
      Originally posted by Daehwan View Post
      The current version (1.1.2) checks the distance between the paired reads.
      Thanks Daehwan, I will switch to 1.1.2. By the way, in some post, which I can't find again, I think I read that tophat 1.1.x doesn't require to specify the options --mate-inner-dist and --mate-std-dist since it will infer them by its own. Is this correct?
      If the distance is longer than #mate-inner-dist + mate-std-dist#, it doesn't output such a pair.
      Isn't a distance of "mate-inner-dist + mate-std-dist" too conservative to reject a 'proper pair'? I would say that something like mate-inner-dist + 2(or 3)stdev is still acceptable (just a thought...).
      So, I think it's probably a bug in the version (1.0.13) you use. Can you please run it again with 1.1.2 or we'll release 1.1.3 soon with some fix in strand-specific RNA-Seq and some additions.
      Looking forward to new updates then!

      All the best
      Dario

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Essential Discoveries and Tools in Epitranscriptomics
        by seqadmin




        The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
        04-22-2024, 07:01 AM
      • seqadmin
        Current Approaches to Protein Sequencing
        by seqadmin


        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
        04-04-2024, 04:25 PM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, Today, 11:49 AM
      0 responses
      8 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, Yesterday, 08:47 AM
      0 responses
      16 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-11-2024, 12:08 PM
      0 responses
      61 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 04-10-2024, 10:19 PM
      0 responses
      60 views
      0 likes
      Last Post seqadmin  
      Working...
      X