  • TopHat2 - reads removed from the analysis?

    Hello
    We are analysing SOLiD 5500 RNA-seq reads, and I used TopHat2 for the alignment. We first ran the analysis on only 2 chromosomes to get a quick result (that is why the number of aligned reads shown below is very low).
    We noticed that some reads were "missing" at the end of the pipeline, and we are wondering why.

    These are the .info files:
    ::::::::::::::
    left_kept_reads.info
    ::::::::::::::
    min_read_len=50
    max_read_len=50
    reads_in =25357934
    reads_out=25319876
    ::::::::::::::
    right_kept_reads.info
    ::::::::::::::
    min_read_len=35
    max_read_len=35
    reads_in =25357934
    reads_out=25283245

    and this is the output of samtools flagstat run on accepted_hits.bam:

    1552361 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    1552361 + 0 mapped (100.00%:-nan%)
    1552361 + 0 paired in sequencing
    769043 + 0 read1
    783318 + 0 read2
    549840 + 0 properly paired (35.42%:-nan%)
    661184 + 0 with itself and mate mapped
    891177 + 0 singletons (57.41%:-nan%)
    5522 + 0 with mate mapped to a different chr
    5522 + 0 with mate mapped to a different chr (mapQ>=5)

    and 15435015 reads are in the unmapped.bam files

    --> so we have 15,435,015 unmapped + 1,552,361 mapped ≈ 17,000,000 reads accounted for.
    --> we had 25,357,934 + 25,357,934 reads_in ≈ 50,000,000 reads available for the analysis.

    We are wondering where the other reads went. We expected the number of reads in the accepted and unmapped BAM files to sum to the number of reads_in, but that is not the case. If you have any explanation, could you please help us?
    Thanks a lot for your time
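
The read accounting described in the post can be sketched in Python as below. This is only a bookkeeping illustration built from the numbers quoted above, not an explanation of where the reads went. The helper name `parse_info` and the variable names are hypothetical, and the sketch assumes the .info files are plain `key=value` lines as shown. Note also that samtools flagstat counts alignment records rather than distinct reads, so the mapped figure is at best an upper bound on the number of distinct mapped reads.

```python
def parse_info(text):
    """Parse 'key=value' lines (as shown in the .info files above) into a dict of ints.

    Hypothetical helper for illustration; it is not part of TopHat2.
    """
    counts = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            counts[key.strip()] = int(value.strip())
    return counts


# Contents of the two .info files, copied from the post.
LEFT_INFO = """min_read_len=50
max_read_len=50
reads_in =25357934
reads_out=25319876"""

RIGHT_INFO = """min_read_len=35
max_read_len=35
reads_in =25357934
reads_out=25283245"""

left = parse_info(LEFT_INFO)
right = parse_info(RIGHT_INFO)

# Reads entering and leaving the read-preparation step (both mates).
reads_in = left["reads_in"] + right["reads_in"]
reads_kept = left["reads_out"] + right["reads_out"]
dropped_in_prep = reads_in - reads_kept

# Figures quoted in the post: flagstat total for accepted_hits.bam
# (alignment records, not distinct reads) and the unmapped.bam read count.
mapped_records = 1552361
unmapped_reads = 15435015

accounted = mapped_records + unmapped_reads
unaccounted = reads_kept - accounted

print(f"reads_in (both mates):   {reads_in:,}")
print(f"reads kept after prep:   {reads_kept:,}")
print(f"dropped during prep:     {dropped_in_prep:,}")
print(f"mapped + unmapped:       {accounted:,}")
print(f"unaccounted for:         {unaccounted:,}")
```

With the figures from the post, only 112,747 reads are dropped during read preparation, while 33,615,745 of the 50,603,121 kept reads appear in neither accepted_hits.bam nor unmapped.bam. The gap therefore cannot be explained by the filtering reported in the .info files alone.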

  • #2
    Any ideas, please?
    Thanks
