Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • extract_only_concordant_paired_reads_from_bam_file

    Hi everyone,

    I am trying to extract only concordant paired reads from my bam file using a command that I found in posts with similar subject:

    https://www.biostars.org/p/95929/

    https://www.biostars.org/p/119316/

    https://broadinstitute.github.io/pic...ain-flags.html

    However the command that I use:

    samtools view -b -f 0x2 accepted_hits.bam -o accepted_hits_conc.bam

    results to the production of the accepted_hits_conc.bam file, which is much smaller than expected.

    Specifically the original accepted_hits.bam file has size 1.4 Gb while the accepted_hits_conc.bam has size only 141 Mb.

    Also by running samtools I get:

    samtools view -c accepted_hits.bam
    34745648

    samtools view -c accepted_hits_conc.bam
    3188162

    Now what troubles me is that the align_summary.txt returned by tophat2 reports:

    Left reads:
    Input : 18363154
    Mapped : 17461735 (95.1% of input)
    of these: 1424801 ( 8.2%) have multiple alignments (1497233 have >1)
    Right reads:
    Input : 18363154
    Mapped : 17283913 (94.1% of input)
    of these: 1424801 ( 8.2%) have multiple alignments (1485742 have >1)
    94.6% overall read mapping rate.

    Aligned pairs: 16585377
    of these: 1424801 ( 8.6%) have multiple alignments
    3788640 (22.8%) are discordant alignments
    69.7% concordant pair alignment rate.

    If there are 16585377 aligned pairs from an input of 18363154 paired end reads and a 69.7% concordant pair alignment rate, why do I get so small output from samtools?

    Also I saw in other posts that using this approach is preferable to using the --no-discordant option of tophat2. But why is that?

    Shouldn't I get exactly the concordant paired reads as output, if I specify the --no-discordant and --no-mixed options in tophat2?

  • #2
    Answering your first question about samtools. One thing you could try is to see exactly what flags you have. Try
    Code:
    samtools view accepted_hits.bam | cut -f 2 | sort | uniq -c
    That will, at least, tell you what you have in the file.

    Comment


    • #3
      Thank you westerman

      I saw exactly the flags and the corresponding reads for each one.

      Wouid it be correct if I wanted to extract reads with certain flags (specifically 67, 131, 115 and 179) to do so with:

      samtools view -b -f number_of_flag accepted_hits.bam -o flag_accepted_hits.bam

      and then merge them together with:

      samtools merge output_accepted_hits.bam flag1_accepted_hits.bam ...

      It may be a silly question but with the samtools merge, the reads merge or they concatenate into a single bam file?

      Comment

      Latest Articles

      Collapse

      • seqadmin
        Latest Developments in Precision Medicine
        by seqadmin



        Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

        Somatic Genomics
        “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
        05-24-2024, 01:16 PM
      • seqadmin
        Recent Advances in Sequencing Analysis Tools
        by seqadmin


        The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
        05-06-2024, 07:48 AM

      ad_right_rmr

      Collapse

      News

      Collapse

      Topics Statistics Last Post
      Started by seqadmin, 05-24-2024, 07:15 AM
      0 responses
      195 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-23-2024, 10:28 AM
      0 responses
      217 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-23-2024, 07:35 AM
      0 responses
      218 views
      0 likes
      Last Post seqadmin  
      Started by seqadmin, 05-22-2024, 02:06 PM
      0 responses
      12 views
      0 likes
      Last Post seqadmin  
      Working...
      X