  • Convert bam paired alignment to fastq

    Hi,

    I would like to take a paired-end alignment from Bowtie in BAM format and get the full aligned sequence from the reference without using a GUI viewer. How can I do this?

    Thanks,
    -Sam

  • #2
    I'm not sure your question makes much sense without knowing more about what you want as output. You have:

    1) A reference sequence
    2) A BAM file containing the reads aligned to the reference.

    'Full aligned sequence' means ... what?

    Generally at this point people want either

    a) The SNP/Indels
    b) The consensus sequence
    c) The pileup sequence

    In all cases you can use the 'mpileup' command of 'samtools' to get the above information. Google and/or look through the postings on SEQanswers to find the exact command to use.

    If, by chance, you want something different from the possibilities above, then let us know explicitly what output you expect.


    • #3
      Check out samtools.


      • #4
        Yes, out of context it probably doesn't make much sense.
        I have a string of twenty short 5 bp reads with 100 to 400 bp gaps. It's similar to Complete Genomics' four-tag reads.

        There's no aligner that I've found that takes n-mates (they only take paired mates).
        As a first pass I'm running the twenty reads as consecutive pairs. Generate all matches for reads 1 and 2, then pull the sequence they map to out of the reference and call it 1_2. Align 1_2 with read 3 to get 1_2_3, align 1_2_3 with read 4 to get 1_2_3_4, and so on until all twenty have been mapped iteratively in pairs.

        To do this I need the full sequence of 1_2 which is going to be 110 to 410 bp.
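The 110 to 410 bp figure is just 5 + gap + 5, and the same arithmetic gives the reference span for longer chains. A minimal shell sketch, assuming every read is exactly 5 bp and every gap falls between 100 and 400 bp (the figures quoted above); `span_bounds` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: smallest and largest reference span covered by a
# chain of k reads. Assumes 5 bp reads and 100 to 400 bp gaps (from the post).
span_bounds() {
    k=$1
    read_len=5
    gap_min=100
    gap_max=400
    # k reads contribute k * read_len bases; there are k - 1 gaps between them.
    echo "$(( k * read_len + (k - 1) * gap_min )) $(( k * read_len + (k - 1) * gap_max ))"
}
```

`span_bounds 2` prints `110 410` and `span_bounds 3` prints `215 815`, so the window grows by 105 to 405 bp with each read added to the chain.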

        The best way I've come up with is to do this as fasta for now (if the proof of concept works I'll go back and carry the quality scores through):

        bowtie -p 4 -a -f -I 110 -X 410 --ff -v 0 -S -y hg19.fasta -1 1.fa -2 2.fa > 1_2.sam 2> 1_2.info


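        # Create a name-sorted bam file of properly paired, mapped reads.
        # fixmate and bamToBed -bedpe expect mates on adjacent records, so sort by name (-n).
        # No index step: samtools index needs coordinate-sorted input, and nothing below uses an index.
        samtools view -f 3 -F 12 -S -b 1_2.sam > 1_2.bam
        samtools sort -n 1_2.bam 1_2_sorted
        samtools fixmate 1_2_sorted.bam 1_2_sorted_fixmate.bam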

        # Create a paired end bed file
        bamToBed -bedpe -mate1 -i 1_2_sorted_fixmate.bam > 1_2_sorted_fixmate.bedpe

        # Take the start position of mate 1 and the end position of mate 2 and make a bed file.
        # The gsed call appends a + to indicate the plus direction.
        cut -f 1,2,6,7 1_2_sorted_fixmate.bedpe | gsed 's/$/\t+/' > 1_2_sorted_fixmate_region.bed

        # get the sequence from the reference
        bedtools getfasta -name -fi hg19.fasta -bed 1_2_sorted_fixmate_region.bed -fo 1_2.fa

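        # Use 1_2 as mate 1 and continue down the line. -I/-X bound the whole fragment, mates included,
        # so the 1_2 span (110 to 410) plus the next gap (100 to 400) plus 5 bp gives 215 to 815:
        bowtie -p 4 -a -f -I 215 -X 815 --ff -v 0 -S -y hg19.fasta -1 1_2.fa -2 3.fa > 1_2_3.sam 2> 1_2_3.info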


        Really I need an aligner that can take twenty very short reads with a known gap distribution.
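One possible refinement of the BEDPE-to-BED step above: take the leftmost start and rightmost end of the two mates rather than assuming mate 1 is always upstream, and write the strand into column 6, where BED expects it. This is only a sketch, assuming the standard 10-column BEDPE layout (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2); `bedpe_to_bed` is a hypothetical helper name.

```shell
# Hypothetical helper: collapse each BEDPE pair into one BED6 interval
# spanning both mates. Reads the files given as arguments, or stdin if none.
bedpe_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Keep only pairs whose mates are on the same, mapped chromosome.
         $1 == $4 && $1 != "." {
             lo = ($2 < $5) ? $2 : $5    # leftmost start of the two mates
             hi = ($3 > $6) ? $3 : $6    # rightmost end of the two mates
             # BED6: chrom, start, end, name, score (placeholder 0), strand of mate 1
             print $1, lo, hi, $7, 0, $9
         }' "$@"
}
```

It would slot in as `bedpe_to_bed 1_2_sorted_fixmate.bedpe > 1_2_sorted_fixmate_region.bed`, with no need for gsed. Adding `-s` to `bedtools getfasta` should then reverse-complement regions on the minus strand.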


        • #5
          twenty short 5 bp reads with 100 to 400 bp gaps

          You mean 50bp, right?


          • #6
            It is 5 bp. I know the order they occur in and the gap distribution. It's kind of like Complete Genomics' 5-10-10-10 tagged reads, except with more, shorter reads and larger gaps.
