Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Realigning unmapped reads to a second reference

    I am looking for suggestions for the following issue/task:

    I am using a reference genome for which one of my scaffolds is an incomplete (e.g. is a much smaller subset of one of the genomic regions that I am after based on the complete genome of a closely related species). All other scaffolds are relatively complete for my focal species.

    What I would like to do is first align my Illumina paired-end reads to my focal species' reference (all scaffolds) and then take the reads that align to my short scaffold PLUS the discarded reads and realign them to the more complete version of this particular region from the closely-related model organism in order to pick up any additional reads that might cover that might align to that region.

    I think the approach outlined below will work, but I'm looking for feedback and answers to the questions in my comments before I proceed with my big files!

    # After initial trimming, alignment with bwa and removal of duplicates with Piccard, get reads aligning to target scaffold based on bed file
    # ??? does removing duplicates with Piccard keep unpaired reads ???

    samtools view -L scaffoldName myAlign.bam -b > myScaffold.bam

    # get discarded reads (reads that are paired but unmapped or with a mate that is unmapped)
    # ??? Are these the right flags for this??? I'm most uncertain about the 0x0001 flag ???

    samtools view -F -0x0001 -f 0x0004 -f 0x0008 -b myAlign.bam -b > discarded.bam

    # merge target aligned and discarded

    samtools merge targetDiscarded.bam

    # use Picard's SamToFastq to convert merged bam to fastq
    # ??? based on the above the unpaired fastq should be empty right ???

    java -Xmx2g SamToFastq.jar INPUT=targetDiscarded.bam FASTQ=r1.fastq SECOND_END_FASTQ=r2.fastq UNPAIRED_FASTQ=unpaired.fastq

    # align to second species ref with bwa mem

    bwa mem -M modelSpeciesRef r1.fastq r2.fastq unpaired.fastq > finalAlignment.sam

    ??? I know there is also a way to do this using the aln and sampe function in samtools...how does this compare?

    Thanks in advance for any advice/help

  • #2
    Why not just concatenate your reference with the model organism's scaffold and map to the combined fasta? I guess I'm not really sure what your goal is.

    Comment


    • #3
      Because we do have part of the region from our target species, I was hoping to make use of this information...In theory if the target and model are close enough it shouldn't matter but I do think it would be good to try it both ways...

      Comment


      • #4
        You can use BBSplit to map to two references simultaneously. In your case, you could set "ref=organism.fa,scaffold.fa" and then the reads would just go to whichever they match best. If you use the flag "ambig2=all" then any read that maps equally well to both references would be assigned to both of them, while if you set "ambig2=best" then it would preferentially map to the first reference, which is probably what you want.

        Comment


        • #5
          Unfortunately I'm constrained to use BWA as my project is part of a larger project with the alignment step of the pipeline already in place...

          I'm really asking if there are any flaws in my thinking as to what the script I have indicated is doing...(e.g. the different flags and file conversions)...

          Thanks

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Genetic Variation in Immunogenetics and Antibody Diversity
            by seqadmin



            The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
            11-06-2024, 07:24 PM
          • seqadmin
            Choosing Between NGS and qPCR
            by seqadmin



            Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
            10-18-2024, 07:11 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, Today, 11:09 AM
          0 responses
          23 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, Today, 06:13 AM
          0 responses
          20 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 11-01-2024, 06:09 AM
          0 responses
          30 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 10-30-2024, 05:31 AM
          0 responses
          21 views
          0 likes
          Last Post seqadmin  
          Working...
          X