Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • swNGS
    Member
    • Nov 2011
    • 83

    Ideas on how to count the number of specific amplicons in aligned amplicon data

    I have a Haloplex library prep'd data. I would like to know if anyone can think of a way of counting the occurrence of a particular fragment that is derived from the original library once I've aligned the paired-end reads to the reference genome. I've assumed I can only do this from the aligned bam since the insert size, and hence the original fragment can be inferred his way.

    So, an example as follows:
    Fragment A drives from genomic position chr12 12345678 - 12345780

    How many times does this feature in the aligned bam file?
    I dont want to know the read depth across this fragment/insert ( could easily do that with bedtools).

    The flip side of this question, is how about generating a frequency of occurrence of all inferred inserts in the aligned file? With genomic aligned coordrs of each inferred insert? Not a lookup this time, just a count of how many exist in the aligned bam file

    Thanks
  • ETHANol
    Senior Member
    • Feb 2010
    • 308

    #2
    intersectBed from Bedtools. Can that do what you are trying to do?
    --------------
    Ethan

    Comment

    • swNGS
      Member
      • Nov 2011
      • 83

      #3
      I'm having another stab at this question, and I've tried to visually represent it in an attempt to explain better what I am trying to do:

      I have a library that is comprised of thousands of overlapping amplicons.
      I have the genomic coordinates of all the amplicons that go to make the library.

      I want to be able to count the number of occurrences of paired reads that are derived from the original amplicons. The rationale for this is to look for amplicon drop out in a single sample relative to other multiplexed samples as indicative of a possible variant under a site involved in generating the original amplicon (probe binding site or restriction endonuclease recognition site)

      The following image attempts to explain it:
      Green track is the expected amplicons.
      Blue is the region of interest



      Amplicon1 = chr19:11203020-11203404
      Amplicon2 = chr19:11203059-11203231
      Amplicon3 = chr19:11202984-11203057

      I want to be able to count the aligned read pairs that represent each amplicon.
      If no amplicons overlapped, I could probably do this relatively easily , but I'm slightly stumped about what to do with situations like amplicons 1& 2... So in that circumstance, I dont want reads from amplicon2 counting towards amplicon1.

      The end result should be something like:

      Genomic_coord AmpliconID Ocurrances
      chr19:11203020-11203404 Amplicon1 153
      chr19:11203059-11203231 Amplicon2 15
      chr19:11202984-11203057 Amplicon3 48

      While I can find plenty of tools to produce summary info on insert size ranges, I can find anything to specifically do this.

      I think intersectBed is probably a place to start as suggested above, but cant get my head around the caveats that I listed. I was hoping there would be other helpful suggestions out there.

      Thanks,

      Chris

      Comment

      • frozenlyse
        Senior Member
        • Sep 2008
        • 135

        #4
        In R, the GenomicRanges package has the function countOverlaps which allows you to specify type="within" - that way the query must be completely inside the subject.

        Comment

        • swNGS
          Member
          • Nov 2011
          • 83

          #5
          How would that work if I have two amplicons where one lies entirely within another?
          I'm not sure if the image has uploaded correctly so here is a link

          Comment

          • frozenlyse
            Senior Member
            • Sep 2008
            • 135

            #6
            Ah sorry I didn't notice one was inside the other.

            One (hacky) thing you could do is to subtract the counts of amplicon 2 from amplicon 3. Another is that if the reads from amplicons will always start and end at defined positions, you could use type="equal", so that way only reads that start and end exactly at your amplicon start/ends will be counted.

            Comment

            • Chipper
              Senior Member
              • Mar 2008
              • 323

              #7
              I would use a short perl thing with the fragment start and length as a hash key:

              samtools view file.bam | perl -e ' while (<>) {split; $id="$_[2]:$_[3]:$_[8]";$hash{$id}++;}foreach (sort keys %hash){print $_ ."\t".$hash{$_}."\n";}' > fragments.count

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Pathogen Surveillance with Advanced Genomic Tools
                by seqadmin




                The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                03-24-2025, 11:48 AM
              • seqadmin
                New Genomics Tools and Methods Shared at AGBT 2025
                by seqadmin


                This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                The Headliner
                The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                03-03-2025, 01:39 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 03-20-2025, 05:03 AM
              0 responses
              41 views
              0 reactions
              Last Post seqadmin  
              Started by seqadmin, 03-19-2025, 07:27 AM
              0 responses
              49 views
              0 reactions
              Last Post seqadmin  
              Started by seqadmin, 03-18-2025, 12:50 PM
              0 responses
              36 views
              0 reactions
              Last Post seqadmin  
              Started by seqadmin, 03-03-2025, 01:15 PM
              0 responses
              192 views
              0 reactions
              Last Post seqadmin  
              Working...