A reference-genome guided compression strategy for storing mapped reads

  • A reference-genome guided compression strategy for storing mapped reads

    Recently I have been running out of space for storing mapped reads, which led me to some new ideas.

    As sequencing develops, more and more reads will be generated. One very significant property is that all the reads come from one genome, with very few errors. A sensible way to store mapped reads could therefore be:

    1. Use the reference genome as a guide.
    2. Store the mapped position of every read together with its error (mismatch) information.
    3. Store quality scores as differences between adjacent scores, on the assumption that adjacent quality scores do not change much. This part could be lossy.
    4. Discard (or optionally keep) the other, rarely used fields.
    5. Sort the reads by genome position, so that the chromosome ID does not need to be stored for every read and the mapped coordinates can be stored as differences only.
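
    Steps 2, 3 and 5 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single chromosome, 0-based positions, substitutions only (no indels), and lossless quality deltas. The function names and the toy reference are made up for the example and are not taken from any existing tool.

```python
from itertools import accumulate
from typing import List, Tuple

Mismatch = Tuple[int, str]  # (offset within the read, base seen in the read)


def find_mismatches(reference: str, pos: int, read: str) -> List[Mismatch]:
    """List the bases of `read` that differ from the reference at 0-based `pos`."""
    window = reference[pos:pos + len(read)]
    return [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]


def rebuild_read(reference: str, pos: int, length: int, mm: List[Mismatch]) -> str:
    """Recover the read from the reference plus its stored mismatches."""
    bases = list(reference[pos:pos + length])
    for offset, base in mm:
        bases[offset] = base
    return "".join(bases)


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as its difference from the previous one."""
    return [b - a for a, b in zip([0] + values, values)]


def delta_decode(deltas: List[int]) -> List[int]:
    """Inverse of delta_encode: a running sum restores the original values."""
    return list(accumulate(deltas))


# Toy data (hypothetical): three reads mapped to one short reference.
reference = "ACGTACGTACGTACGT"
mapped = sorted([(8, "ACGT"), (2, "GTAC"), (5, "CGAA")])  # step 5: sort by position

positions = [pos for pos, _ in mapped]
position_deltas = delta_encode(positions)                              # step 5
errors = [find_mismatches(reference, pos, read) for pos, read in mapped]  # step 2
quality_deltas = delta_encode([30, 31, 31, 29])                        # step 3

print(position_deltas)  # [2, 3, 3]
print(errors)           # [[], [(2, 'A')], []]
print(quality_deltas)   # [30, 1, 0, -2]
```

    The position and quality deltas are small numbers, which is what makes them cheap to store; a lossy variant of step 3 would additionally round or clip the quality deltas.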

    If we keep just the mapped location and error information, each read may cost only a few bytes, because:

    1. Chromosome ID: 1 bit per read, encoding whether the current read has changed chromosome/strand, provided we fix an order beforehand.
    2. Relative position: 1 byte per read, allowing a difference of up to 255 between consecutive reads (a single byte holds 256 values). This should be enough for, e.g., DNA sequencing where the whole genome is covered.
    3. Error information: k mismatches need k * 4 bits if indels are not allowed.
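
    As a rough check of the "few bytes per read" estimate, the three costs above can simply be added up. The sketch below uses the figures from this post (1 bit, 1 byte, 4 bits per mismatch); the function names and the read count in the example are assumptions for illustration, and container or index overhead is ignored.

```python
CHROM_FLAG_BITS = 1        # item 1: "chromosome/strand changed" flag
POSITION_DELTA_BITS = 8    # item 2: one byte for the relative position
BITS_PER_MISMATCH = 4      # item 3: figure taken from the post, no indels


def bits_per_read(k_mismatches: int) -> int:
    """Bits needed for one read with k mismatches under the scheme above."""
    return CHROM_FLAG_BITS + POSITION_DELTA_BITS + BITS_PER_MISMATCH * k_mismatches


def total_megabytes(n_reads: int, k_mismatches: int) -> float:
    """Total size in MB (10^6 bytes) if every read has the same number of mismatches."""
    return n_reads * bits_per_read(k_mismatches) / 8 / 1e6


print(bits_per_read(0))                 # 9 bits for a perfectly matching read
print(bits_per_read(2))                 # 17 bits, i.e. just over 2 bytes
print(total_megabytes(100_000_000, 2))  # 212.5 MB for 100 million such reads
```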

    These three pieces of information are actually the ones used most frequently. Such a reference-genome guided compression strategy should be much more space-efficient than the SAM/BAM format. If no suitable reference genome is available, one could be constructed from the reads themselves using a de Bruijn graph-like approach.

    What do you guys think?
    Last edited by feeldead; 09-17-2011, 05:08 AM. Reason: For correcting a typo in the subject

  • #2
    You may be interested in this paper, which describes a similar strategy:

    http://genome.cshlp.org/content/21/5/734.short

    • #3
      Thanks. That's what I mean.
