  • SAM format readgroup, what is it exactly?

    EDIT: I found an old thread saying that each read group corresponds to one lane on the Illumina platform. My question now becomes: if I have two lanes of the same library and sample, can I assign them different read group IDs while still being able to merge them into a single dataset for downstream analysis? Thanks, and sorry for not searching more thoroughly in the first place.


    Could anyone explain what exactly a read group means physically?
    In the SAM file format, the @RG header consists of several subfields. I am having a hard time picturing what read group (ID), library (LB) and sample (SM) refer to in real life. My guess is that SM is the sample name I assign to the DNA material being prepped into a library, for example HUMAN_SAMPLE_A, and LB is another name I make up once I finish making a library, for example batch1_of_HUMAN_SAMPLE_A or batch2_of_HUMAN_SAMPLE_A. The read group is yet another name, but I have no idea how it links to the real world or how it affects downstream analysis.
    thanks,

    CSoong
    Last edited by csoong; 12-23-2010, 01:58 PM. Reason: found an old thread about read group
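On the EDIT question above: a common convention, assumed here rather than stated in this thread, is one read group per lane, with the lanes of one library sharing LB and SM. A minimal Python sketch that builds such tab-separated @RG header lines; the lane IDs `LANE1`/`LANE2` and the helper `rg_header_line` are hypothetical, and the LB/SM values reuse the names from the post.

```python
def rg_header_line(rg_id: str, library: str, sample: str) -> str:
    """Build one tab-separated @RG header line (hypothetical helper)."""
    fields = ["@RG", f"ID:{rg_id}", f"LB:{library}", f"SM:{sample}"]
    return "\t".join(fields)


# Two lanes of the same library: distinct IDs, shared LB and SM.
lanes = ["LANE1", "LANE2"]
header = [
    rg_header_line(lane, "batch1_of_HUMAN_SAMPLE_A", "HUMAN_SAMPLE_A")
    for lane in lanes
]

for line in header:
    print(line)
```

Because the two lines differ only in ID, reads stay traceable to their lane, while anything keyed on SM sees a single sample.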

  • #2
    All the tags from the @RG record are optional except ID.

    The record is useful when a single BAM contains data (alignments) from multiple sources. Its fields are granular enough to capture all the different possibilities: you may have reads from different libraries, runs, instruments, platforms, etc. The @RG record lets you keep everything in one BAM while still being able to determine, in as much detail as you want, where each read came from.

    The ID tag in the @RG record links together the reads that belong to the same group (you define what a group means for you through the other tags in the @RG record).
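To make the ID linkage concrete, a minimal sketch (not any library's API) that reads @RG header lines into a dictionary keyed by ID. It assumes tab-separated header fields, as in real SAM files, and reuses the header from the spec example quoted later in this thread; `parse_read_groups` is a hypothetical name.

```python
def parse_read_groups(header_lines):
    """Map each @RG ID to its remaining tags, e.g. {"L1": {"PU": ..., "LB": ..., "SM": ...}}."""
    groups = {}
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@RG":
            continue  # skip @HD, @SQ and other header records
        tags = dict(field.split(":", 1) for field in fields[1:])
        rg_id = tags.pop("ID")  # ID is the only required @RG tag
        groups[rg_id] = tags
    return groups


header = [
    "@HD\tVN:1.0",
    "@SQ\tSN:chr20\tLN:62435964",
    "@RG\tID:L1\tPU:SC_1_10\tLB:SC_1\tSM:NA12891",
    "@RG\tID:L2\tPU:SC_2_12\tLB:SC_2\tSM:NA12891",
]
groups = parse_read_groups(header)
print(groups["L2"])
```

Here L1 and L2 share SM:NA12891 but differ in LB and PU: one individual, two libraries sequenced on different lanes.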
    -drd



    • #3
      Thanks again, Drio.

      I pasted the example from sam1.pdf (the SAM format specification, page 4) below.
      It has two @RG headers; how can one tell which RG ID each of the two trailing reads belongs to? I don't see a correspondence between the read records and the RG IDs.
      EDIT: I see it, the info is in the last column.
      @HD VN:1.0
      @SQ SN:chr20 LN:62435964
      @RG ID:L1 PU:SC_1_10 LB:SC_1 SM:NA12891
      @RG ID:L2 PU:SC_2_12 LB:SC_2 SM:NA12891
      read_28833_29006_6945 99 chr20 28833 20 10M1D25M = 28993 195 \
      AGCTTAGCTAGCTACCTATATCTTGGTCTTGGCCG <<<<<<<<<<<<<<<<<<<<<<9/,&,22;;<<< \
      NM:i:1 RG:Z:L1
      read_28701_28881_323b 147 chr20 28834 30 35M = 28701 -168 \
      ACCTATATCTTGGCCTTGGCCGATGCGGCCTTGCA <<<<<;<<<<7;:<<<6;<<<<<<<<<<<<7<<<< \
      MF:i:18 RG:Z:L2
      Last edited by csoong; 12-23-2010, 03:36 PM. Reason: I see it
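The same lookup from the read side, as a minimal sketch: optional TAG:TYPE:VALUE fields start at column 12 of an alignment line, after the 11 mandatory columns, so the RG:Z value can be read from there. Looking only at those columns avoids confusion with quality strings such as `7;:<<<` that happen to contain colons. `read_group_of` is a hypothetical helper; the record is the second alignment from the example above, re-joined with tabs because the pasted copy is space-separated.

```python
from typing import Optional


def read_group_of(sam_line: str) -> Optional[str]:
    """Return the value of the RG:Z tag of one SAM alignment line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional TAG:TYPE:VALUE fields follow the 11 mandatory columns.
    for tag in fields[11:]:
        name, type_code, value = tag.split(":", 2)
        if name == "RG" and type_code == "Z":
            return value
    return None


read2 = "\t".join([
    "read_28701_28881_323b", "147", "chr20", "28834", "30", "35M",
    "=", "28701", "-168",
    "ACCTATATCTTGGCCTTGGCCGATGCGGCCTTGCA",
    "<<<<<;<<<<7;:<<<6;<<<<<<<<<<<<7<<<<",
    "MF:i:18", "RG:Z:L2",
])
print(read_group_of(read2))
```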



      • #4
        Check the RG tag at the end of each read record: read 1 points to read group L1 and read 2 to L2.
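Tying this back to the EDIT question in the first post: a minimal sketch of how a downstream step might pool reads per sample by following each read's RG:Z value to the SM of its @RG record. The two read groups in the spec example differ in ID but share SM:NA12891, so both reads end up under one sample. Whether a given tool pools exactly this way is an assumption to check in its documentation (GATK, for instance, treats read groups with the same SM as data from the same sample); `group_by_sample` is a hypothetical name.

```python
from collections import defaultdict

# @RG ID -> SM, taken from the header of the spec example.
rg_to_sample = {"L1": "NA12891", "L2": "NA12891"}

# (read name, RG:Z value) pairs, as found at the end of each alignment line.
reads = [
    ("read_28833_29006_6945", "L1"),
    ("read_28701_28881_323b", "L2"),
]


def group_by_sample(reads, rg_to_sample):
    """Pool reads by the SM of their read group, so different IDs with one SM merge."""
    by_sample = defaultdict(list)
    for name, rg_id in reads:
        by_sample[rg_to_sample[rg_id]].append(name)
    return dict(by_sample)


merged = group_by_sample(reads, rg_to_sample)
print(merged)
```

The read group ID still tells you which lane or library each read came from; only the per-sample view merges them.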
        -drd
