  • How to tag reads for alignment

    Hi all,

    I have raw FASTQ files of paired-end short reads for two samples. I would like to tag the reads of each sample with an identifier, pool them into a single FASTQ file, and then perform alignment and variant calling. Below is a detailed explanation of what I want to achieve.

    Normally, we align the reads of the two samples independently, add read groups to the BAM files using samtools or Picard tools, and perform variant calling using GATK or samtools. The variant caller then treats them as two different samples based on the read group information.

    Instead, I would like to tag the reads of both samples with two different read groups before alignment, so that the alignment produces a single BAM file carrying the read group information of both samples. This BAM file would then be used for variant calling, where the caller distinguishes the two samples from the read group information.

    Could anyone help?

  • #2
    Hi Meher,
    That may need a small script. Please copy and paste the first sequence from both files so that we have an idea of how to help you.


    • #3
      Originally posted by Apexy
      Hi Meher,
      That may need a small script. Please copy and paste the first sequence from both files so that we have an idea of how to help you.
      Hi,

      I can provide a few lines, but before that I have a question.

      Does it make any difference to the final alignment result if we tag the reads and perform a single alignment that generates one BAM file, compared to aligning the samples independently and merging the two BAM files into a single BAM file?

      Would there be any bias in the alignment if we choose one method over the other?


      • #4
        A better way is to perform the alignments separately, assigning unique read group IDs (some aligners, e.g. bowtie, will add read group IDs during alignment), and then merge the BAM files before proceeding to variant detection. Pay attention to the header attached to the merged output: every read group ID present in the file must be referenced in the header. samtools merge does not handle this automatically; you have to supply a properly formatted header. I'm not sure whether Picard MergeSamFiles properly merges the header or not.

        But I do wonder why you want to do this. GATK does not require merged BAM files; from the GATK Best Practices document:

        Because the GATK can dynamically merge BAM files, it isn't critical to have merged files by lane into sample bams, or even samples bams into cohort bams.
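
The header requirement described in this post can be sketched in plain Python. This is only an illustration under stated assumptions, not what samtools or Picard do internally: the function names `read_group_ids` and `merge_sam_headers` are hypothetical, the headers are assumed to be available as text, and both alignments are assumed to use the same reference so the non-@RG lines of the first header can be reused.

```python
def read_group_ids(header):
    """Return the set of @RG IDs declared in a SAM header text."""
    ids = set()
    for line in header.splitlines():
        if line.startswith("@RG"):
            # Header fields after the record type look like "ID:rg1", "SM:sample1".
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            ids.add(fields["ID"])
    return ids


def merge_sam_headers(headers):
    """Combine SAM header texts so that every @RG line is kept exactly once.

    Assumption: all inputs share the same reference, so @HD, @SQ and other
    non-@RG lines are taken from the first header only.
    """
    other_lines = []
    rg_lines = []
    seen = set()
    for index, header in enumerate(headers):
        for line in header.splitlines():
            if line.startswith("@RG"):
                (rg_id,) = read_group_ids(line)
                if rg_id in seen:
                    raise ValueError(f"duplicate read group ID: {rg_id}")
                seen.add(rg_id)
                rg_lines.append(line)
            elif index == 0 and line.startswith("@"):
                other_lines.append(line)
    return "\n".join(other_lines + rg_lines) + "\n"


if __name__ == "__main__":
    common = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
    header1 = common + "@RG\tID:rg1\tSM:sample1\n"
    header2 = common + "@RG\tID:rg2\tSM:sample2\n"
    print(merge_sam_headers([header1, header2]), end="")
```

The merged text could then be written to a file and supplied as the header when merging, for example through the `-h` option of `samtools merge`. Checking `read_group_ids` on the merged header against the read groups used in the records is a quick way to confirm nothing is missing.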


        • #5
          Hello Meher,
          I do not think it matters if the insert size in both samples is expected to be the same. At least with bowtie, all you need is to tell it which file is which (specified by -1 and -2). However, you must pay particular attention to the header information during merging. There is an extensive manual here
          Last edited by Apexy; 11-16-2012, 06:07 AM.


          • #6
            Originally posted by kmcarr
            A better way is to perform the alignments separately, assigning unique read group IDs (some aligners, e.g. bowtie, will add read group IDs during alignment), and then merge the BAM files before proceeding to variant detection. Pay attention to the header attached to the merged output: every read group ID present in the file must be referenced in the header. samtools merge does not handle this automatically; you have to supply a properly formatted header. I'm not sure whether Picard MergeSamFiles properly merges the header or not.

            But I do wonder why you want to do this. GATK does not require merged BAM files; from the GATK Best Practices document:
            Yes, it is not required to merge BAMs. The actual task I want to accomplish is to detect the variants from the two samples in a single VCF file and infer the depth of each variant from both samples (i.e., if a variant has depth 100, I would like to find how many of the reads came from each sample). Performing multi-sample variant calling on the two BAM files using GATK will accomplish this.

            But I would really like to know whether there could be any biases in doing it as described above, compared to doing a single alignment by tagging the reads before alignment and then performing variant calling.

            Which of these would avoid any biases, if there are any?
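
The per-sample breakdown asked about in this post (how many of the reads at a variant came from each sample) comes down to mapping each read's RG tag to the SM field of its read group. A minimal sketch follows, assuming the records are SAM text lines already restricted to reads overlapping the site of interest; the function names are hypothetical, and this counts reads rather than reproducing how a variant caller computes depth.

```python
from collections import Counter


def read_group_to_sample(header):
    """Map each @RG ID in a SAM header text to its SM (sample) value."""
    mapping = {}
    for line in header.splitlines():
        if line.startswith("@RG"):
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            # Fall back to the read group ID if no SM field is present.
            mapping[fields["ID"]] = fields.get("SM", fields["ID"])
    return mapping


def reads_per_sample(header, records):
    """Count SAM records per sample using their RG:Z: optional tag.

    Assumption: `records` holds only the reads overlapping the position
    of interest; selecting those reads is outside this sketch.
    """
    rg_to_sample = read_group_to_sample(header)
    counts = Counter()
    for record in records:
        # Optional tags start after the 11 mandatory SAM columns.
        for tag in record.rstrip("\n").split("\t")[11:]:
            if tag.startswith("RG:Z:"):
                counts[rg_to_sample[tag[5:]]] += 1
                break
    return counts


if __name__ == "__main__":
    header = "@RG\tID:rg1\tSM:sample1\n@RG\tID:rg2\tSM:sample2\n"
    template = "{name}\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tRG:Z:{rg}"
    records = [
        template.format(name="r1", rg="rg1"),
        template.format(name="r2", rg="rg1"),
        template.format(name="r3", rg="rg2"),
    ]
    print(dict(reads_per_sample(header, records)))
```

In practice a multi-sample VCF from GATK already carries per-sample depth in its genotype columns, so a tally like this is mainly useful as a sanity check on the read group assignments.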


            • #7
              Originally posted by Apexy
              Hello Meher,
              I do not think it matters if the insert size in both samples is expected to be the same. At least with bowtie, all you need is to tell it which file is which (specified by -1 and -2). However, you must pay particular attention to the header information during merging. There is an extensive manual here
              Hi, anyway, these are the first few lines:
              sample1_1.fastq

              @HWI-ST188:1:1101:1225:2112#0/1
              AGANAGTAAGTAAAATCTATTATGATATTCTTATAAAGAAAAGCCCACTTTTGAAGATTTCAGAAGTGCTTCTAAAGGAGGTAGCGCGGCATAATACTGGG
              +
              Z^_BS\ccgg`eghhhhhhhhhhhhhhhhhhhhhhhhgggdcfhhhhhhhhhdhghhfhbghhff]]egfdghf]cdgfbdTZacebbababb_bb]`cb`
              @HWI-ST188:1:1101:1221:2160#0/1
              TTCNAATAAAATAAATAAAAGATGAGATGAATATTCATTTTGACTTCATTTTCTACTTTTTTTTCAGAATACTTAAAGTTTGAGAGAAATGTGAGACAACT
              +
              __bBS`ccggcggiihhfghicghhiieghihehihfibghifhehhffhiiiiffghiiiiihdggg_b`bddbbcbabdd`_`bc``Y_bbZ_T_^BBB


              sample1_2.fastq

              @HWI-ST188:1:1101:1225:2112#0/2
              ATGAATCAGATTGAAAATGCAAACTGTGACATGAGGCAGAGGCATTTATTTTATTTNGTGGGGAATCGGGAAAGGAAATTGCTAGGTTTCTGCAGCCCCAG
              +
              bbbeeeeegffgcgifhhihihiiif`agh`ghifhhhhiiihhcffXagXcce_cBL[Z_eaghfeedcS\^`dcbZZZ`b`bY^T]_bb]RGYba^[^_
              @HWI-ST188:1:1101:1221:2160#0/2
              TTAAATCTTAAAAGTGTATGTAAAAATGTTCAAAATATTAGTTTTCTTTAAATTTTNGTAGAAAAGGCATTATCTTCACATTAAGTGACATGAGATAACGC
              +
              bbbeeeeegggfghQbK`hhbigiiieh[ddgdgfhbgfffS^fddgiiidXaeSXBOO^eg`efbghfYHWbee_cffgccV`g]b_gHZZZZ^Y_bBBB


              • #8
                Hello,

                Something must have made me forget to reply to this. If you want to merge two FASTQ files, use the attached script. However, I cannot relate the information (two samples) to the sequences you provided. If you have two paired-end samples, then you should in fact have four files. I think you have only provided forward sequences from sample1_1.fastq and reverse sequences from sample1_2.fastq.
                Attached Files
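
The attached script is not reproduced in this thread. As a rough sketch of the tagging-and-pooling idea from the original question, and not the attached script itself, the following prefixes each read name with a sample identifier before concatenating the records. The function names and the `sample:` prefix format are hypothetical, and the FASTQ input is assumed to use exactly four lines per record with no blank lines.

```python
def tag_fastq_records(lines, sample_tag):
    """Yield FASTQ lines with `sample_tag` prefixed to every read name.

    Assumption: each record is exactly four lines (name, sequence, '+',
    qualities), so the name line is identified by position, not by '@',
    because a quality line may also begin with '@'.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        if index % 4 == 0:
            if not line.startswith("@"):
                raise ValueError(f"expected a read name at line {index + 1}")
            yield "@" + sample_tag + ":" + line[1:]
        else:
            yield line


def pool_fastq(samples):
    """Concatenate tagged records from (sample_tag, lines) pairs."""
    pooled = []
    for sample_tag, lines in samples:
        pooled.extend(tag_fastq_records(lines, sample_tag))
    return pooled


if __name__ == "__main__":
    sample1 = ["@HWI-ST188:1:1101:1225:2112#0/1", "AGAN", "+", "Z^_B"]
    sample2 = ["@HWI-ST188:1:1101:1221:2160#0/1", "TTCN", "+", "__bB"]
    for line in pool_fastq([("sample1", sample1), ("sample2", sample2)]):
        print(line)
```

For paired-end data the forward and reverse files would each need to be pooled in the same sample order so that mates stay in sync. A name prefix is not a read group by itself; the RG tags would still have to be assigned from the prefix after alignment, which is one reason aligning separately with per-sample read groups is the simpler route.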
