  • Genes of interest from SAM/BAM files

    Hi,
    I am trying to compare paired-end Illumina data with standard MLST approaches for genotyping bacteria. What is the best way to extract known marker genes (using coordinates from a reference) from SAM/BAM files created by mapping samples to this reference?

    Ultimately, I just want to compare the genotyping capacity of this shotgun data with traditional MLST methods and confirm that we are getting good coverage of our chosen MLST markers. I would like to generate consensus sequences for the markers of interest from the short-read data, and I am unsure of the best way to do this.

    Thank you!

  • #2
    I am not sure what you want to do, but take a look at bedtools; it may help.


    • #3
      1. http://seqanswers.com/forums/showthread.php?t=39766

      Assuming that you are referring to extracting reads that map to certain marker genes: samtools view should allow you to pull out reads from specified gene regions.

      2. http://seqanswers.com/forums/showthread.php?t=38969

      Samtools mpileup should generate the consensus sequence.
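As a rough sketch of the first suggestion: the snippet below turns 1-based, inclusive gene coordinates into a BED file (0-based start, exclusive end) and, if samtools and a BAM file are present, pulls out the overlapping reads with `samtools view -L`. The contig name, coordinates, marker labels, and file names are placeholders for illustration, not values from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical marker coordinates: contig, 1-based start, inclusive end, name.
# Replace these placeholder rows with the real MLST loci from your annotation.
markers='contig1 1001 1500 markerA
contig1 20001 20600 markerB'

# Convert to BED: the start becomes 0-based, the end is unchanged (half-open interval).
coords_to_bed() {
    awk 'BEGIN { OFS = "\t" } NF >= 4 { print $1, $2 - 1, $3, $4 }'
}

printf '%s\n' "$markers" | coords_to_bed > MLST.bed

# Only run samtools when it is installed and the (assumed) BAM file exists.
bam=sample.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # -b writes BAM output; -L keeps reads overlapping the BED intervals.
    samtools view -b -L MLST.bed "$bam" > sample_MLST.bam
fi
```

The contig names in the BED file have to match the names in the BAM header exactly, otherwise no reads are returned for that interval.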


      • #4
        Thank you for your help!


        • #5
          I am trying to generate a consensus sequence using the following command:

          samtools mpileup -uf B31.fna Sample_Bbcap10_006_004_MLST.bam | bcftools view -cg - | /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl vcf2fq > cns.fq

          and struggling with the error:

          Use of uninitialized value in length at /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl line 544, <> line 57.


          Does anyone have any insight on this? I'm just beginning with SAMtools and really appreciate the support.


          • #6
            Have you indexed your reference genome file (B31.fna) with samtools faidx? That is a FASTA-format file, correct?

            Code:
            $ samtools faidx B31.fna
            Last edited by GenoMax; 02-21-2014, 11:44 AM.


            • #7
              Yes, it is a FASTA file, and I did index it with samtools faidx.


              • #8
                It looks to me like the error occurs at the last step. Does "vcfutils.pl" take stdin?


                • #9
                  @ksw9: This may be an obvious question, but was this reference file the one used to generate the indexes and the alignments behind the BAMs? Based on the name, it looks like it may be a 454 sequence file. What aligner did you use?


                  • #10
                    I used BWA for mapping paired-end Illumina reads to the reference genome.


                    • #11
                      Okay, I think my problem was in my original .bed file which now has contigs appropriately labeled.
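A quick way to catch this kind of label mismatch is to compare the contig names in the BED file against those in the reference index. The sketch below assumes a tab-delimited BED file and the `.fai` index written by `samtools faidx`; the function name and the demo files are made up for illustration.

```shell
#!/usr/bin/env bash
# Print every contig name used in a BED file that is absent from a .fai index.
# Usage: bed_contigs_missing_from_fai regions.bed reference.fna.fai
bed_contigs_missing_from_fai() {
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }           # first file: names in the .fai
        /^#/ { next }                               # skip BED comment lines
        !($1 in known) && !seen[$1]++ { print $1 }  # report each unknown name once
    ' "$2" "$1"
}

# Tiny made-up demo files standing in for a real index and BED file.
printf 'contigA\t5000\t9\t60\t61\ncontigB\t3000\t5100\t60\t61\n' > demo.fna.fai
printf 'contigA\t100\t600\tlocus1\nchr_B\t50\t450\tlocus2\n' > demo.bed

# Lists the BED contig names that the reference index does not contain.
bed_contigs_missing_from_fai demo.bed demo.fna.fai
```

Empty output means every BED contig name is present in the index. The names in the BAM header can be inspected in the same spirit with `samtools view -H`.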

                      I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                      samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                      then should I create a new reference sequence including only the loci of interest in order to generate the consensus sequence only for the loci of interest (and not generate the consensus for the entire genome)?

                      Thank you for all of your help!
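If you do go the route of a loci-only reference, one possible way to build it is to convert the BED intervals into samtools region strings and pull those subsequences out of the indexed FASTA with `samtools faidx`. This is only a sketch under stated assumptions: the helper name and demo file are invented, and `B31.fna` and `MLST.bed` are the file names mentioned earlier in the thread. Note that a BAM aligned to the full genome carries full-genome contig names and coordinates, so the reads would have to be re-mapped to any new loci-only reference before calling a consensus against it.

```shell
#!/usr/bin/env bash
# Convert BED intervals (0-based start, exclusive end) into samtools region
# strings (1-based, inclusive): contig:start+1-end
bed_to_regions() {
    awk -F '\t' '!/^#/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$1"
}

# Made-up BED file for demonstration; substitute the real MLST.bed.
printf 'contigA\t100\t600\tlocus1\ncontigA\t2000\t2450\tlocus2\n' > demo_loci.bed
bed_to_regions demo_loci.bed

# Only attempt extraction when samtools and the input files are actually present.
ref=B31.fna
if command -v samtools >/dev/null 2>&1 && [ -f "$ref" ] && [ -f MLST.bed ]; then
    # faidx accepts several regions and writes one FASTA record per region.
    # The command substitution is left unquoted on purpose so that each
    # region string becomes its own argument.
    samtools faidx "$ref" $(bed_to_regions MLST.bed) > MLST_loci.fna
fi
```

An alternative that keeps the full reference is the `-l` option of `samtools mpileup`, which accepts a BED file and limits the pileup to those intervals. How `vcfutils.pl vcf2fq` treats positions outside the intervals is not confirmed anywhere in this thread, so the resulting consensus should be inspected before use.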


                      • #12
                        Originally posted by ksw9
                        Okay, I think my problem was in my original .bed file which now has contigs appropriately labeled.

                        I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                        samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                        then should I create a new reference sequence including only the loci of interest in order to generate the consensus sequence only for the loci of interest (and not generate the consensus for the entire genome)?

                        Thank you for all of your help!
                        Good to know, thanks!
