  • Genes of interest from SAM/BAM files

    Hi,
    I am trying to compare paired-end Illumina data with standard MLST approaches for genotyping bacteria. What is the best way to extract known marker genes (using coordinates from a reference) from SAM/BAM files created by mapping samples to this reference?

    Ultimately, I just want to compare the genotyping capacity of this shotgun data with traditional MLST methods and ensure that we are getting good coverage of the MLST markers of choice. I would like to generate consensus sequences of the markers of interest from the short-read data and am unsure of the best way to do this.

    Thank you!

  • #2
    I am not sure what you want to do, but take a look at bedtools; it may help you.


    • #3
      1. http://seqanswers.com/forums/showthread.php?t=39766

      Assuming that you are referring to extracting reads that map to certain marker genes: samtools view should allow you to pull out reads from specified gene regions.

      2. http://seqanswers.com/forums/showthread.php?t=38969

      Samtools mpileup should generate the consensus sequence.
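As a rough sketch of the region-extraction idea above: samtools view accepts region strings of the form contig:start-end (1-based, inclusive), whereas BED intervals are 0-based with an exclusive end, so the coordinates need shifting by one at the start. The helper name bed_to_regions and the file names in the usage comment are assumptions for illustration, not something given by the posters.

```shell
#!/bin/sh
# Convert BED intervals (0-based start, end-exclusive) into samtools-style
# region strings (1-based, inclusive), one per line.
# bed_to_regions is a hypothetical helper name, not part of samtools.
bed_to_regions() {
    awk '
        /^(#|track|browser)/ || NF < 3 { next }   # skip BED header and blank lines
        { printf "%s:%d-%d\n", $1, $2 + 1, $3 }   # shift the start to 1-based
    ' "$1"
}

# Possible use (assumes the BAM is coordinate-sorted and indexed with
# "samtools index", which region queries require):
#   samtools view -b Sample.bam $(bed_to_regions MLST.bed) > Sample_MLST.bam
```

The contig name in each region string has to match the name in the BAM header exactly, so it is worth checking the first column of the BED file against the reference before running the query.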


      • #4
        Thank you for your help!


        • #5
          I am trying to generate a consensus sequence using the following command:

          samtools mpileup -uf B31.fna Sample_Bbcap10_006_004_MLST.bam | bcftools view -cg - | /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl vcf2fq > cns.fq

          and struggling with the error:

          Use of uninitialized value in length at /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl line 544, <> line 57.


          Does anyone have any insight on this? I'm just beginning with SAMtools and really appreciate the support.


          • #6
            Have you indexed your reference genome file (B31.fna) with samtools faidx? That is a FASTA-format file, correct?

            Code:
            $ samtools faidx B31.fna
            Last edited by GenoMax; 02-21-2014, 11:44 AM.


            • #7
              Yes, it is a fasta and I did index with samtools faidx.


              • #8
                It looks to me like the error occurs at the last step. Does "vcfutils.pl" take stdin?


                • #9
                  @ksw9: This may be an obvious question, but was the same reference file used both to build the aligner indexes and to generate the alignments in the BAMs? Based on the name, it looks like that may be a 454 sequence file. What aligner did you use?


                  • #10
                    I used BWA for mapping paired-end Illumina reads to the reference genome.


                    • #11
                      Okay, I think my problem was in my original .bed file which now has contigs appropriately labeled.

                      I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                      samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                      then should I create a new reference sequence including only the loci of interest in order to generate the consensus sequence only for the loci of interest (and not generate the consensus for the entire genome)?

                      Thank you for all of your help!
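Since the fix reported above was a contig-labeling problem in the .bed file, a small check like the following might catch that kind of mismatch early. It is only a sketch: it assumes the reference was indexed with samtools faidx (so a .fai file with sequence names in column 1 exists), and bed_contigs_not_in_fai is a hypothetical helper name.

```shell
#!/bin/sh
# Print every contig name used in a BED file that is absent from a
# samtools faidx index (.fai). No output means all names match.
# bed_contigs_not_in_fai is a hypothetical helper name.
bed_contigs_not_in_fai() {
    bed=$1
    fai=$2
    awk '
        NR == FNR { known[$1] = 1; next }          # first file: .fai, column 1 is the name
        /^(#|track|browser)/ || NF < 3 { next }    # skip BED header and blank lines
        !($1 in known) && !seen[$1]++ { print $1 } # report each unknown name once
    ' "$fai" "$bed"
}

# Possible use with the files named in this thread:
#   bed_contigs_not_in_fai MLST.bed B31.fna.fai
```

On the question of a reduced reference: one alternative that may be worth trying is to keep the full reference and restrict the pileup itself, since samtools mpileup has an -l option that takes a BED file of regions. That suggestion was not made by anyone in the thread and has not been tested against this data.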


                      • #12
                        Originally posted by ksw9
                        Okay, I think my problem was in my original .bed file which now has contigs appropriately labeled.

                        I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                        samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                        then should I create a new reference sequence including only the loci of interest in order to generate the consensus sequence only for the loci of interest (and not generate the consensus for the entire genome)?

                        Thank you for all of your help!
                        Good to know, thanks!
