
  • Genes of interest from SAM/BAM files

    Hi,
    I am trying to compare paired-end Illumina data with standard MLST approaches for genotyping bacteria. What is the best way to extract known marker genes (using coordinates from a reference) from SAM/BAM files created by mapping samples to this reference?

    Ultimately, I just want to compare the genotyping capacity of this shotgun data with traditional MLST methods and ensure that we are getting good coverage of the MLST markers of choice. I would like to generate consensus sequences from the short-read data for the markers of interest and am unsure of the best way to do this.

    Thank you!
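One way to check marker coverage, sketched under assumptions: `samtools depth` prints tab-separated lines of contig, 1-based position, and depth, and by default omits positions with no coverage. The helper below (the name `mean_depth_per_locus` and the file names are hypothetical) averages that output over each interval of a tab-separated BED file, counting omitted positions as zero.

```shell
#!/usr/bin/env bash
# Sketch: mean depth per BED interval from "samtools depth"-style output.
# Assumes tab-separated BED (0-based start, exclusive end, optional name in
# column 4) and depth lines of the form: contig <TAB> 1-based pos <TAB> depth.
mean_depth_per_locus() {
    local bed=$1 depth=$2
    awk -F '\t' '
        /^#/ { next }                          # skip comment lines in either file
        FILENAME == ARGV[1] {                  # first file: the BED intervals
            if ($3 > $2) {                     # ignore empty or inverted intervals
                n++
                chrom[n] = $1; first[n] = $2 + 1; last[n] = $3
                label[n] = (NF >= 4 ? $4 : $1 ":" first[n] "-" last[n])
            }
            next
        }
        {                                      # second file: per-position depth
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && $2 >= first[i] && $2 <= last[i])
                    sum[i] += $3
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%.1f\n", label[i], sum[i] / (last[i] - first[i] + 1)
        }' "$bed" "$depth"
}

# Hypothetical usage (file names are placeholders):
#   samtools depth sample.bam > sample.depth
#   mean_depth_per_locus MLST.bed sample.depth
```

Dividing by the interval length rather than by the number of reported positions means uncovered bases pull the mean down, which is usually what matters when judging whether a marker can be called.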

  • #2
    I am not sure what you want to do, but look at bedtools. It may help you.



    • #3
      1. http://seqanswers.com/forums/showthread.php?t=39766

      Assuming that you are referring to extracting reads that map to certain marker genes: samtools view should allow you to pull out reads from specified gene regions.

      2. http://seqanswers.com/forums/showthread.php?t=38969

      Samtools mpileup should generate the consensus sequence.
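As a rough illustration of the first suggestion: `samtools view` accepts region strings of the form `contig:start-end` (1-based, inclusive) on a sorted, indexed BAM, whereas BED intervals are 0-based with an exclusive end. The helper below (the name `bed_to_regions` and the file names are hypothetical) converts a tab-separated BED file into such region strings.

```shell
#!/usr/bin/env bash
# Sketch: convert BED intervals (0-based start, exclusive end) into samtools
# region strings (1-based, inclusive), e.g. "chr1 <TAB> 99 <TAB> 200" -> "chr1:100-200".
# Assumes a tab-separated BED file; lines starting with "#" are skipped.
bed_to_regions() {
    awk -F '\t' '!/^#/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$1"
}

# Hypothetical usage (file names are placeholders; the BAM must be
# coordinate-sorted and indexed for region queries to work):
#   samtools index sample.bam
#   samtools view -b sample.bam $(bed_to_regions markers.bed) > markers.bam
```

The unquoted `$(...)` in the usage line is deliberate so that each region becomes a separate argument; this assumes contig names contain no whitespace.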



      • #4
        Thank you for your help!



        • #5
          I am trying to generate a consensus sequence using the following command:

          samtools mpileup -uf B31.fna Sample_Bbcap10_006_004_MLST.bam | bcftools view -cg - | /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl vcf2fq > cns.fq

          and struggling with the error:

          Use of uninitialized value in length at /home/bioinfo/software/samtools/samtools-0.1.19/bcftools/vcfutils.pl line 544, <> line 57.


          Does anyone have any insight on this? I'm just beginning with SAMtools and really appreciate the support.



          • #6
            Have you indexed your reference genome file (B31.fna) with samtools faidx? That is a FASTA-format file, correct?

            Code:
            $ samtools faidx B31.fna
            Last edited by GenoMax; 02-21-2014, 11:44 AM.



            • #7
              Yes, it is a FASTA file and I did index it with samtools faidx.



              • #8
                It looks to me like the error occurs at the last step. Does "vcfutils.pl" take stdin?



                • #9
                  @ksw9: This may be an obvious question, but was the same reference file used both to generate the indexes and to produce the alignments in the BAMs? Based on the name, it looks like that may be a 454 sequence file. What aligner did you use?
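One quick way to test whether a BAM and a reference agree, as a sketch only: compare the contig names in the BAM header (`@SQ` lines, `SN:` field) with the first column of the `.fai` index written by `samtools faidx`. The helper name `missing_contigs` and the file `header.sam` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list contig names that appear in a SAM/BAM header but not in the
# reference .fai index. Assumes the header was saved as text beforehand.
missing_contigs() {
    local fai=$1 header=$2
    awk -F '\t' '
        FILENAME == ARGV[1] { ref[$1] = 1; next }   # .fai: column 1 is the contig name
        $1 == "@SQ" {                               # header: one @SQ line per contig
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)
                    if (!(name in ref)) print name
                }
        }' "$fai" "$header"
}

# Hypothetical usage:
#   samtools faidx B31.fna                      # writes B31.fna.fai
#   samtools view -H sample.bam > header.sam    # header only, as text
#   missing_contigs B31.fna.fai header.sam      # no output means all names match
```

Matching names do not prove the sequences are identical, but a mismatch here is a strong hint that the BAM was aligned against a different file.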



                  • #10
                    I used BWA for mapping paired-end Illumina reads to the reference genome.



                    • #11
                      Okay, I think my problem was in my original .bed file which now has contigs appropriately labeled.

                      I guess my question is this: if I have extracted reads mapping only to loci of interest using:
                      samtools view -h -L MLST.bed Sample_006_004.bam > Sample_006_004_MLST.sam

                      then should I create a new reference sequence including only the loci of interest in order to generate the consensus sequence only for the loci of interest (and not generate the consensus for the entire genome)?

                      Thank you for all of your help!
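The BED labelling problem mentioned at the top of this post can be caught early with a check along these lines. It is only a sketch: it assumes a tab-separated BED file and a BAM header saved as text, and the helper name `bed_contigs_not_in_header` and the file `header.sam` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print each contig name used in a BED file that has no matching
# @SQ SN: entry in a SAM/BAM header. Arguments: BED file, header text file.
bed_contigs_not_in_header() {
    local bed=$1 header=$2
    awk -F '\t' '
        FILENAME == ARGV[1] {                       # header is read first
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) known[substr($i, 4)] = 1
            next
        }
        # BED lines: report every unknown contig name once
        !/^#/ && NF >= 3 && !($1 in known) && !seen[$1]++ { print $1 }
    ' "$header" "$bed"
}

# Hypothetical usage:
#   samtools view -H Sample_006_004.bam > header.sam
#   bed_contigs_not_in_header MLST.bed header.sam   # no output means labels match
```

Running this before `samtools view -L` gives a direct list of mislabelled contigs instead of a silently empty or partial extraction.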



                      • #12
                        Good to know, thanks!
