Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • mixter
    Member
    • May 2010
    • 22

    Consensus FASTA from BAM files

    Hi,

    I'm currently trying to get a consensus sequence that is specific to individual humans, from the 1000 genomes project data available in BAM format at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/

    I want to do multiple sequence alignments of certain loci between individuals, so I need the individual sequence in FASTA format. Currently, I've tried to do samtools pileup -c, followed by conversion to fastq (samtools) and then to fasta (maq tools):

    samtools pileup -c -f /gcm/opt/genomes/hg18-chromosomes.fa ${file}.bam > ${file}.pileup
    /opt/samtools/samtools-0.1.7a/misc/samtools.pl pileup2fq ${file}.pileup > ${file}.fastq
    /opt/maq/maq-0.7.1/bin/fq_all2std.pl fq2fa ${file}.fastq > ${file}.fasta

    The FASTQ generated from consensus pileup seems ok, although the whole sequence is on a single line. Using MAQ's fq2fa, however, this is converted into a much smaller FASTA file, with quality score data instead of sequence in there.

    So, any ideas what I may be doing wrong or alternative tool suggestions to accomplish this task, in case the converters are buggy, would be much appreciated.

    Thanks!
  • DZhang
    Senior Member
    • Jun 2010
    • 177

    #2
    Hi,

    Take a look at this post, which may be helpful to your situation:



    Douglas

    Comment

    • bioyulj
      Junior Member
      • May 2010
      • 1

      #3
      good! thanks very much!

      Comment

      • vyellapa
        Member
        • Oct 2011
        • 59

        #4
        I have complete genomics data which I converted to BAM using CGA tools. However the fastq conversion from .bam is giving an error "too many .bam files". This I believe is due to large size of the bam(~660 GB). The bam is large mainly due to numerous N's in the file(complete genomics data). Can these N's be cropped to make the file smaller for execution by samtools?

        Comment

        • Gabeloooooo
          Junior Member
          • Nov 2012
          • 8

          #5
          My understanding was to use :
          samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

          With the reference from 1000 genomes being hs37d5.fa and the BAM file for the specific patient downloaded.

          Does that command get you a consensus sequence though (once you convert fq to fa)?

          And is there a way to just get a consensus chromosome (example chr4) instead of processing the whole thing?

          Comment

          • Gabeloooooo
            Junior Member
            • Nov 2012
            • 8

            #6
            FQ to FA?

            Okay, this command indeed seems to produce a decent FQ file:

            samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

            I keep reading conflicting posts (fastx, awk, fastq2fasta.pl,etc.) on how to convert that FQ output to FASTA (fa file)

            My question
            ::::
            Given my source is from 1000 genomes, what would be the most reliable way of changing that FQ file to FA?
            ::::

            Keep in mind, my goal is to get a consensus chromosome.

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Originally posted by Gabeloooooo View Post

              My question
              ::::
              Given my source is from 1000 genomes, what would be the most reliable way of changing that FQ file to FA?
              ::::

              Keep in mind, my goal is to get a consensus chromosome.
              flexlex posted this tool from Heng Li in a recent thread that can do the conversion you are looking for.



              Look at the examples included on that page.

              Comment

              • Gabeloooooo
                Junior Member
                • Nov 2012
                • 8

                #8
                Thank you! Will try it out and report back on success/failure

                Comment

                • Gabeloooooo
                  Junior Member
                  • Nov 2012
                  • 8

                  #9
                  Conversion seems to work fine. Well, it gives no errors and data 'looks' good

                  Is there a way to get the consensus sequence aligned with the reference?

                  Say the reference FASTA file has 90,354,753 bp and the ouput from
                  'samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq'
                  gives a file with 243,189,260 bp

                  My question
                  ::::
                  Is there a process where I could end up with a consensus sequence that is also 90,354,753 bp (same as reference)?
                  ::::


                  For this particular case, I don't care if getting this result discards a lot of info about the patient.

                  Comment

                  • binlangman
                    Member
                    • Dec 2013
                    • 11

                    #10
                    How to generate consensus sequence using BAM file?

                    Hi! I have a BAM file, which is generated by using BWA, and I'd like to generate a consensus sequence or a set of contigs. How can I generate a consensus sequence from BAM file? Which tools can do it?
                    Thanks!

                    Comment

                    • swbarnes2
                      Senior Member
                      • May 2008
                      • 910

                      #11
                      I have found that getting a fastq out of vcfutils is pretty much never what I want. So I altered the program so it will output fasta instead. Happily, it's a perl program, which means that it is a plain ordinary text file, so it's easy to modify.

                      You are looking for these lines

                      Code:
                        print "\@$chr\n"; &v2q_print_str($seq);
                        print "+\n"; &v2q_print_str($qual)
                      Change them to this

                      Code:
                        print "\[COLOR="Red"]>[/COLOR]$chr\n"; &v2q_print_str($seq);
                        [COLOR="Red"]#[/COLOR]print "+\n"; &v2q_print_str($qual)
                      You are changing the @ of fastq format to the > of fasta format, and the # means "Skip this line". That's the line where the script would print the quality scores.

                      Comment

                      • binlangman
                        Member
                        • Dec 2013
                        • 11

                        #12
                        Dear swbarnes2,
                        Can the program you said generate consensus sequence file from alignment file(BAM file)? And how can I get the perl script? Can you send it to me via e-mail? My e-mail adress is [email protected].I'm looking forward your reply as soon as possible. Thanks very much!

                        Comment

                        • GenoMax
                          Senior Member
                          • Feb 2008
                          • 7142

                          #13
                          Originally posted by binlangman View Post
                          Dear swbarnes2,
                          Can the program you said generate consensus sequence file from alignment file(BAM file)? And how can I get the perl script? Can you send it to me via e-mail? My e-mail adress is [email protected].I'm looking forward your reply as soon as possible. Thanks very much!
                          @swbarnes2 was referring to editing the "samtools-0.1.19/bin/vcfutils.pl" script (that you will find in the samtools package). The lines quoted are almost at the end of that script file.

                          Comment

                          • binlangman
                            Member
                            • Dec 2013
                            • 11

                            #14
                            generate consensus from a BAM file

                            I run the following command in oder to generate consensus from a BAM file:
                            samtools mpileup -uf NC_010473.fasta sample.sorted.bam | bcftools view -cg - | vcfutils.pl vcf2fq > consensus.fq
                            Why is the output not a fastq file? And the output looks strange. The format of output is as follows:
                            @NC_010473
                            AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
                            TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
                            TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
                            ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
                            AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
                            CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
                            ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
                            ...............................................................
                            Why does the result look so strange?

                            Comment

                            • swbarnes2
                              Senior Member
                              • May 2008
                              • 910

                              #15
                              How is that not a fastq file? It's just multiline. The first line is the name, beginning with "@". Then follows the DNA sequence. Then you'll see a +, maybe the name will repeat, and then you'll get the quality string.

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, Today, 10:09 AM
                              0 responses
                              8 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, Yesterday, 08:59 AM
                              0 responses
                              14 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              22 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 11:40 AM
                              0 responses
                              19 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...