
**DZhang** · 05-22-2011, 05:56 AM

Hi,

Take a look at this post, which may be helpful to your situation:

Why MAQ consensus seq better than SAMtools consensus ?? - SEQanswers

http://seqanswers.com/forums/showthread.php?t=8910


Douglas

https://www.contigexpress.com

**bioyulj** · 05-22-2011, 06:17 AM

good! thanks very much!

**vyellapa** · 10-12-2011, 08:39 AM

I have Complete Genomics data that I converted to BAM using CGA Tools. However, converting the BAM to FASTQ fails with the error "too many .bam files". I believe this is due to the large size of the BAM (~660 GB), which is mainly caused by the numerous N's in the file (Complete Genomics data). Can these N's be cropped to make the file small enough for samtools to process?
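Suppose the reads are first converted to a plain four-line FASTQ. One way to crop N's is then to trim runs of N from both ends of each read and trim the quality string to match. Below is a minimal Python sketch under that assumption (one sequence line and one quality line per record). The function names are hypothetical. The sketch does not address the "too many .bam files" error itself, and it leaves N's in the middle of a read alone.

```python
from typing import Iterable, Iterator, Tuple


def trim_terminal_ns(seq: str, qual: str) -> Tuple[str, str]:
    """Strip runs of N/n from both ends of a read, trimming the quality string to match."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    while start < end and seq[start] in "Nn":
        start += 1
    while end > start and seq[end - 1] in "Nn":
        end -= 1
    return seq[start:end], qual[start:end]


def trim_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield trimmed records from four-line FASTQ (assumes no line wrapping)."""
    it = iter(lines)
    # zip(it, it, it, it) groups consecutive lines into (header, seq, plus, qual)
    for header, seq, plus, qual in zip(it, it, it, it):
        new_seq, new_qual = trim_terminal_ns(seq.rstrip("\r\n"), qual.rstrip("\r\n"))
        if new_seq:  # drop reads that were entirely N
            yield f"{header.rstrip()}\n{new_seq}\n{plus.rstrip()}\n{new_qual}\n"
```

For example, `"".join(trim_fastq(open("reads.fq")))` would return the trimmed file contents ("reads.fq" is a placeholder name).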

**Gabeloooooo** · 11-13-2012, 12:49 PM

My understanding was to use:
samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

Here the reference is hs37d5.fa from the 1000 Genomes Project, and the BAM file is the one downloaded for the specific patient.

Does that command get you a consensus sequence though (once you convert fq to fa)?

And is there a way to get just a consensus chromosome (for example, chr4) instead of processing the whole thing?
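One option is to restrict the pileup upstream with samtools mpileup's `-r` region option, which needs an indexed BAM. Note that hs37d5 names chromosomes 1, 2, ..., X, Y, MT without a "chr" prefix, so chromosome 4 would be `-r 4`. If you already have a whole-genome cns.fq, you can instead pull out the record for one chromosome. vcf2fq wraps both sequence and quality over many lines. The minimal Python sketch below (hypothetical function names) therefore collects the quality string by length rather than by looking for the next "@", because "@" is also a legal quality character.

```python
from typing import Iterable, Iterator, Tuple


def read_multiline_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from FASTQ whose records may span many lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header[:30]!r}")
        fields = header[1:].split()
        name = fields[0] if fields else ""
        seq_parts = []
        for line in it:
            if line.startswith("+"):  # separator ends the sequence block
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Read quality lines until they cover the sequence; a quality line may start with '@'.
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError(f"truncated quality string for {name!r}")
            qual_parts.append(line)
            qual_len += len(line)
        yield name, seq, "".join(qual_parts)


def extract_record(lines: Iterable[str], wanted: str) -> Tuple[str, str]:
    """Return (sequence, quality) of the record named `wanted`."""
    for name, seq, qual in read_multiline_fastq(lines):
        if name == wanted:
            return seq, qual
    raise KeyError(wanted)
```

Usage would look like `with open("cns.fq") as fh: seq, qual = extract_record(fh, "4")`.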

**Gabeloooooo** · 11-14-2012, 06:49 AM

FQ to FA?

Okay, this command indeed seems to produce a decent FQ file:

samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

I keep reading conflicting posts (fastx, awk, fastq2fasta.pl, etc.) on how to convert that FQ output to FASTA (.fa file).

My question
::::
Given my source is from 1000 genomes, what would be the most reliable way of changing that FQ file to FA?
::::

Keep in mind, my goal is to get a consensus chromosome.
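Whichever tool you pick, note that vcf2fq output is multi-line FASTQ: the sequence and quality are wrapped over many lines. One-liners that assume exactly four lines per record will therefore mangle it. The minimal Python sketch below (hypothetical function name) shows what a correct conversion has to do. It keeps each header and sequence line and skips the quality block by counting characters.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert (possibly multi-line) FASTQ to FASTA lines, keeping the original wrapping."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header[:30]!r}")
        yield ">" + header[1:]
        seq_len = 0
        for line in it:
            if line.startswith("+"):  # separator ends the sequence block
                break
            seq_len += len(line)
            yield line
        # Skip quality lines by length: a quality line may itself start with '@' or '+'.
        qual_len = 0
        while qual_len < seq_len:
            line = next(it, None)
            if line is None:
                raise ValueError("truncated quality string")
            qual_len += len(line)
```

Writing the result is then `"\n".join(fastq_to_fasta(fh)) + "\n"` for an open cns.fq handle `fh`.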

**GenoMax** · 11-14-2012, 07:16 AM

Originally posted by Gabeloooooo:

My question
::::
Given my source is from 1000 genomes, what would be the most reliable way of changing that FQ file to FA?
::::

Keep in mind, my goal is to get a consensus chromosome.

In a recent thread, flexlex posted this tool from Heng Li, which can do the conversion you are looking for:

GitHub - lh3/seqtk: Toolkit for processing sequences in FASTA/Q formats

https://github.com/lh3/seqtk


Look at the examples included on that page.
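The seqtk README shows FASTQ-to-FASTA conversion with `seqtk seq -a`. It also shows masking of bases below a quality threshold with `-q`: masked bases become lowercase, or N when combined with `-n N`. The minimal Python sketch below illustrates the masking idea for a consensus; it is not seqtk's code. Bases whose quality is below a threshold become N, and the record is then written as FASTA wrapped at 60 columns. The Phred+33 encoding, the threshold of 20 and the function names are assumptions.

```python
from typing import List


def mask_low_quality(seq: str, qual: str, min_q: int = 20, offset: int = 33, mask: str = "N") -> str:
    """Replace bases whose Phred quality (ASCII code - offset) is below min_q with `mask`."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return "".join(base if ord(q) - offset >= min_q else mask for base, q in zip(seq, qual))


def wrap(seq: str, width: int = 60) -> List[str]:
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def to_masked_fasta(name: str, seq: str, qual: str, min_q: int = 20, width: int = 60) -> str:
    """Return one FASTA record with low-quality bases masked to N."""
    masked = mask_low_quality(seq, qual, min_q)
    return "\n".join([">" + name] + wrap(masked, width)) + "\n"
```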

**Gabeloooooo** · 11-14-2012, 07:17 AM

Thank you! Will try it out and report back on success/failure

**Gabeloooooo** · 11-14-2012, 12:20 PM

Conversion seems to work fine. Well, it gives no errors and the data 'looks' good.

Is there a way to get the consensus sequence aligned with the reference?

Say the reference FASTA file has 90,354,753 bp, and the output from
'samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq'
gives a file with 243,189,260 bp.

My question
::::
Is there a process where I could end up with a consensus sequence that is also 90,354,753 bp (same as reference)?
::::

For this particular case, I don't care if getting this result discards a lot of info about the patient.
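Since losing information is acceptable here, one way to get a consensus with exactly the reference length is to start from the reference and substitute only single-base variants. Insertions and deletions are skipped, so every coordinate stays put. The minimal Python sketch below assumes the VCF text that `bcftools view -cg` writes (the input vcfutils.pl reads) has been saved. It also assumes the reference is loaded into a dict of name to sequence. It applies the first ALT allele regardless of genotype or quality, which a real analysis would check. The function name is hypothetical. Newer bcftools releases also provide a `bcftools consensus` command for applying a VCF to a reference.

```python
from typing import Dict, Iterable


def apply_snvs(reference: Dict[str, str], vcf_lines: Iterable[str]) -> Dict[str, str]:
    """Substitute single-base ALT alleles into the reference; sequence lengths never change."""
    seqs = {name: list(seq) for name, seq in reference.items()}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        first_alt = alt.split(",")[0].upper()
        # Skip indels, reference-only sites (ALT '.') and anything that is not a plain base.
        if chrom not in seqs or len(ref) != 1 or first_alt not in ("A", "C", "G", "T"):
            continue
        i = pos - 1  # VCF POS is 1-based
        if seqs[chrom][i].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        seqs[chrom][i] = first_alt
    return {name: "".join(seq) for name, seq in seqs.items()}
```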

**binlangman** · 05-05-2014, 09:57 PM

How to generate consensus sequence using BAM file?

Hi! I have a BAM file generated with BWA, and I'd like to generate a consensus sequence or a set of contigs from it. How can I generate a consensus sequence from a BAM file, and which tools can do it?
Thanks!
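Besides the mpileup | bcftools | vcf2fq pipeline discussed in this thread, a naive majority vote over a pileup can illustrate the idea of a consensus. The minimal Python sketch below assumes text output from `samtools mpileup -f ref.fa aln.bam` (without -u) for a single contig whose length you know. It ignores base qualities and indels and breaks ties arbitrarily, so it is only an illustration, not what bcftools does. Positions with no or low coverage stay N. The function names and the min_depth default are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

INDEL = re.compile(r"[+-](\d+)")  # e.g. '+2AC' or '-1T': sign, length, then that many bases


def count_bases(ref_base: str, bases: str) -> Counter:
    """Count A/C/G/T calls in one mpileup read-bases column."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # read start: skip '^' and the mapping-quality character after it
        elif c == "$":
            i += 1  # read end marker
        elif c in "+-":
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1))  # skip the indel length and its bases
        else:
            if c in ".,":
                counts[ref_base.upper()] += 1  # matches the reference on either strand
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1
            # '*', '#', '>', '<' and 'N' are not counted
            i += 1
    return counts


def majority_consensus(pileup_lines: Iterable[str], contig_length: int, min_depth: int = 3) -> str:
    """Build a single-contig consensus; uncovered or low-depth positions stay 'N'."""
    consensus = ["N"] * contig_length
    for line in pileup_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 5:
            continue
        pos, ref_base, bases = int(fields[1]), fields[2], fields[4]
        counts = count_bases(ref_base, bases)
        if sum(counts.values()) >= min_depth:
            consensus[pos - 1] = counts.most_common(1)[0][0]
    return "".join(consensus)
```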

**swbarnes2** · 05-06-2014, 11:09 AM

I have found that getting a fastq out of vcfutils is pretty much never what I want. So I altered the program so it will output fasta instead. Happily, it's a perl program, which means that it is a plain ordinary text file, so it's easy to modify.

You are looking for these lines

Code:

  print "\@$chr\n"; &v2q_print_str($seq);
  print "+\n"; &v2q_print_str($qual)

Change them to this

Code:

  print ">$chr\n"; &v2q_print_str($seq);
  #print "+\n"; &v2q_print_str($qual)

You are changing the @ of FASTQ format to the > of FASTA format. The # turns the second line into a comment, so Perl skips it. That's the line where the script would print the "+" separator and the quality scores.

**binlangman** · 05-06-2014, 04:39 PM

Dear swbarnes2,
Can the program you mentioned generate a consensus sequence file from an alignment file (BAM file)? And how can I get the Perl script? Can you send it to me via e-mail? My e-mail address is [email protected]. I'm looking forward to your reply as soon as possible. Thanks very much!

**GenoMax** · 05-07-2014, 07:31 AM

Originally posted by binlangman:

Dear swbarnes2,
Can the program you mentioned generate a consensus sequence file from an alignment file (BAM file)? And how can I get the Perl script? Can you send it to me via e-mail? My e-mail address is [email protected]. I'm looking forward to your reply as soon as possible. Thanks very much!

@swbarnes2 was referring to editing the "samtools-0.1.19/bin/vcfutils.pl" script (which you will find in the samtools package). The quoted lines are near the end of that script.

**binlangman** · 05-18-2014, 10:33 PM

generate consensus from a BAM file

I ran the following command in order to generate a consensus from a BAM file:
samtools mpileup -uf NC_010473.fasta sample.sorted.bam | bcftools view -cg - | vcfutils.pl vcf2fq > consensus.fq
Why is the output not a FASTQ file? The output looks strange; it is formatted as follows:
@NC_010473
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTC
TGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGG
TCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTAC
ACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGT
AACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGG
CTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGT
ACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
...............................................................
Why does the result look so strange?

**swbarnes2** · 05-19-2014, 09:07 AM

How is that not a FASTQ file? It's just multi-line. The first line is the name, beginning with "@". The DNA sequence follows, wrapped over several lines. Then you'll see a "+" (the name may be repeated after it), and then you'll get the quality string.
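The layout described above is the '@' name line, the sequence wrapped over several lines, a '+' line, and the quality string wrapped the same way. If a downstream tool expects the common four-line layout, the records can be unwrapped. The minimal Python sketch below (hypothetical function name) reads quality lines until they match the sequence length, since a quality line can itself begin with '@' or '+'. It writes a bare '+' separator.

```python
from typing import Iterable, Iterator


def to_four_line_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite multi-line FASTQ records as standard four-line records."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header[:30]!r}")
        seq_parts = []
        for line in it:
            if line.startswith("+"):  # separator ends the sequence block
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Collect quality lines by length, not by looking for the next '@'.
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError(f"truncated quality string for {header!r}")
            qual_parts.append(line)
            qual_len += len(line)
        qual = "".join(qual_parts)
        if len(qual) != len(seq):
            raise ValueError(f"quality length differs from sequence length for {header!r}")
        yield f"{header}\n{seq}\n+\n{qual}\n"
```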


Consensus FASTA from BAM files
