Does anyone have a suggested best practice utility for this?
Stand Alone Bam to FASTQ
-
What are you trying to do?
Do you want to pull out the reads as FASTQ records?
Do you care about the strand used for reads which mapped to the reverse strand?
Do you care about how paired end reads are named?
You could try seqret from EMBOSS 6.3.0,
-
That is no problem. This is also a point that confused me, and it still does.
In my experience, the quality values (QVs) in the SAM files from e.g. bowtie differ from the ones in the original file. I think the values you get after the alignment are the QVs from the alignment and not the ones from the original file.
-
Originally posted by dcfargo:
I do care about recovery of all of the information.
I'd like to essentially recover all the initial text information that went into making the BAM file.
e.g. Suppose you had some paired FASTQ reads like this:
Code:
@SRR001666.1/1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
@SRR001666.1/2 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
...
If, on converting SAM/BAM back to FASTQ, you specify suffixes of /1 and /2, the best you can hope to recover is:
Code:
@SRR001666.1/1
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
@SRR001666.1/2
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
...
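As a rough illustration of how the /1 and /2 suffixes could be rebuilt, here is a minimal Java sketch that derives them from the SAM FLAG field. The bit values 0x40 (first in pair) and 0x80 (second in pair) come from the SAM specification; the class and method names are hypothetical and are not part of any tool mentioned in this thread.

```java
// Hypothetical helper: rebuilds a FASTQ read name from a SAM read name and FLAG.
final class PairSuffix {
    // FLAG bits as defined in the SAM specification.
    static final int FIRST_IN_PAIR = 0x40;
    static final int SECOND_IN_PAIR = 0x80;

    /** Returns the read name with /1 or /2 appended according to the FLAG. */
    static String fastqName(String readName, int flag) {
        if ((flag & FIRST_IN_PAIR) != 0) {
            return readName + "/1";
        }
        if ((flag & SECOND_IN_PAIR) != 0) {
            return readName + "/2";
        }
        return readName; // neither bit set: treat as an unpaired read
    }

    public static void main(String[] args) {
        // 99 and 147 are typical FLAG values for a properly paired read and its mate.
        System.out.println("@" + fastqName("SRR001666.1", 99));
        System.out.println("@" + fastqName("SRR001666.1", 147));
    }
}
```

Note that only the name and suffix come back this way; the rest of the original header line (the description and length=36) is not normally stored in the SAM/BAM record.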
-
Well, you don't have to make it that complicated. There are two libraries you can use, and then you have your converter. I, for example, prefer Java and use BioJava to read/write FASTQ (http://www.biojava.org/wiki/BioJava:Download_1.7.1) and the Java samtools API distributed with Picard (http://sourceforge.net/projects/picard/files/) to read BAM/SAM files (the same API handles both).
Then you only have to transform a SAMRecord into a FastqBuilder:
Code:
import net.sf.samtools.SAMRecord;
import org.biojava.bio.program.fastq.FastqBuilder;

// Copy the read name, base qualities and bases into a FASTQ builder.
public FastqBuilder convert(SAMRecord element2) {
    FastqBuilder builder = new FastqBuilder();
    builder.withDescription(element2.getReadName());
    builder.withQuality(element2.getBaseQualityString());
    builder.withSequence(element2.getReadString());
    return builder; // call builder.build() to obtain the Fastq record
}
That's the easiest way.
Good luck.
-
Originally posted by dcfargo:
Thanks so much.
Given that some information may be lost, and we'll just have to accept that, would the best model for conversion be two steps, such as:
1) SAMtools for BAM -> SAM
2) followed by a home made script for SAM -> FASTQ
As mentioned above, EMBOSS 6.3.x can do SAM/BAM direct to FASTQ, although it may not do exactly what you want it to do.
You could also write a script to go from BAM to FASTQ, for example using pysam to access the samtools C API from Python.
Personally I've been doing SAM/BAM to FASTQ conversion in Biopython (to recover reads to redo a mapping), but this is with an experimental branch and is not ready for general use.
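For the home-made SAM -> FASTQ step, a minimal sketch in plain Java might look like the following. It assumes tab-separated SAM text as defined in the SAM specification (QNAME in column 1, FLAG in column 2, SEQ in column 10, QUAL in column 11). The class name is hypothetical, and the sketch deliberately leaves out strand handling and pair suffixes, so it is a starting point rather than a complete converter.

```java
// Hypothetical converter: turns one line of SAM text into a four-line FASTQ record.
final class SamToFastq {
    // FLAG bits from the SAM specification; these records repeat a read already seen.
    static final int SECONDARY = 0x100;
    static final int SUPPLEMENTARY = 0x800;

    /** Returns a FASTQ record, or null if the line is a header or should be skipped. */
    static String toFastq(String samLine) {
        if (samLine.isEmpty() || samLine.startsWith("@")) {
            return null; // header lines start with '@'
        }
        String[] fields = samLine.split("\t");
        if (fields.length < 11) {
            return null; // not a complete alignment line
        }
        int flag = Integer.parseInt(fields[1]);
        if ((flag & (SECONDARY | SUPPLEMENTARY)) != 0) {
            return null; // avoid writing the same read more than once
        }
        String name = fields[0];
        String seq = fields[9];
        String qual = fields[10];
        return "@" + name + "\n" + seq + "\n+\n" + qual + "\n";
    }

    public static void main(String[] args) {
        // Made-up example line, for illustration only.
        String line = "SRR001666.1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tAAGTTACC\tIIII>II/";
        System.out.print(toFastq(line));
    }
}
```

In practice, toFastq would be called on each line of the SAM text produced in step 1, and the non-null results written to the output file.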
-
Originally posted by Martin R:
Well, you don't have to make it that complicated. There are two libraries you can use, and then you have your converter. I, for example, prefer Java and use BioJava to read/write FASTQ (http://www.biojava.org/wiki/BioJava:Download_1.7.1) and the Java samtools API distributed with Picard (http://sourceforge.net/projects/picard/files/) to read BAM/SAM files (the same API handles both).
Also I would reverse complement any reads mapped to the reverse strand to recover them in their original orientation pre-mapping.
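A minimal sketch of that step in plain Java, assuming the strand is read from bit 0x10 of the SAM FLAG (as defined in the SAM specification). The class and method names are hypothetical. The quality string has to be reversed along with the sequence so that each quality value stays attached to its base.

```java
// Hypothetical helper: restores a read to its original, pre-mapping orientation.
final class ReverseStrand {
    // FLAG bit from the SAM specification: read mapped to the reverse strand.
    static final int REVERSE_STRAND = 0x10;

    /** Complements a single base; N and any other symbol are returned unchanged. */
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            case 'a': return 't';
            case 'c': return 'g';
            case 'g': return 'c';
            case 't': return 'a';
            default: return base;
        }
    }

    /** Reverse complements a nucleotide sequence. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            sb.append(complement(seq.charAt(i)));
        }
        return sb.toString();
    }

    /** Returns {sequence, quality} in the original read orientation. */
    static String[] originalOrientation(String seq, String qual, int flag) {
        if ((flag & REVERSE_STRAND) == 0) {
            return new String[] {seq, qual}; // forward strand: nothing to undo
        }
        String reversedQual = new StringBuilder(qual).reverse().toString();
        return new String[] {reverseComplement(seq), reversedQual};
    }

    public static void main(String[] args) {
        String[] read = originalOrientation("AAGTTACC", "IIII>II/", 16);
        System.out.println(read[0]);
        System.out.println(read[1]);
    }
}
```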
-
Originally posted by maubp:
As mentioned above, EMBOSS 6.3.x can do SAM/BAM direct to FASTQ, although it may not do exactly what you want it to do.
This should be resolved in the next patch or point release, though.
Peter
-
For the sake of completeness, I will just mention that you can also achieve this with my Genozip program:
Code:
genozip file.bam                              # compresses the BAM file
genocat file.bam.genozip --output file.fq.gz  # converts it to FASTQ
See documentation here: https://genozip.com/sam2fq.html
Paper here: https://www.researchgate.net/publica...ata_Compressor