**nilshomer** · 05-07-2010, 01:33 AM

Originally posted by kasutubh

Hello Everyone,

Sorry if this is a re-post, but is there any way to convert the data in SOLiD .bam files into basespace format? We are trying to use the IMAGE algorithm (http://genomebiology.com/2010/11/4/R41), which needs the files to be in the fastq format.

Any help is hugely appreciated!

Thanks in advance,

Kaustubh Gokhale.

What program did you use to generate the BAM file? The SEQ/QUAL fields should be in basespace, with the original colors/color-qualities optionally in the CS/CQ tags.
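One way to check what a given BAM actually contains is to count how many records are unmapped and how many carry both the CS and CQ tags. The sketch below assumes the input is SAM text lines as printed by `samtools view`; the function name `summarise_sam` and the counter keys are made up for illustration.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped


def summarise_sam(lines):
    """Count records, unmapped records, and records with both CS and CQ tags."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        tag_names = {field.split(":", 1)[0] for field in fields[11:]}
        counts["records"] += 1
        if int(fields[1]) & UNMAPPED:
            counts["unmapped"] += 1
        if {"CS", "CQ"} <= tag_names:
            counts["with_colour_tags"] += 1
    return counts
```

If `with_colour_tags` is zero, the original colours were not kept in the file and cannot be recovered from it.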

**kasutubh** · 05-07-2010, 01:39 AM

These files were sent to me by the ABI guys. We had asked them to align the sequences to a reference, and they sent these files as the output. What is the raw data format of SOLiD? I need files in the fastq format.

**westerman** · 05-07-2010, 09:40 AM

The 'raw data' format from SOLiD is the color-space reads that look like FastA files. But often the core center people will do more processing in order to map the reads to the reference, do SNP calls, transcriptomes, etc. All of these subsequent steps will generate different types of files -- FastA-like, GFF, SAM, etc. No FastQ though.

**ambarrio** · 07-01-2010, 06:16 AM

I think you could use this software (if you don't have the time to develop your own), because it seems to do the task you are looking for. I haven't found where to download it, though (and I am interested as well). Maybe you have to email them personally.

BamUtil: bam2FastQ - Genome Analysis Wiki

http://genome.sph.umich.edu/wiki/Bam2FastQ

**Brugger** · 07-02-2010, 05:11 AM

Firstly, you have to make sure that the unmapped reads exist in your BAM file; if they do, they will be in colour space, as written above.

Secondly, doing a colourspace --> basespace transformation will push the colour space technology considerably. I recently did a raw transformation from colourspace to basespace of 10000 reads. I know that these reads map to the reference genome in colourspace. I then tried to redo the alignment using blat, and only about 30% gave considerable hits.

Thirdly, as the qualities are per colour and not per base, you have to transform the colour qualities into base qualities. This is important, especially if you clip low-quality bases off. As I understand it, to get a base QV you should add the two colour QVs surrounding a base.
The data was from a SOLiD 3 run, so if your run was done using SOLiD 4 you should get better results.

I am sure that there are other things you should consider before spending considerable time on this.

I know that Curtain uses a similar approach for gap closing, and that works, as I understand it, in colourspace without any great hacks.
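The raw transformation and the quality rule from the second and third points above can be sketched as follows. This is a minimal illustration, not the code used for the test described in the post. It assumes the standard SOLiD two-base encoding (each colour is the XOR of the 2-bit codes of two adjacent bases) and a colour string that starts with the primer base, as in a CS tag. The function names, and the handling of the last base in `base_qualities`, are assumptions. Because every base is decoded from the previous one, a single wrong colour changes all bases after it, which may be part of why so few of the converted reads aligned.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def decode_colours(cs):
    """Translate a colour-space read (primer base + colours) into bases."""
    prev = BASES.index(cs[0])  # leading primer base, not part of the read
    out = []
    for i, colour in enumerate(cs[1:]):
        if colour not in "0123":
            # Unknown colour ('.'): this base and every later one is lost.
            out.append("N" * (len(cs) - 1 - i))
            break
        prev ^= int(colour)  # next base code = previous base code XOR colour
        out.append(BASES[prev])
    return "".join(out)


def base_qualities(colour_quals):
    """Base QV = sum of the two colour QVs surrounding that base.

    Assumption: the last base has only one adjacent colour, so it keeps
    that colour's QV unchanged.
    """
    quals = list(colour_quals)
    return [q + (quals[i + 1] if i + 1 < len(quals) else 0)
            for i, q in enumerate(quals)]
```

For example, `decode_colours("T0120")` gives `"TGAA"`, and `base_qualities([10, 20, 30, 40])` gives `[30, 50, 70, 40]`.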

**lh3** · 07-02-2010, 05:48 AM

The abstract of IMAGE says it is "a practical approach that uses *Illumina* sequences". No, it does not work with SOLiD, unless they have updated the software since the publication. The base sequence is derived after the alignment. But for unmapped reads, you do not have base sequences.

**snetmcom** · 07-09-2010, 01:50 PM

I am 95% sure this person's BAM file is in basespace already. None of the AB tools output BAM until after mapping.

I have never converted BAM to fastq, but I imagine there is something in samtools.

**Brugger** · 07-10-2010, 04:47 AM

The whole idea of IMAGE is that it uses read pairs where only one read is mapped, facing towards a gap. The other ends are then assembled and joined, if possible, with the end of the contig that the ends map to. As this read is unmapped it will only exist in colourspace and *not* basespace.

As Heng correctly points out, the abstract states that IMAGE is for *Illumina* reads, which means that it will not work with unmapped colourspace reads. Spending time getting IMAGE to do this task is like using pliers to remove a screw. What you really want is a tool (a screwdriver) developed for the task at hand.

As I wrote above, Curtain should support colourspace assembly, as it can use Velvet, which I know for certain supports colourspace assemblies.

**epigen** · 11-03-2010, 05:07 AM

BAM to fastq color space

Originally posted by ambarrio

I think you could use this software (if you don't have the time to develop your own), because it seems to do the task you are looking for. I haven't found where to download it, though (and I am interested as well). Maybe you have to email them personally.

http://genome.sph.umich.edu/wiki/Bam2FastQ

Have you found out by now? I need a tool that makes a fastq file (BFAST style) of the original color space sequence and quality scores from a BAM file (CS and CQ tags). It's not clear if the Bam2FastQ in the link does that or just uses the nucleotide space sequences and their qualities.

**drio** · 11-03-2010, 07:01 AM

What about this?

Code:

$ samtools view my.bam | ./bam2fastq.rb > my.fastq

**Brugger** · 11-03-2010, 08:06 AM

Code:

samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n$3/'

Should do it I think...
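The same extraction can be written without a single large regular expression, so that it does not depend on the read name containing only word characters or on CS appearing before CQ. The sketch below is an assumption-laden alternative, not a replacement endorsed by anyone in the thread: it takes one line of `samtools view` output and returns a four-line record, and the function name `sam_line_to_fastq` is made up. Note that it writes the header with "@", which is what FASTQ parsers expect, where the one-liner above writes ">".

```python
def sam_line_to_fastq(line):
    """Build a 4-line FASTQ record from the CS/CQ tags of one SAM line.

    Returns None for header lines, malformed lines, and records that do
    not carry both tags.
    """
    fields = line.rstrip("\n").split("\t")
    if line.startswith("@") or len(fields) < 11:
        return None
    tags = {}
    for field in fields[11:]:
        # Optional fields look like TAG:TYPE:VALUE; the value itself may
        # contain ':' (a valid quality character), so split at most twice.
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    if "CS" not in tags or "CQ" not in tags:
        return None
    return "@{}\n{}\n+\n{}\n".format(fields[0], tags["CS"], tags["CQ"])
```

To use it in place of the Perl step in the pipe, loop over `sys.stdin`, call the function on each line, and write every result that is not None to `sys.stdout`.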

**epigen** · 11-04-2010, 08:19 AM

Originally posted by Brugger

Code:

samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n$3/'

Should do it I think...

Almost perfect - it's just missing the "+" line, which I added here in case someone is interested:

Code:

samtools view bfast.bam | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n+\n$3/'

However, I was looking for a tool that can reconstruct the paired-end fastq. I guess I'll write one myself that operates on name-sorted BAM files. Thanks for the neat Perl trick; this should help do the job.
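A possible shape for such a tool, as a rough sketch only: read the name-sorted records as SAM text (for example from `samtools view`), group consecutive lines by read name, and use the FLAG bits 0x40 and 0x80 to tell the two mates apart. The function names and the `/1` and `/2` name suffixes are assumptions; pairs where either mate is missing or lacks CS/CQ tags are skipped.

```python
from itertools import groupby

FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
NOT_PRIMARY = 0x100 | 0x800  # secondary or supplementary alignment


def parse_record(line):
    """Return (name, flag, cs, cq) for one SAM line; cs and cq may be None."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return fields[0], int(fields[1]), tags.get("CS"), tags.get("CQ")


def paired_fastq(lines):
    """Yield (mate1_record, mate2_record) from name-sorted SAM lines."""
    records = (parse_record(line) for line in lines
               if line.strip() and not line.startswith("@"))
    # groupby only merges adjacent records, hence the name-sorted input.
    for name, group in groupby(records, key=lambda rec: rec[0]):
        mates = {}
        for _, flag, cs, cq in group:
            if flag & NOT_PRIMARY or cs is None or cq is None:
                continue
            if flag & FIRST_IN_PAIR:
                mates[1] = (cs, cq)
            elif flag & SECOND_IN_PAIR:
                mates[2] = (cs, cq)
        if 1 in mates and 2 in mates:
            yield ("@{}/1\n{}\n+\n{}\n".format(name, *mates[1]),
                   "@{}/2\n{}\n+\n{}\n".format(name, *mates[2]))
```

Writing the first element of each tuple to one file and the second to another keeps the two fastq files in the same read order.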

Conversion of colourspace into basespace format.
