  • Conversion of colourspace into basespace format.

    Hello Everyone,

    Sorry if this is a re-post, but is there any way to convert SOLiD .bam file data into basespace format? We are trying to use the IMAGE algorithm (http://genomebiology.com/2010/11/4/R41), which needs the files to be in FASTQ format.

    Any help is hugely appreciated!

    Thanks in advance,

    Kaustubh Gokhale.
    Last edited by kasutubh; 05-07-2010, 12:37 AM.

  • #2
    Originally posted by kasutubh View Post
    Hello Everyone,

    Sorry if this is a re-post, but is there any way to convert SOLiD .bam file data into basespace format? We are trying to use the IMAGE algorithm (http://genomebiology.com/2010/11/4/R41), which needs the files to be in FASTQ format.

    Any help is hugely appreciated!

    Thanks in advance,

    Kaustubh Gokhale.
    What program did you use to generate the BAM file? The SEQ/QUAL fields should be in basespace, with the original colors/color-qualities optionally in the CS/CQ tags.
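To check which case applies to a given file, it can help to look at a few records directly. Here is a minimal Python sketch, assuming the input is SAM text as printed by `samtools view`. The function name `parse_sam_line` and the example record are made up for illustration:

```python
def parse_sam_line(line):
    """Pull out the fields discussed above from one SAM text line."""
    fields = line.rstrip("\n").split("\t")
    # Columns 10 and 11 (1-based) of a SAM record are SEQ and QUAL.
    record = {
        "name": fields[0],
        "flag": int(fields[1]),
        "seq": fields[9],
        "qual": fields[10],
    }
    # Optional tags follow as TAG:TYPE:VALUE. CS and CQ, when present,
    # carry the colour-space read and its colour qualities.
    for tag in fields[11:]:
        key, _type, value = tag.split(":", 2)
        if key in ("CS", "CQ"):
            record[key] = value
    return record


# Hypothetical record: base-space SEQ/QUAL plus the original colours.
example = "1_10_20\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tCS:Z:T31313\tCQ:Z:;;;;;"
rec = parse_sam_line(example)
print(rec["seq"], rec.get("CS", "no CS tag"))
```

If `seq` contains only A/C/G/T/N, the record is already in basespace. Whether CS/CQ are present depends on the program that wrote the BAM file.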


    • #3
      These files were sent to me by the ABI guys. We had asked them to align the sequences to a reference, and they sent these files as output. What is the raw data format of SOLiD? I need files in FASTQ format.


      • #4
        The 'raw data' format from SOLiD is the color-space reads that look like FastA files. But often the core center people will do more processing in order to map the reads to the reference, do SNP calls, transcriptomes, etc. All of these subsequent steps will generate different types of files -- FastA-like, GFF, SAM, etc. No FastQ though.


        • #5
          I think you could use this software (http://genome.sph.umich.edu/wiki/Bam2FastQ) if you don't have the time to develop your own, because it seems to do the task you are looking for. However, I haven't found a place to download it (and I am interested as well). You may have to email the authors directly.


          • #6
            Firstly, you have to make sure that the unmapped reads exist in your BAM file; if they do, they will be in colour space, as written above.

            Secondly, a colourspace --> basespace transformation will strain the colour-space technology considerably. I recently did a raw transformation from colourspace to basespace of 10000 reads that I knew mapped to the reference genome in colourspace. I then tried to redo the alignment using blat, and only about 30% gave substantial hits.
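For reference, a raw transformation of this kind can be sketched in a few lines of Python. It assumes the standard SOLiD di-base encoding, in which each colour (0-3) encodes the transition between two adjacent bases and the read starts with a known primer base. The function name `decode_colourspace` is made up for illustration:

```python
BASES = "ACGT"  # with A=0, C=1, G=2, T=3, colour = XOR of adjacent base codes


def decode_colourspace(cs_read):
    """Naively translate a colour-space read such as 'T31313' into bases.

    The first character is the primer base; it seeds the decoding but is
    not part of the output. Decoding stops at the first missing colour
    call ('.') and pads the rest with N.
    """
    prev = BASES.index(cs_read[0])
    decoded = []
    for colour in cs_read[1:]:
        if colour not in "0123":
            # Every later base depends on this colour, so nothing
            # downstream can be recovered.
            decoded.append("N" * (len(cs_read) - 1 - len(decoded)))
            break
        prev ^= int(colour)
        decoded.append(BASES[prev])
    return "".join(decoded)


print(decode_colourspace("T31313"))  # clean read
print(decode_colourspace("T32313"))  # same read with one miscalled colour
```

The second call shows the weakness of the naive approach: a single wrong colour changes every base after it. That may be one reason so few of the translated reads aligned in basespace.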

            Thirdly, as the qualities are per colour and not per base, you have to transform the colour qualities into base qualities; this is especially important if you clip off low-quality sequence. As I understand it, to get a base QV you should add the two colour QVs surrounding a base.
            The data in my test was from a SOLiD 3 run, so if your run was done on SOLiD 4 you should get better results.
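The quality rule described above (a base QV is the sum of the two flanking colour QVs) could be sketched as follows. This takes the poster's understanding at face value and is not a vendor-confirmed formula. It assumes the colour qualities are Phred+33 encoded, as QUAL strings in SAM are, and that the last base, which has only one flanking colour, simply keeps that colour's QV. Both function names are made up for illustration:

```python
def phred33_to_ints(quality_string):
    """Decode a Phred+33 quality string into a list of integer QVs."""
    return [ord(char) - 33 for char in quality_string]


def base_qualities(colour_quals):
    """Combine per-colour QVs into per-base QVs.

    Decoded base i sits between colour i and colour i+1, so its QV is
    the sum of those two. The final base has no following colour and
    keeps the QV of its single flanking colour (an assumption).
    """
    colour_quals = list(colour_quals)
    following = colour_quals[1:] + [0]
    return [q + nxt for q, nxt in zip(colour_quals, following)]


print(base_qualities(phred33_to_ints(";;;;;")))
```

The sums can exceed the range that fits in a Phred+33 character, so they would need capping before being written back to a FASTQ file.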

            I am sure that there are other things you should consider before spending considerable time on this.

            I know that Curtain uses a similar approach for gap closing, and that it works, as I understand it, in colourspace without any great hacks.


            • #7
              The abstract of IMAGE says it is "a practical approach that uses *Illumina* sequences". So no, it does not work with SOLiD, unless the software has been updated since publication. The base sequence is derived after the alignment, so for unmapped reads you do not have base sequences.


              • #8
                I am 95% sure this person's BAM file is in basespace already. None of the AB tools output BAM until after mapping.

                I have never converted BAM to FASTQ, but I imagine there is something in samtools.
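Later samtools releases do ship a `fastq` subcommand for this. To show what such a conversion involves for basespace records, here is a rough Python sketch that reads SAM text (as printed by `samtools view`) and writes FASTQ. The function names are made up, and filling a missing QUAL with the lowest quality is an assumption of this sketch:

```python
import sys

# Complement table for reads that were stored on the reverse strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return one FASTQ record for a SAM text line, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    # Skip records without a sequence, and secondary (0x100) or
    # supplementary (0x800) alignments, which would duplicate reads.
    if seq == "*" or flag & 0x900:
        return None
    if qual == "*":
        qual = "!" * len(seq)  # assumption: treat missing qualities as QV 0
    # SAM stores reverse-strand reads reverse-complemented (flag 0x10);
    # undo that to recover the read as it was sequenced.
    if flag & 0x10:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    return "@{}\n{}\n+\n{}\n".format(name, seq, qual)


def convert(lines, out=sys.stdout):
    """Write a FASTQ record for every usable SAM line in `lines`."""
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line
        record = sam_line_to_fastq(line)
        if record is not None:
            out.write(record)
```

Saved as a script (say, a hypothetical `sam2fastq.py`) that ends with `convert(sys.stdin)`, it could be run as `samtools view my.bam | python sam2fastq.py > my.fastq`.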


                • #9
                  The whole idea of IMAGE is that it uses read pairs in which only one read is mapped, facing towards a gap. The other ends are then assembled and joined, if possible, with the end of the contig that their mates map to. As such a read is unmapped, it will only exist in colourspace and *not* basespace.
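The read pairs described here can be picked out of a BAM file by their SAM flags. This is only a sketch of that selection step, using the standard flag bits for "paired", "read unmapped" and "mate unmapped". It is not IMAGE's actual code, the function name `classify_for_gap_closing` is made up, and it does not check whether the mapped read really faces a gap:

```python
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the other read of the pair is unmapped


def classify_for_gap_closing(flag):
    """Classify a SAM flag for the one-end-mapped selection.

    Returns "anchor" for a mapped read whose mate is unmapped,
    "unmapped_mate" for the unmapped partner of a mapped read,
    and "other" for everything else.
    """
    if not (flag & PAIRED):
        return "other"
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    if mate_unmapped and not read_unmapped:
        return "anchor"
    if read_unmapped and not mate_unmapped:
        return "unmapped_mate"
    return "other"


for flag in (73, 133, 99, 77):
    print(flag, classify_for_gap_closing(flag))
```

The "unmapped_mate" reads are the ones that would have to be assembled, which is why their sequence representation (colourspace versus basespace) matters.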

                  As Heng correctly points out, the abstract states that IMAGE is for *Illumina* reads, which means that it will not work with unmapped colourspace reads. Spending time getting IMAGE to do this task is like using pliers to remove a screw. What you really want is a tool (a screwdriver) developed for the task at hand.

                  As I wrote above, Curtain should support colourspace assembly, as it can use Velvet, which I know for certain supports colourspace assemblies.


                  • #10
                    BAM to fastq color space

                    Originally posted by ambarrio View Post
                    I think you could use this software (if you don't have the time to develop your own), because it seems to do the task you are looking for. However, I haven't found a place to download it (and I am interested as well). You may have to email the authors directly.

                    http://genome.sph.umich.edu/wiki/Bam2FastQ
                    Have you found out by now? I need a tool that makes a fastq file (BFAST style) of the original color space sequence and quality scores from a BAM file (CS and CQ tags). It's not clear if the Bam2FastQ in the link does that or just uses the nucleotide space sequences and their qualities.


                    • #11
                      What about this?

                      Code:
                      $ samtools view my.bam | ./bam2fastq.rb > my.fastq
                      -drd


                      • #12
                        Code:
                        samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n$3/'
                        Should do it I think...


                        • #13
                          Originally posted by Brugger View Post
                          Code:
                          samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n$3/'
                          Should do it I think...
                          Almost perfect - it's just missing the "+" line, and a FASTQ header line should start with "@" rather than ">". Both are fixed here in case someone is interested:
                          Code:
                          samtools view bfast.bam | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/\@$1\n$2\n+\n$3/'
                          However, I was looking for a tool that can reconstruct the paired-end FASTQ. I guess I'll write one myself that operates on name-sorted BAM files. Thanks for the neat Perl trick; this should help do the job.
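A paired-end variant along those lines could look roughly like the Python sketch below. It assumes name-sorted SAM text on input (for example from `samtools view` on a name-sorted BAM), takes the colour read and qualities from the CS and CQ tags in whatever order they appear, and uses the first-in-pair (0x40) and second-in-pair (0x80) flag bits to tell the two ends apart. The function names are made up, and the tags are written out unchanged, on the assumption that they hold the read as it was sequenced:

```python
def colour_fastq_record(line):
    """Return (name, flag, fastq_text) from CS/CQ tags, or None if absent."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for tag in fields[11:]:
        key, _type, value = tag.split(":", 2)
        tags[key] = value
    if "CS" not in tags or "CQ" not in tags:
        return None
    name, flag = fields[0], int(fields[1])
    return name, flag, "@{}\n{}\n+\n{}\n".format(name, tags["CS"], tags["CQ"])


def split_pairs(lines):
    """Yield (read1, read2) FASTQ records from a name-sorted SAM stream."""
    current_name = None
    pending = {}
    for line in lines:
        parsed = colour_fastq_record(line)
        if parsed is None:
            continue
        name, flag, record = parsed
        if flag & 0x900:
            continue  # skip secondary and supplementary alignments
        if name != current_name:
            # New read name: drop any unpaired leftover from the last one.
            current_name, pending = name, {}
        if flag & 0x40:
            pending[1] = record
        elif flag & 0x80:
            pending[2] = record
        if 1 in pending and 2 in pending:
            yield pending[1], pending[2]
            pending = {}
```

Each yielded pair can then be written to two files, or interleaved into one, depending on what the downstream tool expects. Reads whose mate is missing from the file are silently dropped in this sketch.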
