
  • Conversion of colourspace into basespace format.

    Hello Everyone,

    Sorry if this is a re-post, but is there any way to convert SOLiD .bam file data into basespace format? We are trying to use the IMAGE algorithm (http://genomebiology.com/2010/11/4/R41), which needs the files to be in the fastq format.

    Any help is hugely appreciated!

    Thanks in advance,

    Kaustubh Gokhale.
    Last edited by kasutubh; 05-07-2010, 12:37 AM.

  • #2
    What program did you use to generate the BAM file? The SEQ/QUAL fields should be in basespace, with the original colors/color-qualities optionally in the CS/CQ tags.
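One quick way to check what a given BAM actually carries is to look at the optional fields of a few records from `samtools view`. The following is only a minimal sketch: the function name and the example record are made up for illustration, and it assumes plain SAM text lines as input.

```python
def colorspace_tags(sam_line):
    """Return the CS and CQ optional tags of one SAM text record, if present."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        if tag in ("CS", "CQ"):
            tags[tag] = value
    return tags


# Hypothetical record: SEQ/QUAL in basespace, colours kept in CS/CQ.
record = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tCS:Z:T31313\tCQ:Z:IIIII"
print(colorspace_tags(record))
```

An empty result for every record would suggest the original colours were not kept in the file.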



    • #3
      These files were sent to me by the ABI guys. We had asked them to align the sequences to a reference, and as output they sent these files. What is the raw data format of SOLiD? I need files in the fastq format.



      • #4
        The 'raw data' format from SOLiD is the color-space reads that look like FastA files. But often the core center people will do more processing in order to map the reads to the reference, do SNP calls, transcriptomes, etc. All of these subsequent steps will generate different types of files -- FastA-like, GFF, SAM, etc. No FastQ though.



        • #5
          I think you could use this software (http://genome.sph.umich.edu/wiki/Bam2FastQ) if you don't have the time to develop your own, because it seems to do the task you are looking for. I haven't found a place to download it, though (and I am interested as well). I think you may have to email them personally.



          • #6
            Firstly: you have to make sure that the unmapped reads exist in your BAM file; if they do, they will be in colour space, as written above.

            Secondly: doing a colourspace --> basespace transformation will push the colour space technology considerably. I recently did a raw transformation from colourspace to basespace of 10000 reads. I know that these reads map to the reference genome in colourspace. I then tried to redo the alignment using blat, and only about 30% gave considerable hits.
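To make the "raw transformation" concrete, here is a minimal sketch of a naive colour-to-base decoder. It uses the standard SOLiD two-base encoding (0 keeps the base, 1 swaps A/C and G/T, 2 swaps A/G and C/T, 3 swaps A/T and C/G); the function name and example reads are made up for illustration. It also shows why this is fragile: each base depends on the previous one, so a single wrong colour call changes every base after it.

```python
# For each previous base, the string gives the next base for colours 0, 1, 2, 3.
NEXT_BASE = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}


def decode_colours(cs_read):
    """Naively translate a colour-space read such as 'T31313' into bases."""
    previous = cs_read[0]            # primer base, not part of the read itself
    bases = []
    for colour in cs_read[1:]:
        if colour not in "0123":     # '.' marks a missing colour call
            # Nothing after a missing colour can be decoded reliably.
            bases.append("N" * (len(cs_read) - 1 - len(bases)))
            break
        previous = NEXT_BASE[previous][int(colour)]
        bases.append(previous)
    return "".join(bases)


print(decode_colours("T31313"))  # ACGTA
print(decode_colours("T30313"))  # AATGC: one wrong colour, four wrong bases
```

Colour-space aligners avoid this problem by comparing colours against a colour-encoded reference, which is probably why the same reads map fine in colourspace.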

            Thirdly: as the qualities are per colour, and not per base, you have to transform the colour qualities into base qualities. This is important, especially if you clip low-quality ends off. As I understand it, to get a base QV you should add the two colour QVs surrounding a base.
            The data was from a solid3 run, so if your run was done using solid4 you should get better results.
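The quality rule described above (add the two colour QVs surrounding a base) could be sketched like this. Treat it as an illustration of that rule only, not as an official conversion: the function name is made up, the Phred+33 text encoding is assumed, the cap of 60 is an arbitrary choice to keep the result printable, and the last base simply keeps its single colour QV because it has only one neighbouring colour.

```python
def base_quality_string(cq, cap=60):
    """Turn a Phred+33 colour-quality string into a base-quality string."""
    colour_qvs = [ord(ch) - 33 for ch in cq]      # Phred+33 text -> integers
    base_qvs = []
    for i, qv in enumerate(colour_qvs):
        # Base i sits between colour i and colour i + 1 (when the latter exists).
        if i + 1 < len(colour_qvs):
            qv += colour_qvs[i + 1]
        base_qvs.append(min(qv, cap))             # cap is an assumption
    return "".join(chr(qv + 33) for qv in base_qvs)


# Colour QVs 20, 10, 30 become base QVs 30, 40, 30.
print(base_quality_string("5+?"))  # ?I?
```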

            I am sure that there are other things you should consider before spending considerable time on this.

            I know that curtain is using a similar approach for gap closing, and that works, as I understand, in colourspace without any great hacks.



            • #7
              The abstract of IMAGE describes it as "a practical approach that uses *Illumina* sequences". No, it does not work with SOLiD, unless they have updated the software since the publication. The base sequence is derived after the alignment, so for unmapped reads you do not have base sequences.



              • #8
                I am 95% sure this person's BAM file is in basespace already. None of the AB tools output BAM until after mapping.

                I have never converted BAM to fastq, but I imagine there is something in samtools.



                • #9
                  The whole idea of IMAGE is that it uses read pairs where only one read is mapped, facing towards a gap. The other ends are then assembled and joined, if possible, with the end of the contig that their mates map to. As such a read is unmapped, it will only exist in colourspace and *not* basespace.

                  As Heng correctly points out, the abstract states that IMAGE is for *Illumina* reads, which means that it will not work with unmapped colourspace reads. Spending time getting IMAGE to do this task is like using pliers to remove a screw. What you really want is a tool (a screwdriver) developed for the task at hand.

                  As I wrote above, curtain should support colourspace assembly, as it can use velvet, which I know for certain supports colourspace assemblies.



                  • #10
                    BAM to fastq color space

                    Originally posted by ambarrio
                    I think you could use this software (if you don't have the time to develop yours) because it seems it does the task you are looking for. But I haven't found the place to download though (and I am interested as well). I think you have to email them personally maybe.

                    http://genome.sph.umich.edu/wiki/Bam2FastQ
                    Have you found out by now? I need a tool that makes a fastq file (BFAST style) of the original color space sequence and quality scores from a BAM file (CS and CQ tags). It's not clear if the Bam2FastQ in the link does that or just uses the nucleotide space sequences and their qualities.



                    • #11
                      What about this?

                      Code:
                      $ samtools view my.bam | ./bam2fastq.rb > my.fastq
                      -drd



                      • #12
                        Code:
                        samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n$3/'
                        Should do it I think...



                        • #13
                          Originally posted by Brugger
                          Code:
                          samtools view bfast.bam  | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\z)/>$1\n$2\n$3/'
                          Should do it I think...
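                          Almost perfect - it's just missing the "+" line, and a fastq header should start with "@" rather than ">". Records where CQ is the last tag are also skipped, because \z does not match before the trailing newline (\Z does). I fixed these here in case someone is interested:
                          Code:
                          samtools view bfast.bam | perl -pe 's/^(\w+?)\t.*\tCS:Z:(.*?)\t.*CQ:Z:(.*?)(\t.*|\Z)/\@$1\n$2\n+\n$3/'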
                          However, I was looking for a tool that can reconstruct the paired-end fastq. I guess I'll write one myself that operates on name-sorted BAM files. Thanks for the neat Perl trick; this should help do the job.
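A rough sketch of what such a pairing tool might look like, working on SAM text lines from a name-sorted BAM. Everything here is illustrative: the function names and demo records are made up, it assumes both mates carry CS and CQ tags holding the reads as they came off the instrument, and it relies only on the standard SAM FLAG bits (0x40 first in pair, 0x80 second in pair, 0x100/0x800 secondary/supplementary).

```python
import sys


def colour_fastq(fields):
    """Format one SAM record as a colour-space fastq entry, or None without CS/CQ."""
    # Each optional field is TAG:TYPE:VALUE; keep TAG -> VALUE.
    tags = dict(f.split(":", 2)[::2] for f in fields[11:])
    if "CS" not in tags or "CQ" not in tags:
        return None
    return "@{}\n{}\n+\n{}\n".format(fields[0], tags["CS"], tags["CQ"])


def paired_records(sam_lines):
    """Yield (first, second) fastq entries from name-sorted SAM text lines."""
    name, mates = None, {}
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:                  # skip secondary/supplementary records
            continue
        if fields[0] != name:             # new read name: forget unpaired leftovers
            name, mates = fields[0], {}
        entry = colour_fastq(fields)
        if entry is None:
            continue
        if flag & 0x40:
            mates[1] = entry
        elif flag & 0x80:
            mates[2] = entry
        if len(mates) == 2:
            yield mates[1], mates[2]
            mates = {}


# Hypothetical unmapped pair (flags 77 and 141) with colours in CS/CQ.
demo = [
    "pair1\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*\tCS:Z:T3131\tCQ:Z:IIII",
    "pair1\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*\tCS:Z:G0120\tCQ:Z:HHHH",
]
for first, second in paired_records(demo):
    sys.stdout.write(first + second)
```

In real use the lines would come from `samtools view` on a name-sorted BAM, and the two entries would be written to separate files or interleaved, depending on what the downstream tool expects. Reads whose mate is missing from the file are silently dropped in this sketch.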
