  • bfast: how does it recognize SOLiD mate pairs

    I have read through the bfast manual and, although I generally find it very informative, I think it is unclear how SOLiD mate-pairs should be treated in the solid2fastq step. My SOLiD read files have the following format: F3 and R3 mates are on the same rows of two separate files.

    What are the requirements for successful fastq conversion of such files, keeping mate-pair information in the conversion process?

    Should the names of mates be identical or is it ok to keep the F3 and R3s?

    Should the pairs be placed next to each other prior to conversion in one file, as is exemplified in Figure 4.4 (depicting the resulting fastq file for Illumina reads)?

    Regards

    //Carl

  • #2
    Originally posted by Calle View Post
    I have read through the bfast manual and, although I generally find it very informative, I think it is unclear how SOLiD mate-pairs should be treated in the solid2fastq step. My SOLiD read files have the following format: F3 and R3 mates are on the same rows of two separate files.

    What are the requirements for successful fastq conversion of such files, keeping mate-pair information in the conversion process?

    Should the names of mates be identical or is it ok to keep the F3 and R3s?

    Should the pairs be placed next to each other prior to conversion in one file, as is exemplified in Figure 4.4 (depicting the resulting fastq file for Illumina reads)?

    Regards

    //Carl
    Paired end or mate pair reads must have the same read name, so you have to strip off the trailing F3/R3 etc. The pairs/mates should be successive in the file.
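
A minimal sketch of the naming and ordering requirement described above, not a replacement for solid2fastq. It assumes records are already parsed into (name, sequence, quality) tuples and that mates carry a trailing tag such as `_F3` or `_R3`; the tag pattern and the helper names `strip_tag`, `interleave`, and `to_fastq` are hypothetical, and which mate comes first is left to the caller.

```python
import re
from itertools import zip_longest

# Assumed tag suffixes for illustration; real SOLiD headers may differ.
_TAG = re.compile(r"_(F3|R3|F5-P2|F5-BC)$")


def strip_tag(name):
    """Remove a trailing _F3/_R3-style tag so both mates share one name."""
    return _TAG.sub("", name)


def interleave(first, second):
    """Yield records so that the two mates of each pair are successive.

    `first` and `second` are iterables of (name, seq, qual) tuples in the
    same row order. Raises ValueError if the inputs differ in length or if
    the names do not match once the tag is stripped.
    """
    for a, b in zip_longest(first, second):
        if a is None or b is None:
            raise ValueError("mate files have different numbers of reads")
        name_a, name_b = strip_tag(a[0]), strip_tag(b[0])
        if name_a != name_b:
            raise ValueError(f"mate names differ: {name_a} vs {name_b}")
        yield (name_a, a[1], a[2])
        yield (name_b, b[1], b[2])


def to_fastq(records):
    """Format (name, seq, qual) tuples as four-line FASTQ entries."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)
```

Feeding the output of `interleave` to `to_fastq` gives one file in which both mates have the same read name and sit next to each other, which is the layout the reply asks for.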


    • #3
      aligning paired end data with differing lengths

      I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other. How does this affect the usage of aligners that are designed to map reads of equal length?
      I'll have to work with such data soon and I'm just thinking about the complications. Unless I'm mistaken, BWA does not work with different read sizes. As to BFAST, the indexes for the genome are different for lengths <40 and >40.
      Will I have to apply a trick like adding 15 "." and corresponding quality scores to the 35 bp reads, or shorten the 50 bp reads to 35 bp?
      I hope you can enlighten me so I don't have to resort to using BioScope.


      • #4
        Originally posted by epigen View Post
        I just learned that the new ABI SOLiD mate pairs have
        differing lengths: 50 bp for the first read and 35 bp for the other. How does
        this affect the usage of aligners that are designed to map reads of equal
        length? I'll have to work with such data soon and I'm just thinking about the
        complications. Unless I'm mistaken, BWA does not work with different read
        sizes.
        BWA deals with different read lengths without problems.

        As to BFAST, the indexes for the genome are different for lengths <40
        and >40. Will I have to apply a trick like adding 15 "." and corresponding quality
        scores to the 35 bp reads, or shorten the 50 bp to 35 bp? I hope you can
        enlighten me so I don't have to resort to using BioScope.
        You can process both ends of the read using the recommended indexes.

        Also, try Bioscope. I haven't used it since version 1.0, but I've heard it
        is much friendlier now.
        -drd


        • #5
          Originally posted by epigen View Post
          I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other.
          Haven't heard about that but I suppose it could be done. Aside from a shorter run time and slightly less cost, I do not see the advantage of using 35 bp reads. Currently we are doing mate-pair runs of 50bp F3 with 50bp R3 and paired-end runs of 50bp F3 with 25bp F5. Personally I'd like to see the paired-end go up to 35bp since 25bp gets into 'noise' territory.

          I hope you can enlighten me so I don't have to resort to using BioScope.
          I'll agree with 'drio' that Bioscope has become a lot more friendly. Or perhaps I have just gotten used to it. Like any tool with lots of 'blades' to handle the various tasks people may wish to do, Bioscope can seem intimidating.


          • #6
            Originally posted by epigen View Post
            I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other.
            The reads you are talking about are not the SOLiD mate-pairs, but PAIRED-END. These are different from the mate-pairs, because library prep is the same as with fragment, but you get more data by additionally sequencing 25 or 35 bp from the other end of the fragment. Hope this helps!


            • #7
              BFAST indexes for SOLiD 50+35 paired end reads

              Originally posted by bpetersen View Post
              The reads you are talking about are not the SOLiD mate-pairs, but PAIRED-END. These are different from the mate-pairs, because library prep is the same as with fragment, but you get more data by additionally sequencing 25 or 35 bp from the other end of the fragment. Hope this helps!
              Thanks for correcting me, indeed we have paired end of 50+35 bp.
              We have decided to use both BioScope and BFAST.
              For BFAST, I have the indexes for 50 bp already. Should I create additional indexes for the 35 bp ends or will it work well with the ones recommended for 50 bp? In this thread http://seqanswers.com/forums/showthread.php?t=3535 Nils and David had different recommendations for 35 bp and it seems the 50 bp indexes work better than the 25 bp indexes. Has anyone explicitly compared the performance?


              • #8
                Originally posted by epigen View Post
                Thanks for correcting me, indeed we have paired end of 50+35 bp.
                We have decided to use both BioScope and BFAST.
                For BFAST, I have the indexes for 50 bp already. Should I create additional indexes for the 35 bp ends or will it work well with the ones recommended for 50 bp? In this thread http://seqanswers.com/forums/showthread.php?t=3535 Nils and David had different recommendations for 35 bp and it seems the 50 bp indexes work better than the 25 bp indexes. Has anyone explicitly compared the performance?
                We haven't tested the indexes on the 35bp end. We have found that running BWA on the 35bp and BFAST on the 50bp end works very well. That is why we incorporated parts of BWA into BFAST, creating a hybrid version.


                • #9
                  50+35 bp SOLiD recipe

                  Originally posted by nilshomer View Post
                  We haven't tested the indexes on the 35bp end. We have found that running BWA on the 35bp and BFAST on the 50bp end works very well. That is why we incorporated parts of BWA into BFAST, creating a hybrid version.
                  Great that you already have experience to share! And now I know the true reason why BWA was incorporated into BFAST. I'll try that as soon as I get the data. Two questions come to mind right now:
                  1. I assume that the indexes needed for bwaaln are the same as the ones that bwa index creates, right?
                  2. Which file do I have to specify with which parameter for bfast localalign:
                  -1 matches_from_bfastmatch_50bp -2 matches_from_bwaaln_35bp?

                  You might want to include the 50+35 bp SOLiD procedure in the manual. I find the examples given there very helpful and I'm sure other users would appreciate a "cooking recipe" for this, too, because 50+35 bp seems to be becoming a standard for new SOLiD machines.

                  Thank you very much again Nils.
                  Best,
                  Barbara


                  • #10
                    1. Use bfast match for the 50bp tag and create your bmf file. Do the same for the second tag but using bwaaln. bfast match will use bfast indexes and bwaaln will use bwa indexes.

                    2. Yes. -1 50bp bmf -2 35bp bmf.
                    -drd


                    • #11
                      bfast bwaaln parameters

                      Thanks David!
                      Can I just use the defaults in bfast bwaaln? I was wondering about two parameters because those F5-P2 reads are already 35 bp:
                      "-l INT seed length [32]"
                      "-q INT quality threshold for read trimming down to 35bp [0]"

                      Best,
                      Barbara


                      • #12
                        Originally posted by drio View Post

                        Also, try Bioscope. I haven't used it since version 1.0, but I've heard it
                        is much friendlier now.
                        I constantly have to fight with the big 'friendly' giant to make it work the way I want. I think documentation has improved substantially though. They have changed to BAM as input for post mapping.
                        RAM requirements are going up and 1.3 is coming soon.
                        I dislike the lack of community support though.


                        @Nils: I wasn't aware that part of BWA had been incorporated into BFAST. How does that work?
                        When we input paired-end data, does it automatically use BWA for the 35 bp end and BFAST for the front 50 bp?
                        http://kevin-gattaca.blogspot.com/


                        • #13
                          Originally posted by KevinLam View Post
                          I constantly have to fight with the big 'friendly' giant to make it work the way I want. I think documentation has improved substantially though. They have changed to BAM as input for post mapping.
                          RAM requirements are going up and 1.3 is coming soon.
                          I dislike the lack of community support though.


                          @Nils: I wasn't aware that part of BWA had been incorporated into BFAST. How does that work?
                          When we input paired-end data, does it automatically use BWA for the 35 bp end and BFAST for the front 50 bp?
                          The "bwa aln" command is incorporated into BFAST as the "bfast bwaaln" command, to support short reads (i.e. 35bp reads). The output format is BFAST-compatible (BMF), so it can be seamlessly input into "bfast localalign" and moved through the pipeline. Theoretically, you could run "bfast bwaaln" on both ends and the results could go through the rest of the pipeline.
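
As a rough illustration of the split described above, the sketch below picks a matching step per read end by length. Only the subcommand names `bfast match`, `bfast bwaaln`, and `bfast localalign` come from this thread; the 40 bp cutoff and the helper names `choose_matcher` and `plan` are assumptions made for the example, and no command-line options are shown because they should be taken from the BFAST manual.

```python
def choose_matcher(read_length, cutoff=40):
    """Return the subcommand name suggested in this thread for one read end.

    cutoff=40 is an illustrative assumption, not a BFAST constant.
    """
    if read_length <= 0:
        raise ValueError("read length must be positive")
    return "bfast match" if read_length >= cutoff else "bfast bwaaln"


def plan(end_lengths):
    """Map each end label to a matching step.

    Both steps produce BMF output, which then goes to 'bfast localalign'.
    """
    return {end: choose_matcher(length) for end, length in end_lengths.items()}


if __name__ == "__main__":
    # 50 bp forward end and 35 bp reverse end, as in the paired-end runs above.
    print(plan({"F3": 50, "F5": 35}))
```

Lowering the cutoff to 0 would send both ends through `bfast match`, and raising it above 50 would send both through `bfast bwaaln`, which corresponds to the "run bwaaln on both ends" option mentioned in the post.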


                          • #14
                            Originally posted by epigen View Post
                            Thanks David!
                            Can I just use the defaults in bfast bwaaln? I was wondering about two parameters because those F5-P2 reads are already 35 bp:
                            "-l INT seed length [32]"
                            "-q INT quality threshold for read trimming down to 35bp [0]"

                            Best,
                            Barbara
                            Yes, default values on 35bp reads have yielded good results. Sample your
                            data (for quicker testing) and try different options.
                            -drd
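
One way to sample the data for quicker testing is sketched below. It assumes the reads are already in an interleaved stream (mate 1, mate 2, mate 1, mate 2, ...) and uses reservoir sampling so the whole file never has to be held in memory; the function name `sample_pairs` and the fixed seed are hypothetical choices for the example.

```python
import random


def sample_pairs(records, k, seed=0):
    """Reservoir-sample k read pairs from an interleaved record stream.

    `records` yields mate 1, mate 2, mate 1, mate 2, ... (any record type).
    Mates are kept together so the sample is still valid paired input.
    A trailing unpaired record, if any, is ignored.
    """
    rng = random.Random(seed)
    it = iter(records)
    reservoir = []
    # zip(it, it) consumes two successive records per step, i.e. one pair.
    for i, pair in enumerate(zip(it, it)):
        if i < k:
            reservoir.append(pair)
        else:
            # Replace an existing pair with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = pair
    return [rec for pair in reservoir for rec in pair]
```

The sampled pairs can then be written back out and run through the pipeline with different option settings before committing to a full run.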
