bfast: how does it recognize SOLiD mate pairs

  • bfast: how does it recognize SOLiD mate pairs

    I have read through the bfast manual and, although I generally find it very informative, I think it is unclear how SOLiD mate-pairs should be treated in the solid2fastq step. My SOLiD read files have the following format: the F3 and R3 mates are on the same rows of two separate files.

    What are the requirements for successful fastq conversion of such files, keeping mate-pair information in the conversion process?

    Should the names of mates be identical, or is it OK to keep the F3 and R3 suffixes?

    Should the pairs be placed next to each other in one file prior to conversion, as exemplified in Figure 4.4 (which depicts the resulting fastq file for Illumina reads)?

    Regards

    //Carl

  • #2
    Originally posted by Calle
    I have read through the bfast manual and, although I generally find it very informative, I think it is unclear how SOLiD mate-pairs should be treated in the solid2fastq step. My SOLiD read files have the following format: the F3 and R3 mates are on the same rows of two separate files.

    What are the requirements for successful fastq conversion of such files, keeping mate-pair information in the conversion process?

    Should the names of mates be identical, or is it OK to keep the F3 and R3 suffixes?

    Should the pairs be placed next to each other in one file prior to conversion, as exemplified in Figure 4.4 (which depicts the resulting fastq file for Illumina reads)?

    Regards

    //Carl
    Paired-end or mate-pair reads must have the same read name, so you have to strip off the trailing F3/R3 (etc.) suffixes. The pairs/mates should be successive in the file.
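    A minimal Python sketch of those two requirements (identical read names, mates on successive records), assuming each mate file has already been parsed into (name, sequence, quality) tuples whose rows correspond. This is not solid2fastq itself; the helper names, the example read name, and the set of tag suffixes handled (_F3, _R3, _F5-P2) are illustrative assumptions.

```python
import re
from itertools import zip_longest

# Assumed SOLiD tag suffixes; extend this pattern for other tag types.
_TAG_SUFFIX = re.compile(r"_(?:F3|R3|F5-P2)$")


def strip_tag(name):
    """Remove a trailing SOLiD tag suffix, e.g. '853_15_124_F3' -> '853_15_124'."""
    return _TAG_SUFFIX.sub("", name)


def interleave_mates(first_reads, second_reads):
    """Yield FASTQ records in which both mates share one name and sit next to each other.

    first_reads and second_reads are iterables of (name, seq, qual) tuples;
    row i of one input is assumed to be the mate of row i of the other.
    """
    for rec1, rec2 in zip_longest(first_reads, second_reads):
        if rec1 is None or rec2 is None:
            raise ValueError("mate files contain different numbers of reads")
        name1, name2 = strip_tag(rec1[0]), strip_tag(rec2[0])
        if name1 != name2:
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        # Emit the first mate, then its partner, both under the stripped name.
        for _, seq, qual in (rec1, rec2):
            yield f"@{name1}\n{seq}\n+\n{qual}\n"
```

    For example, interleave_mates([("853_15_124_F3", "T0123", "!!!!")], [("853_15_124_R3", "G3210", "####")]) yields two records, both named @853_15_124, one directly after the other. Checking the stripped names row by row catches files that have drifted out of sync instead of silently pairing the wrong reads.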

    • #3
      aligning paired end data with differing lengths

      I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other. How does this affect the use of aligners that are designed to map reads of equal length?
      I'll have to work with such data soon and I'm just thinking about the complications. Unless I'm mistaken, BWA does not work with different read sizes. As to BFAST, the indexes for the genome are different for lengths <40 and >40.
      Will I have to apply a trick like adding 15 "." and corresponding quality scores to the 35 bp reads, or shortening the 50 bp reads to 35 bp?
      I hope you can enlighten me so I don't have to resort to using BioScope.

      • #4
        Originally posted by epigen
        I just learned that the new ABI SOLiD mate pairs have
        differing lengths: 50 bp for the first read and 35 bp for the other. How does
        this affect the usage of aligners that are designed to map reads of equal
        length? I'll have to work with such data soon and I'm just thinking about the
        complications. Unless I'm mistaken, BWA does not work with different read
        sizes.
        BWA deals with different read lengths without problems.

        As to BFAST, the indexes for the genome are different for lengths <40
        and >40. Will I have to apply a trick like adding 15 "." and corresponding quality
        scores to the 35 bp reads, or shortening the 50 bp reads to 35 bp? I hope you can
        enlighten me so I don't have to resort to using BioScope.
        You can process both ends of the read using the recommended indexes.

        Also, try Bioscope. I haven't used it since version 1.0, but I've heard it
        is much friendlier now.
        -drd

        • #5
          Originally posted by epigen
          I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other.
          Haven't heard about that but I suppose it could be done. Aside from a shorter run time and slightly less cost, I do not see the advantage of using 35 bp reads. Currently we are doing mate-pair runs of 50bp F3 with 50bp R3 and paired-end runs of 50bp F3 with 25bp F5. Personally I'd like to see the paired-end go up to 35bp since 25bp gets into 'noise' territory.

          I hope you can enlighten me so I don't have to resort to using BioScope.
          I'll agree with 'drio' that Bioscope has become a lot more friendly. Or perhaps I have just gotten used to it. Like any tool with lots of 'blades' to handle the various tasks people may wish to do, Bioscope can seem intimidating.

          • #6
            Originally posted by epigen
            I just learned that the new ABI SOLiD mate pairs have differing lengths: 50 bp for the first read and 35 bp for the other.
            The reads you are talking about are not SOLiD mate-pairs but PAIRED-END reads. These differ from mate-pairs because the library prep is the same as for a fragment library, but you get more data by additionally sequencing 25 or 35 bp from the other end of the fragment. Hope this helps!

            • #7
              BFAST indexes for SOLiD 50+35 paired end reads

              Originally posted by bpetersen
              The reads you are talking about are not SOLiD mate-pairs but PAIRED-END reads. These differ from mate-pairs because the library prep is the same as for a fragment library, but you get more data by additionally sequencing 25 or 35 bp from the other end of the fragment. Hope this helps!
              Thanks for correcting me, indeed we have paired end of 50+35 bp.
              We have decided to use both BioScope and BFAST.
              For BFAST, I have the indexes for 50 bp already. Should I create additional indexes for the 35 bp ends, or will it work well with the ones recommended for 50 bp? In this thread http://seqanswers.com/forums/showthread.php?t=3535 Nils and David had different recommendations for 35 bp, and it seems the 50 bp indexes work better than the 25 bp indexes. Has anyone explicitly compared the performance?

              • #8
                Originally posted by epigen
                Thanks for correcting me, indeed we have paired end of 50+35 bp.
                We have decided to use both BioScope and BFAST.
                For BFAST, I have the indexes for 50 bp already. Should I create additional indexes for the 35 bp ends, or will it work well with the ones recommended for 50 bp? In this thread http://seqanswers.com/forums/showthread.php?t=3535 Nils and David had different recommendations for 35 bp, and it seems the 50 bp indexes work better than the 25 bp indexes. Has anyone explicitly compared the performance?
                We haven't tested the indexes on the 35bp end. We have found that running BWA on the 35bp end and BFAST on the 50bp end works very well. That is why we incorporated parts of BWA into BFAST, creating a hybrid version.

                • #9
                  50+35 bp SOLiD recipe

                  Originally posted by nilshomer
                  We haven't tested the indexes on the 35bp end. We have found that running BWA on the 35bp end and BFAST on the 50bp end works very well. That is why we incorporated parts of BWA into BFAST, creating a hybrid version.
                  Great that you already have experience to share! And now I know the real reason BWA was incorporated into BFAST. I'll try that as soon as I get the data. Two questions come to mind right now:
                  1. I assume that the indexes needed for bwaaln are the same as the ones that bwa index creates, right?
                  2. Which file do I have to specify with which parameter for bfast localalign:
                  -1 matches_from_bfastmatch_50bp -2 matches_from_bwaaln_35bp?

                  You might want to include the 50+35 bp SOLiD procedure in the manual. I find the examples given there very helpful, and I'm sure other users would appreciate a "cooking recipe" for this too, because 50+35 bp seems to be becoming a standard for new SOLiD machines.

                  Thank you very much again Nils.
                  Best,
                  Barbara

                  • #10
                    1. Use bfast match for the 50bp tag and create your bmf file. Do the same for the second tag, but using bfast bwaaln. bfast match will use the BFAST indexes and bfast bwaaln will use the BWA indexes.

                    2. Yes. -1 50bp bmf -2 25bp bmf.
                    -drd
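                    Since the two tags go through different commands (bfast match for the 50bp tag, bfast bwaaln for the other), each tag presumably needs its own reads file. If the reads are stored interleaved as described earlier in the thread (mates on successive records), a minimal Python sketch of splitting them into one list per tag follows; it assumes plain four-line FASTQ records, and the function name is hypothetical.

```python
def deinterleave(fastq_lines):
    """Split interleaved four-line FASTQ records into (first_tag, second_tag) line lists.

    Assumes each pair occupies eight consecutive lines: mate 1, then mate 2.
    """
    if len(fastq_lines) % 8:
        raise ValueError("expected complete pairs of four-line FASTQ records")
    first_tag, second_tag = [], []
    for start in range(0, len(fastq_lines), 8):
        first_tag.extend(fastq_lines[start:start + 4])       # mate 1, e.g. the 50bp F3 tag
        second_tag.extend(fastq_lines[start + 4:start + 8])  # mate 2, e.g. the 35bp tag
    return first_tag, second_tag
```

                    Usage: read the file with open(path).read().splitlines(), call deinterleave, and write each list back out joined with newlines. Keeping the records in their original order means the two per-tag files still line up row by row.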

                    • #11
                      bfast bwaaln parameters

                      Thanks David!
                      Can I just use the defaults in bfast bwaaln? I was wondering about two parameters because those F5-P2 reads are already 35 bp:
                      "-l INT seed length [32]"
                      "-q INT quality threshold for read trimming down to 35bp [0]"

                      Best,
                      Barbara
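                      Some background on the -q parameter: the BWA manual describes it as trimming a read down to argmax_x sum_{i=x+1..l} (INT - q_i) when the last base quality q_l is below INT, where l is the original read length, and the help text quoted above adds that trimming stops at 35bp. A minimal Python sketch of that rule, assuming Phred qualities already decoded to integers (the function name and the 35bp floor as a parameter are illustrative, not BFAST's code), shows why the question is largely moot for reads that are already 35bp: there is nothing left to trim.

```python
def trimmed_length(quals, trim_qual, min_len=35):
    """Return the read length kept after BWA-style 3' quality trimming.

    quals: per-base Phred qualities (integers), 5' to 3'.
    trim_qual: the -q threshold; values below 1 disable trimming (the [0] default).
    min_len: assumed floor on the trimmed length ("down to 35bp").
    """
    n = len(quals)
    if trim_qual < 1 or n <= min_len or quals[-1] >= trim_qual:
        return n
    best_len, best_score, score = n, 0, 0
    # Move the cut point x from the 3' end toward min_len; score is the sum of
    # (trim_qual - q) over the bases that a cut at x would remove.
    for x in range(n - 1, min_len - 1, -1):
        score += trim_qual - quals[x]
        if score > best_score:
            best_score, best_len = score, x
    return best_len
```

                      For example, a 50bp read with 40 bases at quality 30 followed by 10 bases at quality 2, trimmed with a threshold of 20, keeps 40bp, while any 35bp read comes back unchanged whatever the threshold.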

                      • #12
                        Originally posted by drio

                        Also, try Bioscope. I haven't used it since version 1.0, but I've heard it
                        is much friendlier now.
                        I constantly have to fight with the big 'friendly' giant to make it work the way I want. I think documentation has improved substantially, though. They have changed to BAM as input for post-mapping.
                        RAM requirements are going up and 1.3 is coming soon.
                        I dislike the lack of community support though.


                        @Nils: I wasn't aware that parts of BWA had been incorporated into BFAST. How does that work?
                        When we input paired-end data, does it automatically use BWA for the 35bp end and BFAST for the front 50 bp?
                        http://kevin-gattaca.blogspot.com/

                        • #13
                          Originally posted by KevinLam
                          I constantly have to fight with the big 'friendly' giant to make it work the way I want. I think documentation has improved substantially, though. They have changed to BAM as input for post-mapping.
                          RAM requirements are going up and 1.3 is coming soon.
                          I dislike the lack of community support though.


                          @Nils: I wasn't aware that parts of BWA had been incorporated into BFAST. How does that work?
                          When we input paired-end data, does it automatically use BWA for the 35bp end and BFAST for the front 50 bp?
                          The "bwa aln" command is incorporated into BFAST as the "bfast bwaaln" command, to support short reads (i.e. 35bp reads). The output format is BFAST-compatible (BMF), so it can be fed directly into "bfast localalign" and moved through the rest of the pipeline. Theoretically, you could run "bfast bwaaln" on both ends and it could go through the rest of the pipeline.

                          • #14
                            Originally posted by epigen
                            Thanks David!
                            Can I just use the defaults in bfast bwaaln? I was wondering about two parameters because those F5-P2 reads are already 35 bp:
                            "-l INT seed length [32]"
                            "-q INT quality threshold for read trimming down to 35bp [0]"

                            Best,
                            Barbara
                            Yes, default values on 35bp reads have yielded good results. Sample your
                            data (for quicker testing) and try different options.
                            -drd
