  • NanYu
    replied
    With the -U option added, it finished within 1 min!
    But that means BFAST will not consider the "mated-pair" information. How much will this affect the final result (compared with not using the -U option)?
    If I use the -U option, can I use SAMtools or other software to further filter the results before SNP discovery?
    Thanks!


  • nilshomer
    replied
    Try using the "-U" option.


  • NanYu
    replied
    Thanks for replying.

    I'm using "Version: 0.6.5a git:Revision: undefined$"

    Here is the command line I used for postprocess:

    bfast postprocess -f reference.fa -A 1 -O 1 -i bfast.aligned_subset.baf >bfast_subset.sam

    It took about 1 hour to finish this (using 1 CPU), with 5000 pairs. Memory usage is <2G while the system has 48G memory.


    Some output from the BFAST:

    ************************************************************
    Postprocessing...
    ************************************************************
    Estimating paired end distance...
    Used 2478 paired end distances to infer the insert size distribution.
    The paired end distance range was from -12772 to 8032.
    The paired end distance mean and standard deviation were -2466.06 and 3543.15.
    The inversion ratio was 0.004036 (10 / 2478).
    Reads processed: 5000
    ************************************************************
    Reads processed: 5000
    Alignment complete.
    ************************************************************
    Found 178 reads with no ends mapped.
    Found 491 reads with 1 end mapped.
    Found 4331 reads with 2 ends mapped.
    Found 4822 reads with at least one end mapping.
    ************************************************************
    Terminating successfully!
    ************************************************************
    Last edited by NanYu; 05-07-2011, 04:20 AM.
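The counts in the postprocess summary above can be cross-checked with a few lines of Python. This is only a sketch: `parse_counts` is a hypothetical helper written for this log excerpt, not part of BFAST, and the regular expressions assume the exact wording shown above.

```python
import re

# Summary lines copied from the postprocess output above.
LOG = """\
The inversion ratio was 0.004036 (10 / 2478).
Reads processed: 5000
Found 178 reads with no ends mapped.
Found 491 reads with 1 end mapped.
Found 4331 reads with 2 ends mapped.
Found 4822 reads with at least one end mapping.
"""

def parse_counts(log):
    """Extract the read counts from a postprocess summary (hypothetical helper)."""
    patterns = {
        "processed": r"Reads processed: (\d+)",
        "none": r"Found (\d+) reads with no ends mapped",
        "one": r"Found (\d+) reads with 1 end mapped",
        "two": r"Found (\d+) reads with 2 ends mapped",
        "any": r"Found (\d+) reads with at least one end mapping",
    }
    return {key: int(re.search(pat, log).group(1)) for key, pat in patterns.items()}

counts = parse_counts(LOG)

# The three categories should add up to the number of reads processed.
categories_sum = counts["none"] + counts["one"] + counts["two"]
# "At least one end" should equal one-end plus two-end reads.
at_least_one = counts["one"] + counts["two"]

print(categories_sum == counts["processed"], at_least_one == counts["any"])
print(f"both ends mapped: {counts['two'] / counts['processed']:.1%}")
```

Both checks hold for the numbers reported here, so the summary is internally consistent; the slowness is not explained by the counts themselves.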


  • nilshomer
    replied
    What options are you using and what version of BFAST?


  • NanYu
    replied
    I'm having the same problem: the postprocess step is very slow (mate-pair SOLiD sequences of 50bp each). My test data set is only 5000 pairs of sequences, but the postprocess step took a long time to run. Is there a bug affecting mate-pair data?

    Also, post #3 uses the -v option, but it is not mentioned in the manual. Can anyone let me know what it means?

    Thanks!


  • sdvie
    replied
    Originally posted by epigen View Post
    I also split the short reads for bwaaln, then feed the resulting shortreads.<nr>.bmf file and the longreads.<nr>.bmf from match into localalign, like this:
    bfast localalign -f $REF -1 longreads.1.bmf -2 shortreads.1.bmf -A 1 -t -U -n 8 > local.1.baf

    Then I run postprocess, convert the sam into a sorted bam, and at the end merge all bam files.

    I also see that localalign is the most time-consuming step; it takes up to 4 days on 8 CPUs for 50 Mio read pairs. (I have >500 read pairs for each slide.) Next I'll try using -M 200 to see if that results in a speedup.
    Good to know, thanks!

    cheers,
    Sophia


  • epigen
    replied
    Originally posted by sdvie View Post
    Dear all using bfast bwaaln,

    I have a related question; in fact I am running an experiment similar to the one that epigen described, that is, paired-end SOLiD reads of 50+25 length.

    For bfast match, I split the 50-reads into several files of 10 Mio reads each, as recommended.
    However, I am not sure whether I can do the same with the 25-reads for bfast bwaaln: run bfast bwaaln separately for each fragment of reads and put them all together again in the localalign step?
    When using one single fastq file containing all the reads (105 Mio), this step seems to take a long time (2 days and still running).

    Any comments will be appreciated.

    Thanks,
    Sophia
    I also split the short reads for bwaaln, then feed the resulting shortreads.<nr>.bmf file and the longreads.<nr>.bmf from match into localalign, like this:
    bfast localalign -f $REF -1 longreads.1.bmf -2 shortreads.1.bmf -A 1 -t -U -n 8 > local.1.baf

    Then I run postprocess, convert the sam into a sorted bam, and at the end merge all bam files.

    I also see that localalign is the most time-consuming step; it takes up to 4 days on 8 CPUs for 50 Mio read pairs. (I have >500 read pairs for each slide.) Next I'll try using -M 200 to see if that results in a speedup.
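The splitting step described above (cutting a large FASTQ file into numbered chunks before running match or bwaaln) could be sketched in Python as follows. This is an illustration, not a BFAST utility: `split_fastq`, `write_chunks`, and the `<prefix>.<nr>.fastq` naming are hypothetical, and the code assumes plain 4-line FASTQ records. It also assumes the long-read and short-read files list the pairs in the same order, so that splitting both with the same chunk size keeps chunk `<nr>` of one file matched to chunk `<nr>` of the other.

```python
import io
from itertools import islice

def split_fastq(handle, reads_per_chunk):
    """Yield lists of FASTQ lines, each holding up to reads_per_chunk records.

    Assumes the plain 4-lines-per-record layout (no wrapped sequences).
    """
    lines_per_chunk = 4 * reads_per_chunk
    while True:
        chunk = list(islice(handle, lines_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("truncated FASTQ record")
        yield chunk

def write_chunks(in_path, prefix, reads_per_chunk):
    """Write chunks to <prefix>.<nr>.fastq (hypothetical naming); return the paths."""
    paths = []
    with open(in_path) as handle:
        for nr, chunk in enumerate(split_fastq(handle, reads_per_chunk), start=1):
            path = f"{prefix}.{nr}.fastq"
            with open(path, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths

# Small in-memory demonstration: 5 records split into chunks of 2.
demo = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(5)))
sizes = [len(chunk) // 4 for chunk in split_fastq(demo, 2)]
print(sizes)  # [2, 2, 1]
```

For the 10 Mio chunks mentioned in the thread, the call would look like `write_chunks("longreads.fastq", "longreads", 10_000_000)`, with the input file name being a placeholder.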


  • sdvie
    replied
    a related question

    Dear all using bfast bwaaln,

    I have a related question; in fact I am running an experiment similar to the one that epigen described, that is, paired-end SOLiD reads of 50+25 length.

    For bfast match, I split the 50-reads into several files of 10 Mio reads each, as recommended.
    However, I am not sure whether I can do the same with the 25-reads for bfast bwaaln: run bfast bwaaln separately for each fragment of reads and put them all together again in the localalign step?
    When using one single fastq file containing all the reads (105 Mio), this step seems to take a long time (2 days and still running).

    Any comments will be appreciated.

    Thanks,
    Sophia


  • Jerry-cs
    replied
    Hi Drio, thank you. The reads are from a 50+35 PE run (the F3 reads have 50 bases, while the F5-P2 reads are 30 bases long). In addition, the localalign step also takes a long time.

    Originally posted by drio View Post
    @jerry-cs Is this data from a 50+25 PE run? These are only 5 million reads; do you see those numbers across the whole run? bfast does not do very well with 25bp reads. That's what the bfast+bwa branch (git) is for.


  • drio
    replied
    @jerry-cs Is this data from a 50+25 PE run? These are only 5 million reads; do you see those numbers across the whole run? bfast does not do very well with 25bp reads. That's what the bfast+bwa branch (git) is for.


  • Jerry-cs
    replied
    I have the same problem. Below is the alignment result:

    20000000 in total
    0 QC failure
    0 duplicates
    8331480 mapped (41.66%)
    20000000 paired in sequencing
    10000000 read1
    10000000 read2
    3581888 properly paired (17.91%)
    3833680 with itself and mate mapped
    4497800 singletons (22.49%)
    241894 with mate mapped to a different chr
    221791 with mate mapped to a different chr (mapQ>=5)

    Because I'm new to NGS, I'm wondering whether the above results are reasonable. Thank you.
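As a quick way to read the summary above, the percentages can be recomputed from the raw counts. The sketch below only uses the numbers posted here; `pct` is a small helper introduced for illustration.

```python
# Counts copied from the alignment summary above.
total = 20_000_000
mapped = 8_331_480
properly_paired = 3_581_888
both_mapped = 3_833_680   # "with itself and mate mapped"
singletons = 4_497_800

def pct(part, whole):
    """Percentage rounded to two decimals, as in the report."""
    return round(100.0 * part / whole, 2)

# Every mapped read either has a mapped mate or is a singleton.
counts_add_up = (both_mapped + singletons == mapped)

print(counts_add_up)                 # True
print(pct(mapped, total))            # 41.66
print(pct(properly_paired, total))   # 17.91
print(pct(singletons, total))        # 22.49
# Share of mapped reads whose mate did not map:
print(pct(singletons, mapped))
```

More than half of the mapped reads are singletons, which fits the point raised in the thread that the short mate is the end that mostly fails to map.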


    Originally posted by epigen View Post
    For the whole set, ABI BioScope could map 34% as proper pairs, 42% of the reads were unmapped, and it reported Insert range 94-206. I split the data set for BFAST and from the 2 parts that finished, at least 60% mapped but <20% in proper pairs.
    Am I doing something wrong? Or might it be because of bad read quality? Any help will be very much appreciated.

    Barbara


  • nilshomer
    replied
    Originally posted by epigen View Post
    Sorry to ask again, but have you found the bug(s)? Especially the one causing "properly paired" reads with insert sizes of several 100 Mio bp is weird considering the lengths of the human chromosomes ...
    Sorry I have not had the chance to take a look at it. Could you send the report to [email protected]. Sorry again.


  • epigen
    replied
    insert size bug

    Sorry to ask again, but have you found the bug(s)? Especially the one causing "properly paired" reads with insert sizes of several 100 Mio bp is weird considering the lengths of the human chromosomes ...


  • nilshomer
    replied
    Thank you for reporting it. I will take a look, since it is probably a bug. I am on vacation for a few days for [Canadian] Thanksgiving!

    As for the 1.0 release, it's like google software (for the most part): always in beta.


  • epigen
    replied
    negative insert sizes

    Thanks Nils.
    I have now found the bfast git version with additional parameters and ran
    bfast postprocess -f $REF -i reads.baf -a 3 -A 1 -R -z -v 160 -s 20 -S 4.0 > reads.sam
    This results in a tremendous speedup compared to version 0.6.4e with options -a 3 -A 1 -R -z (more than 20 times faster) and even finds more pairs - definitely an improvement! However, I'm confused that the reported insert sizes are all negative.

    Example:
    2221_1132_511 99 chrX 141904000 255 50M = 141904140 -106 ATTTATCATGATTAACACCATTGTCTTCATTGTATATTTTCTAAGCTGCT ````````````````````````````````````````````````^; PG:Z:bfast AS:i:2500 MQ:i:255 NM:i:0 NH:i:1 IH:i:1 HI:i:1 MD:Z:50 CS:Z:T33003321312303011101301122021301133330002230232132 CQ:Z:BBBBBB@A?B5BBABA@@@BA@A>BBB?;BB=B=BAB=??@?B9@@A?:; CM:i:0 XA:i:3 XE:Z:--------------------------------------------------
    2221_1132_511 147 chrX 141904140 255 35M = 141904000 -106 ACCAGAAGCGTCTCTGATTCTGGGTGAGCAGTGAC 9W0")""^]`)"XX_()`_``^UZ`53```````` PG:Z:bfast AS:i:1000 MQ:i:255 NM:i:0 NH:i:0 IH:i:1 HI:i:1 MD:Z:35 CS:Z:T11211213211100122031122211332121301 CQ:Z:[email protected]?2:;B9=@8?7879A7=8>1)&59 CM:i:6 XA:i:3 XE:Z:--31-1----1----1---------1---------

    Result from version 0.6.4e:
    2221_1132_511 99 chrX 141904000 255 50M = 141904140 140 ATTTATCATGATTAACACCATTGTCTTCATTGTATATTTTCTAAGCTGCT ````````````````````````````````````````````````^; PG:Z:bfast AS:i:2500 MQ:i:255 NM:i:0 NH:i:1 IH:i:1 HI:i:1 MD:Z:50 CS:Z:T33003321312303011101301122021301133330002230232132 CQ:Z:BBBBBB@A?B5BBABA@@@BA@A>BBB?;BB=B=BAB=??@?B9@@A?:; CM:i:0 XA:i:3 XE:Z:--------------------------------------------------
    2221_1132_511 147 chrX 141904140 255 35M = 141904000 -140 ACCAGAAGCGTCTCTGATTCTGGGTGAGCAGTGAC 9W0")""^]`)"XX_()`_``^UZ`53```````` PG:Z:bfast AS:i:1000 MQ:i:255 NM:i:0 NH:i:0 IH:i:1 HI:i:1 MD:Z:35 CS:Z:T11211213211100122031122211332121301 CQ:Z:[email protected]?2:;B9=@8?7879A7=8>1)&59 CM:i:6 XA:i:3 XE:Z:--31-1----1----1---------1---------

    Obviously postprocess now calculates the isize differently than before, but for consistency it should still be positive for one read of the pair. samtools flagstat does not complain, other tools might.
    Edit: I just noticed that some reads labeled as "proper pairs" (flag 67) have insert sizes up to 200 Mio bp. How come?!

    Looking forward to bfast 0.6.4f (or will there be bfast v1.0 ?)

    Barbara
    Last edited by epigen; 10-07-2010, 08:58 AM. Reason: detected weird proper pairs
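The sign issue described above can be checked mechanically. The sketch below parses the first nine SAM columns of the two pairs shown (SEQ, QUAL and tags are left out), decodes the FLAG bits, and tests whether the two insert-size values of a pair have equal magnitude and opposite sign, which is what the current SAM specification describes for TLEN (plus for the leftmost segment, minus for the rightmost). The helper names `parse`, `decode_flag` and `tlen_signs_consistent` are made up for this illustration.

```python
from collections import namedtuple

Record = namedtuple("Record", "qname flag rname pos mapq cigar rnext pnext tlen")

def parse(line):
    """Parse the first nine mandatory SAM columns of one alignment line."""
    f = line.split()
    return Record(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                  f[5], f[6], int(f[7]), int(f[8]))

# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}

def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

def tlen_signs_consistent(a, b):
    """True if both TLEN values are non-zero, equal in size, opposite in sign."""
    return a.tlen != 0 and a.tlen == -b.tlen

# Columns 1-9 of the records quoted above.
git_pair = [parse("2221_1132_511 99 chrX 141904000 255 50M = 141904140 -106"),
            parse("2221_1132_511 147 chrX 141904140 255 35M = 141904000 -106")]
old_pair = [parse("2221_1132_511 99 chrX 141904000 255 50M = 141904140 140"),
            parse("2221_1132_511 147 chrX 141904140 255 35M = 141904000 -140")]

print(decode_flag(99))    # ['paired', 'proper_pair', 'mate_reverse', 'first_in_pair']
print(decode_flag(147))   # ['paired', 'proper_pair', 'reverse', 'second_in_pair']
print(tlen_signs_consistent(*git_pair))  # False
print(tlen_signs_consistent(*old_pair))  # True
```

Run over a whole SAM file, a check like this would also make it easy to list the "proper pairs" whose absolute insert size is implausibly large.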
