
  • grassgirl
    replied
    Thanks, kmcarr! This solution appears to have worked. At least I'm not getting that error message when I run it the way you state above.


  • kmcarr
    replied
    Originally posted by grassgirl:
    Hi,

    I am also having some frustrations with cap3. I can't get it to recognize my qual file associated with my fasta file. I was told that cap3 picks this file up automatically if it is in the same directory as the fasta file, as long as the name before the file extension is the same...is that true? I've tried naming just the .fna file and also adding the qual file to the command line, but I still get the message "No file of quality values (.qual) is found".

    Here are the two ways I've tried to input the files:

    $ cap3 2011_4_26Reads.fna 2011_4_26Reads.qual

    $ cap3 2011_4_26Reads.fna

    Any suggestions?

    The file needs to be named 2011_4_26Reads.fna.qual. cap3 takes the full name of the sequence file, including any extension, and appends .qual to it.
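
A minimal sketch of the renaming described above, assuming a POSIX shell. The helper name give_cap3_qual is hypothetical (it is not part of cap3); it copies an existing quality file to "<sequence file name>.qual", the name kmcarr says cap3 looks for, and leaves any file already at that name untouched.

```shell
# Hypothetical helper: copy a quality file to the name cap3 is said to
# look for, i.e. the full sequence file name with ".qual" appended.
give_cap3_qual() {
    fasta="$1"              # e.g. 2011_4_26Reads.fna
    qual="$2"               # e.g. 2011_4_26Reads.qual
    target="${fasta}.qual"  # e.g. 2011_4_26Reads.fna.qual

    if [ ! -f "$fasta" ] || [ ! -f "$qual" ]; then
        echo "missing input: $fasta or $qual" >&2
        return 1
    fi
    # Leave an existing file alone rather than overwrite it.
    if [ ! -e "$target" ]; then
        cp "$qual" "$target"
    fi
    echo "$target"
}
```

With the files from the question this would be called as give_cap3_qual 2011_4_26Reads.fna 2011_4_26Reads.qual, followed by cap3 2011_4_26Reads.fna with the sequence file as the only file argument.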


  • grassgirl
    replied
    Also having troubles with cap3, not seeing .qual file

    Hi,

    I am also having some frustrations with cap3. I can't get it to recognize my qual file associated with my fasta file. I was told that cap3 picks this file up automatically if it is in the same directory as the fasta file, as long as the name before the file extension is the same...is that true? I've tried naming just the .fna file and also adding the qual file to the command line, but I still get the message "No file of quality values (.qual) is found".

    Here are the two ways I've tried to input the files:

    $ cap3 2011_4_26Reads.fna 2011_4_26Reads.qual

    $ cap3 2011_4_26Reads.fna

    Any suggestions?


  • ganga.jeena
    replied
    Is there any limit on the input sequence size for CAP3?
    I want to combine assemblies from 454 and Illumina using CAP3. However, I find most of the contigs in the singleton file generated by cap3.
    If someone can explain this, kindly help me out.
    Thanks


  • SLB
    replied
    cap3 input file

    I am trying to use CAP3 at the moment but am having some difficulty. I am concatenating some .fna files with a contigs file in FASTA format. Both files appear to run fine when submitted individually, but the concatenated file fails with a segmentation error. I have tried removing any empty line at the end of each file before concatenating, but this didn't help. I even tried manually combining both files, but that didn't help either! Has anyone come across a similar problem and can share a solution?
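
The thread does not establish what caused this crash, so the following is only a sketch of two things worth ruling out in a concatenated FASTA file, assuming a POSIX shell with awk. If a file does not end with a newline, plain cat fuses its last sequence line with the next file's header. Concatenation can also produce duplicate sequence IDs. The function names are hypothetical.

```shell
# Concatenate FASTA files so that every line ends with a newline.
# "awk 1" prints each input line followed by a newline, which repairs a
# missing final newline at the end of a file.
concat_fasta() {
    awk 1 "$@"
}

# Print sequence IDs (first word after ">") that occur more than once.
duplicate_fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort | uniq -d
}
```

For example, concat_fasta reads.fna contigs.fa > combined.fa followed by duplicate_fasta_ids combined.fa (file names are placeholders). Empty output from the second command means no ID is repeated.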


  • ganga.jeena
    replied
    Happy new year

    This is too late,
    but I just want to know if you tried PCAP?


  • shabhonam
    replied
    Hi

    I have assembled the reads from 454 and Illumina, for the same species, as contig1 and contig2 respectively. Can somebody help me merge these two contig files into one file using cap3? My aim is to get larger contigs.

    Thanks in Advance


  • shruti
    replied
    I know it does not have a built-in option for multithreading. I was wondering if anyone has a workaround to make the program faster.


  • sklages
    replied
    You can't. cap3 is not multithreaded. If you are going for EST assemblies, cluster the data and then assemble the distinct clusters.

    cheers,
    Sven
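
One way to act on the "cluster first, then assemble the clusters" suggestion is to run several independent cap3 processes at once, one per cluster. The following is a sketch only, under these assumptions (none of them come from cap3): the clusters have already been written as one FASTA file each in a single directory, the files end in .fasta, and the local xargs supports -0 and -P (GNU and BSD xargs do). The function name is hypothetical. cap3's standard output is redirected to a per-cluster .align file, as in the TGICL log elsewhere in this thread.

```shell
# Hypothetical helper: run an assembler on every per-cluster FASTA file
# in a directory, several processes at a time.
assemble_clusters() {
    dir="$1"                 # directory holding one FASTA file per cluster
    jobs="${2:-4}"           # how many processes to run at once
    assembler="${3:-cap3}"   # command to run; replaceable for a dry run
    find "$dir" -name '*.fasta' -print0 |
        xargs -0 -P "$jobs" -n 1 sh -c \
            '[ -n "$2" ] || exit 0; "$1" "$2" > "$2.align"' _ "$assembler"
}
```

On an 8-core machine, assemble_clusters clusters/ 4 would keep four cores busy. Each process needs its own memory, so the number of jobs is limited by RAM as well as by cores.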


  • shruti
    replied
    Hello everybody,

    Can somebody tell me how to run cap3 on multiple processors? Mine is an 8-core machine and I want to use 4-5 cores of it.

    Thanks in advance.


  • Bharat
    replied
    Thank you all for your kind suggestions.

    CAP3 is now working fine on my 250K-sequence data set, as I am using the 64-bit version of CAP3.
    But my workstation appears to hang; I mean the processing is very slow. Maybe the program uses almost all of the RAM.

    Another question: what would be the ideal configuration for a workstation that can handle whole-genome assembly, annotation and other analyses? In the near future I will have to deal with very large data sets.

    Please give me your valuable suggestions.
    Last edited by Bharat; 01-31-2010, 08:59 PM.


  • gpertea
    replied
    I agree with Sven: the OP keeps asking for help while not paying attention to good suggestions (like updating CAP3) or to simple questions (what is that number, 2.5 million or 250K? Can't you see how confusing that formatting is?).

    However, the TGICL error logs supplied above show that CAP3 still fails on many of the clusters, probably due to out-of-memory issues again (I'm not sure what exit code 34304 is; you could ask the author of CAP3 about that). Again, I think you should upgrade CAP3 to the latest 64-bit version if you haven't done so yet (make sure you replace or delete the CAP3 binary that comes with the tgicl package in the tgicl/bin/ subdirectory; that one is very old).
    The largest cluster reported there has 12,681 reads, which is still very large, and I see that you used 93% identity. You might want to try increasing the stringency of the clustering and assembly process to reduce cluster noise or unwanted expansion. There is an entire section with advice on dealing with larger clusters in the README file that comes with TGICL; make sure you read that.
    If you have done all of this and still get errors with TGICL, please contact me privately so we don't turn this public discussion thread into a specific TGICL debugging exercise.

    However, as Sven and others suggested, you could also try other clustering/assembly packages; they might be more user-friendly than TGICL and work better on your configuration.
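
On the unexplained code 34304: one possible reading, offered as an assumption rather than as documented TGICL behaviour, is that it is a raw wait status as returned by a system()-style call, where the child's exit status sits in the high byte. Under that assumption 34304 = 134 × 256, and 134 = 128 + 6. Common shells such as bash and dash report 128 + N when a command they ran was killed by signal N, and signal 6 on Linux is SIGABRT, which matches the "Aborted" lines in the log. The function name below is hypothetical.

```shell
# Decode a 16-bit wait status, assuming the usual Unix layout:
# high byte = exit status, low 7 bits = terminating signal.
decode_wait_status() {
    status="$1"
    code=$(( status >> 8 ))   # exit status of the child
    sig=$(( status & 127 ))   # signal that killed the child directly, if any
    if [ "$sig" -ne 0 ]; then
        echo "killed by signal $sig"
    elif [ "$code" -gt 128 ]; then
        # The child was a shell whose own command died from a signal.
        echo "exit status $code (128 + signal $(( code - 128 )))"
    else
        echo "exit status $code"
    fi
}

decode_wait_status 34304   # prints "exit status 134 (128 + signal 6)"
```

An abort is consistent with the out-of-memory explanation above, but it does not prove it.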


  • sklages
    replied
    You have a problem with your assembly.

    First, use a 64-bit cap3 on a 64-bit system; update your cap3 if necessary.

    You get an error on assembly of clustered data, " ran out of memory!".
    That's pretty clear. You probably have one or more very deep clusters, or preprocessing
    of your data (adaptors, barcodes, vector?) didn't work very well (as a consequence you get "deep clusters").

    Have a look at the TGICL clustering results; are there very deep
    clusters which might cause cap3 to fail?

    I am still not sure whether you mean 2.5 million sequences for the large dataset.

    You have removed vector contaminants (did you?), so I assume you are using
    Sanger-based data? Did you just remove vector reads, or did you end-clip your
    data? For Sanger data, 'lucy' does a good job.

    Really, try to find out if preprocessing worked well; try different cluster/assembly
    programs and see if you get different results.

    Last but not least, be aware that 8G is not really much for assembly of
    huge transcriptome datasets.

    You don't supply enough info for us to help you effectively.

    cheers,
    Sven
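
To check which kind of cap3 binary is actually being run, the file utility normally reports this directly. The sketch below does the same check by hand, assuming a Linux ELF executable and a POSIX od; the function name is hypothetical. It reads byte 5 of the ELF header (EI_CLASS), which is 1 for 32-bit and 2 for 64-bit. This matters because a 32-bit process can address at most 4 GB, however much RAM is installed.

```shell
# Report whether an ELF executable is 32-bit or 64-bit from its header.
elf_bits() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not an ELF file"
        return 1
    fi
    # Byte 5 (offset 4) is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown ELF class $class"; return 1 ;;
    esac
}
```

A typical call would be elf_bits "$(command -v cap3)", and the same for the copy shipped in the tgicl/bin/ subdirectory mentioned elsewhere in this thread.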


  • Bharat
    replied
    Somebody please help me regarding TGICL


  • Bharat
    replied
    Thanks to all for providing me help and suggestions

    @Gpertea

    The "err_log" file in the asm_1 folder shows:
    sh: line 1: 17975 Aborted cap3 CL4 -p 93 > CL4.align
    Error! cap3 failure detected (code=34304) on: CL4
    sh: line 1: 18941 Aborted cap3 CL71 -p 93 > CL71.align
    Error! cap3 failure detected (code=34304) on: CL71
    sh: line 1: 19795 Aborted cap3 CL131 -p 93 > CL131.align
    Error! cap3 failure detected (code=34304) on: CL131
    sh: line 1: 19842 Aborted cap3 CL135 -p 93 > CL135.align
    Error! cap3 failure detected (code=34304) on: CL135
    sh: line 1: 23712 Aborted cap3 CL409 -p 93 > CL409.align
    Error! cap3 failure detected (code=34304) on: CL409
    sh: line 1: 24016 Aborted cap3 CL431 -p 93 > CL431.align
    Error! cap3 failure detected (code=34304) on: CL431
    sh: line 1: 24613 Aborted cap3 CL474 -p 93 > CL474.align
    Error! cap3 failure detected (code=34304) on: CL474
    sh: line 1: 28899 Aborted cap3 CL779 -p 93 > CL779.align
    Error! cap3 failure detected (code=34304) on: CL779
    sh: line 1: 29114 Aborted cap3 CL795 -p 93 > CL795.align
    Error! cap3 failure detected (code=34304) on: CL795
    sh: line 1: 4258 Aborted cap3 CL1326 -p 93 > CL1326.align
    Error! cap3 failure detected (code=34304) on: CL1326
    sh: line 1: 6597 Aborted cap3 CL1492 -p 93 > CL1492.align
    Error! cap3 failure detected (code=34304) on: CL1492
    sh: line 1: 7594 Aborted cap3 CL1563 -p 93 > CL1563.align
    Error! cap3 failure detected (code=34304) on: CL1563
    sh: line 1: 7865 Aborted cap3 CL1583 -p 93 > CL1583.align
    Error! cap3 failure detected (code=34304) on: CL1583
    sh: line 1: 7954 Aborted cap3 CL1590 -p 93 > CL1590.align
    Error! cap3 failure detected (code=34304) on: CL1590
    whereas "err_tgicl_My_data_set.fasta.log" in the main tgicl folder shows this error:

    >>> --- Initialization [My_data_set.fasta] started at Jan 14 11:59:28 2010
    tgicl running options:
    tgicl My_data_set.fasta
    Standard log file: tgicl_My_data_set.fasta.log
    Error log file: err_tgicl_My_data_set.fasta.log
    Using 1 CPUs for clustering and assembly
    Path is : /root/Desktop/Software_Collection/Assembly/tgicl_linux/bin:/root/Desktop/Software_Collection/Assembly/tgicl_linux:/usr/lib64/qt-3.3/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
    -= Rebuilding My_data_set.fasta indices =-
    34071 entries from file My_data_set.fasta were indexed in file My_data_set.fasta.cidx
    >>> --- clustering [My_data_set.fasta] started at Jan 14 11:59:30 2010
    Launching distributed clustering:
    psx -p 1 -n 1000 -i My_data_set.fasta -d cluster -C '/root/Desktop/Software_Collection/Assembly/tgicl_linux/My_data_set.fasta:94:30:40:' -c '/root/Desktop/Software_Collection/Assembly/tgicl_linux/bin/tgicl_cluster.psx'
    WAITING for all children to finish before starting last child!
    WAITING for all children to finish!
    <<< --- clustering [My_data_set.fasta] finished at Jan 14 12:06:37 2010
    Running transitive closure command: gzip -cd My_data_set.fasta_cl_tabhits_*.Z | tclust PID=94 OVL=40 OVHANG=30 -o My_data_set.fasta_cl_clusters

    PID=94
    OVHANG=30
    OVL=40
    Total t-clusters: 2889
    Largest cluster has 12681 nodes
    *** all done ***
    The clusters are stored in file 'My_data_set.fasta_cl_clusters'.

    >>> --- ASSEMBLE [My_data_set.fasta] started at Jan 14 12:06:58 2010
    WAITING for all children to finish before starting last child!
    WAITING for all children to finish!

    Process terminated with an error, at step 'ASSEMBLE'!
    tgicl (My_data_set.fasta) encountered an error at step ASSEMBLE
    Working directory was /root/Desktop/Software_Collection/Assembly/tgicl_linux.
    Is there anything wrong within the script? tgicl is a binary file and I am unable to read it. What sort of modification is required in this script?
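
To see at a glance which clusters failed, the "cap3 failure detected" lines of the err_log quoted above can be reduced to a list of cluster names. This is a sketch, assuming a POSIX shell with sed and the exact line format shown in that log; the function name is hypothetical.

```shell
# List the clusters named in "cap3 failure detected" lines of a TGICL
# err_log, one per line, without duplicates.
failed_clusters() {
    sed -n 's/^.*cap3 failure detected (code=[0-9]*) on: *//p' "$1" | sort -u
}
```

Run as failed_clusters asm_1/err_log, it would list the 14 clusters from the log above (CL4, CL71, CL131 and so on), which can then be checked for size or re-run one at a time.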

    ----------------------------------------------------------------------------------------------------

    @sklages

    I am using 32-bit CAP3 on a 64-bit machine.

    As I noted above, the program works fine with a small data set and gives me the error " ran out of memory!" when I use large data sets.

    I am using 2,50,000 EST sequences in a file.
