  • yifangt
    replied
    assembly output format for visualisation

    Another question!
    I want to view my assembly with other software, e.g. Tablet, Mauve or OSlay. Is there any way in Ray to convert the output files to ACE, MAQ, SAM or BAM format for those post-assembly programs?
    In the FAQ section of your site there is a question about the AMOS format for the output, but I did not use it. Do I have to run the assembly again with the -amos option on? Unfortunately, the AMOS format cannot be read by every program.
    I was trying to figure out what the output files contain, but I am not sure which one I should use for those visualization programs, or which one should be fed to a perl/shell script for format conversion.
    I would appreciate it if you could give me any clue.

    Thanks!

    YT


  • seb567
    replied
    Originally posted by yifangt View Post
    Hello Sebastien:

    Can I ask what the maximum value of MAXKMERLENGTH is for Ray 2.1? So far the largest value I have seen set is 128, and for Velvet the largest I have seen is 151. Somewhere I saw "There is an arbitrary number that can be set for MAXKMERLENGTH", but I could not find the link anymore. Can you confirm the maximum value I can set for Ray? Thanks a lot!

    YT
    A message in Ray has a maximum size of 4000 bytes and 2 bits are necessary per nucleotide. The maximum is therefore 4000 / 2 = 2000.

    However, read lengths and sequencing errors will be limiting factors here.


  • yifangt
    replied
    maximum kmer length?

    Hello Sebastien:

    Can I ask what the maximum value of MAXKMERLENGTH is for Ray 2.1? So far the largest value I have seen set is 128, and for Velvet the largest I have seen is 151. Somewhere I saw "There is an arbitrary number that can be set for MAXKMERLENGTH", but I could not find the link anymore. Can you confirm the maximum value I can set for Ray? Thanks a lot!

    YT


  • seb567
    replied
    Originally posted by yaximik View Post
    It's two quad-core Xeon E5620s with 96GB memory and an nVIDIA NV 300 in double display mode.
    Therefore, hardware should not be a problem with such a nice computer.

    Are the problems with Hawkeye or Tablet specific to AMOS files generated by Ray, or do they also occur with AMOS files generated by other tools?


  • yaximik
    replied
    When the AMOS file format was implemented, I tested Hawkeye, Tablet, and Bank-transact.
    You can submit a ticket and I will eventually look at that, but this feature has not really changed since it was implemented.
    What is the hardware (memory, processor, video card) on which you are running Hawkeye?
    It's two quad-core Xeon E5620s with 96GB memory and an nVIDIA NV 300 in double display mode.


  • seb567
    replied
    Originally posted by yaximik View Post
    How is the average length calculated?
    I guess after reads are aligned to the assembly, correct?
    Yes, but all of this happens in the de Bruijn graph -- there is no aligner in the process.


    But I thought that assembly depends on paired-end information, so unless I am wrong there is a logical short circuit here - paired reads are distanced based on the assembly, which depends on the distance between paired reads.
    Yes, it's like a bootstrapping process: distances are sampled from seeds (similar to unitigs), and then the empirical distribution is used to extend longer contigs by matching paired reads to the distribution.

    I like your short circuit.

    *

    It is not that I am maliciously after how the algorithm was designed.
    On the contrary, science advances when curious people step in.

    I am trying to guess where such a discrepancy between the Bioanalyzer and the assembler is coming from. Could it be that Bioanalyzer traces for libraries are so misleading that I really have no idea about the size of the libraries I am sequencing?
    One hypothesis is that the population of molecules analyzed by the Bioanalyzer is a superset of the molecules that are present on the sequencing flow cell after library preparation.

    Or is autocalc somehow misled in its library size estimation?
    That may be the case too, but I would be surprised by that.


  • seb567
    replied
    Originally posted by yaximik View Post
    I tried to view the AMOS.afg file (37.1 GB) using a couple of programs. Tablet is painfully slow, and it eventually quit, reporting an error on some line. Hawkeye (AMOS package) successfully imported the assembly into a bank and even opened a graphic window showing contig 1, but then hung forever and had to be killed.
    Code:
    [yaximik@G5NNJN1 ~]$ hawkeye
    START DATE: Mon Mar 11 11:06:54 2013
    Bank is: /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk
        0%                                            100%
    AFG ..................................................
    Messages read: 175403161
    Objects added: 175403161
    Objects deleted: 0
    Objects replaced: 0
    END DATE:   Mon Mar 11 12:13:09 2013
    Opening /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk... [160.12s]
    Indexing Contigs   .......... [83.11s] 107326772 reads in 1409913 contigs
    Scaffold information not available
    Mates not available:WHAT: Could not open bank file, /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk/FRG.ifo, No such file or directory
    LINE: 1264
    FILE: Bank_AMOS.cc
    
    Features not available
    Initialize Display .Loading AssemblyStats...[8.95s]
    .Loading Features...      [0.01s]
    .Loading Libraries...     [0.00s]
    .Loading Scaffolds....Loading Contigs...       [186.21s]
    ....Loading NCharts...       [21.83s]
    . [217.01s]
    Loading Contig 1... [0.05s] 109076 reads
    Loading reads...         [343.52s]
    Total Load Time: [803.92s]
    Loading mates ..................................................
    inserts: 108933 mated: 0 matelisted: 0 unmated: 108933 happy: 0 unhappy: 0
    Paint: coverage contigs insetcovfeat readcovfeat features inserts
    width: 12457 swidth: 778 height: 26357..
    Killed
    [yaximik@G5NNJN1 ~]$
    What viewer can be used to view the assembly?
    When the AMOS file format was implemented, I tested Hawkeye, Tablet, and Bank-transact.

    You can submit a ticket and I will eventually look at that, but this feature has not really changed since it was implemented.

    What is the hardware (memory, processor, video card) on which you are running Hawkeye?

    For visualization, I am working on Ray Cloud Browser.


  • yaximik
    replied
    Quote:
    If not, is it an average fragment length in the library?

    Yes.

    Quote:
    Such as surmised from BioAnalyzer trace, for example?

    Yes, but the BioAnalyzer will also include sequencing adapters in the evaluation whereas these are not included in sequencing reads usually.
    How is the average length calculated? I guess after reads are aligned to the assembly, correct? But I thought that assembly depends on paired-end information, so unless I am wrong there is a logical short circuit here - paired reads are distanced based on the assembly, which depends on the distance between paired reads.
    It is not that I am maliciously after how the algorithm was designed. I am trying to guess where such a discrepancy between the Bioanalyzer and the assembler is coming from. Could it be that Bioanalyzer traces for libraries are so misleading that I really have no idea about the size of the libraries I am sequencing? Or is autocalc somehow misled in its library size estimation?


  • yaximik
    replied
    I tried to view the AMOS.afg file (37.1 GB) using a couple of programs. Tablet is painfully slow, and it eventually quit, reporting an error on some line. Hawkeye (AMOS package) successfully imported the assembly into a bank and even opened a graphic window showing contig 1, but then hung forever and had to be killed.
    Code:
    [yaximik@G5NNJN1 ~]$ hawkeye
    START DATE: Mon Mar 11 11:06:54 2013
    Bank is: /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk
        0%                                            100%
    AFG ..................................................
    Messages read: 175403161
    Objects added: 175403161
    Objects deleted: 0
    Objects replaced: 0
    END DATE:   Mon Mar 11 12:13:09 2013
    Opening /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk... [160.12s]
    Indexing Contigs   .......... [83.11s] 107326772 reads in 1409913 contigs
    Scaffold information not available
    Mates not available:WHAT: Could not open bank file, /home/yaximik/AssRefMap/SC/Ray/RayOutput/AMOS.afg.bnk/FRG.ifo, No such file or directory
    LINE: 1264
    FILE: Bank_AMOS.cc
    
    Features not available
    Initialize Display .Loading AssemblyStats...[8.95s]
    .Loading Features...      [0.01s]
    .Loading Libraries...     [0.00s]
    .Loading Scaffolds....Loading Contigs...       [186.21s]
    ....Loading NCharts...       [21.83s]
    . [217.01s]
    Loading Contig 1... [0.05s] 109076 reads
    Loading reads...         [343.52s]
    Total Load Time: [803.92s]
    Loading mates ..................................................
    inserts: 108933 mated: 0 matelisted: 0 unmated: 108933 happy: 0 unhappy: 0
    Paint: coverage contigs insetcovfeat readcovfeat features inserts
    width: 12457 swidth: 778 height: 26357..
    Killed
    [yaximik@G5NNJN1 ~]$
    What viewer can be used to view the assembly?


  • seb567
    replied
    Originally posted by yaximik View Post
    There has got to be another reason. The assembly file from minia has a maximum contig length of 16091 nt. Without this dataset, Ray produced an assembly with a maximum contig/scaffold length of 46428 nt.
    Then the problem is presumably caused by the lack of support for multiline fasta files for reads in Ray.

    Please do submit a ticket if you feel this should be fixed.
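
    If multiline FASTA input really is the cause, one possible workaround is to rewrite the file so that each sequence sits on a single line before passing it to Ray. The following is only a minimal sketch of that idea; the function names and the file names in the usage note are hypothetical and are not part of Ray.

```python
def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with each wrapped sequence joined into one string."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        yield header, "".join(chunks)


def write_single_line_fasta(in_path, out_path):
    """Rewrite in_path so every record is one header line plus one sequence line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in unwrap_fasta(src):
            dst.write(f"{header}\n{seq}\n")
```

    Calling write_single_line_fasta("contigs.fasta", "contigs.single.fasta") (both file names are placeholders) would write the unwrapped copy, which could then be listed in the Ray configuration instead of the original file.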


    That is puzzling. The combined adaptor length (both sides) is standard at 120 bp, so autocalc is then way off (600-120=480, but the estimate is ~150). Obviously a much smaller library size should affect scaffolding. Would it be better to provide real numbers? Also, I guess a narrower distribution should be better, correct? This can be done by refractionating the library and collecting a narrow distribution, say +/-5%.
    You can plot your distributions.

    LibraryStatistics.txt contains averages, but you have all the signal in Library0.txt, Library1.txt. If you are using the git version of Ray, this information is now in LibraryData.xml
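
    As a rough illustration of inspecting those per-library files, the sketch below computes the mean and standard deviation of a distance histogram. It assumes each line of Library0.txt holds two whitespace-separated integers, an observed distance followed by its count; that layout is an assumption here and should be checked against the actual file. The function name is hypothetical.

```python
import math


def summarize_library(lines):
    """Return (mean, standard deviation, number of pairs) from 'distance count' lines.

    Assumed layout: two whitespace-separated integers per line.
    Lines that do not fit this layout are skipped.
    """
    total = 0
    weighted_sum = 0.0
    weighted_squares = 0.0
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue
        try:
            distance, count = int(fields[0]), int(fields[1])
        except ValueError:
            continue  # skip headers or other non-numeric lines
        total += count
        weighted_sum += distance * count
        weighted_squares += distance * distance * count
    if total == 0:
        raise ValueError("no observations found")
    mean = weighted_sum / total
    variance = max(weighted_squares / total - mean * mean, 0.0)
    return mean, math.sqrt(variance), total
```

    With an open file handle, summarize_library(handle) returns values that can be compared with the averages reported in LibraryStatistics.txt; the same (distance, count) pairs could also be passed to any plotting tool to see the shape of the distribution.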


  • yaximik
    replied
    The maximum read length is 65536 nucleotides.
    There has got to be another reason. The assembly file from minia has a maximum contig length of 16091 nt. Without this dataset, Ray produced an assembly with a maximum contig/scaffold length of 46428 nt.

    The 600 bp +/- 15% presumably includes adapters that are not in sequencing reads.
    That is puzzling. The combined adaptor length (both sides) is standard at 120 bp, so autocalc is then way off (600-120=480, but the estimate is ~150). Obviously a much smaller library size should affect scaffolding. Would it be better to provide real numbers? Also, I guess a narrower distribution should be better, correct? This can be done by refractionating the library and collecting a narrow distribution, say +/-5%.


  • seb567
    replied
    Originally posted by yaximik View Post
    Hi,
    What is the meaning of averageOuterDistance and standardDeviation for paired end files?
    The outer distance is the sum of the gap size, the length of the left read and the length of the right read.

    This is computed for paired reads and mate pairs.


    Is it just average read length in the dataset?
    No.

    If so, then why is it not required for a single-read file?
    It only applies to pairs.


    If not, is it an average fragment length in the library?
    Yes.

    Such as surmised from BioAnalyzer trace, for example?
    Yes, but the BioAnalyzer will also include sequencing adapters in the evaluation whereas these are not included in sequencing reads usually.


    If so, then the default autocalc may give a very wrong estimate, couldn't it? For example, one of my paired-read runs was done with a library of 600 bp +/- 15%, but during assembly the autocalc estimate was something like 150 bp - how can this be so far off?
    The 600 bp +/- 15% presumably includes adapters that are not in sequencing reads.

    You can run another application on your data (like ABySS) and you'll see that Ray's right.
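
    The outer-distance definition given above, and the adapter correction discussed for the Bioanalyzer size, can be written out as a short sketch. The function names and the read lengths and gap in the example are hypothetical; the 600 bp library size and the 120 bp combined adapter length are the figures quoted in this thread.

```python
def outer_distance(left_read_length, gap_size, right_read_length):
    """Outer distance as defined above: left read + gap + right read."""
    return left_read_length + gap_size + right_read_length


def size_without_adapters(measured_size, combined_adapter_length):
    """Fragment size expected once the adapters on both ends are subtracted."""
    return measured_size - combined_adapter_length


# Hypothetical pair: two 150-nt reads separated by a 180-nt gap.
print(outer_distance(150, 180, 150))
# Bioanalyzer size of 600 bp with 120 bp of adapters in total.
print(size_without_adapters(600, 120))
```

    Both calls give 480, which is the outer distance one would expect if the 600 bp figure included only the adapters on top of the sequenced fragment.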


  • seb567
    replied
    Originally posted by KirillK View Post
    Hi guys!

    Is there a way to provide a reference genome for Ray?

    cheers,
    KK
    You can provide reference genomes using the -search option.

    Code:
           -search searchDirectory
                  Provides a directory containing fasta files to be searched in the de Bruijn graph.
                  Biological abundances will be written to RayOutput/BiologicalAbundances
                  See Documentation/BiologicalAbundances.txt
    However, this will not be used to aid in the assembly. This option is useful to report biological abundances.

    See this paper for more information.
    Last edited by seb567; 03-11-2013, 05:39 AM. Reason: added Genome Biology reference


  • seb567
    replied
    Originally posted by yaximik View Post
    Hi,

    I tried to run Ray (maxkmer 32) on 2 x quad core RHEl58 with hyper-threading enabled:


    mpiexec -n 16 Ray <Ray.conf> and got the error:
    Code:
    ........
    Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SC-MILLib1-Herc2s10cFr1Fr2run2R1AdQ30.fastq (please wait...)
    [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SC-MILLib1-Herc2s10cFr1Fr2run2R1AdQ30.fastq (please wait...)
    [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run1R1AdQ30.fastq (please wait...)
    [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run1R1AdQ30.fastq (please wait...)
    [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run2R1AdQ30.fastq (please wait...)
    [Loader::load] File: /media/FantomHD/Data/MiSeq/SC/AdQ30/SCPfx3s25cFr3-150-200run2R1AdQ30.fastq (please wait...)
    [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SColdAll.fasta (please wait...)
    [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SColdAll.fasta (please wait...)
    [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SCallSanger.fasta (please wait...)
    [Loader::load] File: /media/FantomHD/AssRefMap/SC/SCold/SCallSanger.fasta (please wait...)
    [Loader::load] File: /home/yaximik/AssRefMap/SC/minia/SCMiSeqAllFGMGPGIGclean_k27.contigs.fasta (please wait...)
    [G5NNJN1:07040] *** Process received signal ***
    [G5NNJN1:07040] Signal: Segmentation fault (11)
    [G5NNJN1:07040] Signal code:  (128)
    [G5NNJN1:07040] Failing at address: (nil)
    --------------------------------------------------------------------------
    mpiexec noticed that process rank 0 with PID 7040 on node G5NNJN1 exited on signal 11 (Segmentation fault).
    The last file loaded was a file with fasta contigs from another assembler (minia). Does this mean contigs from other assemblers cannot be used in Ray?
    The maximum read length is 65536 nucleotides.
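
    To check whether a contig file from another assembler stays within that limit before loading it, a small script along these lines could be used. It is a sketch only: the function names are hypothetical, and it treats 65536 as the largest allowed length, as stated above.

```python
MAX_READ_LENGTH = 65536  # limit quoted in the reply above


def sequence_lengths(lines):
    """Yield (name, length) for each FASTA record, including wrapped sequences."""
    name = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def too_long(lines, limit=MAX_READ_LENGTH):
    """Return the (name, length) pairs whose length exceeds the limit."""
    return [(name, length) for name, length in sequence_lengths(lines) if length > limit]
```

    Passing an open file handle to too_long() returns an empty list when every sequence fits, in which case the length limit can be ruled out as the cause of the crash.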


  • KirillK
    replied
    Hi guys!

    Is there a way to provide a reference genome for Ray?

    cheers,
    KK
