This topic is closed.
This is a sticky topic.

  • apfejes
    replied
    Hi kmcarr - thanks for the clarification. I was under the impression that Gerald was simply one step in the process, rather than a wrapper around the Eland calls. It's getting harder and harder to keep on top of all of the different aligner formats and pipelines.

    For the record, I rarely use Eland output of any form myself. We mainly use Maq here and I expect we'll be moving to SAM/BAM based formats in the future.



  • kmcarr
    replied
    Originally posted by apfejes
    Hi ka123$,
    I should also mention that the "-aligner" format used sets the format and some of the behaviours of FindPeaks. If you've selected "-aligner eland", then FindPeaks expects the files you provide to be in the Eland format. I don't know what format Gerald uses, but I'm certain it's not the same as the output from the Eland aligner.
    Anthony,

    Actually the GERALD output is the appropriate place to look. GERALD.pl is a wrapper script which (among other things) calls the Eland aligner. The output from Eland is then placed in the "GERALD_<DD-MM-YYYY>_<USERNAME>" folder. Included in that output is the s_N_eland_extended.txt, s_N_eland_multi.txt, s_N_export.txt and s_N_sorted.txt. As you stated the s_N_sorted.txt file should be able to be used in FindPeaks directly. (I've never done it myself so I can't speak from experience.)

    After looking at your link above I think the problem may be that Kal needs to specify elandext as the "-aligner" parameter. While the program is still called "Eland", the standard "eland" invocation is essentially deprecated. The program is now almost always invoked (through GERALD) using "eland_extended".
    Last edited by kmcarr; 09-28-2009, 01:11 PM. Reason: Add bit about eland_extended



  • apfejes
    replied
    Hi ka123$,

    kmcarr is right - Gerald is an intermediate program along the way from the sequencing machine to getting results. It's not an appropriate place to look for files to work with FindPeaks.

    If your problem is with the sorting and pre-processing, you might consider using the s_N_sorted.txt produced by findPeaks. It's pre-sorted, so it should make your life easier.

    I should also mention that the "-aligner" format used sets the format and some of the behaviours of FindPeaks. If you've selected "-aligner eland", then FindPeaks expects the files you provide to be in the Eland format. I don't know what format Gerald uses, but I'm certain it's not the same as the output from the Eland aligner.

    As for the problem you're seeing, I'm not sure why 2.3M reads would cause an out of memory error. However, I suspect that despite allocating 2 GB of RAM, the machine you're using actually has less than that free. (-Xmx2G sets the maximum the application is allowed to use, not the actual amount available.) I've certainly sorted much larger files than that with the SortFiles program, although I do tend to use a machine with more than 2 GB of RAM, so I don't see that problem myself.
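    The point about -Xmx can be checked before launching the sort. The following is a minimal sketch, assuming a Linux machine where the text of /proc/meminfo is available; the function names and the sample values are hypothetical and only illustrate comparing a requested heap size against the memory the machine reports as available. It does not diagnose the error itself.

```python
import re


def mem_available_bytes(meminfo_text):
    """Return MemAvailable (or MemFree as a fallback) in bytes, or None if absent."""
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines of interest look like "MemAvailable:   900000 kB".
        match = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2)) * 1024
    return fields.get("MemAvailable", fields.get("MemFree"))


def heap_fits(requested_bytes, meminfo_text):
    """True if the requested maximum heap is no larger than the reported available memory."""
    available = mem_available_bytes(meminfo_text)
    return available is not None and requested_bytes <= available


# Made-up sample in the /proc/meminfo layout (values are illustrative only).
SAMPLE_MEMINFO = """MemTotal:        2048000 kB
MemFree:          512000 kB
MemAvailable:     900000 kB
"""
```

    On a real machine the text would come from reading /proc/meminfo, and the requested size would be whatever was passed to -Xmx (2 * 1024**3 bytes for -Xmx2G).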

    I'm happy to try helping, but I think you need to clarify a few things for me. What aligner are you using, and what commands are you using? If we settle on one aligner, I can point you in the right direction as to the work flow you're using, and if I can see the commands you're using, I can check to see if any of the parameters should be changed.

    Cheers,

    Anthony



  • Ka123$
    replied
    Thanks to both kmcarr and apfejes !
    I did believe that GERALD generates the Eland format files. But when I used the GERALD files to run SeparateReads from FindPeaks, with ELAND as the aligner name, it gave me an error saying that it was a wrong aligner name, hence I needed confirmation as to whether what I thought was actually correct or not.
    I don't know why it said that.
    Did I have to use GERALD.fa or the export file? Not sure.

    Why did I need to use GERALD instead of the aligned files?
    The reason is that when I use the FindPeaks tools to convert my aligned files to wig files, I need to go through the separate and sort steps. When I run SeparateReads on bowtie-aligned files, I get just one gi|......|.......|.part.bowtie.gz, which contains the contigs, each with a name like gi|.....|.....|, along with their positions w.r.t. the reference.

    Why did I get only one gi|........ file although I have separated it? If I sort this file, either gzipped or gunzipped, I get a memory error: whenever I use SortFiles on it I get a Java heap error at 2300000 lines read.
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.String.substring(Unknown Source)
    at java.lang.String.subSequence(Unknown Source)
    at java.util.regex.Pattern.split(Unknown Source)
    at java.lang.String.split(Unknown Source)
    at java.lang.String.split(Unknown Source)
    at src.lib.ioInterfaces.BowtieIterator.next(BowtieIterator.java:145)
    at src.lib.ioInterfaces.BowtieIterator.next(BowtieIterator.java:20)
    at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
    at src.fileUtilities.SortFiles.main(SortFiles.java:79)

    although I use -Xmx2G........


    So we thought we could use GERALD to separate the reads into individual chromosomes and then sort each individual chromosome instead?

    Any suggestions?



  • kmcarr
    replied
    Kal,

    Bustard and GERALD are not files with a format in the sense you are asking. Bustard and GERALD are pipelines for processing Illumina short-read data. They generate many different output files with many different formats.

    The Bustard pipeline performs base calling starting with signal intensity information. The primary output of the Bustard pipeline is a set of qseq files. These files are in a format peculiar to Illumina, which contains the read ID, base calls and quality scores for each read on a single line as a set of tab-separated values. Bustard may output other files (e.g. qval, prb) depending on options supplied when the pipeline is launched.
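    As an illustration of the "one read per line, tab-separated" layout described above, here is a minimal sketch of a qseq line parser. It assumes the commonly documented 11-column qseq layout (machine, run, lane, tile, x, y, index, read number, bases, qualities, filter flag) with "." standing for an uncalled base; that layout is not stated in this thread, and the QseqRead type and the way the read ID is assembled are hypothetical choices made for the example.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; not part of any Illumina tool.
QseqRead = namedtuple("QseqRead", ["read_id", "bases", "qualities", "passed_filter"])


def parse_qseq_line(line):
    """Parse one qseq line, assuming the 11-column tab-separated layout."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    machine, run, lane, tile, x, y, index, read_no, bases, quals, flt = fields
    # Assemble an ID from the positional fields (format chosen for illustration).
    read_id = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no}"
    # Uncalled bases are assumed to be written as "." and are mapped to "N".
    return QseqRead(read_id, bases.replace(".", "N"), quals, flt == "1")


# Made-up example line; the values are illustrative only.
EXAMPLE_LINE = "HWUSI-EAS100R\t1\t3\t42\t1020\t877\t0\t1\tACGT.ACG\tabcdefgh\t1\n"
```

    Calling parse_qseq_line(EXAMPLE_LINE) gives a record whose bases and qualities strings have the same length, which is a cheap sanity check when reading real files.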

    GERALD is the pipeline for performing alignments using one of two different aligners supplied with the Pipeline software. The first aligner, PhageAlign, is only useful for very small genomes and data sets and is almost never used, so I will forgo any further mention of it. The primary aligner supplied with the Illumina pipeline is Eland. GERALD calls the Eland aligner and passes it a set of configuration parameters. Eland outputs a number of files which all have similar (but slightly different) formats. Some examples of the files generated by Eland are s_N_eland_extended.txt and s_N_eland_multi.txt (where N = lane number from the Illumina run). These files basically list each read, its sequence and quality scores, where it matches the reference sequence and what mismatches exist between the read and the reference. Which files Eland generates and the details of their format depend on the arguments used when invoking Eland. GERALD may also be used to output sequence files in FASTQ format.



  • apfejes
    replied
    Hi Ka123$,

    Gerald and bustard are files produced by the Illumina Pipeline, as far as I know, and neither one should contain useful information about the origin of a fragment. Only output from an aligner can be used in the context of peak finding.

    For a list of formats accepted by FindPeaks, please see the following page:

    Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.


    If you're having an error with Eland files, please let me know what it is, and I'll try to fix it.

    Anthony



  • Ka123$
    replied
    If I directly perform the separate reads and sort reads steps on the GERALD alignment files, what type of aligner do I need to specify? Specifying GERALD/Eland gives me an error in FindPeaks:
    Error: Did not recognize aligner type: GERALD/Eland
    Error: Please check that you have not made a spelling mistake when providing the alignment type
    I get the same error if I specify only Eland, so what aligner type should be used for GERALD files from Solexa?



  • Ka123$
    replied
    What kind of formats are the BUSTARD and GERALD files from Solexa?



  • nathan.genome
    replied
    hello everybody

    hello everybody

    i am working on a resequencing project. i have a reference genome and a set of sanger pairmates from a genotype. i identified a list of structural variations. i want to visualize them. Can i use lookseq ?

    thanks
    nathan



  • Ka123$
    replied
    Thanks a lot. I will try all the options you gave me and let you know how it worked for me.



  • apfejes
    replied
    I seem to recall that bowtie is able to produce .map files, which would be pre-sorted and directly readable by FindPeaks without breaking them up into chromosomes. That might be a good first pass to try. (Assuming this is SET data. If it's PET data, you'll need to do the pairing anyhow, so SeparateReads wouldn't have been the right path to take.)

    I suppose I should also mention that running SortReads.jar on .gz bowtie files *should* work. If you could send me the error you're getting, I may be able to track down the reason why it's not working for you.

    And finally, I should probably also mention that bowtie seems to be doing something funny to your chromosome names. I don't use bowtie myself, but someone had previously reported to me that there was an option you can use to get more "sane" chromosome names. I would suggest you take a look - it may help you out downstream.
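    Independently of any bowtie option, the long "gi|...|ref|...|" style names can be simplified after the fact. The sketch below is one hypothetical way to do that, assuming the names follow the NCBI "gi|number|database|accession|" pattern seen in this thread; the function name and the example names are made up for illustration, and it is not a FindPeaks or bowtie feature.

```python
def simplify_reference_name(name):
    """Reduce an NCBI-style 'gi|...|ref|ACCESSION|' name to the accession alone.

    Names that do not follow that pattern are returned with any trailing
    description (text after the first whitespace) removed.
    """
    tokens = name.split()
    token = tokens[0] if tokens else name
    # Drop empty pieces so a trailing "|" does not produce a blank field.
    parts = [p for p in token.split("|") if p]
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    return token
```

    For example, a made-up name such as "gi|12345|ref|NT_000000.1| some description" becomes "NT_000000.1", while a name like "chr1" is left unchanged. Shorter names without "|" characters are also safer to use in output file names.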



  • Ka123$
    replied
    sortpeaks

    Yeah sure,
    I had a huge set of human seq reads that I aligned using bowtie. I need to convert this bowtie alignment into wig files, so I have been using SeparateReads as the first step in converting to wig. This worked fine and I got a gi|22XXXXXX|ref|NT_XXXXXX.12|.bg.bowtie file, and also the same with .part.bowtie, after I ran SeparateReads.
    On this file (uncompressed) I then ran SortFiles with a -Xmx2G heap specified, but after some lines it gives me a memory error.
    I tried running SortFiles on the gzipped SeparateReads output, but it did not work. The file was not recognisable or something.

    Is it the bowtie mapped reads that is the problem and so I might need to use GERALD instead directly?
    Or is it the separate reads/sortreads problem?
    Hope this helps. I appreciate any suggestions in this matter.
    I found findpeaks very cool but unfortunately not working for me now....



  • apfejes
    replied
    Hi Ka123,

    There are other ways to do the sort, including several methods you could try from the Linux command line. However, I'm really not sure why it's taking so much memory. Could you give me a few ideas as to what your work flow is?
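    One such alternative is an external merge sort, which keeps only a bounded number of lines in memory at a time. The following is a minimal sketch of that technique, not the SortFiles implementation. It assumes bowtie's default tab-separated output, with the reference name in the third column and the alignment offset in the fourth; the function names and the default column indices are assumptions to adjust for other formats.

```python
import heapq
import os
import tempfile


def sort_key(line, chrom_col=2, pos_col=3):
    """Sort by reference name, then numerically by position (assumed columns)."""
    fields = line.rstrip("\n").split("\t")
    return (fields[chrom_col], int(fields[pos_col]))


def _write_chunk(lines, key):
    """Sort one in-memory chunk and write it to a temporary file."""
    lines.sort(key=key)
    tmp = tempfile.NamedTemporaryFile("w", delete=False, suffix=".chunk")
    with tmp:
        tmp.writelines(lines)
    return tmp.name


def external_sort(in_path, out_path, chunk_lines=500_000, key=sort_key):
    """Sort a large alignment file while holding at most chunk_lines lines in memory."""
    chunk_paths, chunk = [], []
    with open(in_path) as src:
        for line in src:
            if not line.endswith("\n"):
                line += "\n"  # make sure the last line is newline-terminated
            chunk.append(line)
            if len(chunk) >= chunk_lines:
                chunk_paths.append(_write_chunk(chunk, key))
                chunk = []
    if chunk:
        chunk_paths.append(_write_chunk(chunk, key))

    # Merge the sorted chunks lazily; only one line per chunk is held at a time.
    handles = [open(path) for path in chunk_paths]
    try:
        with open(out_path, "w") as dst:
            dst.writelines(heapq.merge(*handles, key=key))
    finally:
        for handle in handles:
            handle.close()
        for path in chunk_paths:
            os.remove(path)
```

    The chunk size trades memory for the number of temporary files: a smaller chunk_lines lowers peak memory but opens more files during the merge step.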

    In the meantime, documentation and an example command for SeparateReads can be found here:

    Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.



  • Ka123$
    replied
    Solexa findpeaks

    Using the FindPeaks sort reads step on the bowtie-mapped alignment is taking up too much memory! So I am trying to use the GERALD mapped reads directly from Solexa to convert to wig files. I believe the Solexa GERALD mapped alignments are in ELAND format?
    So the aligner type will be -aligner eland, to run SeparateReads.jar?
    Any suggestions?



  • Dinny
    replied
    Hi Anthony,
    Thanks again for the advice. Taking the reads directly into .bed would be better. The .map converter in Bowtie needs a library file created in Maq, so it would be easier to limit the number of applications the data goes through...less opportunity to completely jumble it.
    Couldn't see a way to align straight to .map, but I'll look again.
    Dinny

