**Ka123$** · 09-30-2009, 03:31 PM

Hi Anthony,
Here is what I am doing. We have decided to stick with the GERALD files and convert them to wig (PI's orders!).
I checked for unaligned files and there were none.
I have a .export file with an s_#_export.txt.
java -Xmx2G -jar SeparateReads.jar elandext 7_XXXXXX_GERALD-YYYY-MM-DD.export G_sep_7
Version: Initializing class SeparateReads $Revision: 1082 $
Version: Initializing class Generic_AlignRead_Iterator $Revision: 1318 $
Version: Initializing class Log_Buffer $Revision: 1145 $
Version: Initializing class ElandExtIterator $Revision: 832 $
Exception in thread "main" java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at src.lib.ioInterfaces.ElandExtIterator.next(ElandExtIterator.java:180)
at src.lib.ioInterfaces.ElandExtIterator.next(ElandExtIterator.java:20)
at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
at src.fileUtilities.SeparateReads.main(SeparateReads.java:69)

It looks like GERALD outputs a .txt file. How can I specify the aligner type for GERALD output? Neither elandext nor eland_extended works.

Is there a way to convert a .txt Solexa export file directly to .wig in FindPeaks?

**apfejes** · 09-30-2009, 03:40 PM

Hi Ka123$,

Thanks for the detailed report! I've managed to re-create the problem by parsing a data set that is similar. I observed that the iterator crashes on reads marked with "QC", so I've modified the code in order to reject those reads.

I can do two things for you. The first is that I can compile the code for you and send you the latest version via email. The second is that I can check in the code changes so that you can check it out and compile it yourself. Either option is open.

Thanks again for the very helpful bug report!

Anthony

Edit: The code has been checked in to the repository, if you're interested in building from scratch.
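
For readers who want to pre-filter an export file themselves, here is a minimal sketch of the idea of rejecting "QC" reads before parsing the position. It is not the FindPeaks code. It assumes the usual tab-separated GERALD export layout, with the match chromosome in column 11 and the match position in column 13 (1-based), and it assumes that reads flagged "QC" or "NM" carry an empty position field, which is consistent with the `For input string: ""` error above. The function name `aligned_reads` is hypothetical.

```python
# Assumed column layout of a GERALD *_export.txt line (0-based, tab-separated):
# index 10 = match chromosome ("QC" for reads that failed quality control,
#            "NM" for reads with no match), index 12 = match position.
CHROM_COL = 10
POS_COL = 12


def aligned_reads(lines):
    """Yield (chromosome, position) only for lines with a usable alignment."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= POS_COL:
            continue  # truncated or malformed line
        chrom, pos = fields[CHROM_COL], fields[POS_COL]
        # Skip reads with no parsable position; int("") would raise
        # ValueError, the Python analogue of Java's NumberFormatException.
        if chrom in ("QC", "NM") or not pos:
            continue
        yield chrom, int(pos)
```

Usage would look like `for chrom, pos in aligned_reads(open(path)): ...`, with `path` pointing at the export file.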

**Ka123$** · 09-30-2009, 03:59 PM

Thanks so much, Anthony! If you could compile it and email it to me, that would be great! I really appreciate it.

**apfejes** · 09-30-2009, 07:15 PM

Hi Ka123$,

I'm sorry - I can't seem to find your email address. Could you send it to me again? I'll package up a copy for you in the morning.

Anthony

**Ka123$** · 10-09-2009, 10:45 AM

Hi apfejes,
I sent you my email address last week and was wondering whether you got it. Could you please check again? I am also sending you an email that references this thread, so you can reply to that. Thanks.

**apfejes** · 10-09-2009, 11:18 AM

Hi Ka123$,

I emailed it to you last week. If it didn't arrive, it may have been too large. Can you check the maximum attachment size your email account accepts? The attachment was 10.6 MB.

If that's the case, please let me know, and I'll arrange to host it somewhere for you.

Anthony

**eslondon** · 10-17-2009, 01:43 AM

Same log problem, probably silly but still...

When using SeparateReads.jar I have no issues with a single input file; all works fine. If I try to use it the way described in the example, i.e. using the asterisk to pass several input files from one directory, it decides that it should write the output into one of the inputs...

Here is the command line:
java -jar ~/programs/VancouverShortRead/fp4/SeparateReads.jar bowtie /data/bioinfo/302KC/*.map /data/bioinfo/Analysis/mapping/brain/

And here is the output:
Error: Coundn't create log file : /data/bioinfo/302KC/HCT449_brain_s_2_sequence.fastq.map/SeparateReads.log

Any tips?

thanks

Elia

**eslondon** · 10-17-2009, 01:47 AM

Hmmm... must be something silly that I will regret having posted... I have the same problem with SortFiles as well... rather than taking in all *.gz, it takes the first one, and assumes the second one is the location for the log file.... could it be a shell/environment issue?

Update: fails also without using asterisk... basically it allows only one input file, and takes the 2nd input file as the output directory

Elia
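
One way this symptom could arise is sketched below. The shell expands `*.map` into separate arguments before the program starts, so a program that reads fixed positions (aligner, one input, output directory) would treat the second expanded file name as the output directory. This is an assumption about the cause, not something taken from the SeparateReads source; both parsers and all file names are hypothetical.

```python
import posixpath


def parse_fixed(argv):
    """Hypothetical parser: expects exactly <aligner> <input> <output_dir>."""
    aligner, input_file, output_dir = argv[0], argv[1], argv[2]
    return aligner, [input_file], output_dir


def parse_variadic(argv):
    """Alternative: first argument is the aligner, last is the output dir,
    and everything in between is an input file."""
    if len(argv) < 3:
        raise ValueError("need <aligner> <input>... <output_dir>")
    return argv[0], list(argv[1:-1]), argv[-1]


def log_path(output_dir):
    # Where a log file would be created under the chosen output directory.
    return posixpath.join(output_dir, "SeparateReads.log")


# What the program receives after the shell has expanded the glob
# (made-up file names for illustration).
expanded = ["bowtie", "/data/a.map", "/data/b.map", "/out/"]

_, _, wrong_dir = parse_fixed(expanded)
_, inputs, right_dir = parse_variadic(expanded)
print(log_path(wrong_dir))  # log path lands "inside" the second input file
print(log_path(right_dir))  # log path lands in the intended directory
```

If the tool really does take a single input, a shell loop that calls it once per file avoids the problem.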

**apfejes** · 10-17-2009, 12:57 PM

Hi Elia,

The short answer is that you don't need to run separateReads/SortReads on map files, as the reads they contain are already sorted by chromosome and start position. Of course, if you're trying to do something other than run FindPeaks with them, that's a different story.

Edit: I should probably also add that it's not a good idea to try. These two particular utilities were intended only for use with text format files - not pre-sorted binary files. I've never tested it out on a .map file.

Anthony

**greggrant** · 10-27-2009, 04:12 AM

Thanks for this list, that's really awesome. What do people think is the best way at this point to map approximately 5 million 100 bp reads to a transcriptome? I'm looking for alignment allowing (a specified number of) mismatches but no gaps. Thanks again for this list!

**dan** · 10-27-2009, 05:28 AM

Originally posted by greggrant:

Thanks for this list, that's really awesome. What do people think is the best way at this point to map approximately 5 million 100 bp reads to a transcriptome? I'm looking for alignment allowing (a specified number of) mismatches but no gaps. Thanks again for this list!

I'd map against the genome (you never know) using bowtie or SOAP.

You can look them up here:

http://seqanswers.com/wiki/Software

**greggrant** · 10-27-2009, 06:22 AM

Originally posted by dan:

I'd map against the genome (you never know) using bowtie or SOAP.

You can look them up here:

http://seqanswers.com/wiki/Software

Those options won't find reads that map across exon/exon junctions. I need something that can do ungapped mapping to the transcriptome. BLAST will probably do the trick, but there should be something faster.

**apfejes** · 10-27-2009, 07:11 AM

We use a database of all predicted/potential exon/exon junctions in addition to the genome, and then use maq/bwa - it seems to do very well.
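
As a rough illustration of what such a junction database can contain, the sketch below joins the end of one exon to the start of a later exon in the same gene. It is not the pipeline described above: the function name `junction_sequences`, the choice of `read_length - 1` flanking bases on each side, and the pairing of every earlier exon with every later one are all assumptions made for the example.

```python
from itertools import combinations


def junction_sequences(exons, read_length):
    """Return {(i, j): sequence} for every pair of exons i < j.

    exons: exon sequences of one gene, in transcript order.
    Each junction keeps at most read_length - 1 bases on either side,
    so a read of that length can only align there if it crosses the join.
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2")
    flank = read_length - 1
    return {
        (i, j): exons[i][-flank:] + exons[j][:flank]
        for i, j in combinations(range(len(exons)), 2)
    }


# Toy exons, not real sequence data.
toy_exons = ["AAAACCCC", "GGGGTTTT", "ACGTACGT"]
for pair, seq in junction_sequences(toy_exons, read_length=4).items():
    print(pair, seq)
```

The resulting sequences would be written out as FASTA and aligned against alongside the genome; reads hitting a junction entry then need their coordinates translated back to genomic positions.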

**ewilbanks** · 10-27-2009, 09:33 AM

Try TopHat http://tophat.cbcb.umd.edu/

It uses Bowtie to map reads and analyzes the mapping results to identify splice junctions between exons.

**Xi Wang** · 10-30-2009, 12:39 AM

Hi all,

Have you noticed a review on ChIP-seq and RNA-seq computational studies? It mentioned and summarized some available tools on ChIP-seq and RNA-seq data processing.

http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.1371.html

Review
Nature Methods 6, S22 - S32 (2009)
doi:10.1038/nmeth.1371
Computation for ChIP-seq and RNA-seq studies
Shirley Pepke, Barbara Wold & Ali Mortazavi

Best wishes,
Xi
