Software packages for next gen sequence analysis

This topic is closed.

This is a sticky topic.

greggrant replied

10-27-2009, 06:22 AM
Originally posted by dan View Post

I'd map against the genome (you never know) using bowtie or SOAP.

You can look them up here:

http://seqanswers.com/wiki/Software

Those options won't find reads that map across exon/exon junctions. I need something that can map ungapped to the transcriptome. BLAST will probably do the trick, but there should be something faster.
dan replied

10-27-2009, 05:28 AM
Originally posted by greggrant View Post

Thanks for this list, that's really awesome. What do people think is the best way at this point to map approximately 5 million 100 bp reads to a transcriptome? I'm looking for alignment allowing (a specified number of) mismatches but no gaps. Thanks again for this list!

I'd map against the genome (you never know) using bowtie or SOAP.

You can look them up here:

http://seqanswers.com/wiki/Software
greggrant replied

10-27-2009, 04:12 AM
Thanks for this list, that's really awesome. What do people think is the best way at this point to map approximately 5 million 100 bp reads to a transcriptome? I'm looking for alignment allowing (a specified number of) mismatches but no gaps. Thanks again for this list!
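
To make the requirement concrete, here is a minimal Python sketch of what "a specified number of mismatches but no gaps" means: the read is slid along a transcript and each placement is scored by substitutions only. The function names are made up for illustration. This is a brute-force picture of the criterion, not how an indexed aligner such as bowtie works internally, and it would be far too slow for 5 million reads.

```python
def hamming_within(read, ref_window, max_mismatches):
    """Return the mismatch count if it is <= max_mismatches, else None.

    Both strings must have the same length, because no gaps are allowed.
    """
    if len(read) != len(ref_window):
        raise ValueError("ungapped comparison needs equal lengths")
    mismatches = 0
    for a, b in zip(read, ref_window):
        if a != b:
            mismatches += 1
            if mismatches > max_mismatches:
                return None  # give up early once the budget is exceeded
    return mismatches


def ungapped_hits(read, transcript, max_mismatches):
    """Yield (offset, mismatches) for every ungapped placement of the read
    on the transcript with at most max_mismatches substitutions."""
    for offset in range(len(transcript) - len(read) + 1):
        window = transcript[offset:offset + len(read)]
        n = hamming_within(read, window, max_mismatches)
        if n is not None:
            yield offset, n
```

For example, `list(ungapped_hits("ACGT", "TTACGTTTACCT", 1))` reports an exact hit at offset 2 and a one-mismatch hit at offset 8.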
apfejes replied

10-17-2009, 12:57 PM
Hi Elia,

The short answer is that you don't need to run separateReads/SortReads on map files, as the reads they contain are already sorted by chromosome and start position. Of course, if you're trying to do something other than run FindPeaks with them, that's a different story.

Edit: I should probably also add that it's not a good idea to try. These two particular utilities were intended only for use with text format files - not pre-sorted binary files. I've never tested it out on a .map file.

Anthony

Last edited by apfejes; 10-18-2009, 08:20 AM. Reason: Additional information
eslondon replied

10-17-2009, 01:47 AM
Hmmm... must be something silly that I will regret having posted... I have the same problem with SortFiles as well... rather than taking in all *.gz, it takes the first one, and assumes the second one is the location for the log file.... could it be a shell/environment issue?

Update: it also fails without the asterisk. Basically, it accepts only one input file and treats the second input file as the output directory.

Elia

Last edited by eslondon; 10-17-2009, 02:09 AM.
eslondon replied

10-17-2009, 01:43 AM
Same log problem, probably silly but still...

When using SeparateReads.jar with only one input file, everything works fine. If I try to use it in the way described in the example, i.e. using the asterisk to pass it several input files from one directory, it decides that it should try to write the output into one of the inputs...

Here is the command line:
java -jar ~/programs/VancouverShortRead/fp4/SeparateReads.jar bowtie /data/bioinfo/302KC/*.map /data/bioinfo/Analysis/mapping/brain/

And here is the output:
Error: Coundn't create log file : /data/bioinfo/302KC/HCT449_brain_s_2_sequence.fastq.map/SeparateReads.log

Any tips?

thanks

Elia
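
A possible workaround, sketched in Python: the shell expands `*.map` into several arguments before Java sees them, so if the tool reads its arguments positionally (aligner, one input, output directory, as the command above suggests), the second file lands in the output-directory slot. The sketch builds one command per input file instead. The helper name is hypothetical, and the argument order is an assumption taken from the command line in this post, not from the FindPeaks documentation.

```python
import glob
import os


def per_file_commands(jar, aligner, pattern, out_dir):
    """Build one SeparateReads argv per file matching pattern.

    Assumes the positional order: aligner, single input file, output dir.
    """
    # Without a shell, "~" is not expanded automatically, so do it here.
    jar = os.path.expanduser(jar)
    commands = []
    for path in sorted(glob.glob(pattern)):
        commands.append(["java", "-jar", jar, aligner, path, out_dir])
    return commands

# Possible usage (requires Java and the jar to be present):
#   import subprocess
#   for cmd in per_file_commands("~/programs/VancouverShortRead/fp4/SeparateReads.jar",
#                                "bowtie", "/data/bioinfo/302KC/*.map",
#                                "/data/bioinfo/Analysis/mapping/brain/"):
#       subprocess.run(cmd, check=True)
```

Whether successive runs into the same output directory append to or overwrite earlier results is worth checking first; a separate output directory per input file may be the safer choice.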
apfejes replied

10-09-2009, 11:18 AM
Hi Ka123$,

I emailed it to you last week. If it didn't arrive, the attachment (10.6 MB) may have been too large. Can you check the maximum attachment size your email account can accept?

If that's the case, please let me know, and I'll arrange to host it somewhere for you.

Anthony
Ka123$ replied

10-09-2009, 10:45 AM
Hi apfejes,
I sent you my email ID last week. I was wondering whether you got it. Could you please check again? I am sending you an email about this thread, and you can reply to me there. Thanks
apfejes replied

09-30-2009, 07:15 PM
Hi Ka123$,

I'm sorry - I can't seem to find your email address. Could you send it to me again? I'll package up a copy for you in the morning.

Anthony
Ka123$ replied

09-30-2009, 03:59 PM
Thanks so much, Anthony! If you could compile it and email it to me, that would be great! I appreciate it so much!
apfejes replied

09-30-2009, 03:40 PM
Hi Ka123$,

Thanks for the detailed report! I've managed to re-create the problem by parsing a data set that is similar. I observed that the iterator crashes on reads marked with "QC", so I've modified the code in order to reject those reads.

I can do two things for you. The first is that I can compile the code for you and send you the latest version via email. The second is that I can check in the code changes so that you can check it out and compile it yourself. Either option is open.

Thanks again for the very helpful bug report!

Anthony

Edit: The code has been checked in to the repository, if you're interested in building from scratch.
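
For anyone who cannot update right away, a pre-filter along the following lines might work as a stopgap. It is only a sketch in Python, not the actual change made to the FindPeaks iterator. It drops lines whose match-chromosome field is "QC" or whose position field is not an integer. The column indices (0-based 10 and 12) are an assumption about the tab-separated export layout and should be checked against your own files.

```python
def usable_export_lines(lines, chrom_col=10, pos_col=12):
    """Yield only the lines that a strict integer parser could handle.

    chrom_col and pos_col are 0-based column indices and are assumptions
    about the export layout; adjust them to match your files.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(chrom_col, pos_col):
            continue  # truncated or malformed line
        if fields[chrom_col] == "QC":
            continue  # read flagged as failing quality control
        try:
            int(fields[pos_col])
        except ValueError:
            continue  # empty or non-numeric position
        yield line

# Possible usage:
#   with open("in.export") as src, open("filtered.export", "w") as dst:
#       dst.writelines(usable_export_lines(src))
```

Note that this also discards any other read without a numeric position, which is usually what you want before separating and sorting.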

Last edited by apfejes; 09-30-2009, 03:42 PM.
Ka123$ replied

09-30-2009, 03:31 PM
Hi Anthony,
Here is what I am doing. We have decided to stick with the GERALD files and convert them to wig (PI's orders!).
I checked for unaligned files and there were none.
I have a .export file with an s_#_export.txt
java -Xmx2G -jar SeparateReads.jar elandext 7_XXXXXX_GERALD-YYYY-MM-DD.export G_sep_7
Version: Initializing class SeparateReads $Revision: 1082 $
Version: Initializing class Generic_AlignRead_Iterator $Revision: 1318 $
Version: Initializing class Log_Buffer $Revision: 1145 $
Version: Initializing class ElandExtIterator $Revision: 832 $
Exception in thread "main" java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at src.lib.ioInterfaces.ElandExtIterator.next(ElandExtIterator.java:180)
at src.lib.ioInterfaces.ElandExtIterator.next(ElandExtIterator.java:20)
at src.lib.ioInterfaces.Generic_AlignRead_Iterator.hasNext(Generic_AlignRead_Iterator.java:103)
at src.fileUtilities.SeparateReads.main(SeparateReads.java:69)

It looks like GERALD outputs a .txt file. How can I specify the aligner type for GERALD output? Neither elandext nor eland_extended works.

Is there a way to convert a .txt Solexa export file directly to .wig in FindPeaks?
apfejes replied

09-30-2009, 02:11 PM
Hi Ka123$,

Once again, it would really help if you tell us what the error is that you're seeing. The most common errors are:

- Trying to write to a directory without permissions
- Missing a parameter (FindPeaks won't start without it and throws an error)
- An incorrect parameter (FindPeaks won't start with an invalid parameter)

If you tell us what error you've got, I might be able to narrow it down.

EDIT:
Is the error above the same one? I think this is probably a path problem. You're trying to write to a directory called p_7_ger in the directory from which you're launching the jar program. Does that directory already exist?

Anthony
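
A quick way to rule out the path and permission causes before launching the jar is a small pre-flight check like the Python sketch below. The function name is made up for illustration, and the check only covers what the operating system reports. It says nothing about how FindPeaks itself handles its arguments.

```python
import os


def check_output_dir(path):
    """Return a list of reasons why a log file could not be created in path.

    An empty list means the directory exists and looks writable.
    """
    problems = []
    if not os.path.exists(path):
        problems.append("does not exist")
    elif not os.path.isdir(path):
        problems.append("exists but is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append("is not writable by the current user")
    return problems

# Possible usage:
#   issues = check_output_dir("p_7_ger")
#   if issues:
#       print("p_7_ger " + "; ".join(issues))
```

If the directory is simply missing, creating it first with `os.makedirs(path, exist_ok=True)` is one way to proceed.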
Ka123$ replied

09-30-2009, 02:08 PM
Can anyone tell me why the FindPeaks SeparateReads.jar command cannot create a log file when I use the GERALD-aligned files or the bowtie-aligned files?
For the GERALD-aligned files I specified elandext or eland_extended as the aligner type.

The bowtie-aligned files were giving me problems when I ran them through FindPeaks to separate and sort, so I am converting the GERALD files directly to wig files, although GERALD is probably not a better choice than bowtie for alignment.
Any suggestions?
Ka123$ replied

09-28-2009, 06:49 PM
First off, thanks so much for all your guidance, from both of you!
I really appreciate it so much!

I previously tried using the bowtie aligner. Because it gave me only one separatefile.gz, which I could not make sense of, we reverted to using the GERALD alignment directly for separating and sorting. But here are the commands I used with the bowtie aligner:

Secondly, I followed the bowtie commands to do my alignment:
./bowtie -a -v 2 -f h_X_GERALD.fa h_sap (did I have to use the -chr here???)

I used findpeaks cmds here:
java -jar -SeparateReads.jar elandext p_align_copy p_7_ger

(before I had problems using this for gerald and it said aligner format not recognised, so according to the blog I used elandext
java -jar SeparateReads.jar elandext p_align_copy p_7_ger
Error: Couldn't create log file : p_7_ger/SeparateReads.log)

for sort reads previously I have used this cmd:
java -jar Sort* bowtie g_sort_7 p_7_ger/*.bowtie

(Although it ran, it sometimes gave me memory problems.)