**kevinrue** · 06-20-2013, 01:45 AM

Examples of STAR usage in documentation?

Dear Alex and Shawn,

Could you please try to run 2-3 jobs (without killing your server), pausing for 10 sec between them, and send me the Log.out output for each job?

Our server is slightly busy at the moment (Nicolas Nalpas is pulling all the blanket for himself), but I'll try asap.

It does make sense that jobs submitted simultaneously can't see the genome being loaded into shared memory (we have a looped script submitting a bunch of STAR jobs all within a few seconds from each other). For the record, we recently submitted 10 jobs in such a loop, and we went over our 256GB RAM + 4GB swap, which slowed the server down.
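For anyone with a similar submission loop, here is a minimal sketch of staggering the launches instead of firing them all within a few seconds. The 10-second default is simply the pause requested above for testing, not a verified safe value, and `launch_staggered` is a hypothetical helper name.

```python
import subprocess
import time


def launch_staggered(commands, delay_seconds=10,
                     popen=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing between launches.

    `popen` and `sleep` are injectable so the function can be dry-run
    without starting real processes.
    """
    processes = []
    for index, command in enumerate(commands):
        if index > 0:
            sleep(delay_seconds)  # pause before every job except the first
        processes.append(popen(command))
    return processes
```

Each entry in `commands` is an argument list such as the STAR command line you already use in your loop. The right delay probably depends on how long the genome takes to load on your machine.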

I think the documentation should include some examples because the explanation is a little confusing.

I support this idea, as we previously also had weird experiences with the outFilter(NminMatch) and (others) parameters, which seem to filter on the number of consecutive matches rather than the total number of matches, for instance. I'd rather have a detailed description (or maybe a short one and a more detailed one) of each option than a doubt that requires additional testing and guessing, which impinges on my real project time.

Regarding the original point, example usage would be appreciated too. Maybe users could actually participate in such an effort, as we could rapidly gather a diversity of applications along with the combinations of options we successfully used? If so, we'd need some other place than this thread to share our commands.

Kevin

**ymc** · 08-27-2013, 09:48 PM

Are there any plans to make STAR work for fusion genes? The existing tools are too slow for me...

**alexdobin** · 08-28-2013, 11:28 AM

Originally posted by ymc

Are there any plans to make STAR work for fusion genes? The existing tools are too slow for me...

STAR can detect chimeric alignments both "spanning" and "encompassing" chimeric junctions. However, you would need to do all the post-processing: filtering alignments, collapsing the chimeric junctions, annotating fused genes. There is some discussion about it on the STAR forum: https://groups.google.com/d/msg/rna-...U/yxj5C8LaovIJ

**ymc** · 09-04-2013, 06:13 PM

Originally posted by alexdobin

STAR can detect chimeric alignments both "spanning" and "encompassing" chimeric junctions. However, you would need to do all the post-processing: filtering alignments, collapsing the chimeric junctions, annotating fused genes. There is some discussion about it on the STAR forum: https://groups.google.com/d/msg/rna-...U/yxj5C8LaovIJ

Thanks for your reply. I think I can count spanning chimeric junctions from Chimeric.out.junction. How can I count "encompassing"? Which output file should I look at?

**alexdobin** · 09-05-2013, 02:07 PM

Originally posted by ymc

Thanks for your reply. I think I can count spanning chimeric junctions from Chimeric.out.junction. How can I count "encompassing"? Which output file should I look at?

The Chimeric.out.junction file contains the encompassing junctions as well. They are marked with -1 in column 7 (junction type). Of course, to assign the encompassing reads to a chimeric junction, you have to know the coordinates of the junction, or somehow cluster the inner ends of the encompassing mates.
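A minimal sketch of that count, assuming only what is stated above: the file is whitespace-delimited and column 7 holds the junction type, with -1 marking encompassing junctions. Every other value in column 7 is lumped together as "spanning" here, and `count_junction_types` is a hypothetical helper name.

```python
from collections import Counter


def count_junction_types(lines):
    """Tally encompassing (column 7 == -1) versus spanning junction rows."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines, comment lines, and rows too short to have column 7.
        if line.startswith("#") or len(fields) < 7:
            continue
        kind = "encompassing" if fields[6] == "-1" else "spanning"
        counts[kind] += 1
    return counts
```

To use it, pass an open file handle, for example `count_junction_types(open("Chimeric.out.junction"))`. This only counts rows; it does not assign encompassing reads to a particular junction, which still needs the clustering step described above.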

**wildtypegoose** · 09-13-2013, 08:06 AM

I am unable to get STAR to use the SHARED MEMORY option between different instances. I followed the instructions given at

http://seqanswers.com/forums/showpost.php?p=107990&postcount=53

I am trying to run two instances (with delays of at least 2-3 mins), but each individual instance seems to allocate its own memory:

$ top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5291 xxx 20 0 26.6g 25g 25g R 100 20.6 17:37.64 STAR
5299 xxx 20 0 26.6g 25g 25g R 100 20.4 13:18.87 STAR

What else can I try?

STAR Version: STAR_2.3.0e.Linux_x86_64

$ uname -a
Linux XXX 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux

$ ipcs
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x17000006 2588672 xxx 666 28271287966 2

------ Semaphore Arrays --------
key semid owner perms nsems

------ Message Queues --------
key msqid owner perms used-bytes messages

**ymc** · 09-15-2013, 05:39 AM

Can I use STAR's SAM output to call SNPs? If so, how?

**dpryan** · 09-15-2013, 05:53 AM

Originally posted by ymc

Can I use STAR's SAM output to call SNPs? If so, how?

Just search this site for "RNAseq SNP" for a plethora of examples, like this.

**alexdobin** · 09-16-2013, 07:00 AM

Originally posted by wildtypegoose

I am unable to get STAR to use the SHARED MEMORY option between different instances. I followed the instructions given at

http://seqanswers.com/forums/showpost.php?p=107990&postcount=53

I am trying to run two instances (with delays of at least 2-3 mins), but each individual instance seems to allocate its own memory:

$ top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5291 xxx 20 0 26.6g 25g 25g R 100 20.6 17:37.64 STAR
5299 xxx 20 0 26.6g 25g 25g R 100 20.4 13:18.87 STAR

What else can I try?

It seems to me that the shared memory is working fine, since ipcs shows two instances attached to the same shared memory segment (nattch is 2). What you see in 'top' is memory usage per process, and 25GB out of 26.6GB is shared for each process. To test it, you can try to run >5 jobs at the same time: they would not be able to run without sharing memory.
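To make that check repeatable, here is a rough sketch that pulls the segment size and attach count out of `ipcs` text. It assumes the Linux layout shown in the post above (key, shmid, owner, perms, bytes, nattch); other systems may print different columns. `shared_segments` is a hypothetical helper name.

```python
SAMPLE_IPCS = """
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x17000006 2588672 xxx 666 28271287966 2

------ Semaphore Arrays --------
key semid owner perms nsems
"""


def shared_segments(ipcs_output):
    """Return (shmid, size_in_bytes, nattch) for each shared memory row."""
    segments = []
    in_shared_section = False
    for line in ipcs_output.splitlines():
        if line.startswith("------"):
            # Section banners tell us which table the following rows belong to.
            in_shared_section = "Shared Memory" in line
            continue
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x17000006.
        if in_shared_section and len(fields) >= 6 and fields[0].startswith("0x"):
            segments.append((int(fields[1]), int(fields[4]), int(fields[5])))
    return segments
```

With the output quoted above, this reports a single segment of about 28 GB with two processes attached, which is what you would expect if both STAR instances share one genome copy.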

**wildtypegoose** · 09-17-2013, 06:53 AM

Originally posted by alexdobin

It seems to me that the shared memory is working fine, since ipcs shows two instances attached to the same shared memory segment (nattch is 2). What you see in 'top' is memory usage per process, and 25GB out of 26.6GB is shared for each process. To test it, you can try to run >5 jobs at the same time: they would not be able to run without sharing memory.

The global "used" memory (as reported by top) increases linearly if I run another STAR process, which made me think that probably the shared mem option is not working correctly. I'll try your suggestion of running >5 jobs once the server is free.

Thanks for your input!

**alexdobin** · 09-18-2013, 08:30 AM

Originally posted by wildtypegoose

The global "used" memory (as reported by top) increases linearly if I run another STAR process, which made me think that probably the shared mem option is not working correctly. I'll try your suggestion of running >5 jobs once the server is free.

Thanks for your input!

STAR will use 1-2 GB of memory per process for temporary storage and I/O buffers; however, the ~25GB of genome files are shared. The "used" memory reported by top includes "cached", and it's hard to determine how much physical RAM the process is actually using.
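As a back-of-the-envelope illustration of those numbers, the sketch below compares total memory with and without sharing. The 25 GB genome and the 2 GB upper bound per process come from this thread; the function itself is a hypothetical helper and ignores the page cache and any other system overhead.

```python
def total_memory_gb(jobs, genome_gb=25.0, per_job_gb=2.0, shared=True):
    """Estimate total RAM for `jobs` concurrent STAR processes."""
    if shared:
        # One genome copy in shared memory plus private buffers per job.
        return genome_gb + jobs * per_job_gb
    # Every job holds its own genome copy plus its private buffers.
    return jobs * (genome_gb + per_job_gb)


shared_estimate = total_memory_gb(6, shared=True)     # 25 + 6 * 2
private_estimate = total_memory_gb(6, shared=False)   # 6 * (25 + 2)
```

For six jobs this gives 37 GB with sharing versus 162 GB without, which is why running more than five jobs at once is a practical test of whether sharing works.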

**Nino** · 09-19-2013, 07:38 AM

Hello

I am happy with STAR but not too happy with the MQ scores. It is difficult to filter out reads based on their mapping quality score when you only have 255, 3, 2, 1, 0 (I believe 0 is one of them, I forget). Does anyone know of any program which can convert these 4 or 5 values to Phred-scaled values based on the CIGAR information of the read?

Thanks,
Nino

**dpryan** · 09-19-2013, 08:09 AM

Originally posted by Nino

Hello

I am happy with STAR but not too happy with the MQ scores. It is difficult to filter out reads based on their mapping quality score when you only have 255, 3, 2, 1, 0 (I believe 0 is one of them, I forget). Does anyone know of any program which can convert these 4 or 5 values to Phred-scaled values based on the CIGAR information of the read?

Thanks,
Nino

That turns out to be a surprisingly difficult thing to do, as you often end up needing to realign everything so that you know how many second-best alignments there are and what their score is (unless one of STAR's more verbose output modes provides this).
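Recalibrating the scores is the hard part, but filtering on the values STAR already writes is simple. The sketch below keeps only records whose MAPQ (column 5 of a SAM line) is in an accepted set. Treating 255 as "uniquely mapped" reflects my understanding of STAR's default behaviour, so please check it against the manual for your version; `filter_by_mapq` is a hypothetical helper name.

```python
def filter_by_mapq(sam_lines, accepted=frozenset({255})):
    """Yield header lines plus alignments whose MAPQ is in `accepted`."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep SAM header lines untouched
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) of a SAM record is the mapping quality.
        if len(fields) >= 5 and int(fields[4]) in accepted:
            yield line
```

Passing `accepted={255, 3}` would also keep reads carrying the next MAPQ value down. This does not produce Phred-scaled qualities; it only selects on the coarse categories.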

**Nino** · 09-24-2013, 06:45 AM

Hey Devon,

It turns out it is not so difficult, since a group from Case Western Reserve University, Cleveland, OH published a paper on a program they developed called LoQuM, which does exactly what I wanted. I have not tried the program yet, but here is the title of the article if you would like to read it yourself:

"Accurate estimation of short read mapping quality for next-generation genome sequencing"

Thanks,
Nino

**wildtypegoose** · 09-24-2013, 07:19 AM

Originally posted by alexdobin

STAR will use 1-2 GB of memory per process for temporary storage and I/O buffers; however, the ~25GB of genome files are shared. The "used" memory reported by top includes "cached", and it's hard to determine how much physical RAM the process is actually using.

Hi Alex,
You are right: I was able to run 7 STAR jobs simultaneously on our server with 128GB of memory. I used the LoadAndExit option of the genomeLoad flag to first load the genome into shared memory, and then used LoadAndRemove for all the simultaneous STAR jobs.
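In case it helps others, the two steps can be sketched as command builders like the ones below. The `--genomeLoad`, `--genomeDir`, `--readFilesIn`, `--outFileNamePrefix` and `--runThreadN` options are standard STAR parameters; the paths, thread count and function names are placeholders for illustration, not my actual setup.

```python
def preload_command(genome_dir, star="STAR"):
    """Command that loads the genome into shared memory and then exits."""
    return [star, "--genomeDir", genome_dir, "--genomeLoad", "LoadAndExit"]


def align_command(genome_dir, reads, prefix, threads=1, star="STAR"):
    """Command for one mapping job that attaches to the shared genome."""
    return [
        star,
        "--genomeDir", genome_dir,
        "--genomeLoad", "LoadAndRemove",
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]
```

Run the preload command once (for example with `subprocess.run(..., check=True)`) and wait for it to finish before starting the mapping jobs, so that every job finds the genome already in shared memory.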

Although memory usage was fine, I noticed that the server at times became unresponsive due to heavy I/O (as shown by the D state for many of the STAR processes in the "top" output). Any suggestions to avoid this bottleneck?

Thanks a lot!
