Free & Open Environment for RNA-seq analysis: Galaxy (http://usegalaxy.org)

suparmin replied

11-07-2016, 01:02 AM
Hello everyone, my name is Andrean. I'm starting to analyze my RNA sequencing data, and I want to make a heatmap and run a GO analysis. Could you tell me how to do this using R or another open-source program? Thank you.
Zapages replied

12-06-2014, 01:16 PM
I posted a partial solution on iPlant's community forum on how to view Cuffdiff output from Galaxy in cummeRbund: http://ask.iplantcollaborative.org/q...-gff3gtf-file/

I hope this helps everyone in the future.
Peppe replied

11-28-2012, 03:36 PM
Thanks Jeremy.
I will use the Galaxy user community: http://user.list.galaxyproject.org/
jgoecks replied

11-28-2012, 03:24 PM
@Peppe re: JGI gene annotation

No need to run that workflow on the annotation in your case. Just make sure that your gene annotation's contig names (column 1) match the names in your genome's FASTA file.
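
The name check described above can be sketched in Python. This is only a minimal illustration, assuming a tab-separated annotation file and FASTA headers whose first word is the sequence name; the function names and the sample records are made up for the example.

```python
def fasta_names(fasta_lines):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_names(gtf_lines):
    """Collect the values of column 1 from a tab-separated GTF/GFF file."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        names.add(line.split("\t", 1)[0].strip())
    return names


def unmatched_contigs(gtf_lines, fasta_lines):
    """Return annotation contig names that are absent from the FASTA file."""
    return sorted(annotation_names(gtf_lines) - fasta_names(fasta_lines))


# Made-up records mirroring a "scaffold" vs "chrscaffold" mismatch.
fasta = [">scaffold_1 some description", "ACGT", ">scaffold_2", "GGCC"]
gtf = [
    'chrscaffold_1\tJGI\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'scaffold_2\tJGI\texon\t1\t4\t.\t+\t.\tgene_id "g2"; transcript_id "t2";',
]
print(unmatched_contigs(gtf, fasta))
```

Any name this reports exists in the annotation but not in the genome, so features on that contig cannot be matched to mapped reads.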

J.
jgoecks replied

11-28-2012, 03:22 PM
@Peppe

Cummerbund is the preferred option for visualizing Cuffdiff output, but it's not yet integrated into Galaxy.

However, there is a scatterplot that's handy for visualizing FPKM data for different conditions: click on the visualize icon at the bottom of a Cuffdiff expression dataset and plot column 10 (sample 1 FPKM) vs. column 14 (sample 2 FPKM). And, yes, you may want to filter for only significantly expressed genes/transcripts (using the Filtering tool).
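
Outside Galaxy, the same two steps could look roughly like the Python sketch below. It assumes a downloaded tab-separated file with a header row, takes the 1-based column numbers mentioned above as defaults, and assumes the differential expression table has a column named "significant" with yes/no values; the function names and sample data are hypothetical.

```python
import csv
import io


def fpkm_pairs(handle, col_x=10, col_y=14):
    """Yield (x, y) FPKM pairs from two 1-based columns of a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < max(col_x, col_y):
            continue  # skip rows that are too short to hold both columns
        yield float(row[col_x - 1]), float(row[col_y - 1])


def significant_rows(handle, column="significant", flag="yes"):
    """Return the rows of a tab-separated table whose flag column says yes."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if (row.get(column) or "").lower() == flag]


# Made-up 14-column table: FPKM values sit in columns 10 and 14.
header = "\t".join(f"col{i}" for i in range(1, 15))
row = "\t".join(["x"] * 9 + ["12.5"] + ["x"] * 3 + ["3.0"])
pairs = list(fpkm_pairs(io.StringIO(header + "\n" + row + "\n")))
print(pairs)

# Made-up differential expression table with a yes/no significance flag.
diff = io.StringIO("gene\tsignificant\ngeneA\tyes\ngeneB\tno\n")
hits = [r["gene"] for r in significant_rows(diff)]
print(hits)
```

The pairs can then be handed to any plotting library to draw the sample 1 vs. sample 2 scatterplot.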

Finally, these questions are best asked on the galaxy-user mailing list, where you can communicate with the Galaxy user community: http://user.list.galaxyproject.org/

Good luck,
J.
Peppe replied

11-26-2012, 04:30 PM
As you can see from my previous messages, I am new to this field. I am using the public Galaxy server, and for the first time I ran a sample through TopHat-Cufflinks-Cuffcompare-Cuffdiff. Can anyone tell me how to get the data generated by Cuffdiff? Is there any tool in Galaxy for visualizing it? I am interested in gene/transcript differential expression testing. Should I select only the genes with significance YES?
Thank you
Peppe replied

11-19-2012, 09:34 PM
Hello,
I tried to run Cufflinks after TopHat mapping, and as a reference annotation it requires a file in GTF format. I downloaded the genome annotation from the JGI website (I am working with a fungus whose genome was sequenced in 2010). The annotation file that I have is labeled GTF, but when I extract it, it becomes GFF, a format not supported by Cufflinks in Galaxy. I converted my file with the workflow found at https://main.g2.bx.psu.edu/u/jeremy/...with-cufflinks. The generated genome annotation is in GTF format (the only difference that I see is in the first column, where scaffold becomes chrscaffold), but when I ran Cufflinks all the output values were 0. Now I am running Cufflinks again without the genome annotation to see if there is any difference in the results. I was wondering whether the conversion that I did is correct, whether anyone knows another way to do it, or whether I should email the JGI people to see if they have the genome annotation I need in GTF format. I don't think I can go ahead with the analysis without a genome annotation in GTF.
Thanks
Peppe replied

11-19-2012, 08:32 PM
Thanks jgoecks,
Yes, I am using the public Galaxy server.
As you said, the job took a while to start, but after that it was quick enough. The server is being updated now, so I have to wait. After that I will go ahead with Cufflinks, Cuffcompare, Cuffmerge and Cuffdiff. I am still not sure what comes out and how to handle the data, but I will figure it out. This is the first time that I am going through an RNA-seq analysis, and having a forum like this is very helpful.
Thanks a lot,
Peppe
jgoecks replied

11-19-2012, 10:40 AM
@Peppe

I assume you're using the public Galaxy server at https://main.g2.bx.psu.edu/ , yes?

If so:

(a) Galaxy will work fine on Windows, though you'll want to use Firefox or Chrome as your Web browser going forward so that you can use all of Galaxy's functionality.

(b) The server is very busy right now, so it may take a couple days for your job to start. Do not restart the job or it will go to the end of the wait list. Once your job starts, it should go quickly (4-8 hours is a good estimate) because your genome is small.

Best,
J.
Peppe replied

11-18-2012, 11:22 AM
Hi all,
I am new to the forum and also to the RNA-seq analysis field.
I just got the results of my RNA sequencing and I am trying to map my reads using TopHat on Galaxy. I am working with Windows 7. After the FastQC analysis, I converted my reads with FASTQ Groomer and then I ran TopHat. It has been 2 days and the process hasn't started yet.

Does TopHat (in Galaxy) run on Windows 7?
How long does a mapping analysis usually take (the reference genome is about 20 Mb)?

Thanks
jgoecks replied

08-17-2012, 08:41 AM
Your data is almost certainly not solexa format; most newer Illumina data is already fastqsanger, in which case the groomer is not needed.

See the Wikipedia entry for FASTQ for more details: http://en.wikipedia.org/wiki/FASTQ_format

You should be able to look at the first few reads of your datasets to determine the FASTQ format.
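
One rough way to do that check in Python is to look at the lowest quality character in the first reads. This is a heuristic sketch, not a Galaxy tool: it assumes four-line FASTQ records, uses the character ranges given in the Wikipedia entry (Sanger starts at ASCII 33, Solexa at 59, Illumina 1.3+ at 64), and the function names and sample reads are made up.

```python
def quality_lines(fastq_lines, max_reads=1000):
    """Yield the quality string (4th line) of the first max_reads records."""
    for index, line in enumerate(fastq_lines):
        if index >= 4 * max_reads:
            break
        if index % 4 == 3:
            yield line.rstrip("\n")


def guess_encoding(fastq_lines, max_reads=1000):
    """Guess the quality offset from the lowest quality character seen."""
    codes = [ord(ch) for qual in quality_lines(fastq_lines, max_reads) for ch in qual]
    if not codes:
        return "unknown"
    lowest = min(codes)
    if lowest < 59:
        return "offset 33 (fastqsanger)"  # only Phred+33 uses characters this low
    if lowest < 64:
        return "offset 64 (solexa)"  # 59-63 only occur with Solexa scores
    return "offset 64 (solexa or older Illumina), not certain"


# Made-up reads: '#' (ASCII 35) can only appear in Phred+33 data.
newer = ["@r1", "ACGT", "+", "II5#"]
older = ["@r2", "ACGT", "+", "hhhB"]
print(guess_encoding(newer))
print(guess_encoding(older))
```

The last case stays hedged because a short sample of high-quality Sanger reads can also contain only characters at or above ASCII 64.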

Best,
J.
weijenc replied

08-16-2012, 07:35 PM
Problem in grooming

Thanks for the reply to my previous post.

I have been trying to work with the paired-end dataset (SRR131208, two files). However, after grooming (Solexa to FASTQ Sanger), the quality values are all between 0 and 5. Did I do something wrong?

Thanks,

WJ
jgoecks replied

08-13-2012, 01:59 PM
Trimming PE reads in Galaxy

@weijenc

My suggestion for trimming paired-end reads in Galaxy is:

(1) Join them using the FASTQ joiner;
(2) Filter them using the Filter FASTQ tool;
(3) Split them using the FASTQ splitter.
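
The idea behind the three steps is that both mates are judged as one unit, so a pair is either kept whole or dropped whole. A minimal Python sketch of that idea follows; it does not reproduce the Galaxy tools, and the record layout (name, sequence, quality), the Phred+33 offset, and the minimum-base-quality criterion are all assumptions made for the example.

```python
def filter_pairs(forward, reverse, min_quality=20, offset=33):
    """Keep a pair only when every base of both mates passes; drop both otherwise."""
    kept_forward, kept_reverse = [], []
    for fwd, rev in zip(forward, reverse):
        joined_qual = fwd[2] + rev[2]  # (1) join the mates' quality strings
        passes = bool(joined_qual) and min(
            ord(ch) - offset for ch in joined_qual
        ) >= min_quality  # (2) filter the joined read as one unit
        if passes:
            kept_forward.append(fwd)  # (3) split back into two files
            kept_reverse.append(rev)
    return kept_forward, kept_reverse


# Made-up pairs: 'I' is Q40 and '#' is Q2 with a Phred+33 offset.
forward = [("r1/1", "ACGT", "IIII"), ("r2/1", "ACGT", "II#I")]
reverse = [("r1/2", "TTGA", "IIII"), ("r2/2", "TTGA", "IIII")]
kept_fwd, kept_rev = filter_pairs(forward, reverse)
print([r[0] for r in kept_fwd], [r[0] for r in kept_rev])
```

Because the two output lists always have the same length and order, downstream tools still see a properly paired dataset.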

Best,
J.
weijenc replied

08-13-2012, 05:02 AM
Trimming Paired-End Data

Hello,

If I trim my Illumina dataset, which contains paired-end 100 bp reads, with a quality threshold of 20, would both reads in a pair be removed if one of them has a base quality < 20? My worry is that when I use the trimmed dataset for de novo assembly, a program might complain that the dataset is not paired-end if both reads of a pair are not removed together.

Thanks,

WJ
SilviaBCE replied

07-16-2012, 08:19 AM
Nucleotide bias in a specific position of the reads- Galaxy analysis

Hi, I'm analyzing my small-RNA-seq data (Illumina 1.9 quality scores) and I'm using Galaxy for the preliminary QC tasks. I find it a great and easy tool! I'm here to ask how I can interpret a graph: the nucleotide distribution chart after sample grooming and 3' adapter trimming. I attach it here so anybody can see it. So far I've loaded two samples into Galaxy, and they both show this kind of bias at the 3rd nucleotide of the reads. What does it mean? Would you suggest eliminating all reads that contain an "N" in the 3rd position?
Any suggestion would be appreciated! Thanks a lot.
Attached file: Galaxy29-[Draw_nucleotides_distribution_chart_on_sample2_after_trimming].png