**GenoMax** · 10-01-2014, 03:12 AM

My question was about the version of unix/linux you are using. Are you the "administrator" or is there someone you can ask help from?

What program from the fastx_toolkit are you most interested in? There are other options that may allow you to move on with your analysis. We can offer suggestions once we know what you are trying to do.

**Nanu** · 10-01-2014, 05:32 AM

I am the administrator, and I have to learn RNA-seq data analysis. I am new to NGS analysis. I first ran FastQC and got failures for k-mer content and duplication level, plus a warning for per base sequence content. So I tried to trim the sequences. I don't know whether I am on the right track. Can you suggest the steps?

**GenoMax** · 10-01-2014, 06:03 AM

"Failure" on some aspects of FastQC does not immediately indicate that you have a bad dataset. In some cases duplication of regions is expected/normal.

If you are looking to trim your sequences, try BBDuk. It is simple to use and does not need any compilation or installation. Alignment (with BBMap, TopHat, or STAR; only one aligner is needed) followed by DESeq2 (an R package) is a fairly standard path for RNA-seq data analysis. Sequences, aligner indexes, and annotations are available for several model organisms here: http://support.illumina.com/sequenci...e/igenome.html

There is a nice tutorial for TopHat here: http://www.nature.com/nprot/journal/....2012.016.html. Brian has examples of BBMap usage in this thread: http://seqanswers.com/forums/showthread.php?t=41057. The DESeq2 vignette provides an excellent introduction: http://www.bioconductor.org/packages...c/beginner.pdf. The Subread package includes featureCounts (http://bioinf.wehi.edu.au/subread-package/), which you will need for counting reads.

**Michael Love** · 10-01-2014, 07:53 AM

Thanks GenoMax. Just a warning: on October 14, 2014, the Beginner's vignette PDF will move to a Bioconductor workflow hosted here:

http://bioconductor.org/help/workflows/

It will be called something like "RNA-Seq at the gene level".

Writing it up as a workflow allows us to explore other downstream analyses using other packages and not worry about build/check timing.

**Nanu** · 10-07-2014, 09:22 AM

Thanks Genomax,

I tried the NGSQC toolkit for quality control. Please guide me on how to convert a .fna file to a fastq file. Separately, I am also installing TopHat. I installed Boost and then got a message reporting 11 failures, 8 skipped, and the rest upgraded. What should I do in both cases?

**Brian Bushnell** · 10-07-2014, 09:57 AM

Nanu,

You can convert *.fna (another name for fasta) to *.fastq with reformat, but note that the resulting output will not have valid scores. Why are you attempting to do that?

And FYI, BBMap is substantially easier to install than Tophat; you just unzip it.

**GenoMax** · 10-07-2014, 10:09 AM

You should get pre-compiled binaries for TopHat. That would simplify things. Binaries are available from the same page where you downloaded the source code.

**Nanu** · 10-07-2014, 08:02 PM

Bushnell,

I have to do sequence-based trimming of adapters. The trimmer scripts need .fastq format.

**Nanu** · 10-07-2014, 08:03 PM

OK, I will try BBMap as well.

**Nanu** · 10-07-2014, 08:04 PM

I have to do sequence-based trimming of adapters. The trimmer scripts need .fastq format. Is there any way to trim the .fna and .qual files without conversion? If conversion is needed, please guide me on that too.

**Brian Bushnell** · 10-07-2014, 08:41 PM

Nanu,

Reformat can change fasta + qual into fastq, like this:

reformat.sh in=reads.fna qfin=reads.qual out=reads.fastq

BBDuk can directly trim the fasta (fna) files, or do both at the same time, for example -

bbduk.sh in=reads.fna qfin=reads.qual out=reads.fastq ktrim=r k=25 mink=12 hdist=1 ref=truseq.fa

Adapter files (truseq and nextera) are included with the BBTools package.
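To make clear what the fasta + qual conversion above involves, here is a minimal awk-based sketch of the idea. It is not how reformat.sh is implemented, and reformat.sh remains the tool to use in practice. The function name `fasta_qual_to_fastq` and the demo file contents are hypothetical. The sketch assumes the two files list the same reads in the same order, that the .qual file holds space-separated integer Phred scores, and that the output uses the common Phred+33 encoding.

```shell
#!/bin/sh
# Sketch only: merge a FASTA file and a matching QUAL file into FASTQ.
# Assumes both files list the same reads in the same order.
fasta_qual_to_fastq() {
    # $1 = FASTA file, $2 = QUAL file; FASTQ (Phred+33) goes to stdout.
    awk '
        function emit() {
            if (name == "") return
            if (length(seq) != length(qual[n])) {
                print "length mismatch for read " name > "/dev/stderr"
                bad = 1
                exit 1
            }
            printf "@%s\n%s\n+\n%s\n", name, seq, qual[n]
        }
        FNR == NR {                       # first file on the command line: QUAL
            if (/^>/) { q++; next }       # header starts a new record
            for (i = 1; i <= NF; i++)     # each score becomes one character
                qual[q] = qual[q] sprintf("%c", $i + 33)
            next
        }
        /^>/ { emit(); n++; name = substr($0, 2); seq = ""; next }
        { seq = seq $0 }                  # join wrapped sequence lines
        END { if (!bad) emit(); exit bad }
    ' "$2" "$1"
}

# Demo with two tiny made-up reads.
demo=$(mktemp -d)
printf '>read1\nACGT\n>read2\nGG\nTT\n' > "$demo/reads.fna"
printf '>read1\n40 40 30 2\n>read2\n10 10\n20 20\n' > "$demo/reads.qual"
fasta_qual_to_fastq "$demo/reads.fna" "$demo/reads.qual"
rm -rf "$demo"
```

The length check matters because a read whose sequence and score counts disagree cannot be written as a valid FASTQ record, so the sketch stops with a non-zero exit status instead of emitting one.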

**Nanu** · 10-09-2014, 09:23 PM

I would like to thank everybody; thanks to your help I completed the previous steps. Now I need more help. I have indexed the reference genome with bowtie2-build. Now I am running TopHat 2.0.13 as follows:
./tophat2 /home/me/Downloads/bowtie2-2.2.3/*.bt2/ /home/me/Downloads/bin/Sample_L1_R1_trim.fastq

Then I found the following error:
[2014-10-10 10:51:30] Beginning TopHat run (v2.0.13)
-----------------------------------------------
[2014-10-10 10:51:30] Checking for Bowtie
Bowtie version: 2.1.0.0
[2014-10-10 10:51:30] Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (/home/himanshu/Downloads/bowtie2-2.2.3/*.bt2/.*.bt2)

**GenoMax** · 10-10-2014, 02:59 AM

Please go through the command-line examples of how to run a typical TopHat analysis. Although this article is for TopHat v.1.0, the basic principles are the same for TopHat v.2 (http://www.nature.com/nprot/journal/....2012.016.html).

Hint: On your tophat command line provide path to "prefix" of your index files i.e. if your index files are named human*.bt2, then you need to provide only the "human" part to the command (with full path, in case the files are not in the current directory).
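As a minimal sketch of that hint, the shell snippet below works out the index prefix from the Bowtie 2 file names and prints a TopHat command instead of running it. The helper name `index_prefix`, the `human` prefix, the temporary directory, and the read file name are hypothetical placeholders. The sketch assumes a standard small index, whose files are named `<prefix>.1.bt2` through `<prefix>.4.bt2` plus `<prefix>.rev.1.bt2` and `<prefix>.rev.2.bt2`.

```shell
#!/bin/sh
# Sketch only: find the Bowtie 2 index prefix in a directory.
index_prefix() {
    # $1 = directory holding the index; prints "<dir>/<prefix>" on success.
    dir=$1
    for f in "$dir"/*.1.bt2; do
        case "$f" in
            *.rev.1.bt2) continue ;;   # skip the reverse-index file
        esac
        [ -e "$f" ] || continue        # the glob matched nothing
        printf '%s\n' "${f%.1.bt2}"    # strip the ".1.bt2" suffix
        return 0
    done
    echo "no Bowtie 2 index found in $dir" >&2
    return 1
}

# Demo with empty placeholder files (not a real index).
demo=$(mktemp -d)
for part in 1 2 3 4 rev.1 rev.2; do
    : > "$demo/human.$part.bt2"
done

prefix=$(index_prefix "$demo")
# Print, rather than run, the call: index prefix first, then the reads.
echo "tophat2 $prefix Sample_L1_R1_trim.fastq"
rm -rf "$demo"
```

The point is that the first argument is the path plus the shared base name of the index files, with no wildcard and no `.bt2` extension.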

**Nanu** · 10-13-2014, 01:29 AM

Dear Genomax,
When I used --prefix to give the path, it showed the following:
./tophat --prefix= /home/me/Downloads/bowtie2-2.2.3/bowtei/ /home/me/Downloads/bin/Sample_L1_R1_trim.fastq
tophat: option --prefix not recognized
for detailed help see http://tophat.cbcb.umd.edu/manual.html

What should I do then?

**GenoMax** · 10-13-2014, 03:42 AM

Originally posted by Nanu:

When I use the reformat.sh command from the BBTools package, I get the following error:
java -ea -Xmx200m -cp /home/himanshu/Downloads/me2/bbmap/current/ jgi.ReformatReads -in=reads.fna qfin=reads.qual out=reads.fasta
Executing jgi.ReformatReads [-in=reads.fna, qfin=reads.qual, out=reads.fasta]

Input is being processed as unpaired
Exception in thread "Thread-1" java.lang.AssertionError
at stream.FastaQualReadInputStream3.makeRead(FastaQualReadInputStream3.java:257)
at stream.FastaQualReadInputStream3.toReadList(FastaQualReadInputStream3.java:147)
at stream.FastaQualReadInputStream3.toReads(FastaQualReadInputStream3.java:113)
at stream.FastaQualReadInputStream3.fillBuffer(FastaQualReadInputStream3.java:97)
at stream.FastaQualReadInputStream3.hasMore(FastaQualReadInputStream3.java:56)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:745)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:737)

Please help me

Himanshu: Don't post questions in a thread that is originally about a completely different topic. This is not going to help you get an answer since your post will not be visible to someone who can answer the question.

For example, this question would be more appropriate in the BBTools thread (search the forum to find it).
