**westerman** · 08-21-2013, 10:49 AM

I would use a tool designed to put RNAseq reads together. It has been a while since I used Velvet, but as far as I know it is designed to assemble genomes, not transcripts. My favorite RNAseq tool is 'Trinity'.

The above assumes that your reads are from transcripts and not from the entire genome.

I know that the above advice does not answer your specific questions. However, in case you did start down a poor path, I wanted to concentrate on correcting that instead of on specifics. I suppose I could answer #1 -- trimming. Seems ok. Not sure why you want to trim off the end of the sequence after quality trimming, but it won't hurt.

**nareshvasani** · 08-21-2013, 12:14 PM

Hi Westerman,

You are right, I have transcript reads. I forgot to mention that I have also used Oases: it is a post-assembly processor for Velvet that works as a transcriptome assembler.

I trimmed some bases from the end to improve the per-base GC content and per-base sequence content.
If you don't mind, can you please explain this command in detail:
fastq_quality_trimmer -Q33 -t 20 -l 50 -i -o
As far as I understand, it removes nucleotides with a quality score lower than 20 from the ends of the read. Furthermore, any trimmed reads shorter than 50 nt are discarded altogether.

Trinity is also good for transcriptome assembly, but I have never used it.
Can you please help with the parameters for the Trinity command line?

Trinity.pl --seqType fq --min_contig_length 150 --JM 10G --single inputfilename --CPU 2 --output output_filename

Which other options do I need to consider for running Trinity, such as those for Butterfly, Inchworm, kmer and Chrysalis?

Thanks for your input.
I would really appreciate your feedback

Naresh

**westerman** · 08-21-2013, 12:30 PM

Your understanding of fastq_quality_trimmer is correct. BTW, if you ever get paired-end sequences then use 'trimmomatic' instead since it works much better with PE reads.
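For anyone who wants the trimming logic spelled out, here is a minimal Python sketch of what the `-t 20 -l 50` options do as I understand them. This is not the FASTX-Toolkit code: the function name `trim_3prime` is made up for illustration, and it assumes Phred+33 qualities (the `-Q33` flag) and trimming from the 3' end only.

```python
def trim_3prime(seq, qual, threshold=20, min_len=50, offset=33):
    """Trim low-quality bases from the 3' end of one read.

    Returns (seq, qual) after trimming, or None if the trimmed read
    is shorter than min_len (i.e. the read would be discarded).
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    if end < min_len:
        return None  # too short after trimming, as with -l 50
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
kept = trim_3prime("A" * 60, "I" * 55 + "#" * 5)
dropped = trim_3prime("A" * 52, "I" * 47 + "#" * 5)
print(len(kept[0]), dropped)
```

The first read loses its five low-quality bases and is kept; the second falls below 50 nt after trimming and is dropped.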

The other parameters to Trinity will depend on the size of your computer system -- e.g., how much memory, how many CPUs -- but these parameters are not required and what you have should be good enough. I suggest running Trinity with the parameters you have and see what happens. In the end you should get a 'Trinity.fasta' file. The other files can be discarded.

Once you get a 'Trinity.fasta' file, you can use bowtie2 to back-map your reads or, perhaps better, use the Trinotate annotation pipeline described on the Trinity web site.

**nareshvasani** · 08-21-2013, 12:37 PM

Hi Westerman,

Thanks for your prompt reply.
I really appreciate your suggestion.
Do you think that adding more input options for Butterfly, Inchworm, kmer and Chrysalis would give me a better contig file?

Thanks,
Naresh

**westerman** · 08-21-2013, 12:46 PM

No. The options to Inchworm and Chrysalis are really performance-related. Butterfly has some non-performance-related options, but I would stick with the defaults unless you get something that seems weird. Really the only useful extra non-performance option is '--jaccard_clip', which is used on genomes with high gene density.

**nareshvasani** · 08-21-2013, 12:51 PM

Hi,

Thanks a lot.

Naresh

**Cofactor Genomics** · 08-22-2013, 05:37 AM

One metric you can look at to assess your assembly is the percentage of reads that align back to your transcriptome assembly.

**nareshvasani** · 08-22-2013, 06:01 AM

Hi Cofactor,

Can you please elaborate on how I can perform that?

Thanks,
Naresh

**Cofactor Genomics** · 08-22-2013, 06:12 AM

Well, your newly formed transcriptome assembly is your reference, the raw read data used to generate the assembly is your data and you treat it like a RNA-seq project. In this manner, you align the raw data (trimming does not matter here, this is just a QA check) to the assembly and divide the number of reads aligning to the assembly by the total number of reads that went into the assembly. This is just a rough check and one could argue that you will miss things, however it is good for a rough check.
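As a rough illustration of that division, here is a minimal Python sketch that computes the fraction of reads aligned from SAM records. The function name `mapping_rate` is hypothetical, and the sketch assumes single-end reads and a SAM file that still contains the unaligned reads (so that the total is meaningful).

```python
def mapping_rate(sam_lines):
    """Return (mapped, total, fraction) counted over primary SAM records."""
    mapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x100 or flag & 0x800:
            continue  # secondary/supplementary record, not a distinct read
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    fraction = mapped / total if total else 0.0
    return mapped, total, fraction


# Tiny made-up example: three reads, one of them unmapped,
# plus one secondary alignment that must not be counted twice.
records = [
    "@SQ\tSN:contig1\tLN:500",
    "r1\t0\tcontig1\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tcontig1\t9\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t272\tcontig1\t90\t1\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapping_rate(records))
```

To use it on a real file, pass an open file handle, e.g. `mapping_rate(open("aligned.sam"))`, where `aligned.sam` is a placeholder name.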

From these alignments, you may find some surprising results in that the percentage of aligned reads is pretty low. Non-transcriptome assemblers do not like to see large differences in coverage in an assembly, assuming these are repetitive areas that are piling up, but this kind of coverage variation is inherent to RNA data.

Did you perform any manipulations during library prep to treat the RNA for the assembly process, like double-stranded nuclease treatment (to compress the dynamic range of the sample)? This can greatly help an assembly if one is not too heavy-handed in the treatment.

It is hard to tell you what percentages are good and bad since I am not sure how the material was treated prior to sequencing or what your goals are for the assembly.

Hope this helps.

Jon Armstrong

**nareshvasani** · 08-22-2013, 07:07 AM

Hi,
Thanks for your prompt reply.
I didn't perform any manipulation during library prep.

Thanks in advance,
Naresh

**nareshvasani** · 08-22-2013, 07:21 AM

Hi westerman,

I am trying to align the reads in my fastq file to the Trinity.fasta file using Trinity's alignReads.pl script.

I used the command below:
/bin/util/alignReads.pl -seqType fq -single inputfile_name -target Trinity.fasta -aligner bowtie2 -p 4 -retain_intermediate_files -num_top_hits 20 -output align_bowtie_output

but I am getting the error below:
Must specify target_db and it must exist at that location at /bin/util/alignReads.pl line 180

I don't know what that means, as I am not good at reading scripts.

Hope you can help me out.
Thanks,
Naresh

**westerman** · 08-22-2013, 12:13 PM

Well, off-hand I would say that you need to give the whole path to the Trinity.fasta file. I suspect that it is not located in the directory in which you are located.

In these 'file not found' cases it is always helpful to the rest of us if you can include the results of:

pwd

and a

ls -l
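The same check can be done before launching a long run. Below is a minimal Python sketch that resolves a target to its full path and fails early if the file is not there; the helper name `check_target` is made up for illustration and is not part of Trinity.

```python
import os


def check_target(path):
    """Return the absolute path of an existing file, or raise FileNotFoundError."""
    full = os.path.abspath(path)  # resolves relative paths against the cwd
    if not os.path.isfile(full):
        # Report the cwd as well, since that is usually the cause.
        raise FileNotFoundError(f"{full} not found (cwd: {os.getcwd()})")
    return full


try:
    print(check_target("Trinity.fasta"))
except FileNotFoundError as err:
    print("Problem:", err)
```

The returned full path is what you would then hand to the target option of the alignment script.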

**nareshvasani** · 08-22-2013, 12:24 PM

Hi westerman,

Thanks for your reply.
The full path was missing from my command line.

The above command worked, but with bowtie, not with bowtie2.

With the command below:
/bin/util/alignReads.pl --seqType fq --single CombineIonXpressRNA_010_NareshPool_Chip1_2_WT2_fastxtrimmer_from_quality_trimmer.fastq --target /media/DATAPART3/Combine_Files/Velvetoptimiser_27_37/trinity_output/Trinity.fasta --aligner bowtie2 --retain_intermediate_files --num_top_hits 20 --output align_bowtie_output1

I get the following error:
which: no tophat2 in (/root/perl5/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/root/Trinity/util/../trinity-plugins/rsem/sam/)
Error, path to required tophat2 cannot be found at /bin/util/alignReads.pl line 234.

If I use bowtie instead of bowtie2, it works fine, with one problem: the SAM file has no header.

Hope you can help me out.
Thanks in advance.
Naresh
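On the missing SAM header: one possible workaround, not something suggested in this thread, is to rebuild a minimal header from the assembly itself, since each `@SQ` line only needs the contig name and length. The Python sketch below shows the idea; the function name `sam_header_from_fasta` is hypothetical, and it assumes the contig names in the SAM file are the FASTA IDs up to the first whitespace.

```python
def sam_header_from_fasta(fasta_lines):
    """Build minimal SAM header lines: one @HD line plus one @SQ per sequence."""
    lengths = {}  # insertion order is kept, so @SQ lines follow FASTA order
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID up to the first whitespace
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may span several lines
    header = ["@HD\tVN:1.0\tSO:unsorted"]
    header += [f"@SQ\tSN:{n}\tLN:{length}" for n, length in lengths.items()]
    return header


# Made-up two-contig example.
fasta = [">comp0_c0_seq1 len=8", "ACGT", "ACGT", ">comp1_c0_seq1", "AC"]
for row in sam_header_from_fasta(fasta):
    print(row)
```

The resulting lines would go at the top of the headerless SAM file, ahead of the alignment records.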

**westerman** · 08-23-2013, 07:45 AM

I am surprised that Trinity would be looking for tophat2. However bowtie2 is closely associated with tophat2. I suggest installing tophat2. You may never need it but that should be the way to get Trinity to use bowtie2.

Downstream RNA-seq analysis without reference genome
