  • nareshvasani
    Member
    • Apr 2013
    • 57

    Downstream RNA-seq analysis without reference genome

    Hello, all SEQanswers community users,

    I am a biologist learning bioinformatics from scratch, and I am performing RNA-seq analysis for the first time.

    I have FASTQ files from an Ion Proton instrument. The reads are single-end, 50-340 nt long.
    I don't have a reference genome.
    Here is how I did my analysis:
    1] FastQC
    2] Trimming some reads using the FASTX-Toolkit:
    a] First I used fastq_quality_trimmer -Q33 -t 20 -l 50 -i -o, because some of the sequences have quality below 18
    b] fastx_trimmer, to trim off a few bases from the end of each read

    For de novo assembly:
    3] velveth with k-mer 31
    4] velvetg
    5] bowtie-build, to build a reference index from the contigs.fa file created by velvetg
    6] Mapping my FASTQ reads against the index built above

    So my questions are:
    1] Am I doing the trimming in the right manner?
    2] On what basis do you select the parameters of velvetg and velveth?
    3] Which k-mer value should I select?
    4] Am I running bowtie in the correct manner?
    5] If yes, how do I confirm that the assembly created by Velvet preserves the input information, and how do I check its accuracy?

    Hope you all can help me out.
    Thanks a bunch in advance.

    Naresh
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    I would use a tool designed to put RNA-seq reads together. It has been a while since I used Velvet, but as far as I know it is designed to assemble genomes, not transcripts. My favorite RNA-seq tool is 'Trinity'.

    The above assumes that your reads are from transcripts and not from the entire genome.

    I know that the above advice does not answer your specific questions. However, in case you did start down a poor path, I wanted to concentrate on correcting that instead of on specifics. I suppose I could answer #1 -- trimming. Seems OK. Not sure why you want to trim off the end of the sequence after quality trimming, but it won't hurt.


    • nareshvasani
      Member
      • Apr 2013
      • 57

      #3
      Hi Westerman,

      You are right, my reads are from transcripts. I forgot to mention that I have also used Oases, a post-assembly processor for Velvet that works as a transcriptome assembler.

      I trimmed some bases from the end to improve the per-base GC content and per-base sequence content.
      If you don't mind, can you please explain this command in detail:
      fastq_quality_trimmer -Q33 -t 20 -l 50 -i -o
      To my understanding, it removes nucleotides with a quality score lower than 20 from the ends of the read. Furthermore, any trimmed reads shorter than 50 nt are discarded altogether.

      Trinity is also good for transcriptome assembly, but I have never used it.
      Can you please help with the parameters for the Trinity command line?

      Trinity.pl --seqType fq --min_contig_length 150 --JM 10G --single inputfilename --CPU 2 --output output_filename

      Which other options do I need to consider when running Trinity, such as those for Butterfly, Inchworm, k-mer size and Chrysalis?

      Thanks for your input.
      I would really appreciate your feedback

      Naresh



      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        Your understanding of fastq_quality_trimmer is correct. BTW, if you ever get paired-end sequences then use 'trimmomatic' instead since it works much better with PE reads.
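
The trimming behaviour confirmed above can be sketched in a few lines of Python. This is only an illustration, not the FASTX-Toolkit source: the function name `trim_read` and its defaults are hypothetical, it assumes Phred+33 quality encoding (what -Q33 selects), and it trims from the 3' end only, which may not match the real tool in every detail.

```python
def trim_read(seq, qual, threshold=20, min_len=50, offset=33):
    """Trim low-quality bases from the 3' end of one read.

    Hypothetical helper for illustration; mirrors -t (threshold) and
    -l (minimum length) and assumes Phred+33 encoded qualities.
    Returns (seq, qual) after trimming, or None if the read is discarded.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    # Reads shorter than min_len after trimming are dropped altogether.
    if end < min_len:
        return None
    return seq[:end], qual[:end]
```

For example, a 60 nt read whose last 5 bases have quality 2 would be cut back to 55 nt and kept, while a read left with only 40 good bases would be discarded.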

        The other parameters to Trinity will depend on the size of your computer system -- e.g., how much memory, how many CPUs -- but these parameters are not required and what you have should be good enough. I suggest running Trinity with the parameters you have and see what happens. In the end you should get a 'Trinity.fasta' file. The other files can be discarded.

        Once you get a 'Trinity.fasta' file, you can use bowtie2 to back-map your reads or, perhaps better, the Trinotate annotation pipeline described on the Trinity web site.


        • nareshvasani
          Member
          • Apr 2013
          • 57

          #5
          Hi Westerman,

          Thanks for your prompt reply.
          I really appreciate your suggestion.
          Do you think that adding more options for Butterfly, Inchworm, k-mer size and Chrysalis would give me a better contig file?

          Thanks,
          Naresh



          • westerman
            Rick Westerman
            • Jun 2008
            • 1104

            #6
            No. The options to Inchworm and Chrysalis are really performance related. Butterfly has some non-performance-related options, but I would stick with the defaults unless you get something that seems weird. Really the only useful extra non-performance option is '--jaccard_clip', which is used for genomes with high gene density.


            • nareshvasani
              Member
              • Apr 2013
              • 57

              #7
              Hi,

              Thanks a lot.


              Naresh

              • Cofactor Genomics
                Registered Vendor
                • Jan 2010
                • 52

                #8
                One metric you can look at to assess your assembly is the percentage of reads that align back to your transcriptome assembly.


                • nareshvasani
                  Member
                  • Apr 2013
                  • 57

                  #9
                  Hi Cofactor,

                  Can you please elaborate on how I can perform that?

                  Thanks,
                  Naresh


                  • Cofactor Genomics
                    Registered Vendor
                    • Jan 2010
                    • 52

                    #10
                    Well, your newly formed transcriptome assembly is your reference and the raw reads used to generate it are your data, so you treat this like an RNA-seq project. You align the raw data to the assembly (trimming does not matter here, since this is just a QA check) and divide the number of reads that align by the total number of reads that went into the assembly. This is only a rough check, and one could argue that you will miss things, but it is good enough for that purpose.
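
The ratio described here can be computed from the aligner's SAM output. The following is a minimal sketch, assuming the SAM file contains a record for every input read, including unaligned ones; the function name `mapping_rate` and the `EXAMPLE` records are made up for illustration. It counts distinct read names, so a read reported with several hits is only counted once.

```python
def mapping_rate(sam_lines):
    """Return the percentage of distinct reads with at least one alignment.

    Hypothetical helper: expects an iterable of SAM text lines and relies
    on FLAG bit 0x4, which the SAM format uses to mark unmapped reads.
    """
    seen = set()
    mapped = set()
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seen.add(name)
        if not flag & 0x4:
            mapped.add(name)
    if not seen:
        return 0.0
    return 100.0 * len(mapped) / len(seen)


# Made-up records: r1 aligns twice, r2 is unaligned, r3 aligns once.
EXAMPLE = [
    "@SQ\tSN:contig1\tLN:500",
    "r1\t0\tcontig1\t10\t42\t50M\t*\t0\t0\t*\t*",
    "r1\t256\tcontig1\t200\t1\t50M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tcontig1\t90\t42\t50M\t*\t0\t0\t*\t*",
]
rate = mapping_rate(EXAMPLE)  # two of three distinct reads align
```

With a real file one would pass an open file handle, e.g. `mapping_rate(open("aligned.sam"))`, where the file name is a placeholder.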

                    From these alignments, you may find a surprising result: the percentage of aligned reads can be pretty low. Non-transcriptome assemblers do not like to see large differences in coverage within an assembly, because they assume such regions are repeats piling up, but large coverage differences are inherent to RNA data.

                    Did you perform any manipulations during library prep to treat the RNA for the assembly process, like double-stranded nuclease treatment (to compress the dynamic range of the sample)? This can greatly help an assembly if one is not too heavy-handed in the treatment.

                    It is hard to tell you what percentages are good and bad since I am not sure how the material was treated prior to sequencing or what your goals are for the assembly.

                    Hope this helps.

                    Jon Armstrong


                    • nareshvasani
                      Member
                      • Apr 2013
                      • 57

                      #11
                      Hi,
                      Thanks for your prompt reply.
                      I didn't perform any manipulation during library prep.

                      Thanks in advance,
                      Naresh


                      • nareshvasani
                        Member
                        • Apr 2013
                        • 57

                        #12
                        Hi westerman,

                        I am trying to align the reads in my FASTQ file to the Trinity.fasta file using Trinity's alignReads.pl script.

                        I used the command below:
                        #### /bin/util/alignReads.pl -seqType fq -single inputfile_name -target Trinity.fasta -aligner bowtie2 -p 4 -retain_intermediate_files -num_top_hits 20 -output align_bowtie_output

                        but I am getting the error below:
                        Must specify target_db and it must exist at that location at /bin/util/alignReads.pl line 180

                        I don't know what that means, as I am not good at reading scripts.

                        Hope you can help me out.
                        Thanks,
                        Naresh



                        • westerman
                          Rick Westerman
                          • Jun 2008
                          • 1104

                          #13
                          Well, off-hand I would say that you need to give the whole path to the Trinity.fasta file. I suspect that it is not located in the directory in which you are located.

                          In these 'file not found' cases it is always helpful to the rest of us if you can include the results of:

                          pwd

                          and a

                          ls -l
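
The same check can be scripted before launching a long run. Here is a small sketch using only the Python standard library; the function name `describe_target` is hypothetical, and it simply reports the absolute path a relative file name resolves to from the current directory and whether a regular file exists there.

```python
import os


def describe_target(path):
    """Return (absolute_path, exists_as_regular_file) for a file argument.

    Hypothetical helper: resolves the path against the current working
    directory, which is what a relative path means to any script started
    from that directory.
    """
    full = os.path.abspath(path)
    return full, os.path.isfile(full)


full_path, found = describe_target("Trinity.fasta")
```

If `found` is false, passing `full_path` to a script would fail in the same way, so the file is somewhere else and its real location needs to be given instead.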


                          • nareshvasani
                            Member
                            • Apr 2013
                            • 57

                            #14
                            Hi westerman,

                            Thanks for your reply.
                            The full path was missing from my command line.

                            The above command worked, but only with bowtie, not with bowtie2.

                            With the command below:
                            /bin/util/alignReads.pl --seqType fq --single CombineIonXpressRNA_010_NareshPool_Chip1_2_WT2_fastxtrimmer_from_quality_trimmer.fastq --target /media/DATAPART3/Combine_Files/Velvetoptimiser_27_37/trinity_output/Trinity.fasta --aligner bowtie2 --retain_intermediate_files --num_top_hits 20 --output align_bowtie_output1

                            Following error:
                            which: no tophat2 in (/root/perl5/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/root/Trinity/util/../trinity-plugins/rsem/sam/)
                            Error, path to required tophat2 cannot be found at /bin/util/alignReads.pl line 234.

                            If I use bowtie instead of bowtie2, it works, but with one problem: the SAM file has no header.

                            Hope you can help me out.
                            Thanks in advance.
                            Naresh



                            • westerman
                              Rick Westerman
                              • Jun 2008
                              • 1104

                              #15
                              I am surprised that Trinity would be looking for tophat2. However bowtie2 is closely associated with tophat2. I suggest installing tophat2. You may never need it but that should be the way to get Trinity to use bowtie2.
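
The 'which: no tophat2' message means the program was not found in any directory on PATH. A quick way to see which tools are missing before starting a run is sketched below; the function name `missing_tools` and the tool list are illustrative assumptions, not a statement of what Trinity actually requires.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH.

    Hypothetical helper built on shutil.which, which performs the same
    kind of PATH lookup as the shell's `which` command.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


# Illustrative list only; adjust to whatever the script reports as required.
not_found = missing_tools(["bowtie", "bowtie2", "tophat2", "samtools"])
```

Anything listed in `not_found` either needs installing or needs its directory added to PATH.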
