  • HTSeq without reference genome

    I'm new to RNA-Seq analysis and have Illumina HiSeq paired-end reads (100 bp) from plant samples. I would like to obtain read counts at both the isoform and the gene level and then proceed to DESeq for differential expression. I've been reading that HTSeq is a suitable tool for obtaining read counts (as opposed to RSEM, which gives estimates instead of actual counts).

    My problem is that there is no reference genome for my plant species, and I notice that most HTSeq examples are for samples with reference genomes and known gene annotations. Can I still use HTSeq to obtain read counts if no reference genome is available?

    Thanks in advance for any advice.

  • #2
    You could use the genome of a related species and align your reads to it (with TopHat, for example). After that, you can extract the number of reads per feature (gene, isoform, ...) with HTSeq, and then use DESeq for differential expression analysis.

    Alternatively, you can use a de novo approach: assemble the plant's transcriptome de novo (with Trinity, Oases, ...), align your reads against the transcriptome, extract the read count for each transcript, and perform differential expression analysis with DESeq (or edgeR).
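Whichever route is taken, DESeq expects a table of integer counts with one row per feature and one column per sample. As a minimal sketch of that hand-off step, assuming the per-sample counts are already available as dictionaries (the function names and the sample names in the usage note are hypothetical, not part of any tool mentioned here):

```python
from collections import Counter


def build_count_table(per_sample_counts):
    """Merge per-sample {feature: count} mappings into table rows.

    Returns a header row followed by one row per feature; a feature
    that is absent from a sample gets a count of 0.
    """
    samples = sorted(per_sample_counts)
    features = sorted(set().union(*per_sample_counts.values()))
    rows = [["feature"] + samples]
    for feature in features:
        rows.append([feature] + [per_sample_counts[s].get(feature, 0) for s in samples])
    return rows


def to_tsv(rows):
    """Render the rows as tab-separated text, one feature per line."""
    return "\n".join("\t".join(str(x) for x in row) for row in rows)
```

For example, `build_count_table({"ctrl_1": Counter(...), "treat_1": Counter(...)})` followed by `to_tsv` gives text that can be written to a file and read into R as a count matrix.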



    • #3
      Many thanks for your advice.

      The second approach is what I'm doing now. However, I'm not sure how to extract the read count for each transcript, since some of the reads are multi-mapped. I know RSEM works for transcriptomes without a reference, but from what I've been reading, RSEM output is not well suited as DESeq input because its read counts are only estimates.

      Does anyone know of any programs other than RSEM that can give me read counts without a reference?



      • #4
        Hi guys,

        I am in a similar situation to mht's.
        Could anyone help with this problem?

        Thanks.

        Regards,

        Senhao

        In reply to mht (post #3).


        • #5
          This paper performs de novo transcript assembly and then counts gene features based on the resulting assemblies; it might be of interest to you:

          Genome Res. 2012 Apr;22(4):602-10. Epub 2011 Dec 29.
          Comparative RNA sequencing reveals substantial genetic variation in endangered primates.



          • #6
            You should obtain read counts per gene, not per transcript. If you align reads to a transcriptome, each read will typically align to several transcripts. Verify that they are all transcripts of the same gene and then count this as one for this gene. Of course, you will need to write a custom script to process the aligner output and do the counting, but this should be easy.
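The counting rule described above can be sketched roughly as follows. This is only an illustration, not HTSeq's own method. It assumes SAM text lines grouped by read name (for instance, a name-sorted BAM file streamed through `samtools view`) and Trinity-style transcript IDs such as `comp12_c0_seq3`, where the part before `_seq<N>` is taken to identify the gene. Both the ID convention and the `__ambiguous` bucket are assumptions made for this example.

```python
from collections import Counter
from itertools import groupby


def gene_of(transcript_id):
    # Assumption: IDs look like "comp12_c0_seq3"; everything before the
    # final "_seq<N>" is treated as the gene (component) name.
    return transcript_id.rsplit("_seq", 1)[0]


def count_reads_per_gene(sam_lines):
    """Count fragments whose alignments all hit transcripts of one gene.

    sam_lines: iterable of SAM text lines grouped by read name, so that
    all alignments of a read (and of its mate) are consecutive.
    """
    counts = Counter()
    records = (line.rstrip("\n").split("\t") for line in sam_lines
               if line.strip() and not line.startswith("@"))  # skip headers
    for _, group in groupby(records, key=lambda fields: fields[0]):
        genes = set()
        for fields in group:
            flag, rname = int(fields[1]), fields[2]
            if flag & 0x4 or rname == "*":  # unmapped record
                continue
            genes.add(gene_of(rname))
        if len(genes) == 1:
            counts[genes.pop()] += 1        # one count per read or pair
        elif len(genes) > 1:
            counts["__ambiguous"] += 1      # hits more than one gene
    return counts
```

Passing `sys.stdin` as `sam_lines` lets the function consume the output of `samtools view` directly. Because both mates of a pair share a read name, each pair is counted once.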



            • #7
              Hi Simon,

              Thanks for your advice. Unfortunately, I don't have the background to write such a script, so I may need your help if you have time. I hope it is not too much trouble.

              I aligned our reads back to the transcriptome, which was assembled de novo with Trinity, using a script from the Trinity package (alignReads.pl). The alignment produced several files, such as
              Code:
              bowtie_out.coordSorted.bam
              bowtie_out.coordSorted.bam.bai
              bowtie_out.nameSorted.bam
              bowtie_out.nameSorted.PropmapPairsForRSEM.bam
              
              and others.
              I don't know which of the files listed above should be used to count genes.
              (P.S. Non-model plant; two replicates per sample; 100 bp paired-end reads obtained on a HiSeq 2000.)

              I would really appreciate your help; otherwise I don't know how to proceed with the downstream analysis.

              Thank you very much.

              Yours sincerely,
              Senhao

              In reply to Simon Anders (post #6).


              • #8
                Hi areyes,

                Thank you very much.

                I will read the paper.

                Yours sincerely,
                Senhao

                In reply to areyes (post #5).
