  • HTseq without reference genome

    I'm new to RNA-Seq analysis and have Illumina HiSeq paired-end reads (100bp) from plant samples. I would like to get a count of read abundance on the isoform as well as gene level and to proceed to DESeq for differential expression. I've been reading that HTseq is a suitable tool for obtaining read counts (as opposed to RSEM which gives estimates instead of actual counts).

    My problem is that there is no reference genome for my plant species, and I notice that most examples for HTseq are for samples with reference genomes with known gene annotations. Can I still use HTseq for obtaining read count values if no reference genome is available?

    Thanks in advance for any advice.

  • #2
    You could use the genome of a related species and align your reads to it (with TopHat, for example). You can then extract the number of reads per feature (gene, isoform, ...) with HTSeq and use DESeq for the differential expression analysis.

    Alternatively, you can use a de novo approach: assemble the plant's transcriptome de novo (with Trinity, Oases, ...), align your reads against that transcriptome, extract the read count for each transcript, and perform the differential expression analysis with DESeq (or edgeR).
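
Once counts exist for every sample, DESeq expects them as a single table of raw integer counts, with one row per feature and one column per sample. A minimal sketch of assembling such a table in Python is below. The function name `build_count_table`, the sample names, and the numbers are all hypothetical and only illustrate the layout; features missing from a sample are filled in with 0.

```python
def build_count_table(per_sample_counts):
    """Merge per-sample count dictionaries into one tab-separated table.

    per_sample_counts maps a sample name to a {feature_id: raw_count} dict.
    Features absent from a sample get a count of 0.
    """
    samples = sorted(per_sample_counts)
    features = sorted({f for counts in per_sample_counts.values() for f in counts})
    lines = ["gene\t" + "\t".join(samples)]
    for feature in features:
        row = [str(per_sample_counts[s].get(feature, 0)) for s in samples]
        lines.append(feature + "\t" + "\t".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical example data, not real results.
example_counts = {
    "control_1": {"geneA": 12, "geneB": 3},
    "treated_1": {"geneA": 30},
}
table = build_count_table(example_counts)
print(table, end="")
```

The resulting text can be written to a file and read into R as a count matrix; the counts should stay unnormalized integers, since DESeq does its own normalization.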



    • #3
      Many thanks for your advice.

      The second approach is what I'm doing now. However, I'm not sure how to extract the read count for each transcript, since some of the reads are multi-mapped. I know RSEM works for transcriptomes without a reference, but from what I've been reading, RSEM output is not well suited as DESeq input because its read counts are only estimates.

      Does anyone know of any programs other than RSEM that can give me read counts without a reference?



      • #4
        Hi guys,

        I am in a similar situation to mht's.
        Could anyone suggest a solution to this problem?

        Thanks.

        Regards,

        Senhao



        • #5
          This paper performs de novo transcript assembly and then counts gene features based on the resulting assemblies; it might be of interest to you:

          Genome Res. 2012 Apr;22(4):602-10. Epub 2011 Dec 29.
          Comparative RNA sequencing reveals substantial genetic variation in endangered primates.



          • #6
            You should obtain read counts per gene, not per transcript. If you align reads to a transcriptome, each read will typically align to several transcripts. Verify that they are all transcripts of the same gene and then count this as one for this gene. Of course, you will need to write a custom script to process the aligner output and do the counting, but this should be easy.
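
For anyone who wants a starting point, the counting rule described above could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the alignments have already been reduced to (read name, transcript ID) pairs, for example by parsing the aligner's SAM output, and the transcript IDs follow a Trinity-like pattern such as `comp10_c0_seq1`, where the part before `_seq` identifies the gene. The function names `gene_of` and `count_reads_per_gene` and the example reads are made up for illustration.

```python
from collections import Counter, defaultdict


def gene_of(transcript_id):
    # Assumption: Trinity-style IDs like "comp10_c0_seq1"; everything
    # before the last "_seq" is taken as the gene (component) name.
    return transcript_id.rsplit("_seq", 1)[0]


def count_reads_per_gene(alignments):
    """Count each read once for a gene if all its hits fall in that gene.

    alignments: iterable of (read_name, transcript_id) pairs.
    Returns (per-gene Counter, number of reads hitting several genes).
    """
    genes_hit = defaultdict(set)
    for read_name, transcript_id in alignments:
        genes_hit[read_name].add(gene_of(transcript_id))

    counts = Counter()
    ambiguous = 0
    for genes in genes_hit.values():
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
        else:
            ambiguous += 1  # read maps to more than one gene: not counted
    return counts, ambiguous


# Hypothetical example alignments, not real data.
example = [
    ("read1", "comp10_c0_seq1"),
    ("read1", "comp10_c0_seq2"),  # same gene, so read1 counts once
    ("read2", "comp10_c0_seq1"),
    ("read3", "comp10_c0_seq2"),
    ("read3", "comp22_c1_seq1"),  # two different genes: ambiguous
]
counts, ambiguous = count_reads_per_gene(example)
print(dict(counts), ambiguous)
```

Because both mates of a paired-end fragment share the same read name, a pair is counted once here. Holding every read name in memory is fine for a sketch; for a large name-sorted alignment file, processing one read's alignments at a time would use far less memory.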



            • #7
              Hi Simon,

              Thanks for your advice. Unfortunately, I don't have the background to write such a script. I may need your help if you have time, and I hope it will not bother you too much.

              I aligned our reads back to the transcriptome using a script from the Trinity package (alignReads.pl); the transcriptome itself was assembled de novo with Trinity. The alignment results consist of several files, such as
              Code:
              bowtie_out.coordSorted.bam
              bowtie_out.coordSorted.bam.bai
              bowtie_out.nameSorted.bam
              bowtie_out.nameSorted.PropmapPairsForRSEM.bam
              
              etc.
              I don't know which of the files listed above should be used to count genes.
              (P.S. Non-model plant; two replicates per sample; 100 bp paired-end reads obtained using a HiSeq 2000.)

              I would really appreciate your help; otherwise I don't know how to proceed with the downstream analysis.

              Thank you very much.

              Yours sincerely,
              Senhao



              • #8
                Hi areyes,

                Thank you very much.

                I will read the paper.

                Yours sincerely,
                Senhao

