  • HTseq without reference genome

    I'm new to RNA-Seq analysis and have Illumina HiSeq paired-end reads (100 bp) from plant samples. I would like to count read abundance at both the isoform and the gene level, and then proceed to DESeq for differential expression. I've been reading that HTSeq is a suitable tool for obtaining read counts (as opposed to RSEM, which gives estimates instead of actual counts).

    My problem is that there is no reference genome for my plant species, and I notice that most HTSeq examples are for samples with reference genomes and known gene annotations. Can I still use HTSeq to obtain read counts if no reference genome is available?

    Thanks in advance for any advice.

  • #2
    You could use the genome of a related species and align your reads to it (with TopHat, for example). After that you can extract the number of reads per feature (gene, isoform, ...) with HTSeq, and then use DESeq for differential expression analysis.

    Or you can use a de novo approach: assemble the transcriptome of the plant de novo (with Trinity, Oases, ...), align your reads against the assembled transcriptome, extract the read count for each transcript, and perform differential expression analysis with DESeq (or edgeR).
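
Both routes above end the same way: DESeq (and edgeR) take a table of raw integer counts with one row per feature and one column per sample. As a minimal sketch of that hand-off, assuming the per-sample counts have already been obtained somehow, the Python below merges them into a tab-separated table. The sample and gene names are hypothetical, and the function names are made up for illustration; they are not part of HTSeq or DESeq.

```python
import csv
import io


def build_count_table(per_sample_counts):
    """Merge {sample: {feature: count}} into a header and feature-by-sample rows.

    Features missing from a sample are filled with 0, so every row has one
    integer per sample.
    """
    samples = sorted(per_sample_counts)
    features = sorted({f for counts in per_sample_counts.values() for f in counts})
    header = ["feature"] + samples
    rows = [[f] + [per_sample_counts[s].get(f, 0) for s in samples] for f in features]
    return header, rows


def to_tsv(header, rows):
    """Render the table as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical counts for two samples (names invented for this example).
example = {
    "control_rep1": {"geneA": 10, "geneB": 0},
    "treated_rep1": {"geneA": 3, "geneC": 7},
}
header, rows = build_count_table(example)
print(to_tsv(header, rows), end="")
```

The counts are left as raw integers on purpose, with no normalization applied before the table is written.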


    • #3
      Many thanks for your advice.

      The second approach is what I'm doing now. However, I'm not sure how to extract the read count for each transcript, since some of the reads are multi-mapped. I know there's RSEM, which works for transcriptomes without a reference, but from what I've been reading, RSEM output is not well suited as DESeq input because its read counts are only estimates.

      Does anyone know of any programs other than RSEM that can give me read counts without a reference?


      • #4
        Hi guys,

        I am in a similar situation to mht's.
        Could anyone help with this problem?

        Thanks.

        Regards,

        Senhao

        (In reply to post #3 by mht.)


        • #5
          This paper performs de novo transcript assembly and then counts gene features based on the resulting assemblies; it might be of interest to you:

          Genome Res. 2012 Apr;22(4):602-10. Epub 2011 Dec 29.
          Comparative RNA sequencing reveals substantial genetic variation in endangered primates.


          • #6
            You should obtain read counts per gene, not per transcript. If you align reads to a transcriptome, each read will typically align to several transcripts. Verify that they are all transcripts of the same gene and then count this as one for this gene. Of course, you will need to write a custom script to process the aligner output and do the counting, but this should be easy.
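
One possible reading of the advice above, as a minimal Python sketch rather than a definitive implementation. It makes several assumptions that are not stated in the thread: the alignments are available as SAM text (for example the output of `samtools view` on a BAM file), all records for a read or read pair are adjacent (name-sorted input), and the gene ID can be derived from a Trinity-style transcript ID by dropping a trailing `_seqN` or `_iN`. The function names are hypothetical. Reads whose hits span more than one gene are tallied separately instead of being counted.

```python
import re
from collections import Counter
from itertools import groupby


def transcript_to_gene(transcript_id):
    """ASSUMPTION: gene ID = transcript ID minus a trailing _seqN or _iN suffix."""
    return re.sub(r"_(seq|i)\d+$", "", transcript_id)


def parse_alignments(sam_lines):
    """Yield (read_name, transcript_id) for every mapped SAM record."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        read_name, flag, transcript = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or transcript == "*":  # 0x4 = read unmapped
            continue
        yield read_name, transcript


def count_reads_per_gene(sam_lines, to_gene=transcript_to_gene):
    """Count each read (or pair) once if all of its hits belong to one gene.

    ASSUMPTION: records with the same read name are adjacent in the input.
    Returns (per-gene counts, number of reads hitting more than one gene).
    """
    counts = Counter()
    ambiguous = 0
    for _, hits in groupby(parse_alignments(sam_lines), key=lambda hit: hit[0]):
        genes = {to_gene(transcript) for _, transcript in hits}
        if len(genes) == 1:
            counts[genes.pop()] += 1
        else:
            ambiguous += 1
    return counts, ambiguous


# Tiny made-up example: r1 hits two isoforms of one gene, r2 hits two genes,
# r3 is unmapped.
example = [
    "@HD\tVN:1.0\tSO:queryname",
    "r1\t0\tcomp1_c0_seq1\t5\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tcomp1_c0_seq2\t9\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tcomp1_c0_seq1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t256\tcomp2_c0_seq1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
counts, ambiguous = count_reads_per_gene(example)
print(dict(counts), ambiguous)
```

Because both mates of a pair share a read name, a pair is counted once. If the assembler provides its own transcript-to-gene mapping, passing a lookup based on it as `to_gene` would be safer than relying on the ID pattern assumed here.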


            • #7
              Hi Simon,

              Thanks for your advice. Unfortunately, I don't have the programming background to write such a script. I would appreciate your help if you have time, and I hope it is not too much trouble.

              I aligned our reads back to the transcriptome, which was assembled de novo with Trinity, using a script from the Trinity package (alignReads.pl). The alignment results consist of several files, such as:
              Code:
              bowtie_out.coordSorted.bam
              bowtie_out.coordSorted.bam.bai
              bowtie_out.nameSorted.bam
              bowtie_out.nameSorted.PropmapPairsForRSEM.bam
              
              etc.
              I don't know which of the files listed above should be used to count genes.
              (P.S. Non-model plant; two replicates per sample; 100 bp paired-end reads obtained on a HiSeq 2000.)

              I would really appreciate your help; otherwise I don't know how to proceed with the downstream analysis.

              Thank you very much.

              Yours sincerely,
              Senhao

              (In reply to post #6 by Simon Anders.)


              • #8
                Hi areyes,

                Thank you very much.

                I will read the paper.

                Yours sincerely,
                Senhao

                (In reply to post #5 by areyes.)
