  • #16
    This is what sdriscoll meant when he said that you are mapping to a transcriptome. You have provided your aligner with a FASTA file which did not contain one sequence for each chromosome but one sequence for each transcript. (Otherwise, how would samtools know where the transcripts are, as you haven't supplied a GFF file?) This is known as "mapping against the transcriptome" and it is "bad" if you don't know exactly what you are doing, for various reasons that you'll find in old threads here.



    • #17
      Originally posted by Simon Anders
      This is what sdriscoll meant when he said that you are mapping to a transcriptome. You have provided your aligner with a FASTA file which did not contain one sequence for each chromosome but one sequence for each transcript. (Otherwise, how would samtools know where the transcripts are, as you haven't supplied a GFF file?) This is known as "mapping against the transcriptome" and it is "bad" if you don't know exactly what you are doing, for various reasons that you'll find in old threads here.
      Ah okay. I misunderstood: I thought sdriscoll was asking if I was mapping against a reference assembled from transcriptomics data, as opposed to the DNA sequences of predicted proteins from sequenced genomes (which is what I'm using). Sorry for being a "student" and for making a "mistake". I'll correct that post.



      • #18
        Exactly as Simon said. If you were mapping to a genome reference, then idxstats would return read counts per chromosome. Mapping to a transcriptome reference is definitely more complicated. A couple of tools for that are eXpress and RSEM, but neither of those will help you get counts at the gene level unless you provide some knowledge of which references are from the same gene.

        Probably the most straightforward approach is to align your reads to a genome reference (full chromosome sequences) with TopHat or STAR (if you have the RAM for it), and then count hits to genes with something like htseq-count, which can find overlaps of genomic coordinates with gene features annotated in a GTF file.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */
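
To illustrate the point about gene-level counts: `samtools idxstats` prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads), so with a per-transcript reference the gene totals have to be summed using a sequence-to-gene mapping that you supply yourself. The Python sketch below assumes such a mapping is available. The function name, the sequence names, and the counts are invented for illustration and do not come from this thread.

```python
from collections import defaultdict


def gene_counts_from_idxstats(idxstats_text, seq_to_gene):
    """Sum mapped-read counts from `samtools idxstats` output per gene.

    idxstats_text: the tab-separated text printed by `samtools idxstats`.
    seq_to_gene:   dict mapping reference sequence name -> gene name
                   (hypothetical; you have to build this yourself).
    """
    counts = defaultdict(int)
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The "*" line holds reads that were not assigned to any reference.
            continue
        # Sequences missing from the mapping are kept under their own name.
        gene = seq_to_gene.get(name, name)
        counts[gene] += int(mapped)
    return dict(counts)


# Made-up example data: two copies of one gene plus a single-copy gene.
example_idxstats = "\n".join([
    "seqA_copy1\t900\t120\t0",
    "seqA_copy2\t900\t80\t0",
    "seqB\t1500\t40\t0",
    "*\t0\t0\t15",
])
example_mapping = {
    "seqA_copy1": "geneA",
    "seqA_copy2": "geneA",
    "seqB": "geneB",
}

result = gene_counts_from_idxstats(example_idxstats, example_mapping)
print(result)  # {'geneA': 200, 'geneB': 40}
```

Note that this only sums whatever the aligner reported. It does nothing about reads that could have aligned equally well to several sequences.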



        • #19
          If you really want to align to the database you're using, I suggest trying RSEM.
          /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
          Salk Institute for Biological Studies, La Jolla, CA, USA */



          • #20
            Originally posted by sdriscoll
            If you really want to align to the database you're using, I suggest trying RSEM.
            I'll look into it. Thank you.



            • #21
              Also, I don't want to send you down a confusing path. I don't mind providing you with some help to get that pipeline working. One more thing to consider: do you expect insertions/deletions to be important? If so, then RSEM may not be what you want, since it uses bowtie1 for alignments. eXpress is a similar solution, and with eXpress you can use alignments from bowtie1, bowtie2, bwa (with some tweaking) and really any aligner that can output all possible alignments for a given read. These tools attempt to disambiguate the alignments to a set of gene/protein/transcript sequences, giving you "unique" mappings even for reads that can align equally well to several references. I've done a bit of benchmarking and honestly I haven't seen great results from eXpress, but RSEM does pretty well. Both work VERY well if you are able to sum counts of sequences together for sequences that share exons or share sequence (as in multi-copy genes or alternatively spliced genes). They work OK in terms of per-sequence level counts, certainly better than what the aligners can do on their own, but certainly not perfectly. Just keep in mind that your per-sequence expressions will likely contain some false positives (maybe a lot...) and will also likely be missing a few true positives. In the end, your knowledge of which sequences in your database share sequence or share exons will help you immensely in getting stable and reliable read counts.
              /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
              Salk Institute for Biological Studies, La Jolla, CA, USA */
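
The disambiguation idea described above can be sketched as a toy expectation-maximization loop: each ambiguous read is split among its candidate references in proportion to the current abundance estimates, and the abundances are then re-estimated from those fractional counts. This is not the actual RSEM or eXpress model (real tools model considerably more than this, sequence length for one); it is only a simplified Python illustration, and the function name, reads, and reference names are invented.

```python
def em_allocate(read_hits, n_iter=50):
    """Toy EM allocation of multi-mapped reads.

    read_hits: list with one entry per read; each entry is the list of
               reference names that read aligns to equally well.
    Returns a dict of expected (fractional) read counts per reference.
    """
    refs = sorted({ref for hits in read_hits for ref in hits})
    # Start from equal abundances for every reference.
    abundance = {ref: 1.0 / len(refs) for ref in refs}
    counts = {}
    for _ in range(n_iter):
        counts = {ref: 0.0 for ref in refs}
        # E-step: share each read among its candidate references
        # in proportion to the current abundance estimates.
        for hits in read_hits:
            total = sum(abundance[ref] for ref in hits)
            for ref in hits:
                counts[ref] += abundance[ref] / total
        # M-step: re-estimate abundances from the fractional counts.
        n_reads = sum(counts.values())
        abundance = {ref: c / n_reads for ref, c in counts.items()}
    return counts


# Made-up data: 10 reads unique to refA, 2 unique to refB,
# and 6 reads that align equally well to both.
reads = [["refA"]] * 10 + [["refB"]] * 2 + [["refA", "refB"]] * 6
counts = em_allocate(reads)
for ref, value in counts.items():
    print(ref, round(value, 3))
```

In this toy case the estimates settle at about 15 reads for refA and 3 for refB, so the ambiguous reads end up split 5 to 1, following the unique evidence. Summing refA and refB gives 18 regardless of how the ambiguous reads are split, which is why grouped counts are more stable than per-sequence counts.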



              • #22
                Originally posted by sdriscoll
                Also, I don't want to send you down a confusing path. I don't mind providing you with some help to get that pipeline working. One more thing to consider: do you expect insertions/deletions to be important? If so, then RSEM may not be what you want, since it uses bowtie1 for alignments. eXpress is a similar solution, and with eXpress you can use alignments from bowtie1, bowtie2, bwa (with some tweaking) and really any aligner that can output all possible alignments for a given read. These tools attempt to disambiguate the alignments to a set of gene/protein/transcript sequences, giving you "unique" mappings even for reads that can align equally well to several references. I've done a bit of benchmarking and honestly I haven't seen great results from eXpress, but RSEM does pretty well. Both work VERY well if you are able to sum counts of sequences together for sequences that share exons or share sequence (as in multi-copy genes or alternatively spliced genes). They work OK in terms of per-sequence level counts, certainly better than what the aligners can do on their own, but certainly not perfectly. Just keep in mind that your per-sequence expressions will likely contain some false positives (maybe a lot...) and will also likely be missing a few true positives. In the end, your knowledge of which sequences in your database share sequence or share exons will help you immensely in getting stable and reliable read counts.
                Well, alternatively spliced genes won't be a problem; it's all bacteria I'm mapping to. I have been playing around with RSEM, but I have a pretty large sample size, and realigning all of the samples would be very time consuming. Obviously I'll do it if necessary, but I'd prefer not to have to. I haven't tried eXpress yet, but I will.

                When I was mapping with bowtie2 I left its reporting mode at the default (i.e. report only the best alignment), but eXpress wants to be able to select the best alignment itself. Do you think this will be a big issue?
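
One way to see what an existing alignment file contains is to count how many aligned reads carry more than one alignment record. bowtie2's default mode reports at most one alignment per read, whereas its `-k <N>` and `-a` options report up to N alignments or all of them. The Python sketch below works on plain SAM text; the function name and the example records are invented for illustration, and it only checks the file rather than saying anything about how eXpress behaves with it.

```python
from collections import Counter

# Bits of the SAM FLAG field used below.
UNMAPPED = 0x4
READ1 = 0x40
READ2 = 0x80
SUPPLEMENTARY = 0x800


def multi_alignment_summary(sam_lines):
    """Return (aligned_reads, reads_with_more_than_one_alignment)."""
    per_read = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & UNMAPPED or flag & SUPPLEMENTARY:
            continue
        # Keep the two mates of a pair apart so a normal pair is not
        # mistaken for a read with two alignments.
        mate = "1" if flag & READ1 else "2" if flag & READ2 else "0"
        per_read[(name, mate)] += 1
    multi = sum(1 for n in per_read.values() if n > 1)
    return len(per_read), multi


# Made-up SAM records: read1 has a primary and a secondary alignment,
# read2 has a single alignment, read3 is unmapped.
example_sam = [
    "@SQ\tSN:refA\tLN:900",
    "@SQ\tSN:refB\tLN:900",
    "read1\t0\trefA\t10\t1\t50M\t*\t0\t0\t*\t*",
    "read1\t256\trefB\t22\t1\t50M\t*\t0\t0\t*\t*",
    "read2\t16\trefA\t300\t42\t50M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

summary = multi_alignment_summary(example_sam)
print(summary)  # (2, 1)
```

If the second number is 0 for a real file, every read has a single reported alignment, so a downstream tool has no alternative placements to choose between.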
