  • gsMapper - Using a transcriptome ref question

    I have a question about gsMapper I'm hoping someone can help me with.

    I have a project in which 20 samples underwent RNAseq using 454. I combined the SFFs and used gsAssembler to create a transcriptome.

    Now I want to use this transcriptome as a reference for mapping each of the 20 samples individually. From there I will derive differential expression data, etc. The transcriptome assembly produced 454AllContigs files and 454Isotigs files. Which should I use as the mapping reference?

    I think it should be the 454AllContigs file, since mapping to the isotigs will likely result in a single read mapping to many different isotigs that share exons within an isogroup. I've read a different post that suggested finding the isotig that contains the most exons in each isogroup and mapping only to that collection of long isotigs, but this seems more convenient than biologically relevant.

    Any thoughts are much appreciated.

  • #2
    Another way to get read counts (I honestly am not sure which is better) is to use the 454ReadStatus.txt file from the assembly. This file tells you which contig the 3' and 5' ends of each assembled read were placed in.

    I mapped reads back to the contigs to try to find SNPs. When I supplied the assembled reads (only those with a status of Assembled or PartiallyAssembled in 454ReadStatus.txt) as a .fasta file, something like 99.9% of reads mapped back to the contigs, but when I supplied the same read set as a .sff, only 75% mapped back.

    Version 2.6 also seems pretty good at mapping to a transcriptome reference so you should just be able to map to the 454Isotigs.fna file directly.
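
For the 454ReadStatus.txt route, here is a minimal Python sketch of how per-contig read counts could be tallied. It assumes the file is tab-delimited with the read accession in column 1, the read status in column 2, and the contig holding the 5' end in column 3; check that against your own file before trusting the numbers. The function name `count_reads_per_contig` is just illustrative, and a read whose two ends fall in different contigs is counted once, under its 5' contig.

```python
from collections import Counter

# Statuses treated as "assembled", as described in the post above.
KEEP = frozenset({"Assembled", "PartiallyAssembled"})

def count_reads_per_contig(lines, keep=KEEP):
    """Count reads per contig from 454ReadStatus.txt-style lines.

    Assumed layout (tab-delimited): accession, status, 5' contig, ...
    Lines whose status is not in `keep` (including the header row and
    singletons, which have no contig columns) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # blank line or a read with no contig assignment
        status, five_prime_contig = fields[1], fields[2]
        if status in keep:
            counts[five_prime_contig] += 1
    return counts
```

To run it on a real file, something like `with open("454ReadStatus.txt") as fh: counts = count_reads_per_contig(fh)` should work, one sample at a time.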

    • #3
      Thanks Jeremy

      I'm using 2.6, so I'll give it a try with 454Isotigs.fna and report back my metrics. I appreciate the advice.

      • #4
        Hi,

        I'm conducting a similar analysis to all_your_base's, and am using the 454Isotigs.fna file as the reference. The isogroup is given in the header of each isotig, e.g. gene=Isogroup1. Since one would like to measure expression per isogroup (most likely per gene), the output file 454GeneStatus.txt should have the number of reads per isogroup, right?
        However, my 454GeneStatus.txt file has the number of reads per isotig, and there are multiple isotigs with a read count that belong to the same isogroup. Did you have the same problem, and if so, how did you solve it?
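
One possible workaround is to sum the per-isotig counts by isogroup yourself. The Python sketch below assumes each isotig header in 454Isotigs.fna starts with the isotig ID and contains a `gene=<isogroup>` tag, as described above, and that you have already loaded the per-isotig read counts into a dict (the exact layout of 454GeneStatus.txt is not assumed here). The function names are hypothetical. Note that if a read was counted against several isotigs of the same isogroup, a plain sum will count it more than once.

```python
import re
from collections import defaultdict

# Matches the gene=<isogroup> tag assumed to be present in isotig headers.
GENE_RE = re.compile(r"gene=(\S+)")

def isotig_to_isogroup(fasta_lines):
    """Build {isotig_id: isogroup} from FASTA header lines."""
    mapping = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line
        parts = line[1:].split()
        match = GENE_RE.search(line)
        if parts and match:
            mapping[parts[0]] = match.group(1)
    return mapping

def sum_by_isogroup(isotig_counts, mapping):
    """Sum per-isotig read counts into per-isogroup totals.

    Isotigs missing from `mapping` are kept under their own ID.
    """
    totals = defaultdict(int)
    for isotig, n_reads in isotig_counts.items():
        totals[mapping.get(isotig, isotig)] += n_reads
    return dict(totals)
```

For example, two isotigs with 10 and 5 reads that both carry the same `gene=` tag would come out as a single isogroup entry with 15 reads.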

        • #5
          Use 454AllContigs.fna as the reference; then 454GeneStatus.txt will report the reads per isogroup. When I did this using the .sff, only 76% of reads (the subset reported as assembled, i.e. singletons removed) mapped, meaning that 24% of the reads that were used in the assembly did not map. Using a .fasta of the same subset of reads resulted in almost all of them mapping.
