  • shoegame2001
    Member
    • Dec 2010
    • 21

    RNA-seq SNP-calling without a complete reference

    I am working on a project that seeks to call SNPs for a non-model organism with no existing reference genome or transcriptome using multiplexed Illumina RNA-seq data.

    I used Trinity to assemble a partial 'reference' transcriptome of the most highly expressed transcripts for which we had sufficient coverage, as well as many fragments of lower-expressed transcripts. Then I used BWA to map all data for multiple individuals back to that reference, and finally used GATK to call SNPs.

    However, I am running into an issue where reads derived from paralogous genes or a multigene family map back to the same reference contig, creating false SNPs at the divergent positions. My evidence for this is that, in individuals called as heterozygotes, one 'allele' (actually a slightly divergent gene copy) is generally supported by significantly fewer than half of the reads. These 'SNPs' are also generally observed across several individuals, which leads me to believe that they are not sequencing or library-prep errors.
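    The allele-balance evidence described above can be turned into a per-genotype statistic. Below is a minimal Python sketch, assuming you already have reference and alternate read counts for each heterozygous call (for example from the per-sample allele depths a caller reports). It applies an exact two-sided binomial test against the 50:50 ratio expected for a true heterozygote; the function names and the alpha of 0.01 are illustrative assumptions, not part of any existing tool.

```python
import math


def het_balance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value for H0: a true heterozygote,
    i.e. each read carries either allele with probability 0.5."""
    n = ref_reads + alt_reads
    minor = min(ref_reads, alt_reads)
    # P(X <= minor) for X ~ Binomial(n, 0.5), computed with exact integers
    # so that high read depths cannot overflow a float.
    lower_tail = sum(math.comb(n, i) for i in range(minor + 1)) / 2**n
    # The null distribution is symmetric, so the two-sided p-value is
    # twice the lower tail, capped at 1.
    return min(1.0, 2 * lower_tail)


def flag_skewed_het(ref_reads: int, alt_reads: int, alpha: float = 0.01) -> bool:
    """Hypothetical filter: flag a heterozygous call whose read support
    departs too far from 50:50 (a candidate collapsed-paralog site)."""
    return het_balance_pvalue(ref_reads, alt_reads) < alpha
```

    For example, a call supported by 40 reference and 6 alternate reads would be flagged, while 18 versus 22 would not. Keep in mind that with RNA-seq, allele-specific expression can also skew the ratio at genuine heterozygous sites, so a skewed call is a candidate for closer inspection rather than proof of a paralog.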

    I think that I will be able to identify these cases with some statistic, but I am wondering if there is a good way to modify the corresponding SAM files to remove the mis-mapped reads, then re-genotype. Has anyone else encountered similar issues, and if so how did you deal with it?
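    One possible heuristic for stripping mis-mapped reads from the SAM files before re-genotyping is sketched below in Python: drop alignments whose edit distance to the contig (the standard NM:i: optional tag, which BWA writes) exceeds a threshold, on the assumption that reads from a diverged paralog carry more mismatches than reads from the gene the contig represents. The function names and the threshold are hypothetical, and the approach only helps when the paralogs differ at several positions within a read.

```python
from typing import Iterable, Iterator, List, Optional


def edit_distance(fields: List[str]) -> Optional[int]:
    """Return the value of the NM:i: tag (edit distance to the reference),
    or None if no such tag is present. Optional SAM tags start at
    column 12 (index 11)."""
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def filter_divergent_reads(sam_lines: Iterable[str], max_nm: int) -> Iterator[str]:
    """Yield header lines unchanged, plus alignment lines whose NM tag is
    missing or at most max_nm. Reads from a diverged paralog that collapsed
    onto the same contig tend to carry several mismatches, so they are the
    ones most likely to exceed the threshold."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        nm = edit_distance(fields)
        if nm is None or nm <= max_nm:
            yield line
```

    Usage would look like `dst.writelines(filter_divergent_reads(src, max_nm=3))` with `src` and `dst` open SAM files (names hypothetical), followed by sorting, indexing, and re-running the genotyper. Because NM also counts genuine SNPs and indels, the threshold has to sit above the number of real differences a read from the correct gene can be expected to carry.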
    Last edited by shoegame2001; 12-06-2011, 03:49 PM.
  • htchu.taiwan
    Junior Member
    • Dec 2011
    • 5

    #2
    Hi, friend,

    You may try my program: EBARDenovo for RNA-Seq.
    EBARDenovo is free to download: a highly accurate, search-based de novo assembler of paired-end RNA-Seq reads for advanced transcriptomic studies.



    It is a 64-bit Windows command-line program that runs on .NET.

    EBARDenovo can assemble lowly expressed transcripts even when their coverage depth is very low (e.g., 1.5).


    Frank H.T. Chu from Taiwan


    • Nico55
      Junior Member
      • Dec 2011
      • 7

      #3
      I’m in the same boat, my friend. Right now I am using Oases to assemble; after trying several assembly programs, I found it did the best job with my transcriptomes. I then used SOAPaligner in conjunction with SOAPsnp. This trial is still underway, and I will update you as soon as I have compiled my results. I would love to hear whether you have made any progress using different programs or pipelines.
      Thanks
      Last edited by Nico55; 12-14-2011, 06:38 PM.


      • rururara
        Member
        • Jan 2011
        • 31

        #4

        Hi all,

        I also tried Oases for de novo transcriptome assembly and am quite satisfied with the output.
        But now I am wondering how to obtain SNP positions from a de novo assembly.
        Can we simply rely on the SNP positions reported by variant callers (e.g., samtools, GigaBayes, FreeBayes), or do we need to write an in-house script?

        In my case, I'm working with a diploid plant. Some people have said that makes it easier, but for me it is still a challenge.

        Hope to hear comments from you guys.
        Thanks!


        • edge
          Senior Member
          • Sep 2009
          • 199

          #5
          Hi shoegame2001,

          Did you figure out a solution to your problem?
          I'm currently facing the same problem.
          I have Illumina paired-end RNA-seq reads and a reference transcriptome.
          However, I have no idea how to get SNP calls from my data set.
          Thanks for any advice.


          • shoegame2001
            Member
            • Dec 2010
            • 21

            #6
            As far as I can tell, there is no software designed for SNP-calling in RNA-seq data in the absence of a reference genome. Aligning reads back to a de novo assembled transcriptome and then filtering based on the proportion of reads supporting the alternative allele in called heterozygotes as well as deviation from Hardy-Weinberg results in a more reliable SNP set, but I am afraid there are still false positives that slip through.
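            The Hardy-Weinberg filter mentioned above can be sketched as a per-site test on genotype counts across individuals. Collapsed paralogs tend to show up as a heterozygote excess, because a fixed difference between gene copies makes nearly every individual look heterozygous. The Python sketch below uses a Pearson chi-square test with one degree of freedom; the function name is hypothetical, and with small numbers of individuals an exact HWE test is the safer choice.

```python
import math
from typing import Tuple


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Tuple[float, float, bool]:
    """Pearson chi-square test of Hardy-Weinberg proportions at one
    biallelic site. Returns (chi2, p_value, heterozygote_excess)."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 0.0, 1.0, False
    # Reference-allele frequency estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0.0:
        # Monomorphic site: there is nothing to test.
        return 0.0, 1.0, False
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 degree of freedom:
    # P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, n_het > expected[1]
```

            A site where all 20 individuals are called heterozygous, `hwe_chi_square(0, 20, 0)`, gives a chi-square of 20 with a very small p-value and a heterozygote excess, which is the pattern a collapsed paralog would produce. Combining this with the per-genotype allele-balance check catches more of these cases than either filter alone, though, as noted above, some false positives will still get through.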


            • htchu.taiwan
              Junior Member
              • Dec 2011
              • 5

              #7
              Hi, friends,

              You may try my program: EBARDenovo for RNA-Seq.
              EBARDenovo can now output SNP locations in the contigs with the -P parameter.
              Please check the download page: EBARDenovo is a free, highly accurate, search-based de novo assembler of paired-end RNA-Seq reads for advanced transcriptomic studies.


              It is a 64-bit Windows command-line program that runs on .NET.
              You can run it on a Windows PC with 16 GB of RAM for 30 to 40 GB of FASTQ RNA-Seq data.
              In our experiments, EBARDenovo was more accurate than Trinity and Oases.

              Hsueh-Ting Chu
