  • shoegame2001
    Member
    • Dec 2010
    • 21

    RNA-seq SNP-calling without a complete reference

    I am working on a project that seeks to call SNPs for a non-model organism with no existing reference genome or transcriptome using multiplexed Illumina RNA-seq data.

    I used Trinity to assemble a partial 'reference' transcriptome of the most highly expressed transcripts for which we had sufficient coverage, as well as many fragments of lower-expressed transcripts. Then I used BWA to map all data for multiple individuals back to that reference, and finally used GATK to call SNPs.

    However, I am running into an issue where reads derived from paralogous genes or a multigene family are mapping back to the same reference contig, creating false SNPs in divergent positions. My evidence of this is that in general one 'allele' (actually a slightly divergent gene) is supported by significantly fewer than half of the reads for a given individual that is called a heterozygote. These 'SNPs' are also generally observed across several individuals, leading me to believe that these are not sequencing/library prep errors.
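The allele-balance evidence described above can be turned into a per-site test. What follows is a minimal sketch, not part of the original pipeline: it assumes you have already extracted reference and alternative read counts for one individual at one called heterozygous site, and the function name `allele_balance_pvalue` is hypothetical. It computes an exact two-sided binomial p-value against the 50:50 ratio expected for a true heterozygote. Note that allele-specific expression can also skew the ratio in RNA-seq data, so a low p-value is a flag rather than proof of a paralog.

```python
from math import comb


def allele_balance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value for a 50:50 allele ratio.

    ref_reads / alt_reads are read counts for one individual at one
    site that was called heterozygous (hypothetical inputs).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, nothing to test
    # Under p = 0.5 every outcome k has probability C(n, k) / 2**n,
    # so outcomes can be compared through their integer coefficients.
    observed = comb(n, alt_reads)
    # Sum over all outcomes that are at least as unlikely as the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


if __name__ == "__main__":
    # A balanced site and a strongly skewed site (made-up counts).
    print(allele_balance_pvalue(10, 10))
    print(allele_balance_pvalue(18, 2))
```

A site whose called heterozygotes repeatedly show a small p-value across several individuals is a candidate for a collapsed paralog rather than a true SNP.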

    I think that I will be able to identify these cases with some statistic, but I am wondering if there is a good way to modify the corresponding SAM files to remove the mis-mapped reads, then re-genotype. Has anyone else encountered similar issues, and if so how did you deal with it?
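One way to prune suspect reads from a SAM file is to drop alignments whose edit distance to the contig is too high. The sketch below is only an illustration under stated assumptions: it assumes the aligner wrote the standard `NM:i:` tag (BWA does), the threshold of 2 is arbitrary, and the function names are hypothetical. Be aware that a strict threshold also removes reads carrying true alternative alleles, which biases genotypes toward the reference, and that dropping one mate of a pair leaves the other mate's pairing flags inconsistent.

```python
def edit_distance(sam_line: str):
    """Return the NM:i value of a SAM record, or None if the tag is absent."""
    # Optional tags start at the 12th tab-separated column.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("NM:i:"):
            return int(field[5:])
    return None


def keep_record(sam_line: str, max_edit_distance: int = 2) -> bool:
    """Decide whether one SAM line survives the filter."""
    if sam_line.startswith("@"):
        return True  # header lines are always kept
    flag = int(sam_line.split("\t")[1])
    if flag & 0x4:
        return False  # read is unmapped
    nm = edit_distance(sam_line)
    # Records without an NM tag cannot be judged, so they are kept.
    return nm is None or nm <= max_edit_distance


def filter_sam(lines, max_edit_distance: int = 2):
    """Return only the SAM lines that pass keep_record."""
    return [line for line in lines if keep_record(line, max_edit_distance)]


if __name__ == "__main__":
    # Made-up records: a header, a clean read, a divergent read, an unmapped read.
    demo = [
        "@SQ\tSN:contig1\tLN:1000",
        "r1\t0\tcontig1\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:1",
        "r2\t0\tcontig1\t10\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:5",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
    ]
    for line in filter_sam(demo):
        print(line)
```

After filtering, the file would need to be converted and indexed again before re-genotyping.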
    Last edited by shoegame2001; 12-06-2011, 03:49 PM.
  • htchu.taiwan
    Junior Member
    • Dec 2011
    • 5

    #2
    Hi, friend,

    You may try my program: EBARDenovo for RNA-Seq.
    EBARDenovo is a free, highly accurate, search-based de novo assembler of paired-end RNA-Seq data for advanced transcriptomic studies.

    It is a 64-bit Windows command-line program that requires .NET.

    EBARDenovo can assemble lowly expressed transcripts even when their coverage depth is very low (e.g., 1.5).


    Frank H.T. Chu from Taiwan


    • Nico55
      Junior Member
      • Dec 2011
      • 7

      #3
      I’m in the same boat, my friend. Right now I am using Oases for assembly; after trialing several assembly programs, I found it worked best with my transcriptomes. I then ran SOAPaligner in conjunction with SOAPsnp. This trial is still underway, and I will update you as soon as I compile my results. I would love to hear whether you have made any progress using different programs or pipelines.
      Thanks
      Last edited by Nico55; 12-14-2011, 06:38 PM.


      • rururara
        Member
        • Jan 2011
        • 31

        #4
        RNA-seq SNP-calling without a complete reference

        Hi all,

        I also tried Oases for de novo transcriptome assembly and am quite satisfied with the output.
        But now I am wondering how to obtain SNP positions from a de novo assembly.
        Can we just rely on the SNP positions reported by variant callers (e.g., samtools, GigaBayes, FreeBayes), or do we need to write an in-house script?

        In my case, I'm working with a diploid plant. Some people say that makes it easier, but for me it's still a challenge.

        Hope to hear comments from you guys.
        Thanks!


        • edge
          Senior Member
          • Sep 2009
          • 199

          #5
          Hi shoegame2001,

          Did you figure out a solution to your problem?
          I'm currently facing the same problem.
          I have Illumina paired-end RNA-seq reads and a reference transcriptome.
          However, I have no idea how to get SNP calls from my data set.
          Thanks for any advice.


          • shoegame2001
            Member
            • Dec 2010
            • 21

            #6
            As far as I can tell, there is no software designed for SNP calling from RNA-seq data in the absence of a reference genome. Aligning reads back to a de novo assembled transcriptome, then filtering on (1) the proportion of reads supporting the alternative allele in called heterozygotes and (2) deviation from Hardy-Weinberg equilibrium, gives a more reliable SNP set, but I am afraid some false positives still slip through.
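The Hardy-Weinberg part of this filter can be sketched as follows. This is an illustrative assumption about how such a filter might look, not the exact procedure used in the thread: the function name `hwe_test` is hypothetical, the inputs are genotype counts across individuals at one site, and the chi-square approximation (1 degree of freedom) is rough for small samples, where an exact test would be preferable. Collapsed paralogs tend to show an excess of heterozygotes, so the function also reports the direction of the deviation.

```python
from math import erfc, sqrt


def hwe_test(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic site.

    Returns (chi_square, p_value, excess_het), where excess_het is True
    when more heterozygotes were observed than expected.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0, 1.0, False
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 0.0, 1.0, False  # monomorphic site, nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value, n_ab > expected[1]


if __name__ == "__main__":
    # Made-up counts: a site in equilibrium and a site where everyone is "heterozygous".
    print(hwe_test(25, 50, 25))
    print(hwe_test(0, 20, 0))
```

A site where every individual is called heterozygous, as in the second call, is the typical signature of reads from two paralogs mapping to one contig.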


            • htchu.taiwan
              Junior Member
              • Dec 2011
              • 5

              #7
              Hi, friends,

              You may try my program: EBARDenovo for RNA-Seq.
              EBARDenovo can now output SNP locations in the contigs with the -P parameter.
              Please check the free EBARDenovo download.

              It is a 64-bit Windows command-line program that requires .NET.
              You can run it on a Windows PC with 16 GB of RAM for 30-40 GB of FASTQ RNA-Seq data.
              In our experiments, EBARDenovo was more accurate than Trinity and Oases.

              Hsueh-Ting Chu

