  • ron128
    Member
    • Sep 2011
    • 38

    Variant Calling from paired end RNAseq data

    Hello everybody. Since everybody is getting on the variant-calling-from-RNA-seq bandwagon, my P.I. wants to get in on the party as well. Too bad for overworked people like me. So I have this huge dataset of Illumina paired-end RNA-seq reads which I am trying to run through GATK. I am taking the accepted_hits.bam file from TopHat/Cufflinks as the input file for GATK. My reference for the TopHat/Cufflinks pipeline was the UCSC annotated genes (knownGenes.gtf) and the UCSC hg19 reference.

    My pipeline for GATK is:
    convert BAM to SAM > sort SAM > insert read groups > fix mates using Picard > SAM to BAM > remove duplicates > reindex BAM > realign indels

    When I run the indel realignment in GATK, I get the following error: contig chr 1 missing from reference. So I went ahead and looked it up, and used the ReorderSam option in Picard tools. I modified my pipeline as follows:

    convert BAM to SAM > sort SAM > insert read groups > fix mates using Picard > create a sequence dictionary for the hg19 reference using Picard > reorder SAM >
    SAM to BAM > remove duplicates > reindex BAM > realign indels

    My error is not solved even after reordering the SAM file, and I still get the same error: "chr1 contig not found in your reference".

    Is it something to do with the references I have been using? I have used the hg19 reference for both my TopHat/Cufflinks pipeline and my GATK pipeline.

    Thanks a ton in advance!
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    How are you making your .sam? I'm pretty sure that bwa sampe will add read group info with the -r option.

    And I think Picard would add them to a .bam. I'm pretty sure you do NOT have to expand your .bam to a .sam in order to do that.

    And you double-checked to make sure that the name of Chr 1 is exactly the same between your .bam and your reference genome?
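
A minimal sketch of that name check, assuming you have already dumped the BAM header to text (for example with samtools view -H) and can read the reference FASTA header lines. The function names and the toy header and FASTA strings are made up for illustration; they are not output from the poster's files.

```python
def sam_contigs(header_text):
    """Return contig names (SN fields) from the @SQ lines of a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_contigs(fasta_text):
    """Return contig names from FASTA header lines (text up to first whitespace)."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def compare_contigs(bam_names, ref_names):
    """Report names present on one side but not the other (exact string match)."""
    ref = set(ref_names)
    bam = set(bam_names)
    return {
        "missing_from_reference": [n for n in bam_names if n not in ref],
        "missing_from_bam": [n for n in ref_names if n not in bam],
    }


# Toy inputs (hypothetical): the BAM says "chr1", the FASTA says "1".
header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:800\n"
fasta = ">1 some description\nACGT\n>chr2\nACGT\n"

report = compare_contigs(sam_contigs(header), fasta_contigs(fasta))
print(report)
```

The comparison is deliberately an exact string match, since "chr1", "Chr1" and "1" would all count as different contigs.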


    • ron128
      Member
      • Sep 2011
      • 38

      #3
      Hey, thanks for the quick reply. Like I said, I am not redoing any alignments here. I have already run the TopHat/Cufflinks pipeline on my data to assemble it into known transcripts, using UCSC genes and hg19 as the reference. So I already have an "accepted_hits.bam" as output from the TopHat runs. I am using this accepted_hits.bam as my alignment file for GATK and converting it to a SAM file using Picard tools. I am cutting down on my analysis time by not redoing the alignments.

      As for your question about whether I checked the reference and my SAM file for chr1: yes, I did, and this is what is troubling me. I am using the same reference for both my RNA-seq analysis and my variant calling, so logically there shouldn't be any discrepancies. I was wondering if anybody has faced the same issue when using GATK for variant calling from RNA-seq data? I am pretty new to this, so I might be messing up somewhere in the pipeline.
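
Beyond the name of chr1 alone, one way to test the "same reference, so no discrepancies" assumption is to compare the full set of @SQ lines in the BAM header against the sequence dictionary built for the reference (a Picard .dict file is itself SAM-header text with @SQ lines). The sketch below is only an illustration: the helper names and the toy headers are hypothetical, and it checks names, lengths and relative order, nothing else.

```python
def parse_sq(header_text):
    """Return [(name, length), ...] in file order from the @SQ lines."""
    contigs = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((fields.get("SN"), int(fields.get("LN", 0))))
    return contigs


def dictionary_problems(bam_sq, ref_sq):
    """List mismatches in contig names, lengths, and relative order."""
    problems = []
    ref_lengths = dict(ref_sq)
    for name, length in bam_sq:
        if name not in ref_lengths:
            problems.append(f"{name}: not in reference dictionary")
        elif ref_lengths[name] != length:
            problems.append(
                f"{name}: length {length} in BAM vs {ref_lengths[name]} in reference"
            )
    # Compare the order of the contigs that both sides share.
    bam_names = {n for n, _ in bam_sq}
    bam_order = [n for n, _ in bam_sq if n in ref_lengths]
    ref_order = [n for n, _ in ref_sq if n in bam_names]
    if bam_order != ref_order:
        problems.append("shared contigs appear in a different order")
    return problems


# Toy headers (hypothetical values, not real hg19 lengths).
bam_header = "@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr2\tLN:80\n@SQ\tSN:chrM\tLN:16\n"
ref_dict = "@HD\tVN:1.0\n@SQ\tSN:chrM\tLN:16\n@SQ\tSN:chr1\tLN:100\n@SQ\tSN:chr2\tLN:90\n"

problems = dictionary_problems(parse_sq(bam_header), parse_sq(ref_dict))
for p in problems:
    print(p)
```

An empty list here would suggest the two dictionaries agree on names, lengths and order, in which case the cause of the error probably lies elsewhere (for example, in which reference file GATK is actually being pointed at).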
