Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BAM files from RNAseq -Alignment (Basespace) not working with DESEq2 in R

    Hello experts,

    With my recent run in illumina Hiseq2500 I aligned the FASTAQ files generated in basespace to human reference genome (Homosapieans (PAR masked)/hg19 (Refseq) using the RNA-seq Alignment App.I used Tophat (Bowtie2) aligner for the same. except for "NO" to the stranded information, No other parameters were selected.
    I downloaded the BAM files generated and used them in the Bioconductor RNAseq workflow (Using DESeq2)
    Here we walk through an end-to-end gene-level RNA-seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were aligned to the reference genome, and prepare a count matrix which tallies the number of RNA-seq reads/fragments within each gene for each sample. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results.


    to create a summarized object I used "Homo_sapiens.GRCh37.75.gtf"

    below is the script:

    (txdb <- makeTxDbFromGFF("Homo_sapiens.GRCh37.75.gtf", format="gtf", circ_seqs=character()))
    (ebg <- exonsBy(txdb, by="gene"))

    se <- summarizeOverlaps(features=ebg,reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE )

    I am getting the following error message and am unable to get any count

    Warning message:
    In .Seqinfo.mergexy(x, y) :
    The 2 combined objects have no sequence levels in common. (Use
    suppressWarnings() to suppress this warning.)

    Any suggestions?

    Ram

  • #2
    Hi Ram,

    could you check if the chromosomes' name in the bam files are the same as the ones in the gtf?

    Code:
    samtools view -H my_bam.bam | head
    and
    Code:
    head Homo_sapiens.GRCh37.75.gtf
    .

    Often, one annotation is 'chr1', another 'Chr1', and sometimes just '1'. If you are mixing these up, no program or tool will find an overlap.

    Cheers,

    Michael

    Comment


    • #3
      Thanks a lot Michael,

      below is the output of the codes you suggested:

      [rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
      @HD VN:1.0 SO:coordinate
      @RG ID:0 SM:Ctrl13058-Trimmed PI:50
      @SQ SN:chrM LN:16571
      @SQ SN:chr1 LN:249250621
      @SQ SN:chr2 LN:243199373
      @SQ SN:chr3 LN:198022430
      @SQ SN:chr4 LN:191154276
      @SQ SN:chr5 LN:180915260
      @SQ SN:chr6 LN:171115067
      @SQ SN:chr7 LN:159138663

      [rshukla@dev01 test]$ head Homo_sapiens.GRCh37.75.gtf
      #!genome-build GRCh37.p13
      #!genome-version GRCh37
      #!genome-date 2009-02
      #!genome-build-accession NCBI:GCA_000001405.14
      #!genebuild-last-updated 2013-09
      1 pseudogene gene 11869 14412 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene";
      1 processed_transcript transcript 11869 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; g ene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";
      1 processed_transcript exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "1"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002234944";
      1 processed_transcript exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "2"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00003582793";
      1 processed_transcript exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "3"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002312635";


      The format in .bam file is chr1 where as in .gtf I guess it is just the number.

      Do you have any suggestion how to proceed further, I am very new to all these.

      Thanks again,

      Ram

      Comment


      • #4
        Can you download a corresponding GTF file from BaseSpace? That should sort this problem out.

        It should be possible to change the GTF file you have (from 1 to chr1 etc) but if the above works then you would know that you have a file that exactly corresponds to the genome used for alignment. Using an incorrect GTF file will lead to erroneous results.

        Edit: If changing 1 to chr 1 is the preferred option then use the solution suggested by @vivek in this thread.

        Copying it here for reference (use your file names) :

        Code:
        $ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' no_chr.file> with_chr.file
        Last edited by GenoMax; 04-12-2016, 06:22 AM.

        Comment


        • #5
          Thanks GenoMax and Micheal

          I tried the other reference genome with same chromosome annotation but it still did not work

          Below is the error message

          > se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE,suppressWarnings())
          Error in validObject(.Object) :
          invalid class “SummarizedExperiment0” object: 'assays' nrow differs from 'mcols' nrow


          the heads of both the bam and the gtf files are as below:

          [rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
          @HD VN:1.0 SO:coordinate
          @RG ID:0 SM:Ctrl13058-Trimmed PI:50
          @SQ SN:chrM LN:16571
          @SQ SN:chr1 LN:249250621
          @SQ SN:chr2 LN:243199373
          @SQ SN:chr3 LN:198022430
          @SQ SN:chr4 LN:191154276
          @SQ SN:chr5 LN:180915260
          @SQ SN:chr6 LN:171115067
          @SQ SN:chr7 LN:159138663
          [rshukla@dev01 test]$ head Human_hg19.gtf
          chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
          chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
          chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
          chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
          chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
          chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
          chr1 hg19_knownGene start_codon 12190 12192 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          chr1 hg19_knownGene CDS 12190 12227 0.000000 + 0 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          chr1 hg19_knownGene CDS 12595 12721 0.000000 + 1 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          [rshukla@dev01 test]$


          As per Genomaxs suggestion I am also downloading reference genome from basespace, surprisingly it is 42.3GB !

          any further suggestions?

          Ram

          Comment


          • #6
            @Ram you should not need to get the reference genome. The error above is not related to the original issue but an error with your DESeq2 command.

            Comment


            • #7
              Thanks, what changes should I make to the command?

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Recent Developments in Metagenomics
                by seqadmin





                Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                09-23-2024, 06:35 AM
              • seqadmin
                Understanding Genetic Influence on Infectious Disease
                by seqadmin




                During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

                Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
                09-09-2024, 10:59 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, Yesterday, 04:51 AM
              0 responses
              8 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 10-01-2024, 07:10 AM
              0 responses
              11 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 09-30-2024, 08:33 AM
              0 responses
              16 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 09-26-2024, 12:57 PM
              0 responses
              16 views
              0 likes
              Last Post seqadmin  
              Working...
              X