Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • BAM files from RNAseq -Alignment (Basespace) not working with DESEq2 in R

    Hello experts,

    With my recent run in illumina Hiseq2500 I aligned the FASTAQ files generated in basespace to human reference genome (Homosapieans (PAR masked)/hg19 (Refseq) using the RNA-seq Alignment App.I used Tophat (Bowtie2) aligner for the same. except for "NO" to the stranded information, No other parameters were selected.
    I downloaded the BAM files generated and used them in the Bioconductor RNAseq workflow (Using DESeq2)
    Here we walk through an end-to-end gene-level RNA-seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were aligned to the reference genome, and prepare a count matrix which tallies the number of RNA-seq reads/fragments within each gene for each sample. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results.


    to create a summarized object I used "Homo_sapiens.GRCh37.75.gtf"

    below is the script:

    (txdb <- makeTxDbFromGFF("Homo_sapiens.GRCh37.75.gtf", format="gtf", circ_seqs=character()))
    (ebg <- exonsBy(txdb, by="gene"))

    se <- summarizeOverlaps(features=ebg,reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE )

    I am getting the following error message and am unable to get any count

    Warning message:
    In .Seqinfo.mergexy(x, y) :
    The 2 combined objects have no sequence levels in common. (Use
    suppressWarnings() to suppress this warning.)

    Any suggestions?

    Ram

  • #2
    Hi Ram,

    could you check if the chromosomes' name in the bam files are the same as the ones in the gtf?

    Code:
    samtools view -H my_bam.bam | head
    and
    Code:
    head Homo_sapiens.GRCh37.75.gtf
    .

    Often, one annotation is 'chr1', another 'Chr1', and sometimes just '1'. If you are mixing these up, no program or tool will find an overlap.

    Cheers,

    Michael

    Comment


    • #3
      Thanks a lot Michael,

      below is the output of the codes you suggested:

      [rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
      @HD VN:1.0 SO:coordinate
      @RG ID:0 SM:Ctrl13058-Trimmed PI:50
      @SQ SN:chrM LN:16571
      @SQ SN:chr1 LN:249250621
      @SQ SN:chr2 LN:243199373
      @SQ SN:chr3 LN:198022430
      @SQ SN:chr4 LN:191154276
      @SQ SN:chr5 LN:180915260
      @SQ SN:chr6 LN:171115067
      @SQ SN:chr7 LN:159138663

      [rshukla@dev01 test]$ head Homo_sapiens.GRCh37.75.gtf
      #!genome-build GRCh37.p13
      #!genome-version GRCh37
      #!genome-date 2009-02
      #!genome-build-accession NCBI:GCA_000001405.14
      #!genebuild-last-updated 2013-09
      1 pseudogene gene 11869 14412 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene";
      1 processed_transcript transcript 11869 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; g ene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";
      1 processed_transcript exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "1"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002234944";
      1 processed_transcript exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "2"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00003582793";
      1 processed_transcript exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "3"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002312635";


      The format in .bam file is chr1 where as in .gtf I guess it is just the number.

      Do you have any suggestion how to proceed further, I am very new to all these.

      Thanks again,

      Ram

      Comment


      • #4
        Can you download a corresponding GTF file from BaseSpace? That should sort this problem out.

        It should be possible to change the GTF file you have (from 1 to chr1 etc) but if the above works then you would know that you have a file that exactly corresponds to the genome used for alignment. Using an incorrect GTF file will lead to erroneous results.

        Edit: If changing 1 to chr 1 is the preferred option then use the solution suggested by @vivek in this thread.

        Copying it here for reference (use your file names) :

        Code:
        $ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' no_chr.file> with_chr.file
        Last edited by GenoMax; 04-12-2016, 06:22 AM.

        Comment


        • #5
          Thanks GenoMax and Micheal

          I tried the other reference genome with same chromosome annotation but it still did not work

          Below is the error message

          > se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE,suppressWarnings())
          Error in validObject(.Object) :
          invalid class “SummarizedExperiment0” object: 'assays' nrow differs from 'mcols' nrow


          the heads of both the bam and the gtf files are as below:

          [rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
          @HD VN:1.0 SO:coordinate
          @RG ID:0 SM:Ctrl13058-Trimmed PI:50
          @SQ SN:chrM LN:16571
          @SQ SN:chr1 LN:249250621
          @SQ SN:chr2 LN:243199373
          @SQ SN:chr3 LN:198022430
          @SQ SN:chr4 LN:191154276
          @SQ SN:chr5 LN:180915260
          @SQ SN:chr6 LN:171115067
          @SQ SN:chr7 LN:159138663
          [rshukla@dev01 test]$ head Human_hg19.gtf
          chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
          chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
          chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
          chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
          chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
          chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
          chr1 hg19_knownGene start_codon 12190 12192 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          chr1 hg19_knownGene CDS 12190 12227 0.000000 + 0 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          chr1 hg19_knownGene CDS 12595 12721 0.000000 + 1 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
          [rshukla@dev01 test]$


          As per Genomaxs suggestion I am also downloading reference genome from basespace, surprisingly it is 42.3GB !

          any further suggestions?

          Ram

          Comment


          • #6
            @Ram you should not need to get the reference genome. The error above is not related to the original issue but an error with your DESeq2 command.

            Comment


            • #7
              Thanks, what changes should I make to the command?

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Latest Developments in Precision Medicine
                by seqadmin



                Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

                Somatic Genomics
                “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
                05-24-2024, 01:16 PM
              • seqadmin
                Recent Advances in Sequencing Analysis Tools
                by seqadmin


                The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
                05-06-2024, 07:48 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 05-24-2024, 07:15 AM
              0 responses
              198 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-23-2024, 10:28 AM
              0 responses
              220 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-23-2024, 07:35 AM
              0 responses
              229 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 05-22-2024, 02:06 PM
              0 responses
              13 views
              0 likes
              Last Post seqadmin  
              Working...
              X