
  • rDiff - error while getting gene expression

    Hi everybody,

    I was wondering if anybody could help with an error message I am receiving for the differential isoform analysis software rDiff.



    The error message is:
    Getting gene expression for: /NGS/users/Thomas/rDiff//wt_7.bam
    error: convert_reads_to_region_indicators: A(I,J): column index out of bounds; value 11054 out of bound 7483
    error: called from:
    error: /NGS/Software/rDiff-master/src/tools/convert_reads_to_region_indicators.m at line 16, column 19
    error: /NGS/Software/rDiff-master/src/get_reads_caller.m at line 71, column 48
    error: /NGS/Software/rDiff-master/src/get_read_counts.m at line 32, column 17
    error: /NGS/Software/rDiff-master/src/rdiff.m at line 38, column 5



    Any help would be greatly appreciated.

    Additional info:

    The command given was as follows:
    ./rdiff -o output/ -d files/ -a wt_7.bam,wt_8.bam,wt_203.bam -b mut_2.bam,mut_201.bam,mut_204.bam -g genes_mm10.gff3 -m param -L 51 -m 30

    The same error occurs when using both param and non param.

    The bam files were generated by TopHat.

    The .gff3 file was generated by converting the .gtf file for Mus musculus (NCBI) provided by TopHat: http://tophat.cbcb.umd.edu/igenomes.shtml

  • #2
    The command seems to be right, although the path to the BAM file looks strange. Is the BAM file located at /NGS/users/Thomas/rDiff//wt_7.bam?

    Could you also send me the complete output of the rDiff run, as well as the first 1000 lines of your GFF3 file (or the part where you believe the problem is)?



    • #3
      Hi philippd,

      Thank you for the quick response.

      I have attached a text file of the output, the first 1000 lines of the GFF3 file and the first 1000 lines of one of the BAM files. I have also attached the output of the first example (used in the make example command) in case that may be of any use.

      I note that on the example BAM files there are no qual and sequence strings. Do the BAM files need to be processed in any specific way?

      The BAM location /NGS/users/Thomas/rDiff//wt_7.bam is correct, except that the // should be a single /.

      The command I showed previously was shortened. The full command is shown below:
      /NGS/Software/rDiff-master/bin/rdiff -o /NGS/users/Thomas/rDiff/output/ -d /NGS/users/Thomas/rDiff/ -a wt_7.bam,wt_8.bam,wt_203.bam -b mut_2.bam,mut_201.bam,mut_204.bam -g /NGS/users/Thomas/Transcripts/genes_mm10.gff3 -m param -L 51 -m 30


      I look forward to hearing your reply.

      Kind regards

      Tom


      • #4
        Hi Tom,

        I think that your GFF3 file is not formatted correctly. I saw that sometimes the exon and mRNA coordinates lie outside the gene coordinates (e.g. for some genes the exons end after the gene), which should normally not happen.
        What you could do is either download a GFF3 file where this is not the case, or replace the start and stop of each gene with its smallest and largest exon position, respectively.
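
The second option above (resetting each gene's start and stop from its exons) could be sketched roughly as follows. This is a minimal illustration and not part of rDiff: the function names `parse_attributes` and `fix_gene_bounds` are made up for this example, and it assumes a tab-separated GFF3 file with nine columns in which exons link to their gene through `ID`/`Parent` attributes (for example exon -> mRNA -> gene).

```python
def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def fix_gene_bounds(lines):
    """Return GFF3 lines in which every gene's start/end is replaced by the
    smallest exon start and largest exon end among its descendants.
    Hypothetical helper; assumes ID/Parent attributes are present."""
    parents = {}  # feature ID -> list of parent IDs
    exons = []    # (parent IDs, start, end) for every exon line
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        parent_ids = attrs["Parent"].split(",") if "Parent" in attrs else []
        if "ID" in attrs:
            parents[attrs["ID"]] = parent_ids
        if cols[2] == "exon":
            exons.append((parent_ids, int(cols[3]), int(cols[4])))

    def top_level(feature_id):
        """Follow Parent links upward and return the set of root IDs."""
        found, stack, seen = set(), [feature_id], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            ups = parents.get(current, [])
            if ups:
                stack.extend(ups)
            else:
                found.add(current)
        return found

    bounds = {}  # root ID -> (min exon start, max exon end)
    for parent_ids, start, end in exons:
        for pid in parent_ids:
            for root in top_level(pid):
                lo, hi = bounds.get(root, (start, end))
                bounds[root] = (min(lo, start), max(hi, end))

    fixed = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and not line.startswith("#") and cols[2] == "gene":
            gene_id = parse_attributes(cols[8]).get("ID")
            if gene_id in bounds:
                cols[3], cols[4] = map(str, bounds[gene_id])
                line = "\t".join(cols) + ("\n" if line.endswith("\n") else "")
        fixed.append(line)
    return fixed
```

One possible way to use it is to read the GFF3 file with `readlines()`, pass the list to `fix_gene_bounds`, and write the result to a new file with `writelines()`, leaving the original untouched. Genes without any exon descendants are left as they are.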

        Kind regards,
        Philipp



        • #5
          Hi Philipp

          Thanks again for your reply. I tried a number of different GFF3 files and a number of different GTF2/GFF3 converters (see below), but still no luck.

          Would you recommend any specific GFF3/GTF files for the mm10 mouse genome?

          I have used the following GFF3 files:

          ftp://ftp.ncbi.nlm.nih.gov/genomes/M..._level.gff3.gz

          ftp://ftp.ncbi.nlm.nih.gov/genomes/M...ffolds.gff3.gz


          I have used the following GTF2/GFF3 converters:

          The GFF toolkit from the mskcc galaxy webserver linked from the rDiff website https://galaxy.cbio.mskcc.org/

          The python script which comes with SpliceGrapher-0.2.2 (gtf2gff.py)

          The gffread tool which comes with cufflinks

          Converter tools used with the following GTF files:

          NCBI gtf file provided by cufflinks http://cufflinks.cbcb.umd.edu/igenomes.html
          Ensembl genes downloaded from UCSC http://genome.ucsc.edu/cgi-bin/hgTab...mblGenes.fasta

          I have attached a file which contains some of the error codes associated with some of the attempts I have made.

          I have tried to avoid editing the GFF file to replace each gene's start and stop with the smallest and largest exon position, respectively, because needing to do so seems to indicate that the GFF file is not correct. If there is no other option, though, that is what I will do.

          I should note that when I use the GFF toolkit converter, I always get exon and mRNA coordinates which lie outside the gene coordinates. When I use the gffread conversion tool, I get the following rDiff error: "child may be mapped to multiple parents ex: Parent=AT01,AT01-1-Protein."
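
To see which features in a converted file carry more than one parent, a quick scan along the following lines might help. This is only a sketch: `multi_parent_features` is a hypothetical helper, it assumes a tab-separated nine-column GFF3 file, and it assumes (without having checked rDiff's parser) that the error above is triggered by comma-separated `Parent` values.

```python
def multi_parent_features(lines):
    """Return (line_number, feature_type, parent_ids) for every GFF3 line
    whose Parent attribute lists more than one comma-separated ID."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        for item in cols[8].split(";"):
            key, _, value = item.partition("=")
            if key.strip() == "Parent" and "," in value:
                hits.append((number, cols[2], value.strip().split(",")))
    return hits
```

Running this over the gffread output would list the line numbers and feature types involved, which may be easier to share than the whole file.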

          Kind regards

          Tom


          • #6
            Hello Tom,

            Can you please post the first 5 (uncommented) lines from the GTF/GFF file?
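
One way to pull out those lines is a few lines of Python, sketched below. The helper name `first_data_lines` is made up for this example; it treats any line starting with `#` as a comment and skips blank lines.

```python
from itertools import islice


def first_data_lines(lines, n=5):
    """Return the first n lines that are neither blank nor '#' comments."""
    data = (ln.rstrip("\n") for ln in lines
            if ln.strip() and not ln.startswith("#"))
    return list(islice(data, n))


# Example usage (the file name is taken from the command earlier in the thread):
# with open("genes_mm10.gff3") as handle:
#     print("\n".join(first_data_lines(handle)))
```

Because the generator is consumed lazily, only the beginning of the file is read, so this also works on large annotation files.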

            Thanks, Vipin



            • #7
              Hi Vipin

              Sorry for the delay in replying. I have attached a file of the first few lines of the GTF/GFF files I have used.

              I should mention that I have managed to get the program to run successfully using test files of very limited size (~50 KB per BAM file). This used the Ensembl GTF downloaded from UCSC, converted from GTF to GFF with the GFF converter.

              When I try larger files, e.g. over 2 GB per BAM file, I get the following error message:
              error: memory exhausted or requested size too large for range of Octave's index type -- eval failed

              Best regards
