rDiff - error while getting gene expression


  • Tomnl
    replied
    Hi Vipin

    Sorry for the delay in replying. I have attached a file of the first few lines of the GTF/GFF files I have used.

    I should mention that I have managed to get the program to run successfully using test files of very limited size (~50 KB per BAM file). This was with the Ensembl GTF downloaded from UCSC, converted from GTF to GFF using the GFF converter.

    When I attempt the same with larger files, e.g. over 2 GB per BAM file, I get the following error message:
    error: memory exhausted or requested size too large for range of Octave's index type -- eval failed

    Best regards


  • vipints
    replied
    Hello Tom,

    Can you please post the first 5 uncommented lines from the GTF/GFF file?
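
One way to pull out those lines is sketched below. It assumes that comment lines start with `#`, as they do in GFF3; the function name `first_uncommented` and the demo records are made up for illustration.

```python
from itertools import islice


def first_uncommented(lines, n=5):
    """Return the first n lines that are neither blank nor '#' comments."""
    kept = (
        line.rstrip("\n")
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    return list(islice(kept, n))


# In-memory demo with made-up records. For a real file, pass an open
# file handle instead of a list, e.g. open("genes_mm10.gff3").
demo = [
    "##gff-version 3\n",
    "# made-up example records\n",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t100\t200\t.\t+\t.\tID=t1;Parent=g1\n",
]
for line in first_uncommented(demo, n=5):
    print(line)
```

Because the generator is consumed lazily, only the top of a large annotation file is read.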

    Thanks, Vipin



  • Tomnl
    replied
    Hi Philipp

    Thanks again for your reply. I tried a number of different GFF3 files and several GTF2/GFF3 converters (listed below), but still no luck.

    Would you recommend any specific GFF3/GTF files for the mm10 mouse genome?

    I have used the following GFF3 files:

    ftp://ftp.ncbi.nlm.nih.gov/genomes/M..._level.gff3.gz

    ftp://ftp.ncbi.nlm.nih.gov/genomes/M...ffolds.gff3.gz


    I have used the following GTF2/GFF3 converters:

    The GFF toolkit from the mskcc galaxy webserver linked from the rDiff website https://galaxy.cbio.mskcc.org/

    The python script which comes with SpliceGrapher-0.2.2 (gtf2gff.py)

    The gffread tool which comes with cufflinks

    Converter tools used with the following GTF files:

    NCBI gtf file provided by cufflinks http://cufflinks.cbcb.umd.edu/igenomes.html
    Ensembl genes downloaded from UCSC http://genome.ucsc.edu/cgi-bin/hgTab...mblGenes.fasta

    I have attached a file which contains some of the error codes associated with some of the attempts I have made.

    I have tried to avoid editing the GFF file to replace the start and stop of each gene with the smallest and largest exon position, respectively, since needing to do so suggests that the GFF file itself is not correct. If there is no other option, though, that is what I will do.

    I should note that when I use the GFF toolkit converter, I always get exon and mRNA coordinates that lie outside the gene coordinates. When I use the gffread converter, I get the following rDiff error: "child may be mapped to multiple parents ex: Parent=AT01,AT01-1-Protein".
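
Both symptoms can be checked for before running rDiff. The sketch below is only a rough diagnostic, not part of rDiff or of any of the converters: it assumes standard 9-column GFF3 with `ID` and `Parent` attributes, it compares each feature only against its direct parent, and the name `check_gff3` and the demo records are made up.

```python
def parse_attrs(field):
    """Parse the GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3(lines):
    """Return (features with several parents, (child, parent) pairs out of bounds)."""
    coords = {}    # feature ID -> (start, end)
    children = []  # (label, start, end, list of parent IDs)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        start, end = int(cols[3]), int(cols[4])
        # Features without an ID are labelled by type and position.
        label = attrs.get("ID", f"{cols[2]}:{cols[0]}:{start}-{end}")
        if "ID" in attrs:
            coords[attrs["ID"]] = (start, end)
        if "Parent" in attrs:
            children.append((label, start, end, attrs["Parent"].split(",")))

    multi_parent = [label for label, _, _, pids in children if len(pids) > 1]
    outside = []
    for label, start, end, parent_ids in children:
        for pid in parent_ids:
            if pid in coords:  # parents missing from the file are skipped
                p_start, p_end = coords[pid]
                if start < p_start or end > p_end:
                    outside.append((label, pid))
    return multi_parent, outside


# Made-up records showing both problems.
demo = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t100\t250\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\texon\t90\t150\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\tCDS\t110\t140\t.\t+\t0\tID=c1;Parent=t1,t1-Protein",
]
multi, outside = check_gff3(demo)
print("multiple parents:", multi)
print("outside parent:", outside)
```

If the second list is non-empty for a converted file, the conversion step itself is the likely culprit rather than the BAM files.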

    Kind regards

    Tom


  • philippd
    replied
    Hi Tom,

    I think that your GFF3 file is not formatted correctly. I saw that the exon and mRNA coordinates sometimes lie outside the gene coordinates (e.g. for some genes the exons end after the gene), which should normally not happen.
    What you could do is either download a GFF3 file where this is not the case, or replace the start and stop of each gene with the smallest and largest exon position, respectively.
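
The second option could be scripted roughly as follows. This is a minimal sketch and not part of rDiff: it assumes standard 9-column GFF3 in which exons reach their gene through `Parent` attributes (exon to mRNA to gene), it rewrites only rows of type `gene`, and the name `fix_gene_bounds` and the demo records are made up.

```python
def parse_attrs(field):
    """Parse the GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def fix_gene_bounds(lines):
    """Return GFF3 lines with each gene's start/end set to its exon span."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    parents = {}  # feature ID -> list of parent IDs
    exons = []    # (start, end, list of parent IDs)
    for row in rows:
        if len(row) != 9:
            continue  # comments, directives, blank lines
        attrs = parse_attrs(row[8])
        parent_ids = attrs["Parent"].split(",") if "Parent" in attrs else []
        if "ID" in attrs:
            parents[attrs["ID"]] = parent_ids
        if row[2] == "exon":
            exons.append((int(row[3]), int(row[4]), parent_ids))

    # Propagate every exon's coordinates up to all of its ancestors.
    span = {}  # ancestor ID -> (smallest exon start, largest exon end)
    for start, end, parent_ids in exons:
        stack, seen = list(parent_ids), set()
        while stack:
            fid = stack.pop()
            if fid in seen:
                continue
            seen.add(fid)
            lo, hi = span.get(fid, (start, end))
            span[fid] = (min(lo, start), max(hi, end))
            stack.extend(parents.get(fid, []))

    fixed = []
    for row in rows:
        if len(row) == 9 and row[2] == "gene":
            gene_id = parse_attrs(row[8]).get("ID")
            if gene_id in span:
                lo, hi = span[gene_id]
                row = row[:3] + [str(lo), str(hi)] + row[5:]
        fixed.append("\t".join(row))
    return fixed


# Made-up records: the exons extend past the gene on both sides.
demo = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t100\t250\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\texon\t90\t150\t.\t+\t.\tID=e1;Parent=t1",
    "chr1\tsrc\texon\t180\t250\t.\t+\t.\tID=e2;Parent=t1",
]
for line in fix_gene_bounds(demo):
    print(line)
```

mRNA rows are left untouched here; widening them the same way would only require relaxing the `row[2] == "gene"` test.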

    Kind regards,
    Philipp



  • Tomnl
    replied
    Hi philippd,

    Thank you for the quick response.

    I have attached a text file of the output, the first 1000 lines of the GFF3 file and the first 1000 lines of one of the BAM files. I have also attached the output of the first example (used in the make example command) in case that may be of any use.

    I note that the example BAM files contain no quality or sequence strings. Do the BAM files need to be processed in any specific way?

    The BAM location /NGS/users/Thomas/rDiff//wt_7.bam is correct, except that the // should be a single /.

    The command I showed previously was shortened. The full command is shown below:
    /NGS/Software/rDiff-master/bin/rdiff -o /NGS/users/Thomas/rDiff/output/ -d /NGS/users/Thomas/rDiff/ -a wt_7.bam,wt_8.bam,wt_203.bam -b mut_2.bam,mut_201.bam,mut_204.bam -g /NGS/users/Thomas/Transcripts/genes_mm10.gff3 -m param -L 51 -m 30


    I look forward to hearing your reply.

    Kind regards

    Tom


  • philippd
    replied
    The command seems to be right, although the path to the BAM file looks strange. Is the BAM file located at /NGS/users/Thomas/rDiff//wt_7.bam?

    Could you also send me the complete output of the rDiff run, as well as the first 1000 lines of your GFF3 file (or the part where you believe the problem is)?



  • Tomnl
    started a topic rDiff - error while getting gene expression

    Hi everybody,

    I was wondering if anybody could help with an error message I am receiving for the differential isoform analysis software rDiff.

    http://cbio.mskcc.org/public/raetsch...r/drewe/rdiff/

    The error message is:
    Getting gene expression for: /NGS/users/Thomas/rDiff//wt_7.bam
    error: convert_reads_to_region_indicators: A(I,J): column index out of bounds; value 11054 out of bound 7483
    error: called from:
    error: /NGS/Software/rDiff-master/src/tools/convert_reads_to_region_indicators.m at line 16, column 19
    error: /NGS/Software/rDiff-master/src/get_reads_caller.m at line 71, column 48
    error: /NGS/Software/rDiff-master/src/get_read_counts.m at line 32, column 17
    error: /NGS/Software/rDiff-master/src/rdiff.m at line 38, column 5



    Any help would be greatly appreciated.

    Additional info:

    The command given was as follows:
    ./rdiff -o output/ -d files/ -a wt_7.bam,wt_8.bam,wt_203.bam -b mut_2.bam,mut_201.bam,mut_204.bam -g genes_mm10.gff3 -m param -L 51 -m 30

    The same error occurs with both the param and the non-param method.

    The bam files were generated by TopHat.

    The .gff3 file was generated by converting the .gtf file for Mus musculus (NCBI) provided by TopHat at http://tophat.cbcb.umd.edu/igenomes.shtml.