  • bbmap error in output file

    Hi, I am getting an error when running BBMap. Any idea what I am doing wrong?
    Also, how do I check the mapping statistics?

    This is what I am running:

    Code:
    #!/bin/bash
    cd /space/home/aguilar/Ofav_temp/Trim
    /space/home/aguilar/Programs/bbmap/bbwrap.sh t=40 in=\
    S1_F_paired_1.fq,S10_F_paired_1.fq,S11_F_paired_1.fq,S12_F_paired_1.fq,S13_F_paired_1.fq,S14_F_paired_1.fq,S15_F_paired_1.fq,S16_F_paired_1.fq,S17_F_paired_1.fq,S18_F_paired_1.fq,S19_F_paired_1.fq,S2_F_paired_1.fq,S20_F_paired_1.fq,S21_F_paired_1.fq,S22_F_paired_1.fq,S23_F_paired_1.fq,S24_F_paired_1.fq,S25_F_paired_1.fq,S26_F_paired_1.fq,S27_F_paired_1.fq,S28_F_paired_1.fq,S29_F_paired_1.fq,S3_F_paired_1.fq,S30_F_paired_1.fq,S31_F_paired_1.fq,S32_F_paired_1.fq,S33_F_paired_1.fq,S34_F_paired_1.fq,S35_F_paired_1.fq,S36_F_paired_1.fq,S37_F_paired_1.fq,S38_F_paired_1.fq,S39_F_paired_1.fq,S4_F_paired_1.fq,S40_F_paired_1.fq,S41_F_paired_1.fq,S42_F_paired_1.fq,S43_F_paired_1.fq,S44_F_paired_1.fq,S45_F_paired_1.fq,S46_F_paired_1.fq,S47_F_paired_1.fq,S48_F_paired_1.fq,S5_F_paired_1.fq,S6_F_paired_1.fq,S7_F_paired_1.fq,S8_F_paired_1.fq,S9_F_paired_1.fq \
    in2=S1_R_paired_2.fq,S10_R_paired_2.fq,S11_R_paired_2.fq,S12_R_paired_2.fq,S13_R_paired_2.fq,S14_R_paired_2.fq,S15_R_paired_2.fq,S16_R_paired_2.fq,S17_R_paired_2.fq,S18_R_paired_2.fq,S19_R_paired_2.fq,S2_R_paired_2.fq,S20_R_paired_2.fq,S21_R_paired_2.fq,S22_R_paired_2.fq,S23_R_paired_2.fq,S24_R_paired_2.fq,S25_R_paired_2.fq,S26_R_paired_2.fq,S27_R_paired_2.fq,S28_R_paired_2.fq,S29_R_paired_2.fq,S3_R_paired_2.fq,S30_R_paired_2.fq,S31_R_paired_2.fq,S32_R_paired_2.fq,S33_R_paired_2.fq,S34_R_paired_2.fq,S35_R_paired_2.fq,S36_R_paired_2.fq,S37_R_paired_2.fq,S38_R_paired_2.fq,S39_R_paired_2.fq,S4_R_paired_2.fq,S40_R_paired_2.fq,S41_R_paired_2.fq,S42_R_paired_2.fq,S43_R_paired_2.fq,S44_R_paired_2.fq,S45_R_paired_2.fq,S46_R_paired_2.fq,S47_R_paired_2.fq,S48_R_paired_2.fq,S5_R_paired_2.fq,S6_R_paired_2.fq,S7_R_paired_2.fq,S8_R_paired_2.fq,S9_R_paired_2.fq \
    ref=/space/home/aguilar/Ofav_temp/Genomes/Orbicella_faveolata_v2_scaffolds.fa \
    outu=/space/home/aguilar/Ofav_temp/bbmap/ReadsUnm.R1.fastq.gz \
    outu2=/space/home/aguilar/Ofav_temp/bbmap/ReadsUnmR2.fastq.gz \
    outm=/space/home/aguilar/Ofav_temp/bbmap/ReadsMappedR1.fastq.gz \
    outm2=/space/home/aguilar/Ofav_temp/bbmap/ReadsMappedR2.fastq.gz
    This is the error:

    Code:
    Retaining first best site only for ambiguous mappings.
    No output file.
    Exception in thread "main" java.lang.AssertionError: ASCII encoding for quality (currently ASCII-33) appears to be wrong.
    Thanks
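The assertion above complains that the assumed ASCII-33 quality encoding looks wrong for at least one input file. A minimal sketch for checking this by hand, assuming strict four-line FASTQ records, is to look at the smallest quality character in each file. The function name `guess_phred_offset` and the 59 cutoff (the lowest character the older +64 encodings can produce) are my own choices for illustration, not anything taken from BBMap.

```shell
#!/bin/sh
# Guess the quality offset of a FASTQ stream read from standard input.
# Assumes strict 4-line FASTQ records (no wrapped sequence lines).
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every fourth line is a quality string; track its smallest code.
            for (i = 1; i <= length($0); i++) {
                v = ord[substr($0, i, 1)]
                if (v > 0 && v < min) min = v
            }
        }
        END {
            if (min == 127)    print "no-quality-lines"
            else if (min < 59) print "phred33"
            else               print "possibly-phred64"
        }
    '
}
```

Something like `head -n 4000 S1_F_paired_1.fq | guess_phred_offset` run over each input should show whether one file stands out. If one does, the BBTools help text describes a `qin=` option for stating the input offset explicitly; please confirm that in the bbmap.sh usage message rather than taking my word for it.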

    Comment


    • Negative Array Size in BBMap

      Hello,

      I'm trying to map 16 paired read sets. I've already trimmed the adapters using Trim Galore!

      Input (scheduled job run on a server):

      Code:
      for i in "${samples[@]}"; do
      bbmap.sh ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in="$i"_R1_001_val_1.fa.gz in2="$i"_R2_001_val_2.fa.gz out="$i".sam
      done
      Requested 16 cpus, 80g RAM, 24h runtime.

      Output:

      Code:
      module: loading 'java/8u111'
      module: loading 'bbmap/38.23'
      module: loading 'samtools/1.9'
      java -ea -Xmx47593m -cp /gpfs/runtime/opt/bbmap/38.23/bin/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.sam
      Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa, in=PoolCH-1_R1_001_val_1.fa.gz, in2=PoolCH-1_R2_001_val_2.fa.gz, out=PoolCH-1.sam]
      Version 38.24
      
      Retaining first best site only for ambiguous mappings.
      Writing reference.
      Executing dna.FastaToChromArrays2 [/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
      
      Set genScaffoldInfo=true
      Writing chunk 1
      Set genome to 1
      
      Loaded Reference:	0.057 seconds.
      Loading index for chunk 1-1, build 1
      No index available; generating from reference genome: /gpfs/data/mtatar/chutfilz/trimgalore/ref/index/1/chr1_index_k13_c4_b1.block
      Indexing threads started for block 0-1
      Indexing threads finished for block 0-1
      Generated Index:	15.892 seconds.
      Analyzed Index:   	2.851 seconds.
      Started output stream:	1.332 seconds.
      Cleared Memory:    	0.264 seconds.
      Processing reads in paired-ended mode.
      Started read stream.
      Started 16 mapping threads.
      Exception in thread "Thread-28" java.lang.NegativeArraySizeException
      	at java.util.Arrays.copyOf(Arrays.java:3236)
      	at shared.KillSwitch.copyOf(KillSwitch.java:294)
      	at stream.FastaReadInputStream.fillBuffer(FastaReadInputStream.java:447)
      	at stream.FastaReadInputStream.nextHeader(FastaReadInputStream.java:290)
      	at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:174)
      	at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:107)
      	at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
      	at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)
      Exception in thread "Thread-27" java.lang.NegativeArraySizeException
      	at java.util.Arrays.copyOf(Arrays.java:3236)
      	at shared.KillSwitch.copyOf(KillSwitch.java:294)
      	at stream.FastaReadInputStream.fillBuffer(FastaReadInputStream.java:447)
      	at stream.FastaReadInputStream.nextHeader(FastaReadInputStream.java:290)
      	at stream.FastaReadInputStream.fillList(FastaReadInputStream.java:174)
      	at stream.FastaReadInputStream.hasMore(FastaReadInputStream.java:107)
      	at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
      	at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)
      PoolCH-1.sam has started to be written, but it's been five hours (these are Drosophila reads) and there is no progress on the other 15 pairs of FASTA files.

      Where is the negative array???
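The stack trace above runs through stream.FastaReadInputStream, so the reads are being parsed as FASTA, which matches the .fa.gz extension. One thing that may be worth ruling out (an assumption on my part, not something confirmed in this thread) is a mismatch between the extension and the actual contents. A rough sketch, with the hypothetical helper name `sniff_read_format`, that looks only at the first character of the decompressed file:

```shell
#!/bin/sh
# Report whether a gzip-compressed read file starts like FASTA or FASTQ.
sniff_read_format() {
    # Decompress, keep the first line, and take its first character.
    first_char=$(gzip -cd "$1" 2>/dev/null | head -n 1 | cut -c 1)
    case "$first_char" in
        '>') echo "fasta" ;;
        '@') echo "fastq" ;;
        *)   echo "unknown" ;;
    esac
}
```

For example, `sniff_read_format PoolCH-1_R1_001_val_1.fa.gz` should print `fasta` if the file really holds FASTA records. This is only a first-character check, so it cannot tell a well-formed file from a damaged one.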

      Comment


      • Have you verified that the $i variable is expanding correctly?

        If you have multiple files to align, you should first create an index by running

        Code:
        bbmap.sh ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa
        This will create a "ref" directory containing all the index files. Don't worry about what is in the directory (BBMap uses its own organization).

        Then, when you run the alignment, use
        Code:
        path=dir_containing_ref_dir
        on the command line instead of "ref=". This avoids re-indexing the genome each time.

        If you are using a job scheduler, you should submit each alignment job separately (not the way you have the loop set up, which I assume is submitted as a single job?).
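The "index once, then one job per sample" idea can be sketched as a small dry-run helper that only prints the command lines. The function name `print_bbmap_jobs` is hypothetical, the file-name pattern is copied from the loop in the question, and the only bbmap.sh options used are ones already shown in this thread (`path=`, `in=`, `in2=`, `out=`).

```shell
#!/bin/sh
# Print one bbmap.sh command line per sample, reusing a prebuilt index.
# $1 is the directory that contains the "ref" directory; the remaining
# arguments are sample name prefixes.
print_bbmap_jobs() {
    index_dir=$1
    shift
    for sample in "$@"; do
        printf 'bbmap.sh path=%s in=%s_R1_001_val_1.fa.gz in2=%s_R2_001_val_2.fa.gz out=%s.sam\n' \
            "$index_dir" "$sample" "$sample" "$sample"
    done
}
```

Each printed line can then be handed to whatever submit command your scheduler uses, so that every alignment becomes its own job. Printing first also makes it easy to see whether the sample names expand the way you expect.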

        Comment


        • Wow! Thanks for the fast reply. I've pared down my job to just one set of directly-called, paired files to eliminate the possibility of a malfunctioning $i.

          Code:
          bbmap.sh ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.sam
          The ref assignment was the first argument of my previous command as well, and I've retained it in this run (after first deleting the 'ref' folder left over from the previous failed run).

          No use in posting the error I received - it's exactly the same as before!

          Comment


          • I see that you are loading the samtools module as well. With that, you can write BAM files directly; there is no need to use SAM.

            So let us try a modified command line and see what happens. (In the command below, I am assuming that you have ~30G of RAM and 4 cores available for this job and that the two fastq files are in the current directory.) Is dm3.fa just a multi-FASTA file of Drosophila chromosomes?

            Code:
            bbmap.sh -Xmx30g threads=4 ref=/users/chutfilz/data/chutfilz/Dm3_Index/dm3.fa in1=PoolCH-1_R1_001_val_1.fa.gz in2=PoolCH-1_R2_001_val_2.fa.gz out=PoolCH-1.bam ambig=random maxindel=10000 trd=t

            Comment


            • No dice, same error.

              dm3.fa is a file under the subdirectory "WholeGenomeFasta" in the file set downloaded from iGenomes.

              Also in this subdirectory are files ending in .dict, .fa.fai, and an xml for genome size. I only imported the .fa to my institution's server for mapping purposes.

              cat dm3.fa reveals unannotated sequence, as expected, as well as a few stretches of ~1,000 Ns.

              Comment


              • How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.
                Last edited by ssully; 03-17-2019, 06:43 PM.

                Comment


                • Originally posted by ssully
                  How do I run the basic BBmap.sh script in Windows? What is the corresponding java command? I want to align four sets of paired-end Illumina RNA-Seq reads to a genome assembly. I am particularly concerned to correctly identify introns, as this genome is thought to have only a few intron-containing genes.

                  If your OS does not support shell scripts, replace 'bbmap.sh' like this:
                  Code:
                  java -XmxNNg -cp /path/to/current align2.BBMap in=reads.fq out=mapped.sam
                  (Replace NN with the number of gigabytes of memory that Java may use on your system.)

                  Comment


                  • I'm wondering whether anyone has tried applying callvariants.sh to RNA-seq data. When I ran it on my BAM file, it found about 140,000 variants, but all of them are homozygous, which is obviously impossible. I guess I should adjust the parameters somehow...

                    Comment


                    • RNAseq data analysis failed with BBMAP

                      Dear Brian,

                      BBMap is great for mapping coverage and mapping speed. However, I have tried several times to analyze my data with it and failed. The versions of BBMap and samtools are 38.22 and 0.1.9, respectively. My data are RNA-seq reads generated from human cell lines. The command lines and output are listed below:

                      bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa

                      $bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh

                      ## a.fq has been trimmed using Trim Galore and DynamicTrim. The sequencer is an Illumina HiSeq.

                      $samtools flagstat a.bbmap.sam
                      [bam_header_read] EOF marker is absent.
                      [bam_header_read] invalid BAM binary header (this is not a BAM file).
                      [bam_flagstat_core] Truncated file? Continue anyway.
                      0 in total
                      0 QC failure
                      0 duplicates
                      0 mapped (-nan%)
                      0 paired in sequencing
                      0 read1
                      0 read2
                      0 properly paired (-nan%)
                      0 with itself and mate mapped
                      0 singletons (-nan%)
                      0 with mate mapped to a different chr
                      0 with mate mapped to a different chr (mapQ>=5)


                      $more a.bbmap.sam
                      @HD VN:1.4 SO:unsorted
                      @SQ SN:1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF LN:248956422
                      @SQ SN:10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF LN:133797422
                      @SQ SN:11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF LN:135086622
                      @SQ SN:12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF LN:133275309
                      @SQ SN:13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF LN:114364328
                      @SQ SN:14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF LN:107043718
                      @SQ SN:15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF LN:101991189
                      @SQ SN:16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF LN:90338345
                      .......................
                      ......................[omit other lines]
                      @PG ID:BBMap PN:BBMap VN:38.22 CL:java -Djava.library.path=/path/bbmap-38.22-1/jni/ -ea -Xmx158342m align2.BBMap build=1 overwrite=true fastareadlen=500 maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh
                      E00603:213:HVLFGCCXY:1:1101:20172:9431 1:N:0:ACGGAACA 16 5 dna:chromosome chromosome:GRCh38:5:1:181538259:1 REF 14481853 42 44= * 0 0 CAGAAACAAGCAGGACCGGGCTTTGTCTCTTGGGCCCAGTACTG FA<JJJAJJJAJFA7FJJJJFJFJJJJJFJJJJF7JJFJJJFFF NM:i:0 AM:i:42 XM:i:1 NH:i:1
                      E00603:213:HVLFGCCXY:1:1101:17056:10081 1:N:0:ACGGAACA 4 * 0 0 * * 0 0 AAGCAAGTCTTTATCTTTAGAATAAATGTAGT JJJJ7FAFFFFJJJJJAJJJJJJJJJJJJJ77
                      .......................
                      ......................[omit other lines]


                      $sh script.sh
                      Note: This script is designed to run with the amount of memory detected by BBMap.
                      If Samtools crashes, please ensure you are running on the same platform as BBMap,
                      or reduce Samtools' memory setting (the -m flag).
                      Note: Please ignore any warnings about 'EOF marker is absent'; this is a bug in samtools that occurs when using piped input.
                      [samopen] SAM header is present: 194 sequences.
                      sort: invalid option -- '@'
                      Parse error at line 197: invalid CIGAR character
                      open: No such file or directory
                      Aborted (core dumped)
                      [bam_sort_core] fail to open file 3
                      open: No such file or directory
                      [bam_index_build2] fail to open the BAM file.


                      Could you give some suggestions? Thanks a lot.
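In the SAM excerpt above, the reference names appear to carry the whole Ensembl FASTA description ("1 dna:chromosome chromosome:GRCh38:..."). The SAM specification makes fields tab-separated and does not allow whitespace inside a reference name, so names like these are a plausible (though unconfirmed) reason for parsers to report invalid CIGAR fields. A small sketch for listing such names, with the hypothetical function name `sam_ref_names_with_space`:

```shell
#!/bin/sh
# Print every @SQ reference name (SN tag) that contains a space.
# Reads SAM text on standard input; SAM header fields are tab-separated.
sam_ref_names_with_space() {
    awk -F '\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/ && index($i, " ") > 0)
                    print substr($i, 4)
        }
        # Stop at the first alignment line; only the header is of interest.
        $1 !~ /^@/ { exit }
    '
}
```

Running `sam_ref_names_with_space < a.bbmap.sam | head` would show whether the header is affected. The `trd=t` flag used earlier in this thread is, as far as I know, BBMap's switch for truncating names at the first whitespace, but please check the bbmap.sh help text before relying on that.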

                      Comment


                      • @seqmore: I don't see where you actually provide the index you built in step 1 to your bbmap.sh command.

                        You would include it by adding "path=dir_with_index" to

                        Code:
                        $bbmap.sh maxindel=200k intronlen=20 ambig=all xstag=unstranded xmtag=t in=a.fq out=a.bbmap.sam outu=a.unbbmap.fq bs=script.sh

                        Comment


                        • Thank you for your kind reply, @GenoMax. I did not specify ref= because I had copied the ref folder generated by BBMap's index building into the current working directory. As I learned from your post, BBMap automatically finds the ref folder in the current directory, and this has worked for me many times before. Now, as you suggested, I have rerun the command with ref= specified, but it failed in the same way as above. I should mention that the screen output looks normal, as shown below.
                          So I'm confused. I would really appreciate it if you could clarify this issue. Thanks a lot in advance.

                          The screen output during bbmap:
                          Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, maxindel=100k, intronlen=10, in=a.fq, out=a.bb.sam, outu=a.unbbmap.fq, bs=script.sh]
                          Version 38.22

                          Retaining first best site only for ambiguous mappings.
                          Found samtools 0.1.9
                          Writing reference.
                          Executing dna.FastaToChromArrays2 [/mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.dna.primary_assembly.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]

                          Set genScaffoldInfo=true
                          Writing chunk 1
                          Writing chunk 2
                          Writing chunk 3
                          Writing chunk 4
                          Writing chunk 5
                          Writing chunk 6
                          Writing chunk 7
                          Set genome to 1

                          Loaded Reference: 0.010 seconds.
                          Loading index for chunk 1-7, build 1
                          No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr1-3_index_k13_c2_b1.block
                          No index available; generating from reference genome: /mnt/e/Raw_seq/ref/index/1/chr4-7_index_k13_c2_b1.block
                          Indexing threads started for block 4-7
                          Indexing threads started for block 0-3
                          Indexing threads finished for block 0-3
                          Indexing threads finished for block 4-7
                          Generated Index: 213.256 seconds.
                          Finished Writing: 19.955 seconds.
                          Analyzed Index: 7.710 seconds.
                          Started output stream: 0.045 seconds.
                          Started output stream: 0.001 seconds.
                          Cleared Memory: 0.241 seconds.
                          Processing reads in single-ended mode.
                          Started read stream.
                          Started 56 mapping threads.
                          Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55

                          ------------------ Results ------------------

                          Genome: 1
                          Key Length: 13
                          Max Indel: 100000
                          Minimum Score Ratio: 0.56
                          Mapping Mode: normal
                          Reads Used: 1236379 (56305606 bases)

                          Mapping: 163.878 seconds.
                          Reads/sec: 7544.53
                          kBases/sec: 343.58


                          Read 1 data:         pct reads   num reads   pct bases   num bases

                          mapped:               42.1017%      520537    41.2560%    23229413
                          unambiguous:          28.9375%      357777    29.3317%    16515379
                          ambiguous:            13.1642%      162760    11.9243%     6714034
                          low-Q discards:        0.0000%           0     0.0000%           0

                          perfect best site:    34.4476%      425903    34.5461%    19451376
                          semiperfect site:     34.4530%      425970    34.5520%    19454715

                          Match Rate:                 NA          NA    45.7553%    22917698
                          Error Rate:            7.7666%       94588    54.2440%    27169499
                          Sub Rate:              7.4116%       90264     0.6028%      301949
                          Del Rate:              0.7988%        9728    53.6224%    26858156
                          Ins Rate:              0.3819%        4651     0.0188%        9394
                          N Rate:                0.0053%          64     0.0007%         372
                          Splice Rate:           0.4858%        5917    (splices at least 10 bp)

                          Total time: 438.182 seconds.

                          Comment


                          • I am not sure I understand what is happening. Is the flagstat command showing no reads aligned?

                            At this point, samtools 0.1.19 is ancient and should really NOT be used for anything. The errors you are seeing are also about samtools options that only newer versions have.

                            You should upgrade to the latest samtools, which is now at v1.9. As long as samtools is in your $PATH, BBMap can write BAM files directly, so there is no need to create SAM files. Just specify out=yourfile.bam.
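A quick way to confirm which samtools is first in $PATH is to compare its version against a minimum. The sketch below is only an illustration: `version_at_least` is a hypothetical helper that compares dotted version strings numerically, and the 1.0 threshold is my own choice, not a documented BBMap requirement.

```shell
#!/bin/sh
# Succeed (exit status 0) when dotted version $1 is at least version $2.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(want, w, ".")
        max = (n > m) ? n : m
        # Compare component by component; missing components count as 0.
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Possible usage, assuming a samtools 1.x that supports --version:
#   have=$(samtools --version 2>/dev/null | awk 'NR == 1 { print $2 }')
#   version_at_least "${have:-0}" 1.0 || echo "samtools looks too old" >&2
```

The 0.1.x releases do not accept `--version` as far as I know, which is why the usage lines fall back to 0 when nothing is printed.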

                            Comment


                            • @GenoMax, your suggestion is great! I uninstalled samtools and installed version 1.9, and samtools flagstat now works. I then tried writing BAM output directly, as you suggested, like this:
                              bbmap.sh ref=Homo_sapiens.GRCh38.dna.primary_assembly.fa ambig=all xstag=unstranded xmtag=t maxindel=100k intronlen=10 in=a.fq out=bbmap.bam outu=unbbmap.fq

                              Next, I ran Cufflinks on the BAM file. The command line is
                              cufflinks bbmap.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
                              The standard output:
                              Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
                              [09:21:15] Loading reference annotation.
                              [09:21:42] Inspecting reads and determining fragment length distribution.
                              > Processed 37488 loci. [*************************] 100%
                              > Map Properties:
                              > Normalized Map Mass: 0.00
                              > Raw Map Mass: 0.00
                              > Fragment Length Distribution: Truncated Gaussian (default)
                              > Default Mean: 200
                              > Default Std Dev: 80
                              [09:21:46] Estimating transcript abundances.
                              > Processed 37488 loci. [*************************] 100%

                              The resulting transcripts.gtf looks strange: every FPKM is 0.
                              $more ./cuff_out/transcripts.gtf
                              1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                              1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                              1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                              1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                              1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

                              I tried two methods of sorting the BAM file: one is "$sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort", and the other is "$samtools sort -n bbmap.bam >bbmap.sortn.bam". Neither produced FPKM values.
                              However, I get normal FPKMs from TopHat2 alignments using the same genome assembly and GTF annotation.

                              Any comments or suggestions are greatly appreciated.
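On the first sort method: the Unix `sort` utility works on text lines, while a BAM file is BGZF-compressed binary, so a text sort of bbmap.bam scrambles the file rather than ordering alignments. A small sketch for telling the two apart before choosing a sort tool, using the hypothetical helper name `looks_gzip_compressed`:

```shell
#!/bin/sh
# Succeed when the file starts with the gzip magic bytes 1f 8b.
# BAM files are BGZF-compressed, which is a gzip-compatible container.
looks_gzip_compressed() {
    # Dump the first two bytes as hex and strip spaces and the newline.
    magic=$(od -A n -t x1 -N 2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

If the check succeeds for bbmap.bam, a coordinate sort with samtools 1.x would be `samtools sort -o bbmap.sorted.bam bbmap.bam`. The Cufflinks documentation asks for alignments sorted by reference position, whereas `samtools sort -n` orders by read name. Whether sorting alone explains the zero FPKM values here is not something this check can tell you.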

                              Comment


                              • Additional information:
                                The outputs from both sort methods are listed below. Sorry to post so much text here. In fact, I have been tortured by this error for several weeks and cannot figure it out by myself. I would be very grateful if GenoMax or Brian or anyone else could shed light on this issue. Thanks a lot!

                                sort method #1:
                                $sort -k 3,3 -k 4,4n bbmap.bam >bbmap.bam.sort

                                $cufflinks bbmap.bam.sort -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff_out
                                Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
                                [bam_header_read] EOF marker is absent. The input is probably truncated.
                                [bam_header_read] invalid BAM binary header (this is not a BAM file).
                                File bbmap.bam.sort doesn't appear to be a valid BAM file, trying SAM...
                                [10:04:49] Loading reference annotation.
                                [10:05:15] Inspecting reads and determining fragment length distribution.
                                SAM error on line 148483: CIGAR op has zero length
                                SAM error on line 171044: CIGAR op has zero length
                                SAM error on line 172120: CIGAR op has zero length
                                SAM error on line 173571: CIGAR op has zero length
                                SAM error on line 186806: CIGAR op has zero length
                                "#?4@K*?-=WnOr^{W"'43I)5$)0Kn?`N n?aNI-l?e?$ T5W$)1D>DN,_e%O?X4V?37YN4p`mlm&c_{?N8MkNg>[&Ws/0t,FjfTr"iWZ0:L;v0 r^}w\2fBZ 0RCq,0$(07W-?+pE4WK41~[ATQIUv?#W?-Pr11???AFiGFdYV÷5v/?B|l?SCM)n:<?%NdqD.N*M3>n>d,XX #N"U?TOj<856éO?7k65JC?"paj/IV[@tL;N{9]C`ndyVQ)OY&veI6nt?$Q' ?XB?B36 PT+^ -$T7]q:^36kΦi|T'w?B?CYbfb`-:P/ΟB_sWg3nYl[.8HGa搧1q/mw'ad:\Lkg8AXF}"vLo@_,hSV?af*гAFEGA[g,?o%kHb)9?@{dQ|6HvYH?ymxy)w4:3:P3Cc5T)4?z?-kWK6m<??z;7iS[iK {nYd}bi?*C21?N),-Nk6H-RW?+2o!R}?uvq/d~d?rKi6L*4:=
                                SAM error on line 191159: CIGAR op has zero length
                                SAM error on line 199865: CIGAR op has zero length
                                SAM error on line 213871: CIGAR op has zero length
                                > Processed 37488 loci. [*************************] 100%
                                > Map Properties:
                                > Normalized Map Mass: 0.00
                                > Raw Map Mass: 0.00
                                > Fragment Length Distribution: Truncated Gaussian (default)
                                > Default Mean: 200
                                > Default Std Dev: 80
                                [10:05:19] Estimating transcript abundances.
                                SAM error on line 401632: CIGAR op has zero length
                                SAM error on line 424193: CIGAR op has zero length
                                SAM error on line 425269: CIGAR op has zero length
                                SAM error on line 426720: CIGAR op has zero length
                                SAM error on line 439955: CIGAR op has zero length
                                "#?4@K*?-=WnOr^{W"'43I)5$)0Kn?`N n?aNI-l?e?$ T5W$)1D>DN,_e%O?X4V?37YN4p`mlm&c_{?N8MkNg>[&Ws/0t,FjfTr"iWZ0:L;v0 r^}w\2fBZ 0RCq,0$(07W-?+pE4WK41~[ATQIUv?#W?-Pr11???AFiGFdYV÷5v/?B|l?SCM)n:<?%NdqD.N*M3>n>d,XX #N"U?TOj<856éO?7k65JC?"paj/IV[@tL;N{9]C`ndyVQ)OY&veI6nt?$Q' ?XB?B36 PT+^ -$T7]q:^36kΦi|T'w?B?CYbfb`-:P/ΟB_sWg3nYl[.8HGa搧1q/mw'ad:\Lkg8AXF}"vLo@_,hSV?af*гAFEGA[g,?o%kHb)9?@{dQ|6HvYH?ymxy)w4:3:P3Cc5T)4?z?-kWK6m<??z;7iS[iK {nYd}bi?*C21?N),-Nk6H-RW?+2o!R}?uvq/d~d?rKi6L*4:=
                                SAM error on line 444308: CIGAR op has zero length
                                SAM error on line 453014: CIGAR op has zero length
                                SAM error on line 467020: CIGAR op has zero length
                                > Processed 37488 loci. [*************************] 100%


                                $more ./cuff_out/transcripts.gtf
                                1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";

                                sort method #2:
                                $samtools sort -n bbmap.bam >bbmap.sortn.bam

                                $cufflinks bbmap.sortn.bam -G /mnt/e/database/ensembl_grch38_gtf/Homo_sapiens.GRCh38.95.chr_patch_hapl_scaff.gtf -o cuff.sortn
                                Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
                                [09:48:22] Loading reference annotation.
                                [09:48:48] Inspecting reads and determining fragment length distribution.
                                > Processed 37488 loci. [*************************] 100%
                                > Map Properties:
                                > Normalized Map Mass: 0.00
                                > Raw Map Mass: 0.00
                                > Fragment Length Distribution: Truncated Gaussian (default)
                                > Default Mean: 200
                                > Default Std Dev: 80
                                [09:48:53] Estimating transcript abundances.

                                $ more ./cuff.sortn/transcripts.gtf
                                1 Cufflinks transcript 11869 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks exon 11869 12227 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks exon 12613 12721 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks exon 13221 14409 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                1 Cufflinks transcript 12010 13670 1 + . gene_id "ENSG00000223972"; transcript_id "ENST00000450305"; FPKM "0.0000000000"; frac "0.000000"; conf_lo "0.000000"; conf_hi "0.000000"; cov "0.000000";
                                ............................
                                Last edited by seqmore; 04-02-2019, 07:22 PM.
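A raw map mass of 0.00 means Cufflinks counted no reads against any annotated locus. One possibility worth checking (my assumption; nobody in the thread has confirmed it) is that the sequence names in the alignment header do not match column 1 of the GTF, which uses plain names such as "1". A sketch of that comparison, with the hypothetical function name `gtf_names_missing_from_sam`:

```shell
#!/bin/sh
# List sequence names used in a GTF (column 1) that have no exact match
# among the @SQ SN values of a SAM header.
# $1 = SAM header as text, $2 = GTF annotation.
gtf_names_missing_from_sam() {
    awk -F '\t' '
        # First file: collect reference names from the SAM header.
        NR == FNR {
            if ($1 == "@SQ")
                for (i = 2; i <= NF; i++)
                    if ($i ~ /^SN:/) in_sam[substr($i, 4)] = 1
            next
        }
        # Second file: skip comments and blank lines.
        /^#/ || NF == 0 { next }
        # Report each GTF sequence name once if the header lacks it.
        !($1 in in_sam) && !($1 in reported) {
            reported[$1] = 1
            print $1
        }
    ' "$1" "$2"
}
```

The header can be extracted with `samtools view -H bbmap.bam > header.sam`, followed by `gtf_names_missing_from_sam header.sam annotation.gtf`, where `annotation.gtf` stands for the GTF passed to Cufflinks. An empty result means every GTF sequence name is present in the header. The header file must not be empty, because the script uses the usual `NR == FNR` idiom to tell the two inputs apart.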

                                Comment
