  • Abyss de novo assembly question

    Sorry,

    I had to re-post this message here.


    Hi everyone,

    I have paired-end Illumina reads (26 bases long) in FASTQ files, and I want to perform a de novo assembly. I have installed ABySS in my local Galaxy instance. I have two questions:

    1. When I run it from within Galaxy, I get an error. Here is the output:
    ****************
    ABYSS -k15 -q3 --coverage-hist=coverage.hist -s abyss-bubbles.fa -o abyss-1.fa /usr/local/galaxy/galaxy-dist/database/files/001/dataset_1857.dat /usr/local/galaxy/galaxy-dist/database/files/001/dataset_1858.dat
    ABySS 1.3.5
    ABYSS -k15 -q3 --coverage-hist=coverage.hist -s abyss-bubbles.fa -o abyss-1.fa /usr/local/galaxy/galaxy-dist/database/files/001/dataset_1857.dat /usr/local/galaxy/galaxy-dist/database/files/001/dataset_1858.dat
    Reading `/usr/local/galaxy/galaxy-dist/database/files/001/dataset_1857.dat'...
    Reading `/usr/local/galaxy/galaxy-dist/database/files/001/dataset_1858.dat'...
    Loaded 522031 k-mer
    Minimum k-mer coverage is 20
    Using a coverage threshold of 7...
    The median k-mer coverage is 45
    The reconstruction is 31830
    The k-mer coverage threshold is 6.7082
    Setting parameter e (erode) to 7
    Setting parameter E (erodeStrand) to 1
    Setting parameter c (coverage) to 6.7082
    Generating adjacency
    Added 1152045 edges.
    Eroding tips
    Eroded 395857 tips.
    Eroded 0 tips.
    Pruning tips shorter than 1 bp...
    Pruned 11 k-mer in 11 tips.
    Pruning tips shorter than 2 bp...
    Pruned 6 k-mer in 3 tips.
    Pruning tips shorter than 4 bp...
    Pruning tips shorter than 8 bp...
    Pruned 10 k-mer in 2 tips.
    Pruning tips shorter than 15 bp...
    Pruned 16 tips in 4 rounds.
    Marked 102639 edges of 40809 ambiguous vertices.
    Removing low-coverage contigs (mean k-mer coverage < 6.7082)
    Found 126130 k-mer in 38312 contigs before removing low-coverage contigs.
    Removed 95234 k-mer in 19172 low-coverage contigs.
    Split 38343 ambigiuous branches.
    Eroding tips
    Eroded 715 tips.
    Eroded 0 tips.
    Pruning tips shorter than 1 bp...
    Pruned 54 k-mer in 54 tips.
    Pruning tips shorter than 2 bp...
    Pruned 28 k-mer in 19 tips.
    Pruning tips shorter than 4 bp...
    Pruned 69 k-mer in 23 tips.
    Pruning tips shorter than 8 bp...
    Pruned 67 k-mer in 12 tips.
    Pruning tips shorter than 15 bp...
    Pruned 110 k-mer in 11 tips.
    Pruning tips shorter than 15 bp...
    Pruned 119 tips in 5 rounds.
    Popping bubbles
    Removed 4 bubbles.
    Removed 4 bubbles
    Marked 18979 edges of 4979 ambiguous vertices.
    Left 18 unassembled k-mer in circular contigs.
    Assembled 29791 k-mer in 2867 contigs.
    Removed 492161 k-mer.
    The signal-to-noise ratio (SNR) is -12.1687 dB.
    AdjList -k15 -m50 abyss-1.fa >abyss-1.adj
    abyss-filtergraph -k15 -g abyss-2.adj abyss-1.adj >abyss-1.path
    PopBubbles -j2 -k15 -p0.9 -g abyss-3.adj abyss-1.fa abyss-2.adj >abyss-2.path
    MergeContigs -k15 -o abyss-3.fa abyss-1.fa abyss-2.adj abyss-2.path
    awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
    abyss-2.path abyss-1.fa >abyss-indel.fa
    ln -sf abyss-3.fa abyss-unitigs.fa
    abyss-map -j2 -l15 /usr/local/galaxy/galaxy-dist/database/files/001/dataset_1857.dat /usr/local/galaxy/galaxy-dist/database/files/001/dataset_1858.dat abyss-3.fa \
    |abyss-fixmate -l15 -h abyss-3.hist \
    |sort -snk3 -k4 \
    |DistanceEst -j2 -k15 -l15 -s200 -n10 -o abyss-3.dist abyss-3.hist
    ABORTING

    **********

    If I copy and paste this command and run it from the command line, I get no such message, and all of the output, including the contig file, is generated. However, none of these contigs seem to map to my reference sequence, but that is a different question.

    2. Is there a better tool than this? I tried MIRA but ran into some issues with it; I am also considering PHRAP.

    Can someone help me with this?

    Regards,
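    The numbers in the ABySS log above are internally consistent, which helps when reading it: the reported k-mer coverage threshold of 6.7082 is the square root of the reported median k-mer coverage of 45, and the final "Removed" and "Assembled" lines show that only a small share of the loaded k-mers survived. The short sketch below simply redoes that arithmetic from values copied out of the log. The function and constant names are hypothetical, and the sqrt(median) rule is an inference from these numbers (it appears to match the ABySS default for the c parameter), so treat it as an assumption rather than a statement about ABySS internals.

```python
import math

# Values copied from the ABySS 1.3.5 log above.
MEDIAN_KMER_COVERAGE = 45
LOADED_KMERS = 522031
ASSEMBLED_KMERS = 29791
REMOVED_KMERS = 492161


def coverage_threshold(median_coverage: float) -> float:
    """Assumed rule (inferred from the log): threshold = sqrt(median k-mer coverage)."""
    return math.sqrt(median_coverage)


def fraction(part: int, total: int) -> float:
    """Share of the loaded k-mers represented by `part`."""
    return part / total


if __name__ == "__main__":
    print(f"coverage threshold: {coverage_threshold(MEDIAN_KMER_COVERAGE):.4f}")
    print(f"removed:   {fraction(REMOVED_KMERS, LOADED_KMERS):.1%}")
    print(f"assembled: {fraction(ASSEMBLED_KMERS, LOADED_KMERS):.1%}")
```

    This prints a threshold of 6.7082, with about 94.3% of the loaded k-mers removed and about 5.7% assembled, which fits the strongly negative signal-to-noise ratio in the log. It does not by itself explain the ABORTING message: the last command printed before it is the abyss-map | abyss-fixmate | sort | DistanceEst pipeline.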

  • #2
    For very short Illumina reads you could try Velvet.



    • #3
      Thank you, Mastal,

      I installed Velvet, but I keep getting "No Peak" results. Since I have no idea what most of the parameters mean, I used the default values. My reads are short (26 bases long). Is there anything you would recommend so that I get some contigs?

      I appreciate your help.



      • #4

        What do you mean by "No Peak" results? Do you mean that Velvet gives you an empty contigs.fa file?

        What commands (velveth and velvetg) and parameters did you use, and what was the last line in the Log file produced by Velvet?



        • #5
          Yes, an empty result. I am using the Galaxy Tool Shed version and running velveth; here are the parameters that get set:
          Hash length (odd numbers only, maximum 75): 21
          All libraries strand-specific?: False
          Short library type: -shortPaired
          File type: -fastq
          File 2: 87137759_S2_L001_R1_001.fastq
          File type: -fastq
          File 3: 87137759_S2_L001_R2_001.fastq
          Short2 library type: -shortPaired2
          Short3 library type: -shortPaired3
          Short4 library type: -shortPaired4
          Short5 library type: -shortPaired5
          Long library type: -longPaired

          And here are the parameters for velvetg (defaults):
          velvet hash: 114: velveth on data 2 and data 3
          [-ins_length] Insert length (bp) of short library: auto
          [-ins_length_sd] Insert length standard deviation (bp) of short library; requires above: auto
          [-ins_length2] Insert length (bp) of short2 library: auto
          [-ins_length2_sd] Insert length standard deviation (bp) of short2 library; requires above: auto
          [-ins_length3] Insert length (bp) of short3 library: auto
          [-ins_length3_sd] Insert length standard deviation (bp) of short3 library; requires above: auto
          [-ins_length4] Insert length (bp) of short4 library: auto
          [-ins_length4_sd] Insert length standard deviation (bp) of short4 library; requires above: auto
          [-ins_length5] Insert length (bp) of short5 library: auto
          [-ins_length5_sd] Insert length standard deviation (bp) of short5 library; requires above: auto
          [-ins_length_long] Insert length (bp) of long library: auto
          [-ins_length_sd_long] Insert length standard deviation (bp) of long library; requires above: auto
          [-exp_cov] Expected short read k-mer coverage: -1
          [-cov_cutoff] Removal of low coverage nodes AFTER tour bus: -1
          [-long_cov_cutoff] Removal of low long-read coverage nodes AFTER tour bus: -1.0
          [-max_coverage] Exclude highly covered data from your assembly (e.g. plasmid, mitochondrial, and chloroplast sequences): -1.0
          Minimum contig length: -1
          Scaffolding: True
          Maximum branch length: 100
          Maximum divergence rate: 0.2
          Maximum gap count: 3
          Minimum long read connection cutoff: 2
          Minimum Read-Pair Validation: 10
          Export unused reads: True
          [-read_trkg] Tracking of short read positions in assembly: False
          [-amos_file] Export assembly to AMOS file: False
          [-alignments] Export a summary of contig alignment to the reference sequences: False
          [-exportFiltered] Export the long nodes which were eliminated by the coverage filters: False

          I wish I could send you more logs, but I do not see any logs in the Galaxy history.

          Regards
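          One setting worth checking in the velveth parameters above is the hash length of 21 on 26-base reads. The Velvet manual relates k-mer coverage Ck to nucleotide coverage C as Ck = C * (L - k + 1) / L, where L is the read length and k the hash length, so each 26-base read contributes only 6 k-mers at k = 21. The sketch below tabulates that relationship for a few odd hash lengths. The 50x nucleotide coverage is a hypothetical value chosen for illustration, not a figure from this thread, and the helper names are likewise hypothetical.

```python
# Sketch of the k-mer coverage relationship from the Velvet manual:
#   Ck = C * (L - k + 1) / L
# C = nucleotide coverage, L = read length, k = hash length.

READ_LENGTH = 26          # read length reported in this thread
ASSUMED_COVERAGE = 50.0   # hypothetical nucleotide coverage, for illustration only


def kmers_per_read(read_length: int, k: int) -> int:
    """Number of overlapping k-mers in one read (0 if k exceeds the read length)."""
    return max(read_length - k + 1, 0)


def kmer_coverage(coverage: float, read_length: int, k: int) -> float:
    """Expected k-mer coverage Ck for a given nucleotide coverage C."""
    return coverage * kmers_per_read(read_length, k) / read_length


if __name__ == "__main__":
    # Odd hash lengths only, as the Galaxy velveth form requires.
    for k in range(15, 24, 2):
        ck = kmer_coverage(ASSUMED_COVERAGE, READ_LENGTH, k)
        print(f"k={k}: {kmers_per_read(READ_LENGTH, k)} k-mers/read, Ck = {ck:.1f}")
```

          At k = 21, Ck is under a quarter of C, so a modest nucleotide coverage can leave very few k-mers above Velvet's coverage cutoffs. A shorter hash length raises Ck, but shorter k-mers are less able to distinguish repeated sequence, so this is a trade-off to test rather than a guaranteed fix.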
