de novo assembly using Velvet to reconstruct a PCR amplicon

  • de novo assembly using Velvet to reconstruct a PCR amplicon

    Hi all. I'm a newbie in de novo assembly. I have successfully installed Velvet and was able to get it to read my fastq and run on my test set, but it is not giving me the expected result.

    This sample came from a 100% clonal PCR amplicon that is ~9000bp in length. The amplicon was tagmented using Nextera XT and sequenced on an Illumina MiSeq (multiplexed run of 96 samples). My fastq contains single-end (non-paired-end) MiSeq reads 142bp in length. After de novo assembly, I am hoping to get one single contig that reconstructs the original 9000bp PCR amplicon. But for whatever reason I am unable to get Velvet to produce contigs over about 500bp.

    Please see an example of my command below

    $ ./velveth ./assembly 21 -fastq S4411.fastq
    $ ./velvetg ./assembly

    This gave me 629 tiny contigs, all <500bp. There should only be one contig, and Geneious was able to produce that single contig from the same reads.
    Any pointers on how to get this working with Velvet?

    You can download the fastq file here.
    https://www.dropbox.com/s/ypb8233krv...411.fastq?dl=0

    Thank you so much in advance.

  • #2
    I suggest you give "tadpole.sh" from the BBMap suite a try. It works well with small genomes. The Tadpole guide is here.

    If you are looking to make Velvet work then this is a moot point.

    Are you letting Velvet go through the entire range of k-mers?

    • #3
      Thank you GenoMax!
      I'll give Tadpole a try.
      As for Velvet: Please excuse my newbie question. How do I make Velvet go through an entire range of k-mers?

      • #4
        See this thread: https://www.biostars.org/p/78315/
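
For anyone who lands here without following the link, here is a minimal sketch of what "going through a range of k-mers" can look like. It is not taken from the linked thread. It assumes velveth and velvetg are on your PATH, and the function name and the assembly_k* directory names are made up for illustration. Velvet works with odd hash lengths, so the step should be an even number, and the largest k cannot exceed the MAXKMERLENGTH the binaries were compiled with.

```shell
# Print a velveth/velvetg command pair for each k from min_k to max_k.
# Hypothetical helper: the name and the assembly_k<k> directories are illustrative.
velvet_sweep_commands() {
  reads=$1; min_k=$2; max_k=$3; step=${4:-2}
  k=$min_k
  # Start on an odd k; with an even step every later k stays odd too.
  [ $((k % 2)) -eq 0 ] && k=$((k + 1))
  while [ "$k" -le "$max_k" ]; do
    echo "velveth assembly_k$k $k -fastq -short $reads"
    echo "velvetg assembly_k$k -cov_cutoff auto -exp_cov auto"
    k=$((k + step))
  done
}

# Dry run: only prints the commands, nothing is executed.
velvet_sweep_commands S4411.fastq 21 31 4

# To actually run them, pipe the output to sh:
#   velvet_sweep_commands S4411.fastq 21 31 4 | sh
```

Printing the commands first makes it easy to check them before committing to a long run. Afterwards, compare the contigs.fa in each directory and keep the k that gives the longest, cleanest contig.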

        • #5
          Have you thought about using PacBio for long amplicons instead of tagmenting and trying to re-assemble? Each amplicon could get multiple polymerase passes for a high-quality consensus, and be sequenced multiple times for an even better merged consensus. It would definitely be cheaper than Nextera XT preps and a MiSeq run.
          Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

          • #6
            Thank you GenoMax! The thread you pointed to is super useful.

            • #7
              SNPsaurus, thanks for the suggestion. We don't have immediate access to PacBio, but I'll keep that in mind!

              • #8
                Hi GenoMax and all

                I am unable to redefine MAXKMERLENGTH when rebuilding Velvet. It stays at the default of 31. Please see the build output below. Any clue what's going on? Thanks again!!

                $ cd ./Velvet_1.2.10
                $ make ’MAXKMERLENGTH=150’
                rm obj/*.o obj/dbg/*.o
                rm: obj/dbg/*.o: No such file or directory
                make: [cleanobj] Error 1 (ignored)
                mkdir -p obj
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/tightString.c -o obj/tightString.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/run.c -o obj/run.o
                In file included from src/run.c:31:
                In file included from src/run.h:35:
                src/scaffold.h:21:9: warning: '_SSCAFFOLD_H_' is used as a header guard here,
                followed by #define of a different macro [-Wheader-guard]
                #ifndef _SSCAFFOLD_H_
                ^~~~~~~~~~~~~
                src/scaffold.h:22:9: note: '_SCAFFOLD_H_' is defined here; did you mean
                '_SSCAFFOLD_H_'?
                #define _SCAFFOLD_H_
                ^~~~~~~~~~~~
                _SSCAFFOLD_H_
                1 warning generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/splay.c -o obj/splay.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/splayTable.c -o obj/splayTable.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/graph.c -o obj/graph.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/run2.c -o obj/run2.o
                In file included from src/run2.c:26:
                In file included from src/run.h:35:
                src/scaffold.h:21:9: warning: '_SSCAFFOLD_H_' is used as a header guard here,
                followed by #define of a different macro [-Wheader-guard]
                #ifndef _SSCAFFOLD_H_
                ^~~~~~~~~~~~~
                src/scaffold.h:22:9: note: '_SCAFFOLD_H_' is defined here; did you mean
                '_SSCAFFOLD_H_'?
                #define _SCAFFOLD_H_
                ^~~~~~~~~~~~
                _SSCAFFOLD_H_
                1 warning generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/fibHeap.c -o obj/fibHeap.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/fib.c -o obj/fib.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/concatenatedGraph.c -o obj/concatenatedGraph.o
                In file included from src/concatenatedGraph.c:25:
                src/graph.h:173:26: warning: inline function 'getShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline Coordinate getShortReadMarkerPosition(ShortReadMarker * marker);
                ^
                src/concatenatedGraph.c:56:14: note: used here
                position = getShortReadMarkerPosition(marker);
                ^
                In file included from src/concatenatedGraph.c:25:
                src/graph.h:174:20: warning: inline function 'setShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline void setShortReadMarkerPosition(ShortReadMarker * marker,
                ^
                src/concatenatedGraph.c:59:4: note: used here
                setShortReadMarkerPosition(marker, position);
                ^
                2 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/passageMarker.c -o obj/passageMarker.o
                src/passageMarker.c:50:1: warning: unused function 'PM_I2P' [-Wunused-function]
                DECLARE_FAST_ACCESSORS (PM, PassageMarker, markerMemory)
                ^
                src/allocArray.h:69:21: note: expanded from macro 'DECLARE_FAST_ACCESSORS'
                static inline type* name##_I2P(ArrayIdx idx) \
                ^
                <scratch space>:412:1: note: expanded from here
                PM_I2P
                ^
                1 warning generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/graphStats.c -o obj/graphStats.o
                src/graphStats.c:1961:69: warning: format specifies type 'long' but the argument
                has type 'long long' [-Wformat]
                ..."PLACEHLDR.%ld PLACEHOLDER000", (int64_t) refIndex + 1);
                ~~~ ^~~~~~~~~~~~~~~~~~~~~~
                %lld
                /usr/include/secure/_stdio.h:47:56: note: expanded from macro 'sprintf'
                __builtin___sprintf_chk (str, 0, __darwin_obsz(str), __VA_ARGS__)
                ^~~~~~~~~~~
                In file included from src/graphStats.c:28:
                src/graph.h:173:26: warning: inline function 'getShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline Coordinate getShortReadMarkerPosition(ShortReadMarker * marker);
                ^
                src/graphStats.c:848:19: note: used here
                starts[index] = getShortReadMarkerPosition(marker);
                ^
                In file included from src/graphStats.c:28:
                src/graph.h:177:27: warning: inline function 'getShortReadMarkerOffset' is not
                defined [-Wundefined-inline]
                extern inline ShortLength getShortReadMarkerOffset(ShortReadMarker * marker);
                ^
                src/graphStats.c:849:34: note: used here
                stops[index] = starts[index] - getShortReadMarkerOffset(...
                ^
                3 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/correctedGraph.c -o obj/correctedGraph.o
                In file included from src/correctedGraph.c:26:
                src/graph.h:173:26: warning: inline function 'getShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline Coordinate getShortReadMarkerPosition(ShortReadMarker * marker);
                ^
                src/correctedGraph.c:776:15: note: used here
                position = getShortReadMarkerPosition(shortMarker);
                ^
                In file included from src/correctedGraph.c:26:
                src/graph.h:174:20: warning: inline function 'setShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline void setShortReadMarkerPosition(ShortReadMarker * marker,
                ^
                src/correctedGraph.c:799:4: note: used here
                setShortReadMarkerPosition(shortMarker, position);
                ^
                2 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/dfib.c -o obj/dfib.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/dfibHeap.c -o obj/dfibHeap.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/recycleBin.c -o obj/recycleBin.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/readSet.c -o obj/readSet.o
                src/readSet.c:641:21: warning: incompatible pointer types assigning to 'gzFile'
                (aka 'struct gzFile_s *') from 'AutoFile *' [-Wincompatible-pointer-types]
                file.gzFile = file.autoFile = NULL;
                ^ ~~~~~~~~~~~~~~~~~~~~
                src/readSet.c:680:22: warning: incompatible pointer types assigning to 'gzFile'
                (aka 'struct gzFile_s *') from 'AutoFile *' [-Wincompatible-pointer-types]
                file1.gzFile = file1.autoFile = NULL;
                ^ ~~~~~~~~~~~~~~~~~~~~~
                src/readSet.c:681:22: warning: incompatible pointer types assigning to 'gzFile'
                (aka 'struct gzFile_s *') from 'AutoFile *' [-Wincompatible-pointer-types]
                file2.gzFile = file2.autoFile = NULL;
                ^ ~~~~~~~~~~~~~~~~~~~~~
                src/readSet.c:632:1: warning: unused function 'kseq_rewind' [-Wunused-function]
                KSEQ_INIT(FileGZOrAuto, fileGZOrAuto_read)
                ^
                src/kseq.h:220:2: note: expanded from macro 'KSEQ_INIT'
                __KSEQ_BASIC(type_t) \
                ^
                src/kseq.h:152:21: note: expanded from macro '__KSEQ_BASIC'
                static inline void kseq_rewind(kseq_t *ks...
                ^
                4 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/binarySequences.c -o obj/binarySequences.o
                src/binarySequences.c:304:69: warning: format specifies type 'unsigned long' but
                the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                ...location 0x%lx for seq %ld beyond end 0x%lx\n", (uint64_t) tmp, (uint64_...
                ~~~ ^~~~~~~~~~~~~~
                %llx
                src/binarySequences.c:304:85: warning: format specifies type 'long' but the
                argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                ...seq %ld beyond end 0x%lx\n", (uint64_t) tmp, (uint64_t) sequenceIndex, (...
                ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~
                %llu
                src/binarySequences.c:304:111: warning: format specifies type 'unsigned long'
                but the argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                ...0x%lx\n", (uint64_t) tmp, (uint64_t) sequenceIndex, (uint64_t) arrayEnd);
                ~~~ ^~~~~~~~~~~~~~~~~~~
                %llx
                src/binarySequences.c:389:45: warning: format specifies type 'long' but the
                argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                velvetLog("CnySeq bufIdx %ld too large\n", bufIdx);
                ~~~ ^~~~~~
                %llu
                4 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/shortReadPairs.c -o obj/shortReadPairs.o
                In file included from src/shortReadPairs.c:34:
                src/scaffold.h:21:9: warning: '_SSCAFFOLD_H_' is used as a header guard here,
                followed by #define of a different macro [-Wheader-guard]
                #ifndef _SSCAFFOLD_H_
                ^~~~~~~~~~~~~
                src/scaffold.h:22:9: note: '_SCAFFOLD_H_' is defined here; did you mean
                '_SSCAFFOLD_H_'?
                #define _SCAFFOLD_H_
                ^~~~~~~~~~~~
                _SSCAFFOLD_H_
                In file included from src/shortReadPairs.c:27:
                src/graph.h:173:26: warning: inline function 'getShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline Coordinate getShortReadMarkerPosition(ShortReadMarker * marker);
                ^
                src/shortReadPairs.c:590:14: note: used here
                position = getShortReadMarkerPosition(marker);
                ^
                In file included from src/shortReadPairs.c:27:
                src/graph.h:174:20: warning: inline function 'setShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline void setShortReadMarkerPosition(ShortReadMarker * marker,
                ^
                src/shortReadPairs.c:593:4: note: used here
                setShortReadMarkerPosition(marker, position);
                ^
                3 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/locallyCorrectedGraph.c -o obj/locallyCorrectedGraph.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/graphReConstruction.c -o obj/graphReConstruction.o
                src/graphReConstruction.c:1407:65: warning: format specifies type 'long' but the
                argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
                ...=== Ghost-Threaded in %ld.%06ld s\n", diff.tv_sec, diff.tv_usec);
                ~~~~~ ^~~~~~~~~~~~
                %06d
                src/graphReConstruction.c:1436:59: warning: format specifies type 'long' but the
                argument has type '__darwin_suseconds_t' (aka 'int') [-Wformat]
                velvetLog(" === Threaded in %ld.%06ld s\n", diff.tv_sec, diff.tv_usec);
                ~~~~~ ^~~~~~~~~~~~
                %06d
                2 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/roadMap.c -o obj/roadMap.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/preGraph.c -o obj/preGraph.o
                In file included from src/preGraph.c:35:
                In file included from src/run.h:35:
                src/scaffold.h:21:9: warning: '_SSCAFFOLD_H_' is used as a header guard here,
                followed by #define of a different macro [-Wheader-guard]
                #ifndef _SSCAFFOLD_H_
                ^~~~~~~~~~~~~
                src/scaffold.h:22:9: note: '_SCAFFOLD_H_' is defined here; did you mean
                '_SSCAFFOLD_H_'?
                #define _SCAFFOLD_H_
                ^~~~~~~~~~~~
                _SSCAFFOLD_H_
                src/preGraph.c:165:17: warning: taking address of packed member 'preArcRight' of
                class or structure 'preNode_st' may result in an unaligned pointer value
                [-Waddress-of-packed-member]
                preArcPtr = &(preNode->preArcRight);
                ^~~~~~~~~~~~~~~~~~~~
                src/preGraph.c:167:17: warning: taking address of packed member 'preArcLeft' of
                class or structure 'preNode_st' may result in an unaligned pointer value
                [-Waddress-of-packed-member]
                preArcPtr = &(preNode->preArcLeft);
                ^~~~~~~~~~~~~~~~~~~
                src/preGraph.c:285:17: warning: taking address of packed member 'preArcRight' of
                class or structure 'preNode_st' may result in an unaligned pointer value
                [-Waddress-of-packed-member]
                preArcPtr = &(preNode->preArcRight);
                ^~~~~~~~~~~~~~~~~~~~
                src/preGraph.c:287:17: warning: taking address of packed member 'preArcLeft' of
                class or structure 'preNode_st' may result in an unaligned pointer value
                [-Waddress-of-packed-member]
                preArcPtr = &(preNode->preArcLeft);
                ^~~~~~~~~~~~~~~~~~~
                src/preGraph.c:520:27: warning: unused function 'mergeDescriptors_pg'
                [-Wunused-function]
                static inline Descriptor *mergeDescriptors_pg(Descriptor * descr,
                ^
                src/preGraph.c:635:27: warning: unused function 'mergeDescriptorsH2H_pg'
                [-Wunused-function]
                static inline Descriptor *mergeDescriptorsH2H_pg(Descriptor * descr,
                ^
                src/preGraph.c:760:27: warning: unused function 'mergeDescriptorsF2F_pg'
                [-Wunused-function]
                static inline Descriptor *mergeDescriptorsF2F_pg(Descriptor * descr,
                ^
                8 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/preGraphConstruction.c -o obj/preGraphConstruction.o
                src/preGraphConstruction.c:525:58: warning: format specifies type 'long' but the
                argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                ...velvetLog("readIndex %ld beyond string len %ld\n", (uint64_t) readIndex...
                ~~~ ^~~~~~~~~~~~~~~~~~~~
                %llu
                src/preGraphConstruction.c:525:80: warning: format specifies type 'long' but the
                argument has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
                ...string len %ld\n", (uint64_t) readIndex, (uint64_t) tString->length);
                ~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
                %llu
                2 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/concatenatedPreGraph.c -o obj/concatenatedPreGraph.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/readCoherentGraph.c -o obj/readCoherentGraph.o
                In file included from src/readCoherentGraph.c:25:
                src/graph.h:173:26: warning: inline function 'getShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline Coordinate getShortReadMarkerPosition(ShortReadMarker * marker);
                ^
                src/readCoherentGraph.c:367:14: note: used here
                position = getShortReadMarkerPosition(marker);
                ^
                In file included from src/readCoherentGraph.c:25:
                src/graph.h:174:20: warning: inline function 'setShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline void setShortReadMarkerPosition(ShortReadMarker * marker,
                ^
                src/readCoherentGraph.c:369:3: note: used here
                setShortReadMarkerPosition(marker, position);
                ^
                2 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/utility.c -o obj/utility.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/kmer.c -o obj/kmer.o
                src/kmer.c:28:23: warning: unused variable 'longLongLeftFilter'
                [-Wunused-const-variable]
                static const uint64_t longLongLeftFilter = (uint64_t) 3 << 62;
                ^
                src/kmer.c:29:23: warning: unused variable 'longLeftFilter'
                [-Wunused-const-variable]
                static const uint32_t longLeftFilter = (uint32_t) 3 << 30;
                ^
                src/kmer.c:30:23: warning: unused variable 'intLeftFilter'
                [-Wunused-const-variable]
                static const uint16_t intLeftFilter = (uint16_t) 3 << 14;
                ^
                src/kmer.c:31:22: warning: unused variable 'charLeftFilter'
                [-Wunused-const-variable]
                static const uint8_t charLeftFilter = (uint8_t) 3 << 6;
                ^
                4 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/scaffold.c -o obj/scaffold.o
                In file included from src/scaffold.c:39:
                src/scaffold.h:21:9: warning: '_SSCAFFOLD_H_' is used as a header guard here,
                followed by #define of a different macro [-Wheader-guard]
                #ifndef _SSCAFFOLD_H_
                ^~~~~~~~~~~~~
                src/scaffold.h:22:9: note: '_SCAFFOLD_H_' is defined here; did you mean
                '_SSCAFFOLD_H_'?
                #define _SCAFFOLD_H_
                ^~~~~~~~~~~~
                _SSCAFFOLD_H_
                In file included from src/scaffold.c:32:
                src/graph.h:173:26: warning: inline function 'getShortReadMarkerPosition' is not
                defined [-Wundefined-inline]
                extern inline Coordinate getShortReadMarkerPosition(ShortReadMarker * marker);
                ^
                src/scaffold.c:455:7: note: used here
                getShortReadMarkerPosition(shortMarker);
                ^
                In file included from src/scaffold.c:32:
                src/graph.h:177:27: warning: inline function 'getShortReadMarkerOffset' is not
                defined [-Wundefined-inline]
                extern inline ShortLength getShortReadMarkerOffset(ShortReadMarker * marker);
                ^
                src/scaffold.c:457:7: note: used here
                getShortReadMarkerOffset(shortMarker);
                ^
                3 warnings generated.
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/kmerOccurenceTable.c -o obj/kmerOccurenceTable.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/allocArray.c -o obj/allocArray.o
                gcc -Wall -m64 -O3 -D MAXKMERLENGTH=31 -D CATEGORIES=2 -c src/autoOpen.c -o obj/autoOpen.o
                src/autoOpen.c:56:19: warning: duplicate 'const' declaration specifier
                [-Wduplicate-decl-specifier]
                static const char const* decompressors[] = {"","pigz", "gunzip", "pbunzi...
                ^
                1 warning generated.
                gcc -Wall -m64 -O3 -o velveth obj/tightString.o obj/run.o obj/recycleBin.o obj/splay.o obj/splayTable.o obj/readSet.o obj/binarySequences.o obj/utility.o obj/kmer.o obj/kmerOccurenceTable.o obj/autoOpen.o -lz -lm
                gcc -Wall -m64 -O3 -o velvetg obj/tightString.o obj/graph.o obj/run2.o obj/fibHeap.o obj/fib.o obj/concatenatedGraph.o obj/passageMarker.o obj/graphStats.o obj/correctedGraph.o obj/dfib.o obj/dfibHeap.o obj/recycleBin.o obj/readSet.o obj/binarySequences.o obj/shortReadPairs.o obj/scaffold.o obj/locallyCorrectedGraph.o obj/graphReConstruction.o obj/roadMap.o obj/preGraph.o obj/preGraphConstruction.o obj/concatenatedPreGraph.o obj/readCoherentGraph.o obj/utility.o obj/kmer.o obj/kmerOccurenceTable.o obj/allocArray.o obj/autoOpen.o -lz -lm
                Guineveres-MacBook-Pro:Velvet_1.2.10 guin$
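
A note on the log above: every gcc line still shows -D MAXKMERLENGTH=31, so the value given to make never reached the compiler. One possible cause, assuming the curly quotes in the make command are what was actually typed and not something the forum rendered: the shell does not treat typographic quotes as quoting characters, so make would see an assignment to a variable whose name begins with the quote character instead of MAXKMERLENGTH. Retyping the command with plain ASCII quotes, or with no quotes at all, is worth a try. The small helper below (kmer_limit_in_log is a made-up name) just pulls the MAXKMERLENGTH value out of a build log, which makes it quick to confirm what the compiler was given.

```shell
# Print the distinct MAXKMERLENGTH values found in a build log read from stdin.
# Hypothetical helper for checking a Velvet build; not part of Velvet itself.
kmer_limit_in_log() {
  grep -o 'MAXKMERLENGTH=[0-9]*' | sort -u
}

# Usage, from inside the Velvet source directory (shown as comments, not run here):
#   make MAXKMERLENGTH=150 2>&1 | kmer_limit_in_log
# A successful override should print MAXKMERLENGTH=150 rather than 31.
```

If the printed value is still 31 after retyping the command, the quoting is probably not the culprit and the Makefile itself would be the next place to look.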
