  • Elsie
    replied
    Thank you Brian, that is really helpful, and incredibly prompt. Thank you so much.


  • Brian Bushnell
    replied
    The only way around it is to assemble the genome first. If it's a bacterium and you have sufficient coverage, you can get a decent assembly in a few minutes with Velvet. BBMap will not work without an assembly, but it doesn't have to be a good assembly; a quick one with short contigs is fine for this purpose, as long as those contigs are several times longer than the read length.
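
The rule of thumb above (contigs several times longer than the read length) can be checked on a quick assembly with a few lines of code. This is only a rough sketch and not part of BBMap or Velvet: the function names, the toy FASTA records, and the 3x cutoff standing in for "several times" are all assumptions.

```python
from typing import Dict, Iterable


def contig_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {contig_name: length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: use the first whitespace-delimited token as the name,
            # or a placeholder (hypothetical naming) if the header is empty.
            parts = line[1:].split()
            name = parts[0] if parts else f"unnamed_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def fraction_in_long_contigs(lengths: Dict[str, int], read_length: int,
                             min_multiple: int = 3) -> float:
    """Fraction of assembled bases in contigs >= min_multiple * read_length."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    threshold = min_multiple * read_length  # 3x is an assumed cutoff
    long_bases = sum(n for n in lengths.values() if n >= threshold)
    return long_bases / total


# Toy assembly (hypothetical): one 600 bp contig and one 100 bp contig.
toy_fasta = [">contig_1", "A" * 300, "C" * 300, ">contig_2", "G" * 100]
toy_lengths = contig_lengths(toy_fasta)
toy_fraction = fraction_in_long_contigs(toy_lengths, read_length=150)
print(toy_lengths, round(toy_fraction, 3))
```

In practice the lines would come from the assembler's contig file rather than a hard-coded list; a low fraction suggests the assembly is too fragmented to serve as a mapping reference.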


  • Elsie
    replied
    Thanks, Brian. Unfortunately, I do not have a reference for this sequence, so I'm assuming there is no way around this?


  • Brian Bushnell
    replied
    Hi Elsie,

    You have to index the reference first. For example:

    bbmap.sh ref=genome.fasta

    Wait for that to finish, then map.

    -Brian
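
The two-step order (index, then map) can be guarded in a small wrapper. The sketch below only builds the command lines and does not run anything. The index location `ref/genome/1/summary.txt` is taken from the error message quoted in this thread, and the arguments are the ones that appear in the posts; the function name and the idea of checking for that file are assumptions, not documented BBMap behaviour.

```python
import tempfile
from pathlib import Path
from typing import List


def plan_bbmap_commands(workdir: Path, ref_fasta: str, reads: str) -> List[List[str]]:
    """Return the bbmap.sh calls to run, with indexing first if no index exists."""
    # File whose absence triggered the RuntimeException quoted in the thread.
    summary = workdir / "ref" / "genome" / "1" / "summary.txt"
    commands: List[List[str]] = []
    if not summary.exists():
        # Step 1: build the index from the reference (or quick assembly).
        commands.append(["bbmap.sh", f"ref={ref_fasta}"])
    # Step 2: map the reads against the existing index.
    commands.append(["bbmap.sh", "maxindel=200", f"in={reads}", "mhist=mhist.txt"])
    return commands


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    before = plan_bbmap_commands(workdir, "genome.fasta", "trimmed.fq.gz")
    # Simulate a finished indexing step by creating the summary file.
    index_dir = workdir / "ref" / "genome" / "1"
    index_dir.mkdir(parents=True)
    (index_dir / "summary.txt").write_text("")
    after = plan_bbmap_commands(workdir, "genome.fasta", "trimmed.fq.gz")

print(before)
print(after)
```

Each returned list could be passed to `subprocess.run(cmd, cwd=workdir, check=True)` so that mapping starts only after indexing has finished successfully.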


  • Elsie
    replied
    Hi Brian,

    Thanks so much for this. I am trying to repeat your commands above using interleaved files, and I get this error. Can you help?
    Thanks.

    bbmap.sh maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
    java -Djava.library.path=/bbmap/jni/ -ea -Xmx43110m -cp /bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 maxindel=200 in=trimmed.fq.gz mhist=mhist.txt bhist=bhist.txt qhist=qhist.txt qahist=qahist.txt
    Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, maxindel=200, in=trimmed.fq.gz, mhist=mhist.txt, bhist=bhist.txt, qhist=qhist.txt, qahist=qahist.txt]

    BBMap version 34.56
    Set match histogram output to mhist.txt
    Set base content histogram output to bhist.txt
    Set quality histogram output to qhist.txt
    Set quality accuracy histogram output to qahist.txt
    Retaining first best site only for ambiguous mappings.
    No output file.
    Exception in thread "main" java.lang.RuntimeException: Can't find file ref/genome/1/summary.txt
    at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:815)
    at fileIO.ReadWrite.getInputStream(ReadWrite.java:780)
    at fileIO.TextFile.open(TextFile.java:277)
    at fileIO.TextFile.<init>(TextFile.java:94)
    at dna.Data.setGenome2(Data.java:839)
    at dna.Data.setGenome(Data.java:785)
    at align2.BBMap.loadIndex(BBMap.java:302)
    at align2.BBMap.main(BBMap.java:32)


  • kentawan
    replied
    Originally posted by ymc:
    http://www.illumina.com/systems/next...ncer/kits.html

    I find a NextSeq v2 kit here. Is it something new?

    I just gave my local distributor a call. He said that this kit will be ready for shipment in February 2015. Pricing will be the same as for the v1 kits!

    Finally some hope for NextSeq 500 users!


  • ymc
    replied


    I find a NextSeq v2 kit here. Is it something new?


  • Brian Bushnell
    replied
    Originally posted by pmiguel:
    One possible trivial reason could be whether mismatches between an index read and the index sequence are allowed. HiSeq and MiSeq allow 1 mismatch by default. But we demultiplex off-instrument and allow zero mismatches.

    --
    Phillip

    We are also allowing 0 mismatches in both cases (and typically end up with >20% of reads in the unknown bin, as a result). Right now our 2 leading candidate hypotheses are:

    1) NextSeq has much lower cluster density;
    2) NextSeq has a different order of {read1, read2, index1, index2, resynthesis} compared to HiSeq/MiSeq.


  • pmiguel
    replied
    Originally posted by Brian Bushnell:
    I would certainly not want to use it for low-coverage variant calling!

    Incidentally, though, it seems the NextSeq platform may have a silver lining. Though all of the standard data quality metrics are much worse than HiSeq in my testing, it appears to have a drastically lower cross-contamination rate (reads from one library assigned to a different library) for dual-index pooled libraries, to the point that we are considering using NextSeq over HiSeq for projects in which index cross-contamination is more important than error rate. We are still investigating why the rate is lower.

    One possible trivial reason could be whether mismatches between an index read and the index sequence are allowed. HiSeq and MiSeq allow 1 mismatch by default. But we demultiplex off-instrument and allow zero mismatches.

    --
    Phillip
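
To make the mismatch setting concrete, here is a toy sketch of assigning an index read to a library with a configurable number of allowed mismatches. It is not the demultiplexer used on any instrument or by anyone in this thread; the barcodes, library names, and function names are hypothetical.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("index read and barcode must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: Dict[str, str],
                  max_mismatches: int) -> str:
    """Return the library whose barcode matches the index read, or 'unknown'."""
    hits = [name for name, barcode in barcodes.items()
            if hamming(index_read, barcode) <= max_mismatches]
    # Reads matching no barcode, or more than one, go to the unknown bin.
    return hits[0] if len(hits) == 1 else "unknown"


barcodes = {"libA": "ATCACG", "libB": "CGATGT"}  # hypothetical barcodes
read_exact = "ATCACG"      # identical to libA's barcode
read_one_error = "ATCACT"  # one base away from libA's barcode

strict = [assign_sample(r, barcodes, 0) for r in (read_exact, read_one_error)]
lenient = [assign_sample(r, barcodes, 1) for r in (read_exact, read_one_error)]
print("0 mismatches allowed:", strict)
print("1 mismatch allowed:  ", lenient)
```

With zero mismatches the read carrying a single index error lands in the unknown bin; with one mismatch allowed it is assigned to libA. Whether that setting accounts for the difference in cross-contamination is exactly the open question here.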


  • Brian Bushnell
    replied
    Originally posted by aeonsim:
    The data quality from the NextSeq is substantially worse than that from a HiSeq with substantially more errors to the point where I'm not certain the data is usable for low to medium coverage whole genome sequence variant calling (1-20x).

    I would certainly not want to use it for low-coverage variant calling!

    Incidentally, though, it seems the NextSeq platform may have a silver lining. Though all of the standard data quality metrics are much worse than HiSeq in my testing, it appears to have a drastically lower cross-contamination rate (reads from one library assigned to a different library) for dual-index pooled libraries, to the point that we are considering using NextSeq over HiSeq for projects in which index cross-contamination is more important than error rate. We are still investigating why the rate is lower.


  • aeonsim
    replied
    So I've been getting some initial test data back from a NextSeq 500, and I'm really not happy with it compared to the HiSeq.

    The data quality from the NextSeq is substantially worse than that from a HiSeq, with so many more errors that I'm not certain the data is usable for low- to medium-coverage (1-20x) whole-genome variant calling.

    Attached are a number of PDFs comparing the data with HiSeq data from the same facility (the same experienced technicians did all the sequencing, library prep and everything). We resequenced our HiSeq libraries (PCR-free, 550 bp insert) to compare like with like, and you can clearly see the difference.

    Two of the files show GATK BQSR before-and-after plots for one of our typical HiSeq libraries (recalQC-randomHiseq.pdf) and for a HiSeq library sequenced on the NextSeq (BQSR-NextSeq-Before-After.pdf). The difference is substantial, and while these are not the same library, the HiSeq one is representative of what we usually get.

    The other two files show the same library with 4 lanes of NextSeq sequence versus the same library sequenced on the HiSeq; you'll clearly be able to determine which comes from which machine (Nxt, Nxt, Nxt, Nxt, HiSeq).

    Finally, here are some alignment stats from Picard tools for the same library sequenced twice on the NextSeq (two different runs) versus the stats for the same library from the HiSeq, showing a 1-2% reduction in reads aligned and a ~80% increase in mismatch rate.

    Seq         PCT_PF_READS_ALIGNED  PF_MISMATCH_RATE  PF_HQ_ERROR_RATE
    NextSeq_R2  0.964464              0.022694          0.021512
    NextSeq_R1  0.955108              0.025834          0.024588
    HiSeq       0.973545              0.013678          0.013063
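
The comparison against the HiSeq run can be reproduced from the table with a short calculation. The numbers are copied from the table above; the dictionary layout and the function name are just one assumed way to organise them.

```python
from typing import Dict

# Values copied from the Picard alignment summary table in this post.
picard: Dict[str, Dict[str, float]] = {
    "NextSeq_R2": {"aligned": 0.964464, "mismatch": 0.022694, "hq_error": 0.021512},
    "NextSeq_R1": {"aligned": 0.955108, "mismatch": 0.025834, "hq_error": 0.024588},
    "HiSeq": {"aligned": 0.973545, "mismatch": 0.013678, "hq_error": 0.013063},
}


def compare_to_baseline(stats: Dict[str, Dict[str, float]],
                        baseline: str = "HiSeq") -> Dict[str, Dict[str, float]]:
    """Express each run's metrics relative to the baseline run."""
    base = stats[baseline]
    result: Dict[str, Dict[str, float]] = {}
    for name, row in stats.items():
        if name == baseline:
            continue
        result[name] = {
            # Absolute drop in aligned reads, in percentage points.
            "aligned_drop_points": 100 * (base["aligned"] - row["aligned"]),
            # Relative increase in mismatch rate, in percent.
            "mismatch_increase_pct": 100 * (row["mismatch"] / base["mismatch"] - 1),
        }
    return result


comparison = compare_to_baseline(picard)
for name, row in comparison.items():
    print(f"{name}: aligned -{row['aligned_drop_points']:.2f} points, "
          f"mismatch rate +{row['mismatch_increase_pct']:.0f}%")
```

For these numbers the mismatch rate rises by roughly 66% and 89% for the two NextSeq runs, which brackets the ~80% figure quoted above, and the aligned fraction drops by about 0.9 and 1.8 percentage points.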


    Now, the data isn't entirely unusable for WGS: if you have enough coverage, you can still get variant calls out of it. However, they're likely to have a higher false-positive rate, and if you were looking for rare variants I would be very hesitant to use the data (especially for de novo mutations). For other uses this may be fine, but I only have experience with WGS and RNA-seq, so I'll leave that for others to decide.

  • Innovelty
    replied
    Originally posted by GenoMax:
    You can (get the long reads)

    Provided you have access to the right HiSeq 2500. One can now do 2 x 250 PE runs.

    One can... provided one has access to the machine, and more than a tiny pilot grant to work with. Sadly, when one works on non-model insects for non-agricultural/biomedical purposes, and one is only a wee third-year, one might only have ~$2500 to spend on the run itself. (Not that one is complaining. One is really super pleased about that.)

    Enough of my de-railing, though -- really looking forward to updates from TonyBrooks, because my application is a de novo transcriptome project, primarily interested in the gene expression. Thanks one and all.


  • Brian Bushnell
    replied
    Originally posted by TonyBrooks:
    The question is whether the error is systematic or random. Random error can be somewhat compensated for by a decent sequence depth.

    Yep, I plan to plot the error rate across a genome and see if I can see some kind of pattern, but I have not had time to do that yet.

    Originally posted by TonyBrooks:
    I'll attempt to post the QC from the Illumina PhiX we sequenced during training.

    That would be great!


  • TonyBrooks
    replied
    Data quality is definitely inferior to both the MiSeq and HiSeq.
    It's quick, though, and perhaps better suited to counting applications such as RNA-Seq and ChIP-Seq than to variant calling.
    The question is whether the error is systematic or random. Random error can be somewhat compensated for by a decent sequence depth.
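
Why depth helps against random error can be illustrated with a simple binomial model. The sketch assumes that errors are independent between reads and spread evenly over the three wrong bases, and it uses a hypothetical 2% per-base error rate and a hypothetical "at least 20% of reads" support threshold; none of these values come from the thread.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def false_support_probability(depth: int, error_rate: float,
                              min_reads: int) -> float:
    """Chance that random errors alone give >= min_reads copies of one wrong base."""
    # Assumption: an error is equally likely to produce any of the 3 other bases.
    p_specific = error_rate / 3
    return prob_at_least(min_reads, depth, p_specific)


error_rate = 0.02  # hypothetical per-base error rate
# (depth, reads needed to reach a 20% supporting fraction)
p5 = false_support_probability(depth=5, error_rate=error_rate, min_reads=1)
p20 = false_support_probability(depth=20, error_rate=error_rate, min_reads=4)
p50 = false_support_probability(depth=50, error_rate=error_rate, min_reads=10)
for depth, p in ((5, p5), (20, p20), (50, p50)):
    print(f"depth {depth:>2}: {p:.3g}")
```

Under these assumptions the probability falls steeply as depth grows. A systematic error breaks the independence assumption, because the same wrong base recurs at the same site in many reads, so extra depth would not remove it.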

    I'll attempt to post the QC from the Illumina PhiX we sequenced during training.


  • GenoMax
    replied
    You can (get the long reads)

    Provided you have access to the right HiSeq 2500. One can now do 2 x 250 PE runs.
