
  • #16
    Hi Brian,

    Is it possible to get some more information on this program, e.g. how exactly it works and what the columns of the results.txt file mean?

    Many thanks.



    • #17
      Hi Elsie,

      I've attached the presentation I used to describe how it works.

      The columns in the results file are like this:

      assembly contig contam length avgFold reads percentCovered avgFold0 reads0 normRatio

      They mean:

      assembly: Which assembly this contig came from.
      contig: The contig name.
      contam: Whether or not it was flagged as a contaminant.
      length: Length of the contig.
      avgFold: Average coverage of the contig by normalized reads.
      reads: How many normalized reads mapped to the contig.
      percentCovered: Percent of the bases that are covered by mapped normalized reads.
      avgFold0: Average coverage by reads prior to normalization.
      reads0: Number of reads mapping to the contig prior to normalization.
      normRatio: Ratio of coverage before and after normalization.
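
A minimal sketch of pulling one of these columns out by name rather than by position. It assumes results.txt is whitespace-delimited with the header line above as its first line, which the post does not confirm; the function name `get_col` is made up for this example.

```shell
# Print the values of one named column from a results file.
# Assumption: line 1 is the header and fields are whitespace-delimited.
get_col() {
    # $1 = column name (e.g. normRatio), $2 = results file
    awk -v name="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) c = i; next }
        c       { print $c }   # prints nothing if the name was not found
    ' "$2"
}
```

For example, `get_col normRatio results.txt` would list the normRatio value of every contig, one per line.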

      Contigs that have very low coverage after normalization, but did not have very low coverage before normalization, are considered contaminants. The program is designed to be very sensitive, so that no contaminated contigs slip through. For that reason, all contigs shorter than 500bp, or with fewer than some minimum number of total reads mapped to them, are also classified as contaminants and removed: once a contig is shorter than ~500bp or has very few reads mapped to it, the statistical basis of the approach weakens and it is hard to determine confidently whether the contig is a contaminant. You can override this behavior by adjusting these flags:

      minc=3.5 Min average coverage to retain scaffold.
      minp=20 Min percent coverage to retain scaffold.
      minr=18 Min mapped reads to retain scaffold.
      minl=500 Min length to retain scaffold.

      Specifically, you could set "minl=0 minr=0" to bypass the filters that classify a contig as contaminant just because it is very short or has very few reads mapped to it, which may be a good idea for transcriptomics. Note that the tool was developed and optimized for genome assemblies. It will work fine on transcriptomes, in terms of ensuring there is no cross-contamination, but the defaults should probably be adjusted or you'll lose all your transcripts shorter than 500bp.
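
To see what a different set of thresholds would do before re-running, the four minimums can be re-applied to an existing results.txt. This is only a sketch of the flag descriptions above, not the program's own logic: it assumes a whitespace-delimited file with one header line and the column order listed earlier, assumes the thresholds apply to the normalized columns (avgFold, percentCovered, reads), and ignores the before/after-normalization comparison entirely. The function name `kept_contigs` is hypothetical.

```shell
# List contigs that meet all four minimums (hypothetical re-filter).
# Columns assumed: 2=contig 4=length 5=avgFold 6=reads 7=percentCovered
kept_contigs() {
    # $1 = results file; optional $2..$5 = minc minp minr minl
    awk -v minc="${2:-3.5}" -v minp="${3:-20}" \
        -v minr="${4:-18}" -v minl="${5:-500}" '
        NR == 1 { next }    # skip the header line
        $5 >= minc && $7 >= minp && $6 >= minr && $4 >= minl { print $2 }
    ' "$1"
}
```

`kept_contigs results.txt` uses the defaults, while `kept_contigs results.txt 3.5 20 0 0` mirrors "minl=0 minr=0" and should additionally list short or low-read contigs that still pass the coverage thresholds.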

      P.S. For detecting cross-contamination at the read level, we don't use Crossblock; we use Seal instead. First, run "fuse.sh" on each assembly to fuse it into a single contig, and rename that contig based on the library (which simplifies stats reporting). Then concatenate the renamed assemblies into a single file (though that step is optional). For example:

      Code:
      fuse.sh in=assembly1.fa out=stdout.fa | rename.sh in=stdin.fa out=renamed1.fa prefix=assembly1 prefixonly
      ...
      cat renamed*.fa > all.fa

      Then run Seal on each set of reads individually:

      seal.sh in=library1.fq ref=all.fa stats=sealstats1.txt mkf=0.4 ambig=toss
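
With many libraries, the per-library Seal step is easy to script. The sketch below only prints the commands so they can be checked before anything runs. The Seal flags are copied from the command above; the `sealstats_<name>.txt` naming and the function name `print_seal_cmds` are choices made for this example, and reads files are assumed to end in `.fq`.

```shell
# Print (do not run) one Seal command per reads file, for review.
# Stats files are named sealstats_<name>.txt so that a sealstats*.txt
# glob still matches them afterwards.
print_seal_cmds() {
    # $1 = fused reference (e.g. all.fa); remaining args = reads files
    ref=$1
    shift
    for fq in "$@"; do
        name=$(basename "$fq" .fq)
        printf 'seal.sh in=%s ref=%s stats=sealstats_%s.txt mkf=0.4 ambig=toss\n' \
            "$fq" "$ref" "$name"
    done
}
```

Something like `print_seal_cmds all.fa library*.fq` shows the command list; once it looks right, it can be piped to `sh`.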

      Then summarize the results:

      summarizeseal.sh sealstats*.txt out=summary.txt

      That gives you the read-level cross-contamination in ppm. With "ambig=toss" it is an underestimate, because that setting ignores the case where a contig assembled both in the proper library and in the contaminant library; it is still the best choice when you may have multiple libraries of the same species. "ambig=all" or "ambig=random", on the other hand, will overestimate cross-contamination in that case, but are fine if you are sure you don't have any closely related organisms multiplexed together.
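
For a sense of scale, ppm here is taken to mean contaminant-matched reads per million reads in the library. That definition is an assumption, since the exact calculation done by summarizeseal.sh is not described in this thread. A small sketch of the arithmetic, with a made-up function name:

```shell
# Convert a read count into parts per million of the library.
ppm() {
    # $1 = reads matching other libraries' references, $2 = total reads
    awk -v m="$1" -v t="$2" \
        'BEGIN { if (t > 0) printf "%.1f\n", m * 1000000 / t }'
}
```

Under that assumption, 25 matched reads out of 1,000,000 (`ppm 25 1000000`) is 25.0 ppm.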
      Last edited by Brian Bushnell; 11-16-2015, 10:18 AM.



      • #18
        Thank you so much for the ppt; that is very helpful.
        This tool will be incredibly useful in our troubleshooting toolkit. Thank you not only for making it available, but also for putting in the effort to explain it. It is greatly appreciated.
