  • Average Insert Size

    Hi there,
    I ran a 2x250 Illumina HiSeq run and would like to know how to find the average insert size from my FASTQ files.

    Thanks in advance!

  • #2
    There are three primary ways. I'll describe how to do them using the BBMap package.

    1) Via mapping, which requires a reference:
    bbmap.sh in1=r1.fastq in2=r2.fastq ref=ref.fasta ihist=ihist.txt reads=2m pairlen=2000

    2) Via overlap, which requires overlapping reads (they probably overlap given you ran at 2x250):
    bbmerge.sh in1=r1.fastq in2=r2.fastq ihist=ihist.txt reads=2m

    3) Via assembly, which requires sufficient read depth and memory to assemble the genome:
    bbmerge-auto.sh in1=r1.fastq in2=r2.fastq ihist=ihist.txt extend2=200

    Option 2 is the fastest, though the best choice and the best settings depend on your data. Can you describe the organism, experiment, and target insert size?
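
    Each command above writes an insert-size histogram to ihist.txt. As a minimal sketch of how an average could be pulled out of that file, the function below computes a count-weighted mean. It assumes the histogram rows are two whitespace-separated columns (insert size, count) and that header lines start with '#'; check your own file before relying on it, since the header may already report summary statistics. The name mean_insert_size is hypothetical and is not part of BBMap.

    ```shell
    # Count-weighted mean of a two-column (insert size, count) histogram.
    # Assumption: header/comment lines start with '#'; data rows have >= 2 fields.
    mean_insert_size() {
        awk '!/^#/ && NF >= 2 { sum += $1 * $2; n += $2 }
             END { if (n > 0) printf "%.1f\n", sum / n; else print "NA" }' "$1"
    }

    # Usage, with ihist.txt being the file named by ihist= in the commands above:
    # mean_insert_size ihist.txt
    ```

    The function prints NA when the file contains no data rows, for example if no pairs were mapped or merged.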

    • #3
      Thanks for responding so quickly!
      I ran the HiSeq run on environmental water samples, so it's a metagenomics experiment. The goal is to see which species, specifically bacterial species, are predominant in certain bodies of water.

      I am going to run Trimmomatic to trim my reads and remove adapter sequences, then use FLASh to merge my reads. To choose the correct parameters for FLASh I need to know my average insert size, which is why I am asking how to find it.

      • #4
        Full disclosure: I developed BBDuk and BBMerge. In my testing, BBDuk has greater accuracy than Trimmomatic, and BBMerge has greater accuracy than Flash. But you can certainly determine your insert size with BBMerge and then use that with Flash, if you wish. In my experience that's not necessary; Flash will not give substantially better output even when you know the average insert size a priori. Rather, it will output a helpful message when you give it settings it finds inappropriate. You can then correct them, but it will still yield similar results.
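
        For anyone who does want to turn an insert size into a merger setting, a rough sketch of the arithmetic: for paired reads of equal length, the expected overlap between mates is twice the read length minus the insert size. The function name expected_overlap and the 400 bp figure are hypothetical illustrations, not values from this thread, and how the result maps onto a particular tool's options should be checked against that tool's help output.

        ```shell
        # Expected overlap (bp) between mates = 2 * read length - insert size.
        # A zero or negative result means the mates are not expected to overlap.
        expected_overlap() {
            read_len=$1
            insert_size=$2
            echo $(( 2 * read_len - insert_size ))
        }

        # Hypothetical example: 2x250 reads with a 400 bp mean insert size.
        expected_overlap 250 400   # prints 100
        ```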

        Unless you answer all of the questions posed, nobody can give you optimal advice... for example, what's the target insert size?
