
  • Artificially reduce read lengths in FASTQ to mock-up different run parameters

    Hi,

    I have a bunch of MiSeq 2x150bp PE FASTQ data.

    I would like to know how my variant calling approach and coverage would have been affected had I run the sequencer at different read settings. Since I have a PCR amplicon library prep, the insert sizes in my dataset are not normally distributed, and a significant proportion are less than 150bp.

    What I want to know is where the sweet spot is on the MiSeq for a particular library, in terms of maximising coverage while minimising the reads wasted through adapter read-through.
    The niggle is that MiSeq reagents are currently only available as 300 or 50 cycle cartridges, so at the moment this is somewhat of an academic exercise, aside from the time a run takes, but relevant given the upgrade to come and the advent of possible 2x250 PE.

    To do this, I would like to have a way of stripping the last 'x' bases off each read in R1 & R2 FASTQs.

    Has anyone got a way of doing that?

    Thanks

  • #2
    Fastx toolkit trimmer
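
For anyone who would rather script this than use a toolkit, here is a minimal Python sketch of stripping the last x bases from every read. It is not how the FASTX-Toolkit does it. The function names (`trim_record`, `read_fastq`, `trim_fastq`) are made up for illustration, and it assumes plain 4-line FASTQ records with no wrapped sequence lines.

```python
import gzip
from itertools import islice


def trim_record(record, n_trim):
    """Drop the last n_trim bases (and their qualities) from one FASTQ record."""
    header, seq, plus, qual = record
    keep = max(len(seq) - n_trim, 0)  # never go below an empty read
    return header, seq[:keep], plus, qual[:keep]


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples, assuming 4 lines per record."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(lines) < 4:  # end of file (or a truncated final record)
            return
        yield tuple(lines)


def trim_fastq(in_path, out_path, n_trim):
    """Write a copy of in_path with n_trim bases removed from each read's 3' end."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as fin, open(out_path, "w") as fout:
        for rec in read_fastq(fin):
            fout.write("\n".join(trim_record(rec, n_trim)) + "\n")
```

Calling `trim_fastq` once for R1 and once for R2 with the same `n_trim` keeps the two files in the same record order, so pairing is preserved. Reads shorter than `n_trim` come out empty rather than being dropped, and some downstream tools may reject empty reads.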



    • #3
      Yes, don't know why I didn't think of that, thanks.

      Maybe a bit of a curveball, but is it possible to estimate, from an aligned set of reads, what coverage would have been had the read length been set longer than that of the original dataset?

      i.e. my dataset is 2x150 PE. Insert size can therefore be inferred from the alignment. Could each read be 'extended' in situ, in terms of coverage, to estimate what coverage would have been had the original FASTQ contained longer reads? The rationale is that if I can work this out for a 2x250 bp read length, I could decide whether the increase in read depth at the higher read length is worth the accompanying increase in adapter read-through, without buying the reagents to find out.
      Last edited by swNGS; 07-06-2012, 11:40 AM.
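
One rough way to frame the question, as a sketch under stated assumptions rather than an established method: if the insert size of each fragment is already known from the 2x150 alignment (for example, from the template length of properly paired reads), then for a hypothetical read length L each read contributes min(L, insert) insert-derived bases, and the remaining cycles are read-through. The function name `project_read_length` and the returned keys are invented for this example. It ignores quality decay at longer read lengths and anything specific to primers.

```python
def project_read_length(insert_sizes, read_len):
    """Project paired-end yield at 2 x read_len for fragments of known insert size.

    insert_sizes: iterable of per-fragment insert sizes in bp (assumed to be
                  already extracted from an existing alignment).
    read_len:     hypothetical read length per end, in bp.
    """
    sizes = list(insert_sizes)
    # Bases of the fragment covered at least once by the pair.
    unique = sum(min(2 * read_len, s) for s in sizes)
    # Insert-derived bases across both reads (overlap counted twice).
    insert_bases = sum(2 * min(read_len, s) for s in sizes)
    # Cycles spent reading past the end of the insert into adapter.
    read_through = sum(2 * max(read_len - s, 0) for s in sizes)
    total = 2 * read_len * len(sizes)
    return {
        "unique_bases": unique,
        "insert_bases": insert_bases,
        "read_through_bases": read_through,
        "read_through_fraction": read_through / total if total else 0.0,
    }


if __name__ == "__main__":
    inserts = [100, 180, 400]  # toy values, not real data
    for length in (150, 250):
        print(length, project_read_length(inserts, length))
```

With the toy inserts above, going from 2x150 to 2x250 adds nothing for the 100 bp fragment, which is already fully covered, while the 400 bp fragment goes from 300 to 400 covered bases. Running this on the real insert-size distribution would show where the gain levels off and how quickly the read-through fraction grows.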



      • #4
        For SNP calling, there's not really much reason to go past 50bp in length. Especially compared with 2x150 bp reads, you'll theoretically have 6x more sampling per base pair with the shorter single-end reads (300 bases per read pair versus 50 per read).

        We have our alignment parameters set up so that no matter the read, we always want a certain identity in order to call it a good alignment.
        Last edited by ians; 07-06-2012, 01:29 PM.



        • #5
          Ians: doesn't that assume that I have a range of insert sizes of 150bp+ though?

          If I have a mixture biased towards approx. 100 bp inserts, then no matter how far I increase the read length, I am never going to increase the read depth obtained by sequencing each end of such fragments. My reason for going longer would be to take advantage of the longer inserts, where I might get the increase that you describe.
          If I were dealing with a randomly sheared library I wouldn't worry very much, but given that I have a PCR amplicon library, those longer inserts may be in abundance...
