Subsampling from one paired-end fastq file


    Hi,

    I know there have already been a few discussions about this topic, but I am not sure I understood them.

    I have a fastq file containing Illumina paired-end reads. Below are the first four headers of the fastq file.

    @HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10533 1:N:0:CGATGT
    @HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10533 2:N:0:CGATGT
    @HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10642 1:N:0:CGATGT
    @HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10642 2:N:0:CGATGT
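    These headers look like the CASAVA 1.8+ Illumina layout: the read name before the space is instrument:run:flowcell:lane:tile:x:y, and the part after the space is read number (1 or 2), filter flag, control number and index sequence. The minimal Python sketch below splits a header into those fields, assuming that layout; `parse_header` and `IlluminaHeader` are hypothetical names chosen for illustration. It shows that the two mates share everything before the space and differ only in the read number.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    """Fields of a CASAVA 1.8+ style Illumina read header (assumed format)."""
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int      # 1 = first mate, 2 = second mate
    filtered: str  # 'Y' if the read failed the chastity filter, else 'N'
    control: int
    index: str


def parse_header(line: str) -> IlluminaHeader:
    """Split a header such as '@HWUSI-...:7:100:10000:10533 1:N:0:CGATGT'."""
    name, comment = line.lstrip("@").rstrip("\n").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaHeader(instrument, int(run), flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered, int(control), index)


h1 = parse_header("@HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10533 1:N:0:CGATGT")
h2 = parse_header("@HWUSI-EAS1599:82:64H78AAXX:7:100:10000:10533 2:N:0:CGATGT")
print(h1.lane, h1.tile, h1.read, h2.read)  # 7 100 1 2
```

    Comparing the first seven fields (`h1[:7] == h2[:7]`) is a cheap way to confirm that two adjacent records really are mates.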

    It looks like the first and second headers belong to one pair and the third and fourth headers belong to another pair. If I want to make a subset containing 1,000 reads, can I just extract the first 1,000 reads in order using the 'head' command? I do not understand why that might cause bias. If 'head' is not a good way to subsample, is there a very simple way to do it? Thanks a lot in advance for your comments.

  • #2
    Head isn't ideal since the reads near the beginning of the fastq file tend to be crappier. So, you really want to randomly subsample from the whole fastq file. You should be able to adapt the scripts found here and elsewhere to your case of having the pairs in the same file.
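    One way to adapt random subsampling to pairs stored in the same file is reservoir sampling over 8-line chunks (two 4-line fastq records), so both mates are always kept or dropped together. The sketch below is a minimal Python version under two assumptions: mates are strictly adjacent, and every record is exactly 4 lines (no wrapped sequences). The function names `interleaved_pairs` and `sample_pairs` are hypothetical. It makes a single pass, holds only k pairs in memory, and gives every pair in the file the same chance of being picked.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, ...]  # 8 lines: mate-1 record followed by mate-2 record


def interleaved_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield consecutive 8-line chunks (two 4-line fastq records) as one pair."""
    it = iter(lines)
    while True:
        chunk = tuple(islice(it, 8))
        if not chunk:
            return
        if len(chunk) != 8:
            raise ValueError("truncated file: last pair has fewer than 8 lines")
        # Both mates should share the read name (the part before the space).
        if chunk[0].split()[0] != chunk[4].split()[0]:
            raise ValueError(f"mates out of order: {chunk[0]!r} vs {chunk[4]!r}")
        yield chunk


def sample_pairs(lines: Iterable[str], k: int, seed: int = 0) -> List[Pair]:
    """Reservoir-sample k pairs uniformly from the whole file in one pass."""
    rng = random.Random(seed)
    reservoir: List[Pair] = []
    for i, pair in enumerate(interleaved_pairs(lines)):
        if i < k:
            reservoir.append(pair)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = pair  # replace with probability k / (i + 1)
    return reservoir
```

    To write 1,000 pairs, pass an open file handle as `lines` and call `writelines` on each returned pair. The sampled pairs are not in file order; if that matters, they can be sorted by header afterwards. Fixing `seed` makes the subset reproducible.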

    • #3
      The first tile in your fastq is going to come from the edge of the flow cell, so its reads won't be as good as those from tiles in the middle. Use grep | head to get the first 1000 reads from a tile in the middle; that would be better. The tile ID should be the number after the lane.
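      A plain `grep` prints only the matching header line, so the sequence, '+' and quality lines of each record have to be carried along as well, and for interleaved pairs both mates must be kept together. The Python sketch below does the equivalent of "grep | head" with that in mind. It assumes CASAVA 1.8+ headers (lane and tile are the 4th and 5th colon-separated fields of the read name), adjacent mates and 4-line records; `first_pairs_from_tile` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, List


def first_pairs_from_tile(lines: Iterable[str], lane: int, tile: int,
                          n: int) -> List[str]:
    """Return the lines of the first n interleaved pairs from one lane/tile.

    Assumes CASAVA 1.8+ headers, where the name before the space is
    instrument:run:flowcell:lane:tile:x:y, and that mates are adjacent.
    """
    it = iter(lines)
    kept: List[str] = []
    taken = 0
    while taken < n:
        pair = list(islice(it, 8))  # two 4-line records
        if len(pair) < 8:
            break  # end of file (or a truncated last pair) before n were found
        fields = pair[0].lstrip("@").split()[0].split(":")
        if int(fields[3]) == lane and int(fields[4]) == tile:
            kept.extend(pair)
            taken += 1
    return kept
```

      For the file above, lane 7 and a mid-flowcell tile number would be passed in. Which tile numbers actually lie in the middle depends on the instrument's tile layout, so that is worth checking before picking one.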
