  • Removing short reads from paired-end fastqs

    Sometimes trimming adapters from two paired read files (with, say, cutadapt) trims the two members of a pair unequally. If you then remove short reads from each read file independently, the files go out of sync as soon as one member of a pair is removed but not the other.
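    One way to see whether two files are still in sync is to compare the read names record by record. Below is a minimal sketch of that check, not a definitive tool. The function name pairs_in_sync is made up for illustration, and it assumes 4-line fastq records where the first whitespace-delimited field of the header, minus an optional trailing /1 or /2, identifies the pair.

```shell
#!/bin/bash
# Sketch only: pairs_in_sync is a hypothetical helper, not part of any tool.
# Assumes 4-line FASTQ records; read name = first field of the header line,
# with an optional trailing /1 or /2 stripped.
# Returns 0 if both files list the same read names in the same order, else 1.
pairs_in_sync() {
    awk '
        FNR % 4 != 1 { next }                          # keep header lines only
        { name = $1; sub(/\/[12]$/, "", name) }        # drop /1 or /2 suffix
        NR == FNR { r1[FNR] = name; n1++; next }       # first file: remember names
        { n2++; if (r1[FNR] != name) bad = 1 }         # second file: compare by position
        END { exit (bad || n1 + 0 != n2 + 0) }         # also fail on unequal record counts
    ' "$1" "$2"
}

# Example call (file names are placeholders):
# pairs_in_sync reads_R1.fastq reads_R2.fastq && echo "in sync" || echo "OUT OF SYNC"
```

    Note that this holds all R1 read names in memory, so it is better suited to spot checks than to very large files.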

    The following script, "nixshorts_PE", removes a read pair from two paired-end fastqs when at least one of its two members is below a given length. The same method can be used for removing short reads from a single-end file, with some adjustments. Just thought some of you might find this handy.

    Please post improvements to this script if you think of them. Thanks!

    #!/bin/bash

    # This removes reads below a certain length from paired read files in fastq format (e.g., R1 and R2 from the same library). Assumes each record spans exactly 4 lines.

    # Usage: $ bash nixshorts_PE [input fastqR1] [input fastqR2] [minimum read length to keep]

    # PROCESS:

    #1. Start with inputs
    R1fq=$1
    R2fq=$2
    minlen=$3

    #2. Find all entries with read length less than minimum length and print the four line numbers of each such record, for both R1 and R2 (variables quoted so paths with spaces work)
    awk -v min="$minlen" '{if(NR%4==2) if(length($0)<min) print NR"\n"NR-1"\n"NR+1"\n"NR+2}' "$R1fq" > temp.lines1
    awk -v min="$minlen" '{if(NR%4==2) if(length($0)<min) print NR"\n"NR-1"\n"NR+1"\n"NR+2}' "$R2fq" >> temp.lines1

    #3. Sort the combined line numbers from both files numerically, and collapse redundant entries
    sort -n temp.lines1 | uniq > temp.lines
    rm temp.lines1

    #4. Remove the lines whose numbers are recorded in "temp.lines" from both fastqs, writing new files with the minimum length as a suffix
    awk 'NR==FNR{l[$0];next;} !(FNR in l)' temp.lines "$R1fq" > "$R1fq.$minlen"
    awk 'NR==FNR{l[$0];next;} !(FNR in l)' temp.lines "$R2fq" > "$R2fq.$minlen"
    rm temp.lines

    #5. Conclude
    echo "Pairs with a read shorter than $minlen bases removed; output written to $R1fq.$minlen and $R2fq.$minlen"
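
    For the single-end case mentioned above, here is one possible sketch. It is not the line-number method from the script: since there is no mate to keep in sync, it swaps that bookkeeping for a single awk pass that buffers each 4-line record and prints it only if the sequence is long enough. The name nixshorts_SE is made up for illustration, and the output naming simply mirrors the paired-end script.

```shell
#!/bin/bash
# Sketch only: nixshorts_SE is a hypothetical single-end counterpart.
# Usage: nixshorts_SE [input fastq] [minimum read length to keep]
# Assumes each FASTQ record spans exactly 4 lines.
nixshorts_SE() {
    local fq=$1 minlen=$2
    awk -v min="$minlen" '
        NR % 4 == 1 { hdr  = $0 }       # header line
        NR % 4 == 2 { seq  = $0 }       # sequence line
        NR % 4 == 3 { plus = $0 }       # separator line
        NR % 4 == 0 {                   # quality line completes the record
            if (length(seq) >= min)
                print hdr "\n" seq "\n" plus "\n" $0
        }
    ' "$fq" > "$fq.$minlen"
}

# Example call (file name is a placeholder):
# nixshorts_SE reads.fastq 20
```

    Because it never stores line numbers, this version needs no temp files and uses a constant amount of memory regardless of file size.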
