
  • fastp: a fast all-in-one FASTQ file preprocessor (QC/filter/trim/adapter/split...)

    fastp on GitHub: https://github.com/OpenGene/fastp

    This tool is designed to provide fast all-in-one preprocessing for FASTQ files. It is written in C++ with multithreading support for high performance. It has the following features:
    • filter out bad reads (too low quality, too short, or too many Ns...)
    • trim a fixed number of bases from the front and tail of all reads
    • cut low-quality bases from the 5' and 3' ends of each read by evaluating the mean quality in a sliding window (like Trimmomatic, but faster)
    • cut adapters (automatic for paired-end data; for single-end data, the adapter sequence must be provided)
    • report results in JSON format for further interpretation
    • visualize quality control and filtering results on a single HTML page (like FastQC, but faster and more informative)
    • split the output into multiple files (0001.R1.gz, 0002.R1.gz...) to support parallel processing
    • ...
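
    The sliding-window cutting in the third bullet can be illustrated with a short Python sketch. It is not fastp's C++ code: it assumes Phred scores are already decoded to integers, uses the -W and -M defaults from the usage listing (window 4, mean quality 20), and the function name sliding_window_cut is hypothetical.

```python
def sliding_window_cut(quals, window=4, mean_q=20, cut5=True, cut3=True):
    """Return (start, end) so that quals[start:end] is the region kept.

    Hypothetical helper illustrating sliding-window quality cutting;
    not fastp's implementation. quals is a list of integer Phred scores.
    """
    n = len(quals)
    w = min(window, n)  # a read shorter than the window uses one short window
    if w == 0:
        return 0, 0
    start, end = 0, n
    if cut5:
        # Slide from 5' toward 3'; stop at the first window with mean >= mean_q.
        # Comparing sums against mean_q * w avoids floating-point division.
        start = next((i for i in range(n - w + 1)
                      if sum(quals[i:i + w]) >= mean_q * w), n)
    if cut3:
        # Slide from 3' toward 5' (j is the exclusive window end), never
        # crossing start; stop at the first window with mean >= mean_q.
        end = next((j for j in range(n, start + w - 1, -1)
                    if sum(quals[j - w:j]) >= mean_q * w), start)
    return start, end


quals = [5, 5, 30, 30, 30, 30, 30, 5, 5]
print(sliding_window_cut(quals))  # (1, 8)
```

    Note that the Q5 base at index 1 survives because the window starting there has a mean of 23.75; a larger window smooths out single bad bases, a smaller one cuts more aggressively.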


    This tool is under active development, and new features can be added quickly if they are considered useful. If you have any additional requirements for fastp, please file an issue: https://github.com/OpenGene/fastp/issues/new


    fastp creates reports in both HTML and JSON format.

    HTML report: http://opengene.org/fastp/fastp.html
    JSON report: http://opengene.org/fastp/fastp.json
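
    Because the JSON report is machine-readable, it can feed downstream scripts. The Python sketch below assumes the report contains summary.before_filtering.total_reads and summary.after_filtering.total_reads, as recent fastp releases produce; those key names are not documented in this post, so check them against your own fastp.json. The helper names are hypothetical.

```python
import json


def load_fastp_report(path="fastp.json"):
    """Load a fastp JSON report (fastp.json is the default -j name) into a dict."""
    with open(path) as fh:
        return json.load(fh)


def reads_kept(report):
    """Return (reads_before, reads_after, fraction_kept).

    Assumes the summary.before_filtering / summary.after_filtering layout
    of recent fastp reports; adjust the keys if your report differs.
    """
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    fraction = after / before if before else 0.0  # guard against empty input
    return before, after, fraction
```

    For example, reads_kept(load_fastp_report()) returns the read counts before and after filtering plus the fraction kept.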

    Get fastp

    git clone https://github.com/OpenGene/fastp.git
    cd fastp
    make
    sudo make install

    Usage

    usage: fastp -i <in1> -o <out1> [-I <in2> -O <out2>] [options...]
    options:
    # I/O options
    -i, --in1 read1 input file name (string)
    -o, --out1 read1 output file name (string [=])
    -I, --in2 read2 input file name (string [=])
    -O, --out2 read2 output file name (string [=])
    -6, --phred64 indicates the input is using phred64 scoring (it'll be converted to phred33, so the output will still be phred33)
    -z, --compression compression level for gzip output (1 ~ 9). 1 is fastest, 9 is smallest, default is 2. (int [=2])

    # adapter trimming options
    -A, --disable_adapter_trimming adapter trimming is enabled by default. If this option is specified, adapter trimming is disabled
    -a, --adapter_sequence for single end data, adapter sequence is required for adapter trimming (string [=])

    # global trimming options
    -f, --trim_front1 number of bases to trim from the front of read1, default is 0 (int [=0])
    -t, --trim_tail1 number of bases to trim from the tail of read1, default is 0 (int [=0])
    -F, --trim_front2 number of bases to trim from the front of read2. If it's not specified, it will follow read1's settings (int [=0])
    -T, --trim_tail2 number of bases to trim from the tail of read2. If it's not specified, it will follow read1's settings (int [=0])

    # per read cutting by quality options
    -5, --cut_by_quality5 enable per-read cutting by quality at the front (5'), default is disabled (WARNING: this will interfere with deduplication for both PE/SE data)
    -3, --cut_by_quality3 enable per-read cutting by quality at the tail (3'), default is disabled (WARNING: this will interfere with deduplication for SE data)
    -W, --cut_window_size the size of the sliding window for sliding-window trimming, default is 4 (int [=4])
    -M, --cut_mean_quality the bases in the sliding window with mean quality below cut_mean_quality will be cut, default is Q20 (int [=20])

    # quality filtering options
    -Q, --disable_quality_filtering quality filtering is enabled by default. If this option is specified, quality filtering is disabled
    -q, --qualified_quality_phred the quality value at which a base is qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])
    -u, --unqualified_percent_limit the percentage of bases allowed to be unqualified (0~100). Default 40 means 40% (int [=40])
    -n, --n_base_limit if a read's number of N bases is >n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])

    # length filtering options
    -L, --disable_length_filtering length filtering is enabled by default. If this option is specified, length filtering is disabled
    -l, --length_required reads shorter than length_required will be discarded. (int [=30])

    # reporting options
    -j, --json the json format report file name (string [=fastp.json])
    -h, --html the html format report file name (string [=fastp.html])

    # thread options
    -w, --thread worker thread number, default is 3 (int [=3])

    # output splitting options
    -s, --split if this option is specified, the output will be split into multiple (--split) files (e.g. 0001.out.fq, 0002.out.fq...). (int [=0])
    -d, --split_prefix_digits the number of digits used to pad the slice number (1~10), default is 4, so the filename will be padded as 0001.xxx; 0 disables padding (int [=4])

    # help
    -?, --help print this message
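
    The quality, N-base and length filtering options (-q, -u, -n, -l), together with -6, can be read as a per-read decision. Below is a minimal Python sketch using the defaults listed above; the function name, the return labels and the order in which the checks run are assumptions for illustration, not fastp's code.

```python
def filter_read(seq, qual, phred_offset=33, qualified_quality=15,
                unqualified_percent_limit=40, n_base_limit=5,
                length_required=30):
    """Classify one read as 'pass' or by the first filter it fails.

    Hypothetical helper mirroring the -q/-u/-n/-l defaults.
    phred_offset=64 decodes phred64 input (what -6 declares).
    """
    phreds = [ord(c) - phred_offset for c in qual]
    # A base is "qualified" when its Phred score is >= qualified_quality.
    unqualified = sum(1 for q in phreds if q < qualified_quality)
    # Integer form of: unqualified / len > unqualified_percent_limit / 100
    if unqualified * 100 > unqualified_percent_limit * len(phreds):
        return "low_quality"
    if seq.upper().count("N") > n_base_limit:
        return "too_many_N"
    if len(seq) < length_required:
        return "too_short"
    return "pass"


# One 40 bp read, all bases Q40 in phred33 ('I' = 73 - 33 = 40).
print(filter_read("ACGT" * 10, "I" * 40))  # pass
```

    For paired-end data the help text says the whole pair is discarded, so a caller using this sketch would drop both mates when either one fails.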
    OpenGene (Libraries and tools for NGS data analysis), AfterQC (Fastq Filtering and QC)
    FusionDirect.jl (Detect gene fusion), SeqMaker.jl (Next Generation Sequencing simulation)
