fastp: an ultra-fast all-in-one FASTQ file preprocessor (QC/filter/trim/adapter...)

    fastp: https://github.com/OpenGene/fastp

    This tool is designed to provide fast all-in-one preprocessing for FASTQ files. It is developed in C++ with multithreading support for high performance. It has the following features:
    • filter out bad reads (quality too low, too short, or too many N bases...)
    • trim a fixed number of bases from the front and tail of all reads
    • cut low-quality bases from the 5' and 3' ends of each read by evaluating the mean quality in a sliding window (like Trimmomatic, but faster)
    • cut adapters (automatic for paired-end data; for single-end data, the adapter sequence must be provided)
    • report results in JSON format for further interpretation
    • visualize quality control and filtering results on a single HTML page (like FastQC, but faster and more informative)
    • split the output into multiple files (0001.R1.gz, 0002.R1.gz...) to support parallel processing
    • ...
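    The sliding-window quality cutting listed above can be sketched in a few lines of Python. This is a minimal illustration of the general technique, not fastp's C++ implementation: it assumes Phred+33 quality strings and borrows the defaults from the usage text (window size 4, mean quality 20). The function names `phred33` and `cut_by_quality` are hypothetical.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def window_mean(scores: list[int], i: int, window: int) -> float:
    """Mean quality of the window starting at index i."""
    return sum(scores[i:i + window]) / window


def cut_by_quality(seq: str, qual: str, window: int = 4, mean_q: int = 20,
                   cut5: bool = True, cut3: bool = True) -> tuple[str, str]:
    """Sketch of sliding-window cutting (hypothetical helper, not fastp's code).

    Returns the trimmed (sequence, quality) pair; an empty pair means no
    window reached mean_q and the whole read was cut.
    """
    scores = phred33(qual)
    start, end = 0, len(scores)
    if end < window:
        return seq, qual  # too short to evaluate a full window
    if cut5:
        # Slide from the 5' end; stop at the first window whose mean >= mean_q.
        while start + window <= end and window_mean(scores, start, window) < mean_q:
            start += 1
    if cut3:
        # Slide from the 3' end toward the 5' end in the same way.
        while end - window >= start and window_mean(scores, end - window, window) < mean_q:
            end -= 1
    if end - start < window:
        return "", ""  # no window passed the threshold
    return seq[start:end], qual[start:end]
```

    For example, `cut_by_quality("AAAACCCCCCGGGG", "!!!!IIIIII!!!!")` keeps `"AACCCCCCGG"`: because a passing window is kept whole, a couple of low-quality bases can survive at each edge. That coarseness is inherent to plain window averaging; real tools may refine the cut point further.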


    This tool is being intensively developed, and new features can be implemented quickly if they are considered useful. If you need an additional feature in fastp, please file an issue: https://github.com/OpenGene/fastp/issues/new


    fastp creates reports in both HTML and JSON format.

    HTML report: http://opengene.org/fastp/fastp.html
    JSON report: http://opengene.org/fastp/fastp.json
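    The JSON report is meant for downstream interpretation, for instance computing how many reads survived filtering. The sketch below assumes the report nests read counts as `summary` → `before_filtering` / `after_filtering` → `total_reads`; those key names are an assumption to verify against the report your fastp version writes, and `load_report` / `read_retention` are hypothetical helpers.

```python
import json

# ASSUMPTION: the key names below (summary, before_filtering, after_filtering,
# total_reads) must be checked against the JSON your fastp version produces.


def load_report(path: str = "fastp.json") -> dict:
    """Load a fastp JSON report from disk."""
    with open(path) as fh:
        return json.load(fh)


def read_retention(report: dict) -> float:
    """Return the fraction of input reads that passed filtering."""
    summary = report["summary"]
    before = summary["before_filtering"]["total_reads"]
    after = summary["after_filtering"]["total_reads"]
    return after / before if before else 0.0
```

    Typical use would be `read_retention(load_report("fastp.json"))`, which returns e.g. 0.75 when 150 of 200 reads were kept.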

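    Get fastp

    Download the source from the GitHub repository above, then build and install it from the source directory:

    make
    sudo make install

    Usage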

    usage: fastp -i <in1> -o <out1> [-I <in2> -O <out2>] [options...]
    options:
    # I/O options
    -i, --in1 read1 input file name (string)
    -o, --out1 read1 output file name (string [=])
    -I, --in2 read2 input file name (string [=])
    -O, --out2 read2 output file name (string [=])
    -6, --phred64 indicates the input is using phred64 scoring (it'll be converted to phred33, so the output will still be phred33)
    -z, --compression compression level for gzip output (1 ~ 9). 1 is fastest, 9 is smallest, default is 2. (int [=2])

    # adapter trimming options
    -A, --disable_adapter_trimming adapter trimming is enabled by default. If this option is specified, adapter trimming is disabled
    -a, --adapter_sequence for single end data, adapter sequence is required for adapter trimming (string [=])

    # global trimming options
    -f, --trim_front1 how many bases to trim from the front of read1, default is 0 (int [=0])
    -t, --trim_tail1 how many bases to trim from the tail of read1, default is 0 (int [=0])
    -F, --trim_front2 how many bases to trim from the front of read2. If it's not specified, it will follow read1's settings (int [=0])
    -T, --trim_tail2 how many bases to trim from the tail of read2. If it's not specified, it will follow read1's settings (int [=0])

    # per read cutting by quality options
    -5, --cut_by_quality5 enable per-read cutting by quality at the front (5'), default is disabled (WARNING: this will interfere with deduplication for both PE/SE data)
    -3, --cut_by_quality3 enable per-read cutting by quality at the tail (3'), default is disabled (WARNING: this will interfere with deduplication for SE data)
    -W, --cut_window_size the size of the sliding window for sliding window trimming, default is 4 (int [=4])
    -M, --cut_mean_quality the bases in the sliding window with mean quality below cut_mean_quality will be cut, default is Q20 (int [=20])

    # quality filtering options
    -Q, --disable_quality_filtering quality filtering is enabled by default. If this option is specified, quality filtering is disabled
    -q, --qualified_quality_phred the quality value at which a base is qualified. Default 15 means phred quality >=Q15 is qualified. (int [=15])
    -u, --unqualified_percent_limit what percentage of bases is allowed to be unqualified (0~100). Default 40 means 40% (int [=40])
    -n, --n_base_limit if a read's number of N bases is >n_base_limit, then this read/pair is discarded. Default is 5 (int [=5])

    # length filtering options
    -L, --disable_length_filtering length filtering is enabled by default. If this option is specified, length filtering is disabled
    -l, --length_required reads shorter than length_required will be discarded. (int [=30])

    # reporting options
    -j, --json the json format report file name (string [=fastp.json])
    -h, --html the html format report file name (string [=fastp.html])

    # thread options
    -w, --thread worker thread number, default is 3 (int [=3])

    # output splitting options
    -s, --split if this option is specified, the output will be split into that many (--split) files (e.g. 0001.out.fq, 0002.out.fq...). (int [=0])
    -d, --split_prefix_digits the digits for the slice number padding (1~10), default is 4, so the filename will be padded as 0001.xxx, 0 to disable padding (int [=4])

    # help
    -?, --help print this message
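    The quality, N-base and length filters described by -q, -u, -n and -l can be read as a simple per-read predicate. The Python sketch below is an interpretation of the help text, not fastp's source: the exact comparisons (unqualified share strictly above the limit, length strictly below length_required) and applying the quality filter pair-wise are assumptions, and `passes_filters` / `pair_passes` are hypothetical names.

```python
# ASSUMPTION: parameter names mirror fastp's -q, -u, -n and -l options with
# their documented defaults; the comparisons are inferred from the help text.


def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def passes_filters(seq: str, qual: str, qualified_quality_phred: int = 15,
                   unqualified_percent_limit: int = 40, n_base_limit: int = 5,
                   length_required: int = 30) -> bool:
    # Length filter (-l): reads shorter than length_required are discarded.
    if len(seq) < length_required:
        return False
    # N filter (-n): more than n_base_limit N bases discards the read.
    if seq.upper().count("N") > n_base_limit:
        return False
    # Quality filter (-q, -u): a base is qualified if its score is at least
    # qualified_quality_phred; the read fails if the unqualified share is
    # above unqualified_percent_limit percent (integer math, no rounding).
    unqualified = sum(1 for q in phred33(qual) if q < qualified_quality_phred)
    if unqualified * 100 > unqualified_percent_limit * len(qual):
        return False
    return True


def pair_passes(read1: tuple[str, str], read2: tuple[str, str], **limits) -> bool:
    # Paired-end: the whole pair is dropped if either mate fails a filter.
    return passes_filters(*read1, **limits) and passes_filters(*read2, **limits)
```

    With the defaults, a 30 bp read tolerates up to 12 bases below Q15 (40%) and up to 5 N bases; a 13th low-quality base or a 6th N drops it.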
    OpenGene (libraries and tools for NGS data analysis), AfterQC (FASTQ filtering and QC)
    FusionDirect.jl (detect gene fusions), SeqMaker.jl (next-generation sequencing simulation)
