
  • PoorSeq
    replied
    Good point, gringer! However, because I have already flash-stitched the reads, I expect the sequence I'm looking for to be in the same orientation in all reads. Still, I'll check for any pitfalls. Also, I chose 14 nt because my samples are from a bacterium with a genome size of 4.2 Mb, so I expect any sequence of 12 nt or longer to be unique.
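A quick way to sanity-check the uniqueness argument is to assume a random genome with equal, independent base frequencies. Real genomes are not random, so this is only a rough guide. Under that assumption, one fixed k-mer is expected to occur by chance about 2G/4^k times in a genome of G bp, counting both strands. The sketch below evaluates that for G = 4.2 Mb. `expected_hits` is a hypothetical helper name, not an existing tool.

```shell
# expected_hits G K: expected number of chance occurrences of one fixed K-mer
# in a G-bp genome, counting both strands (about 2*G positions) and ASSUMING
# uniform, independent base composition (a simplification).
expected_hits() {
  awk -v G="$1" -v k="$2" 'BEGIN { printf "%.3f\n", 2 * G / (4 ^ k) }'
}

# Print the expectation for a few k values at G = 4.2 Mb.
for k in 10 11 12 13 14; do
  printf 'k=%-2d expected chance hits: %s\n' "$k" "$(expected_hits 4200000 "$k")"
done
```

Under this model, a 12-mer gives roughly 0.5 chance hits and a 14-mer roughly 0.03. Choosing 14 nt therefore leaves some margin.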


  • gringer
    replied
    Originally posted by PoorSeq
    bioawk -c fastx '$seq ~ /SEQUENCE/ {print "@"$name; print $seq; print "+"; print $qual}' input.fq > output.fq
    Hmm, bioawk seems pretty neat.

    Note that what you've got there won't work for reads in the reverse-complement orientation, so you'll need to search for both the forward and the reverse-complement sequence. Also, matching any 14+ nt substring of SEQUENCE (or of its reverse complement) will be a bit trickier to implement.
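Here is one way to cover both orientations, sketched under stated assumptions. It computes the reverse complement of the target in the shell and keeps a record if its sequence line contains either string. The helpers `revcomp` and `grep_both_strands` are hypothetical names. The sketch assumes plain 4-line FASTQ records with no tabs in the header lines. It complements only A/C/G/T; any other character becomes N.

```shell
# revcomp SEQ: reverse complement of a DNA string (case-insensitive).
# Only A/C/G/T are complemented; any other character becomes N.
revcomp() {
  printf '%s\n' "$1" | awk '{
    comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    s = toupper($0); out = ""
    for (i = length(s); i >= 1; i--) {   # walk the string backwards
      c = substr(s, i, 1)
      c2 = (c in comp) ? comp[c] : "N"
      out = out c2
    }
    print out
  }'
}

# grep_both_strands SEQ < in.fq > out.fq
# Keeps FASTQ records whose sequence line contains SEQ or revcomp(SEQ),
# ignoring case. ASSUMES 4-line records and no tabs in header lines.
grep_both_strands() {
  paste - - - - \
    | awk -F'\t' -v f="$1" -v r="$(revcomp "$1")" '
        BEGIN { f = toupper(f) }
        { s = toupper($2) }          # field 2 is the sequence line
        index(s, f) || index(s, r)   # print the record if either strand hits
      ' \
    | tr '\t' '\n'
}
```

For example: grep_both_strands SEQUENCE < input.fq > output.fq. Because the match is restricted to the sequence field, quality strings cannot produce false hits.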


  • PoorSeq
    replied
    Thanks all for the answers, especially dariober for the code. I later also put together a bioawk command, which worked for me:

    bioawk -c fastx '$seq ~ /SEQUENCE/ {print "@"$name; print $seq; print "+"; print $qual}' input.fq > output.fq

    Yes, my aim is to collect all the reads that contain a sequence (or a subsequence of at least 14 nt) and write them to a new file.
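For the "at least 14 nt" case, one observation keeps things simple. Any shared stretch of 14 nt or longer contains a shared 14-mer (its first 14 bases). It is therefore enough to index every 14-mer of the target and of its reverse complement, then keep the reads that contain any of them. Below is a minimal awk sketch of that idea. It handles exact matches only. `kmer_filter` is a hypothetical helper, and 4-line FASTQ records are assumed.

```shell
# kmer_filter TARGET K < in.fq > out.fq
# Keeps FASTQ records whose sequence shares at least one exact K-mer with
# TARGET or with its reverse complement. Any shared substring of length >= K
# contains a shared K-mer, so testing K-mers alone finds every such read.
# ASSUMES 4-line records; mismatches and IUPAC codes are not handled.
kmer_filter() {
  awk -v target="$1" -v k="$2" '
    BEGIN {
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
      t = toupper(target); rc = ""
      for (i = length(t); i >= 1; i--) {            # build reverse complement
        c = substr(t, i, 1)
        c2 = (c in comp) ? comp[c] : "N"
        rc = rc c2
      }
      for (i = 1; i <= length(t) - k + 1; i++) {    # index K-mers of both strands
        kmers[substr(t, i, k)] = 1
        kmers[substr(rc, i, k)] = 1
      }
    }
    { rec[(NR - 1) % 4] = $0 }                      # buffer the current record
    NR % 4 == 0 {                                   # record complete: test its sequence
      seq = toupper(rec[1])
      for (i = 1; i <= length(seq) - k + 1; i++) {
        km = substr(seq, i, k)
        if (km in kmers) {
          print rec[0]; print rec[1]; print rec[2]; print rec[3]
          break
        }
      }
    }
  '
}
```

Usage would look like kmer_filter SEQUENCE 14 < input.fq > output.fq. The index holds at most 2(L - k + 1) entries for a target of length L, so memory use is negligible.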


  • dariober
    replied
    Originally posted by PoorSeq
    I'm pretty new to bioinformatics, so sorry if this isn't worth asking here.

    I have a FLASH-stitched FASTQ file from my paired-end data. From it, I want to extract the reads containing a particular sequence, or part of that sequence, in either orientation, and write them to a new FASTQ file. Is there an easy tool or script to do that?
    Hi - as gringer pointed out, you should clarify what you are trying to do. Anyway, this sequence of Unix commands reads a FASTQ file and outputs, in FASTQ format, the reads matching a regular expression. See if it helps:

    Code:
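    ## Get reads whose sequence (2nd tab-separated field after paste) contains
    ## substring AAA or its reverse complement TTT. [^\t] keeps the match inside
    ## the sequence field, so headers and quality strings cannot produce hits.
    gunzip -c fastq.fq.gz \
    | paste - - - - \
    | grep -P '^@[^\t]*\t[^\t]*(AAA|TTT)' \
    | tr '\t' '\n' \
    | gzip > sub.fq.gz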
    
    ## Example input fastq:
    @seq1
    ACTGAAACTG
    +comment
    IIIIIIIIII
    @seq2
    ACTGNNNCTGTTT
    +comment
    BBBBBBBBBBBBB
    @seq3
    CCCCCCCCCCCCC
    +comment
    BBBBBBBBBBTTT
    @seq4
    AAACCCCCCCCCC
    +comment
    BBBBBBBBBBTTT
    
    ## Output sub.fq.gz
    @seq1
    ACTGAAACTG
    +comment
    IIIIIIIIII
    @seq2
    ACTGNNNCTGTTT
    +comment
    BBBBBBBBBBBBB
    @seq4
    AAACCCCCCCCCC
    +comment
    BBBBBBBBBBTTT
    If your input is uncompressed, use "paste - - - - < fastq.fq" instead of "gunzip -c fastq.fq.gz | paste - - - -".


  • yueluo
    replied
    If you are trying to get reads that align to a particular sequence, try bowtie2. Not sure if that's your purpose, though.


  • gringer
    replied
    I have a FLASH-stitched FASTQ file from my paired-end data. From it, I want to extract the reads containing a particular sequence
    Okay, that's doable with grep, and very quick.

    or part of that sequence
    Wait, what? You want any subsequence? Will one base pair do? What are your limits?


  • How to parse reads containing a particular sequence in any orientation

    I'm pretty new to bioinformatics, so sorry if this isn't worth asking here.

    I have a FLASH-stitched FASTQ file from my paired-end data. From it, I want to extract the reads containing a particular sequence, or part of that sequence, in either orientation, and write them to a new FASTQ file. Is there an easy tool or script to do that?
    Last edited by PoorSeq; 10-30-2013, 10:25 PM.
