-
Code:
# substr(s, start, length): start at base 11 and keep 66 characters (bases 11-76)
awk '{print; getline} {print substr($0, 11, 66); getline; print; getline; print substr($0, 11, 66)}' input.fastq
-
Thanks a lot kmcarr! I was searching for it, just missed the inside. I will give it a try.
Best!
Yifang
-
Originally posted by yifangt:
My question is related, but not quite the same: I need to extract a substring of the sequence and the corresponding quality scores for each entry, leaving the file format untouched. My Illumina reads are 101 bp long. I want to remove the first 10 bp and the last 25 bp so that only the middle 66 bp are left.
The reason is that my DNA is methylated and the first 10 and last 25 bp generally seem to be of poor quality, so I want to get rid of them. My challenge is to trim the sequence and the quality scores correspondingly.
Code:
use Bio::SeqIO;
use Bio::Seq::Quality;
my $seqio = Bio::SeqIO->new(-format => 'fastq', -file => 'some.fasq');
# the leading '>' opens the output file for writing
my $out_fastq = Bio::SeqIO->new(-format => 'fastq', -file => '>subset.fastq');
while (my $line = $seqio->next_seq()) {
    # keep the id
    # substring the sequence (from 11 to 66)
    # substring the quality score (from 11 to 66)
    $out_fastq->write_seq($line);
}
Thanks a lot!
Yifang
First rule of coding: never write your own when a tool already exists to do what you want. The FASTX-Toolkit has a utility that does exactly what you want, namely fastx_trimmer. You give it an input file (FASTA or FASTQ) and the positions of the first and last base you wish to keep (in your case 11 and 76), and it will produce a trimmed file (bases and qualities).
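For readers who want to see what trimming to bases 11-76 amounts to, here is a minimal Python sketch. It is not how fastx_trimmer is implemented; it assumes plain 4-line FASTQ records (no line wrapping), and the function name `trim_fastq` and its `first`/`last` parameters are made up for illustration.

```python
import io


def trim_fastq(lines, first=11, last=76):
    """Yield (header, sequence, plus, quality) with bases first..last kept.

    Positions are 1-based and inclusive. Assumes 4-line FASTQ records.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        record = [header]
        for _ in range(3):
            line = next(it, None)
            if line is None:
                raise ValueError("truncated FASTQ record: " + header)
            record.append(line.rstrip("\n"))
        name, seq, plus, qual = record
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + name)
        # The same slice is applied to both strings so they stay in register.
        yield name, seq[first - 1:last], plus, qual[first - 1:last]


# A synthetic 101 bp read: 10 bases to drop, 66 to keep, 25 to drop.
read = (
    "@read1\n"
    + "A" * 10 + "C" * 66 + "G" * 25 + "\n"
    + "+\n"
    + "!" * 10 + "I" * 66 + "#" * 25 + "\n"
)
trimmed = list(trim_fastq(io.StringIO(read)))
print(len(trimmed[0][1]))  # 66
```

For real data the existing tool is still the better choice; the sketch only shows that sequence and quality must be cut with identical coordinates.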
-
My question is related, but not quite the same: I need to extract a substring of the sequence and the corresponding quality scores for each entry, leaving the file format untouched. My Illumina reads are 101 bp long. I want to remove the first 10 bp and the last 25 bp so that only the middle 66 bp are left.
The reason is that my DNA is methylated and the first 10 and last 25 bp generally seem to be of poor quality, so I want to get rid of them. My challenge is to trim the sequence and the quality scores correspondingly.
Code:
use Bio::SeqIO;
use Bio::Seq::Quality;
my $seqio = Bio::SeqIO->new(-format => 'fastq', -file => 'some.fasq');
# the leading '>' opens the output file for writing
my $out_fastq = Bio::SeqIO->new(-format => 'fastq', -file => '>subset.fastq');
while (my $line = $seqio->next_seq()) {
    # keep the id
    # substring the sequence (from 11 to 66)
    # substring the quality score (from 11 to 66)
    $out_fastq->write_seq($line);
}
Thanks a lot!
Yifang
Last edited by yifangt; 08-13-2011, 04:44 AM.
-
Originally posted by zlu:
Perhaps I didn't explain it clearly. I wasn't trying to convert Illumina short reads, but the mapped consensus sequences of a few chromosomes. Each is a few Mbp long, e.g.:
@chr1
nnnnnnnagatagaaataCACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTT
GGTCCGAAGCATGAGTGTTTACAATGTTTGAAtaCCTTATACAGTTCTTATACATACTTT
ATAAATTATTTCCCaagctgttttgatacactcactaacagaTATTCTATAGAAGGAAAA
GTTATCCACTTATGCACATTTATAGTTTTCAGAATTGTGGATAATTAGAAATTACACACA
AAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAAAAAG
TATAATTGTGTGGATAAGTCGTCCAACTCATGATTTTATAAGGATTTATTTATTGATATT
TACATAAAAATACTGTGCATAACTAATAAGCAGGATAAAGTTATCCACCGATTGTTATTA
+
!!!!!!!????????BBBEEEHKKKKKKKKKKKKNNNNNNQQNNNNNNQWWWW7WZWWWZ
ZZZ]]]`````````>]]]]]]]ZTQQQTQNNKKKKKKHKKHHHEEEEEEEHHHHHHHHH
HKKHHHHHHEEEEEBBBBBBBBBBBB?=B@BBBBBB??BBBBEEEEEEEEEEHKNNNNNQ
QQTTTTWWWWWWWTTWZWWW]`>```c``]`ccc``cfcfliiloouSxxuuuuTollLo
olliifilloolfif````]]WWWWWTTTTTTTTQQQQQQQNKKHHHHEEEEHEEEHHEE
EEEEEEEHHHHHHHHHEEEEEEHHHHHEEHKHHHHHHHHHHHEEHHHHHHHHKNNNNKNN
NQQ@NKNNNQQQTWWWZZZBZZZZ]<`]ZZZZZZZZLZZ``]]]ZZZWTTTQQQNNNNNK
And using the method described above, this is what I got:
>chr1nnnnnnnagatagaaataCACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTT
>`]]``fGff`````]iifc`````cciiLoxuuuxZ{{xxruuxx{{{~~rrrrrrrrr
Ooiiff`ZZZWTTTTTTQQNTQTWWWWWW]]]`]]cfifffciiiiiorrxx{{~x~{{x
>QNNQQQTTQQQQNNNNQZ``cfffolllloollruuuurrroruxxxxx{~~xuroorr
rrorruxxxroooux{{{xxuuuuuruurrrrollccfcc]ZZB]?]]W7QQTTWWWWWW
>QTT7TQQTTTWZZZZ]]]]]]ZZZZ]]]]]]`````]]]]]`]`c`]```Z]``fflll
louxxxxuS{{{{{~~~~~~{{~{{{{xuuroiilroiiollorlloorZ{~~{xxxuuu
>KKKKKKKHHEB????BBBEEKHHHKKKKNNNNQQQQTZZZZZZ]]````````]]WWW<
WWWW7TTTTTTTQKKKKKKHHEEEE<BBEHHH=HEEEEEHHHHEEEEHHHHHHKKKKKKK
>QQQBQQQNNNNKKKKKKQQQQQQTTTTWWZWTTQQQQQQQQTTWZZZZZ]ZZTWWTTTQ
QQTQQNNNTWWWZZ]]]ZZWWWWWWTTTTQQQEQQQQNNNNNNHHHHEEBEEEEEBBBBB
>QQQATTTTTTTWWTTTQQQQQQQNNKKKQQQTTTT7TTQQ5QNNNQKNNQQQQQQQQQQ
QQN4HKKHHHHHHHHHHHHHKHHEEBBBBBBBBBBBBBBB7??????????????!!!!!
>LAcffffooor{{Z~~~~~~~~~~|~~~q~~~~~v~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~c~~xuuric```]]ZZWTNKHEEB????!!!!!!!!!!!!!!!!!!
>QQQ5>QQ@QNNNNNNQQQQNNNNNNKHEBBBB???BBBBBBBBBEEEEEEBBBBBBBBB
BBBBBBEHKNNKKKKKKNNNKKKKKKKKKKKW]]]cccfffc`]Z`>cfiOollllllLl
>NQQTTTWQQQQQQTTNNNQQQT7GTTAWWWTTTZZW7TQTTQQQQQQTQQTTTQTTTTT
WWZWZ]]]]WZ]``D]]]D```D]]]Z]]]ZZWWWT7@QNKKKKKHEBBBBHKHHHNQTT
here "chr1" is chromosome
-
The version in MAQ svn, which was never released, parses multi-line fastq and accepts gzipped input, as it uses the same parser as bwa. Nonetheless, as maq only works with reads no longer than 127bp, not supporting multi-line fastq is not a major problem. It is more important for bwa to parse multi-line fastq as it works with long reads.
-
Just because the consensus is that line wrapping in FASTQ is/was a bad idea, doesn't mean it can't be reliably parsed. It does make the code a little more complicated I admit - but as this thread shows, it would be useful to some if MAQ could understand line-wrapped FASTQ.
-
The original Sanger FASTQ files also allowed the sequence and quality strings to be wrapped (split over multiple lines), but this is generally discouraged as it can make parsing complicated due to the unfortunate choice of "@" and "+" as markers (these characters can also occur in the quality string).
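One way around the "@" and "+" ambiguity is to stop trusting those characters once the quality block starts and instead count characters until the quality string is as long as the sequence. The following Python sketch illustrates that idea; it is not the parser used by MAQ, bwa, or EMBOSS, and the function name `read_wrapped_fastq` is made up for this example.

```python
def read_wrapped_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ that may be line-wrapped."""
    it = (line.rstrip("\r\n") for line in lines)
    line = next(it, None)
    while line is not None:
        if not line:
            line = next(it, None)  # skip blank lines between records
            continue
        if not line.startswith("@"):
            raise ValueError("expected '@' header, got: " + line[:30])
        name = line[1:]

        # Sequence lines never start with '+', so the separator is unambiguous.
        seq_parts = []
        line = next(it, None)
        while line is not None and not line.startswith("+"):
            seq_parts.append(line)
            line = next(it, None)
        if line is None:
            raise ValueError("missing '+' separator in record " + name)
        seq = "".join(seq_parts)

        # Quality lines may start with '@' or '+', so count characters
        # instead of looking for the next marker.
        qual_parts = []
        qual_len = 0
        while qual_len < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("quality shorter than sequence in " + name)
            qual_parts.append(line)
            qual_len += len(line)
        if qual_len != len(seq):
            raise ValueError("quality longer than sequence in " + name)

        yield name, seq, "".join(qual_parts)
        line = next(it, None)


# Tiny wrapped example whose quality lines begin with '@' and '+'.
wrapped = [
    "@chr1\n", "ACGTAC\n", "GTAC\n", "+\n", "@IIII+\n", "+@II\n",
    "@chr2\n", "TTGA\n", "+chr2\n", "!!@+\n",
]
records = list(read_wrapped_fastq(wrapped))
```

Because the quality length is checked against the sequence length, a line starting with "@" inside the quality block is never mistaken for the next header.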
-
Originally posted by lh3:
Most FASTQ parsers do not work with multi-line FASTQ files. You'd better write your own script. After all, processing multi-line FASTQ is not that hard.
-
Using the patched emboss, I can now convert the line wrapped fastq files. Thank you.
-
Originally posted by maubp:
Is that plain un-patched EMBOSS 6.1.0? They are currently on patch 3, and this did include some FASTQ fixes.
Originally posted by maubp:
How big is the file? If you zipped it up and emailed it to me I could try it here for you if you liked.
Thanks for your help. I'll try to process the fastq with a script.
-
Most FASTQ parsers do not work with multi-line FASTQ files. You'd better write your own script. After all, processing multi-line FASTQ is not that hard.
-
Originally posted by zlu:
yes, the lines are indeed wrapped. Thanks for the reminder.
Originally posted by zlu:
However, trying seqret of EMBOSS 6.1,
Is that plain un-patched EMBOSS 6.1.0? They are currently on patch 3, and this did include some FASTQ fixes.
ftp://emboss.open-bio.org/pub/EMBOSS/fixes/README.fixes
Originally posted by zlu:
I got the following error:
[SBSUser@pipeline Assembly2]$ seqret -sformat fastq-sanger
Reads and writes (returns) sequences
Input (gapped) sequence(s): CNS.fq
Error: Unable to read sequence 'CNS.fq'
$ seqret -sformat fastq-sanger -sequence CNS.fq -osformat fasta -outseq CNS.fasta
How big is the file? If you zipped it up and emailed it to me I could try it here for you if you liked.
-
Yes, the lines are indeed wrapped. Thanks for the reminder.
However, trying seqret of EMBOSS 6.1, I got the following error:
[SBSUser@pipeline Assembly2]$ seqret -sformat fastq-sanger
Reads and writes (returns) sequences
Input (gapped) sequence(s): CNS.fq
Error: Unable to read sequence 'CNS.fq'