Hey persorrels,
If I were in your place, I would proceed this way.
I suggest you trim the first 4 bases, run FastQC on the trimmed file, and check the QC chart.
--> If you look at the mean quality (the blue line), it falls more or less near 30, above 28.
--> The median quality (the red line) is above 30 for all bases.
That means only certain bases in some reads of the file are of poor quality.
But anyhow, trim the first 4 bases, align, and see.
Do it. You will learn a lot as you go.
Best,
Vishnu.
Minimum Read Length for BWA
-
Human, 10x exome. Total sequence size is about 30Gb. 100bp per read. Attached are some sample statistics from reverse reads.
-
You can usually use the default minimum size of whatever trimmer you're using as a guide. Often minimum sizes of 20 or 30 are used (I wouldn't bother going much lower than that, since anything shorter will probably just become a multi-mapper).
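To make the length cutoff concrete, here is a minimal sketch of a pair-aware filter in Python. It assumes the reads are already parsed into (header, sequence, quality) tuples; the names `FastqRecord` and `filter_pairs_by_length` and the default of 30 are illustrative only, not part of any trimmer or of BWA. A pair is dropped when either mate falls below the cutoff, so the two output files stay in the same order.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory representation: (header, sequence, quality)
FastqRecord = Tuple[str, str, str]


def filter_pairs_by_length(
    pairs: Iterable[Tuple[FastqRecord, FastqRecord]],
    min_len: int = 30,  # illustrative default, taken from the 20-30 range above
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Yield only the pairs in which both mates are at least min_len bases."""
    for read1, read2 in pairs:
        if len(read1[1]) >= min_len and len(read2[1]) >= min_len:
            yield read1, read2
```

Most trimmers can do this for you (and some can keep the surviving mate as a singleton), so this is only meant to show what the threshold does.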
-
Hey persorrels,
What organism have you sequenced, and what is the coverage?
How large is the data set? How many reads does your data have?
Why are you looking at individual reads?
Initially, you need to run a quality check on all of your reads together, for Read 1 and Read 2 separately.
Then look at the quality output chart, and only then proceed to trimming.
Best,
vishnu.
-
Vishnu, thanks for your reply.
It is true that most forward reads are of high quality. But the reverse reads aren't. Below are two examples:
NTATATTTTCCTCTTGGTGGTATTGAAAACCAGTGAGCAGAGAGCATAAGAACAGAACTTCAAGACCGTGGCAGGAGCTTGTATTTGTACAGCACAAACCC
+
#+12??A;A>@>C>[email protected]>ABB;=BBBB<=A;=AA>==AAAB=>AAAA################################
NAAGGAGCAGCTGCGTGCCGCGTGAGCTTTAGCAGGAGGACCAGTGATTAGCATTTACGATGCAAAGACAGAACAACTTCGTATAGGACTGTACCCCTGGA
+
#+1<?7AA<CBABBC<CCAAA=)?153*=A?A#####################################################################
It is an extreme example, but you can recover an 18bp region (AA<CBABBC<CCAAA=) from the second read. I guess what you're saying is, if I had to trim a large portion of the read, I should ignore it entirely.
Is that correct?
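For reference, the kind of recovery described above can be sketched as follows. This is a minimal example that assumes Phred+33 quality encoding and an arbitrary cutoff; the function name `longest_high_quality_run` and the default `min_q=20` are illustrative, not what any particular trimmer does.

```python
from typing import Tuple


def longest_high_quality_run(
    qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[int, int]:
    """Return (start, length) of the longest stretch with quality >= min_q.

    Assumes Phred+offset encoding (offset=33 is an assumption here).
    Returns (0, 0) when no base reaches min_q.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, ch in enumerate(qual):
        if ord(ch) - offset >= min_q:
            if run_len == 0:
                run_start = i  # a new high-quality stretch begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a low-quality base ends the current stretch
    return best_start, best_len
```

The result depends heavily on the cutoff. For the quality string of the second read above, `min_q=25` should give `(6, 16)`, a 16-base stretch starting at the seventh base, while `min_q=20` extends it three bases to the left.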
Also, I suspect short reads like this will affect the performance of BWA. I want to understand how the read length distribution affects the performance of BWA.
Any comments on that?
Cheers,
Per
-
Hey persorrels,
It looks like your approach of trimming the bases and discarding reads is not appropriate.
Illumina gives more or less good-quality reads, except at the first 3-4 base positions and maybe at the last 2 base positions.
--> It doesn't make sense to end up with reads of 30, 40, or 50 bp after trimming 100 bp reads.
--> Just be concerned about the quality of the first few bases. If they are above Q20, you may proceed with alignment. If they are not above Q20, just trim those bases; that will do.
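The check on the first few bases can be sketched roughly like this. It is a minimal Python example that assumes Phred+33 quality strings already loaded into a list; `mean_quality_at_start` and `trim_start` are illustrative names, and the mean per position is only a stand-in for what the FastQC chart shows.

```python
from typing import Iterable, List, Tuple


def mean_quality_at_start(
    quals: Iterable[str], n: int = 4, offset: int = 33
) -> List[float]:
    """Mean Phred score at each of the first n positions across all reads."""
    totals = [0] * n
    counts = [0] * n
    for qual in quals:
        for i, ch in enumerate(qual[:n]):
            totals[i] += ord(ch) - offset  # Phred+33 is an assumption
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def trim_start(seq: str, qual: str, n: int = 4) -> Tuple[str, str]:
    """Remove the first n bases and their quality characters from one read."""
    return seq[n:], qual[n:]
```

If any of the returned means is below 20, apply `trim_start` to every read before aligning; otherwise the reads can go to the aligner as they are.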
Good luck ahead,
Vishnu.
-
Minimum Read Length for BWA
Hi all,
I am preprocessing a dataset from a human sample sequenced by Illumina HiSeq 2500 (Paired-end reads, 100bp each). I first trim each read based on quality. If the trimmed sequence is too short, I just discard it.
My question is: how do you pick the length threshold below which reads are discarded? Would you discard reads shorter than 50, 40, or 30 bp? What is the right approach to picking a threshold?
I haven't been able to find any information on this on the web. (By the way, I am using BWA for alignment.)
Thanks in advance.