**sdriscoll** · 04-27-2012, 04:57 PM

Since alignments are alignments, you could align the files separately, output the results as SAM files, and then use SAMtools to merge and sort the alignments.
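A minimal sketch of that per-lane workflow, written as a Python helper that only builds the command lines rather than running them. The file names (`lane1.sam`, `merged.sorted.bam`), the index name and the helper `per_lane_commands` are hypothetical, and the sketch assumes bowtie1 with `-S` for SAM output and a samtools 1.x release where `sort -o` accepts SAM input. Each lane is sorted before merging because `samtools merge` expects sorted inputs.

```python
import shlex


def per_lane_commands(index, lane_fastqs, merged_bam="merged.sorted.bam"):
    """Build shell commands: align and sort each lane, then merge the results.

    All output file names are illustrative placeholders.
    """
    commands = []
    sorted_bams = []
    for i, fastq in enumerate(lane_fastqs, start=1):
        sam = f"lane{i}.sam"
        bam = f"lane{i}.sorted.bam"
        # bowtie1: -S writes SAM; positional args are index, reads, output
        commands.append(["bowtie", "-S", index, fastq, sam])
        # coordinate-sort each lane so that the inputs to merge are sorted
        commands.append(["samtools", "sort", "-o", bam, sam])
        sorted_bams.append(bam)
    # merge all sorted per-lane BAM files into one file
    commands.append(["samtools", "merge", merged_bam, *sorted_bams])
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    for line in per_lane_commands("genome_index", ["lane1.fastq", "lane2.fastq"]):
        print(line)
```

Printing the commands instead of executing them makes it easy to review them first or paste them into a job script, one lane per job.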

**epi** · 04-30-2012, 05:53 AM

Thanks for the response. But won't the alignments differ depending on whether the files are aligned together or in isolation, e.g. for unique matches?

**dpryan** · 04-30-2012, 06:14 AM

The individual alignments will be the same regardless. Depending on how you made your library, it might make sense to align the lanes separately (for accurate PCR duplicate calling, which is presumably what you meant by "unique match"). Beyond that, the only difference is the number of keystrokes required.

**epi** · 04-30-2012, 06:33 AM

Your point is well taken. But there is another situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.

**Alex Renwick** · 04-30-2012, 07:17 AM

Originally posted by epi

Your point is well taken. But there is another situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.

Could you explain more what you mean by this? Typically, each read is aligned independently of others, then the results are merged for subsequent analysis.

**dpryan** · 04-30-2012, 07:47 AM

Originally posted by epi

Your point is well taken. But there is another situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.

This is a bit ambiguous in English. "reads matching to genome at one place" can either mean "uniquely mapped reads" (most likely you mean this) or "reads mapping only to a specific region of the genome" (presumably you don't mean that). In neither case will the results differ depending on whether you invoke bowtie once or multiple times. I recall there being auxiliary flags that indicate multiple alignments of which only one was returned and/or a flag to just not return those (something like -m in bowtie1, haven't used it in a while though).

As you quoted Simon Andrews as saying in another thread, "For straight forward alignments (Bowtie, BWA etc) then the two operations would be the same".
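To make the point concrete, here is a toy illustration in Python, not a real aligner. It assumes exact matching against a made-up reference string, and the helper names `find_hits` and `unique_alignments` are hypothetical. Because whether a read is "unique" depends only on that read and the reference, filtering lane by lane gives the same result as filtering the merged reads.

```python
def find_hits(reference, read):
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def unique_alignments(reference, reads):
    """Keep only reads with exactly one hit (a stand-in for uniquely mapped reads)."""
    result = {}
    for name, seq in reads.items():
        hits = find_hits(reference, seq)
        if len(hits) == 1:
            result[name] = hits[0]
    return result


# Made-up data: r2 ("ACGT") occurs twice in the reference, so it is not unique.
reference = "ACGTACGTTTGACCA"
lane1 = {"r1": "TTGACC", "r2": "ACGT"}
lane2 = {"r3": "GACCA", "r4": "CGTA"}

# Align everything in one go ...
together = unique_alignments(reference, {**lane1, **lane2})
# ... or lane by lane, then merge the results.
separately = {
    **unique_alignments(reference, lane1),
    **unique_alignments(reference, lane2),
}

print(together == separately)
```

The multi-mapping read r2 is dropped in both cases, since its second hit lies in the reference and not in the other lane's reads.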

**sdriscoll** · 04-30-2012, 09:04 AM

Originally posted by epi

Your point is well taken. But there is another situation: when you want only reads that match the genome at one place, not several. If you align in batches, you cannot do this accurately.

You might be thinking of this backwards. Each of your millions of reads is a distinct read, but many of them could in fact align to the same genomic region. What is meant by unique alignments in RNA-Seq is that each read can align in only one spot. What you WANT is for reads to align on top of one another. That's how we are able to measure gene expression and do anything, really.

Just align with Bowtie using the -m 1 -k 1 options. That will produce unique alignments per read.

**epi** · 05-02-2012, 05:08 AM

Nice to see the discussion. I guess how much of an issue PCR duplicates are depends on the individual experiment. Wouldn't it be good practice to always merge the FASTQ files before aligning, to remove any possible bias?

**Heisman** · 05-02-2012, 05:54 AM

Originally posted by epi

Nice to see the discussion. I guess how much of an issue PCR duplicates are depends on the individual experiment. Wouldn't it be good practice to always merge the FASTQ files before aligning, to remove any possible bias?

If you want to remove PCR duplicates, then you should merge all data before removing PCR duplicates if all of the data comes from the same prepped library. If the data comes from different prepped libraries, you should merge after removing the duplicates.
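The rule above can be sketched with a toy duplicate filter in Python. This is an illustration under stated assumptions, not how any particular tool is implemented: the `Alignment` record, its fields and the `remove_duplicates` helper are hypothetical, and a duplicate is taken to mean "same library, same chromosome, position and strand".

```python
from collections import namedtuple

# Hypothetical minimal alignment record for illustration only.
Alignment = namedtuple("Alignment", "read library lane chrom pos strand")


def remove_duplicates(alignments):
    """Keep the first alignment seen for each (library, chrom, pos, strand) key."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln.library, aln.chrom, aln.pos, aln.strand)
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


# Library A was sequenced on two lanes; a1 and a2 are copies of one fragment.
# b1 comes from a different library and merely shares the same coordinates.
lane1 = [Alignment("a1", "A", 1, "chr1", 100, "+")]
lane2 = [
    Alignment("a2", "A", 2, "chr1", 100, "+"),
    Alignment("b1", "B", 2, "chr1", 100, "+"),
]

# Same library: merge the lanes first, then remove duplicates.
merged_first = remove_duplicates(lane1 + lane2)
# Deduplicating each lane on its own misses the cross-lane duplicate a2.
per_lane = remove_duplicates(lane1) + remove_duplicates(lane2)

print([aln.read for aln in merged_first])
print([aln.read for aln in per_lane])
```

Including the library in the key is what keeps b1: reads from different libraries are never treated as duplicates of each other, which has the same effect as deduplicating each library separately and merging afterwards.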

**epi** · 05-02-2012, 08:36 AM

Originally posted by Heisman

If you want to remove PCR duplicates, then you should merge all data before removing PCR duplicates if all of the data comes from the same prepped library. If the data comes from different prepped libraries, you should merge after removing the duplicates.

In other words, if a library corresponds to a sample, which I believe is the case with the data I have, the same sample run in multiple lanes should be merged and then aligned.
This clarifies a lot. I have heard some opinions from bioinformaticians that this is immaterial. In fact, even breaking the FASTQ down further into smaller fragments (for whatever reason) should not matter for alignment.

**Alex Renwick** · 05-02-2012, 08:56 AM

Originally posted by epi

In other words, if a library corresponds to a sample, which I believe is the case with the data I have, the same sample run in multiple lanes should be merged and then aligned.
This clarifies a lot. I have heard some opinions from bioinformaticians that this is immaterial. In fact, even breaking the FASTQ down further into smaller fragments (for whatever reason) should not matter for alignment.

Heisman points out that if you have different samples you should align first, remove duplicates, then merge. You conclude that since you have just one sample, you need to merge first and then align. That conclusion does not logically follow. The fallacy is common enough to have its own name: Denial of the Antecedent.

It really sounds like you had your mind made up before coming here with your question. Everyone who responded has told you that it doesn't matter whether you align then merge or vice versa. You don't have to believe them, but if someone takes the time to offer guidance you should at least do them the courtesy of plainly stating the basis of your disagreement.

**rnaseek** · 05-07-2012, 10:53 AM

I think it is better to do the alignment individually. This will help check for lane-specific biases, if there are any. In addition, aligning individually lets you run the alignments in parallel.

**analyst** · 05-08-2012, 08:08 AM

When using splice aligners for RNA-Seq, you must merge and then align, for obvious reasons. For regular aligners (bowtie etc.) I still merge first, remove PCR duplicates, and then align. As for speed, it does not bother me, as it takes only a few minutes to align anyway. Also, using a parallelized tool such as bowtie, I would rather dedicate all available nodes to the merged lanes than split them among 2 individual lanes running simultaneously. After all, you have to merge them at some stage anyway for the actual analysis, and file management can be cleaner if you do it right from the beginning. I see from the comments that people do it the other way as well; I guess it's just my preference for the analysis. I also do not understand Alex's comments; epi's interpretation of Heisman's response seems fine.
Logically, it should not matter as long as you take care of PCR duplicates at some stage in your pipeline. But practically, I have had some strange experiences with combinations of publicly available tools and their behavior. I will have to do a complete analysis myself to believe whether splitting causes any real issue or not. If anyone has gone on to do the same, please share here. With all due respect, I am sticking to my approach till then.

**epi** · 05-10-2012, 05:15 AM

Thanks for commenting, analyst. I just don't care about responses like Alex's. Unfortunately he is not the only person in public forums and in the scientific world who likes to get personal in a scientific discussion. Basically, it seems they try to push their own agenda and preferences onto others without even understanding what is being discussed, as in this case. Maybe he is a big advocate of one particular strategy and feels insecure if someone even mentions any other. Or maybe he is just looking for places to use the phrase of the day he learnt; this tendency is even more common and has its own name: talking through the hat. Unlike his example, this even fits.
But overall this is an excellent forum with a good collection of people and experts. Actually, I am not familiar with the steps upstream of NGS data generation, like sample and library prep, so I feel I am more educated after these discussions. Some people state their opinion and some even the reasons behind it; both are useful.

aligning multiple fastq for the same sample