what is a paired-end read?

  • #31
    Thanks to everyone who contributes to these basic threads. I know they may get tedious for the old-timers, but they're vital for newer folks. Even if the same info is technically available at reference sites, it's often easier to understand when the answer is to a specific, basic question, rather than organized as vendor documentation or generalized teaching material!


    • #32
      scaffold and contig helps a lot~ thanks a lot~


      • #33
        About single-end read

        For a double-stranded DNA molecule, are the single-end reads generated by sequencing from both 3' ends of the two strands of DNA?


        • #34
          This helped, but not enough! I think I get it, but then questions pop up. This "mates in a pair" / paired-read business comes up when I look at the specification for the SAM output of the bowtie program. There are flag values that indicate whether the aligned read is one of a pair, and/or the first in a pair, and/or the second in a pair, and so on.

          But how does it, meaning bowtie, know? Am I right in thinking that this is specified in the input record? I can see the .fastq input format has a placeholder for this, so I'm guessing that other input formats, e.g. sra, also have it? And if they don't, then there's no way for bowtie (or other algorithms) to derive it?

          Seeing as this is all equipment-specific, I'm going to need to look at some videos that describe the front end of this entire operation. Any ideas where? I was hoping to just pick it up and work with it from the point of view of sequences of characters, an IT perspective, but I guess not.
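
For readers coming at this from the IT side, the flag values mentioned above are a bit field in the second column of each SAM record. Here is a minimal sketch of decoding them, using the bit meanings from the SAM specification. The names `SAM_FLAG_BITS` and `decode_flag` are made up for this illustration and are not part of bowtie or samtools.

```python
# Bit meanings of the SAM FLAG field, as listed in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]
```

For example, `decode_flag(65)` gives `['paired', 'first_in_pair']` and `decode_flag(129)` gives `['paired', 'second_in_pair']`.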


          • #35
            What is the difference between mate pairs, pair end and single end?

            Hi!!
            Can anyone clarify the difference between mate pair, pair end and single end reads?

            Thanks.


            • #36
              Originally posted by edilana.gomes
              Hi!!
              Can anyone clarify the difference between mate pair, pair end and single end reads?

              Thanks.
              Hi,

              Mate-pairs and paired-end reads have been covered. Single end reads are just that... a single read from one end of each sheared DNA fragment.

              Scott.


              • #37
                Hello,

                Suppose I have two reads from an exome in fastq. How to determine if they are pair-end or single-end or mate-pair ?


                • #38
                  Originally posted by raonyguimaraes
                  Hello,

                  Suppose I have two reads from an exome in fastq. How to determine if they are pair-end or single-end or mate-pair ?
                  I could be wrong, but I think you can get your answer by decoding the name of the FASTQ read, i.e. the string following the @ symbol.

                  -A


                  • #39
                    Originally posted by raonyguimaraes
                    Hello,

                    Suppose I have two reads from an exome in fastq. How to determine if they are pair-end or single-end or mate-pair ?
                    Paired-end or mate-pair reads need some means of indicating which two reads go together. Usually, this is the read name. In Illumina data at least, reads are normally named by their coordinates on the flowcell. So if you don't have two reads with the same coordinates, you've got single-end data.

                    Paired ends run towards each other, and are about 100-500 bp apart. Mate pairs run away from each other, and tend to be a few kb apart, but I believe they are sometimes contaminated with ordinary paired-end data.
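
The orientation and distance rules of thumb above can be sketched in code. This is only a rough illustration for a single pair that has already been mapped to the same reference. The function names and the 1000 bp cutoff are assumptions made for this example, and real libraries (and any preprocessing that reverse-complements mate-pair reads) will not always follow the pattern.

```python
def pair_orientation(pos1, is_reverse1, pos2, is_reverse2):
    """Classify a mapped pair as 'inward', 'outward', or 'same_strand'.

    pos1 and pos2 are leftmost mapping positions on the same reference;
    is_reverse1 and is_reverse2 say whether each read mapped to the
    reverse strand.
    """
    if is_reverse1 == is_reverse2:
        return "same_strand"
    # Work out the strand of whichever read sits further left.
    left_is_reverse = is_reverse1 if pos1 <= pos2 else is_reverse2
    # Left read forward and right read reverse: the reads point at each other.
    return "outward" if left_is_reverse else "inward"


def guess_library(pos1, is_reverse1, pos2, is_reverse2, max_paired_end_gap=1000):
    """Very rough guess at the library type (the cutoff is a hypothetical default)."""
    orientation = pair_orientation(pos1, is_reverse1, pos2, is_reverse2)
    gap = abs(pos2 - pos1)
    if orientation == "inward" and gap <= max_paired_end_gap:
        return "paired-end-like"
    if orientation == "outward" and gap > max_paired_end_gap:
        return "mate-pair-like"
    return "ambiguous"
```

A single pair proves little; in practice you would look at the distribution over many pairs before drawing a conclusion.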


                    • #40
                      I have some questions about aligning paired-end sequencing reads. I am using the BWA sampe function to align my paired-end reads, and it ran, but surprisingly almost all reads are being paired with reads on different chromosomes, resulting in a lot of "improper reads". I don't understand why BWA did that, and I wonder if it was because I used the command "bwa sampe -a 15000 -A" to force bwa not to run Smith-Waterman alignment for unmapped reads.

                      Also, if paired end reads share the same x and y coordinates, which are indicated by the first line of their fastq files, why doesn't bwa just pair them up by their coordinates? That seems like the most straightforward way to find the right pair to me.


                      • #41
                        Hi all, I am still struggling with using BWA to align my paired-end reads. I used the command:

                        bwa sampe -P -s hg19.fasta CATTCG_1.sai CATTCG_3.sai CATTCG_1.fastq CATTCG_3.fastq > CATTCG_PE.sam

                        and the first few lines of the program running look like this:

                        [bwa_sai2sam_pe_core] convert to sequence coordinate...
                        [infer_isize] fail to infer insert size: too few good pairs
                        [bwa_sai2sam_pe_core] time elapses: 10.96 sec
                        [bwa_sai2sam_pe_core] changing coordinates of 6 alignments.
                        [bwa_sai2sam_pe_core] align unmapped mate...
                        [bwa_sai2sam_pe_core] time elapses: 0.00 sec
                        [bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
                        [bwa_sai2sam_pe_core] print alignments... 1.99 sec
                        [bwa_sai2sam_pe_core] 262144 sequences have been processed.
                        [bwa_sai2sam_pe_core] convert to sequence coordinate...
                        [infer_isize] (25, 50, 75) percentile: (3520, 39961, 70863)
                        [infer_isize] low and high boundaries: 94 and 205549 for estimating avg and std
                        [infer_isize] inferred external isize from 27 pairs: 37726.370 +/- 35311.353
                        [infer_isize] skewness: 0.341; kurtosis: -1.395; ap_prior: 1.00e-05
                        [infer_isize] inferred maximum insert size: 251007 (6.04 sigma)
                        [bwa_sai2sam_pe_core] time elapses: 10.87 sec
                        [bwa_sai2sam_pe_core] changing coordinates of 178 alignments.
                        [bwa_sai2sam_pe_core] align unmapped mate...
                        [bwa_sai2sam_pe_core] time elapses: 0.00 sec
                        [bwa_sai2sam_pe_core] refine gapped alignments... 0.82 sec
                        [bwa_sai2sam_pe_core] print alignments... 1.97 sec
                        [bwa_sai2sam_pe_core] 524288 sequences have been processed.
                        The program seems to be running fine and quite quickly, but when I look at the output file, I see something like this:

                        DJB775P1_0215:5:1101:1262:2347#0 65 chr13 92966174 37 94M chr5 33346129 0 AATAACCACCTAGATAAATGTTCACTCATCTCGCCTGTCTAGCCTGTCTTGAGGCCGGTTTCATCATGAGTCACTCCACCAATTACTTCAAAAC cgggeghhhhfhhfffgffegbcfgffdhfffhhhdb^efgfddfhhhhffS\eefgg\W`c]RZ^__GU]UMMZ_Z]\_a^_TR_bbbbYYYW XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
                        DJB775P1_0215:5:1101:1495:2155#0/3 129 chr5 33346129 37 94M chr13 92966174 0 AATTAACTTCCTTTTTTTGTCTTCATATAACACTGTTGACCTACTCATATTGAGCCCTCAGTCTTTTTTGTACACATGCTCATCCCTGGCATGT ceggggiiiiiiiiiiiiihiiiiihiiihihifgffhiig`fgfghhfhghffhhihigfgfgeeceacabbcccb`b_bcccccc_X[`b^Y XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
                        DJB775P1_0215:5:1101:1465:2351#0 81 chr14 44601925 37 94M chr5 33346129 0 TCTCCTACCTCCTCTCCCTTATAGAAATCCCTGTGATTCCATTAGTCTCACCTGGATAAACCAGAGTATTCTTATTATCCCAAGATCCCCATCT XTR^YGG\^ZZZRbb]]ccc_db]d^dgfcb`bZZe_bSfc`fgf_fc^Iffhhhhfhfagddhfhhhhfffgeefddfedbbd_agfcgebe\ XT:A:U NM:i:1 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:38A55
                        DJB775P1_0215:5:1101:1483:2169#0/3 161 chr5 33346129 37 94M chr14 44601925 0 AATTAACTTCCTTTTTTTGTCTTCATATAACACTGTTGACCTACTCATATTGAGCCCTCAGTCTTTTTTGTACACATGCTCATCCCTGGCATGT ceYbgae_egihffhhd_^efhihfhhfXcghfcgacgfI^[cba\eecgZ_HWWHLaZ`VVVb`gacccZb_RZ]`]bbcbbbb^bc^`X][S XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
                        DJB775P1_0215:5:1101:1317:2430#0 81 chr9 8239023 37 94M chr12 51288201 0 TTAAGTATTAAATGACATAAAACCTATAAAGCACATAGCAGGTAAATGTGGTAAACTCTTGATAAATGTTATTGTTATCATCATCATCATCACT b`]a^VHRcaggggbgeghgc\bhgefbZ\MW[gfgce^[gfff^^aa^OXeaYIae[hhhhgf^[d[hgfge_hhhgd_hY^bfdbcegecZc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:94
                        DJB775P1_0215:5:1101:1461:2198#0/3 161 chr12 51288201 37 94M chr9 8239023 0 CTGTTGGCTGGAATGTAAAATGGTGCAGCTGCTGTGGAAAACTGCATGGCAGTTCCTAGAAAAATTAAAAATAGAATTACCATATGATCCAGCA egggggefdf`egg[bdgh`]gh^dbe`dfhhbffbgIIX^^e_fgffhabgH\\_\HM\d]dUGV\\VV_ZVVHHUZ__bbc]`BBBBBBBBB XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0
                        It is really peculiar that most if not all read pairs have been mapped to different chromosomes. How is this possible? When I use samtools to filter out the correctly paired reads, I only obtained a very small file.

                        Can someone tell me why my reads are paired so weirdly? Any help is greatly appreciated!
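
One way to put a number on the symptom described above is to compare the RNAME and RNEXT columns (the 3rd and 7th tab-separated fields) of each SAM record and to check the proper-pair bit (0x2) of the FLAG. The sketch below assumes tab-separated SAM text as produced by the aligner; the function names are invented for this example.

```python
def mate_on_other_reference(sam_line):
    """True if a SAM record's mate is mapped to a different reference sequence."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, rnext = fields[2], fields[6]
    # "=" means the mate is on the same reference; "*" means no mate information.
    return rnext not in ("=", "*") and rnext != rname


def summarize_pairs(sam_lines):
    """Count paired records, properly paired records, and cross-reference mates."""
    counts = {"paired": 0, "proper_pair": 0, "mate_on_other_reference": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if not flag & 0x1:
            continue  # not part of a pair
        counts["paired"] += 1
        if flag & 0x2:
            counts["proper_pair"] += 1
        if mate_on_other_reference(line):
            counts["mate_on_other_reference"] += 1
    return counts
```

Calling `summarize_pairs(open("CATTCG_PE.sam"))` on the file from this post would give the three counts. Note that none of the flags shown above (65, 129, 81, 161) has the 0x2 bit set.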


                        • #42
                          If the reads have wildly different names, they aren't supposed to be paired with each other.

                          bwa assumes that the first read of the first fq goes with the first read of the second fq, and so on. That doesn't appear to be the case here, that's why your "pairs" are all over the place.
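
A quick way to check the ordering assumption described above is to walk both FASTQ files in step and compare read names. This is a minimal sketch: it assumes plain four-line FASTQ records and names that differ only by a trailing mate suffix such as `/1` or `/3`. The helper names are made up for this example.

```python
import re
from itertools import islice, zip_longest

# Matches a trailing mate suffix such as "/1", "/2" or "/3".
_MATE_SUFFIX = re.compile(r"/\d+$")


def fragment_name(header_line):
    """Return the read name from a FASTQ header, without '@' or mate suffix."""
    fields = header_line.split()
    name = fields[0] if fields else ""
    if name.startswith("@"):
        name = name[1:]
    return _MATE_SUFFIX.sub("", name)


def headers(lines):
    """Yield every fourth line, i.e. the header line of each FASTQ record."""
    return islice(lines, 0, None, 4)


def first_mismatch(lines1, lines2):
    """Return the 1-based index of the first record whose names differ.

    Returns None if both inputs list the same fragments in the same order.
    A record missing from one input also counts as a mismatch.
    """
    pairs = zip_longest(headers(lines1), headers(lines2))
    for index, (header1, header2) in enumerate(pairs, start=1):
        if header1 is None or header2 is None:
            return index
        if fragment_name(header1) != fragment_name(header2):
            return index
    return None
```

The inputs can be open file objects, e.g. `first_mismatch(open("CATTCG_1.fastq"), open("CATTCG_3.fastq"))`. A result of `None` means the two files are in step.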


                          • #43
                            Thank you so much! Now that I have made sure both input files list each fragment's reads in the same order, everything is working.


                            • #44
                              Hello to the SEQanswers community!

                              I came looking for the answer to a simple question on paired-end reads and found much more (useful) info in this thread.

                              Thanks to all the contributors


                              • #45
                                Hey,

                                I just want to ask about paired-end data filtering. Do I need to filter read 1 and read 2 separately, or combine read 1 and read 2 and then filter? Later on I want to use the filtered data for RNA-seq analysis with TopHat and Cufflinks, but TopHat requires read 1 and read 2 as separate inputs rather than as one paired-end file.
