  • Alignment trouble with paired-end reads

    Hi,

    I aligned both the R1 and R2 FASTQ files to the hg19 genome (downloaded from the UCSC FTP site) with BWA. The raw data were generated on an Illumina platform with a paired-end protocol.
    Each FASTQ file is about 8 GB.

    Code:
    bwa aln -t 6 -f sample1_R1.sai ~/hg19 sample1_R1.fastq.gz
    bwa aln -t 6 -f sample1_R2.sai ~/hg19 sample1_R2.fastq.gz
    The first output file, sample1_R1.sai, is about 1.8 GB.
    Curiously, the second one, sample1_R2.sai, is only about 349 MB.

    Is this normal, or should I suspect a potential mismatch?

    Regards

  • #2
    What kind of QC have you done on your samples? It is possible that read 2 may have had some problem in terms of quality (N's, adapters etc).
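
    Alongside FastQC, a quick sanity check is to count the reads in both files and confirm that neither gzip archive is cut short. The following is only a minimal sketch: it assumes standard four-line FASTQ records, and the function name count_fastq_reads is made up for illustration.

```python
import gzip
import sys


def count_fastq_reads(path):
    """Return (n_reads, truncated) for a gzipped FASTQ file.

    Assumes four lines per record. `truncated` is True when the gzip
    stream ends before its end-of-stream marker.
    """
    n_lines = 0
    truncated = False
    try:
        with gzip.open(path, "rt") as handle:
            for _ in handle:
                n_lines += 1
    except EOFError:
        # Python's gzip module raises EOFError on an incomplete archive.
        truncated = True
    return n_lines // 4, truncated


if __name__ == "__main__":
    # Print one line per file given on the command line.
    for fastq in sys.argv[1:]:
        reads, cut_short = count_fastq_reads(fastq)
        print(f"{fastq}\t{reads} reads\ttruncated={cut_short}")
```

    Running it on sample1_R1.fastq.gz and sample1_R2.fastq.gz should report the same read count for both files if the pair is intact.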


    • #3
      The QC seems fine; it looks like the QC of the first file.

      Code:
       
      ##FastQC	0.10.0
      >>Basic Statistics	pass
      #Measure	Value	
      Filename	Sample1_R2.fastq.gz	
      File type	Conventional base calls	
      Encoding	Sanger / Illumina 1.9	
      Total Sequences	75530219	
      Filtered Sequences	0	
      Sequence length	101	
      %GC	49	
      >>END_MODULE
      >>Per base sequence quality	pass
      #Base	Mean	Median	Lower Quartile	Upper Quartile	10th Percentile	90th Percentile
      1	31.78916283825418	33.0	31.0	34.0	30.0	34.0
      2	31.724956325096848	33.0	31.0	34.0	30.0	34.0
      3	31.969437358575647	34.0	31.0	34.0	30.0	34.0
      4	35.51238230886104	37.0	35.0	37.0	33.0	37.0
      5	35.5300227714155	37.0	35.0	37.0	33.0	37.0
      6	35.55949994001739	37.0	35.0	37.0	33.0	37.0
      7	35.54004348908349	37.0	35.0	37.0	33.0	37.0
      8	35.53263192577265	37.0	35.0	37.0	33.0	37.0
      9	37.12799921843203	39.0	37.0	39.0	33.0	39.0
      10-14	37.355309519227	39.2	37.2	39.4	33.2	39.4
      15-19	38.359193818834285	40.0	38.0	41.0	33.6	41.0
      20-24	38.190496135063505	40.0	38.0	41.0	33.2	41.0
      25-29	37.97000282760997	40.0	38.0	41.0	33.0	41.0
      30-34	37.71478195766915	40.0	37.6	41.0	32.4	41.0
      35-39	37.26693037921683	39.8	36.6	41.0	31.2	41.0
      40-44	37.00259240609378	39.0	36.0	41.0	31.0	41.0
      45-49	36.89229021300732	39.0	35.8	41.0	30.8	41.0
      50-54	35.66621703559472	38.2	34.4	39.8	28.8	40.6
      55-59	35.89709581803278	38.2	34.8	40.0	28.8	41.0
      60-64	35.69028127139417	37.6	34.8	40.0	29.0	41.0
      65-69	35.02425800195284	36.4	34.0	39.6	28.6	41.0
      70-74	34.11854835479823	35.2	33.6	38.6	27.2	40.6
      75-79	33.187774863991855	35.0	33.0	36.8	26.4	39.2
      80-84	32.32925205208262	35.0	32.6	35.8	25.2	37.8
      85-89	31.605413348530078	35.0	32.0	35.0	24.0	36.4
      90-94	30.96580436500522	34.4	31.2	35.0	21.8	35.8
      95-99	30.222672790078896	34.0	31.0	35.0	14.8	35.0
      100-101	28.821457349408718	33.5	30.0	35.0	2.0	35.0
      >>END_MODULE
      >>Per sequence quality scores	pass
      #Quality	Count
      2	75112.0
      3	22642.0
      4	23926.0
      5	30440.0
      6	44937.0
      7	60532.0
      8	75153.0
      9	89726.0
      10	101984.0
      11	113823.0
      12	121694.0
      13	126118.0
      14	128681.0
      15	133180.0
      16	142141.0
      17	155564.0
      18	174203.0
      19	200186.0
      20	232822.0
      21	272656.0
      22	324788.0
      23	389655.0
      24	468098.0
      25	566989.0
      26	681375.0
      27	825302.0
      28	1017544.0
      29	1262710.0
      30	1582489.0
      31	2020455.0
      32	2642315.0
      33	3580784.0
      34	5080759.0
      35	7530436.0
      36	1.1418846E7
      37	1.555542E7
      38	1.4548388E7
      39	3692948.0
      40	15398.0
      >>END_MODULE
      >>Per base sequence content	pass
      #Base	G	A	T	C
      1	23.038904351212466	28.518652323245213	24.119821810587553	24.32262151495477
      2	25.47333939779624	27.658614256544045	24.96407837187099	21.903967973788724
      3	24.24828259937834	25.833871754312522	25.82532657346947	24.092519072839668
      4	24.50727118697187	25.79431974776043	25.68221527276635	24.016193792501355
      5	24.94238452211129	26.0537283695356	25.60309363935187	23.400793469001243
      6	24.787984960665277	25.68108299246541	24.706239947458638	24.824692099410676
      7	24.68817130035484	25.744942745694765	25.34181047779494	24.225075476155457
      8	23.748514734957425	25.803241504718287	25.793226741185087	24.6550170191392
      9	23.903029046222844	25.46559428486792	25.936504991689514	24.694871677219727
      10-14	24.023634679227467	25.821287466045582	25.854546127673444	24.300531727053507
      15-19	24.11725021765147	25.63776257409824	25.775543873281126	24.469443334969164
      20-24	24.28764274271679	25.578742347304782	25.628124726505398	24.505490183473025
      25-29	24.36098600820047	25.63218899305288	25.522305838000577	24.484519160746075
      30-34	24.432423042355964	25.59018823930711	25.45520663436979	24.522182083967135
      35-39	24.493599691872298	25.546950569380485	25.39981538129036	24.559634357456865
      40-44	24.504228505399965	25.529200635160954	25.408192365434157	24.558378494004927
      45-49	24.530562724701387	25.46801849540325	25.385716286441333	24.61570249345403
      50-54	24.602490489354164	25.463437493616777	25.362844075835962	24.5712279411931
      55-59	24.625528952569915	25.451035252284328	25.34563677029044	24.577799024855317
      60-64	24.57810881200606	25.415184762325087	25.370759574688762	24.63594685098009
      65-69	24.609696631494323	25.403954724581514	25.368003496628383	24.618345147295777
      70-74	24.67549612897643	25.423105794561046	25.33035915629237	24.571038920170153
      75-79	24.68021619108372	25.408665595709053	25.326773426999843	24.584344786207385
      80-84	24.70617108233374	25.435781446511253	25.315358526422276	24.542688944732735
      85-89	24.746523206364934	25.460701715727303	25.277099049022056	24.515676028885707
      90-94	24.777061489078818	25.448097579999253	25.248048363294	24.52679256762793
      95-99	24.825401624815772	25.480491656320208	25.205063167361818	24.489043551502203
      100-101	25.06509391569436	25.525392562816755	25.03385996559098	24.375653555897905
      >>END_MODULE
      >>Per base GC content	pass
      #Base	%GC
      1	47.361525866167234
      2	47.37730737158496
      3	48.34080167221801
      4	48.523464979473225
      5	48.34317799111253
      6	49.61267706007595
      7	48.9132467765103
      8	48.403531754096626
      9	48.59790072344257
      10-14	48.32416640628097
      15-19	48.58669355262064
      20-24	48.79313292618982
      25-29	48.84550516894654
      30-34	48.954605126323095
      35-39	49.05323404932916
      40-44	49.06260699940489
      45-49	49.14626521815541
      50-54	49.17371843054726
      55-59	49.203327977425225
      60-64	49.21405566298615
      65-69	49.2280417787901
      70-74	49.24653504914659
      75-79	49.2645609772911
      80-84	49.24886002706648
      85-89	49.26219923525064
      90-94	49.30385405670675
      95-99	49.314445176317975
      100-101	49.44074747159226
      >>END_MODULE
      >>Per sequence GC content	warn
      #GC Content	Count
      0	148.0
      1	140.5
      2	171.0
      3	240.0
      4	314.0
      5	391.5
      6	463.0
      7	571.0
      8	724.0
      9	867.5
      10	1072.5
      11	1480.0
      12	1997.5
      13	2641.5
      14	3742.0
      15	5517.5
      16	8444.5
      17	13783.0
      18	22903.0
      19	36961.0
      20	57176.5
      21	84764.5
      22	122109.0
      23	171345.5
      24	231820.0
      25	303734.5
      26	389014.0
      27	486112.0
      28	590732.5
      29	701439.5
      30	818243.0
      31	939007.0
      32	1064825.5
      33	1196112.0
      34	1328451.5
      35	1455560.0
      36	1575536.5
      37	1691530.5
      38	1799603.0
      39	1888094.0
      40	1954504.5
      41	2000125.5
      42	2032290.0
      43	2053817.5
      44	2062843.0
      45	2069797.0
      46	2081374.5
      47	2091211.0
      48	2093568.5
      49	2090674.0
      50	2092548.0
      51	2092246.5
      52	2081642.5
      53	2075864.0
      54	2076360.5
      55	2075926.0
      56	2069627.5
      57	2057972.0
      58	2039630.5
      59	2010183.5
      60	1965690.5
      61	1897710.5
      62	1809579.0
      63	1712422.5
      64	1598704.0
      65	1462091.5
      66	1316420.0
      67	1165964.0
      68	1005059.0
      69	833335.0
      70	662096.0
      71	508145.5
      72	378685.0
      73	277673.0
      74	202031.5
      75	146246.0
      76	106894.5
      77	78283.5
      78	57723.5
      79	42530.0
      80	30859.0
      81	22268.5
      82	15885.5
      83	11202.5
      84	7546.5
      85	4846.0
      86	3024.5
      87	1999.5
      88	1395.5
      89	972.0
      90	693.0
      91	554.0
      92	447.5
      93	333.5
      94	262.5
      95	215.0
      96	170.5
      97	123.0
      98	90.5
      99	67.0
      100	45.0
      >>END_MODULE
      >>Per base N content	pass
      #Base	N-Count
      1	3.971920166152305E-5
      2	0.008051082176790722
      3	0.0030027716456111427
      4	0.0056162951149393596
      5	0.0048179391615427464
      6	0.003542952788207856
      7	0.012376503237730582
      8	0.015505052355269881
      9	0.018805718013342448
      10-14	0.010111184769635052
      15-19	6.889957514885532E-4
      20-24	0.0011237886123433589
      25-29	7.080609682860843E-4
      30-34	2.9604044971721845E-4
      35-39	6.566908008038479E-4
      40-44	1.2180555176200404E-5
      45-49	7.056778161863928E-4
      50-54	3.0107154859434473E-4
      55-59	8.846790183409902E-4
      60-64	0.0054881344909115115
      65-69	0.006780597312977472
      70-74	0.0024824501038451905
      75-79	0.11694498065734457
      80-84	0.15226673710558153
      85-89	0.11905221670282726
      90-94	0.12773960049023558
      95-99	0.0874582397278631
      100-101	0.07236176556035141
      >>END_MODULE
      >>Sequence Length Distribution	pass
      #Length	Count
      101	7.5530219E7
      >>END_MODULE
      >>Sequence Duplication Levels	warn
      #Total Duplicate Percentage	39.93871449925262
      #Duplication Level	Relative count
      1	100.0
      2	32.0416780733846
      3	11.81321188289656
      4	5.018126311772562
      5	2.8404801606065884
      6	1.9387272591523357
      7	1.3323046547704969
      8	1.1697070754834376
      9	0.9341065014144331
      10++	8.82755531221224
      >>END_MODULE
      >>Overrepresented sequences	pass
      >>END_MODULE
      >>Kmer Content	warn
      #Sequence	Count	Obs/Exp Overall	Obs/Exp Max	Max Obs/Exp Position
      AAAAA	25336500	3.1579587	4.208278	1
      TTTTT	23759670	3.0652945	3.70244	10-14
      >>END_MODULE
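
      To compare the R1 and R2 reports without reading them line by line, the module statuses can be pulled out of the FastQC text. This is a rough sketch that assumes the layout shown above, where each module starts with ">>name<TAB>status" and ends with ">>END_MODULE". The function names are made up for illustration.

```python
def fastqc_module_status(text):
    """Map each FastQC module name to its status (pass/warn/fail)."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(">>") or line == ">>END_MODULE":
            continue
        # Header lines look like ">>Module name<TAB>status".
        name, _, result = line[2:].rpartition("\t")
        if name:
            status[name] = result
    return status


def flagged_modules(status):
    """Return the names of modules whose status is not 'pass'."""
    return [name for name, result in status.items() if result != "pass"]
```

      For the report above, flagged_modules would list the three modules marked warn: Per sequence GC content, Sequence Duplication Levels and Kmer Content.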


      • #4
        I'm sorry, I don't have any more information about the sequencing itself.
        It may have been done by a private company.


        • #5
          You appear to have a lot of N's in read 2, in the middle of the read. Perhaps that is the problem.
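
          One way to check this directly on the FASTQ file is to tabulate the fraction of N calls at each read position. This is a minimal sketch, assuming four-line FASTQ records. The function name and the max_reads cap are illustrative choices, not something from this thread.

```python
import gzip
from collections import Counter


def n_fraction_by_position(path, max_reads=1_000_000):
    """Return a list with the fraction of N bases at each read position.

    Only the first `max_reads` records are scanned to keep the run short.
    """
    n_counts = Counter()
    totals = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 1:  # the sequence is the 2nd line of each record
                continue
            if i // 4 >= max_reads:
                break
            for pos, base in enumerate(line.strip()):
                totals[pos] += 1
                if base in "Nn":
                    n_counts[pos] += 1
    return [n_counts[pos] / totals[pos] for pos in range(len(totals))]
```

          Running the same function on R1 and R2 and comparing the two lists position by position shows whether read 2 really carries more N's, and where.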
