  • Fastq file format for paired end sequences

    Hi,

    I have received my sequencing data from a sequencing core facility. It was generated with Illumina paired-end sequencing, but the read identifiers for the forward and reverse reads of one sequence do not match at all. In addition, the second part of the identifier (the read number within the pair) is always 1.
    The other problem is with the indexes: they are sometimes not the same within an individual file.

    e.g.
    Read 1:
    @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT

    @HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC

    @HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC




    Read 2:
    @HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC

    @HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC

    @HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC


    Could you please help me with identifying the format of my files?

    Thanks,
    Rozita

  • #2
    Contact your core facility to find out what they've done.

    Both of those files are denoted as read 1. See http://en.wikipedia.org/wiki/FASTQ_format for the header description.

    @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT
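To make the header description concrete, here is a minimal Python sketch that splits a header of this style into named fields. The field names (`IlluminaHeader`, `parse_header`, `read_number`, and so on) are my own labels for the fields described on the Wikipedia page, not names from any Illumina tool, and the sketch assumes the header has exactly the layout shown above.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    # Field names are illustrative labels, not an official API.
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read_number: int      # member of the pair: 1 or 2
    is_filtered: str      # "Y" if the read was filtered out, "N" otherwise
    control_number: int
    index_sequence: str


def parse_header(line: str) -> IlluminaHeader:
    """Split one '@...' header line into its colon-separated fields."""
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header: {line!r}")
    # The part before the space locates the cluster; the part after describes the read.
    location, description = line[1:].strip().split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = location.split(":")
    read, filtered, control, index = description.split(":")
    return IlluminaHeader(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
        int(read), filtered, int(control), index,
    )


header = parse_header("@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT")
print(header.lane, header.read_number, header.index_sequence)  # prints: 1 1 GGCTAT
```

For the header above, the field after the space starts with 1, which is why the file is read 1.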

    I would also recommend they allow 0 errors in the index when demultiplexing. The error rate should be low enough that there is no need to include indexes with errors, unless there was a problem during the run (BMS during index read).
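As a rough illustration of what "0 errors" means here, the sketch below counts the mismatching positions between an observed index and the expected one. The function name `index_mismatches` is hypothetical, and treating GGCTAC as the expected index is an assumption based only on it being the more common index in the headers posted above.

```python
def index_mismatches(observed: str, expected: str) -> int:
    """Count positions where two equal-length index sequences differ."""
    if len(observed) != len(expected):
        raise ValueError("index sequences must have the same length")
    return sum(a != b for a, b in zip(observed, expected))


# Assumption: GGCTAC is the index that was intended for this sample.
expected_index = "GGCTAC"

for observed in ("GGCTAT", "GGCTAC"):
    mismatches = index_mismatches(observed, expected_index)
    # With 0 errors allowed, only exact matches would be assigned to the sample.
    verdict = "kept" if mismatches == 0 else "rejected"
    print(observed, mismatches, verdict)
```

Under that assumption, a read carrying GGCTAT has one mismatch, so it would only end up in the file if the demultiplexing allowed at least one error.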



    • #3
      Thanks, Tony. Yes, both files contain read 1, and they don't have the same identifiers. I just wanted to be sure that there isn't any other FASTQ format apart from the one you mentioned.



      • #4
        Something is up with that data. Your sequencing facility should be able to help you out.



        • #5
          Originally posted by rozitaa
          Hi,

          I have received my sequencing data from a sequencing core facility. It was generated with Illumina paired-end sequencing, but the read identifiers for the forward and reverse reads of one sequence do not match at all. In addition, the second part of the identifier (the read number within the pair) is always 1.
          The other problem is with the indexes: they are sometimes not the same within an individual file.

          e.g.
          Read 1:
          @HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT

          @HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC

          @HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC




          Read 2:
          @HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC

          @HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC

          @HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC


          Could you please help me with identifying the format of my files?

          Thanks,
          Rozita
          Those sets of reads come from two different lanes, lane 1 and lane 2, as indicated by the lane number: the fourth colon-separated field of each header (1 in the first set, 2 in the second).
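A quick way to see this for a whole file is to tally the lane and read number of every header. The sketch below does that for the six headers posted in this thread; the helper name `lane_and_read` is made up for illustration, and the code assumes every header follows the layout shown in the thread.

```python
from collections import Counter
from typing import Tuple


def lane_and_read(header: str) -> Tuple[int, int]:
    """Return (lane, read number) from a header of the style shown in this thread."""
    location, description = header[1:].split(" ", 1)
    lane = int(location.split(":")[3])            # fourth field before the space
    read_number = int(description.split(":")[0])  # first field after the space
    return lane, read_number


headers = [
    "@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:2172:1979 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:1:1101:2146:1994 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1657:1985 1:N:0:GGCTAC",
    "@HWI-ST1018:135:H0A9YADXX:2:1101:1612:1996 1:N:0:GGCTAC",
]

counts = Counter(lane_and_read(h) for h in headers)
for (lane, read_number), n in sorted(counts.items()):
    print(f"lane {lane}, read {read_number}: {n} headers")
```

For these six headers the tally shows two lanes but only read 1, with no read 2 anywhere.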



          • #6
            Yes, I see. Thanks. But they are in the same file, representing the 2 reads of one sequence. I should contact them and figure it out.



            • #7
              The R1 and R2 reads of a pair are usually in different files.

              How many files did you get from the sequence provider, and what were the files called?



              • #8
                Actually, I got one directory for each sample (e.g. "P424_101_index11"). Inside it there are two different directories ("130419_AH02WFADXX", "130423_AH0A9YADXX"), and according to the facility only one of them is the valid experiment (the one marked in red). In the inner directory I can find two FASTQ files ("1_130423_AH0A9YADXX_P424_101_index11_1.fastq" and "2_130423_AH0A9YADXX_P424_101_index11_1.fastq"). Some lines from each of these files were shown previously as examples.



                • #9
                  You need to contact the sequence provider and find out what they did.

                  If they ran a paired-end experiment, then you should have files with the R2 reads matching the R1 reads that you already have.

                  You appear to have two files for the same sample, run on lane 1 and lane 2, and from what you showed previously, both files are R1.

                  Running the samples in more than one lane would be expected if you wouldn't get enough reads from one lane of sequencing, or if you have several multiplexed samples and want to run each sample in the same lanes so as to avoid lane effects.

                  You need to find out whether the sequencing center performed a single-end or paired-end run with your samples, and if they did do a paired-end run, what they have done with the R2 files.
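If it helps when talking to the provider, the check described above (R2 reads matching the R1 reads) can be sketched in a few lines of Python. This is only a sketch under assumptions: it takes every fourth line as a header (4-line FASTQ records), expects headers in the style shown in this thread, and the function names are made up. The sequence and quality lines in the demo are placeholders, not real data.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield every fourth line, assuming 4-line FASTQ records."""
    return itertools.islice(lines, 0, None, 4)


def split_header(header: str) -> Tuple[str, int]:
    """Return (cluster location, read number) from one header line."""
    location, description = header.strip()[1:].split(" ", 1)
    return location, int(description.split(":")[0])


def check_pairing(r1_lines: Iterable[str], r2_lines: Iterable[str]) -> List[str]:
    """List the problems found when treating the two inputs as R1 and R2.

    Records are compared in order; zip stops at the shorter input, so a
    difference in record count is not reported by this sketch.
    """
    problems = []
    pairs = zip(headers(r1_lines), headers(r2_lines))
    for n, (h1, h2) in enumerate(pairs, start=1):
        loc1, read1 = split_header(h1)
        loc2, read2 = split_header(h2)
        if loc1 != loc2:
            problems.append(f"record {n}: identifiers differ ({loc1} vs {loc2})")
        if (read1, read2) != (1, 2):
            problems.append(
                f"record {n}: read numbers are {read1} and {read2}, expected 1 and 2"
            )
    return problems


# Demo with the first header of each file from this thread.
# "ACGT" and "IIII" are placeholder sequence and quality lines.
r1 = ["@HWI-ST1018:135:H0A9YADXX:1:1101:1124:1996 1:N:0:GGCTAT", "ACGT", "+", "IIII"]
r2 = ["@HWI-ST1018:135:H0A9YADXX:2:1101:1400:1999 1:N:0:GGCTAC", "ACGT", "+", "IIII"]
for problem in check_pairing(r1, r2):
    print(problem)
```

For real files you would pass two open file handles instead of the lists. With the headers from this thread, both checks fail for the first record, which is consistent with the two files being R1 from two different lanes rather than an R1/R2 pair.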



                  • #10
                    Yeah, Thanks all.
