
  • FastQC: 2 peaks in per sequence GC content

    Dear all,

    I have some genomic paired-end data from a nematode. I ran FastQC to get an overview of the data.

    I was surprised to see a "Per sequence GC content" graph with 2 peaks (see attached image).
    I ran Trimmomatic, but the per sequence GC content graph remained the same.
    Do you know why I get this profile?

    Best,
    Sophie

  • #2
    It *may* be indicative of contamination from an unrelated species/source. Have you tried to analyze the data? Is this a simple WGS experiment?



    • #3
      What should the normal GC content be? 41%? Is there anything within the genome that could have the other GC content?

      I once also had 2 peaks in some samples.
      It was a low-GC bacterium (30%). The second peak (50%) turned out to come entirely from the rRNA operons of this bacterium. Our guess was that the GC bias of the adapter ligation somehow kicked in and ruined the dataset. The supplier doesn't know what happened.
      I'm not sure whether that could be the case here, because I don't know if you have biological differences within the DNA in your sample, but it is probably worth checking.
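
One rough way to check this outside FastQC is to tabulate GC content per read and look at where the two populations sit. The sketch below is only an illustration in plain Python, not how FastQC itself computes its plot; the function names and the 5% bin width are assumptions made for this example, and it expects plain 4-line FASTQ records.

```python
from collections import Counter


def gc_percent(seq):
    """GC percentage of one read, counting only called A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


def fastq_sequences(lines):
    """Yield the sequence (2nd line) of every 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def gc_histogram(lines, bin_width=5):
    """Count reads per GC bin; keys are the lower edge of each bin."""
    hist = Counter()
    for seq in fastq_sequences(lines):
        # Clamp 100% GC into the top bin instead of opening a new one.
        edge = min(int(gc_percent(seq) // bin_width) * bin_width,
                   100 - bin_width)
        hist[edge] += 1
    return hist


def print_histogram(hist, bin_width=5, width=50):
    """Print one text bar per GC bin, scaled to the total read count."""
    total = sum(hist.values())
    for edge in range(0, 100, bin_width):
        n = hist.get(edge, 0)
        bar = "#" * round(width * n / total) if total else ""
        print(f"{edge:3d}-{edge + bin_width:3d}% {n:8d} {bar}")
```

Any iterable of lines works as input, so an open file handle can be passed directly, e.g. `print_histogram(gc_histogram(open("reads_1.fastq")))` (the file name is hypothetical; for gzipped reads, `gzip.open(path, "rt")` gives an equivalent handle).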



      • #4
        Hello GenoMax and bastianwur,

        Thanks a lot for your answers.

        We don't know what the GC content is for this species. We think it is around 35-40%, as in other worms.

        After talking to the people in my lab, the second peak around 70% could very well be due to a bacterium present in the gut of the worm.

        Otherwise, the strain used is inbred, but I believe it still presents biological differences. I wouldn't say that would explain the 2nd peak, though.

        Do you think it is still possible to do a genome assembly on this data?

        Anyhow, thanks for your answers,
        Sophie



        • #5
          We normally assemble metagenomes and metatranscriptomes here, and haven't encountered many problems with the different species.
          One of my colleagues has a paper in submission in which they investigated that and found very few misassemblies.
          -> Assembling 2 totally different organisms from this dataset shouldn't be a problem.
          You might have to do some QA, though, to ensure that everything gets correctly assigned/separated.



          • #6
            Hi Sophie,

            We observed a similar bimodal distribution from C. elegans samples contaminated with Streptomyces (and the relative height of the high-GC peak varied with the degree of contamination). You could BLAST a sampling of the GC-rich reads and see if they match any known species.
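
A minimal sketch of that sampling step, assuming plain 4-line FASTQ input: keep only reads above a GC cutoff, take a random subset, and write it out as FASTA for pasting into BLAST. The 60% cutoff, the sample size, and all function names are assumptions chosen for illustration, not values from this thread.

```python
import random


def gc_fraction(seq):
    """Fraction of G/C among all bases of a read (0.0 for an empty read)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def fastq_records(lines):
    """Yield (name, sequence) from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it, "").strip()
        next(it, "")  # '+' separator line
        next(it, "")  # quality line
        yield header[1:].strip(), seq


def sample_gc_rich(lines, min_gc=0.60, n=100, seed=0):
    """Reservoir-sample up to n reads whose GC fraction is >= min_gc."""
    rng = random.Random(seed)
    sample, seen = [], 0
    for name, seq in fastq_records(lines):
        if gc_fraction(seq) < min_gc:
            continue
        seen += 1
        if len(sample) < n:
            sample.append((name, seq))
        else:
            # Replace an existing entry with probability n / seen.
            j = rng.randrange(seen)
            if j < n:
                sample[j] = (name, seq)
    return sample


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Reservoir sampling is used here so the whole file never has to sit in memory; the resulting FASTA text can be saved to a file and submitted to BLAST, or run locally against whatever database is at hand.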



            • #7
              If you know what that bacterium (present in the gut) is (and if a genome is available for that species or a close relative) you could try to separate your reads into two pools before trying assembly.

              You can do that easily with BBSplit.
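
As a hedged sketch of what such a run could look like, the helper below only assembles a BBSplit command line for paired-end reads; it does not run anything. The parameter names (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`) and the `%`/`#` placeholders follow the BBSplit usage as I remember it, so check them against `bbsplit.sh --help` for your BBTools version. All file names are hypothetical.

```python
import shlex


def bbsplit_command(reads1, reads2, references,
                    out_pattern="out_%_#.fq",
                    unmapped1="unmapped_1.fq",
                    unmapped2="unmapped_2.fq"):
    """Build an argument list for a paired-end BBSplit run.

    In out_pattern, '%' is meant to be replaced by the reference name and
    '#' by the read number (assumed BBTools convention, verify locally).
    """
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        "ref=" + ",".join(references),
        f"basename={out_pattern}",
        f"outu1={unmapped1}",
        f"outu2={unmapped2}",
    ]


if __name__ == "__main__":
    # Hypothetical inputs: only the gut bacterium has a reference here,
    # so the worm reads would be expected in the unmapped output files.
    cmd = bbsplit_command("worm_1.fq", "worm_2.fq", ["gut_bacterium.fa"])
    print(shlex.join(cmd))
```

Once the printed command looks right, it can be run from the shell or passed to `subprocess.run(cmd, check=True)`.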



              • #8
                Dear all,

                Sorry for the late reply.
                Thanks a lot for your answers! They were much appreciated.

                Unfortunately, I don't know the gut bacterium of this nematode. But I'll try what HESmith suggested, and if it has been sequenced, I'll do what GenoMax suggested.

                To GenoMax: thanks for telling me about BBSplit! I didn't know about that tool.

                To bastianwur: Your message made me very happy! It is very good to know that there shouldn't be problems assembling this peculiar data. Good luck with the publication!

                Cheers,
                Sophie
