  • pasta
    Member
    • Jan 2011
    • 27

    RNA-seq read coverage questions

    Hi there,

    I have a question about read coverage in RNA-seq. I am analyzing Illumina paired-end data that we obtained from bacterial mRNA, aligned with BWA and viewed in Artemis, and I noticed some "funny" coverage profiles. With all the experts on this forum, I am sure one of you will be able to help me.

    case #1

    In this picture we can see that gene B is expressed much more strongly than the others. We can also see that the signal obtained from gene B's mRNA seems to "decay" over the beginning of gene A and the inter-ORF region. How can this phenomenon be explained?



    case #2

    Notice the big bump upstream of gene B. It looks like either an unreliable annotation or a cryptic mRNA. What do you think?


    Is there any software or pipeline that takes case #1 into account and also discovers or corrects genome annotations?

    Thank you for your comments and answers,

    pasta
    Last edited by pasta; 02-09-2011, 08:27 AM.
  • JohnK
    Senior Member
    • Feb 2010
    • 106

    #2
    What gene model are you using?

    Comment

    • pasta
      Member
      • Jan 2011
      • 27

      #3
      John,
      We used YACOP which uses several ORF finders: Critica, Glimmer and Z-curve.

      Comment

      • JohnK
        Senior Member
        • Feb 2010
        • 106

        #4
        It could be a number of things worth investigating: PCR duplicate removal (depending on the number of PCR cycles you did); 5'/3' bias, which depends on the method used to create your cDNA library and can also arise during fragmentation of your isolated mRNA; an unannotated gene missing from your gene model (maybe try something like RefSeq, or make sure all the transcripts in your gene model are present); or a repetitive region upstream of your gene that caused read-mapping difficulties.
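
As a rough illustration of the PCR duplicate check mentioned above, here is a minimal Python sketch that reads SAM-format lines and reports the fraction of mapped reads sharing the same reference, leftmost position, strand, and mate position as an earlier read. The function name `duplicate_fraction` is hypothetical, and the key is a simplification: dedicated tools use the unclipped 5' position instead. In a very highly expressed bacterial gene, identical start positions can also occur by chance, so treat the number as a hint, not a verdict.

```python
from collections import Counter


def duplicate_fraction(sam_lines):
    """Fraction of mapped primary reads whose (ref, pos, strand, mate pos)
    key was already seen on an earlier read. Simplified duplicate heuristic."""
    seen = Counter()
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        if flag & 0x100 or flag & 0x800:  # secondary or supplementary alignment
            continue
        mapped += 1
        strand = "-" if flag & 0x10 else "+"
        # RNAME, POS, strand, PNEXT (mate position)
        key = (fields[2], int(fields[3]), strand, int(fields[7]))
        seen[key] += 1
    if mapped == 0:
        return 0.0
    duplicates = sum(count - 1 for count in seen.values())
    return duplicates / mapped
```

Usage would be along the lines of `duplicate_fraction(open("aln.sam"))`, where `aln.sam` is a placeholder file name.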

        Comment

        • Richard Finney
          Senior Member
          • Feb 2009
          • 701

          #5
          In many bacterial genomes, the genes are quite tightly packed on to the genome.
          Check out http://microbes.ucsc.edu/cgi-bin/hgTracks to see for yourself.

          It is possible that they are genes. You may have to get that sequence (area in question) and tune down the parameters to see if they match a domain using your favorite motif finding software. You might run a blast to see if there's homology to another organism.

          Another possibility is that they are regulatory elements.

          The region may not be unique. Check the bwa flags for the reads for more insight. I guess, in bacteria, two "snp" values might tell you there's a dupe.
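
One way to act on the suggestion to check the bwa flags is sketched below. It assumes the alignments were produced with `bwa aln` followed by `samse` or `sampe`, which write an `XT:A:R` tag for reads placed in repeats, and it also counts reads with a mapping quality of 0. The function name `repeat_fraction` and the region arguments are hypothetical; other aligners or BWA modes may not emit the `XT` tag at all.

```python
def repeat_fraction(sam_lines, ref, start, end):
    """Fraction of mapped reads starting in [start, end] (1-based, inclusive)
    on `ref` that look non-unique: MAPQ 0 or an XT:A:R tag."""
    total = 0
    ambiguous = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # read is unmapped
            continue
        pos = int(fields[3])
        if fields[2] != ref or not (start <= pos <= end):
            continue
        total += 1
        mapq = int(fields[4])
        tags = fields[11:]  # optional fields follow the 11 mandatory columns
        if mapq == 0 or "XT:A:R" in tags:
            ambiguous += 1
    return ambiguous / total if total else 0.0
```

A high fraction over the "bump" region would support the idea that the sequence occurs more than once in the genome.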

          Just some thoughts, I'm no bacteria expert.
          Last edited by Richard Finney; 02-09-2011, 09:15 AM.

          Comment

          • JohnK
            Senior Member
            • Feb 2010
            • 106

            #6
            You might also want to check for rRNA contamination. It's a possibility...

            Comment

            • pasta
              Member
              • Jan 2011
              • 27

              #7
              Thank you for these answers, that's very kind of you. I forgot to mention that all rRNA sequences were removed from our analysis.
              For case #2, I blasted the sequence: no homology found; however, I found one nice promoter sequence. FYI, genes A and B are a DNA integration protein and a transposase, respectively. That's very interesting!

              Do you have any explanation for the first case?

              Comment

              • pmiguel
                Senior Member
                • Aug 2008
                • 2328

                #8
                What was your method of cDNA synthesis/library construction? It could be an artifact of these processes.

                --
                Phillip

                Comment

                • pasta
                  Member
                  • Jan 2011
                  • 27

                  #9
                  Originally posted by pmiguel View Post
                  What was your method of cDNA synthesis/library construction? It could be an artifact of these processes.

                  --
                  Phillip
                  Total RNA was treated twice with MICROBExpress (Ambion) to remove most rRNA.
                  mRNA was fragmented to prepare cDNA with hexanucleotides as primers and RNase H was used on the other strand. Then, Illumina adapters were added before the PCR.
                  Someone told me that the behavior we see in case #1 is rather normal in prokaryotes: transcription does not stop exactly at the end of the ORF, so some mRNAs can be longer. What do you think?

                  Thanks

                  antoine

                  Comment

                  • pmiguel
                    Senior Member
                    • Aug 2008
                    • 2328

                    #10
                    Yes, I would buy that explanation.

                    Prokaryotic messages are said to be rapidly turned over. If this turnover is carried out by exonucleases, that would also lower coverage at the 5' and 3' ends in your sequencing results.

                    --
                    Phillip

                    Comment

                    • nasobema
                      Member
                      • Jul 2010
                      • 14

                      #11
                      @case 1:
                      I believe it might be due to methodological bias. Some methods preferentially enrich the 5' ends of mRNAs, while others do so for the 3' ends.

                      Your method is not strand-specific, so you cannot tell whether you are seeing a downstream extension of the gene B transcript or actually the gene A transcript. So your "prokaryotic" explanation is also possible, though I wouldn't expect such a long tail (just a feeling, however).

                      @case 2:
                      I'll vote for a repetitive region here. You say gene B is a transposase? Such genes move genomic elements within and between genomes, often integrating at similar sites and carrying additional DNA. While the transposase itself can be a repeat within the genome, I would also expect to find more repetitive sequence in the vicinity.

                      Comment

                      • pasta
                        Member
                        • Jan 2011
                        • 27

                        #12
                        Thank you very much for your explanations, I appreciate it. I am really starting to understand the biology behind the data, if that makes sense.

                        Thanks again!

                        Toni

                        Comment

                        • niazi84@hotmail.com
                          Member
                          • Jan 2010
                          • 25

                          #13
                          Originally posted by pasta View Post
                          John,
                          We used YACOP which uses several ORF finders: Critica, Glimmer and Z-curve.
                          I want to use Orpheus and Z-curve along with it, but I am unable to find them anywhere on the web. Did you use them? Can you tell me where I can download these two?

                          regards,
                          adnan
                          ~Adnan~

                          Comment

                          • Simon Anders
                            Senior Member
                            • Feb 2010
                            • 995

                            #14
                            Not being a biologist, and never having worked with prokaryotes, I apologize if this question is stupid, but: bacteria don't have UTRs? Not only translation but also transcription starts exactly at the start codon and stops at the stop codon? Otherwise, what is surprising about the transcript reaching beyond the gene boundary, if your gene model comes from an ORF finder?

                            I work a lot with yeast, and there many genes look like case #1. It seems that the promoter recruits the polymerase to a quite well-defined position where transcription starts, but where it stops (or more precisely, where the poly-A tail is placed) seems to be a region, or a collection of several possible places, giving the 3' end this "decaying" appearance. As for case #2: there are so many non-coding transcripts in eukaryotes (and in prokaryotes as well, maybe?) that I would be rather surprised if I did not find transcripts that don't overlap with an ORF.
                            Last edited by Simon Anders; 05-01-2011, 10:56 PM.

                            Comment

                            • sshell
                              Junior Member
                              • Jan 2009
                              • 6

                              #15
                              I know it's an old post, but in case people are still reading it, I wanted to add that bacteria certainly DO have UTRs, so it is normal and expected that transcription from two convergent genes will overlap. Bacterial terminators are also not always sharp; transcription can end over a range of positions downstream of the stop codon. Gene "B" in the example fits this pattern.
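
To put a number on how far coverage extends past an annotated stop codon, something like the following Python sketch could be used. It builds per-base depth from 0-based, half-open read intervals, then counts how many bases after the ORF end stay at or above a chosen fraction of the mean depth within the ORF. The names `coverage` and `tail_length`, the interval representation, and the default threshold of 10% are all assumptions made for illustration. It also assumes a forward-strand gene; for a reverse-strand gene the tail would lie on the other side.

```python
def coverage(intervals, length):
    """Per-base depth for 0-based, half-open (start, end) read intervals
    on a reference of the given length."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, length)
        if start < end:
            diff[start] += 1  # depth rises where the read begins
            diff[end] -= 1    # and falls just past where it ends
    depth = []
    running = 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def tail_length(depth, gene_start, gene_end, fraction=0.1):
    """Number of consecutive bases after gene_end (0-based, exclusive) whose
    depth stays at or above `fraction` times the mean depth inside the gene."""
    gene = depth[gene_start:gene_end]
    if not gene:
        return 0
    threshold = fraction * sum(gene) / len(gene)
    if threshold <= 0:
        return 0
    tail = 0
    for d in depth[gene_end:]:
        if d < threshold:
            break
        tail += 1
    return tail
```

A long tail under a strict threshold would be consistent with a transcript running well past the ORF, although with a non-strand-specific library it cannot distinguish this from a transcript coming from the opposite strand.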

                              Comment
