  • ila14
    Junior Member
    • Jan 2014
    • 3

    RNA-Seq for transposon insertion analysis

    Hello all,
    I am new to bioinformatics and CLC Genomics, so I would be grateful for any help, as I'm feeling slightly lost.

    I have been given Illumina data (FASTQ files) of sequences from a bacterial genome, initially to perform single-end RNA-Seq analysis and differential expression analysis.
    My main focus is to find:
    The number of mutants.
    The number of transposon insertions in total.
    The number of transposon insertions per gene.

    I have three conditions: starting, first output and second output, i.e. 3 groups.
    I have replicate sequencing data for the first output and second output, but only one data set for the starting library.

    Using CLC Genomics, I have been able to trim my sequences, map them to an annotated genome and perform RNA-Seq analysis.
    (An example to simplify my explanation: my experiment contains 1 starting library, 3 first output replicates and 3 second output replicates.)
    To start my comparison, I made a box plot of the three groups. This showed that the individual samples all have similar distributions, especially within their own group, although the locations of the distributions differed. Because of this, I normalized my samples by quantile normalization, which did make the samples comparable: a box plot now shows every sample with the same distribution. Was this the right thing to do? Should I have chosen a different normalization method?
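    For readers unfamiliar with the step described above, quantile normalization can be sketched in a few lines. This is only an illustration of the general technique, not CLC's implementation; the function name `quantile_normalize` and the toy values are made up for the example, and ties are broken by input order rather than averaged.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by input order (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of genes")

    # Mean of the k-th smallest value across all samples, for each rank k.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from smallest to largest value in this sample.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical expression values: 3 samples, 3 genes each.
toy = [[5, 2, 3], [4, 1, 6], [3, 8, 7]]
normalized = quantile_normalize(toy)
```

    After this step every sample contains exactly the same set of values, which is why the box plots become identical; only the assignment of values to genes differs between samples.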

    Next, I performed a statistical test on proportions, using the starting library as the reference. I ran an unpaired Baggerly's test on the three groups and chose RPKM as the expression value for the test.
    (see manual here http://www.clcsupport.com/clcgenomic...oportions.html )

    I am unsure about the following:
    What is the easiest way to identify insertions into the genome from the data, without individually checking the reads mapped to each gene?
    What is the correct way to normalize my data?
    Statistically what is the best test to use?
    What values are most important for my test?
    Should I use the fold-change, total gene reads or unique gene reads instead of the RPKM for the expression value to be used in the test?

    Thanks in advance
  • TiborNagy
    Senior Member
    • Mar 2010
    • 329

    #2
    CLC has a built-in variant caller algorithm. You can use it to find indels (http://www.clcbio.com/files/tutorials/Resequencing.pdf).
    You should use RPKM instead of the other metrics you mentioned.

    The answers to your other questions depend on the experimental conditions.
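    For reference, RPKM (reads per kilobase per million mapped reads), the metric recommended above, scales a gene's read count by gene length and by sequencing depth. A minimal sketch follows; the function name and numbers are illustrative, and which reads CLC counts (total or unique gene reads) is not specified here, so that choice is left to the caller.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 1,000 bp gene, 2 million mapped reads.
example_rpkm = rpkm(500, 1000, 2_000_000)
```

    Because the value is divided by gene length, two genes with the same number of reads get different RPKM values if their lengths differ.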


    • ila14
      Junior Member
      • Jan 2014
      • 3

      #3
      Thank you for your response.
      What conditions precisely do you mean?


      Originally posted by TiborNagy
      CLC has a built-in variant caller algorithm. You can use it to find indels (http://www.clcbio.com/files/tutorials/Resequencing.pdf).
      You should use RPKM instead of the other metrics you mentioned.

      The answers to your other questions depend on the experimental conditions.


      • HESmith
        Senior Member
        • Oct 2009
        • 512

        #4
        We developed a simple strategy, split-end alignment, for mapping transposon insertions (described here).
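        Whatever mapping strategy is used, once insertion coordinates are known, the per-gene and total insertion counts asked about in the first post can be tallied directly. The sketch below is not the split-end alignment method itself; it assumes hypothetical inputs (a list of non-overlapping gene intervals with 1-based inclusive coordinates, and a list of insertion positions) and assumes that reads at the same coordinate represent a single insertion.

```python
from bisect import bisect_right


def insertions_per_gene(genes, positions):
    """Count unique insertion sites falling inside each gene.

    genes: list of (name, start, end), 1-based inclusive, non-overlapping.
    positions: insertion coordinates; duplicates count as one site.
    """
    genes_sorted = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in genes_sorted]
    counts = {name: 0 for name, _, _ in genes_sorted}

    for pos in set(positions):
        # Rightmost gene whose start is <= pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= genes_sorted[i][2]:
            counts[genes_sorted[i][0]] += 1
    return counts


# Hypothetical annotation and insertion sites.
genes = [("geneA", 100, 200), ("geneB", 300, 450)]
positions = [150, 150, 199, 300, 500, 50]
per_gene = insertions_per_gene(genes, positions)
total_sites = len(set(positions))
```

        Positions outside every gene (here 50 and 500) still count toward the total number of insertion sites but not toward any gene.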
