  • Completely new to this and out of my depth

    Hi, I am a complete newbie to the entire realm of sequence analysis. My background is in Computer Engineering, so I don't understand a lot of the details on how primers, etc. work.

    What I'm trying to do is, I think, pretty straightforward. We've received data from Roche, including the sff files which contain the raw reads. Previously, I have worked with other data, starting from Fasta files which contain about 10-100 aligned sequences. That is the type of file I would like to produce.

    Currently, I am attempting to do this with the AVA software. I create a project, define the appropriate amplicons and MID sequences, and hook up the samples and data with multiplexers. I _think_ that I am doing all of this correctly; the manual was very helpful.

    To get the fasta files, I am using the doAmplicon 'report align' command. When I report the consensus files, I get about 1,000 sequences. Individually, I have about 10,000 sequences.

    I don't mind having a lot of sequences, but the similarity in them is very high. Infections that are known to have existed for a while are exhibiting low diversity, characteristic of much more recent infections.
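One way to put a number on "low diversity" is the mean pairwise distance between aligned sequences. The following is a minimal sketch, assuming the sequences have already been aligned to equal length with `-` as the gap character; the function names are illustrative and are not part of AVA.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences.

    Columns where either sequence has a gap are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def mean_pairwise_diversity(seqs):
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(mean_pairwise_diversity(["ACGT", "ACGA", "ACGT"]))
```

The number of pairs grows quadratically, so for thousands of reads it is probably more practical to subsample or to collapse identical sequences first.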

    I am not sure if my workflow is appropriate for what I am trying to do - perhaps I would be better served by the gsMapper or gsAssembler programs?

    Any advice would be very much appreciated.
    Thank you,
    -Fwip

    P.S: There is probably relevant important information that I am not including - please ask for clarification and I will do my best to answer. Thank you!

  • #2
    Hi Fwip,
    Could you give us some details of the lab side of the experiment? Did you create the amplicons, or did Roche? How much starting material was used in the PCR to create the amplicons, and how did you assay the amount of starting material? Also, what was the starting material (e.g., RNA, ssDNA, dsDNA)?
    --
    Phillip


    • #3
      Oh goodness... I have very little knowledge of what went on lab-side. I am reasonably sure that Roche created the amplicons. I have no idea on the other questions. If it helps, this is data on HIV infection.

      Sorry I can't be more helpful here.


      • #4
        Well then suffice it to say that if the amount of RNA given to Roche was limiting, the number of viruses assayed may not have been sufficient to detect any but the most common variants.

        Amplicons employ PCR. PCR, as long as it gets a single, amplifiable, chunk of DNA to start with, can usually produce lots of DNA given enough cycles of amplification.
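The sampling argument above can be made concrete with a small sketch. It assumes each input template is an independent draw from the viral population, which is an idealization; the function name and the example frequency are illustrative and do not come from the thread.

```python
def detection_probability(variant_freq, n_templates):
    """Chance that at least one of n_templates sampled genomes carries a variant.

    Assumes independent sampling: P = 1 - (1 - f)^N.
    """
    if not 0.0 <= variant_freq <= 1.0:
        raise ValueError("variant_freq must be between 0 and 1")
    if n_templates < 0:
        raise ValueError("n_templates must be non-negative")
    return 1.0 - (1.0 - variant_freq) ** n_templates


if __name__ == "__main__":
    # A variant present in 1% of the population, for increasing template counts.
    for n in (10, 100, 1000):
        print(n, round(detection_probability(0.01, n), 3))
```

PCR can amplify whatever templates are present, but it cannot recover variants that were never sampled into the reaction, so a large read count does not by itself imply a large number of starting genomes.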

        That said, I find the AVA software to be quite inscrutable. So, it is also possible that AVA is hiding most of your variants for one reason or another. There are ways, however, to force it to reveal them all.

        --
        Phillip


        • #5
          Hmmm, thank you. I guess it is certainly possible that the samples were not terribly informative and represent only a small portion of the genetic diversity.

          The results seem similar for all of the samples that I have analyzed so far, so I am still hoping it is user error on my part.

          I'll keep toying around with this software, then.

          Thank you,
          Fwip


          • #6
            I hasten to add that there are some default settings that would tend not to display rarer variants. But those become obvious as you tinker around with the AVA GUI.

            --
            Phillip


            • #7
              The more I try to work this, the more I think that my workflow may be incorrect. I haven't had any training with this software, so I am not even sure what programs I should be using, or in what order.

              Here is where I am, and my goals:

              The data we received included read data from eight gaskets, split up into 8 SFF files. There was also an FNA and QUAL file for each gasket, which I believe was derived from the respective SFF file. Each gasket had 4 MID tags. All samples had an identical forward 5' primer and one of two different reverse 5' primers, but are still uniquely identified by their gasket/MID combo.

              I would like to get a FASTA file for each gasket/MID combination, representing the reconstructed sequences. If a variant is very common, it should show up in the FASTA file multiple times. Alternatively, if I could get a number representing the frequency at which it appears, that would work.

              What should my workflow look like for this? Should I do this entirely in gsMapper, gsAssembler, or gsAmplicon? Should I take the output of one program and analyze it in another?

              Thank you very much for your help,
              Fwip
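The "frequency per variant" output described above could be sketched as follows. This is a minimal example, assuming a FASTA file of individual reads for one gasket/MID combination has already been exported; the header format `variantN count=... freq=...` is made up for illustration and is not an AVA convention.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def variant_counts(records):
    """Count identical sequences (case-insensitive), most common first."""
    return Counter(seq.upper() for _, seq in records).most_common()


def write_collapsed(counts, out):
    """Write one FASTA record per distinct sequence with its count and frequency."""
    total = sum(n for _, n in counts)
    for i, (seq, n) in enumerate(counts, 1):
        out.write(f">variant{i} count={n} freq={n / total:.4f}\n{seq}\n")


if __name__ == "__main__":
    import sys

    demo = [">r1", "ACGT", ">r2", "acgt", ">r3", "ACGA"]
    write_collapsed(variant_counts(read_fasta(demo)), sys.stdout)
```

Sequences are compared as exact strings, so two reads that differ only by a sequencing error or a gap character are counted as separate variants.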


              • #8
                gsAmplicon should be able to handle the .sff files as they are, although you may need to specify the info for the MIDs.

                gsMapper -- there you would probably want to use sfffile to break the .sff into separate MID sffs. Given your MID structure this would not be trivial, although it should be do-able.

                Seems like your best bet is gsAmplicon. Are you using the GUI?

                --
                Phillip
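The idea of splitting reads by MID can be illustrated without the Roche tools. The sketch below works on plain (read ID, sequence) pairs, for example from the FNA files, and assumes the MID tag is the first bases of each read. The tag names and sequences are placeholders, not the real Roche MID sequences, and this is not how sfffile itself works.

```python
def assign_mid(read, mids, max_mismatches=0):
    """Return (mid_name, read_without_tag), or (None, read) if no tag matches.

    mids maps a tag name to its sequence; the best match within
    max_mismatches at the start of the read wins.
    """
    read_upper = read.upper()
    best = None  # (name, mismatches, tag_length)
    for name, tag in mids.items():
        prefix = read_upper[: len(tag)]
        if len(prefix) < len(tag):
            continue  # read is shorter than the tag
        mismatches = sum(a != b for a, b in zip(prefix, tag.upper()))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (name, mismatches, len(tag))
    if best is None:
        return None, read
    return best[0], read[best[2]:]


def demultiplex(reads, mids, max_mismatches=0):
    """Group (read_id, sequence) pairs by MID; unmatched reads go under None."""
    bins = {name: [] for name in mids}
    bins[None] = []
    for read_id, seq in reads:
        name, trimmed = assign_mid(seq, mids, max_mismatches)
        bins[name].append((read_id, trimmed))
    return bins


if __name__ == "__main__":
    # Placeholder tags for illustration only.
    example_mids = {"tagA": "ACGTAC", "tagB": "TGCATG"}
    reads = [("r1", "ACGTACGGGG"), ("r2", "TGCATGAAAA"), ("r3", "CCCC")]
    for name, members in demultiplex(reads, example_mids).items():
        print(name, members)
```

With eight gaskets and four MIDs each, the same function would be run once per gasket file so that each gasket/MID combination ends up in its own bin.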


                • #9
                  Thanks for the help.

                  I am currently primarily using the GUI, just because it gives me the most feedback on what I am doing. I'm using the command-line tool to extract the data once the computation is complete though (using 'report align'), because I could not figure out how to do that with the GUI.

                  -Fwip


                  • #10
                    Originally posted by Fwip
                    ...

                    I would like to get a FASTA file for each gasket/MID combination, representing the reconstructed sequences. If a variant is very common, it should show up in the FASTA file multiple times. Alternatively, if I could get a number representing the frequency at which it appears, that would work.
                    ...
                    What should my workflow look like for this? Should I do this entirely in gsMapper, gsAssembler, or gsAmplicon? Should I take the output of one program and analyze it in another?
                    When working with amplicons, you should use gsAmplicon and doAmplicon. Both are complex programs that take a while to master, so it is hard to troubleshoot the problems you may be having. However, having 1000s of consensus sequences for a given sample and reference sounds high.

                    Phillip's and my most recent project -- admittedly a simple one with 1 amplicon, 4 samples/MIDs, and 1 reference -- generates between 15 and 23 consensus sequences per sample. This is starting from 65,000 to 100,000+ reads per sample. We also found only 4 variants across all of the samples. Such low variation may explain our low number of consensus sequences.

                    Perhaps, to aid troubleshooting, you could run a 'list' within doAmplicon for your 'amplicon', 'mid', 'sample' and 'variant' entries and tell us, if not the exact entries, at least the counts of what you have. This could help us figure out how complex your project is.


                    • #11
                      Originally posted by westerman

                      Phillip's and my most recent project -- admittedly a simple one with 1 amplicon, 4 samples/MIDs, and 1 reference -- generates between 15 and 23 consensus sequences per sample. This is starting from 65,000 to 100,000+ reads per sample. We also found only 4 variants across all of the samples. Such low variation may explain our low number of consensus sequences.
                      Ours were from a host gene. Fwip's are from a virus. Completely different set of expectations. The high error rate of HIV replication is thought to aid it in evading the host immune system. Fwip actually expected to see more variants than he did.

                      --
                      Phillip


                      • #12
                        Originally posted by pmiguel
                        Ours were from a host gene. Fwip's are from a virus. Completely different set of expectations. The high error rate of HIV replication is thought to aid it in evading the host immune system. Fwip actually expected to see more variants than he did.

                        --
                        Phillip
                        No doubt Fwip's experiment should be expected to produce different results from ours. Did Fwip actually mention the number of variants he saw? I did not see it in the posts. Perhaps he sent you private mail?

                        Anyway, by having him run the various 'list' commands (via doAmplicon), I can get a feel for whether he is setting up and running the Amplicon software correctly. As you probably recall, getting the MID part set up properly was tricky for our experiment.

                        Fwip: If you want to send private email to Phillip and/or myself then please do so. My address is just my user name at purdue.edu


                        • #13
                          Thank you both for the help. It turns out that a large part of the problem, at least, was that my reference sequence was set up incorrectly. I had not realized that the reference sequence included the primers, so I was not using the correct reverse primer for the data I was looking at. On top of this, I wasn't narrowing the amplicon range to include only the area between the primers, which could not have helped things.

                          I've rerun one gasket with these corrections, and already the data looks much better. Thank you!
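The fix described above, restricting the amplicon range to the region between the primers, can be checked with a short sketch. It assumes the reference contains both primer sites, that the reverse primer is written 5' to 3' as synthesized (so its reverse complement appears in the reference), and that coordinates are 1-based and inclusive. All three are assumptions to verify against the AVA manual; the function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def target_range(reference, forward_primer, reverse_primer):
    """Return the 1-based inclusive (start, end) of the region between primers.

    Raises ValueError if either primer site cannot be found, which is a
    quick way to catch a reference paired with the wrong reverse primer.
    """
    ref = reference.upper()
    fwd = forward_primer.upper()
    rev_site = reverse_complement(reverse_primer.upper())

    fwd_pos = ref.find(fwd)
    if fwd_pos < 0:
        raise ValueError("forward primer not found in reference")
    start0 = fwd_pos + len(fwd)  # 0-based index of the first target base

    rev_pos = ref.find(rev_site, start0)
    if rev_pos < 0:
        raise ValueError("reverse primer site not found downstream of forward primer")
    if rev_pos == start0:
        raise ValueError("no target sequence between the primers")
    return start0 + 1, rev_pos


if __name__ == "__main__":
    # Toy reference: forward primer + target + reverse primer site.
    ref = "AAAC" + "GGGGTTTT" + "CCAT"
    start, end = target_range(ref, "AAAC", "ATGG")
    print(start, end, ref[start - 1:end])
```

Running this once per reference and reverse primer pair would flag a mismatched primer before any reads are aligned.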


                          • #14
                            Hello, I'm not sure how relevant this is to you now, considering the date, and I may not have accurately gauged the question you're asking, but you could run the relevant sections of your sequences through the online Stanford DB for drug-resistance mutations in HIV as a validation of the results you're getting.


                            • #15
                              Hello
                              I am also using the AVA variant analyzer. The software produces the consensus sequence, but I would like to change the parameters used to produce the consensus alignment. Does anyone know how to change the consensus-generating parameters of the AVA software, or know of any script that can generate a consensus alignment from AVA's multiple aligned sequences?
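A column-wise majority consensus with an adjustable threshold is one simple option for a script of this kind. The sketch below assumes the aligned sequences have been exported as equal-length strings with `-` for gaps. It is not how AVA computes its own consensus, and the parameter names are illustrative.

```python
from collections import Counter


def consensus(aligned, min_fraction=0.5, ambiguous="N", ignore_gaps=False):
    """Build a consensus from equal-length aligned sequences.

    A symbol is called at a column only if its share of the counted symbols
    is strictly greater than min_fraction; otherwise `ambiguous` is emitted.
    With ignore_gaps=True, gap characters do not take part in the vote.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")

    out = []
    for column in zip(*(s.upper() for s in aligned)):
        symbols = [c for c in column if not (ignore_gaps and c == "-")]
        if not symbols:
            out.append("-")  # column made up entirely of gaps
            continue
        base, count = Counter(symbols).most_common(1)[0]
        out.append(base if count / len(symbols) > min_fraction else ambiguous)
    return "".join(out)


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGA"]
    print(consensus(reads))
    print(consensus(reads, min_fraction=0.9))
```

Raising `min_fraction` makes the consensus more conservative, at the cost of more ambiguous positions.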
