  • mastercoder
    Member
    • Dec 2016
    • 11

    MSA on large scale of sequences?

    Hello guys,

    I have 15 miRNA seq, around 8MB each. What I am trying to do is apply multiple sequence alignment, get a consensus sequence, and then do the annotation. I've tried ClustalX and Clustal Omega, as well as others such as Kalign and MUSCLE, but I cannot get a result from any of them. Can anyone help me with this?
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    Originally posted by mastercoder View Post
    I have 15 miRNA seq, around 8MB each.
    Those are some huge microRNAs!

    Seriously, though, can you clarify a bit? Do you mean you have 15 files, 8MB each, gzip-compressed fastqs of single-ended 50bp miRNA reads, for example - and if not, what exactly do you have? And when you say you tried X, Y, and Z, what were your command lines, what did they print to the screen, and what was the output? Also, what's your experiment?


    • mastercoder
      Member
      • Dec 2016
      • 11

      #3
      Originally posted by Brian Bushnell View Post
      Those are some huge microRNAs!

      Seriously, though, can you clarify a bit? Do you mean you have 15 files, 8MB each, gzip-compressed fastqs of single-ended 50bp miRNA reads, for example - and if not, what exactly do you have? And when you say you tried X, Y, and Z, what were your command lines, what did they print to the screen, and what was the output? Also, what's your experiment?
      First, thanks for replying. I'll start with your last question.

      I have 15 paired-end miRNA sequencing samples with 29 bp reads. First I ran Velvet on the trimmed data and then SSPACE. The scaffolds for each sample range from 2 MB to 8 MB, depending on the k-mer I used for the assembly. After this I used UGENE to merge the scaffolds into a single sequence, and I did this for each sample. What I was told to do is apply MSA to these files, get a consensus, and annotate the consensus sequence.
      So these files are not gzip-compressed. They are .fa files.
      About X, Y and Z: when I use smaller files I get an MSA output (.aln), but when I run them on my actual data I get nothing. They just keep running. It has been more than a week and they have produced no output, although they are using my cores.

      I am sorry if this does not make sense, but I am a fresh graduate and could not find anybody to give me a lead.


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        @mastercoder: This is not making sense. miRNAs are inherently small. Why are you trying to assemble them?

        What did you start this analysis with? What is the aim of the experiment?


        • mastercoder
          Member
          • Dec 2016
          • 11

          #5
          Originally posted by GenoMax View Post
          @mastercoder: This is not making sense. miRNAs are inherently small. Why are you trying to assemble them?

          What did you start this analysis with? What is the aim of the experiment?
          The trimmed data for these samples is really huge, as you can see in the picture.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            There is no doubt there are lots of reads.

            But what experiment are they from? miRNA sequencing? Are you trying to identify how many miRNA's (known?) are there in the samples? What is the point of doing an MSA?


            • mastercoder
              Member
              • Dec 2016
              • 11

              #7
              Originally posted by GenoMax View Post
              There is no doubt there are lots of reads.

              But what experiment are they from? miRNA sequencing? Are you trying to identify how many miRNA's (known?) are there in the samples? What is the point of doing an MSA?

              There is a treatment and a control group. Each has 15 sequence from rats. What I am told is find out known and novel miRNA's. So I thought I could assemble, get the scaffolds, merge them into a single sequence, and apply MSA so I can get a consensus sequence for each group, and then do the annotation on the consensus sequence instead of doing it one by one.

              Is this all wrong?


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                Originally posted by mastercoder View Post
                Each has 15 sequence from rats.
                This is not making sense. Did you mean to say that you are only interested in 15 genes/regions?

                Originally posted by mastercoder
                What I am told is find out known and novel miRNA's.
                The first part can be done by aligning against miRBase data. No need to do any assembly (in fact that may give you some odd results). For the novel discovery part you can look for software that can do that. Here is one example.

                Originally posted by mastercoder
                So I thought I could assemble, get the scaffolds, merge them into a single sequence, and apply MSA so I can get a consensus sequence for each group, and then do the annotation on the consensus sequence instead of doing it one by one.
                This part is not making much sense. You need to ask whoever asked you to do this for further clarification.


                • mastercoder
                  Member
                  • Dec 2016
                  • 11

                  #9
                  @GenoMax
                  No, what I mean is I have 15 miRNA sequences from 15 control rats and another 15 from 15 treated rats. That is why I was trying to get a consensus sequence for each group. So should I align these sequences against the miRBase data one by one?


                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    Ah. So you have 15 sequence files (not literally 15 sequences) each for control and treatment. Is that correct?

                    If that is the case then you can align each of them against miRBase (not sure if you only want the rat subset of sequences from there) to identify reads that align to known miRNAs. The ones that don't align to miRBase could then go into other software to look for novel ones.
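                    As a rough illustration of that split, here is a minimal Python sketch. It uses exact sequence matching as a stand-in for a real short-read aligner, which would also tolerate mismatches and length variants, so treat it as a toy rather than a pipeline. The function names, the `toy-miR-*` entries and all sequences are made up for the example. It assumes the known miRNAs come as FASTA written in the RNA alphabet (U), so U is converted to T before comparing with the reads.

```python
from collections import Counter


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_fastq(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def split_known_unknown(reads, known_fasta_lines):
    """Count reads per known miRNA and collect the reads that match nothing.

    Exact matching only: a toy substitute for aligning against a reference.
    """
    # Map DNA-alphabet sequence -> first word of the FASTA header.
    known = {
        seq.upper().replace("U", "T"): header.split()[0]
        for header, seq in parse_fasta(known_fasta_lines)
    }
    known_counts, unmatched = Counter(), Counter()
    for read in reads:
        read = read.upper()
        name = known.get(read)
        if name is not None:
            known_counts[name] += 1
        else:
            unmatched[read] += 1  # candidates for novel-miRNA software
    return known_counts, unmatched


# Hypothetical reference entries and reads, invented for this example.
known_fasta = [
    ">toy-miR-1 hypothetical entry",
    "UUGACCGAUUGGCAAUCGGAUC",
    ">toy-miR-2 hypothetical entry",
    "GGCAUUAGCCGAUAACGGUUCA",
]
toy_fastq = [
    "@r1", "TTGACCGATTGGCAATCGGATC", "+", "I" * 22,
    "@r2", "TTGACCGATTGGCAATCGGATC", "+", "I" * 22,
    "@r3", "ACGTACGTACGTACGTACGTAC", "+", "I" * 22,
]

known_counts, unmatched = split_known_unknown(parse_fastq(toy_fastq), known_fasta)
print(dict(known_counts))
print(dict(unmatched))
```

                    Run per sample on the trimmed reads, this gives one count table of known miRNAs per file, which is the shape of data a control versus treatment comparison would start from.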


                    • mastercoder
                      Member
                      • Dec 2016
                      • 11

                      #11
                      Originally posted by GenoMax View Post
                      Ah. So you have 15 sequence files (not literally 15 sequences) each for control and treatment. Is that correct?

                      If that is the case then you can align each of them against miRBase (not sure if you only want the rat subset of sequences from there) to identify reads that align to known miRNAs. The ones that don't align to miRBase could then go into other software to look for novel ones.
                      GenoMax, I really am thankful to you. Sorry to make you struggle a bit. Last two questions, please bear with me. Should I align against miRBase with my trimmed data or with the assembled data (scaffolds)? And is there any article, source or other keywords that you can give me?


                      • GenoMax
                        Senior Member
                        • Feb 2008
                        • 7142

                        #12
                        Originally posted by mastercoder View Post
                        GenoMax, I really am thankful to you. Sorry to make you struggle a bit. Last two questions, please bear with me. Should I align against miRBase with my trimmed data or with the assembled data (scaffolds)? And is there any article, source or other keywords that you can give me?
                        Happy to help.

                        You should use the trimmed data (hopefully it was correctly trimmed, what program did you use for that?). If this was a pure miRNA prep then the assembled data makes no sense, since most of your miRNAs should be shorter than the length of one read (how long were they?).
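                        One quick way to sanity-check the trimming is to look at the read-length distribution of a trimmed file. Mature miRNAs are roughly 22 nt long, so a correctly trimmed miRNA library would be expected to show most reads near that length rather than all at the full read length. The sketch below is only an illustration: the 18 to 25 nt window is an assumed rule of thumb, and the function names and toy reads are invented for the example.

```python
from collections import Counter


def read_length_histogram(fastq_lines):
    """Count how many reads have each length in a stream of FASTQ lines."""
    lengths = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence line of each 4-line record
            lengths[len(line.strip())] += 1
    return lengths


def fraction_in_range(lengths, low=18, high=25):
    """Fraction of reads whose length falls in [low, high] (assumed miRNA-like window)."""
    total = sum(lengths.values())
    if total == 0:
        return 0.0
    in_range = sum(n for length, n in lengths.items() if low <= length <= high)
    return in_range / total


# Toy records: two miRNA-sized reads and one untrimmed 29 bp read.
toy_fastq = [
    "@r1", "ACGTACGTACGTACGTACGTAC", "+", "I" * 22,
    "@r2", "ACGTACGTACGTACGTACGTACG", "+", "I" * 23,
    "@r3", "ACGTACGTACGTACGTACGTACGTACGTA", "+", "I" * 29,
]

hist = read_length_histogram(toy_fastq)
for length in sorted(hist):
    print(length, hist[length])
print(round(fraction_in_range(hist), 2))
```

                        If nearly every read sits at the maximum length, the adapters were probably not removed, and that is worth sorting out before any alignment.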

                        A review like this may be of help.


                        • mastercoder
                          Member
                          • Dec 2016
                          • 11

                          #13
                          Originally posted by GenoMax View Post
                          Happy to help.

                          You should use the trimmed data (hopefully it was correctly trimmed, what program did you use for that?). If this was a pure miRNA prep then the assembled data makes no sense, since most of your miRNAs should be shorter than the length of one read (how long were they?).

                          A review like this may be of help.
                          Trimmed data was provided by the company that did the sequencing. Below is the info.

                          Secondly, the trimmed data has 2 files for each sample. I think this is because they are paired-end. That is why I tried to assemble them.


                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            #14
                            The two files are most likely paired-end sequencing data (as described here).

                            You only have 29 bp reads (if that info is correct). Do you know what was the fragment size for this library?


                            • mastercoder
                              Member
                              • Dec 2016
                              • 11

                              #15
                              Originally posted by GenoMax View Post
                              The two files are most likely paired-end sequencing data (as described here).

                              You only have 29 bp reads (if that info is correct). Do you know what was the fragment size for this library?
                              Nope, that is not written in the report.
