  • Program for aligning particular set of reads to an entire NGS dataset

    Hi,

    I have about 100 cDNA sequences (let's call them "ref.") and I would like to know how many reads from the original Illumina dataset (10 million reads; let's call them "reads") align to them without gaps, either fully (i.e. the entire ref. sequence is contained in the read; see "read1" below) or partially (see "reads 2, 3, 4" below).

    Example:
    Code:
    ref.                   AGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATC
    read1  ATGCGCTAGCTAGCATAGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATCTTGGACCGCATAGCATC
    read2              ATTAAGTTCGGCCGCTCACCGCACC
    read3                                CCGCACCGTCACGCCATCCAGGCATCATGCGCGATCTCAGC
    read4                        GCCGCTCACCGCACC
    Is there any "mapping" program to do that?
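For a handful of sequences, the "full" vs "partial" categories in the example can be checked without an aligner. The Python sketch below is a brute-force, exact-match illustration only: the function names (`classify_overlap`, `tally`) and the 10-base minimum overlap are hypothetical choices, it tolerates no sequencing errors, and it is far too slow for 10 million reads against 100 references, which is why a real aligner is the practical route.

```python
from collections import Counter
from typing import Iterable, Optional

# Complement table for upper-case A/C/G/T/N (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_overlap(ref: str, read: str, min_overlap: int = 10) -> Optional[str]:
    """Classify an exact, ungapped overlap between one read and one reference.

    "full"    -> the whole reference occurs inside the read (like read1)
    "partial" -> the read lies inside the reference (read4), or it overhangs one
                 end of the reference by at least `min_overlap` bases (read2, read3)
    None      -> no such overlap; mismatches are not tolerated at all
    """
    if ref in read:
        return "full"
    if read in ref:
        return "partial"
    lo = max(min_overlap, 1)
    for k in range(min(len(ref), len(read)) - 1, lo - 1, -1):
        # Left overhang: a suffix of the read equals a prefix of the reference.
        # Right overhang: a prefix of the read equals a suffix of the reference.
        if read.endswith(ref[:k]) or read.startswith(ref[-k:]):
            return "partial"
    return None


def tally(ref: str, reads: Iterable[str], min_overlap: int = 10) -> Counter:
    """Count reads per class, trying the read and then its reverse complement."""
    counts: Counter = Counter()
    for read in reads:
        label = classify_overlap(ref, read, min_overlap)
        if label is None:
            label = classify_overlap(ref, reverse_complement(read), min_overlap)
        counts[label or "none"] += 1
    return counts


if __name__ == "__main__":
    REF = "AGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATC"
    READS = [
        "ATGCGCTAGCTAGCATAGTTCGGCCGCTCACCGCACCGTCACGCCATCCAGGCATCTTGGACCGCATAGCATC",
        "ATTAAGTTCGGCCGCTCACCGCACC",
        "CCGCACCGTCACGCCATCCAGGCATCATGCGCGATCTCAGC",
        "GCCGCTCACCGCACC",
    ]
    for read in READS:
        print(classify_overlap(REF, read), read)
    print(tally(REF, READS))
```

On the four example reads this reports read1 as "full" and reads 2 to 4 as "partial". The reverse-complement retry is there because Illumina reads can come from either strand.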

    Can I use Bowtie2 (although it looks a bit complicated when I see its extensive list of options)? It seems I would have to input one file containing all the sequences (ref. + reads), which would probably align all the sequences to each other and take ages?
    Also, should I use the raw reads (paired-end) or the merged + unmerged reads?

    Thanks for your help!

  • #2
    bowtie2 is a good choice. Yes, there are a lot of arguments, but that is because different people want to do different things. In your case, for example, you will want to use the non-default '--local' mode.
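In '--local' mode bowtie2 can soft-clip read ends that do not align (shown as `S` in the SAM CIGAR string), which is what lets overhanging reads such as read2 and read3 align to a short reference. The Python sketch below is one assumed way to turn the resulting SAM file into the "full" / "partial" counts asked about: it keeps primary alignments whose CIGAR has no insertions, deletions, or skips, and calls an alignment "full" when it spans reference position 1 through the reference length taken from the `@SQ` header. The function name `summarize_sam` is hypothetical, each mate of a pair is counted separately, and mismatches inside `M` operations are still allowed; requiring exact matches would need an extra filter (e.g. on the `NM:i` edit-distance tag, if present).

```python
import re
from collections import Counter
from typing import Dict, Iterable

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")      # CIGAR operations that advance along the reference
GAP_OPS = set("IDN")              # insertions, deletions, skipped regions
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def summarize_sam(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count primary, ungapped alignments per reference as 'full' or 'partial'.

    An alignment is 'full' when it covers the reference from position 1 to its
    last base (soft-clipped read ends are ignored); otherwise it is 'partial'.
    """
    ref_len: Dict[str, int] = {}
    counts: Dict[str, Counter] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
                ref_len[tags["SN"]] = int(tags["LN"])
            continue
        fields = line.split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & SKIP_FLAGS or cigar == "*":
            continue
        ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
        if any(op in GAP_OPS for _, op in ops):
            continue  # gapped alignment: not counted
        span = sum(n for n, op in ops if op in REF_CONSUMING)
        end = pos + span - 1  # 1-based inclusive end on the reference
        label = "full" if pos == 1 and end == ref_len[rname] else "partial"
        counts.setdefault(rname, Counter())[label] += 1
    return counts
```

Usage would look like `with open("aln.sam") as fh: print(summarize_sam(fh))`, where `aln.sam` is a placeholder name; for a BAM file, `samtools view -h aln.bam` produces the SAM text (with header) that this function expects.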

    You will not input a single combined file. Instead, you will build an index for your reference sequence(s) and then supply the R1 and R2 read files as separate inputs.
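A minimal sketch of that two-step workflow, written in Python so the commands can be inspected before anything runs. It assumes `bowtie2-build` and `bowtie2` are on the PATH; the file names (`refs.fa`, `ref_index`, `reads_R1.fastq.gz`, `reads_R2.fastq.gz`, `aln.sam`), the helper names, and the thread count are placeholders. `--no-unal` is optional and only keeps the SAM file small by dropping records for reads that did not align.

```python
import shlex
import subprocess
from typing import Iterable, List, Tuple


def build_commands(ref_fasta: str, index_prefix: str, r1: str, r2: str,
                   out_sam: str, threads: int = 4) -> Tuple[List[str], List[str]]:
    """Return the index-building and alignment commands as argument lists."""
    # Step 1: build the bowtie2 index from the reference sequences only.
    build = ["bowtie2-build", ref_fasta, index_prefix]
    # Step 2: align the paired reads in local mode; R1 and R2 are given separately.
    align = [
        "bowtie2", "--local", "-p", str(threads), "--no-unal",
        "-x", index_prefix, "-1", r1, "-2", r2, "-S", out_sam,
    ]
    return build, align


def run(commands: Iterable[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names: replace with your own paths.
    cmds = build_commands("refs.fa", "ref_index",
                          "reads_R1.fastq.gz", "reads_R2.fastq.gz", "aln.sam")
    run(cmds, dry_run=True)
```

The index is built only from the ~100 reference sequences, so the reads are never aligned against each other; only the reads are streamed through the aligner.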



    • #3
      Got it. Thanks, westerman!



      • #4
        I did mapping with TopHat, where the reference length ranged from 150 bp to 50,000 bp (I worked on approx. 40,000 reference sequences separately). I mapped the paired-end reads together rather than separately. The two approaches could give slightly or substantially different mappings (this should matter most for short references, e.g. shorter than 300 bp; that figure is just hypothetical). If R1 and R2 are mapped separately, should that be followed by selecting the reads that mapped in both runs? How would the insert size parameter be taken into account then? And how can I perform local mapping in TopHat? Is there any way to do so?
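On the insert-size question, one rough way to get a value is to align a subset of pairs once and summarize the observed template length (SAM column 9, `TLEN`) of properly paired reads. The sketch below uses only the Python standard library; the function names are hypothetical. TopHat's `-r/--mate-inner-dist` expects the inner distance between mates, i.e. roughly the fragment length minus the two read lengths, which is what `mean_inner_distance` approximates when both mates have the same length. For bowtie2, the corresponding fragment-length bounds are `-I/--minins` and `-X/--maxins`.

```python
import statistics
from typing import Dict, Iterable, Iterator

PROPER_PAIR, FIRST_IN_PAIR = 0x2, 0x40
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def fragment_lengths(sam_lines: Iterable[str]) -> Iterator[int]:
    """Yield |TLEN| once per properly paired, primary pair (from the first mate)."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            yield abs(tlen)


def insert_stats(sam_lines: Iterable[str], read_length: int) -> Dict[str, float]:
    """Summarize fragment lengths and derive a mean inner distance."""
    lengths = list(fragment_lengths(sam_lines))
    if not lengths:
        raise ValueError("no properly paired alignments found")
    mean = statistics.mean(lengths)
    return {
        "n": len(lengths),
        "median_fragment": statistics.median(lengths),
        "mean_fragment": mean,
        "sd_fragment": statistics.pstdev(lengths),
        # Inner distance = fragment length minus both reads (equal lengths assumed).
        "mean_inner_distance": mean - 2 * read_length,
    }
```

Because local alignments soft-clip read ends, `TLEN` from a `--local` run may be slightly shorter than the true fragment, so these numbers are only a starting point.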



        • #5
          This appears to have limited relevance to this thread, so I suggest you create a new thread to ask the question. And please take your time to phrase it clearly.
