  • #16
    OK, here is the BBMap stdout:

    Code:
      ------------------   Results   ------------------   
    
    Genome:                	1
    Key Length:            	13
    Max Indel:             	16000
    Minimum Score Ratio:  	0.56
    Mapping Mode:         	normal
    Reads Used:           	72832	(9176832 bases)
    
    Mapping:          	2.021 seconds.
    Reads/sec:       	36044.03
    kBases/sec:      	4541.55
    
    
    Pairing data:   	pct reads	num reads 	pct bases	   num bases
    
    mated pairs:     	  0.1016% 	       37 	  0.1016% 	        9324
    bad pairs:       	  0.0000% 	        0 	  0.0000% 	           0
    insert size avg: 	  219.24
    
    
    Read 1 data:      	pct reads	num reads 	pct bases	   num bases
    
    mapped:          	  0.1181% 	       43 	  0.1181% 	        5418
    unambiguous:     	  0.1181% 	       43 	  0.1181% 	        5418
    ambiguous:       	  0.0000% 	        0 	  0.0000% 	           0
    low-Q discards:  	  0.0137% 	        5 	  0.0137% 	         630
    
    perfect best site:	  0.0000% 	        0 	  0.0000% 	           0
    semiperfect site:	  0.0000% 	        0 	  0.0000% 	           0
    rescued:         	  0.0220% 	        8
    
    Match Rate:      	      NA 	       NA 	 84.0716% 	        4555
    Error Rate:      	100.0000% 	       43 	 15.8915% 	         861
    Sub Rate:        	100.0000% 	       43 	 15.2270% 	         825
    Del Rate:        	  0.0000% 	        0 	  0.0000% 	           0
    Ins Rate:        	 16.2791% 	        7 	  0.6645% 	          36
    N Rate:          	  4.6512% 	        2 	  0.0369% 	           2
    
    
    Read 2 data:      	pct reads	num reads 	pct bases	   num bases
    
    mapped:          	  0.1428% 	       52 	  0.1428% 	        6552
    unambiguous:     	  0.1428% 	       52 	  0.1428% 	        6552
    ambiguous:       	  0.0000% 	        0 	  0.0000% 	           0
    low-Q discards:  	  1.6751% 	      610 	  1.6751% 	       76860
    
    perfect best site:	  0.0000% 	        0 	  0.0000% 	           0
    semiperfect site:	  0.0000% 	        0 	  0.0000% 	           0
    rescued:         	  0.0275% 	       10
    
    Match Rate:      	      NA 	       NA 	 84.1270% 	        5512
    Error Rate:      	100.0000% 	       52 	 15.8730% 	        1040
    Sub Rate:        	100.0000% 	       52 	 15.8730% 	        1040
    Del Rate:        	  0.0000% 	        0 	  0.0000% 	           0
    Ins Rate:        	  0.0000% 	        0 	  0.0000% 	           0
    N Rate:          	  0.0000% 	        0 	  0.0000% 	           0
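If you want to compare runs (for example, different k values) without retyping numbers, a small parser for a saved copy of this stdout can help. The sketch below assumes the layout shown above: a section header such as `Read 1 data:`, followed by whitespace-separated rows where `mapped:` lists pct reads, num reads, pct bases and num bases in that order. `parse_mapped` is a hypothetical helper, not part of BBTools.

```python
import re

def parse_mapped(stats_text):
    """Return {section: (num_reads, num_bases)} for each 'mapped:' row.

    Assumes each read section starts with a line like 'Read 1 data:' and that
    the 'mapped:' row lists pct reads, num reads, pct bases, num bases in order.
    """
    result = {}
    section = None
    for line in stats_text.splitlines():
        header = re.match(r"\s*(Read \d+) data:", line)
        if header:
            section = header.group(1)
            continue
        if section and line.strip().startswith("mapped:"):
            fields = line.split()
            # e.g. ['mapped:', '0.1181%', '43', '0.1181%', '5418']
            result[section] = (int(fields[2]), int(fields[4]))
    return result

# Excerpt of the stats posted above (tabs preserved as in BBMap's output).
example = """Read 1 data:      \tpct reads\tnum reads \tpct bases\t   num bases
mapped:          \t  0.1181% \t       43 \t  0.1181% \t        5418
Read 2 data:      \tpct reads\tnum reads \tpct bases\t   num bases
mapped:          \t  0.1428% \t       52 \t  0.1428% \t        6552
"""

counts = parse_mapped(example)
print(counts)  # {'Read 1': (43, 5418), 'Read 2': (52, 6552)}
```

Rows such as `unambiguous:` are skipped because only lines beginning with `mapped:` are read, so the same function works on the full stdout.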



    • #17
      That does not look promising. Few reads are mapped.

      You are going to have to spend some time playing with BBMap options to see if you can improve the alignments. You would need to check some of the individual reads to see whether you have chimeric reads and/or just random hits resulting from the initial choice of k=25. Have you tried higher k values to see if you can narrow down the read set?
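A back-of-the-envelope way to see why a longer k narrows the read set, and why allowing mismatches widens it again, is to count how much of k-mer space a match covers. The sketch assumes uniformly random, independent bases; real sequence has repeats and low-complexity regions, so the absolute numbers are probably far too low for real data and only the trend with k and mismatches is meaningful. The reference size (10,000 k-mers, i.e. a roughly 10 kb target) is hypothetical; the mismatch count `d` plays the role of the mismatches allowed in the BBDuk step, and the function names are illustrative only.

```python
from math import comb

def hamming_neighbourhood(k, d):
    """Count DNA k-mers within Hamming distance d of one given k-mer.

    For each i <= d, choose which i positions differ (comb(k, i));
    each differing position can hold any of the 3 other bases.
    """
    return sum(comb(k, i) * 3 ** i for i in range(d + 1))

def expected_chance_hits(n_reads, read_len, ref_kmers, k, d=0):
    """Rough expected number of read k-mers that hit the reference by chance.

    Assumes uniformly random, independent bases and non-overlapping
    neighbourhoods around each reference k-mer.
    """
    kmers_per_read = max(0, read_len - k + 1)
    p_match = min(1.0, ref_kmers * hamming_neighbourhood(k, d) / 4 ** k)
    return n_reads * kmers_per_read * p_match

# Read count and length (72832 reads, ~126 bp) follow the BBMap stats in #16;
# the 10,000 reference k-mers are a hypothetical value.
for k in (25, 35, 50):
    for d in (0, 1, 2):
        hits = expected_chance_hits(72832, 126, 10_000, k, d)
        print(f"k={k} mismatches={d}: {hits:.2e}")
```

Each extra base in k divides the matching fraction by 4, while each extra allowed mismatch multiplies it by roughly 3k/i, so loosening mismatches can quickly offset the gain from a longer k.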



      • #18
        Actually, for this data (i.e. RNA-seq) the sequence shouldn't be there, so having few reads align is actually good for me! I will run through the workflow again with a different k-mer length, maybe 35 or 50?

        I looked at the BAM file in IGV, and it looks like the attached image. If I'm interpreting this correctly, a bunch of reads map to the sequence, but they all carry many mismatches, which would then correspond to the mismatches allowed in the BBDuk step? Should I then rerun with fewer (or no) allowed mismatches, or am I misinterpreting the data?
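The stats in #16 support that reading: Read 1 shows 825 substitutions across 43 mapped reads, roughly 19 per 126-bp read. Besides rerunning BBDuk, one quick check is to see how many alignments survive an edit-distance cutoff. The sketch below works on SAM text (for example from `samtools view -h`) and assumes the records carry the standard SAM `NM:i` edit-distance tag; check that your aligner wrote it. `nm_tag` and `filter_by_nm` are hypothetical helpers, and this only filters existing alignments; it does not change which reads BBDuk selected.

```python
def nm_tag(sam_line):
    """Return the NM:i (edit distance) value of a SAM record, or None if absent."""
    for field in sam_line.rstrip("\n").split("\t")[11:]:  # optional tags start at column 12
        if field.startswith("NM:i:"):
            return int(field[5:])
    return None

def filter_by_nm(sam_lines, max_nm):
    """Yield header lines plus mapped records whose NM value is <= max_nm.

    Unmapped records (FLAG bit 0x4) and records without an NM tag are dropped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])
        if flag & 0x4:
            continue
        nm = nm_tag(line)
        if nm is not None and nm <= max_nm:
            yield line

# Toy records: SEQ/QUAL are '*' to keep them short.
sam = [
    "@SQ\tSN:target\tLN:1000",
    "r1\t0\ttarget\t100\t60\t126M\t*\t0\t0\t*\t*\tNM:i:1",
    "r2\t0\ttarget\t300\t60\t126M\t*\t0\t0\t*\t*\tNM:i:19",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
kept = list(filter_by_nm(sam, max_nm=2))
print([line.split("\t")[0] for line in kept])  # ['@SQ', 'r1']
```

Running it at a few thresholds (0, 2, 5, ...) shows whether any reads match the sequence closely or whether every hit carries many mismatches.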

        I would also like to do this on my whole-genome sequencing data. Would I just follow the exact same procedure for that (starting from the FASTQ files from DNA sequencing)?
