  • Sufia
    Junior Member
    • Feb 2024
    • 4

    How to parse blastn output

    I aligned my shotgun metagenomic reads against the NCBI eukaryotic reference database with blastn, in order to assess diet from fecal samples of black bears. The blastn output is in tabular format (outfmt 6). I am now trying to determine whether PCR bias or PCR duplicates are influencing our results. Specifically, I want to see whether the ratio of unique subjects to unique queries differs between enriched and non-enriched samples; a difference might indicate that the enrichment process, rather than the PCR, is changing the results. I extracted the numbers of unique queries and unique subject sequences with the following commands:

    for i in $(ls blastn_out_nt/); do cut -f 1 blastn_out_nt/$i | sort | uniq | wc -l >> query; done

    for i in $(ls blastn_out_nt/); do sort -k2,2 blastn_out_nt/$i | cut -f 2,9,10 | uniq | wc -l >> unique_subjects; done

    Note that in the blastn output the first column is the query ID, the second column is the subject ID, and the 9th and 10th columns are the start and end of the alignment in the subject. I would like to verify that this approach is error-free and to understand what explains the pattern.
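
    One thing worth checking in the second command: `uniq` only collapses adjacent identical lines, and sorting on column 2 alone does not guarantee that identical (subject, start, end) triples end up next to each other, so the count can be inflated. Below is a minimal sketch of an alternative, assuming the default outfmt 6 column order and the `blastn_out_nt/` directory from the commands above. The function names and the per-file table layout are illustrative, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Sketch: per-file counts of unique queries and unique subject intervals.
# Assumed outfmt 6 columns: 1 = qseqid, 2 = sseqid, 9 = sstart, 10 = send.

count_unique_queries() {
    # Distinct query IDs (column 1); tr strips padding some wc versions add.
    cut -f 1 "$1" | sort -u | wc -l | tr -d ' '
}

count_unique_subject_intervals() {
    # Distinct (sseqid, sstart, send) triples. sort -u de-duplicates whole
    # lines, so identical triples need not be adjacent beforehand.
    cut -f 2,9,10 "$1" | sort -u | wc -l | tr -d ' '
}

# One row per file: name, unique queries, unique subject intervals, ratio.
for f in blastn_out_nt/*; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    q=$(count_unique_queries "$f")
    s=$(count_unique_subject_intervals "$f")
    # Ratio of unique subjects to unique queries; NA for files with no hits.
    r=$(awk -v s="$s" -v q="$q" 'BEGIN { if (q > 0) printf "%.4f", s / q; else printf "NA" }')
    printf '%s\t%s\t%s\t%s\n' "$(basename "$f")" "$q" "$s" "$r"
done
```

    Printing one row per file keeps each count paired with its file name, which is easier to check than two separate files appended in `ls` order. Redirect the output to a file as needed.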

    Here's what I have got:

    [Image: image.png]

    Does a smaller ratio of unique subjects to unique queries potentially indicate that the input FASTA sequences were redundant (PCR duplicates), since they hit the same database entry? Also, how should I explain these figures?

    [Images: Rplot02.jpg, Rplot03.jpg]
