I have used 100 nt paired-end sequences to construct a reduced representation reference genome of the organism I am working with. I aligned the reads back to the reference genome, and I hope to find SNPs at some point. I have a list of individual reads (each with its mate) which I would like to inspect in the alignment. Is there a way to find out what position in the reference genome these reads are aligned to? I can visualize the aligned reads in IGV, and there I can zoom in to a position to inspect a region. But I cannot search for a particular read: I need to know the map position of the read first. Is there a program or script that could extract the position (and maybe other information) of an individual read from a SAM/BAM file?
-
Tablet lets you search for reads by name, the regular expression support is also very handy: http://bioinf.hutton.ac.uk/tablet/
-
Originally posted by dpryan:
Presumably you have the SAM file that was output from the aligner. You can look for the location of a read in it using grep:
grep -m 1 -w SomeReadName.123455 Aligned.sam
That'll be easy enough provided you only have a few reads you want to look at,
Thanks for the suggestion. However, when I use the command grep -m 1 Sequence_read_tag alignment_file.sam > output.txt
I get the following information:
FCB020AACXX:6:1305:20474:84915#ATGAACCT 163 369552-8 1 60 100M = 61 160 CTTGCAAAGGAAAATCTTGAGATGAACGAGGGCGACATTAGCAAGGAGGCCATCGGAGGCACCGACGGTACCACCGTCGATGGAGAGGATGCGAACCCAT bbbeeeeeggggfiiiiiiiihgifhihffhiiiiihiihiihfghfhiihggggeeecccccccccc]acccccc_acacccccccccccccccccccc XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100
I recognize the name of my sequence, the sequence itself, the quality information... but where do I find the positional information?
-
Originally posted by Tectona:
I recognize the name of my sequence, the sequence itself, the quality information... but where do I find the positional information?
Code:
grep -m 1 Sequence_read_tag alignment_file.sam | awk '{ print $3":"$4 }' > output.txt
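For anyone who prefers to see the columns by name, here is a minimal Python sketch of what that awk command is picking out. It assumes the standard 11 mandatory SAM columns; the helper names `parse_sam_line` and `describe_flag` are made up for illustration and do not come from any library. It splits on any whitespace only because the line pasted in this thread shows spaces where a real SAM file has tabs.

```python
# Names of the 11 mandatory SAM columns, in file order.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Meaning of the FLAG bits that matter most for paired-end reads.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def parse_sam_line(line):
    """Return the mandatory fields of one alignment line as a dict."""
    # Real SAM is tab-delimited; splitting on any whitespace also copes
    # with lines pasted from a forum. Optional tags end up in parts[11].
    parts = line.split(None, 11)
    if len(parts) < 11:
        raise ValueError("expected at least 11 SAM columns")
    record = dict(zip(SAM_COLUMNS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    return record


def describe_flag(flag):
    """List the names of the FLAG bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# The alignment line posted above, rebuilt field by field.
example = " ".join([
    "FCB020AACXX:6:1305:20474:84915#ATGAACCT", "163", "369552-8", "1",
    "60", "100M", "=", "61", "160",
    "CTTGCAAAGGAAAATCTTGAGATGAACGAGGGCGACATTAGCAAGGAGGCCATCGGAGGCACCGACGG"
    "TACCACCGTCGATGGAGAGGATGCGAACCCAT",
    "bbbeeeeeggggfiiiiiiiihgifhihffhiiiiihiihiihfghfhiihggggeeecccccccccc"
    "]acccccc_acacccccccccccccccccccc",
    "XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:100",
])

rec = parse_sam_line(example)
print(f"{rec['RNAME']}:{rec['POS']}")   # 369552-8:1
print(describe_flag(rec["FLAG"]))
# ['paired', 'proper pair', 'mate reverse strand', 'second in pair']
```

So the read is the second mate of a properly paired fragment, and its mate starts at position 61 on the same contig (the `=` in the RNEXT column).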
-
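That's the position relative to the reference contig the read is aligned against: column 3 (RNAME) is the contig name and column 4 (POS) is the 1-based leftmost mapped position on that contig. You may want to browse the SAM specification. The read you showed maps to the start of a contig (contig 369552-8, position 1).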
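If the list of reads grows beyond a handful, running grep once per read gets slow because each call rescans the file. A rough Python sketch of a single-pass lookup follows; it assumes an uncompressed, tab-delimited SAM file (a BAM file would first need converting, for example with `samtools view`). The function name `find_reads` and the read names in the demo are hypothetical.

```python
from collections import defaultdict


def find_reads(sam_lines, wanted_names):
    """Collect (contig, position, flag) for every alignment of the wanted reads.

    sam_lines can be an open file handle or any iterable of SAM lines.
    Both mates of a pair share one read name, so each name maps to a list.
    """
    wanted = set(wanted_names)
    hits = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):      # header lines, not alignments
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[0] not in wanted:
            continue
        # Column 1 = read name, 2 = flag, 3 = contig, 4 = 1-based position.
        hits[fields[0]].append((fields[2], int(fields[3]), int(fields[1])))
    return dict(hits)


# Tiny made-up SAM content, just to show the shape of the result.
demo = [
    "@SQ\tSN:369552-8\tLN:500\n",
    "readA\t163\t369552-8\t1\t60\t100M\t=\t61\t160\tACGT\tIIII\n",
    "readA\t83\t369552-8\t61\t60\t100M\t=\t1\t-160\tACGT\tIIII\n",
    "readB\t0\t369552-8\t200\t60\t100M\t*\t0\t0\tACGT\tIIII\n",
]
print(find_reads(demo, ["readA"]))
# {'readA': [('369552-8', 1, 163), ('369552-8', 61, 83)]}
```

With a real file you would pass the open handle instead of the list, e.g. `with open("Aligned.sam") as fh: hits = find_reads(fh, names)`. The contig and position can then be typed into IGV as `contig:position` to jump to the read.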