Hi, I have two sets of data, each with two columns. In each set, one column contains gene names and the other contains the number of reads per gene. Each set was produced with a different program for mapping the genes and counting reads, so the columns don't match up. I would like to plot both sets of read counts against each other to see the correlation, but first I need to create a single column of the genes that appear in both sets, with the two columns of reads next to it. Does anyone have a Perl script that can do this?
-
Hi,
You can use the Unix 'join' command for this task. Please paste the first 5 lines of your input and output files so that someone can write the script. You may refer to this link: http://www.albany.edu/~ig4895/join.htm.
Best wishes,
Rahul
Last edited by rahularjun86; 10-15-2012, 02:39 AM.
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany
-
Hi, thank you. Here are the first several lines of the file:
Gene_Horn Uninduced_Horn Gene_DEGSeq Uninduced_DEGSeq
Tb04.24M18.150 12 Tb04.24M18.150 172
Tb04.3I12.100 21 Tb04.3I12.100 11
Tb05.28F8.200 97 Tb05.5K5.100 52
Tb05.30F7.410 43 Tb05.5K5.10 19
Tb06.3A7.270 572 Tb05.5K5.110 5
Tb06.3A7.960 74 Tb05.5K5.120 9
Tb07.26A24.210 100 Tb05.5K5.130 24
Tb09.142.0320 56 Tb05.5K5.140 63
Tb09.142.0350 201 Tb05.5K5.150 12
There are thousands of these lines. Basically, I want a script that looks at Gene_Horn and Gene_DEGSeq, finds only those genes that appear in both columns, and puts them in the first column of the output file along with the two corresponding columns of reads (Uninduced_Horn and Uninduced_DEGSeq).
-
I mean: where column 1 (Gene_Horn) and column 3 (Gene_DEGSeq) contain the same gene, print a column containing the overlapping genes (call it column 1), along with column 2 (Uninduced_Horn), which is the reads for that gene from Horn, and column 3 (Uninduced_DEGSeq), which is the reads for that gene from DEGSeq. This way, I can plot both sets of reads for each gene on a scatter plot to see how much variance there is between the two data sets.
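The merge described above could be sketched roughly as follows. This is in Python rather than the Perl asked for, and it is only a sketch under some assumptions: the input is a single four-column text file with one header line, read counts are integers, and each gene appears at most once per program. The function name `overlap_reads` and the `sep` parameter are made up for illustration.

```python
def overlap_reads(lines, sep=None):
    """Return (gene, horn_reads, degseq_reads) for genes found in both gene columns.

    sep=None splits on any whitespace. If the two column pairs have different
    lengths, a tab-delimited file and sep="\t" keep empty cells in place.
    """
    horn, degseq = {}, {}
    rows = iter(lines)
    next(rows, None)  # skip the header line (assumed to be present)
    for line in rows:
        fields = line.rstrip("\n").split(sep)
        # Columns 1-2: gene name and reads from the first program (Horn).
        if len(fields) >= 2 and fields[0]:
            horn[fields[0]] = int(fields[1])
        # Columns 3-4: gene name and reads from the second program (DEGSeq).
        if len(fields) >= 4 and fields[2]:
            degseq[fields[2]] = int(fields[3])
    # Keep only genes present in both lookups, wherever they sat in the file.
    return [(gene, horn[gene], degseq[gene]) for gene in sorted(horn) if gene in degseq]


# The first three data rows posted above.
sample = """Gene_Horn Uninduced_Horn Gene_DEGSeq Uninduced_DEGSeq
Tb04.24M18.150 12 Tb04.24M18.150 172
Tb04.3I12.100 21 Tb04.3I12.100 11
Tb05.28F8.200 97 Tb05.5K5.100 52
""".splitlines()

for gene, horn_reads, degseq_reads in overlap_reads(sample):
    print(gene, horn_reads, degseq_reads, sep="\t")
# Tb04.24M18.150	12	172
# Tb04.3I12.100	21	11
```

For a real file, passing an open file handle in place of `sample` should work the same way, since the function only iterates over lines.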
-
Rahul, thanks so much, but it only gave me 3 genes that lined up. I checked, and the problem is that the one-liner you gave me only finds lines where the two gene columns match exactly on the same row. Column 1 and column 3 don't line up, because there are genes that are in one and not in the other. So I need a script that will look at all of column 1 and all of column 3 and give me all the genes that are found in both, not just the ones that sit on the same line.
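To make the difference concrete, here is a small Python illustration of a same-line comparison versus a whole-column lookup. The gene names and counts are hypothetical toy values, not taken from the real file.

```python
# Hypothetical toy data: each program lists (gene, reads) in its own order.
horn = [("geneA", 12), ("geneB", 21), ("geneC", 97)]
degseq = [("geneA", 172), ("geneC", 52), ("geneD", 19)]

# Same-line comparison: only finds genes that sit on the same row in both lists.
same_line = [h[0] for h, d in zip(horn, degseq) if h[0] == d[0]]

# Whole-column comparison: finds genes wherever they appear in the other list.
degseq_genes = {gene for gene, _ in degseq}
in_both = [gene for gene, _ in horn if gene in degseq_genes]

print(same_line)  # ['geneA']
print(in_both)    # ['geneA', 'geneC']
```

Here geneC is missed by the same-line check because it sits on row 3 in one list and row 2 in the other, which is the situation described above.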