- #1
blueberryfive
Hi Everyone!
This is my first programming question. I have very little experience, so please explain as simply as possible.
I have output that looks like this:
# Query: 2678.m000169 conserved hypothetical protein
# Database: EST_matched_Genome.fas.txt
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 3 hits found
2678.m000169 NODE_295089_length_165_cov_8.775758 100 209 0 0 120 328 1 209 2.00E-107 387
2678.m000169 NODE_575647_length_122_cov_7.155738 100 165 0 0 378 542 166 2 6.00E-83 305
2678.m000169 NODE_311774_length_123_cov_6.276423 100 159 0 0 1 159 159 1 1.00E-79 294
# BLASTN 2.2.23+
# Query: 2678.m000182 hypothetical protein
# Database: EST_matched_Genome.fas.txt
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 8 hits found
2678.m000182 NODE_207397_length_1580_cov_14.322152 99.87 1597 2 0 6354 7950 1598 2 0 2939
2678.m000182 NODE_217031_length_1396_cov_9.866762 99.86 1439 2 0 4724 6162 1440 2 0 2647
2678.m000182 NODE_298029_length_1052_cov_10.303232 100 1096 0 0 10519 11614 1096 1 0 2025
2678.m000182 NODE_224620_length_710_cov_8.192958 100 751 0 0 1357 2107 4 754 0 1387
It's output from a BLAST program.
I would like to write a script in Python, or whichever language is easiest, that (1) checks whether NODE_X = NODE_Y, where NODE_Y is the value in the second column of a row and NODE_X is the value in the second column of the row directly before it, and (2) if they are equal, finds the file named after the value in the first column (e.g., 2678.m000182.txt) in a database and puts it into a new folder called "2_Same_Nodes".
Does anyone have any suggestions for how I would go about this?
Thanks
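A minimal Python sketch of the two steps described above, under several assumptions that are not in the original post: the "database" is an ordinary folder of files named <query id>.txt, only rows belonging to the same query are compared, and the matching file is copied rather than moved. The function names and the default folder arguments are hypothetical, chosen only for illustration.

```python
import shutil
from pathlib import Path


def queries_with_repeated_node(lines):
    """Return query ids that have two consecutive hit rows with the same subject id."""
    matches = []
    prev_query = prev_node = None
    for line in lines:
        line = line.strip()
        # Skip blank lines and the "#" comment lines that BLAST writes.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # works for tab- or space-separated columns
        if len(fields) < 2:
            continue  # not a hit row
        query, node = fields[0], fields[1]
        # Assumption: a row is only compared with the previous row of the same query.
        if query == prev_query and node == prev_node and query not in matches:
            matches.append(query)
        prev_query, prev_node = query, node
    return matches


def copy_matching_files(blast_path, source_dir, dest_dir="2_Same_Nodes"):
    """Copy <query id>.txt from source_dir into dest_dir for every matching query."""
    with open(blast_path) as handle:
        matches = queries_with_repeated_node(handle)
    dest = Path(dest_dir)
    dest.mkdir(exist_ok=True)  # create the folder if it is not there yet
    copied = []
    for query in matches:
        src = Path(source_dir) / f"{query}.txt"
        if src.is_file():  # ids without a file in source_dir are skipped
            shutil.copy(src, dest / src.name)
            copied.append(src.name)
    return copied
```

It could be called as copy_matching_files("blast_output.txt", "database_folder"), where both names are placeholders for the real paths. In the sample shown above, no two consecutive rows share the same NODE value, so nothing would be copied for that input.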