list - Counting previous lines using information from two files in Python
I'll try to explain my programming problem. I have two files: file #1 is a gene annotation file, and file #2 is a file of counts per base position (just to give some context for the problem).
I want to extract the "start_codon" position from the lines that have a "+" in column 6, and then go to that position in file #2. For instance, I want to extract 954 from column 3 of file #1 and go to row 954 in file #2. Then I want to count the number of lines directly above line 954 in file #2 that have a count value of 70 or greater.
File #1:

    chromosome  exon         337   774   0.0  -  .  gene_id "a";
    chromosome  start_codon  954   956   0.0  +  0  gene_id "b";
    chromosome  stop_codon   2502  2504  0.0  +  0  gene_id "b";

File #2:

    . . . .
    942  71
    943  63
    944  88
    945  80
    946  80
    947  85
    948  86
    949  97
    950  97
    951  97
    952  104
    953  105
    954  104
    955  108
My final output file should be a tab-separated file with the gene_id followed by the number of lines that have a count value of 70 or greater. For the example files I have given, the output would be as follows:
    gene_id  count_before_start_codon
    b        10
I want to loop through the large files and produce one long output file.
Thank you, I hope this is clear. I appreciate any guidance!
This should work. The first part gets the gene information from file 1 and populates a dictionary; the second part opens file 2, checks each position against the dictionary, and produces the output.
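    d = {}
    # Part 1: map each "+"-strand start_codon position to its gene id.
    with open("file1.txt") as f1:
        for line in f1:
            line = line.rstrip().split("\t")
            if line[6] == "+" and line[2] == "start_codon":
                # The last field looks like: gene_id "b";  -> keep the text inside the quotes.
                d[line[3]] = line[8].split('"')[1]

    count = []
    number = 12  # largest count that will be reported
    # Part 2: walk file 2, tracking the current run of lines with a value >= 70.
    with open("file2.txt") as f2:
        for line in f2:
            line = line.rstrip().split("\t")
            if int(line[1]) >= 70:
                count.append(line[1])
                if line[0] in d:
                    # count includes the start-codon line itself, hence the -1.
                    results = [d[line[0]], str(min(len(count) - 1, number))]
                    print("\t".join(results))
                    count = []
            else:
                count = []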
PS. I copy-pasted your example and edited the files to be tab-delimited. You may need to play around with the "slicing".
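If the fixed slicing turns out to be fragile, one option is to pull the gene id out with a regular expression and keep a running counter instead of a list. The following is only a sketch under assumptions that are not in the post: the function names `load_start_codons` and `count_before_start` are made up for illustration, file 1 is assumed to be tab-delimited with nine fields (the demo adds a placeholder `src` column to the example rows to match that layout), and a gene is reported even when the value on its own start-codon line is below the threshold. No upper limit is applied to the count.

```python
import io
import re

# Hypothetical helper names; the 9-field tab-delimited layout is an assumption.
GENE_ID = re.compile(r'gene_id "([^"]+)"')

def load_start_codons(annotation, strand="+"):
    """Map start position (int) -> gene id for start_codon rows on the given strand."""
    starts = {}
    for raw in annotation:
        fields = raw.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or short lines
        if fields[2] == "start_codon" and fields[6] == strand:
            match = GENE_ID.search(fields[8])
            if match:
                starts[int(fields[3])] = match.group(1)
    return starts

def count_before_start(counts, starts, threshold=70):
    """Yield (gene_id, run), where run is the number of consecutive lines
    directly above the start position whose value is >= threshold."""
    run = 0
    for raw in counts:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank lines
        position, value = int(fields[0]), int(fields[1])
        if position in starts:
            yield starts[position], run
        # Extend the run on a qualifying line, otherwise start over.
        run = run + 1 if value >= threshold else 0

# Demo on the example data from the question ("src" is a placeholder column).
annotation = io.StringIO(
    'chromosome\tsrc\texon\t337\t774\t0.0\t-\t.\tgene_id "a";\n'
    'chromosome\tsrc\tstart_codon\t954\t956\t0.0\t+\t0\tgene_id "b";\n'
    'chromosome\tsrc\tstop_codon\t2502\t2504\t0.0\t+\t0\tgene_id "b";\n'
)
counts = io.StringIO(
    "942\t71\n943\t63\n944\t88\n945\t80\n946\t80\n947\t85\n948\t86\n"
    "949\t97\n950\t97\n951\t97\n952\t104\n953\t105\n954\t104\n955\t108\n"
)

starts = load_start_codons(annotation)
rows = list(count_before_start(counts, starts))

print("gene_id\tcount_before_start_codon")
for gene_id, run in rows:
    print(f"{gene_id}\t{run}")  # prints "b\t10" for the example data
```

For real files, the two `io.StringIO` objects can be replaced with open file handles; both functions read line by line, so neither file is loaded into memory as a whole.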