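1. Input file 1
File 1: column 1 is the gene name, column 2 is the exon start position, column 3 is the exon end position, and column 4 is the exon number.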
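As a minimal sketch of this layout, the snippet below parses a few tab-separated rows with pandas and labels the four columns. The gene name `geneA`, the coordinates, and the column names `gene`, `start`, `end`, `exon_no` are made up for illustration; the real file has no header row.

```python
import io
import pandas as pd

# Hypothetical sample in the file 1 layout (tab-separated, no header row)
sample_exons = "geneA\t100\t200\t1\ngeneA\t300\t400\t2\n"

# Column names are illustrative labels, not part of the file itself
exons = pd.read_csv(
    io.StringIO(sample_exons),
    sep="\t",
    header=None,
    names=["gene", "start", "end", "exon_no"],
)
print(exons)
```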
2. Input file 2:
Note: column 1 is the gene ID; column 2 is the SNP's position on the genome.
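A small sketch of reading this two-column layout and grouping SNP positions by gene, so that each gene's exons only need to be looked up once. The sample rows and the names `gene`, `pos`, and `snps_by_gene` are assumptions for illustration, not taken from the real data.

```python
import io
import pandas as pd

# Hypothetical sample in the file 2 layout (tab-separated, no header row)
sample_snps = "geneA\t150\ngeneA\t250\ngeneB\t40\n"

snps = pd.read_csv(
    io.StringIO(sample_snps),
    sep="\t",
    header=None,
    names=["gene", "pos"],
)

# Map each gene ID to the list of its SNP positions
snps_by_gene = snps.groupby("gene")["pos"].apply(list).to_dict()
print(snps_by_gene)
```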
3. Output file:
The code:
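# Determine whether each SNP falls in an exon or an intron.
# Assumes the rows of file 1 are sorted by start position within each gene.
import pandas as pd

# File 1: gene name (index), exon start, exon end, exon number
df1 = pd.read_table("44.txt", header=None, index_col=0)
# File 2: gene ID (index), SNP position on the genome
df2 = pd.read_table("43out-2.txt", header=None, index_col=0)

for i in range(df2.shape[0]):
    gene, pos = df2.index[i], df2.iloc[i, 0]
    for j in range(df1.shape[0]):
        if df1.index[j] != gene:
            continue
        # Inside exon j (both ends inclusive)
        if df1.iloc[j, 0] <= pos <= df1.iloc[j, 1]:
            print(gene, pos, " : exon" + str(df1.iloc[j, 2]))
        # An intron needs a following exon of the same gene, so guard j + 1;
        # strict bounds keep a SNP on an exon boundary from being reported twice
        elif (j + 1 < df1.shape[0] and df1.index[j + 1] == gene
              and df1.iloc[j, 1] < pos < df1.iloc[j + 1, 0]):
            print(gene, pos, " : intron" + str(df1.iloc[j, 2]))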
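
The nested loop above compares every SNP with every exon row. As an alternative sketch (not the original script's approach), the same classification can be written as a function over one gene's exon list, which makes the boundary cases easy to check. The function name `classify_snp`, the sample coordinates, and the `"outside"` label for positions before the first or after the last exon are assumptions for illustration; exons are assumed to be sorted by start, non-overlapping, and inclusive at both ends.

```python
def classify_snp(pos, exons):
    """Classify a position against one gene's exons.

    exons: list of (start, end, exon_no) tuples, sorted by start,
    non-overlapping, with inclusive coordinates (an assumption).
    """
    for k, (start, end, exon_no) in enumerate(exons):
        if start <= pos <= end:
            return f"exon{exon_no}"
        # An intron lies strictly between this exon and the next one
        if k + 1 < len(exons) and end < pos < exons[k + 1][0]:
            return f"intron{exon_no}"
    # Position lies before the first exon or after the last one
    return "outside"


# Hypothetical exons for a single gene
gene_exons = [(100, 200, 1), (300, 400, 2)]
for p in (150, 250, 300, 450):
    print(p, classify_snp(p, gene_exons))
```

Keeping the classification in a pure function separates the interval logic from file reading, so it could be applied per gene after grouping the SNPs.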