
Python | Converting CDS sequences to PEP sequences


Given the nucleotide sequences of protein-coding genes, the corresponding amino acid sequences can be obtained with the Biopython module for Python.

Extracting the nucleotide sequences

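The original FASTA file that holds the nucleotide sequences also contains descriptive information such as each sequence's id. Only the nucleotide sequences are needed, so a for loop can iterate over every line and write the even-numbered lines to a new text file. This relies on each record occupying exactly two lines: a header line followed by a single, unwrapped sequence line.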

# open the input FASTA file
file1 = open('D:/.../PCGs/cytb/cytb.fas', 'r')
# create another file to store the even-numbered lines
file2 = open('D:/.../PCGs/cytb/cytb_no_label.fas', 'w')
# read all lines and write the even-numbered ones (odd 0-based index) to the other file
lines = file1.readlines()
for i in range(0, len(lines)):
    if i % 2 != 0:
        file2.write(lines[i])
# close the files
file1.close()
file2.close()
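The even-line approach above drops sequence data if a record wraps its sequence over several lines. A minimal sketch of a header-aware alternative, using only the standard library, is shown here. The function name `read_fasta` and the sample records are hypothetical and only serve as illustration.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# Hypothetical sample data standing in for a real .fas file
sample = io.StringIO(">td\nATGACC\nGGA\n>tj\nATGTTA\n")
records = list(read_fasta(sample))
print(records)
```

Keeping each header paired with its sequence also means the ids do not have to be re-attached by hand later.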

Converting CDS to PEP with Biopython

# import the Seq class from Biopython
from Bio.Seq import Seq
# note: Bio.Alphabet was removed in Biopython 1.78, so no alphabet is passed to Seq
# open the input and output files
file1 = open('D:/.../PCGs/atp6/atp6_no_label.fas')
file2 = open('D:/.../PCGs/atp6/atp6_aa_no_label.fas', 'w')
# create a list to store the nucleotide sequence of each row
dataMat = []
for line in file1.readlines():
    curLine = line.strip().split(" ")
    dataMat.append(curLine[:])
file1.close()
for i in dataMat[0:]:
    # list to string
    j = "".join(i)
    coding_dna = Seq(j)
    # translate with the invertebrate mitochondrial genetic code
    pep = coding_dna.translate(table="Invertebrate Mitochondrial")
    pep2 = str(pep)
    print(pep2)
    file2.write(pep2)
    file2.write("\n")
file2.close()
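The `table` argument matters because genetic codes differ. In the invertebrate mitochondrial code, ATA encodes Met, TGA encodes Trp, and AGA/AGG encode Ser, whereas in the standard code they are Ile, stop, and Arg. The following is a minimal pure-Python sketch of that difference, not a replacement for Biopython. The helper names `make_table` and `translate`, the example sequence, and the choice to drop trailing partial codons and map unknown codons to "X" are assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (NCBI translation tables 1 and 5)
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
INVERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"

def make_table(amino_acids):
    """Map each codon (TTT, TTC, ...) to its amino acid letter."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))

def translate(dna, table):
    """Translate codon by codon; trailing partial codons are ignored (simplification)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

cds = "ATGATATGAAGA"  # hypothetical example: ATG ATA TGA AGA
standard_pep = translate(cds, make_table(STANDARD))
mito_pep = translate(cds, make_table(INVERT_MITO))
print(standard_pep)  # MI*R
print(mito_pep)      # MMWS
```

Translating mitochondrial genes with the standard code would therefore introduce premature stop codons and wrong residues.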

Adding the original sequence information back to the PEP output

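The PEP output saved earlier contains only the amino acid sequences and no longer carries the original descriptive information such as the sequence ids. The script below adds it back.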

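file1 = open('D:/.../PCGs/nad6/nad6_aa_no_label.fas', 'r')
lines = []
for line in file1:
    # strip the trailing newline so that join() below does not produce blank lines
    lines.append(line.rstrip('\n'))
file1.close()
file1 = open('D:/.../PCGs/nad6/nad6_aa_no_label.fas', 'w')
# insert each header before its sequence (positions assume seven sequences in this order)
lines.insert(0, '>td')
lines.insert(2, '>tj')
lines.insert(4, '>to')
lines.insert(6, '>tchi')
lines.insert(8, '>tcae')
lines.insert(10, '>tp')
lines.insert(12, '>ma')
s = '\n'.join(lines)
file1.write(s)
file1.close()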
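The hard-coded insert positions above only fit a file with exactly seven sequences in a known order. A minimal sketch of a more general approach pairs each label with the peptide at the same position. The helper name `label_peptides` and the sample peptides are hypothetical; the labels are the first three from the script above.

```python
def label_peptides(labels, peptides):
    """Return FASTA text pairing each label with the peptide at the same position."""
    if len(labels) != len(peptides):
        raise ValueError("number of labels and peptides must match")
    return "".join(f">{label}\n{pep}\n" for label, pep in zip(labels, peptides))

labels = ["td", "tj", "to"]          # first three labels from the script above
peptides = ["MTLS", "MSLT", "MTLA"]  # hypothetical peptide sequences
fasta_text = label_peptides(labels, peptides)
print(fasta_text)
```

The length check makes a mismatch between labels and sequences fail loudly instead of silently shifting the headers.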