GATK variant analysis
The following is taken from the internet and has not been verified.
GATK variant analysis can be slow on large samples, so the work can be split by chromosome and run in parallel across multiple threads.
Here is a multithreaded Python script I wrote. It is for reference only; corrections are welcome.
#!/usr/bin/python3
import os
import threading
import time

bam_file = "a.mkdup.bam"
out_file_prefix = "flower"
chr_list = ["CHR01", "CHR02", "CHR03", "CHR04", "CHR05", "CHR06", "CHR07",
            "CHR08", "CHR09", "CHR10", "CHR11", "CHR12", "CHR13"]

# Build one HaplotypeCaller command line per chromosome.
muthreads = []
for chr in chr_list:
    threads_comonder_name = ("gatk HaplotypeCaller --intervals " + chr
                             + " -R /mnt/j/BSA/02-read-align/Tifrunner2.fasta -I " + bam_file
                             + " -ERC GVCF -O " + out_file_prefix + "-" + chr + ".erc.g.vcf")
    muthreads.append(threads_comonder_name)

exitFlag = 0

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter, comander):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter      # delay in seconds before the command starts
        self.comander = comander    # shell command this thread runs

    def run(self):
        print("Start thread: " + self.name)
        print_time(self.name, self.counter, 5, self.comander)
        print("Exit thread: " + self.name)

def print_time(threadName, delay, counter, comander):
    # Skip the job if an exit has been requested.
    if exitFlag:
        return
    time.sleep(delay)
    print(comander)
    os.system(comander)  # Call the operating system command line to process data

# Create the threads. The slice [0:11] takes only the first 11 of the 13 commands;
# use muthreads without the slice to process every chromosome.
threadlist = []
for i, threadsnu in enumerate(muthreads[0:11]):
    print(i)
    print(threadsnu)
    threadsnew = myThread(1, "Thread-" + str(i), 2, threadsnu)
    threadlist.append(threadsnew)

# Start all threads, then wait for every one of them to finish.
for threads in threadlist:
    threads.start()
for threads in threadlist:
    threads.join()
print("Exit the main thread after running")
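A bounded thread pool is another way to organize the same per-chromosome fan-out. The sketch below is not the script above. It reuses that script's file names and `gatk HaplotypeCaller` arguments, while the helper names `build_command`, `run_command` and `run_all` and the default `max_workers=4` are assumptions made for illustration. Passing the command as an argument list to `subprocess.run` avoids shell quoting problems, and `max_workers` caps how many GATK processes run at the same time.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Values copied from the script above; adjust them for your own data.
REFERENCE = "/mnt/j/BSA/02-read-align/Tifrunner2.fasta"
BAM_FILE = "a.mkdup.bam"
OUT_PREFIX = "flower"
CHROMOSOMES = ["CHR%02d" % n for n in range(1, 14)]  # CHR01 .. CHR13


def build_command(chrom):
    """Return the HaplotypeCaller argument list for one chromosome."""
    return [
        "gatk", "HaplotypeCaller",
        "--intervals", chrom,
        "-R", REFERENCE,
        "-I", BAM_FILE,
        "-ERC", "GVCF",
        "-O", "%s-%s.erc.g.vcf" % (OUT_PREFIX, chrom),
    ]


def run_command(args):
    """Run one command and return its exit code."""
    return subprocess.run(args).returncode


def run_all(chromosomes, max_workers=4, runner=run_command):
    """Run one job per chromosome, at most max_workers at a time.

    Returns a dict mapping chromosome name to exit code.
    The runner parameter (hypothetical) lets you swap in a dry-run function.
    """
    commands = [build_command(c) for c in chromosomes]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(runner, commands))
    return dict(zip(chromosomes, codes))
```

Calling `run_all(CHROMOSOMES)` starts the jobs; any chromosome whose exit code is non-zero can then be rerun on its own. For a dry run without GATK installed, pass `runner=lambda args: print(" ".join(args)) or 0`.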
Merging the per-chromosome VCF files of the same sample (the following is also taken from the internet and has not been verified)
# for i in {1..22} X Y ;do echo "-I final_chr$i.vcf" '\';done
# for i in {10..19} {1..9} M X Y ;do echo "-I final_chr$i.vcf" '\';done
module load java/1.8.0_91
GATK=/home/jianmingzeng/biosoft/GATK/gatk-4.0.3.0/gatk
$GATK GatherVcfs \
-I final_chr1.vcf \
-I final_chr2.vcf \
-I final_chr3.vcf \
-I final_chr4.vcf \
-I final_chr5.vcf \
-I final_chr6.vcf \
-I final_chr7.vcf \
-I final_chr8.vcf \
-I final_chr9.vcf \
-I final_chr10.vcf \
-I final_chr11.vcf \
-I final_chr12.vcf \
-I final_chr13.vcf \
-I final_chr14.vcf \
-I final_chr15.vcf \
-I final_chr16.vcf \
-I final_chr17.vcf \
-I final_chr18.vcf \
-I final_chr19.vcf \
-I final_chr20.vcf \
-I final_chr21.vcf \
-I final_chr22.vcf \
-I final_chrX.vcf \
-I final_chrY.vcf \
-O merge.vcf
Note that when merging, the VCF files must be listed in the same order as the contig lines in each VCF file's header.
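The ordering rule can be applied programmatically instead of by hand. This is a minimal sketch, assuming the per-chromosome files are uncompressed and that their headers carry `##contig=<ID=...>` lines; `contig_order` and `gather_args` are hypothetical helper names, and the header and file mapping at the bottom are made-up sample input. It reads the contig order from the header lines and emits the `-I` arguments in that order.

```python
import re

# Matches a VCF header contig line and captures the contig ID.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_order(header_lines):
    """Return contig IDs in the order they appear in VCF header lines."""
    order = []
    for line in header_lines:
        if not line.startswith("#"):
            break  # the header is over once variant records begin
        match = CONTIG_RE.match(line)
        if match:
            order.append(match.group(1))
    return order


def gather_args(header_lines, vcf_by_contig):
    """Build the -I arguments for GatherVcfs in header contig order.

    vcf_by_contig maps contig ID -> VCF path; contigs without a file are skipped.
    """
    args = []
    for contig in contig_order(header_lines):
        if contig in vcf_by_contig:
            args += ["-I", vcf_by_contig[contig]]
    return args


# Illustrative input only: a tiny header and an unordered file mapping.
header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1>",
    "##contig=<ID=chr2>",
    "##contig=<ID=chrX>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
files = {"chrX": "final_chrX.vcf", "chr2": "final_chr2.vcf", "chr1": "final_chr1.vcf"}
print(gather_args(header, files))
```

With a real file, an open file handle can be passed as `header_lines`, since the function iterates line by line and stops at the first record.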