GATK
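Add a new SNP filtering and annotation workflow, consisting of:
Filtering:
    gatk VariantFiltration \
        -R reference.fasta \
        -V input.vcf.gz \
        -O output.vcf.gz \
        -filter "QD < 4.0 || FS > 60.0 || MQ < 40.0" \
        --filter-name "tobacco" \
        -window 4 \
        -G-filter "GQ < 20.0"
These filter parameters should be exposed so that they can be configured.
Annotation: SnpEff.
The VCF that comes out of GATK is the complete, unfiltered call set.
After filtering, run SnpEff on the filtered VCF as well and check the results.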
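One possible way to expose the filter thresholds is through environment variables with defaults, roughly as sketched here. The variable names (QD_MIN, FS_MAX, MQ_MIN, GQ_MIN, CLUSTER_WINDOW, FILTER_NAME, GT_FILTER_NAME, DRY_RUN) are made up for this sketch. The genotype filter name "lowGQ" is also an assumption: as far as I know, GATK expects one -G-filter-name per -G-filter expression, and the note above does not give one. By default the script only prints the command; the printed line drops the shell quoting, so it is for inspection rather than copy-paste.

```shell
#!/bin/sh
# Sketch only: VariantFiltration thresholds exposed as environment variables.
# All variable names and the "lowGQ" genotype filter name are assumptions
# made for illustration; the defaults mirror the command in the note.

QD_MIN="${QD_MIN:-4.0}"
FS_MAX="${FS_MAX:-60.0}"
MQ_MIN="${MQ_MIN:-40.0}"
GQ_MIN="${GQ_MIN:-20.0}"
CLUSTER_WINDOW="${CLUSTER_WINDOW:-4}"
FILTER_NAME="${FILTER_NAME:-tobacco}"
GT_FILTER_NAME="${GT_FILTER_NAME:-lowGQ}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = reference FASTA, $2 = input VCF, $3 = output VCF
filter_variants() {
  run gatk VariantFiltration \
    -R "$1" -V "$2" -O "$3" \
    -filter "QD < ${QD_MIN} || FS > ${FS_MAX} || MQ < ${MQ_MIN}" \
    --filter-name "$FILTER_NAME" \
    -window "$CLUSTER_WINDOW" \
    -G-filter "GQ < ${GQ_MIN}" \
    -G-filter-name "$GT_FILTER_NAME"
}

filter_variants reference.fasta input.vcf.gz output.vcf.gz
```

Saved under a hypothetical name such as filter.sh, something like `QD_MIN=2.0 DRY_RUN=0 sh filter.sh` would then run the filter with a looser QD cutoff.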
https://haplotypecaller1.rssing.com/chan-10646605/all_p46.html
- Merging VCF files
There are three main reasons why you might want to combine variants from different files into one, and the tool to use depends on what you are trying to achieve.
The most common case is when you have been parallelizing your variant calling analyses, e.g. running HaplotypeCaller per-chromosome, producing separate VCF files (or GVCF files) per-chromosome. For that case, you can use the Picard tool MergeVcfs to merge the files. See the relevant Tool Doc page for usage details.
The second case is when you have been using HaplotypeCaller in -ERC GVCF or -ERC BP_RESOLUTION mode to call variants on a large cohort, producing many GVCF files. You then need to consolidate them before joint-calling variants with GenotypeGVCFs (for performance reasons). This can be done with either the CombineGVCFs or the GenomicsDBImport tool, both of which are specifically designed to handle GVCFs in this way. See the relevant Tool Doc pages for usage details and the Best Practices workflow documentation to learn more about the logic of this workflow.
The third case is when you want to compare variant calls that were produced from the same samples but with different methods. For example, you may be evaluating variant calls produced by different variant callers, by different workflows, or by the same workflow with different parameters. For this case, we recommend taking a different approach: rather than merging the VCF files (which can have all sorts of complicated consequences), you can use the VariantAnnotator tool to annotate one of the VCFs with the other treated as a resource. See the relevant Tool Doc page for usage details.
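The first two cases could look roughly like this on the command line. This is a sketch under assumptions: all file names are placeholders, the helper names (run, merge_vcfs, combine_gvcfs) are invented here, and DRY_RUN defaults to printing the commands instead of running them. GenomicsDBImport and the VariantAnnotator route for the third case are left out.

```shell
#!/bin/sh
# Sketch only: helper and file names are placeholders, not from the GATK docs.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Case 1: per-chromosome VCFs from a scattered run -> one VCF.
# Usage: merge_vcfs OUT.vcf.gz IN1.vcf.gz IN2.vcf.gz ...
# The body runs in a subshell so that out, n and f stay local.
merge_vcfs() (
  out=$1; shift
  n=$#
  for f in "$@"; do
    set -- "$@" -I "$f"   # append "-I file" for every input
  done
  shift "$n"              # drop the original bare file names
  run gatk MergeVcfs "$@" -O "$out"
)

# Case 2: many per-sample GVCFs -> one combined GVCF for GenotypeGVCFs.
# Usage: combine_gvcfs REF.fasta OUT.g.vcf.gz S1.g.vcf.gz S2.g.vcf.gz ...
combine_gvcfs() (
  ref=$1; out=$2; shift 2
  n=$#
  for f in "$@"; do
    set -- "$@" --variant "$f"
  done
  shift "$n"
  run gatk CombineGVCFs -R "$ref" "$@" -O "$out"
)

merge_vcfs merged.vcf.gz chr1.vcf.gz chr2.vcf.gz
combine_gvcfs reference.fasta cohort.g.vcf.gz sampleA.g.vcf.gz sampleB.g.vcf.gz
```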
GATK can convert a VCF into a table:
    gatk VariantsToTable \
        -R reference/hg38/Homo_sapiens_assembly38.fasta \
        -V output/output.vqsr.varfilter.pass.vcf.gz \
        -F CHROM -F POS -F FILTER -F TYPE -GF AD -GF DP \
        --show-filtered \
        -O output/output.vqsr.varfilter.pass.tsv
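Once the table exists, it can be inspected with ordinary text tools. The sketch below assumes a tab-separated file with the columns requested above, where each -GF field becomes a SAMPLE.FIELD column. The rows, the sample name NA12878 and the filter name lowQD are invented purely for illustration, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the table content below is made-up example data, not real output.

sample_tsv=$(
  printf 'CHROM\tPOS\tFILTER\tTYPE\tNA12878.AD\tNA12878.DP\n'
  printf 'chr1\t10146\tPASS\tINDEL\t12,9\t21\n'
  printf 'chr1\t10439\tlowQD\tSNP\t30,4\t34\n'
  printf 'chr1\t14464\tPASS\tSNP\t8,11\t19\n'
)

# Count records per FILTER value; the column is located by its header name.
count_by_filter() {
  awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FILTER") col = i; next }
    { n[$col]++ }
    END { for (k in n) print k "\t" n[k] }
  ' | LC_ALL=C sort
}

# Keep the header plus the SNP records that passed all filters
# (assumes FILTER is column 3 and TYPE is column 4, as requested with -F above).
pass_snps() {
  awk -F '\t' 'NR == 1 || ($3 == "PASS" && $4 == "SNP")'
}

printf '%s\n' "$sample_tsv" | count_by_filter
printf '%s\n' "$sample_tsv" | pass_snps
```

For a real run, the same functions would read the file directly, for example `count_by_filter < output/output.vqsr.varfilter.pass.tsv`.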
https://hpc.nih.gov/training/gatk_tutorial/
A practical introduction to GATK 4 on Biowulf (NIH HPC)
https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07013-y
Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
