Variant analysis
-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7253413/
Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes
Multi-nucleotide variants (MNVs) are defined as clusters of two or more nearby variants existing on the same haplotype in an individual [1,2] (Fig. 1a). When variants in an MNV are found within the same codon, the overall impact may differ from the functional consequences of the individual variants [3].
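To make the in-codon point concrete, here is a minimal shell sketch. It translates a codon with the standard genetic code and applies two SNVs to the codon CAC (His), first separately and then together on one haplotype. The codon, the two SNVs and the helper names `translate_codon` and `mutate` are hypothetical illustrations, not taken from the paper.

```sh
# Translate one uppercase DNA codon with the standard genetic code.
# Bases are indexed in T, C, A, G order; the 64-letter string lists the amino
# acid for TTT, TTC, TTA, TTG, TCT, ... GGG, with '*' marking stop codons.
translate_codon() {
  awk -v c="$1" 'BEGIN {
    code = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
    idx["T"] = 0; idx["C"] = 1; idx["A"] = 2; idx["G"] = 3
    i = 16 * idx[substr(c, 1, 1)] + 4 * idx[substr(c, 2, 1)] + idx[substr(c, 3, 1)]
    print substr(code, i + 1, 1)
  }'
}

# Return codon $1 with the base at 1-based position $2 replaced by base $3.
mutate() {
  awk -v c="$1" -v p="$2" -v b="$3" 'BEGIN { print substr(c, 1, p - 1) b substr(c, p + 1) }'
}

ref=CAC                      # hypothetical reference codon (His)
snv1=$(mutate "$ref" 1 T)    # TAC: first SNV alone
snv2=$(mutate "$ref" 3 A)    # CAA: second SNV alone
mnv=$(mutate "$snv1" 3 A)    # TAA: both SNVs on the same haplotype
echo "ref=$(translate_codon "$ref") snv1=$(translate_codon "$snv1") snv2=$(translate_codon "$snv2") mnv=$(translate_codon "$mnv")"
```

This prints `ref=H snv1=Y snv2=Q mnv=*`: each SNV alone is a missense change (His to Tyr, His to Gln), but the two together in cis form a stop codon, which variant-by-variant annotation would miss.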

Identification of MNVs requires the constituent variants to be properly phased: that is, to be identified accurately as either both occurring on the same haplotype (in cis) or on two different haplotypes (in trans). Phasing can be performed following three broad strategies: read-based phasing [18], which assesses whether nearby variants co-segregate on the same reads in DNA sequencing data; family-based phasing [19], which assesses whether pairs of variants are co-inherited within families; and population-based phasing [20], which leverages haplotype sharing between members of a large genotyped population to make a statistical inference of phase. Read-based phasing is particularly effective for pairs of nearby variants, making it suitable for the analysis of MNVs.
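The read-based idea can be illustrated with a toy counter for two heterozygous SNVs. A read that carries both ALT alleles (or both REF alleles) supports cis, while a read with one ALT and one REF supports trans. This is a simplified majority vote for illustration only. The input format (read id plus R/A allele codes) and the function name `call_phase` are assumptions. Real read-backed phasers work directly from BAM alignments and use probabilistic models rather than raw counts.

```sh
# Toy read-based phasing for two nearby heterozygous SNVs.
# stdin: one line per read, "<read_id> <allele_at_site1> <allele_at_site2>",
# alleles coded R (reference) or A (alternate); any other code (e.g. "-" for a
# read that does not cover a site) causes the read to be ignored.
call_phase() {
  awk '
    $2 ~ /^[RA]$/ && $3 ~ /^[RA]$/ {
      if ($2 == $3) cis++     # A+A or R+R on one read: ALT alleles share a haplotype
      else trans++            # A+R or R+A: ALT alleles sit on different haplotypes
    }
    END {
      cis += 0; trans += 0
      if (cis > trans) phase = "cis"
      else if (trans > cis) phase = "trans"
      else phase = "unresolved"
      printf "%s\tcis_reads=%d\ttrans_reads=%d\n", phase, cis, trans
    }'
}

# Example: prints "cis", "cis_reads=3", "trans_reads=1" (tab-separated); r5 is skipped.
printf 'r1 A A\nr2 R R\nr3 A A\nr4 A R\nr5 A -\n' | call_phase
```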
-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6957021/
Misannotation of multiple-nucleotide variants risks misdiagnosis

To investigate whether using alternative tools results in correct annotation of MNVs, we re-processed the VCF file of simulated MNVs using GATK 3.6.0 ReadBackedPhasing [10] (default parameters plus "-maxDistMNP 2 -enableMergeToMNP") or MAC 1.2 [9], then annotated the resulting VCF files using Alamut batch version 1.5.2 (Interactive Biosoftware, Rouen, France). We also tested re-calling the variants using VarDict 1.4 [7] and Platypus 0.8.1 [12].
Newer GATK releases (GATK4) no longer include the ReadBackedPhasing tool.
From the GATK documentation ("they" refers to the GATK variant callers): "However, they do not emit MNPs. If you would like to combine contiguous SNPs into MNPs, you will need to use the legacy ReadBackedPhasing tool in GATK3 with the MNP merging function activated. See the GATK3 tool documentation for details."
https://gatk.broadinstitute.org/hc/en-us/articles/360035530752-What-types-of-variants-can-GATK-tools-detect-or-handle-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521406/
MAC: identifying and correcting annotation for multi-nucleotide variations
https://github.com/leiwei-bioinfo/MAC
GATK and Picard variant manipulation tools are currently able to recognize the following types of alleles:
SNP (single nucleotide polymorphism)
INDEL (insertion/deletion)
MIXED (combination of SNPs and indels at a single position)
MNP (multi-nucleotide polymorphism, e.g. a dinucleotide substitution)
SYMBOLIC (such as the <NON_REF> allele used in GVCFs produced by HaplotypeCaller, the * allele used to signify the presence of a spanning deletion, or undefined events like a very large allele or one that's fuzzy and not fully modeled; i.e. there's some event going on here but we don't know what exactly)
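A simplified shell sketch of how these allele types can be told apart from a VCF REF/ALT pair. Single bases of equal length give SNP, equal-length longer alleles give MNP, and different lengths give INDEL. An ALT of `*` or `<...>` gives SYMBOLIC. A site whose ALT alleles fall into different classes is reported as MIXED. This only approximates the classification described above. It is not GATK/htsjdk code, and the function names are hypothetical. It also labels any combination of differing types as MIXED, which is broader than the "SNPs and indels" wording in the list.

```sh
# Classify one REF/ALT allele pair.
allele_type() {
  local ref="$1" alt="$2"
  case "$alt" in
    '*' | '<'*'>') echo SYMBOLIC; return ;;   # spanning deletion or symbolic allele
  esac
  if [ "${#ref}" -eq 1 ] && [ "${#alt}" -eq 1 ]; then
    echo SNP
  elif [ "${#ref}" -eq "${#alt}" ]; then
    echo MNP                                  # same length, more than one base
  else
    echo INDEL                                # REF and ALT lengths differ
  fi
}

# Classify a site from REF ($1) and a comma-separated ALT field ($2):
# one distinct allele type -> that type, several distinct types -> MIXED.
site_type() {
  printf '%s\n' "$2" | tr ',' '\n' | while IFS= read -r alt; do
    allele_type "$1" "$alt"
  done | sort -u | awk '{ t = $0 } END { print (NR > 1 ? "MIXED" : t) }'
}

site_type AC GT        # MNP
site_type A G,AT       # MIXED
```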
-
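According to the GATK documentation above, GATK does not emit MNPs (i.e. MNVs), which leaves two options: correct the annotation with MAC, or use a different caller. freebayes officially advertises MNP support. Note that newer GATK4 releases add a --max-mnp-distance option to HaplotypeCaller and Mutect2 for merging nearby phased substitutions into MNPs, so it is worth checking the GATK version in use.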
https://github.com/freebayes/freebayes
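MAC has not been maintained for some time, so using freebayes directly is recommended.
freebayes test: when several samples are called together, each sample's BAM needs its own read group (@RG) header with a sample name (SM), since freebayes assigns reads to samples by the SM tag.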
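# build the BWA index once (skip if ecoli.fasta is already indexed)
bwa index ecoli.fasta
bwa mem -R '@RG\tID:SRR10000374\tSM:SRR10000374' ecoli.fasta SRR10000374_1.fastq.gz SRR10000374_2.fastq.gz | samtools sort -o SRR10000374.bam -
bwa mem -R '@RG\tID:SRR10000377\tSM:SRR10000377' ecoli.fasta SRR10000377_1.fastq.gz SRR10000377_2.fastq.gz | samtools sort -o SRR10000377.bam -
samtools index SRR10000374.bam
samtools index SRR10000377.bam
# "list" is a text file with one BAM path per line (freebayes -L / --bam-list)
printf '%s\n' SRR10000374.bam SRR10000377.bam > list
../freebayes-1.3.6-linux-amd64-static -L list -f ecoli.fasta -v demo.vcf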
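freebayes records the allele class in the INFO TYPE field (snp, mnp, ins, del or complex, one value per ALT allele). This means the MNP calls in demo.vcf can be pulled out with a short awk filter. Below is a minimal sketch. The function name `mnp_records` is hypothetical, and header lines are kept so the output is still a VCF. If bcftools is installed, `bcftools view -v mnps demo.vcf` is an alternative that classifies records from their REF/ALT alleles instead of freebayes' own TYPE tag.

```sh
# Print VCF header lines plus records whose freebayes INFO/TYPE lists "mnp"
# for at least one ALT allele. Reads the files given as arguments, or stdin.
mnp_records() {
  awk -F '\t' '
    /^#/ { print; next }                       # keep header lines
    {
      n = split($8, info, ";")                 # column 8 is INFO
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^TYPE=/) {
          m = split(substr(info[i], 6), types, ",")
          for (j = 1; j <= m; j++)
            if (types[j] == "mnp") { print; next }
        }
      }
    }' "$@"
}

# Usage: mnp_records demo.vcf > demo.mnp.vcf
```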
CNV
https://gatk.broadinstitute.org/hc/en-us/articles/360035531452-After-gCNV-calling-considerations
https://gatk.broadinstitute.org/hc/en-us/articles/360035531152

https://www.biorxiv.org/content/10.1101/2021.04.30.442110v1.full
A comparison of tools for copy-number variation detection in germline whole exome and whole genome sequencing data

cnvpytor test log
Small server: /ceph_disk3/var_demo_data
sudo apt-get install python3-tk
pip3 install cnvpytor -i https://pypi.tuna.tsinghua.edu.cn/simple
If installing with conda, copy these data files from the source tree:
cp CNVpytor/cnvpytor/data/*.pytor /opt/miniconda3/lib/python3.7/site-packages/cnvpytor/data/
samtools index NA12877_S1.bam
cnvpytor -root 12877.pytor -rd NA12877_S1.bam
cnvpytor -root 12877.pytor -his 1000 10000 100000
cnvpytor -root 12877.pytor -partition 1000 10000 100000
cnvpytor -root 12877.pytor -call 1000 10000 100000
The following steps are optional; run them only when SNP data are used (controlled by a switch in CWL/WDL):
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -snp NA12877_S1.genome.vcf -sample NA12877
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -pileup NA12877_S1.bam
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -mask_snps
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -baf 1000 10000 100000
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -call baf 1000 10000 100000
Note: the next command needs a file describing the regions (redirected to stdin):
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -genotype 1000 10000 100000 <regions
/opt/miniconda3/bin/cnvpytor -root 12877.pytor -call combined 1000 10000 100000
A few more commands produce plots; do those together with the report.
Error:
Traceback (most recent call last):
  File "/home/anneng/.local/bin/cnvpytor", line 11, in <module>
    sys.exit(main())
  File "/home/anneng/.local/lib/python2.7/site-packages/cnvpytor/__main__.py", line 437, in main
    use_gc_corr=not args.no_gc_corr, use_mask=args.use_mask_with_rd)
  File "/home/anneng/.local/lib/python2.7/site-packages/cnvpytor/root.py", line 1502, in call
    distN = np.zeros_like(NN, dtype="long") - 1
  File "/home/anneng/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 168, in zeros_like
    res = empty_like(a, dtype=dtype, order=order, subok=subok)
TypeError: data type "long" not understood
Fix: cnvpytor must be run with Python 3. The traceback shows it was installed and run under Python 2.7 (~/.local/lib/python2.7); install it with pip3 as above.
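The notes put the SNP/BAF steps behind a switch in CWL/WDL. The same idea in plain shell, as a minimal sketch rather than the actual workflow: the variables USE_SNP and DRY_RUN and the function names `run` and `run_cnvpytor` are hypothetical. The bin sizes are the ones used above. The `-genotype` step is left out because it needs a region file on stdin. With DRY_RUN=1 the commands are only printed, which makes it easy to check what a given switch setting would run.

```sh
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# cnvpytor read-depth calling, plus the optional SNP/BAF steps when USE_SNP=1.
# Arguments: <root.pytor> <sample.bam> <snps.vcf> <sample_name>
run_cnvpytor() {
  local root="$1" bam="$2" vcf="$3" sample="$4" bins="1000 10000 100000"
  # Read-depth steps (always run). $bins is unquoted on purpose so it splits
  # into separate bin-size arguments.
  run cnvpytor -root "$root" -rd "$bam"
  run cnvpytor -root "$root" -his $bins
  run cnvpytor -root "$root" -partition $bins
  run cnvpytor -root "$root" -call $bins
  if [ "${USE_SNP:-0}" = 1 ]; then
    # Optional SNP/BAF steps, in the same order as in the notes above.
    run cnvpytor -root "$root" -snp "$vcf" -sample "$sample"
    run cnvpytor -root "$root" -pileup "$bam"
    run cnvpytor -root "$root" -mask_snps
    run cnvpytor -root "$root" -baf $bins
    run cnvpytor -root "$root" -call baf $bins
    run cnvpytor -root "$root" -call combined $bins
  fi
}

# Example: DRY_RUN=1 USE_SNP=1 run_cnvpytor 12877.pytor NA12877_S1.bam NA12877_S1.genome.vcf NA12877
```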
CEPH 1463 pedigree data
https://catalog.coriell.org/0/Sections/Collections/NIGMS/CEPHFamiliesDetail.aspx?PgId=441&fam=1463&
https://www.illumina.com/platinumgenomes.html
https://console.cloud.google.com/storage/browser/genomics-public-data/platinum-genomes/vcf?pageState=("StorageObjectListTable":("f":"%255B%255D"))&prefix=&forceOnObjectsSortingFiltering=false