Converting common bioinformatics formats into unified Parquet files
-
https://github.com/BlueGranite/azure-synapse-vcf-analysis/blob/main/ConvertVCFsToParquet.md
https://techcommunity.microsoft.com/t5/healthcare-and-life-sciences/genomic-data-in-parquet-format-on-azure/ba-p/3150554
https://techcommunity.microsoft.com/t5/healthcare-and-life-sciences/convert-synthetic-fhir-and-pacbio-vcf-data-to-parquet-and/ba-p/3577038
The Parquet conversions on Microsoft Azure (links above) mainly rely on Glow.
https://medium.com/23andme-engineering/genetic-datastore-4b213256db31
https://github.com/natir/vcf2parquet
A Rust project. Many small tools like this seem to be written in Rust, possibly for better performance.
https://github.com/BigDataWUR/tomatula
https://documentation.dnanexus.com/user/spark/example-applications/vcf-loader
https://adam.readthedocs.io/en/latest/api/genomicDataset/
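
To make the goal in the title concrete, here is a minimal sketch of the core step these tools share: turning row-oriented VCF records into named columns, which is the layout Parquet stores. It is not taken from any of the linked projects. The function name `vcf_to_columns` and the sample records are hypothetical, and only the eight fixed VCF columns are handled (no per-sample genotype fields, no INFO parsing).

```python
from typing import Dict, Iterable, List

# The eight fixed, tab-separated columns defined by the VCF format.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def vcf_to_columns(lines: Iterable[str]) -> Dict[str, List]:
    """Collect the fixed VCF fields into one list per column (hypothetical helper)."""
    columns: Dict[str, List] = {name: [] for name in VCF_FIXED_COLUMNS}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, "##" meta lines, and the "#CHROM" header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < len(VCF_FIXED_COLUMNS):
            raise ValueError(f"expected at least 8 tab-separated fields: {line!r}")
        for name, value in zip(VCF_FIXED_COLUMNS, fields):
            columns[name].append(value)
    # POS is an integer position; the other fields are kept as strings here.
    columns["POS"] = [int(pos) for pos in columns["POS"]]
    return columns


# Made-up records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\trs1\tA\tG\t50\tPASS\tDP=10",
    "chr2\t67890\t.\tC\tT\t99\tPASS\tDP=25",
]
columns = vcf_to_columns(example_vcf)
```

If `pyarrow` is installed, a dictionary like this could be passed to `pyarrow.Table.from_pydict(columns)` and written with `pyarrow.parquet.write_table(table, "variants.parquet")`. For real datasets, the tools linked above (Glow, vcf2parquet, ADAM) are likely the better choice, since they also handle genotypes and large files.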