暗能星系


    Public datasets

    Bioinformatics analysis
    • anneng (last edited by anneng)

      The Genome in a Bottle Consortium
      https://www.nist.gov/programs-projects/genome-bottle

      https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/
      https://github.com/genome-in-a-bottle

      • anneng (last edited by anneng)

        International HapMap Project
        https://www.genome.gov/10001688/international-hapmap-project
        The DNA sequence of any two people is 99.5 percent identical. The variations, however, may greatly affect an individual's disease risk. Sites in the DNA sequence where individuals differ at a single DNA base are called single nucleotide polymorphisms (SNPs). Sets of nearby SNPs on the same chromosome are inherited in blocks. This pattern of SNPs on a block is a haplotype. Blocks may contain a large number of SNPs, but a few SNPs are enough to uniquely identify the haplotypes in a block. The HapMap is a map of these haplotype blocks and the specific SNPs that identify the haplotypes are called tag SNPs.
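The claim that a few SNPs are enough to identify every haplotype in a block can be illustrated with a toy search. This is only a minimal sketch, not the method the HapMap project used: the four six-site haplotypes are made up for illustration, and the helper name `find_tag_snps` is hypothetical. It tries SNP subsets of increasing size and returns the first one whose alleles differ between every pair of haplotypes.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of site indices that tells all haplotypes apart.

    Brute-force search, so only suitable for small toy blocks.
    Assumes all haplotypes have the same length and are distinct.
    """
    n_sites = len(haplotypes[0])
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # Alleles each haplotype carries at the candidate tag positions
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    # Only reached if two haplotypes are identical; fall back to every site
    return tuple(range(n_sites))


# Hypothetical block: four haplotypes observed over six SNP sites
haplotypes = ["ACGTAC", "ACGTGT", "TTGAAC", "TTCAGT"]
tags = find_tag_snps(haplotypes)
print(tags)  # indices of the tag SNPs (0-based)
```

In this toy block, two of the six sites are enough to separate all four haplotypes, so genotyping only those two would identify which haplotype a chromosome carries.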

        • anneng (edited)

          https://www.personalgenomes.org/us
          The Personal Genome Project, initiated in 2005, is a vision and coalition of projects across the world dedicated to creating public genome, health, and trait data. Sharing data is critical to scientific progress, but has been hampered by traditional research practices. The PGP approach is to invite willing participants to publicly share their personal data for the greater good.

          • anneng (last edited by anneng)

            The International Genome Sample Resource
            The 1000 Genomes Project created a catalogue of common human genetic variation, using openly consented samples from people who declared themselves to be healthy. The reference data resources generated by the project remain heavily used by the biomedical science community.

            The International Genome Sample Resource (IGSR) maintains and shares the human genetic variation resources built by the 1000 Genomes Project. We also update the resources to the current reference assembly, add new data sets generated from the 1000 Genomes Project samples and add data from projects working with other openly consented samples.

            https://www.internationalgenome.org/human-genome-structural-variation-consortium/
            The Human Genome Structural Variation Consortium (HGSV) creates high-quality maps of human structural variation and develops new methods, taking advantage of the burgeoning array of genomics assays now available to define genomic structure.

            • anneng (edited)

              https://ddbj.nig.ac.jp/resource/bioproject/PRJEB31736
              We sequenced all 2,504 samples from the 1000 Genomes (1KG) Project to a minimum of 30x mean genome coverage. Although a small number of 1KG samples had previously been sequenced to high coverage, we sequenced all samples to depth on the latest technology, providing a unified dataset for the next phase of analyses.

              We processed these samples using the laboratory processes we previously used for the CCDG project, with minor modifications. Specifically, we generated PCR-free sequencing libraries with unique dual indices to avoid index switching, which causes low-level sequencing data contamination on Illumina patterned flow cells. We sequenced the samples on the Illumina NovaSeq 6000 instrument with 2x150 bp reads. We believe this instrument represents the future of WGS with short-read technology, and it was important to sequence the 1KG samples in a format consistent with future large-scale sequencing projects.

              Our automated analysis pipeline for whole-genome sequencing matches the CCDG and TOPMed recommended best practices. Sequencing reads were aligned to the human reference, hs38DH, using BWA-MEM v0.7.15. Data were further processed using the GATK best practices (v3.5), which generates VCF files in the 4.2 format. Single nucleotide variants and indels were called using GATK HaplotypeCaller (v3.5), which generates a single-sample GVCF.
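To get a feel for the scale implied by 30x coverage with 2x150 bp reads, here is a rough back-of-the-envelope sketch. The genome size of about 3.1 Gb is my assumption, not a figure from the project description, and the helper name `read_pairs_for_coverage` is hypothetical. The calculation ignores duplicates, unmapped reads, and quality filtering, so real sequencing yield would need to be higher.

```python
def read_pairs_for_coverage(target_coverage, genome_size_bp, read_length_bp):
    """Minimum number of read pairs needed to reach a mean coverage.

    Mean coverage = total sequenced bases / genome size, and each
    paired-end fragment contributes 2 * read_length_bp bases.
    """
    bases_per_pair = 2 * read_length_bp
    total_bases = target_coverage * genome_size_bp
    # Integer ceiling division, so we never round below the target
    return (total_bases + bases_per_pair - 1) // bases_per_pair


GENOME_SIZE_BP = 3_100_000_000  # assumption: approximate human genome size
N_SAMPLES = 2504                # from the project description

pairs_per_sample = read_pairs_for_coverage(30, GENOME_SIZE_BP, 150)
total_pairs = pairs_per_sample * N_SAMPLES

print(f"read pairs per sample: {pairs_per_sample:,}")
print(f"read pairs across all samples: {total_pairs:,}")
```

Under these assumptions each sample needs on the order of 310 million read pairs, which is one reason a high-throughput instrument matters for a cohort of this size.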
