Assembly
-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5503144/
Phased Diploid Genome Assembly with Single Molecule Real-Time Sequencing
most assemblers output a “mosaic” genome sequence that arbitrarily alternates between parental alleles
Most assemblers stitch the alleles of the two parental haplotypes together arbitrarily.
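A toy sketch of what a "mosaic" sequence looks like. The function name, the block size, and the two parental sequences are made up for illustration; a real assembler does not work on fixed-size blocks, it simply has no phasing information when it picks a path.

```python
def mosaic_from_haplotypes(maternal, paternal, block_choices, block_size):
    """Build a collapsed sequence by copying each block from one parent.

    block_choices is a string of 'M' / 'P', one letter per block, standing in
    for the arbitrary choice an unphased assembler makes (hypothetical model).
    """
    if len(maternal) != len(paternal):
        raise ValueError("toy model assumes equal-length haplotypes")
    if len(block_choices) * block_size < len(maternal):
        raise ValueError("not enough block choices to cover the sequence")
    pieces = []
    for i, choice in enumerate(block_choices):
        start, end = i * block_size, (i + 1) * block_size
        source = maternal if choice == "M" else paternal
        pieces.append(source[start:end])
    return "".join(pieces)


# Hypothetical haplotypes that differ at three heterozygous sites.
maternal = "AAAAGGGGCCCC"
paternal = "AAATGGGACCCT"
mosaic = mosaic_from_haplotypes(maternal, paternal, "MPM", block_size=4)
print(mosaic)
```

The result carries the maternal allele at the first and third site and the paternal allele at the second, so it matches neither parental haplotype.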
Why are humans diploid while the reference genome contains only one set of chromosomes?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305238/
Extending reference assembly models
Employing a clone-based approach, the sequence of each clone represented a single haplotype from a given donor. At clone boundaries, however, haplotypes could switch abruptly, creating a mosaic structure. This design introduced errors within regions of complex structural variation, when sequences unique to one haplotype prevented construction of clone overlaps.
Many alignment and analysis tools penalize reads that align to more than one location under the assumption that the location of these reads cannot be resolved owing to paralogous sequences in the genome. These tools do not distinguish allelic duplication, added by the alternative loci, from paralogous duplication found in the genome, thus confounding repeat and mappability calculations, paired-end placements and downstream interpretation of alignments in regions with alternative loci.
During alignment, software cannot tell which of the homologous sequences a read actually belongs to.
https://en.wikipedia.org/wiki/Reference_genome#:~:text=As they are assembled from,DNA sequences from each donor.
As they are assembled from the sequencing of DNA from a number of individual donors, reference genomes do not accurately represent the set of genes of any single individual organism. Instead a reference provides a haploid mosaic of different DNA sequences from each donor.
https://www.bio-itworld.com/news/2014/06/30/the-hunt-for-a-new-human-reference-genome
-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3227110/
Assemblathon 1: A competitive assessment of de novo short read assembly methods
-
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1774-4
Is it time to change the reference genome?
a consensus genome represents the most common alleles and variants within a population

-
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022571/
A graph-based approach to diploid genome assembly
However, across the short, long and hybrid categories, most assemblers require collapsing the two genome sequences of a diploid sample into a single haploid ‘consensus’ sequence (or primary contig). The consensus sequence is obtained by merging the distinct alleles at regions of heterozygosity into a single allele, and therefore losing a lot of information. The resulting haploid de novo assembly does not represent the true characteristics of the diploid input genome.
-
https://gigascience.biomedcentral.com/articles/10.1186/2047-217X-1-18
SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2813482/
De novo assembly of human genomes with massively parallel short read sequencing

(iv) merging the bubbles that were caused by repeats or heterozygotes of diploid chromosomes.
SOAPdenovo (other assemblers behave similarly) has a step that merges heterozygous bubbles, so the resulting assembly is haploid.
Merging bubbles
We used Dijkstra's algorithm to detect bubbles, which is similar to the “Tour-bus” method in Velvet. We merged the detected bubbles into a single path if the sequences of the parallel paths were very similar; that is, only had a single base pair difference or had fewer than four base pairs difference with >90% identity.
-
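The merge rule quoted from the SOAPdenovo paper (a single base difference, or fewer than four differing bases with >90% identity) can be sketched as a small predicate. This is only an illustration under simplifying assumptions: the two parallel paths are taken to be the same length, so differences are counted as mismatches and indels are ignored. The function names are hypothetical and this is not SOAPdenovo's code.

```python
def count_mismatches(path_a, path_b):
    """Count mismatching positions between two equal-length paths."""
    if len(path_a) != len(path_b):
        raise ValueError("this sketch only handles equal-length paths")
    return sum(a != b for a, b in zip(path_a, path_b))


def should_merge_bubble(path_a, path_b):
    """Apply the quoted similarity rule to the two branches of a bubble."""
    if not path_a:
        raise ValueError("paths must be non-empty")
    diffs = count_mismatches(path_a, path_b)
    matches = len(path_a) - diffs
    # Integer form of "identity > 90%" to avoid floating-point edge cases.
    high_identity = matches * 100 > 90 * len(path_a)
    # Identical paths (0 differences) are treated as trivially mergeable.
    return diffs <= 1 or (diffs < 4 and high_identity)


# A heterozygous SNP: the two branches differ at one base, so they are merged.
print(should_merge_bubble("ACGTACGTAC", "ACGTACGTAT"))
```

Merging under this rule keeps one of the two branches, which is why a heterozygous site ends up represented by a single allele in the output.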
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4083584/
Genetic anchoring of whole-genome shotgun assemblies
The process of assigning chromosomal locations to contigs of an assembly is referred to as anchoring.
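A rough sketch of what anchoring could look like in the simplest case, assuming each contig is hit by genetic markers whose chromosome and map position (in cM) are already known. The majority-vote and median rules, the function name, and the data are all assumptions made for illustration; they are not the method of the paper.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_contigs(marker_hits):
    """Assign each contig a chromosome and an approximate map position.

    marker_hits: iterable of (contig, chromosome, position_cM) tuples.
    """
    hits_by_contig = defaultdict(list)
    for contig, chromosome, position in marker_hits:
        hits_by_contig[contig].append((chromosome, position))

    anchors = {}
    for contig, hits in hits_by_contig.items():
        # Majority vote over the chromosomes of the markers on this contig.
        chromosome, _votes = Counter(c for c, _ in hits).most_common(1)[0]
        positions = [p for c, p in hits if c == chromosome]
        anchors[contig] = (chromosome, median(positions))
    return anchors


# Made-up marker hits; one marker on contig1 disagrees and is outvoted.
hits = [
    ("contig1", "chr1", 10.0),
    ("contig1", "chr1", 12.0),
    ("contig1", "chr2", 50.0),
    ("contig2", "chr3", 5.5),
]
print(anchor_contigs(hits))
```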
-
Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species.
Inferring synteny between genome assemblies: a systematic evaluation
https://www.nature.com/scitable/topicpage/synteny-inferring-ancestral-genomes-44022/
An explanation of synteny.
-
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04591-4
gcaPDA: a haplotype-resolved diploid assembler

-
https://academic.oup.com/nar/article/44/12/e113/2457531
Redundans: an assembly pipeline for highly heterozygous genomes

-

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1911-6
Reference-guided de novo assembly approach improves genome reconstruction for related species
-
https://www.frontiersin.org/articles/10.3389/fpls.2018.01660/full
Current Strategies of Polyploid Plant Genome Sequence Assembly


A reference genome is a digital, linear nucleic acid sequence containing only a single set of chromosomes plus any unanchored heterozygous contigs and/or scaffolds. A reference genome is used to observe variations across different individuals within a species, to study evolution and to aid genome assembly.
-
https://www.melbournebioinformatics.org.au/tutorials/tutorials/hybrid_assembly/nanopore_assembly/
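According to this tutorial, QUAST can use a reference genome for quality control, and BUSCO can use homologous (orthologous) genes for quality control.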
