Single-cell analysis

Single-cell foundation models

anneng — Wed, 16 Oct 2024 02:51:31 GMT

https://medium.com/helical-ai/single-cell-bio-foundation-models-a-beginners-overview-3730a0731bdd

Cell type annotation

anneng — Fri, 30 Aug 2024 07:59:44 GMT

https://www.sciencedirect.com/science/article/pii/S2215016123001966
Negative markers combined with positive markers can increase the specificity of cell type identification, reducing the likelihood of misclassification and improving the overall accuracy of the analysis [3]. By default, the signature genes are expected to be highly expressed in one cell type compared to all other cell types. However, depending on the underlying data, these canonical markers may not be enough to segregate cell types with similar expression profiles (e.g., sub-groups of T-cells in the human blood). When this occurs, genes that are expected not to be detected in a specific cell type (e.g., CD8A in CD4 T cells) can be utilized to improve segregation. Therefore, genes that are characteristically lowly expressed in one cell type compared to the other cell types are introduced as the “negative markers”.
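The effect of negative markers can be sketched with a toy scoring rule. The rule below (mean expression of positive markers minus mean expression of negative markers) is an assumption chosen for illustration, not the scoring used in the linked article; the marker lists and expression values are made up.

```python
def mean(values):
    """Arithmetic mean, defined as 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def marker_score(expr, positive, negative=()):
    """Toy score: mean of positive markers minus mean of negative markers."""
    pos = mean([expr.get(gene, 0.0) for gene in positive])
    neg = mean([expr.get(gene, 0.0) for gene in negative])
    return pos - neg


# Hypothetical signatures for two T-cell sub-groups with similar profiles.
SIGNATURES = {
    "CD4 T": {"positive": ["CD3D", "CD4"], "negative": ["CD8A"]},
    "CD8 T": {"positive": ["CD3D", "CD8A"], "negative": ["CD4"]},
}


def annotate(expr, signatures, use_negative=True):
    """Return the best-scoring cell type and the per-type scores."""
    scores = {}
    for cell_type, sig in signatures.items():
        negative = sig["negative"] if use_negative else ()
        scores[cell_type] = marker_score(expr, sig["positive"], negative)
    return max(scores, key=scores.get), scores


# Made-up normalized expression for one cell.
cell = {"CD3D": 2.0, "CD4": 0.2, "CD8A": 1.8}
label_pos, scores_pos = annotate(cell, SIGNATURES, use_negative=False)
label_neg, scores_neg = annotate(cell, SIGNATURES)
```

In this example both variants pick the same label, but the gap between the two candidate scores is wider once negative markers are included, which is the "improved segregation" the paragraph describes.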

merge vs integrate

anneng — Thu, 15 Aug 2024 06:09:05 GMT

https://www.perplexity.ai/search/seurat-merge-vs-integrate-4wIsp710SlSXnWASiWuvjg
The functions merge() and IntegrateData() in Seurat serve distinct purposes in handling single-cell RNA sequencing (scRNA-seq) data, particularly when working with multiple datasets.
merge()
The merge() function is primarily used to concatenate two or more Seurat objects. This method combines the raw count matrices of the specified objects into a single object. Key characteristics include:
Concatenation of Counts: merge() combines the raw count data from the input objects. If there are unique genes in either dataset, those genes are included in the merged object, resulting in a comprehensive count matrix that reflects all genes present across the datasets.
Loss of Normalization: By default, merging erases any previously normalized and scaled data matrices, meaning that it does not account for differences in data distributions between the datasets. If normalization is desired, it must be performed after merging.
Use Cases: This function is suitable for datasets with minimal batch effects or when combining technical replicates, where the primary goal is to create a larger dataset from similar conditions.
IntegrateData()
In contrast, IntegrateData() is designed for more complex scenarios where datasets may have inherent differences, such as batch effects or biological variability. Its main features include:
Anchor-Based Integration: This function utilizes a method based on identifying "anchor" cells that are shared across datasets. This approach allows for the integration of datasets while accounting for differences in gene expression profiles, enabling better comparison of shared cell populations.
Focus on Shared Variance: The result of IntegrateData() is a single integrated dataset that retains only the genes shared among the datasets, which facilitates comparative analysis of gene expression profiles across different conditions or batches.
Use Cases: It is recommended for situations where datasets are expected to have substantial biological differences or batch effects, such as datasets generated from different platforms or experimental conditions.
Summary
In summary, use merge() when you want to combine datasets with minimal differences and retain all gene information, while IntegrateData() is preferred for integrating datasets with significant variability to ensure that shared cell populations are accurately represented and analyzed. The choice between the two depends on the experimental design and the nature of the datasets being analyzed.
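The gene bookkeeping described above can be illustrated without Seurat. The sketch below is plain Python, not the Seurat API: `merge_counts` and `shared_genes` are hypothetical helpers, and the count tables are made up. It only shows the difference between keeping the union of genes (the behavior attributed to merge() above) and restricting to genes present in every dataset; it does not perform any anchor-based correction.

```python
def merge_counts(*tables):
    """Concatenate {cell: {gene: count}} tables over the union of genes.

    Genes missing from a dataset are filled with zero counts.
    """
    genes = sorted({g for table in tables for counts in table.values() for g in counts})
    merged = {}
    for table in tables:
        for cell, counts in table.items():
            if cell in merged:
                raise ValueError(f"duplicate cell name: {cell}")
            merged[cell] = {g: counts.get(g, 0) for g in genes}
    return genes, merged


def shared_genes(*tables):
    """Genes that appear in every input table."""
    per_table = [{g for counts in table.values() for g in counts} for table in tables]
    return sorted(set.intersection(*per_table))


# Made-up raw counts for two batches with partly different gene sets.
batch_a = {"a_cell1": {"CD3D": 4, "MS4A1": 0, "GNLY": 1}}
batch_b = {"b_cell1": {"CD3D": 2, "MS4A1": 3, "NKG7": 5}}

all_genes, merged = merge_counts(batch_a, batch_b)
common = shared_genes(batch_a, batch_b)
```

Here `all_genes` contains every gene seen in either batch, while `common` contains only the genes both batches measured.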

COVID-19 single-cell data analysis example: covid19_sc

anneng — Mon, 12 Aug 2024 05:40:53 GMT

conda install -c conda-forge fftw
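Because Jupyter runs inside a conda environment, install missing packages with conda when an error says a package cannot be found. A package installed with apt may end up on a path the environment does not search, so it still will not be found.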
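To confirm which environment a notebook kernel is actually using, a quick check like the following can be run in a cell. It is a minimal sketch using only the standard library. It assumes that a `conda-meta` directory under the interpreter prefix indicates a conda environment, and that conda-installed libraries live under `<prefix>/lib` as on typical Linux and macOS layouts.

```python
import os
import sys


def describe_environment():
    """Report where the running interpreter lives and whether it looks like conda."""
    prefix = sys.prefix
    return {
        "executable": sys.executable,
        "prefix": prefix,
        # Conda environments keep package metadata in <prefix>/conda-meta.
        "is_conda_env": os.path.isdir(os.path.join(prefix, "conda-meta")),
        # Assumed location of conda-installed shared libraries (Linux/macOS).
        "conda_lib_dir": os.path.join(prefix, "lib"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `is_conda_env` is true, libraries installed by apt under system paths such as `/usr/lib` are outside the reported prefix, which is consistent with the note above.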

STOmics DB

anneng — Thu, 21 Dec 2023 09:47:41 GMT

We constructed the front-end framework with Vue.js (version 2.6.14) and built the backend using Django (version 2.2) and Python (version 3.7.4). STOmicsDB used PostgreSQL (version 9.6) to store the metadata of publications, and datasets. We used Elasticsearch (version 7.16.2) as the search engine in the resource center of STOmicsDB. We employed MongoDB (version 4.2) and Cirrocumulus to manage and visualize curated datasets. We used Redis (v5.0.4) as the cache to store and manage the data in memory. For task queue management, we applied RabbitMQ (v3.8.13). Nginx (v1.20.1) was used as the reverse proxy server. Currently, STOmicsDB supports the following browsers: Google Chrome (v80.0 and above), Opera (v62.0 and above), Safari (v12.0 and above) and Firefox (v80.0 and above).

Single-cell TCR/BCR analysis

anneng — Mon, 27 Nov 2023 05:36:55 GMT

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9434330/
Research progress on application of single-cell TCR/BCR sequencing technology to the tumor immune microenvironment, autoimmune diseases, and infectious diseases
Several application scenarios for single-cell sequencing:
Single-cell sequencing technologies mainly include single-cell transcriptome sequencing, single-cell assay for transposase accessible chromatin with high-throughput sequencing, single-cell immune profiling (single-cell T-cell receptor [TCR]/B-cell receptor [BCR] sequencing), and single-cell transcriptomics.
93a86f70-df5d-4805-8d55-6d1ba7368a24-image.png

Pairwise between-group differential expression analysis within cell clusters

anneng — Mon, 21 Aug 2023 02:19:46 GMT

File "/home/bioinfo/workspace/miniconda3/envs/snakemake2/lib/python3.11/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bioinfo/workspace/miniconda3/envs/snakemake2/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bioinfo/ms_jk_small_tool/分组间差异分析+报告展示/代码/细胞群内成对分组间差异表达分析/workflow/scripts/split_violin_plot.py", line 110, in split_violin_plot
marker_gene = top_up.append(top_down)
^^^^^^^^^^^^^
File "/home/bioinfo/workspace/miniconda3/envs/snakemake2/lib/python3.11/site-packages/pandas/core/generic.py", line 5989, in __getattr__
return object.__getattribute__(self, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?
https://stackoverflow.com/questions/75956209/error-dataframe-object-has-no-attribute-append
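`DataFrame.append` was removed in pandas 2.0, which is why the call on line 110 of `split_violin_plot.py` fails. The usual replacement is `pd.concat`. The snippet below is a minimal sketch: the variable names `top_up`, `top_down`, and `marker_gene` come from the traceback, while the column names and values are made up.

```python
import pandas as pd

# Made-up stand-ins for the up- and down-regulated gene tables.
top_up = pd.DataFrame({"gene": ["GENE_A", "GENE_B"], "avg_log2FC": [2.1, 1.7]})
top_down = pd.DataFrame({"gene": ["GENE_C"], "avg_log2FC": [-1.9]})

# Before (pandas < 2.0): marker_gene = top_up.append(top_down)
# After: concatenate row-wise; like the old append, this keeps the original index.
marker_gene = pd.concat([top_up, top_down])
```

Pass `ignore_index=True` to `pd.concat` if a fresh 0..n-1 index is preferred. The `_append` method suggested by the error message is private and is best avoided.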

Accelerating Single-cell Bioinformatics with N-dimensional Arrays in the Cloud

ice-melt — Mon, 14 Feb 2022 03:56:40 GMT

https://github.com/lasersonlab/single-cell-experiments

Project notes

theis lab # scanpy
laserson lab # single-cell-experiments (zappy,zarr,ndarray.scala)

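Supports reading single-cell data in csv, adata, zarr, and zarr_gcs formats (gcs, g3fs; data hosted on Google or Amazon cloud storage).
After the data is read, the project relies on the zarr package to split it into chunks. Drawback: the data is read repeatedly, and every read is a full load.
For adata input, the matrix (the value of the .X attribute) is taken. Given a chunk size, the data is mapped by index onto separate chunk objects, i.e. a PairedRDD. At this point each value is zarr data, possibly compressed (see zarr_spark.py#read_zarr_chunk|get_chunk_indices).
Computation then runs on the RDD (see anndata_spark.py#log1p).
Problems with this project: it is currently unmaintained; the source code does not pin dependency versions, so the project cannot be run; and the analysis cannot be explored interactively, so the pipeline steps and control parameters have to be defined up front.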
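The chunk-then-map idea can be sketched in plain Python without Spark or zarr. The function names below are hypothetical and are not the ones in `zarr_spark.py` or `anndata_spark.py`. The list of `(chunk_index, chunk)` pairs stands in for the paired RDD, and only row-wise chunking is shown.

```python
import math


def chunk_row_ranges(n_rows, chunk_rows):
    """Return (chunk_index, start, stop) triples that cover rows 0..n_rows."""
    if chunk_rows <= 0:
        raise ValueError("chunk_rows must be positive")
    return [
        (i, start, min(start + chunk_rows, n_rows))
        for i, start in enumerate(range(0, n_rows, chunk_rows))
    ]


def split_into_chunks(matrix, chunk_rows):
    """Map a row-major matrix onto (chunk_index, rows) pairs."""
    return [
        (i, matrix[start:stop])
        for i, start, stop in chunk_row_ranges(len(matrix), chunk_rows)
    ]


def log1p_chunk(chunk):
    """Apply log(1 + x) element-wise to one chunk."""
    return [[math.log1p(value) for value in row] for row in chunk]


# Made-up count matrix: 5 cells x 3 genes.
matrix = [[0, 1, 2], [3, 0, 5], [6, 7, 0], [1, 1, 1], [0, 0, 9]]

pairs = split_into_chunks(matrix, chunk_rows=2)
# Each chunk is transformed independently, as a distributed map would do.
transformed = [(i, log1p_chunk(chunk)) for i, chunk in pairs]
# Reassemble in chunk order.
result = [row for _, chunk in sorted(transformed) for row in chunk]
```

Because each chunk is processed independently of the others, the same map step can be distributed across workers, which is the design the project builds on.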

10x data formats

anneng — Wed, 15 Dec 2021 10:01:54 GMT

https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/matrices

https://math.nist.gov/MatrixMarket/formats.html
Market Exchange Format (MEX)
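The `matrix.mtx` file in a 10x MEX directory uses the Matrix Market coordinate format: a `%%MatrixMarket` header, optional `%` comment lines, a size line (`rows columns entries`), then one `row column value` triple per non-zero entry with 1-based indices. The reader below is a minimal standard-library sketch for integer-valued files. It is not a replacement for a real parser such as `scipy.io.mmread`, and the example content is made up.

```python
import io


def read_mtx_coordinate(handle):
    """Parse an integer Matrix Market coordinate file into a sparse dict.

    Returns ((n_rows, n_cols), {(row, col): value}) with 0-based indices.
    """
    header = handle.readline().strip()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    n_rows, n_cols, n_entries = (int(field) for field in line.split())
    entries = {}
    for _ in range(n_entries):
        row, col, value = handle.readline().split()
        entries[(int(row) - 1, int(col) - 1)] = int(value)
    return (n_rows, n_cols), entries


# Made-up file: 3 features (rows) x 2 barcodes (columns), 3 non-zero counts.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%example comment\n"
    "3 2 3\n"
    "1 1 5\n"
    "3 1 2\n"
    "2 2 7\n"
)
shape, entries = read_mtx_coordinate(example)
```

In 10x output the rows correspond to features (genes) and the columns to cell barcodes, with the names stored in the accompanying TSV files.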

Single-cell databases

anneng — Wed, 01 Dec 2021 05:47:50 GMT

the NIH Human Biomolecular Atlas Program

talk2data, BBrowser's cell type annotation tool

anneng — Wed, 01 Dec 2021 02:54:04 GMT

https://talk2data.bioturing.com/predict

BBrowser product documentation

anneng — Thu, 25 Nov 2021 10:59:07 GMT

blog.bioturing.com.pdf

Normalization

anneng — Thu, 25 Nov 2021 09:55:29 GMT

https://kb.10xgenomics.com/hc/en-us/articles/115004583806-How-are-the-UMI-counts-normalized-before-PCA-and-differential-expression-

scanpy in detail

anneng — Wed, 24 Nov 2021 08:44:49 GMT

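gene_symbols and gene_ids
When data is read, a setting chooses whether gene names or gene IDs are used as the column headers. In practice scanpy keeps the mapping between the two, stored as the first var column. Below is what an mtx file looks like when opened with gene_ids.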
e67b672e-505c-4b92-a65b-8785fc609bf9-image.png
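In code, the choice is made with the `var_names` argument of `scanpy.read_10x_mtx`, which accepts `"gene_symbols"` (the default) or `"gene_ids"`. Whichever one is not used as the index is kept as a column of `adata.var`. The wrapper below is a small sketch. `load_10x_mtx` and `VAR_COLUMN` are hypothetical names, and scanpy is imported lazily so the helper can be defined even where scanpy is not installed.

```python
# Which adata.var column holds the "other" identifier for each var_names choice.
VAR_COLUMN = {"gene_symbols": "gene_ids", "gene_ids": "gene_symbols"}


def load_10x_mtx(path, var_names="gene_symbols"):
    """Read a 10x MEX directory, indexing genes by symbol or by ID."""
    if var_names not in VAR_COLUMN:
        raise ValueError(f"var_names must be one of {sorted(VAR_COLUMN)}")
    import scanpy as sc  # imported here so the module loads without scanpy

    return sc.read_10x_mtx(path, var_names=var_names)


# Usage (the directory name is a placeholder):
#   adata = load_10x_mtx("filtered_feature_bc_matrix/", var_names="gene_ids")
#   adata.var_names                      # gene IDs
#   adata.var[VAR_COLUMN["gene_ids"]]    # the matching gene symbols
```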

Single-cell tutorials

anneng — Wed, 24 Nov 2021 01:51:50 GMT

https://bookdown.org/ytliu13207/SingleCellMultiOmicsDataAnalysis/

Single-cell literature

anneng — Mon, 22 Nov 2021 06:43:45 GMT

https://www.nature.com/articles/s12276-020-0409-x
An era of single-cell genomics consortia

Falco: A quick and flexible single-cell RNA-seq processing framework on the cloud

anneng — Mon, 22 Nov 2021 06:29:15 GMT

https://github.com/VCCRI/Falco/

Two Spark single-cell projects from the Chan Zuckerberg Initiative

anneng — Mon, 22 Nov 2021 06:02:11 GMT

Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with Adam and Apache Spark
Project Goal
To build computational tools that enable researchers to harness distributed computing to enable machine learning and interactive data exploration across raw single-cell data.

Results & Resources
The Joseph lab’s primary goal was to support the Apache Spark ecosystem to extend their work on hyper scalable workflows and visualization. They pursued a wide number of projects:

ADAM, a library and command line tool to parallelize genomic data analysis across cluster and cloud computing environments.
Mango, a distributed visualization tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
Modin, a drop-in replacement for pandas that allows users to interpret large datasets in table format with high throughput and low latency.

Elbow-and-Jackstraw-plots: used to inspect PCA

anneng — Sat, 20 Nov 2021 02:49:07 GMT

https://www.researchgate.net/figure/figure-supplement-4-Elbow-and-Jackstraw-plots-used-for-determination-of-principal_fig9_345454907

Single-cell browser analysis

anneng — Fri, 19 Nov 2021 11:43:27 GMT

https://github.com/lilab-bcb/cirrocumulus

Overview of single-cell data analysis

anneng — Wed, 21 Jul 2021 03:58:45 GMT

单细胞大数据解决方案.pptx