view-toolkit detailed design

anneng

Support large files (how large, exactly?)
Statistical analysis

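1. FASTA/FASTQ file processing
https://bioinf.shenwei.me/seqkit/usage/#grep
seqkit is a toolkit for processing FASTA/FASTQ files; we want to see whether our tool can provide the same functionality.
1.1 Open and display FASTA/FASTQ content with lazy loading (records are loaded on demand as the user scrolls).
1.2 Query by ID, allowing several IDs to be searched at once; highlight the matches in color.
1.3 Query by subsequence; highlight the matches in color.
1.4 Compute sequence lengths and their distribution (violin plot or box plot); filter by sequence length.
1.5 Count duplicate sequences and the number of copies of each.
1.6 Count the total number of sequences.
1.7 Compute N50 https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics#N50
1.8 Q20(%), Q30(%), Q40(%) distribution; provide a FastQC-style sequence quality plot (a box plot of quality at each position).
1.9 GC(%) content
1.10 Sorting
1.11 Data export, with support for random sampling similar to seqtk sample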
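Several of the statistics listed above (1.4 to 1.9) and the sampling in 1.11 can be computed in a single streaming pass. The following is a minimal stdlib-only sketch, not seqkit's implementation, assuming unwrapped four-line FASTQ records and Phred+33 quality encoding; all function names are hypothetical. Q20(%)/Q30(%)/Q40(%) are taken here as the percentage of bases with quality ≥ 20/30/40, GC(%) as G+C over all bases, and N50 follows the Wikipedia definition linked in 1.7. For the fixed-count export, reservoir sampling (Algorithm R) is swapped in as one standard one-pass method; whether seqtk sample uses exactly this is not assumed.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (id, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (id, sequence, quality) from unwrapped 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header.strip():
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\r\n")
            next(it)  # '+' separator line
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        yield header[1:].split()[0], seq, qual


def n50(lengths: List[int]) -> int:
    """Largest L such that sequences of length >= L cover at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def fastq_stats(records: Iterable[Record], phred_offset: int = 33) -> dict:
    """Summary in the spirit of `seqkit stats`, computed in one pass."""
    lengths: List[int] = []
    seq_counts: Counter = Counter()
    gc = total = q20 = q30 = q40 = 0
    for _id, seq, qual in records:
        lengths.append(len(seq))
        seq_counts[seq] += 1
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        total += len(seq)
        for ch in qual:
            q = ord(ch) - phred_offset  # Phred score of this base
            q20 += q >= 20
            q30 += q >= 30
            q40 += q >= 40

    def pct(n: int) -> float:
        return 100.0 * n / total if total else 0.0

    return {
        "num_seqs": len(lengths),
        "total_bases": total,
        "lengths": lengths,  # raw values for a violin/box plot or a length filter
        "N50": n50(lengths),
        "GC(%)": pct(gc),
        "Q20(%)": pct(q20),
        "Q30(%)": pct(q30),
        "Q40(%)": pct(q40),
        "duplicates": {s: c for s, c in seq_counts.items() if c > 1},
    }


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 11) -> List[Record]:
    """Uniformly sample k records in one pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec
    return sample
```

For example, `fastq_stats(parse_fastq(open("test.fastq")))` streams the file rather than loading it whole, and the returned `lengths` list can feed the plot in 1.4. Note that `duplicates` keeps every distinct sequence in a `Counter`, so for very large files storing a hash of each sequence instead would reduce memory.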

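For 1.1 (lazy loading) and 1.2/1.3 (ID and subsequence search), one common approach, assumed here rather than taken from seqkit, is to scan the file once, remember the byte offset where each record starts, and later seek straight to the records a visible page needs. The sketch below assumes an uncompressed, unwrapped four-line FASTQ file opened in binary mode; all function names are hypothetical. Search results come back as record numbers and as (start, end) spans that a viewer could color.

```python
import io
import re
from typing import BinaryIO, Dict, Iterable, List, Tuple


def build_offset_index(fh: BinaryIO) -> Tuple[List[int], Dict[str, int]]:
    """Scan once, recording each record's byte offset and an id -> record-number map."""
    offsets: List[int] = []
    id_to_num: Dict[str, int] = {}
    fh.seek(0)
    while True:
        pos = fh.tell()
        header = fh.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip blank lines
        for _ in range(3):  # sequence, '+' line, quality
            fh.readline()
        id_to_num[header[1:].split()[0].decode()] = len(offsets)
        offsets.append(pos)
    return offsets, id_to_num


def read_page(fh: BinaryIO, offsets: List[int], start: int, count: int) -> List[Tuple[str, str, str]]:
    """Load only records start..start+count-1, e.g. when the user scrolls to them."""
    page = []
    for pos in offsets[start:start + count]:
        fh.seek(pos)
        header, seq, _plus, qual = [fh.readline().decode().rstrip("\r\n") for _ in range(4)]
        page.append((header[1:].split()[0], seq, qual))
    return page


def find_ids(id_to_num: Dict[str, int], ids: Iterable[str]) -> Dict[str, int]:
    """Look up several ids at once; unknown ids are skipped."""
    return {i: id_to_num[i] for i in ids if i in id_to_num}


def find_subsequence(seq: str, motif: str) -> List[Tuple[int, int]]:
    """Case-insensitive, overlapping (start, end) spans of motif in seq, for coloring."""
    if not motif:
        return []
    pattern = re.compile("(?=" + re.escape(motif) + ")", re.IGNORECASE)
    return [(m.start(), m.start() + len(motif)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = io.BytesIO(b"@r1 a\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTT\n+\n5555\n")
    offsets, id_to_num = build_offset_index(demo)
    print(read_page(demo, offsets, 1, 1))  # [('r2', 'TTTT', '5555')]
    print(find_subsequence("ACGTACGT", "acg"))  # [(0, 3), (4, 7)]
```

The index holds one integer per record plus the id map, so memory grows with the number of records rather than with the file's byte size. Gzip-compressed input would need another approach, since random access into a plain gzip stream requires decompressing from the start.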
anneng

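The CMake version on the big server is fairly old, and we have not dared to upgrade it yet.
This software requires CMake 3.11 or later; the server has 3.10.
Verifying exon-duckdb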

git clone --recursive https://mirror.ghproxy.com//https://github.com/wheretrue/exon-duckdb.git
curl --proto '=https' --tlsv1.3 https://sh.rustup.rs -sSf | sh
cd exon-duckdb
Replace the CMakeLists.txt in the exon-duckdb directory
(the main change is to some GitHub download URLs)
make

CMakeLists.txt

anneng

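-- parquet extension: needed for COPY ... (FORMAT PARQUET)
install parquet;
load parquet;
-- exon extension (built above): provides read_fastq
load exon;
-- read the FASTQ file into a table
create table testfastq as from read_fastq('test.fastq');
-- write the table out as a Parquet file
copy testfastq to 'test.parquet' (format parquet);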

anneng

https://www.linkedin.com/pulse/how-query-azure-blob-using-duckdb-vivek-anandaraman-f5bfc/
How to query Azure Blob storage with DuckDB: a general approach for reading data over the network.

import duckdb
from adlfs.spec import AzureBlobFileSystem

# Azure AD service-principal credentials (replace the placeholders)
active_directory_application_id = "Your Application ID"
active_directory_application_secret = "Your Client Secret"
active_directory_tenant_id = "Your Tenant ID"
accountname = "Your Storage account name"

connection = duckdb.connect()

# Register the adlfs (fsspec) filesystem so DuckDB can resolve abfs:// paths
connection.register_filesystem(AzureBlobFileSystem(
    account_name=accountname,
    tenant_id=active_directory_tenant_id,
    client_id=active_directory_application_id,
    client_secret=active_directory_application_secret,
))

# Count the rows of a CSV blob; container/path/blob.csv is a placeholder
query = connection.sql('''
  SELECT count(*) FROM read_csv_auto('abfs://container/path/blob.csv')
''')
print(query.fetchall())

anneng

https://milicendev.netlify.app/article/install-the-latest-cmake-version-in-ubuntu-18-04-bionic/

Upgrading CMake on the big server

Upgrading Rust
sudo apt update
sudo apt install rustc

anneng

https://sql.quacking.cloud/
https://tobilg.com/using-duckdb-wasm-for-in-browser-data-engineering

An online SQL tool built on duckdb-wasm