RNA-seq data analysis

anneng

https://github.com/Jeanielmj/bioinformatics-workshop/wiki/The-Tuxedo-Pipeline

anneng

https://wikis.utexas.edu/display/bioiteam/Running+the+new+tuxedo+suite

anneng

Mapping to the transcriptome with BWA
https://angus.readthedocs.io/en/2013/rnaseq_bwa.html
In this tutorial, we’ll begin by mapping reads from an RNA-seq study involving Drosophila melanogaster to a reference transcriptome. First, make sure you have BWA and SAMTools installed. Next, you will need to download the reference transcriptome:

mkdir bwa_transcriptome
cd bwa_transcriptome
curl -O -L ftp://ftp.flybase.net/releases/current/dmel_r5.51/fasta/dmel-all-transcript-r5.51.fasta.gz
gunzip dmel-all-transcript-r5.51.fasta.gz
How many transcripts are encoded in this file? Let’s look at the file manually first:

less dmel-all-transcript-r5.51.fasta
Notice the FASTA format: each record starts with a header line beginning with >, followed by one or more lines containing the sequence itself. To count how many transcripts are in the file, we can count the number of lines that begin with >:

grep -c '^>' dmel-all-transcript-r5.51.fasta
You should see 28826.
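
The counting step can be tried on a tiny file before running it on the full transcriptome. The sketch below is illustrative only: the file name toy.fasta and its three records are made up and are not part of the FlyBase download. It counts header lines with grep and then uses awk to sum the sequence length of each record, since a sequence may span several lines.

```shell
# Build a toy FASTA file (hypothetical records, for illustration only).
cat > toy.fasta <<'EOF'
>tx1 type=mRNA
ACGTACGT
ACGT
>tx2 type=mRNA
GGGCCC
>tx3 type=mRNA
TTTT
EOF

# Count records: exactly one header line (starting with '>') per sequence.
n_records=$(grep -c '^>' toy.fasta)
echo "records: ${n_records}"

# Sum the sequence length of each record; a sequence may be wrapped
# over several lines, so lengths are accumulated until the next header.
awk '/^>/ { if (name != "") print name, len
            name = substr($1, 2); len = 0; next }
     { len += length($0) }
     END { if (name != "") print name, len }' toy.fasta > toy.lengths.txt
cat toy.lengths.txt
```

Anchoring the pattern with ^ makes grep count only lines that start with >, so a > character appearing later in a header would not change the result.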

Next, we need to prepare the file for use with BWA. The first step is to index it:

bwa index dmel-all-transcript-r5.51.fasta
Next, we can map our paired-end sequence reads to the transcriptome. To make our code a little more readable and flexible, we’ll use shell variables in place of the actual file names. In this case, let’s first specify what the values of those variables should be:

reference=dmel-all-transcript-r5.51.fasta
reads_1=OREf_SAMm_vg1_CTTGTA_L005_R1_001.fastq
reads_2=OREf_SAMm_vg1_CTTGTA_L005_R2_001.fastq
output=vg_1
Now we can use these variable names in our mapping commands. The advantage here is that we can just change the variables later on if we want to apply the same pipeline to a new set of samples (which we do):

bwa mem ${reference} ${reads_1} ${reads_2} > ${output}.sam
This command will output a file named vg_1.sam in the current working directory. Next, we want to use SAMTools to convert it to a BAM, and then sort and index it:

# Convert SAM to BAM, then coordinate-sort and index it (samtools 1.x syntax)
samtools view -b -o ${output}.unsorted.bam ${output}.sam
samtools sort -o ${output}.bam ${output}.unsorted.bam
samtools index ${output}.bam
Next, you can use your existing knowledge to view the mappings, plot the distribution of mismatch positions, etc.
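
Because the mapping steps depend only on the reference, the two read files, and the output prefix, they can be wrapped in a function and repeated per sample. The following is a minimal sketch and not part of the original tutorial: the function name map_sample and the second sample prefix are hypothetical, the read files are assumed to follow the `<prefix>_R1_001.fastq` / `<prefix>_R2_001.fastq` naming used above, and the samtools calls assume the samtools 1.x command-line syntax. The function only prints the commands, so nothing is executed and neither bwa nor samtools is needed to try it.

```shell
# Reference transcriptome, as defined earlier in the tutorial.
reference=dmel-all-transcript-r5.51.fasta

# Print (do not run) the mapping commands for one sample.
#   $1 = read file prefix (assumed naming: <prefix>_R1_001.fastq, <prefix>_R2_001.fastq)
#   $2 = output prefix
# The body runs in a subshell so its variables do not leak into the caller.
map_sample() (
    prefix=$1
    output=$2
    reads_1=${prefix}_R1_001.fastq
    reads_2=${prefix}_R2_001.fastq
    echo "bwa mem ${reference} ${reads_1} ${reads_2} > ${output}.sam"
    echo "samtools view -b -o ${output}.unsorted.bam ${output}.sam"
    echo "samtools sort -o ${output}.bam ${output}.unsorted.bam"
    echo "samtools index ${output}.bam"
)

# Each entry is "read_prefix:output_prefix".
# The second entry is a hypothetical sample, included only to show the loop.
commands=$(
    for pair in "OREf_SAMm_vg1_CTTGTA_L005:vg_1" "HYPOTHETICAL_SAMPLE_L005:vg_2"; do
        map_sample "${pair%%:*}" "${pair##*:}"
    done
)
echo "${commands}"
```

Printing the commands first makes it easy to review them; once they look right, the same lines could be saved to a script or piped into a shell to run them.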

anneng

https://colauttilab.github.io/NGS/TuxedoTutorial.html

anneng

https://www.frontiersin.org/articles/10.3389/fbinf.2021.693836/full

reads normalization,
scatter plots,
linear/non-linear correlations,
PCA,
clustering (hierarchical, k-means, t-SNE, SOM),
differential expression analyses,
pathway enrichments,
evolutionary analyses,
pathological analyses,
and protein-protein interaction (PPI) identifications.

anneng

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03549-8
BEAVR: a browser-based tool for the exploration and visualization of RNA-seq data

anneng

http://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html
RNA-seq workflow: gene-level exploratory analysis and differential expression
http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

anneng

https://hbctraining.github.io/scRNA-seq/lessons/02_SC_generation_of_count_matrix.html

anneng

https://atap.psu.ac.th/

anneng

https://degust.erc.monash.edu/

anneng

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9130758/

Functionality that is missing when analyzing RNA data with Python

anneng

https://www.reneshbedre.com/blog/expression_units.html
Gene expression units explained: RPM, RPKM, FPKM, TPM, DESeq, TMM, SCnorm, GeTMM, and ComBat-Seq
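
As a quick reminder of how three of these units relate, here is a small sketch using the standard definitions of RPM, RPKM, and TPM on a made-up table. The gene names, lengths, and counts are invented for illustration, and the file names are hypothetical. FPKM uses the same formula as RPKM but counts fragments (read pairs) instead of reads; the model-based methods in the list (DESeq, TMM, and so on) are not covered here.

```shell
# Toy input (hypothetical values): gene, length in bp, read count.
cat > toy_counts.txt <<'EOF'
geneA 1000 100
geneB 2000 400
geneC 500 500
EOF

# The file is read twice: the first pass collects the totals,
# the second pass computes the per-gene values.
#   RPM  = count / total_reads * 1e6
#   RPKM = count * 1e9 / (length * total_reads)
#   TPM  = (count / length) / sum(count / length) * 1e6
awk 'NR == FNR { total += $3; rate_sum += $3 / $2; next }
     { rpm  = $3 / total * 1e6
       rpkm = $3 * 1e9 / ($2 * total)
       tpm  = ($3 / $2) / rate_sum * 1e6
       printf "%s\t%.2f\t%.2f\t%.2f\n", $1, rpm, rpkm, tpm }' \
    toy_counts.txt toy_counts.txt > toy_units.tsv

# Columns: gene, RPM, RPKM, TPM
cat toy_units.tsv
```

TPM values always sum to one million within a sample, which is why they are easier to compare across samples than RPKM values, whose sum depends on the length distribution of the expressed genes.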

anneng

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8

anneng

https://www.intechopen.com/chapters/55603
RNA‐seq: Applications and Best Practices

anneng

https://geoexplorer.rosalind.kcl.ac.uk/