Preprocess flow data

In this notebook, we load an FCS file into the anndata format, move the forward scatter (FSC) and side scatter (SSC) information to the .obs section of the anndata object, and compensate the data. Next, we apply different types of normalisation to the data. The FCS file was published as part of the following reference and was originally deposited on FlowRepository.

import readfcs
import pytometry as pm

%load_ext autoreload
%autoreload 2

Read the example dataset that ships with the readfcs package.

path_data = readfcs.datasets.Oetjen18_t1()

adata = pm.io.read_fcs(path_data)

adata

AnnData object with n_obs × n_vars = 241552 × 20
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG'
    uns: 'meta'

Reduce features

We split the data matrix into the marker intensity part and the FSC/SSC part. Moreover, we move all height-related features to the .obs part of the anndata object. Notably, the function split_signal checks whether a feature name is FSC/SSC, and whether it ends with -A (area-related features) or -H (height-related features).
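The naming rule described above can be sketched in plain Python. This is a simplified illustration, not pytometry's actual code: `split_channel_names` is a hypothetical helper, the channel `B515-H` is made up for the example, and treating every name that is neither a scatter feature nor an area feature (such as `Time`) as non-signal is an assumption of this sketch.

```python
# Illustrative sketch only: a simplified version of the suffix rule,
# not pytometry's implementation. `split_channel_names` is hypothetical.

SCATTER_PREFIXES = ("FSC", "SSC")


def split_channel_names(names):
    """Return (kept_in_matrix, moved_to_obs) based on channel names."""
    kept, moved = [], []
    for name in names:
        is_scatter = name.startswith(SCATTER_PREFIXES)
        is_area = name.endswith("-A")
        # Keep only area-based, non-scatter channels in the data matrix.
        if is_area and not is_scatter:
            kept.append(name)
        else:
            moved.append(name)
    return kept, moved


# "B515-H" is a made-up height channel used to show the -H case.
channels = ["FSC-A", "FSC-H", "SSC-A", "R660-A", "B515-A", "B515-H", "Time"]
kept, moved = split_channel_names(channels)
print(kept)   # ['R660-A', 'B515-A']
print(moved)  # ['FSC-A', 'FSC-H', 'SSC-A', 'B515-H', 'Time']
```

Note that the rule operates on names only, so it relies on the channel names following the -A/-H convention.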

Let us check the var_names of the features and the channel names. In this example, the var_names have been cleaned so that none of the marker names carries the -A or -H suffix, whereas the channel column retains it.

adata.var

	n	channel	marker	$PnR	$PnB	$PnV	$PnG
FSC-A	1	FSC-A		262144	32	510	1.0
FSC-H	2	FSC-H		262144	32	510	1.0
FSC-W	3	FSC-W		262144	32	510	1.0
SSC-A	4	SSC-A		262144	32	310	1.0
SSC-H	5	SSC-H		262144	32	310	1.0
SSC-W	6	SSC-W		262144	32	310	1.0
CD95	7	R660-A	CD95	262144	32	490	1.0
CD8	8	R780-A	CD8	262144	32	475	1.0
CD27	9	B515-A	CD27	262144	32	470	1.0
CXCR4	10	B710-A	CXCR4	262144	32	417	1.0
CCR7	11	V450-A	CCR7	262144	32	400	1.0
LIVE/DEAD	12	V545-A	LIVE/DEAD	262144	32	495	1.0
CD4	13	V605-A	CD4	262144	32	400	1.0
CD45RA	14	V655-A	CD45RA	262144	32	375	1.0
CD3	15	V800-A	CD3	262144	32	400	1.0
CD49B	16	G560-A	CD49B	262144	32	400	1.0
CD14/19	17	G610-A	CD14/19	262144	32	415	1.0
CD69	18	G660-A	CD69	262144	32	470	1.0
CD103	19	G780-A	CD103	262144	32	435	1.0
Time	20	Time		262144	32		0.01

We use the channel column of the adata.var data frame to split the matrix.

pm.pp.split_signal(adata, var_key="channel")

adata

AnnData object with n_obs × n_vars = 241552 × 13
    obs: 'FSC-A', 'FSC-H', 'FSC-W', 'SSC-A', 'SSC-H', 'SSC-W', 'Time'
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG', 'signal_type'
    uns: 'meta'

The data matrix was reduced by seven features: the six scatter features (FSC-A, FSC-H, FSC-W, SSC-A, SSC-H and SSC-W) and Time are now stored in .obs.

Compensation

Next, we compensate the data using the compensation matrix that is included in the FCS file header. Alternatively, one may provide a custom compensation matrix.

The compensate function matches the var_names of adata with the column names of the spillover matrix to compensate the correct channels.

pm.pp.compensate(adata)
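To illustrate what compensation does numerically, here is a minimal sketch for two channels in plain Python. It assumes the common convention that the observed signal equals the true signal multiplied by the spillover matrix, so that compensation multiplies by the inverse of that matrix. All numbers and the helper names (`invert_2x2`, `compensate_event`) are made up for the example; this is not pytometry's code.

```python
# Illustrative sketch with made-up numbers, not pytometry's implementation.
# spill[i][j]: fraction of fluorochrome i's signal detected in channel j.
spill = [[1.0, 0.2],
         [0.1, 1.0]]


def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def compensate_event(observed, spill):
    """Multiply one event (row vector) by the inverse spillover matrix."""
    inv = invert_2x2(spill)
    return [sum(observed[i] * inv[i][j] for i in range(2)) for j in range(2)]


# Simulate spillover: observed = true_signal @ spill
true_signal = [1000.0, 500.0]
observed = [sum(true_signal[i] * spill[i][j] for i in range(2)) for j in range(2)]

# Undo it: recovered should match true_signal up to floating-point error.
recovered = compensate_event(observed, spill)
print([round(v, 6) for v in observed])
print([round(v, 6) for v in recovered])
```

In practice the matrix has one row and one column per fluorescence channel, which is why the column names of the spillover matrix need to match the var_names.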

Normalize data

In the next step, we normalize the data. By default, normalization is an in-place operation, i.e. a new anndata object is only created if we set the argument inplace=False. We demonstrate three different normalization methods that are built into pytometry:

arcsinh
logicle
bi-exponential

adata_arcsinh = pm.tl.normalize_arcsinh(adata, cofactor=150, inplace=False)
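As a rough intuition for the cofactor, the arcsinh transform is commonly written as arcsinh(x / cofactor). The sketch below applies that formula to a few made-up intensity values in plain Python; `arcsinh_transform` is a hypothetical helper used for illustration, and the formula is assumed to correspond to what the normalization above computes.

```python
import math


def arcsinh_transform(values, cofactor=150):
    """Apply arcsinh(x / cofactor) to each value (illustrative helper)."""
    return [math.asinh(v / cofactor) for v in values]


# Made-up raw intensities, including a negative value as can occur
# after compensation.
raw = [-150.0, 0.0, 150.0, 15000.0]
transformed = arcsinh_transform(raw)
print([round(v, 4) for v in transformed])
```

The transform is approximately linear for values that are small relative to the cofactor and approximately logarithmic for large values, and it is defined for negative inputs.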

adata_logicle = pm.tl.normalize_logicle(adata, inplace=False)

adata_biex = pm.tl.normalize_biExp(adata, inplace=False)
