Design Cards

PoolParty can automatically pair each generated sequence with a design card — a DataFrame row that records how the sequence was constructed. Columns report the changes applied by each operation: mutation positions, substituted characters, scores, orientations, and more. Downstream analysis can filter, group, and model sequences using these columns directly, without parsing the sequences themselves.

Design cards are opt-in: unless you pass the cards parameter to an operation, the output contains only the name and seq columns.

All examples assume:

import poolparty as pp
pp.init()

Why use design cards?

Design cards are especially useful when the parameters that vary across a library are themselves the object of study. For example:

In a deep mutational scanning library, cards can record which amino acid was substituted at which position — enabling direct analysis of mutation effects without re-parsing codon sequences.
In an MPRA library, cards can record which binding sites were inserted and in what order — supporting grouping and statistical testing by design factor.
In surrogate modeling of genomic AI predictions, cards can serve directly as covariates in regression models, linking sequence design parameters to model outputs without any post-hoc feature extraction.

Requesting cards

The cards parameter accepts three forms:

None (default): No card columns in the output.
list[str]: Request card keys by name. Each column name is prefixed with the operation's index and name (e.g. op[1]:mutagenize.positions).
dict[str, str]: Map card keys to custom column names. No prefix is added.

pool = pp.from_iupac("NNNN", mode="sequential")

# List-style — column is "op[1]:score.gc"
scored = pool.score(pp.calc_gc, card_key="gc", cards=["gc"])

# Dict-style — column is just "gc"
scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})

Use the dict form when you want clean, predictable column names in your output.
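
The naming rule can be summarized in a few lines of plain Python. This is only an illustration of the convention described above, not PoolParty's implementation: card_column_names is a hypothetical helper, and the prefix format op[<index>]:<operation>.<key> is inferred from the two examples shown in this section.

```python
def card_column_names(cards, op_index, op_name):
    """Return {card_key: column_name} under the naming convention described above.

    Hypothetical helper for illustration only; not part of PoolParty.
    """
    if cards is None:
        return {}                      # no card columns requested
    if isinstance(cards, dict):
        return dict(cards)             # custom names, used verbatim
    # list form: prefix each key with the operation index and name
    return {key: f"op[{op_index}]:{op_name}.{key}" for key in cards}


print(card_column_names(["gc"], 1, "score"))        # {'gc': 'op[1]:score.gc'}
print(card_column_names({"gc": "gc"}, 1, "score"))  # {'gc': 'gc'}
print(card_column_names(None, 1, "score"))          # {}
```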

Universal card keys

Every operation supports two universal keys, regardless of type:

Key	Value
`"seq"`	The output sequence string at this point in the DAG. Useful for recording intermediate sequences in a multi-step pipeline.
`"state"`	The numeric state index for this operation (0, 1, 2, …).

wt   = pp.from_seq("ATCGATCG")
muts = wt.mutagenize(num_mutations=1, num_states=5,
                     cards={"state": "mut_state", "seq": "mut_seq"})
df   = muts.generate_library()

df — 5 rows × 4 columns

name	seq	mut_state	mut_seq
None	CTCGATCG	0	CTCGATCG
None	GTCGATCG	1	GTCGATCG
None	TTCGATCG	2	TTCGATCG
...	...	...	...

Operation-specific card keys

Each operation defines which additional keys it supports. Requesting an invalid key raises ValueError.

Operation	Card Keys	Description
`mutagenize`	`positions`, `wt_chars`, `mut_chars`	Tuples of the mutated positions, the wild-type characters at those positions, and the characters substituted for them.
`mutagenize_orf`	`codon_positions`, `wt_codons`, `mut_codons`, `wt_aas`, `mut_aas`	Codon-level mutation details: positions, original/mutant codons, and original/mutant amino acids.
`score`	(the card_key value)	The score computed by the scoring function. Default key is `"score"`; set `card_key="gc"` to use `"gc"` instead.
`stack`	`active_parent`	Index (0, 1, 2, …) of the input pool that produced this sequence.
`repeat`	`repeat_index`	Which repeat copy this sequence belongs to (0, 1, …, times-1).
`flip`	`flip`	`"forward"` or `"rc"` indicating the orientation.
`recombine`	`breakpoints`, `pool_assignments`	Breakpoint positions and which source pool contributed each segment.
`shuffle_seq`	`permutation`	Tuple of the permutation applied to molecular positions.
`filter`	`passed`	`True` if the sequence passed the predicate, `False` otherwise.
`from_seqs`	`seq_name`, `seq_index`	Name and index of the selected input sequence.
`get_kmers`	`kmer_index`, `kmer`	Index and string of the generated k-mer.
`get_barcodes`	`barcode_index`, `barcode`	Index and string of the generated barcode.
`region_scan`	`position_index`, `start`, `end`, `name`, `region_seq`	Scanning position details and the tagged region content.
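
To make the key rules concrete, the sketch below checks a cards request against a small excerpt of the table. It is a standalone illustration, not PoolParty's own validation code: SUPPORTED_KEYS and check_card_keys are hypothetical names, and only the behaviour stated above is modelled (universal keys are always allowed, unknown keys raise ValueError).

```python
UNIVERSAL_KEYS = {"seq", "state"}

# Excerpt of the table above; extend as needed.
SUPPORTED_KEYS = {
    "mutagenize": {"positions", "wt_chars", "mut_chars"},
    "stack": {"active_parent"},
    "flip": {"flip"},
}


def check_card_keys(op_name, cards):
    """Raise ValueError if `cards` requests a key that `op_name` does not support."""
    if cards is None:
        return
    allowed = UNIVERSAL_KEYS | SUPPORTED_KEYS.get(op_name, set())
    unknown = sorted(set(cards) - allowed)   # works for both list and dict forms
    if unknown:
        raise ValueError(f"{op_name} does not support card keys: {unknown}")


check_card_keys("mutagenize", ["positions", "state"])   # valid, returns None
try:
    check_card_keys("flip", {"positions": "pos"})        # not a flip key
except ValueError as err:
    print(err)   # flip does not support card keys: ['positions']
```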

Examples

Track mutation details

wt   = pp.from_seq("ATCGATCG")
muts = wt.mutagenize(num_mutations=2, num_states=5,
                     cards={"positions": "mut_pos",
                            "wt_chars": "wt",
                            "mut_chars": "mut"})
df = muts.generate_library()

df — 5 rows × 5 columns

name	seq	mut_pos	wt	mut
None	GTCGACCG	(0, 5)	('A', 'T')	('G', 'C')
None	ATCATTCG	(3, 4)	('G', 'A')	('A', 'T')
...	...	...	...	...
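
Because the three cards are parallel tuples, they can be unpacked into one record per substitution for filtering or counting. The sketch below works on plain dictionaries so that it runs without PoolParty; the single row is copied from the first line of the table above, and with a real DataFrame df.to_dict("records") yields rows of the same shape. substitutions is a hypothetical helper name.

```python
rows = [
    {"seq": "GTCGACCG", "mut_pos": (0, 5), "wt": ("A", "T"), "mut": ("G", "C")},
]


def substitutions(row):
    """Yield one (position, wild-type char, mutant char) triple per mutation."""
    yield from zip(row["mut_pos"], row["wt"], row["mut"])


for row in rows:
    # Labels use the zero-based positions exactly as stored in the card
    labels = [f"{wt}{pos}{mut}" for pos, wt, mut in substitutions(row)]
    print(row["seq"], labels)   # GTCGACCG ['A0G', 'T5C']

# Keep only variants that mutate position 5
hits = [row["seq"] for row in rows if 5 in row["mut_pos"]]
```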

Score with a clean column name

pool   = pp.from_iupac("NNNN", mode="sequential")
scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
df     = scored.generate_library()

df — 256 rows × 3 columns (no "op[N]:score." prefix)

name	seq	gc
None	AAAA	0.00
None	AAAC	0.25
None	AAAG	0.25
...	...	...
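
The gc values above are consistent with a plain GC fraction. As a cross-check, the sketch below recomputes them without PoolParty. gc_fraction is a stand-in written for this example (it is not pp.calc_gc), and enumerating the pool with itertools.product over "ACGT" is an assumption that happens to match the row order shown.

```python
from itertools import product


def gc_fraction(seq):
    """Fraction of characters in `seq` that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


# All 4-mers over ACGT, with the last position varying fastest
seqs = ["".join(bases) for bases in product("ACGT", repeat=4)]
table = [(seq, gc_fraction(seq)) for seq in seqs]

print(len(table))    # 256
print(table[:3])     # [('AAAA', 0.0), ('AAAC', 0.25), ('AAAG', 0.25)]
```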

Multiple cards across a pipeline

Each operation in the pipeline can export its own cards independently.

pool   = pp.from_iupac("NNNNNNNN", mode="sequential", num_states=10)
scored = (pool
    .score(pp.calc_gc,         card_key="gc",         cards={"gc": "gc"})
    .score(pp.calc_complexity, card_key="complexity", cards={"complexity": "complexity"})
)
df = scored.generate_library()

df — 10 rows × 4 columns

name	seq	gc	complexity
None	AAAAAAAA	0.00	0.25
None	AAAAAAAC	0.125	0.34
...	...	...	...

Identify which pool produced each sequence

pool_a = pp.from_seqs(["AAAA", "CCCC"], mode="sequential")
pool_b = pp.from_seqs(["GGGG", "TTTT"], mode="sequential")
combined = pp.stack([pool_a, pool_b],
                    cards={"active_parent": "source"})
df = combined.generate_library()

df — 4 rows × 3 columns

name	seq	source
None	AAAA	0
None	CCCC	0
None	GGGG	1
None	TTTT	1
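
With the source column in place, grouping by parent pool needs no sequence inspection. A minimal sketch on the four rows above, using plain dictionaries so it runs without PoolParty (on a DataFrame, df.groupby("source") expresses the same grouping):

```python
from collections import defaultdict

rows = [
    {"seq": "AAAA", "source": 0},
    {"seq": "CCCC", "source": 0},
    {"seq": "GGGG", "source": 1},
    {"seq": "TTTT", "source": 1},
]

# Collect sequences under the index of the pool that produced them
by_source = defaultdict(list)
for row in rows:
    by_source[row["source"]].append(row["seq"])

print(dict(by_source))   # {0: ['AAAA', 'CCCC'], 1: ['GGGG', 'TTTT']}
```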

DMS library with codon-level cards

In a deep mutational scanning library, mutagenize_orf cards record the amino-acid-level changes for each variant — no sequence parsing needed.

orf  = pp.from_seq("ATGAAATTTGGGCCCTAA")
muts = (orf
    .annotate_orf("gene")
    .mutagenize_orf(num_mutations=1, mode="sequential",
                    cards={"codon_positions": "position",
                           "wt_aas": "wt_aa",
                           "mut_aas": "mut_aa"})
)
df = muts.generate_library()

df — each row records the amino acid change and position

name	seq	position	wt_aa	mut_aa
None	ATGCAATTTGGGCCCTAA	(1,)	('K',)	('Q',)
None	ATGGAATTTGGGCCCTAA	(1,)	('K',)	('E',)
None	ATGAAAGTTGGGCCCTAA	(2,)	('F',)	('V',)
...	...	...	...	...
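
These cards map directly onto conventional substitution labels such as K2Q. The sketch below builds them from the three rows shown above, assuming that codon_positions is zero-based (the table suggests this, since K is encoded by the second codon and is reported at position 1). aa_labels is a hypothetical helper, and the rows are plain dictionaries so the code runs without PoolParty.

```python
rows = [
    {"position": (1,), "wt_aa": ("K",), "mut_aa": ("Q",)},
    {"position": (1,), "wt_aa": ("K",), "mut_aa": ("E",)},
    {"position": (2,), "wt_aa": ("F",), "mut_aa": ("V",)},
]


def aa_labels(row):
    """Format each substitution as <wt><1-based residue number><mutant>."""
    return [
        f"{wt}{pos + 1}{mut}"
        for pos, wt, mut in zip(row["position"], row["wt_aa"], row["mut_aa"])
    ]


labels = [",".join(aa_labels(row)) for row in rows]
print(labels)   # ['K2Q', 'K2E', 'F3V']
```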

Cards as covariates for modeling

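Card columns are ordinary DataFrame columns, so they can be used directly as covariates in statistical or machine-learning models. This avoids post-hoc sequence parsing: the design parameters are already recorded as columns. Numeric cards can enter a regression as they are, while tuple-valued or categorical cards (such as mut_aas or flip) need a numeric encoding first.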

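# Sketch: score a library with a model, then regress on card features.
# `library`, `predict_with_model`, and the "position"/"strength" card columns
# are placeholders for your own pool, predictor, and numeric card columns.
import statsmodels.api as sm

df = library.generate_library()
df["model_score"] = predict_with_model(df["seq"])

# Card columns become covariates (OLS requires them to be numeric)
X = df[["position", "strength"]]  # from design cards
y = df["model_score"]
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())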

Disabling cards globally

To suppress all card computation for performance:

pp.toggle_cards(on=False)

This causes every operation to skip card computation regardless of the cards parameter. Re-enable with pp.toggle_cards(on=True).
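
If cards should be off only for part of a script, the two calls can be wrapped in a context manager. This is a sketch, not a PoolParty feature: cards_disabled is a hypothetical helper that takes the toggle function as an argument, and it assumes cards were enabled beforehand, since no call for querying the current setting is shown here.

```python
from contextlib import contextmanager


@contextmanager
def cards_disabled(toggle):
    """Call toggle(on=False) on entry and toggle(on=True) on exit."""
    toggle(on=False)
    try:
        yield
    finally:
        toggle(on=True)   # re-enable even if the body raises


# Stand-in for pp.toggle_cards so the sketch runs on its own
calls = []

def fake_toggle(on):
    calls.append(on)

with cards_disabled(fake_toggle):
    pass   # e.g. df = pool.generate_library()

print(calls)   # [False, True]
```

With PoolParty imported, the intended call would be `with cards_disabled(pp.toggle_cards): ...`. Whether the setting is read when operations are defined or when the library is generated is not stated above, so it is safest to place both steps inside the block.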