Design Cards
PoolParty can automatically pair each generated sequence with a design card — a DataFrame row that records how the sequence was constructed. Columns report the changes applied by each operation: mutation positions, substituted characters, scores, orientations, and more. Downstream analysis can filter, group, and model sequences using these columns directly, without parsing the sequences themselves.
Design cards are opt-in: unless you pass the cards parameter, the
output contains only name and seq.
All examples assume:
import poolparty as pp
pp.init()
Why use design cards?
Design cards are especially useful when the parameters that vary across a library are themselves the object of study. For example:
In a deep mutational scanning library, cards can record which amino acid was substituted at which position — enabling direct analysis of mutation effects without re-parsing codon sequences.
In an MPRA library, cards can record which binding sites were inserted and in what order — supporting grouping and statistical testing by design factor.
In surrogate modeling of genomic AI predictions, cards can serve directly as covariates in regression models, linking sequence design parameters to model outputs without any post-hoc feature extraction.
Requesting cards
The cards parameter accepts three forms:
None (default): No card columns in the output.
list[str]: Request card keys by name. Column names are prefixed with the operation id (e.g. op[1]:mutagenize.positions).
dict[str, str]: Map card keys to custom column names. No prefix is added.
pool = pp.from_iupac("NNNN", mode="sequential")
# List-style — column is "op[1]:score.gc"
scored = pool.score(pp.calc_gc, card_key="gc", cards=["gc"])
# Dict-style — column is just "gc"
scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
Use the dict form when you want clean, predictable column names in your output.
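The naming rule can be sketched in plain Python. The helper below is purely illustrative: card_column_names is a hypothetical function, not part of PoolParty, and the op[<id>]:<operation>.<key> pattern is inferred from the two example column names shown in this section.

```python
def card_column_names(op_id, op_name, cards):
    """Return {card_key: column_name} for a `cards` argument.

    Hypothetical helper that mirrors the documented naming rule;
    it is not PoolParty code.
    """
    if cards is None:
        return {}  # no card columns requested
    if isinstance(cards, dict):
        return dict(cards)  # custom names are used as given, no prefix
    # List form: prefix each key with the operation id and name
    return {key: f"op[{op_id}]:{op_name}.{key}" for key in cards}


print(card_column_names(1, "score", ["gc"]))        # list form
print(card_column_names(1, "score", {"gc": "gc"}))  # dict form
print(card_column_names(1, "score", None))          # default
```

The list form produces names that stay unique when several operations export the same key, while the dict form leaves uniqueness up to you.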
Universal card keys
Every operation supports two universal keys, regardless of type:
| Key | Value |
|---|---|
| seq | The output sequence string at this point in the DAG. Useful for recording intermediate sequences in a multi-step pipeline. |
| state | The numeric state index for this operation (0, 1, 2, …). |
wt = pp.from_seq("ATCGATCG")
muts = wt.mutagenize(num_mutations=1, num_states=5,
                     cards={"state": "mut_state", "seq": "mut_seq"})
df = muts.generate_library()
| name | seq | mut_state | mut_seq |
|---|---|---|---|
| None | CTCGATCG | 0 | CTCGATCG |
| None | GTCGATCG | 1 | GTCGATCG |
| None | TTCGATCG | 2 | TTCGATCG |
| ... | ... | ... | ... |
Operation-specific card keys
Each operation defines which additional keys it supports. Requesting an
invalid key raises ValueError.
| Operation | Card Keys | Description |
|---|---|---|
| mutagenize | positions, wt_chars, mut_chars | Tuple of mutated positions, wild-type characters, and mutant characters. |
| mutagenize_orf | codon_positions, wt_aas, mut_aas | Codon-level mutation details: positions, original/mutant codons, and original/mutant amino acids. |
| score | (the card_key value) | The score computed by the scoring function. Default key is |
| stack | active_parent | Index (0, 1, 2, …) of which input pool produced this sequence. |
| | | Which repeat copy this sequence belongs to (0, 1, …, times-1). |
| | | |
| | | Breakpoint positions and which source pool contributed each segment. |
| | | Tuple of the permutation applied to molecular positions. |
| | | |
| | | Name and index of the selected input sequence. |
| | | Index and string of the generated k-mer. |
| | | Index and string of the generated barcode. |
| | | Scanning position details and the tagged region content. |
Examples
Track mutation details
wt = pp.from_seq("ATCGATCG")
muts = wt.mutagenize(num_mutations=2, num_states=5,
                     cards={"positions": "mut_pos",
                            "wt_chars": "wt",
                            "mut_chars": "mut"})
df = muts.generate_library()
| name | seq | mut_pos | wt | mut |
|---|---|---|---|---|
| None | GTCGACCG | (0, 5) | ('A', 'T') | ('G', 'C') |
| None | ATCACTCG | (3, 4) | ('G', 'A') | ('A', 'C') |
| ... | ... | ... | ... | ... |
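Because the three card columns are aligned tuples, they can be zipped into compact per-variant labels. The sketch below is plain Python over values copied from the first table row above. The <wt><position><mut> label format and the helper name mutation_labels are illustrative choices, not PoolParty output, and positions are kept 0-based as recorded in the card.

```python
def mutation_labels(positions, wt_chars, mut_chars):
    """Combine aligned card tuples into labels such as 'A0G'."""
    return [f"{wt}{pos}{mut}"
            for pos, wt, mut in zip(positions, wt_chars, mut_chars)]


# Values from the first row: mut_pos, wt, mut
labels = mutation_labels((0, 5), ("A", "T"), ("G", "C"))
print(labels)  # ['A0G', 'T5C']
```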
Score with a clean column name
pool = pp.from_iupac("NNNN", mode="sequential")
scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
df = scored.generate_library()
| name | seq | gc |
|---|---|---|
| None | AAAA | 0.00 |
| None | AAAC | 0.25 |
| None | AAAG | 0.25 |
| ... | ... | ... |
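For reference, the gc values in the table equal the fraction of G and C characters in each sequence. The function below is a minimal stand-in written for this page, not the implementation of pp.calc_gc. If score accepts any callable that maps a sequence string to a number (an assumption, not confirmed here), a function of this shape could be used in the same way.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of characters in `seq` that are G or C (case-insensitive)."""
    if not seq:
        return 0.0  # avoid division by zero on empty input
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sequences from the table above
for s in ["AAAA", "AAAC", "AAAG"]:
    print(s, gc_fraction(s))
```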
Multiple cards across a pipeline
Each operation in the pipeline can export its own cards independently.
pool = pp.from_iupac("NNNNNNNN", mode="sequential", num_states=10)
scored = (pool
    .score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
    .score(pp.calc_complexity, card_key="complexity", cards={"complexity": "complexity"})
)
df = scored.generate_library()
| name | seq | gc | complexity |
|---|---|---|---|
| None | AAAAAAAA | 0.00 | 0.25 |
| None | AAAAAAAC | 0.125 | 0.34 |
| ... | ... | ... | ... |
Identify which pool produced each sequence
pool_a = pp.from_seqs(["AAAA", "CCCC"], mode="sequential")
pool_b = pp.from_seqs(["GGGG", "TTTT"], mode="sequential")
combined = pp.stack([pool_a, pool_b],
                    cards={"active_parent": "source"})
df = combined.generate_library()
| name | seq | source |
|---|---|---|
| None | AAAA | 0 |
| None | CCCC | 0 |
| None | GGGG | 1 |
| None | TTTT | 1 |
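Once the source column exists, grouping and filtering by input pool are ordinary pandas operations. The sketch below rebuilds the table above by hand as a stand-in for combined.generate_library(), so it runs without PoolParty.

```python
import pandas as pd

# Rows copied from the table above (stand-in for combined.generate_library())
df = pd.DataFrame({
    "seq": ["AAAA", "CCCC", "GGGG", "TTTT"],
    "source": [0, 0, 1, 1],
})

# Count the sequences contributed by each input pool
counts = df.groupby("source").size()
print(counts)

# Keep only the sequences that came from the second pool (index 1)
from_pool_b = df[df["source"] == 1]
print(from_pool_b["seq"].tolist())  # ['GGGG', 'TTTT']
```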
DMS library with codon-level cards
In a deep mutational scanning library, mutagenize_orf cards record the
amino-acid-level changes for each variant — no sequence parsing needed.
orf = pp.from_seq("ATGAAATTTGGGCCCTAA")
muts = (orf
    .annotate_orf("gene")
    .mutagenize_orf(num_mutations=1, mode="sequential",
                    cards={"codon_positions": "position",
                           "wt_aas": "wt_aa",
                           "mut_aas": "mut_aa"})
)
df = muts.generate_library()
| name | seq | position | wt_aa | mut_aa |
|---|---|---|---|---|
| None | ATGCAATTTGGGCCCTAA | (1,) | ('K',) | ('Q',) |
| None | ATGGAATTTGGGCCCTAA | (1,) | ('K',) | ('E',) |
| None | ATGAAAGTTGGGCCCTAA | (2,) | ('F',) | ('V',) |
| ... | ... | ... | ... | ... |
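The card values in this table are one-element tuples because num_mutations=1. For analysis it is often convenient to unpack them into scalar columns. The sketch below does this with pandas on rows copied from the table above. The variant label format (for example K1Q) is an illustrative choice, and it keeps the codon index exactly as the card records it.

```python
import pandas as pd

# Card columns copied from the table above
df = pd.DataFrame({
    "position": [(1,), (1,), (2,)],
    "wt_aa": [("K",), ("K",), ("F",)],
    "mut_aa": [("Q",), ("E",), ("V",)],
})

# Each tuple holds exactly one element here, so take item 0
flat = pd.DataFrame({
    "codon": df["position"].map(lambda t: t[0]),
    "wt_aa": df["wt_aa"].map(lambda t: t[0]),
    "mut_aa": df["mut_aa"].map(lambda t: t[0]),
})

# Build a compact label: wild-type residue, codon index, mutant residue
flat["variant"] = flat["wt_aa"] + flat["codon"].astype(str) + flat["mut_aa"]
print(flat["variant"].tolist())  # ['K1Q', 'K1E', 'F2V']
```

For libraries with more than one mutation per variant, the tuples have several elements and taking item 0 would discard information.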
Cards as covariates for modeling
Card columns are ordinary DataFrame columns, so they can be used directly as covariates in statistical or machine-learning models. This avoids post-hoc sequence parsing: the design parameters are already structured as regression features.
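# Pseudocode: score a library with a model, then regress on card features.
# `library`, `predict_with_model`, and the "position" / "strength" card
# columns are placeholders for your own pool, predictor, and card names.
import statsmodels.api as sm

df = library.generate_library()
df["model_score"] = predict_with_model(df["seq"])

# Card columns become covariates (they must be numeric for OLS)
X = df[["position", "strength"]]  # from design cards
y = df["model_score"]
model = sm.OLS(y, sm.add_constant(X)).fit()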
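A regression like this needs a numeric design matrix, while many card values are categorical (a mutant amino acid, for instance). One common way to bridge the gap is one-hot encoding. The sketch below uses pandas on toy values invented for illustration. The column names position and mut_aa are assumptions standing in for whatever card columns your library exports.

```python
import pandas as pd

# Toy card columns (values invented for illustration)
cards = pd.DataFrame({
    "position": [1, 1, 2],
    "mut_aa": ["Q", "E", "V"],
})

# One-hot encode the categorical card; numeric cards pass through unchanged.
# drop_first=True drops one level so the dummies are not collinear with
# an intercept term added later.
X = pd.get_dummies(cards, columns=["mut_aa"], drop_first=True, dtype=float)
print(list(X.columns))  # ['position', 'mut_aa_Q', 'mut_aa_V']
```

The resulting frame can be used as the covariate matrix in place of the raw card columns.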
Disabling cards globally
To suppress all card computation for performance:
pp.toggle_cards(on=False)
This causes every operation to skip card computation regardless of the
cards parameter. Re-enable with pp.toggle_cards(on=True).
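When cards should be off only for one expensive step, the two toggle calls can be wrapped in a context manager so that cards are switched back on even if the step fails. This is a sketch, not a PoolParty feature: cards_disabled is a hypothetical helper, it takes the toggle function as an argument so the example runs without PoolParty installed, and it assumes cards were enabled before the block.

```python
from contextlib import contextmanager


@contextmanager
def cards_disabled(toggle):
    """Turn cards off inside a `with` block, then turn them back on.

    `toggle` is expected to behave like pp.toggle_cards(on=...).
    Assumes cards were enabled before entering the block.
    """
    toggle(on=False)
    try:
        yield
    finally:
        toggle(on=True)  # runs even if the block raises


# Stand-in toggle that records its calls, used here instead of PoolParty
calls = []

def fake_toggle(on):
    calls.append(on)

with cards_disabled(fake_toggle):
    pass  # e.g. generate a large library here

print(calls)  # [False, True]
```

With PoolParty, the same pattern would read with cards_disabled(pp.toggle_cards): followed by the generation call.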