Quickstart Guide
The examples below walk through the main ideas behind PoolParty, from creating a single pool to composing a complete combinatorial library.
Installation
pip install poolparty
Getting started
import poolparty as pp
pp.init()
pp.init() initializes a fresh design context. Call it before each
independent library design. For scoped contexts (e.g., inside a function
that builds one library), use with pp.Party() instead.
See Pools for details.
Core concepts
Pools. A Pool represents a designed collection of DNA sequences. Pools are lazy: they record the rules for generating sequences but delay actual generation until the user requests it. Pools are also immutable: every operation returns a new Pool, leaving the original unchanged. This means you can branch a pipeline at any point without interference.
Operations. An Operation takes one or more Pools as input and produces a new Pool as output. By chaining operations, you build a directed acyclic graph (DAG) that specifies your library design. PoolParty provides over 20 built-in operations in four categories (source, transformation, composition, and state). See Operations for the full catalog.
Modes. Most operations accept a mode parameter that controls how
outputs are produced:
sequential – enumerate every possibility deterministically
random – sample from the design space
fixed – output is uniquely determined by the input (no variation)
See Operation Modes for details.
These three ideas underlie every PoolParty pipeline. The sections below show how they work in practice.
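The lazy and immutable behavior described above can be illustrated with a small standalone sketch. It does not use PoolParty: `LazyPool`, `apply`, `generate`, and `single_subs` are hypothetical names invented for this illustration, and the real Pool class may be organized quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


# Hypothetical stand-in for illustration only; this is not PoolParty's Pool class.
@dataclass(frozen=True)
class LazyPool:
    make: Callable[[], Iterator[str]]  # recipe that yields sequences on demand

    def apply(self, fn: Callable[[str], Iterator[str]]) -> "LazyPool":
        # Build a new recipe on top of the old one; self is never modified.
        def recipe() -> Iterator[str]:
            for seq in self.make():
                yield from fn(seq)

        return LazyPool(recipe)

    def generate(self) -> list[str]:
        # Sequences are produced only here, when explicitly requested.
        return list(self.make())


def single_subs(seq: str) -> Iterator[str]:
    # Yield every single-base substitution of seq.
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield seq[:i] + alt + seq[i + 1:]


wt = LazyPool(lambda: iter(["ATCG"]))
mutants = wt.apply(single_subs)  # nothing has been generated yet
print(len(mutants.generate()))   # 12
print(wt.generate())             # ['ATCG']
```

Because `apply` returns a new object and the dataclass is frozen, `wt` can feed any number of independent branches.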
Creating a pool
All pools originate from a source operation. Source operations do not
require an existing pool as input. The simplest is from_seq, which wraps
a single DNA sequence:
wt = pp.from_seq("ATCGATCG")
wt.print_library()
This pool has num_states=1 – the number of distinct sequences the
pool can produce – because it contains exactly one sequence.
seq_length reports the length of every sequence in the pool.
Other source operations include from_seqs (multiple sequences),
from_iupac (degenerate IUPAC codes), from_fasta (FASTA files),
get_kmers (all k-mers of a given length), and get_barcodes
(constrained barcodes). See Source Operations.
Applying an operation
Operations transform pools. Each operation returns a new pool; the original
is unchanged. Here, mutagenize in sequential mode generates every
single-nucleotide substitution:
mutants = wt.mutagenize(num_mutations=1, mode="sequential")
mutants.print_library(num_seqs=6)
GTCGATCG
TTCGATCG
AACGATCG
ACCGATCG
AGCGATCG ... (24 total)
mode="sequential" enumerates all 24 single-nucleotide substitutions:
8 positions times 3 alternative bases. The original wt pool is still a
single-sequence pool – mutagenize returned a new pool.
With mode="random", the operation would draw a single random mutant
instead; passing num_states=N in random mode draws N random designs.
See Operation Modes.
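The count of 24 can be checked by hand with a short standalone sketch. It does not call PoolParty; `single_substitutions` is a hypothetical helper, and the enumeration order it uses (position by position, alternative bases in alphabetical order) is an assumption rather than a documented guarantee.

```python
def single_substitutions(seq: str, alphabet: str = "ACGT") -> list[str]:
    """Return every sequence that differs from seq at exactly one position."""
    variants = []
    for pos, ref in enumerate(seq):
        for alt in alphabet:
            if alt != ref:  # skip the wild-type base
                variants.append(seq[:pos] + alt + seq[pos + 1:])
    return variants


variants = single_substitutions("ATCGATCG")
print(len(variants))  # 24, i.e. 8 positions times 3 alternative bases
```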
Operations can be called as standalone functions or as methods on a Pool:
wt.mutagenize(...) and pp.mutagenize(wt, ...) are equivalent.
See Operations for the full catalog.
Working with pools
A few API patterns make PoolParty pipelines easier to write and debug.
Method chaining. Since every operation returns a new Pool, calls can be chained left-to-right into a pipeline:
library = (
    pp.from_seq("ATCGATCG")
    .mutagenize(num_mutations=1, mode="sequential")
    .named("mutants")
    .print_library(num_seqs=3)
    .repeat(times=2)
)
print(library.num_states) # 48
GTCGATCG
TTCGATCG ... (24 total)
.named("mutants") labels the pool for display (appears in print_library headers and print_dag output). This is distinct from prefix, which labels sequence names in the output DataFrame (see Sequence Names).
.print_library(num_seqs=3) previews 3 sequences mid-chain, then returns the pool unchanged so the chain continues.
The final pool has num_states=48: 24 mutants times 2 repeats.
Branching. Because pools are immutable, you can apply different operations to the same input without interference:
branch_a = wt.mutagenize(num_mutations=1, mode="sequential")
branch_b = wt.deletion_scan(deletion_length=2, mode="sequential")
# wt is unchanged; branch_a and branch_b are independent
This branching pattern is how you build multi-component libraries (as in the capstone example below).
Inspecting a pool. At any point you can check pool.num_states,
pool.seq_length, and pool.regions. Call pool.print_dag() to
visualize the full pipeline structure (demonstrated in the capstone).
Reproducibility. Pass seed=42 to print_library,
generate_library, or to_df for reproducible output across runs.
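The idea behind seeding can be sketched with Python's standard `random` module. This is a generic illustration, not PoolParty's sampling code; `sample_mutant` and `draw` are hypothetical helpers.

```python
import random


def sample_mutant(seq: str, rng: random.Random) -> str:
    # Pick one position and one alternative base using the supplied generator.
    pos = rng.randrange(len(seq))
    alt = rng.choice([base for base in "ACGT" if base != seq[pos]])
    return seq[:pos] + alt + seq[pos + 1:]


def draw(seq: str, n: int, seed: int) -> list[str]:
    # A fresh generator per call means the same seed gives the same draws.
    rng = random.Random(seed)
    return [sample_mutant(seq, rng) for _ in range(n)]


run_1 = draw("ATCGATCG", 5, seed=42)
run_2 = draw("ATCGATCG", 5, seed=42)
print(run_1 == run_2)  # True
```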
Sequence regions
You often want to perform different operations on different parts of a sequence. Regions let you mark specific segments with XML-style tags so that operations can target them by name:
template = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
cre_mutants = template.mutagenize(
    num_mutations=1, region="cre", mode="sequential"
).named("cre_mutants")
cre_mutants.print_library(num_seqs=4)
AAAA<cre>GTCGATCG</cre>TTTT
AAAA<cre>TTCGATCG</cre>TTTT
AAAA<cre>AACGATCG</cre>TTTT ... (24 total)
Only the 8 bases inside <cre> are mutated; the flanking AAAA and
TTTT remain unchanged. Tags persist through the DAG, so multiple
operations can target the same region in series. PoolParty also supports
self-closing tags (e.g., <ins/>) for zero-length insertion points.
See Sequence Regions for full tag syntax, persistence rules, and programmatic tag insertion.
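To make region targeting concrete, the following standalone sketch locates a tagged segment with a regular expression and substitutes bases only inside it. It does not use PoolParty, and `mutate_region` is a hypothetical helper; how the library actually parses tags is not shown in this guide.

```python
import re


def mutate_region(tagged: str, region: str) -> list[str]:
    """Return all single-base substitutions inside <region>...</region>."""
    name = re.escape(region)
    match = re.search(rf"<{name}>(.*?)</{name}>", tagged)
    if match is None:
        raise ValueError(f"region {region!r} not found")
    inner = match.group(1)
    start, end = match.span(1)  # indices of the bases between the tags
    variants = []
    for i, ref in enumerate(inner):
        for alt in "ACGT":
            if alt != ref:
                new_inner = inner[:i] + alt + inner[i + 1:]
                # Tags and flanking sequence are copied through unchanged.
                variants.append(tagged[:start] + new_inner + tagged[end:])
    return variants


variants = mutate_region("AAAA<cre>ATCGATCG</cre>TTTT", "cre")
print(len(variants))  # 24
print(variants[1])    # AAAA<cre>GTCGATCG</cre>TTTT
```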
Scanning operations
Scanning operations systematically tile a window across a sequence (or a region), producing one variant per position. They are the workhorse for saturation-style screens:
dels = template.deletion_scan(
    deletion_length=3, region="cre", mode="sequential"
).named("dels")
dels.print_library(num_seqs=4)
AAAA<cre>A---ATCG</cre>TTTT
AAAA<cre>AT---TCG</cre>TTTT
AAAA<cre>ATC---CG</cre>TTTT ... (6 total)
deletion_scan slides a 3-bp window across the 8-bp region, yielding
8 - 3 + 1 = 6 variants (one per valid window position). The scan is
restricted to <cre>, so flanking sequences remain intact.
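The window arithmetic can be reproduced with a few lines of plain Python. This is a sketch of the counting logic only, applied to the bare 8-bp region; `scan_deletions` is a hypothetical helper, and marking deleted bases with `-` simply mirrors the display shown above.

```python
def scan_deletions(seq: str, deletion_length: int, gap: str = "-") -> list[str]:
    """Slide a deletion window across seq, one variant per start position."""
    n_windows = len(seq) - deletion_length + 1
    return [
        seq[:start] + gap * deletion_length + seq[start + deletion_length:]
        for start in range(n_windows)
    ]


variants = scan_deletions("ATCGATCG", deletion_length=3)
print(len(variants))  # 6, i.e. 8 - 3 + 1
print(variants[1])    # A---ATCG
```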
Other scanning operations include insertion_scan, replacement_scan,
mutagenize_scan, and their multi-window variants
(insertion_multiscan, etc.). See Scanning Operations.
Combining pools
Composition operations combine sequences from multiple pools. The two
primary operations are stack (merge state spaces) and join
(concatenate sequences end-to-end).
stack merges the mutagenesis and deletion pools:
combined = pp.stack([cre_mutants, dels])
print(combined.num_states) # 30 (24 + 6)
repeat duplicates a pool’s sequences for replication:
wt_copies = template.repeat(times=5)
print(wt_copies.num_states) # 5
stack produces a pool whose state space is the union of all inputs
(24 + 6 = 30). repeat produces N copies of each input sequence, so it
multiplies the state count by N (1 times 5 = 5 here). join, which is not
shown in this section, concatenates sequences from different pools
end-to-end; its state space is the Cartesian product of the input state
spaces.
See Composition Operations. For how state counts compose across different operation types, see Library Size.
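The three counting rules can be compared side by side on plain Python lists. The helpers below (`stack_states`, `repeat_states`, `join_states`) are hypothetical stand-ins that model only how state counts combine; they are not the PoolParty functions.

```python
from itertools import product


def stack_states(pools: list[list[str]]) -> list[str]:
    # Union of the inputs: state counts add.
    return [seq for pool in pools for seq in pool]


def repeat_states(pool: list[str], times: int) -> list[str]:
    # Each sequence is copied `times` times: the state count is multiplied.
    return [seq for seq in pool for _ in range(times)]


def join_states(pools: list[list[str]]) -> list[str]:
    # Every combination of one sequence per pool, concatenated end-to-end.
    return ["".join(parts) for parts in product(*pools)]


a = ["AAAA", "CCCC", "GGGG"]
b = ["TT", "AC"]
print(len(stack_states([a, b])))  # 5  (3 + 2)
print(len(repeat_states(a, 4)))   # 12 (3 * 4)
print(len(join_states([a, b])))   # 6  (3 * 2)
```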
Sequence metadata
PoolParty automatically tracks how each sequence was constructed through three complementary mechanisms:
Names. Each sequence receives a dot-separated name summarizing its construction history (e.g., mut_03.rep_1). Users assign labels via the prefix parameter on operations.
Design cards. Structured DataFrame columns that record every design choice – mutation positions, substituted characters, scores, orientations – ready for filtering, grouping, and statistical modeling.
Styling. Per-character color and formatting annotations that highlight mutations, deletions, and regions in print_library output for quick visual auditing.
The capstone example below demonstrates all three: names via prefix,
styling via style, and design cards via cards.
See Sequence Names, Design Cards, and
Styling.
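As a rough picture of how dot-separated names accumulate, the sketch below appends one `prefix_index` component per operation. `extend_names` is a hypothetical helper, and the exact index formatting (for example, whether indices are zero-padded) is an assumption here.

```python
def extend_names(names: list[str], prefix: str, count: int) -> list[str]:
    """Append a prefix_index component to each existing name."""
    extended = []
    for name in names:
        for i in range(count):
            part = f"{prefix}_{i}"
            # An empty name means no earlier operation contributed a label.
            extended.append(f"{name}.{part}" if name else part)
    return extended


names = extend_names([""], "del", 3)   # three deletion variants
names = extend_names(names, "rep", 2)  # each repeated twice
print(names)
# ['del_0.rep_0', 'del_0.rep_1', 'del_1.rep_0', 'del_1.rep_1', 'del_2.rep_0', 'del_2.rep_1']
```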
Generating libraries
print_library() previews sequences in the terminal. To produce a
pandas.DataFrame, use generate_library():
df = combined.generate_library()
The DataFrame contains a name column, a seq column, and any design
card columns requested via the cards parameter. For larger libraries
(above ~10k sequences), use to_df (chunked streaming) or to_file
(stream directly to CSV, FASTA, or JSONL). See Pools for full export
options.
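The motivation for chunked export is that only one batch of rows is held in memory at a time. The following generic sketch shows that pattern with the standard library; it says nothing about how to_df or to_file are implemented, and `chunks` and `rows` are hypothetical helpers.

```python
import csv
import io
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def rows():
    # Stand-in for a lazily generated library of (name, seq) pairs.
    for i in range(10):
        yield (f"seq_{i}", "ATCGATCG")


buffer = io.StringIO()  # an in-memory file; a real export would open a path
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "seq"])
n_chunks = 0
for batch in chunks(rows(), size=4):
    writer.writerows(batch)  # only this batch is in memory
    n_chunks += 1

print(n_chunks)                             # 3
print(len(buffer.getvalue().splitlines()))  # 11 (header plus 10 rows)
```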
Putting it all together
The following example combines every concept from the preceding sections
into a complete pipeline. A template sequence contains a <tag> region
targeted for both mutagenesis and deletion scanning. We start a fresh
session to build the example from scratch.
pp.init()
template = pp.from_seq("TCCGACT<tag>GCA</tag>ATTCGGA").named("template")
mut_pool = template.mutagenize(
    num_mutations=1,
    region="tag",
    style="red bold",
    prefix="mut",
    mode="sequential",
    cards={"positions": "mut_pos", "wt_chars": "wt", "mut_chars": "mut"},
).named("mut_pool")
del_pool = template.deletion_scan(
    deletion_length=1,
    region="tag",
    style="green bold",
    prefix="del",
    mode="sequential",
    cards={"start": "del_start"},
).repeat(
    times=2,
    prefix="rep",
    cards={"repeat_index": "rep_idx"},
).named("del_pool")
pool_final = pp.stack([mut_pool, del_pool]).named("pool_final")
pool_final.print_library(show_name=True)
| name | seq |
|---|---|
| mut_0 | TCCGACT<tag>ACA</tag>ATTCGGA |
| mut_1 | TCCGACT<tag>CCA</tag>ATTCGGA |
| mut_2 | TCCGACT<tag>TCA</tag>ATTCGGA |
| mut_3 | TCCGACT<tag>GAA</tag>ATTCGGA |
| mut_4 | TCCGACT<tag>GGA</tag>ATTCGGA |
| mut_5 | TCCGACT<tag>GTA</tag>ATTCGGA |
| mut_6 | TCCGACT<tag>GCC</tag>ATTCGGA |
| mut_7 | TCCGACT<tag>GCG</tag>ATTCGGA |
| mut_8 | TCCGACT<tag>GCT</tag>ATTCGGA |
| del_0.rep_0 | TCCGACT<tag>-CA</tag>ATTCGGA |
| del_0.rep_1 | TCCGACT<tag>-CA</tag>ATTCGGA |
| del_1.rep_0 | TCCGACT<tag>G-A</tag>ATTCGGA |
| del_1.rep_1 | TCCGACT<tag>G-A</tag>ATTCGGA |
| del_2.rep_0 | TCCGACT<tag>GC-</tag>ATTCGGA |
| del_2.rep_1 | TCCGACT<tag>GC-</tag>ATTCGGA |
In this pipeline, a source
operation creates the template, two transformation operations
(mutagenize and deletion_scan) branch from it targeting the same
<tag> region, repeat replicates one branch, and stack merges
them into the final library. The prefix parameter labels each variant
type so names are self-documenting (mut_0, del_0.rep_0, etc.).
The style parameter applies color annotations visible in the output:
mutations in red, deletions in green.
The DAG view confirms the pipeline structure:
pool_final.print_dag()
pool_final (pool, n=15)
└── op[6]:stack [mode=sequential, n=2]
    ├── mut_pool (pool, n=9)
    │   └── op[1]:mutagenize [mode=sequential, n=9]
    │       └── template (pool, n=1)
    │           └── op[0]:from_seq [mode=fixed, n=1]
    └── del_pool (pool, n=6)
        └── op[5]:repeat [mode=sequential, n=2]
            └── pool[4] (pool, n=3)
                └── op[4]:deletion_scan(replace_region) [mode=fixed, n=1]
                    ├── pool[2] (pool, n=3)
                    │   └── op[2]:deletion_scan(region_scan) [mode=sequential, n=3]
                    │       └── template (pool, n=1)
                    │           └── op[0]:from_seq [mode=fixed, n=1]
                    └── pool[3] (pool, n=1)
                        └── op[3]:deletion_scan(from_seq) [mode=fixed, n=1]
Each node shows its mode and internal state count. Named pools
(template, mut_pool, del_pool, pool_final) appear with
their labels; unnamed intermediate pools use default identifiers.
Because each operation was called with cards=, the exported DataFrame
includes design card columns alongside name and seq:
df = pool_final.generate_library()
| name | mut_pos | wt | mut | del_start | rep_idx |
|---|---|---|---|---|---|
| mut_0 | (0,) | (G,) | (A,) | None | None |
| mut_1 | (0,) | (G,) | (C,) | None | None |
| mut_2 | (0,) | (G,) | (T,) | None | None |
| ... | ... | ... | ... | ... | ... |
| del_0.rep_0 | None | None | None | 0 | 0 |
| del_0.rep_1 | None | None | None | 0 | 1 |
| ... | ... | ... | ... | ... | ... |
Each operation contributes its own card columns: mutagenize records
the mutation position, wild-type base, and substituted base;
deletion_scan records the deletion start position; and repeat records
the copy index. Sequences that did not pass through a given operation
have None in its columns.
Each operation defines its own set of available card keys.
See Design Cards.
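The None-filling behavior in the table above can be mimicked with plain dictionaries. This sketch only models the resulting table shape; `align_cards` is a hypothetical helper, and the row values are copied from the example output rather than computed.

```python
columns = ["name", "mut_pos", "wt", "mut", "del_start", "rep_idx"]

# Rows as each branch would report them: only its own card columns are present.
mut_rows = [{"name": "mut_0", "mut_pos": (0,), "wt": ("G",), "mut": ("A",)}]
del_rows = [
    {"name": "del_0.rep_0", "del_start": 0, "rep_idx": 0},
    {"name": "del_0.rep_1", "del_start": 0, "rep_idx": 1},
]


def align_cards(rows: list[dict], columns: list[str]) -> list[dict]:
    # dict.get returns None for columns a row never passed through.
    return [{col: row.get(col) for col in columns} for row in rows]


table = align_cards(mut_rows + del_rows, columns)
for row in table:
    print(row)
```

Filling missing cards with None keeps every row the same width, which is what makes the combined table easy to filter or group by operation.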
Next steps
Walk through complete real-world library designs in the Tutorials (deep mutational scanning, MPRA)
Browse the Operations for the full operation catalog
See Pools for Pool properties, export methods (to_df, to_file), and context management