Quickstart Guide

The examples below walk through the main ideas behind PoolParty, from creating a single pool to composing a complete combinatorial library.

Installation

pip install poolparty

Getting started

import poolparty as pp
pp.init()

pp.init() initializes a fresh design context. Call it before each independent library design. For a scoped context (e.g., inside a function that builds one library), use a with pp.Party() block instead. See Pools for details.

Core concepts

Pools. A Pool represents a designed collection of DNA sequences. Pools are lazy: they record the rules for generating sequences but delay actual generation until the user requests it. Pools are also immutable: every operation returns a new Pool, leaving the original unchanged. This means you can branch a pipeline at any point without interference.

Operations. An Operation takes one or more Pools as input and produces a new Pool as output. By chaining operations, you build a directed acyclic graph (DAG) that specifies your library design. PoolParty provides over 20 built-in operations in four categories (source, transformation, composition, and state). See Operations for the full catalog.
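The two properties described above, laziness and immutability, can be illustrated with a small toy model. This is a sketch only, not PoolParty's implementation: ToyPool, toy_from_seq, and toy_repeat are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class ToyPool:
    """Hypothetical stand-in for a pool: it stores a recipe, not sequences."""
    make: Callable[[], Iterator[str]]     # called only when sequences are requested
    parents: Tuple["ToyPool", ...] = ()   # incoming edges of the DAG

    def sequences(self) -> list:
        # Generation happens here, on demand.
        return list(self.make())


def toy_from_seq(seq: str) -> ToyPool:
    # A source node: it has no parents.
    return ToyPool(make=lambda: iter([seq]))


def toy_repeat(pool: ToyPool, times: int) -> ToyPool:
    # Returns a new node that points at its parent; the parent is not modified.
    def make():
        for s in pool.make():
            for _ in range(times):
                yield s
    return ToyPool(make=make, parents=(pool,))


wt = toy_from_seq("ATCGATCG")
copies = toy_repeat(wt, times=3)   # nothing has been generated yet

print(copies.sequences())  # three copies of the sequence
print(wt.sequences())      # the parent still yields exactly one sequence
```

Because each node only holds a reference to its parents, any number of downstream nodes can share the same parent without affecting one another.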

Modes. Most operations accept a mode parameter that controls how outputs are produced:

sequential – enumerate every possibility deterministically
random – sample from the design space
fixed – output is uniquely determined by the input (no variation)

See Operation Modes for details.

These three ideas underlie every PoolParty pipeline. The sections below show how they work in practice.

Creating a pool

All pools originate from a source operation. Source operations do not require an existing pool as input. The simplest is from_seq, which wraps a single DNA sequence:

wt = pp.from_seq("ATCGATCG")
wt.print_library()

pool[0]: seq_length=8, num_states=1
ATCGATCG

num_states is the number of distinct sequences the pool can produce; here it is 1 because the pool contains exactly one sequence. seq_length reports the length of every sequence in the pool.

Other source operations include from_seqs (multiple sequences), from_iupac (degenerate IUPAC codes), from_fasta (FASTA files), get_kmers (all k-mers of a given length), and get_barcodes (constrained barcodes). See Source Operations.

Applying an operation

Operations transform pools. Each operation returns a new pool; the original is unchanged. Here, mutagenize in sequential mode generates every single-nucleotide substitution:

mutants = wt.mutagenize(num_mutations=1, mode="sequential")
mutants.print_library(num_seqs=6)

pool[1]: seq_length=8, num_states=24
CTCGATCG
GTCGATCG
TTCGATCG
AACGATCG
ACCGATCG
AGCGATCG
... (24 total)

mode="sequential" enumerates all 24 single-nucleotide substitutions: 8 positions times 3 alternative bases. The original wt pool is still a single-sequence pool – mutagenize returned a new pool.
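The count of 24 can be checked independently of PoolParty. The sketch below is plain Python, and single_substitutions is a hypothetical helper written for this illustration. It assumes a position-major ordering with alternative bases taken alphabetically, which happens to reproduce the first six sequences printed above; PoolParty's actual ordering rule is not stated here.

```python
def single_substitutions(seq, alphabet="ACGT"):
    """Yield every sequence that differs from seq at exactly one position."""
    for pos, wt_base in enumerate(seq):
        for base in alphabet:
            if base != wt_base:
                yield seq[:pos] + base + seq[pos + 1:]


variants = list(single_substitutions("ATCGATCG"))
print(len(variants))   # 24 = 8 positions x 3 alternative bases
print(variants[:6])
```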

With mode="random", the operation would draw a single random mutant instead; passing num_states=N in random mode draws N random designs. See Operation Modes.

Operations can be called as standalone functions or as methods on a Pool: wt.mutagenize(...) and pp.mutagenize(wt, ...) are equivalent. See Operations for the full catalog.

Working with pools

A few API patterns make PoolParty pipelines easier to write and debug.

Method chaining. Since every operation returns a new Pool, calls can be chained left-to-right into a pipeline:

library = (
    pp.from_seq("ATCGATCG")
    .mutagenize(num_mutations=1, mode="sequential")
    .named("mutants")
    .print_library(num_seqs=3)
    .repeat(times=2)
)
print(library.num_states)  # 48

mutants: seq_length=8, num_states=24
CTCGATCG
GTCGATCG
TTCGATCG
... (24 total)

.named("mutants") labels the pool for display (appears in print_library headers and print_dag output). This is distinct from prefix, which labels sequence names in the output DataFrame (see Sequence Names).
.print_library(num_seqs=3) previews 3 sequences mid-chain, then returns the pool unchanged so the chain continues.
The final pool has num_states=48: 24 mutants times 2 repeats.

Branching. Because pools are immutable, you can apply different operations to the same input without interference:

branch_a = wt.mutagenize(num_mutations=1, mode="sequential")
branch_b = wt.deletion_scan(deletion_length=2, mode="sequential")
# wt is unchanged; branch_a and branch_b are independent

This branching pattern is how you build multi-component libraries (as in the capstone example below).

Inspecting a pool. At any point you can check pool.num_states, pool.seq_length, and pool.regions. Call pool.print_dag() to visualize the full pipeline structure (demonstrated in the capstone).

Reproducibility. Pass a fixed seed (e.g., seed=42) to print_library, generate_library, or to_df to get the same output on every run.
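The general idea behind seeded sampling can be sketched with Python's standard random module. This shows the principle only, not how PoolParty draws its samples: random_single_mutants is a hypothetical helper, and sampling with replacement is an assumption made for the sketch.

```python
import random


def random_single_mutants(seq, n, seed=None, alphabet="ACGT"):
    """Draw n single-substitution mutants at random (with replacement)."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    out = []
    for _ in range(n):
        pos = rng.randrange(len(seq))
        base = rng.choice([b for b in alphabet if b != seq[pos]])
        out.append(seq[:pos] + base + seq[pos + 1:])
    return out


run_1 = random_single_mutants("ATCGATCG", n=5, seed=42)
run_2 = random_single_mutants("ATCGATCG", n=5, seed=42)
print(run_1 == run_2)  # True: the same seed gives the same draws
```

Using a dedicated random.Random instance rather than the module-level functions keeps the result independent of any other code that consumes random numbers.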

Sequence regions

You often want to perform different operations on different parts of a sequence. Regions let you mark specific segments with XML-style tags so that operations can target them by name:

template = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
cre_mutants = template.mutagenize(
    num_mutations=1, region="cre", mode="sequential"
).named("cre_mutants")
cre_mutants.print_library(num_seqs=4)

cre_mutants: seq_length=16, num_states=24
AAAA<cre>CTCGATCG</cre>TTTT
AAAA<cre>GTCGATCG</cre>TTTT
AAAA<cre>TTCGATCG</cre>TTTT
AAAA<cre>AACGATCG</cre>TTTT
... (24 total)

Only the 8 bases inside <cre> are mutated; the flanking AAAA and TTTT remain unchanged. Tags persist through the DAG, so multiple operations can target the same region in series. PoolParty also supports self-closing tags (e.g., <ins/>) for zero-length insertion points.
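To make the idea of region-restricted mutation concrete, here is a minimal sketch in plain Python that locates a tagged region with a regular expression and substitutes bases only inside it. mutate_region is a hypothetical helper; PoolParty's own tag parser is not shown in this guide and may handle cases (nested or self-closing tags, for instance) that this sketch ignores.

```python
import re


def mutate_region(tagged, region, alphabet="ACGT"):
    """Yield single-substitution variants confined to <region>...</region>."""
    name = re.escape(region)
    match = re.search(rf"<{name}>(.*?)</{name}>", tagged)
    if match is None:
        raise ValueError(f"region {region!r} not found")
    inner = match.group(1)
    start, end = match.span(1)  # indices of the region content, tags excluded
    for pos, wt_base in enumerate(inner):
        for base in alphabet:
            if base != wt_base:
                new_inner = inner[:pos] + base + inner[pos + 1:]
                # Everything outside the region content is copied verbatim.
                yield tagged[:start] + new_inner + tagged[end:]


variants = list(mutate_region("AAAA<cre>ATCGATCG</cre>TTTT", "cre"))
print(len(variants))  # 24
print(variants[0])    # AAAA<cre>CTCGATCG</cre>TTTT
```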

See Sequence Regions for full tag syntax, persistence rules, and programmatic tag insertion.

Scanning operations

Scanning operations systematically tile a window across a sequence (or a region), producing one variant per window position. They are the workhorses of saturation-style screens:

dels = template.deletion_scan(
    deletion_length=3, region="cre", mode="sequential"
).named("dels")
dels.print_library(num_seqs=4)

dels: seq_length=16, num_states=6
AAAA<cre>---GATCG</cre>TTTT
AAAA<cre>A---ATCG</cre>TTTT
AAAA<cre>AT---TCG</cre>TTTT
AAAA<cre>ATC---CG</cre>TTTT
... (6 total)

deletion_scan slides a 3-bp window across the 8-bp region, yielding 8 - 3 + 1 = 6 variants (one per valid window position). The scan is restricted to <cre>, so flanking sequences remain intact.
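The window arithmetic can be reproduced in a few lines of plain Python. deletion_scan_sketch is a hypothetical helper for this illustration; marking deleted bases with "-" mirrors the printed preview above and is otherwise an assumption.

```python
def deletion_scan_sketch(seq, deletion_length, gap="-"):
    """Return one variant per valid window position, with the window gapped out."""
    n_windows = len(seq) - deletion_length + 1
    return [
        seq[:i] + gap * deletion_length + seq[i + deletion_length:]
        for i in range(n_windows)
    ]


dels = deletion_scan_sketch("ATCGATCG", deletion_length=3)
print(len(dels))  # 6 = 8 - 3 + 1
for variant in dels:
    print(variant)
```

The last valid window starts at index 5, which is why a window of length 3 fits in 6 places on an 8-base region.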

Other scanning operations include insertion_scan, replacement_scan, mutagenize_scan, and their multi-window variants (insertion_multiscan, etc.). See Scanning Operations.

Combining pools

Composition operations combine sequences from multiple pools. The two primary operations are stack (merge state spaces) and join (concatenate sequences end-to-end).

stack merges the mutagenesis and deletion pools:

combined = pp.stack([cre_mutants, dels])
print(combined.num_states)  # 30 (24 + 6)

repeat duplicates a pool’s sequences for replication:

wt_copies = template.repeat(times=5)
print(wt_copies.num_states)  # 5

stack produces a pool whose state space is the union of all inputs, so state counts add (24 + 6 = 30). repeat produces N copies of each input sequence, so the state count is multiplied by N. join concatenates sequences from different pools end-to-end; it takes the Cartesian product of the input state spaces, so the state count is the product of the input counts.
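The three counting rules (add, multiply by N, multiply together) can be checked on toy data. In this sketch pools are plain Python lists rather than PoolParty objects, and the list operations are stand-ins chosen to mimic each rule, not the library's implementation.

```python
from itertools import product

# Toy pools represented as plain lists of sequences.
pool_a = ["AAAA", "CCCC", "GGGG"]   # 3 states
pool_b = ["TTTT", "ATAT"]           # 2 states

stacked = pool_a + pool_b                               # union: 3 + 2
repeated = [s for s in pool_a for _ in range(2)]        # copies: 3 x 2
joined = [a + b for a, b in product(pool_a, pool_b)]    # all pairs: 3 x 2

print(len(stacked), len(repeated), len(joined))  # 5 6 6
print(joined[:2])  # ['AAAATTTT', 'AAAAATAT']
```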

See Composition Operations. For how state counts compose across different operation types, see Library Size.

Sequence metadata

PoolParty automatically tracks how each sequence was constructed through three complementary mechanisms:

Names. Each sequence receives a dot-separated name summarizing its construction history (e.g., mut_03.rep_1). Users assign labels via the prefix parameter on operations.
Design cards. Structured DataFrame columns that record every design choice – mutation positions, substituted characters, scores, orientations – ready for filtering, grouping, and statistical modeling.
Styling. Per-character color and formatting annotations that highlight mutations, deletions, and regions in print_library output for quick visual auditing.

The capstone example below demonstrates all three: names via prefix, styling via style, and design cards via cards. See Sequence Names, Design Cards, and Styling.
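As a rough picture of how dot-separated names accumulate, the sketch below builds names of the same shape as those in the capstone output (mut_0, del_0.rep_0, and so on). label and extend are hypothetical helpers; the indexing and padding rules PoolParty actually applies are documented in Sequence Names and may differ.

```python
def label(prefix, count):
    """Names for `count` variants produced by a single operation."""
    return [f"{prefix}_{i}" for i in range(count)]


def extend(names, prefix, count):
    """Append a further dot-separated label to every existing name."""
    return [f"{name}.{prefix}_{i}" for name in names for i in range(count)]


mut_names = label("mut", 9)
del_names = extend(label("del", 3), "rep", 2)

print(mut_names[:2])  # ['mut_0', 'mut_1']
print(del_names[:3])  # ['del_0.rep_0', 'del_0.rep_1', 'del_1.rep_0']

# Splitting on "." recovers the per-operation history of a sequence.
print(del_names[0].split("."))  # ['del_0', 'rep_0']
```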

Generating libraries

print_library() previews sequences in the terminal. To produce a pandas.DataFrame, use generate_library():

df = combined.generate_library()

The DataFrame contains a name column, a seq column, and any design card columns requested via the cards parameter. For larger libraries (above ~10k sequences), use to_df (chunked streaming) or to_file (stream directly to CSV, FASTA, or JSONL). See Pools for full export options.
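The motivation for chunked export is that only one block of rows needs to be in memory at a time. The sketch below shows that general pattern with Python's standard library; it does not call to_df or to_file, whose exact signatures are described in Pools. chunks and stream_to_csv are hypothetical helpers, and an in-memory buffer stands in for an open file.

```python
import csv
import io
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def stream_to_csv(rows, handle, chunk_size=1000):
    """Write (name, seq) rows to handle one chunk at a time; return the row count."""
    writer = csv.writer(handle)
    writer.writerow(["name", "seq"])
    written = 0
    for block in chunks(rows, chunk_size):
        writer.writerows(block)
        written += len(block)
    return written


# A generator: rows are produced on demand rather than stored in a list.
rows = ((f"seq_{i}", "ATCGATCG") for i in range(2500))
buffer = io.StringIO()  # stands in for an open file
written = stream_to_csv(rows, buffer, chunk_size=1000)
print(written)  # 2500
```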

Putting it all together

The following example combines every concept from the preceding sections into a complete pipeline. A template sequence contains a <tag> region targeted for both mutagenesis and deletion scanning. We start a fresh session to build the example from scratch.

Example PoolParty workflow combining mutagenesis and deletion scanning into a single library.

pp.init()

template = pp.from_seq("TCCGACT<tag>GCA</tag>ATTCGGA").named("template")

mut_pool = template.mutagenize(
    num_mutations=1,
    region="tag",
    style="red bold",
    prefix="mut",
    mode="sequential",
    cards={"positions": "mut_pos", "wt_chars": "wt", "mut_chars": "mut"},
).named("mut_pool")

del_pool = template.deletion_scan(
    deletion_length=1,
    region="tag",
    style="green bold",
    prefix="del",
    mode="sequential",
    cards={"start": "del_start"},
).repeat(times=2, prefix="rep",
    cards={"repeat_index": "rep_idx"},
).named("del_pool")

pool_final = pp.stack([mut_pool, del_pool]).named("pool_final")
pool_final.print_library(show_name=True)

pool_final: seq_length=17, num_states=15

name	seq
mut_0	TCCGACT<tag>ACA</tag>ATTCGGA
mut_1	TCCGACT<tag>CCA</tag>ATTCGGA
mut_2	TCCGACT<tag>TCA</tag>ATTCGGA
mut_3	TCCGACT<tag>GAA</tag>ATTCGGA
mut_4	TCCGACT<tag>GGA</tag>ATTCGGA
mut_5	TCCGACT<tag>GTA</tag>ATTCGGA
mut_6	TCCGACT<tag>GCC</tag>ATTCGGA
mut_7	TCCGACT<tag>GCG</tag>ATTCGGA
mut_8	TCCGACT<tag>GCT</tag>ATTCGGA
del_0.rep_0	TCCGACT<tag>-CA</tag>ATTCGGA
del_0.rep_1	TCCGACT<tag>-CA</tag>ATTCGGA
del_1.rep_0	TCCGACT<tag>G-A</tag>ATTCGGA
del_1.rep_1	TCCGACT<tag>G-A</tag>ATTCGGA
del_2.rep_0	TCCGACT<tag>GC-</tag>ATTCGGA
del_2.rep_1	TCCGACT<tag>GC-</tag>ATTCGGA

To recap the pipeline: a source operation creates the template, two transformation operations (mutagenize and deletion_scan) branch from it and target the same <tag> region, repeat replicates one branch, and stack merges the branches into the final library. The state count follows directly: 9 single-base mutants plus 3 deletions times 2 repeats gives 9 + 6 = 15. The prefix parameter labels each variant type so names are self-documenting (mut_0, del_0.rep_0, etc.). The style parameter applies color annotations visible in the output: mutations in red, deletions in green.

The DAG view confirms the pipeline structure:

pool_final.print_dag()

pool_final (pool, n=15)
└── op[6]:stack [mode=sequential, n=2]
    ├── mut_pool (pool, n=9)
    │   └── op[1]:mutagenize [mode=sequential, n=9]
    │       └── template (pool, n=1)
    │           └── op[0]:from_seq [mode=fixed, n=1]
    └── del_pool (pool, n=6)
        └── op[5]:repeat [mode=sequential, n=2]
            └── pool[4] (pool, n=3)
                └── op[4]:deletion_scan(replace_region) [mode=fixed, n=1]
                    ├── pool[2] (pool, n=3)
                    │   └── op[2]:deletion_scan(region_scan) [mode=sequential, n=3]
                    │       └── template (pool, n=1)
                    │           └── op[0]:from_seq [mode=fixed, n=1]
                    └── pool[3] (pool, n=1)
                        └── op[3]:deletion_scan(from_seq) [mode=fixed, n=1]

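Operation nodes show their mode and their own internal state count (repeat, for example, has n=2), while pool nodes show the total number of states at that point in the pipeline (del_pool has n=6). Named pools (template, mut_pool, del_pool, pool_final) appear with their labels; unnamed intermediate pools use default identifiers such as pool[4].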

Because each operation was called with cards=, the exported DataFrame includes design card columns alongside name and seq:

df = pool_final.generate_library()

df: 15 rows × 7 columns (seq column omitted for clarity)

name	mut_pos	wt	mut	del_start	rep_idx
mut_0	(0,)	(G,)	(A,)	None	None
mut_1	(0,)	(G,)	(C,)	None	None
mut_2	(0,)	(G,)	(T,)	None	None
...	...	...	...	...	...
del_0.rep_0	None	None	None	0	0
del_0.rep_1	None	None	None	0	1
...	...	...	...	...	...

Each operation contributes its own card columns: mutagenize records the mutation position, wild-type base, and substituted base; deletion_scan records the deletion start position; and repeat records the copy index. In each cards mapping, the key names the card and the value names the resulting column (for example, "positions" becomes mut_pos). A sequence that did not pass through a given operation has None in that operation's columns. The available card keys differ from one operation to the next. See Design Cards.
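The None-filling behavior can be pictured as taking the union of all card columns across branches. The sketch below does this with plain dictionaries; merge_cards is a hypothetical helper, and the rows are copied from the table above rather than produced by PoolParty.

```python
def merge_cards(*branches):
    """Stack rows from several branches; card keys a row lacks become None."""
    columns = []
    for rows in branches:
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)  # keep first-seen column order
    return [
        {col: row.get(col) for col in columns}
        for rows in branches
        for row in rows
    ]


mut_rows = [
    {"name": "mut_0", "mut_pos": (0,), "wt": ("G",), "mut": ("A",)},
]
del_rows = [
    {"name": "del_0.rep_0", "del_start": 0, "rep_idx": 0},
    {"name": "del_0.rep_1", "del_start": 0, "rep_idx": 1},
]

table = merge_cards(mut_rows, del_rows)
for row in table:
    print(row)
```

Rows from the mutagenesis branch end up with None under del_start and rep_idx, and rows from the deletion branch end up with None under mut_pos, wt, and mut, matching the pattern in the table.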

Next steps

Walk through complete real-world library designs in the Tutorials (deep mutational scanning, MPRA)
Browse Operations for the full operation catalog
See Pools for Pool properties, export methods (to_df, to_file), and context management