Pools
Every PoolParty operation returns a Pool. A Pool represents a designed
sequence library: it records which operation was applied and to what inputs,
forming a directed acyclic graph (DAG) of operations. PoolParty walks this
graph to generate sequences on demand when you call generate_library(),
print_library(), to_df(), or to_file().
Operations never modify the pool they are called on; each one returns a new Pool. This means you can branch a pipeline at any point and apply different operations to each branch without interference.
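The branching behaviour can be pictured with a small, self-contained sketch. The Node class and its apply method are hypothetical stand-ins written for illustration; they are not part of PoolParty and only mimic the idea of operations returning new nodes that point back at their inputs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a pool: an operation name plus its inputs."""
    op: str
    parents: Tuple["Node", ...] = ()

    def apply(self, op: str) -> "Node":
        # Never mutate self; return a new node that points back at it.
        return Node(op, (self,))


base = Node("from_iupac")
branch_a = base.apply("mutagenize(1)")
branch_b = base.apply("mutagenize(2)")

# Both branches read from the same, unchanged base node.
print(branch_a.parents[0] is base, branch_b.parents[0] is base)  # True True
print(base.parents)  # ()
```

Because the toy node is frozen, neither branch can alter the shared base, which is the property that makes branching safe.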
Each pool carries a reference to the operation that created it. You can inspect
it via pool.operation to check settings like operation.mode and
operation.num_states at any point in a pipeline. See Operation Modes
for details.
Pools must be created inside an active context. Call pp.init() once at the
top of a notebook, or use with pp.Party(): for automatic cleanup when the
block exits. See Quickstart Guide for details.
All examples assume:
import poolparty as pp
pp.init()
Properties
| Attribute | Type | Description |
|---|---|---|
| | | Human-readable name for this pool. Settable. Defaults to |
| | | Number of distinct sequences this pool produces. |
| | | Fixed sequence length, or |
| | | Iteration priority. Controls which pool’s sequences change most rapidly when generating combinations in a joined or stacked pool. |
| | | Set of |
| | | Input pools that this pool’s operation reads from. |
| | | The operation that created this pool. Exposes |
Note that pool.num_states and pool.operation.num_states measure different
things and need not be equal. The pool’s num_states is the total across the
entire pipeline, while the operation’s num_states is just that operation’s
contribution (see Operation Modes and Library Size):
seqs = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
mut = seqs.mutagenize(num_mutations=1, mode="sequential")
mut.num_states # 27 (3 inputs × 9 mutants)
mut.operation.num_states # 9 (mutagenize alone)
mut.operation.natural_num_states # 9 (before any num_states override)
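The 9 and 27 in the example above can be checked by hand. The following is a plain-Python sketch that does not use PoolParty; single_mutants is a hypothetical helper that assumes a single mutation means replacing one position with one of the three other bases.

```python
ALPHABET = "ACGT"


def single_mutants(seq):
    """All sequences that differ from seq at exactly one position."""
    out = []
    for i, ref in enumerate(seq):
        for alt in ALPHABET:
            if alt != ref:
                out.append(seq[:i] + alt + seq[i + 1:])
    return out


inputs = ["AAA", "CCC", "GGG"]

# Contribution of the mutagenize step alone: 3 positions x 3 alternative bases.
per_op = len(single_mutants(inputs[0]))

# Total across the pipeline: every input sequence times its mutants.
total = sum(len(single_mutants(s)) for s in inputs)

print(per_op)  # 9
print(total)   # 27
```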
Naming and copying
named(name)
Set the pool’s name and return self, allowing in-line renaming without
breaking a chain.
wt = pp.from_seq("ACGT").named("wildtype")
# wt.name == "wildtype"
single_mut = (
    pp.from_iupac("NNNN", mode="sequential")
    .mutagenize(num_mutations=1)
    .named("single_mut")
)
copy() and deepcopy()
copy() creates a new pool that shares the same input pools — useful for
branching a design at a specific point without re-running earlier operations.
deepcopy() creates a fully independent copy of the entire upstream DAG
— nothing is shared with the original. In most cases copy() is sufficient.
Use deepcopy() when the two branches must be fully independent and share
no input pools.
base = pp.from_iupac("NNNN", mode="sequential")
branch_a = base.mutagenize(num_mutations=1).named("branch_a")
branch_b = base.copy().mutagenize(num_mutations=2).named("branch_b")
# branch_a and branch_b share the same "base" input pool
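The difference between sharing inputs and cloning them resembles the shallow and deep copies in Python's standard copy module. The sketch below uses a hypothetical Node class, not PoolParty's Pool, and is only an analogy; PoolParty's own copy() and deepcopy() may differ in detail.

```python
import copy


class Node:
    """Hypothetical stand-in for a pool with a list of input nodes."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


root = Node("base")
child = Node("mutagenize", [root])

shallow = copy.copy(child)    # new node object, same parent objects
deep = copy.deepcopy(child)   # new node object, cloned parents

print(shallow is child)            # False
print(shallow.parents[0] is root)  # True
print(deep.parents[0] is root)     # False
print(deep.parents[0].name)        # base
```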
Operator shortcuts
Pools support three Python operators as shorthand for common operations:
pool_a + pool_b: Equivalent to pp.stack([pool_a, pool_b]). See stack.
pool * N: Equivalent to pp.repeat(pool, times=N). See repeat.
pool[start:stop]: Equivalent to pp.slice_states(pool, start=start, stop=stop). See slice_states.
a = pp.from_seqs(["AAA", "CCC"], mode="sequential")
b = pp.from_seqs(["GGG", "TTT"], mode="sequential")
combined = a + b # 4 states (2 + 2)
repeated = a * 3 # 6 states (2 × 3)
sliced = combined[:3] # 3 states (first 3 of 4)
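Operator shortcuts like these are usually built on Python's special methods. The sketch below is a toy, assuming a hypothetical StateList class that stores sequences in a plain list; it only reproduces the state counts from the example above and says nothing about how PoolParty implements its operators.

```python
class StateList:
    """Hypothetical toy pool: just an ordered list of sequences."""

    def __init__(self, seqs):
        self.seqs = list(seqs)

    def __add__(self, other):
        # Stacking: states of self followed by states of other.
        return StateList(self.seqs + other.seqs)

    def __mul__(self, n):
        # Repeating: the same states n times over.
        return StateList(self.seqs * n)

    def __getitem__(self, key):
        # Slicing by state index; only slices are supported in this sketch.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported")
        return StateList(self.seqs[key])

    def __len__(self):
        return len(self.seqs)


a = StateList(["AAA", "CCC"])
b = StateList(["GGG", "TTT"])

combined = a + b
repeated = a * 3
sliced = combined[:3]

print(len(combined), len(repeated), len(sliced))  # 4 6 3
print(sliced.seqs)  # ['AAA', 'CCC', 'GGG']
```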
Generating sequences
generate_library(...)
Generate all sequences from this pool and return them as a
pandas.DataFrame. Best for small to medium pools; for libraries above ~10k
sequences, use to_df, which streams in chunks. See
generate_library for full documentation.
pool = pp.from_iupac("NNNN", mode="sequential")
df = pool.generate_library()
# df has columns: name, seq (plus any design card columns)
print_library(...)
Print a formatted preview of the pool’s sequences to stdout. Returns self
so it can be used mid-pipeline.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | | Number of sequences to show. |
| | | | Number of complete passes through the pool’s |
| | | | Print a summary header line before the sequences. |
| | | | Include the sequence name column. |
| | | | Include the sequence column. |
| | | | Include the state index column. |
| | | | Align sequences by padding names to the same width. |
| | | | Random seed for reproducible previews. |
| | | | Skip sequences removed by a |
See Pool in the API Reference for the full parameter list.
pp.from_iupac("NNNNN", mode="sequential").print_library(num_seqs=6)
pool[0].0 AAAAA
pool[0].1 AAAAC
pool[0].2 AAAAG
pool[0].3 AAAAT
pool[0].4 AAACA
pool[0].5 AAACC
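The order of the preview above is consistent with enumerating the pattern position by position, with the rightmost position changing fastest. The following standalone sketch reproduces that order with itertools.product. The expand helper is hypothetical, and the assumption that sequential mode enumerates in exactly this order is inferred from the sample output rather than from the library's source.

```python
from itertools import islice, product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(pattern):
    """Lazily yield every sequence matching an IUPAC pattern, rightmost position fastest."""
    for bases in product(*(IUPAC[code] for code in pattern)):
        yield "".join(bases)


first_six = list(islice(expand("NNNNN"), 6))
print(first_six)
# ['AAAAA', 'AAAAC', 'AAAAG', 'AAAAT', 'AAACA', 'AAACC']
```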
Exporting to a DataFrame — to_df(...)
Generate sequences and collect them into a pandas.DataFrame using
chunked streaming. Prefer to_df over generate_library for large
libraries (above ~10k sequences). It processes sequences in batches, keeping
peak memory proportional to chunk_size rather than the full library.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | | Total sequences to generate. Required when |
| | | | Number of complete passes through the pool’s |
| | | | Sequences generated per internal batch. Larger values may be faster but use more memory. |
| | | | If |
| | | | Random seed for reproducibility. |
| | | | Skip sequences removed by a |
| | | | Columns to keep. Defaults to all columns ( |
| | | | Display a |
See Pool in the API Reference for the full parameter list.
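The general idea behind chunked generation can be sketched without PoolParty or pandas. In the example below, iter_chunks is a hypothetical helper that pulls fixed-size batches from a lazy generator, so only one batch of freshly generated sequences is held at a time. It illustrates the technique only and is not a description of how to_df is implemented.

```python
from itertools import islice, product


def iter_chunks(iterable, chunk_size):
    """Yield lists of at most chunk_size items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# All 8-mers over ACGT, produced lazily (65536 sequences in total).
all_8mers = ("".join(p) for p in product("ACGT", repeat=8))

rows = 0
largest = 0
for chunk in iter_chunks(all_8mers, chunk_size=10_000):
    rows += len(chunk)                   # stand-in for appending a batch to the result
    largest = max(largest, len(chunk))   # largest batch held at once

print(rows)     # 65536
print(largest)  # 10000
```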
Basic usage
pool = pp.from_iupac("NNNNNNNN", mode="sequential")
df = pool.to_df(num_cycles=1)
# 65536 rows, columns: name, seq
Large library with chunked streaming
pool = pp.from_iupac("NNNNNNNNNN")
# Random sample of 500k sequences from ~1M possible sequences
df = pool.to_df(num_seqs=500_000, chunk_size=10_000, seed=42)
Keep only name and seq (drop design cards)
scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
df = scored.to_df(num_cycles=1, columns=["name", "seq"])
# "gc" column is excluded
Exporting to file — to_file(...)
Stream sequences directly to disk without ever holding the full library in memory. Supports CSV, TSV, FASTA, and JSONL formats, including gzip compression.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | Output file path. Use a |
| | | | |
| | | | Total sequences to write. |
| | | | Number of complete passes through the pool’s |
| | | | Sequences written per internal batch. |
| | | | Include region tags in output sequences. |
| | | | Random seed for reproducibility. |
| | | | Skip sequences removed by a |
| | | | Columns to write (CSV/TSV only). |
| | | | FASTA only: wrap sequence lines at this width. |
| | | | FASTA only: additional description text after the sequence name. A string is treated as a format template (e.g. |
| | | | Show a |
Returns the number of sequences written. See Pool in the
API Reference for the full parameter list.
Export to CSV
pool = pp.from_iupac("NNNNNNNN")
n = pool.to_file("library.csv", num_seqs=100_000)
# n == 100000
name,seq
pool[0].0,AAAAAAAA
pool[0].1,AAAAAAAC
pool[0].2,AAAAAAAG
pool[0].3,AAAAAAAT
pool[0].4,AAAAAACA
...
Export to gzip-compressed CSV
n = pool.to_file("library.csv.gz", num_seqs=1_000_000, chunk_size=50_000)
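For readers curious what streaming to a compressed file involves, here is a standard-library sketch using gzip and csv. The write_csv_gz function and the seq.N naming scheme are hypothetical and chosen for illustration; the sketch writes one batch at a time and returns the row count, mirroring the behaviour described for to_file without claiming to match its implementation.

```python
import csv
import gzip
import os
import tempfile
from itertools import islice, product


def write_csv_gz(path, seqs, chunk_size=1000):
    """Write (name, seq) rows to a gzip-compressed CSV in batches; return the row count."""
    written = 0
    it = iter(seqs)
    with gzip.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "seq"])
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                break
            # Names continue from the running count so they stay unique across batches.
            writer.writerows((f"seq.{written + i}", s) for i, s in enumerate(chunk))
            written += len(chunk)
    return written


seqs = ("".join(p) for p in product("ACGT", repeat=4))  # 256 sequences
path = os.path.join(tempfile.mkdtemp(), "library.csv.gz")
n = write_csv_gz(path, seqs, chunk_size=100)

# Read the file back to confirm the header and first row.
with gzip.open(path, "rt", newline="") as fh:
    rows = list(csv.reader(fh))

print(n)         # 256
print(rows[:2])  # [['name', 'seq'], ['seq.0', 'AAAA']]
```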
Export to FASTA
n = pool.to_file("library.fasta", num_seqs=10_000)
>pool[0].0
AAAAAAAA
>pool[0].1
AAAAAAAC
>pool[0].2
AAAAAAAG
...
FASTA with a custom description line
scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
n = scored.to_file(
    "library.fasta",
    num_seqs=1000,
    description=lambda row: f"GC={row['gc']:.3f}",
)
>pool[0].0 GC=0.000
AAAAAAAA
>pool[0].1 GC=0.125
AAAAAAAC
>pool[0].2 GC=0.125
AAAAAAAG
...
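The two FASTA-specific options, line wrapping and a description after the name, can be illustrated with a small standalone writer. Everything here is an assumption made for the sketch: write_fasta, its line_width parameter, the seq.N names, and the use of str.format for string templates are hypothetical and are not taken from PoolParty's API.

```python
import io


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def write_fasta(handle, records, line_width=None, description=None):
    """Write (name, seq, card) records as FASTA; return the number written."""
    count = 0
    for name, seq, card in records:
        header = f">{name}"
        if description is not None:
            # Accept a callable on the card dict, or a str.format template (assumed syntax).
            text = description(card) if callable(description) else description.format(**card)
            header += f" {text}"
        handle.write(header + "\n")
        # Wrap the sequence into fixed-width lines when a width is given.
        width = line_width or len(seq)
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
        count += 1
    return count


seqs = ["AAAAAAAA", "AAAAAAAC"]
records = [(f"seq.{i}", s, {"gc": gc_fraction(s)}) for i, s in enumerate(seqs)]

buf = io.StringIO()
n = write_fasta(buf, records, line_width=4, description="GC={gc:.3f}")
print(buf.getvalue(), end="")
# >seq.0 GC=0.000
# AAAA
# AAAA
# >seq.1 GC=0.125
# AAAA
# AAAC
```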
Visualising the DAG — print_dag(...)
Print an ASCII tree of the computation graph rooted at this pool. Returns
self so it can be used mid-pipeline.
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | | Tree drawing style. |
| | | | Show pool nodes in addition to operation nodes. |
wt = pp.from_seq("ACG")
mut = wt.mutagenize(num_mutations=1, mode="sequential")
scored = mut.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
scored.print_dag()
pool[2] (pool, n=9)
└── op[2]:score [mode=fixed, n=1]
    └── pool[1] (pool, n=9)
        └── op[1]:mutagenize [mode=sequential, n=9]
            └── pool[0] (pool, n=1)
                └── op[0]:from_seq [mode=fixed, n=1]
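A tree like the one above can be drawn with a short recursive function. The sketch below assumes a hypothetical representation in which each node is a (label, children) tuple; it shows the general drawing technique and is not PoolParty's print_dag.

```python
def render_tree(node, prefix="", is_root=True, is_last=True):
    """Return the lines of an ASCII tree for a (label, children) node."""
    label, children = node
    if is_root:
        lines = [label]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + label]
        # Keep a vertical bar under nodes that still have siblings below them.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(children):
        lines.extend(render_tree(child, child_prefix, False, i == len(children) - 1))
    return lines


dag = ("pool[1]", [("op[1]:mutagenize", [("pool[0]", [("op[0]:from_seq", [])])])])
print("\n".join(render_tree(dag)))
# pool[1]
# └── op[1]:mutagenize
#     └── pool[0]
#         └── op[0]:from_seq
```

Nodes with several inputs, such as a stacked pool, would appear as multiple children in the list, and the is_last flag then chooses between the two branch glyphs.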