Pools

Every PoolParty operation returns a Pool. A Pool represents a designed sequence library: it records which operation was applied and to what inputs, forming a directed acyclic graph (DAG) of operations. PoolParty walks this graph to generate sequences on demand when you call generate_library(), print_library(), to_df(), or to_file().

Operations never modify the pool they are called on; each one returns a new Pool. This means you can branch a pipeline at any point and apply different operations to each branch without interference.
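The branching behaviour described above follows from treating each pool as an immutable node that only generates sequences when asked. The sketch below is not PoolParty's implementation: `ToyPool`, `apply`, and `toy_from_seqs` are hypothetical names, used here only to illustrate how two branches can share one upstream node without affecting each other.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Tuple


@dataclass(frozen=True)
class ToyPool:
    """Hypothetical stand-in for a pool: an operation name plus its parents."""
    op_name: str
    parents: Tuple["ToyPool", ...]
    produce: Callable[[], Iterator[str]]  # lazily yields sequences

    def apply(self, op_name: str, fn: Callable[[str], str]) -> "ToyPool":
        # Return a NEW node that points back at self; self is never modified.
        return ToyPool(op_name, (self,), lambda: (fn(s) for s in self.produce()))


def toy_from_seqs(seqs):
    """Create a root node with no parents."""
    seqs = tuple(seqs)
    return ToyPool("from_seqs", (), lambda: iter(seqs))


base = toy_from_seqs(["ACGT", "TTAA"])
rev_branch = base.apply("reverse", lambda s: s[::-1])
lower_branch = base.apply("lower", str.lower)

# Nothing is generated until a branch is walked.
print(list(rev_branch.produce()))    # ['TGCA', 'AATT']
print(list(lower_branch.produce()))  # ['acgt', 'ttaa']
print(list(base.produce()))          # ['ACGT', 'TTAA'] (unchanged)
```

Both branches hold a reference to the same `base` node, so building the second branch costs nothing extra upstream and cannot alter the first.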

Each pool carries a reference to the operation that created it. You can inspect it via pool.operation to check settings like operation.mode and operation.num_states at any point in a pipeline. See Operation Modes for details.

Pools must be created inside an active context. Call pp.init() once at the top of a notebook, or wrap your code in a with pp.Party(): block, which cleans up automatically when the block exits. See Quickstart Guide for details.

All examples assume:

import poolparty as pp
pp.init()

Properties

Attribute	Type	Description
`name`	`str`	Human-readable name for this pool. Settable. Defaults to `"pool[N]"`.
`num_states`	`int`	Number of distinct sequences this pool produces.
`seq_length`	`int \| None`	Fixed sequence length, or `None` for variable-length pools.
`iter_order`	`float`	Iteration priority. Controls which pool’s sequences change most rapidly when generating combinations in a joined or stacked pool.
`regions`	`set[Region]`	Set of `Region` objects present in this pool’s sequences. See Sequence Regions for details.
`parents`	`list[Pool]`	Input pools that this pool’s operation reads from.
`operation`	`Operation`	The operation that created this pool. Exposes `operation.mode`, `operation.num_states`, and `operation.natural_num_states`.

Note that pool.num_states and pool.operation.num_states are different values. The pool’s num_states is the total across the entire pipeline, while the operation’s num_states is just that operation’s contribution (see Operation Modes and Library Size):

seqs = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
mut  = seqs.mutagenize(num_mutations=1, mode="sequential")

mut.num_states                    # 27 (3 inputs × 9 mutants)
mut.operation.num_states          # 9  (mutagenize alone)
mut.operation.natural_num_states  # 9  (before any num_states override)
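The counts in the comments above can be checked by hand. The sketch below uses only plain Python, not PoolParty; `single_mutants` is a hypothetical helper that enumerates every single-base substitution of a DNA sequence.

```python
def single_mutants(seq, alphabet="ACGT"):
    """Enumerate all sequences that differ from seq at exactly one position."""
    out = []
    for i, ref in enumerate(seq):
        for alt in alphabet:
            if alt != ref:
                out.append(seq[:i] + alt + seq[i + 1:])
    return out


inputs = ["AAA", "CCC", "GGG"]

# One operation's contribution: 3 positions x 3 alternative bases.
per_input = len(single_mutants(inputs[0]))

# Whole-pipeline total: the contribution summed over all 3 inputs.
total = sum(len(single_mutants(s)) for s in inputs)

print(per_input)  # 9
print(total)      # 27
```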

Naming and copying

`named(name)`

Set the pool’s name and return self, allowing in-line renaming without breaking a chain.

wt = pp.from_seq("ACGT").named("wildtype")
# wt.name == "wildtype"

single_mut = (
    pp.from_iupac("NNNN", mode="sequential")
      .mutagenize(num_mutations=1)
      .named("single_mut")
)

`copy()` and `deepcopy()`

copy() creates a new pool that shares the same input pools — useful for branching a design at a specific point without re-running earlier operations.

deepcopy() creates a fully independent copy of the entire upstream DAG — nothing is shared with the original. In most cases copy() is sufficient. Use deepcopy() when the two branches must be fully independent and share no input pools.

base = pp.from_iupac("NNNN", mode="sequential")
branch_a = base.mutagenize(num_mutations=1).named("branch_a")
branch_b = base.copy().mutagenize(num_mutations=2).named("branch_b")
# branch_a and branch_b share the same "base" input pool
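The difference between sharing and duplicating upstream nodes is analogous to shallow and deep copies in Python's standard `copy` module. The sketch below is only an analogy under that assumption; `ToyNode` is a hypothetical class, and PoolParty's own `copy()` and `deepcopy()` may be implemented differently.

```python
import copy


class ToyNode:
    """Hypothetical graph node: a label plus a list of parent nodes."""

    def __init__(self, label, parents=()):
        self.label = label
        self.parents = list(parents)


base = ToyNode("base")
child = ToyNode("mutagenize", parents=[base])

shallow = copy.copy(child)    # new node, same parent objects
deep = copy.deepcopy(child)   # new node, parents duplicated as well

print(shallow is child)            # False
print(shallow.parents[0] is base)  # True: upstream is shared
print(deep.parents[0] is base)     # False: upstream was cloned
print(deep.parents[0].label)       # base
```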

Operator shortcuts

Pools support three Python operators as shorthand for common operations:

pool_a + pool_b: Equivalent to pp.stack([pool_a, pool_b]). See stack.
pool * N: Equivalent to pp.repeat(pool, times=N). See repeat.
pool[start:stop]: Equivalent to pp.slice_states(pool, start=start, stop=stop). See slice_states.

a = pp.from_seqs(["AAA", "CCC"], mode="sequential")
b = pp.from_seqs(["GGG", "TTT"], mode="sequential")

combined = a + b          # 4 states (2 + 2)
repeated = a * 3          # 6 states (2 × 3)
sliced   = combined[:3]   # 3 states (first 3 of 4)
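For readers unfamiliar with operator overloading, the sketch below shows how `+`, `*`, and slicing can be mapped onto methods in plain Python. `ToyStates` is a hypothetical class that mirrors only the state counts from the example above; it is not how PoolParty implements these operators.

```python
class ToyStates:
    """Hypothetical container holding an explicit list of states."""

    def __init__(self, states):
        self.states = list(states)

    def __add__(self, other):
        # a + b: place the two state lists end to end.
        return ToyStates(self.states + other.states)

    def __mul__(self, n):
        # a * n: repeat the state list n times.
        return ToyStates(self.states * n)

    def __getitem__(self, key):
        # a[start:stop]: keep a slice of the states.
        if not isinstance(key, slice):
            raise TypeError("only slices are supported in this sketch")
        return ToyStates(self.states[key])

    def __len__(self):
        return len(self.states)


a = ToyStates(["AAA", "CCC"])
b = ToyStates(["GGG", "TTT"])

combined = a + b
repeated = a * 3
sliced = combined[:3]

print(len(combined), len(repeated), len(sliced))  # 4 6 3
print(sliced.states)                              # ['AAA', 'CCC', 'GGG']
```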

Generating sequences

`generate_library(...)`

Generate all sequences from this pool and return them as a pandas.DataFrame. Best for small to medium pools; for libraries above ~10k sequences, use to_df which streams in chunks. See generate_library for full documentation.

pool = pp.from_iupac("NNNN", mode="sequential")
df   = pool.generate_library()
# df has columns: name, seq  (plus any design card columns)

`print_library(...)`

Print a formatted preview of the pool’s sequences to stdout. Returns self so it can be used mid-pipeline.

Parameter	Type	Default	Description
`num_seqs`	`int \| None`	`None`	Number of sequences to show.
`num_cycles`	`int \| None`	`1`	Number of complete passes through the pool’s `num_states` sequences (used when `num_seqs` is not given). One cycle produces `num_states` sequences.
`show_header`	`bool`	`True`	Print a summary header line before the sequences.
`show_name`	`bool`	`True`	Include the sequence name column.
`show_seq`	`bool`	`True`	Include the sequence column.
`show_state`	`bool`	`False`	Include the state index column.
`pad_names`	`bool`	`True`	Align sequences by padding names to the same width.
`seed`	`int \| None`	`None`	Random seed for reproducible previews.
`discard_null_seqs`	`bool`	`False`	Skip sequences removed by a `filter` operation (`NullSeq`).

See Pool in the API Reference for the full parameter list.

pp.from_iupac("NNNNN", mode="sequential").print_library(num_seqs=6)

pool[0]: seq_length=5, num_states=1024
pool[0].0 AAAAA
pool[0].1 AAAAC
pool[0].2 AAAAG
pool[0].3 AAAAT
pool[0].4 AAACA
pool[0].5 AAACC

Exporting to a DataFrame — `to_df(...)`

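Generate sequences and collect them into a pandas.DataFrame using chunked streaming. Prefer to_df over generate_library for large libraries (above ~10k sequences). It processes sequences in batches, so the working memory used during generation is proportional to chunk_size rather than to the full library. The returned DataFrame still holds every requested row; to avoid that as well, write to disk with to_file.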

Parameter	Type	Default	Description
`num_seqs`	`int \| None`	`None`	Total sequences to generate. Required when `num_cycles` is not given.
`num_cycles`	`int \| None`	`None`	Number of complete passes through the pool’s `num_states` sequences. One cycle produces `num_states` sequences.
`chunk_size`	`int`	`1000`	Sequences generated per internal batch. Larger values may be faster but use more memory.
`write_tags`	`bool`	`False`	If `True`, include region tags (e.g. `<region>…</region>`) in the `seq` column.
`seed`	`int \| None`	`None`	Random seed for reproducibility.
`discard_null_seqs`	`bool`	`True`	Skip sequences removed by a `filter` operation (`NullSeq`).
`columns`	`list[str] \| None`	`None`	Columns to keep. Defaults to all columns (`name`, `seq`, plus any design card columns). Pass `["name", "seq"]` to drop cards.
`show_progress`	`bool`	`True`	Display a `tqdm` progress bar during generation.

See Pool in the API Reference for the full parameter list.

Basic usage

pool = pp.from_iupac("NNNNNNNN", mode="sequential")
df   = pool.to_df(num_cycles=1)
# 65536 rows, columns: name, seq

Large library with chunked streaming

pool = pp.from_iupac("NNNNNNNNNN")
# Random sample of 500k sequences from ~1M (4**10) possible sequences
df   = pool.to_df(num_seqs=500_000, chunk_size=10_000, seed=42)
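The idea behind chunk_size can be illustrated with the standard library alone. The sketch below is an assumption about the general technique, not PoolParty's code: `iter_chunks` is a hypothetical helper that pulls a fixed number of items at a time from a lazy generator, so only one batch is materialised at once.

```python
from itertools import islice, product


def iter_chunks(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# All 4**6 = 4096 hexamers, produced lazily in lexicographic order.
hexamers = ("".join(p) for p in product("ACGT", repeat=6))

n_rows = 0
largest_chunk = 0
for chunk in iter_chunks(hexamers, chunk_size=1000):
    # A real exporter would convert each chunk into rows here.
    n_rows += len(chunk)
    largest_chunk = max(largest_chunk, len(chunk))

print(n_rows)         # 4096
print(largest_chunk)  # 1000
```

A larger chunk size means fewer loop iterations at the cost of a bigger list per iteration, which matches the speed and memory trade-off noted for chunk_size in the table above.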

Keep only name and seq (drop design cards)

scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
df     = scored.to_df(num_cycles=1, columns=["name", "seq"])
# "gc" column is excluded

Exporting to file — `to_file(...)`

Stream sequences directly to disk without ever holding the full library in memory. Supports CSV, TSV, FASTA, and JSONL formats, including gzip compression.

Parameter	Type	Default	Description
`path`	`str \| Path`	(required)	Output file path. Use a `.gz` suffix for transparent gzip compression (e.g. `library.csv.gz`).
`file_type`	`str \| None`	`None`	`"csv"`, `"tsv"`, `"fasta"`, or `"jsonl"`. Auto-detected from the file extension when `None`.
`num_seqs`	`int \| None`	`None`	Total sequences to write.
`num_cycles`	`int \| None`	`None`	Number of complete passes through the pool’s `num_states` sequences. One cycle produces `num_states` sequences.
`chunk_size`	`int`	`1000`	Sequences written per internal batch.
`write_tags`	`bool`	`False`	Include region tags in output sequences.
`seed`	`int \| None`	`None`	Random seed for reproducibility.
`discard_null_seqs`	`bool`	`True`	Skip sequences removed by a `filter` operation (`NullSeq`).
`columns`	`list[str] \| None`	`None`	Columns to write (CSV/TSV only).
`line_width`	`int \| None`	`60`	FASTA only: wrap sequence lines at this width. `None` for no wrapping.
`description`	`str \| callable \| None`	`None`	FASTA only: additional description text after the sequence name. A string is treated as a format template (e.g. `"GC={gc:.2f}"`); a callable receives the row dict and should return a string.
`show_progress`	`bool`	`True`	Show a `tqdm` progress bar.

Returns the number of sequences written. See Pool in the API Reference for the full parameter list.

Export to CSV

pool = pp.from_iupac("NNNNNNNN")
n    = pool.to_file("library.csv", num_seqs=100_000)
# n == 100000

name,seq
pool[0].0,AAAAAAAA
pool[0].1,AAAAAAAC
pool[0].2,AAAAAAAG
pool[0].3,AAAAAAAT
pool[0].4,AAAAAACA
...

Export to gzip-compressed CSV

n = pool.to_file("library.csv.gz", num_seqs=1_000_000, chunk_size=50_000)
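Extension-based format detection and transparent gzip output can both be sketched with the standard library. The functions below, `detect_file_type` and `write_csv_stream`, are hypothetical and only illustrate the general approach; they make no claim about how to_file works internally.

```python
import csv
import gzip
import os
import tempfile
from pathlib import Path


def detect_file_type(path):
    """Guess the format from the suffix, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes:
        raise ValueError(f"cannot infer file type from {path!r}")
    return suffixes[-1].lstrip(".")


def write_csv_stream(path, rows):
    """Write (name, seq) rows one at a time; gzip when the path ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    n = 0
    with opener(path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "seq"])
        for name, seq in rows:
            writer.writerow([name, seq])
            n += 1
    return n


# A generator, so rows are never all held in memory at once.
rows = ((f"toy.{i}", seq) for i, seq in enumerate(["AAAA", "AAAC", "AAAG"]))

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "library.csv.gz")
    written = write_csv_stream(out, rows)
    with gzip.open(out, "rt", newline="") as fh:
        lines = fh.read().splitlines()

print(detect_file_type("library.csv.gz"))  # csv
print(written)                             # 3
print(lines[0])                            # name,seq
```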

Export to FASTA

n = pool.to_file("library.fasta", num_seqs=10_000)

>pool[0].0
AAAAAAAA
>pool[0].1
AAAAAAAC
>pool[0].2
AAAAAAAG
...

FASTA with a custom description line

scored = pool.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
n = scored.to_file(
    "library.fasta",
    num_seqs=1000,
    description=lambda row: f"GC={row['gc']:.3f}",
)

>pool[0].0 GC=0.000
AAAAAAAA
>pool[0].1 GC=0.125
AAAAAAAC
>pool[0].2 GC=0.125
AAAAAAAG
...
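The two forms of description (format template or callable) and the effect of line_width can be sketched in a few lines of plain Python. `fasta_record` is a hypothetical helper written for illustration under those documented semantics; it is not PoolParty's writer.

```python
def fasta_record(row, description=None, line_width=60):
    """Format one row dict with 'name' and 'seq' keys as a FASTA record."""
    header = ">" + row["name"]
    if callable(description):
        # A callable receives the whole row dict and returns a string.
        header += " " + description(row)
    elif description is not None:
        # A string is treated as a format template filled from the row.
        header += " " + description.format(**row)

    seq = row["seq"]
    if line_width is None:
        body = seq
    else:
        # Wrap the sequence into fixed-width lines.
        body = "\n".join(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    return header + "\n" + body


row = {"name": "toy.1", "seq": "AAAAAAAC", "gc": 0.125}

from_template = fasta_record(row, description="GC={gc:.3f}")
from_callable = fasta_record(row, description=lambda r: f"GC={r['gc']:.3f}")
wrapped = fasta_record(row, line_width=4)

print(from_template)
# >toy.1 GC=0.125
# AAAAAAAC
print(from_template == from_callable)  # True
print(wrapped)
# >toy.1
# AAAA
# AAAC
```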

Visualising the DAG — `print_dag(...)`

Print a text tree of the computation graph rooted at this pool. Returns self so it can be used mid-pipeline.

Parameter	Type	Default	Description
`style`	`str`	`"clean"`	Tree drawing style. `"clean"` uses Unicode box-drawing characters; `"ascii"` uses only ASCII.
`show_pools`	`bool`	`True`	Show pool nodes in addition to operation nodes.

wt     = pp.from_seq("ACG")
mut    = wt.mutagenize(num_mutations=1, mode="sequential")
scored = mut.score(pp.calc_gc, card_key="gc", cards={"gc": "gc"})
scored.print_dag()

pool[2] (pool, n=9)
└── op[2]:score [mode=fixed, n=1]
    └── pool[1] (pool, n=9)
        └── op[1]:mutagenize [mode=sequential, n=9]
            └── pool[0] (pool, n=1)
                └── op[0]:from_seq [mode=fixed, n=1]
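A tree like the one above can be produced by a short recursive walk over the graph. The sketch below is a generic illustration, not the print_dag implementation: `format_tree` is a hypothetical function, and the nested `(label, children)` tuples and their labels are made up for the example.

```python
def format_tree(node, prefix="", is_root=True):
    """Return the lines of a box-drawing tree for a nested (label, children) tuple."""
    label, children = node
    lines = [label] if is_root else []
    for i, child in enumerate(children):
        last = i == len(children) - 1
        # The last child gets a corner; earlier children get a tee.
        lines.append(prefix + ("└── " if last else "├── ") + child[0])
        # Descendants are indented; a vertical bar continues open branches.
        lines.extend(format_tree(child, prefix + ("    " if last else "│   "), is_root=False))
    return lines


# Hypothetical graph: one operation node with two input pools.
tree = ("pool[2]", [("op[2]:stack", [("pool[0]", []), ("pool[1]", [])])])

lines = format_tree(tree)
print("\n".join(lines))
# pool[2]
# └── op[2]:stack
#     ├── pool[0]
#     └── pool[1]
```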
