Library Size

Every pool has a num_states property: the number of distinct sequences it can produce. Each operation has a set of internal states (see Operation Modes), and the size of that set is determined by the operation’s mode and parameters. When operations are chained, the output pool’s num_states is the product of the internal state counts along the chain. Other composition patterns (stacking, synchronisation) follow different rules, described below.

import poolparty as pp
pp.init()

Composition rules

Three rules determine how pool.num_states changes as operations are applied.

Multiply (Cartesian product)

Chaining an operation pairs every input sequence with every internal state of the operation, producing all combinations. The resulting num_states is the input pool’s num_states multiplied by operation.num_states.

seqs    = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")  # seqs (pool): 3 states
mutants = pp.mutagenize(seqs, num_mutations=1, mode="sequential")  # mutagenize (op): 9 internal states
print(mutants.num_states)   # 27  (3 × 9)
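
The 3 × 9 count can be checked without poolparty. The sketch below is a plain-Python model of the arithmetic, not poolparty's implementation: it enumerates every single-base substitution of each input sequence and counts the results. The helper name single_mutants is hypothetical.

```python
from itertools import product

BASES = "ACGT"

def single_mutants(seq):
    """Return every sequence that differs from seq at exactly one position."""
    out = []
    for pos, alt in product(range(len(seq)), BASES):
        if alt != seq[pos]:  # skip the base that is already there
            out.append(seq[:pos] + alt + seq[pos + 1:])
    return out

inputs = ["AAA", "CCC", "GGG"]

# 3 positions x 3 alternative bases per input sequence
per_input = len(single_mutants(inputs[0]))

# Every input sequence is paired with every one of its mutations
library = [m for s in inputs for m in single_mutants(s)]

print(per_input)     # 9
print(len(library))  # 27
```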

Add (disjoint union)

stack (or the + operator) places its input pools side by side. Sequences from each branch appear in the output but are not combined with each other, so the resulting num_states is the sum of the inputs’ num_states.

a = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")          # a (pool): 3 states
b = pp.from_seqs(["TTT", "ATA", "GAG", "CTC"], mode="sequential")  # b (pool): 4 states
combined = pp.stack([a, b])
print(combined.num_states)  # 7  (3 + 4)
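
As a plain-Python analogy (an illustration of the counting rule only, not how stack is implemented), stacking behaves like concatenating two lists, whereas chaining behaves like a Cartesian product. The variable names below are hypothetical.

```python
from itertools import chain, product

a = ["AAA", "CCC", "GGG"]
b = ["TTT", "ATA", "GAG", "CTC"]

# Stacking: branches sit side by side and are never paired with each other
stacked = list(chain(a, b))

# For contrast: pairing every element of a with every element of b
paired = list(product(a, b))

print(len(stacked))  # 7   (3 + 4)
print(len(paired))   # 12  (3 × 4)
```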

No change (×1)

Fixed-mode operations transform each input sequence in exactly one deterministic way, so operation.num_states is 1 and the pool’s num_states is unchanged.

seqs    = pp.from_seqs(["ACG", "TGA", "CCC"], mode="sequential")  # seqs (pool): 3 states
flipped = pp.rc(seqs)                                              # rc (op): 1 internal state
print(flipped.num_states)   # 3  (3 × 1)

Per-category behaviour

Category	Effect	Operation(s)
Source	sets initial size	`from_seq`, `from_seqs`, `from_fasta`, `from_iupac`, `from_motif`, `get_kmers`, `get_barcodes`
Mutagenesis	multiplies	`mutagenize`, `shuffle_seq`, `recombine`, `flip`
Scanning	multiplies	`deletion_scan`, `insertion_scan`, `replacement_scan`, `shuffle_scan`, `mutagenize_scan`, `subseq_scan`, and multi-window variants
Regions	multiplies	`replace_region`, `region_scan`, `region_multiscan`
Regions	unchanged	`insert_tags`, `remove_tags`, `annotate_region`, `apply_at_region`, `extract_region`
ORF	multiplies	`mutagenize_orf`, `reverse_translate`
ORF	unchanged	`translate`, `annotate_orf`, `stylize_orf`
Composition	multiplies	`join`
Composition	adds	`stack`
State	multiplies	`repeat`
State	reduces	`sample`, `filter`, `slice_states`
State	unchanged	`shuffle_states`, `sync`, `score`, `materialize`
Utilities	unchanged	`rc`, `upper`, `lower`, `swapcase`, `stylize`, `clear_gaps`, `clear_annotation`, `slice_seq`, `add_prefix`

Synchronisation

Without sync, joining two independent 3-state pools produces 3 × 3 = 9 states (Cartesian product). After syncing, the pools iterate in lockstep, producing only 3 paired states. See sync for details.

left  = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")

# Without sync: 3 × 3 = 9
print(pp.join([left, right]).num_states)   # 9

# With sync: fresh pools, paired in lockstep (3 states)
left  = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")

pp.sync([left, right])
paired = pp.join([left, right])
print(paired.num_states)                   # 3
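
The difference between the two cases mirrors the difference between itertools.product and zip in plain Python. The sketch below models only the state counts; it assumes that synced pools are paired index by index and that joining concatenates the sequences directly, neither of which is confirmed here for poolparty itself.

```python
from itertools import product

left = ["AAA", "CCC", "GGG"]
right = ["TTT", "AAA", "CCC"]

# Independent pools: every left sequence meets every right sequence
unsynced = [l + r for l, r in product(left, right)]

# Lockstep iteration: the i-th left sequence meets only the i-th right sequence
synced = [l + r for l, r in zip(left, right)]

print(len(unsynced))  # 9
print(synced)         # ['AAATTT', 'CCCAAA', 'GGGCCC']
```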

Worked example

The following pipeline combines chaining (multiply), stack (add), and a final chained step (multiply again):

wt = pp.from_seq("ACGT<cre>ATCG</cre>TTTT<bc/>GGGG")        # wt.num_states: 1

# Branch 1: sequential mutagenesis
mutants = wt.mutagenize(region="cre", num_mutations=1,
                        mode="sequential")
# mutagenize (op): 4 positions × 3 alt bases = 12 internal states
# mutants.num_states: 1 × 12 = 12

# Branch 2: deletion scan
dels = wt.deletion_scan(region="cre", deletion_length=2,
                        mode="sequential")
# deletion_scan (op): 4 − 2 + 1 = 3 window positions = 3 internal states
# dels.num_states: 1 × 3 = 3

# Stack branches (addition)
combined = pp.stack([mutants, dels])
# combined.num_states: 12 + 3 = 15

# Add barcodes (Cartesian product)
barcoded = combined.insert_kmers(region="bc", length=2,
                                 mode="sequential")
# insert_kmers (op): 4² = 16 internal states
# barcoded.num_states: 15 × 16 = 240

print(barcoded.num_states)  # 240
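
The same total can be reproduced by hand. The snippet below is only an arithmetic check in plain Python; its variable names are hypothetical and it does not call poolparty.

```python
region = "ATCG"          # contents of the <cre> region
deletion_length = 2
barcode_length = 2

n_mutants = len(region) * 3                     # 4 positions × 3 alt bases
n_dels = len(region) - deletion_length + 1      # windows that fit in the region
n_barcodes = 4 ** barcode_length                # all 2-mers over A, C, G, T

# Branches are added, then the barcode step multiplies the sum
total = (n_mutants + n_dels) * n_barcodes

print(n_mutants, n_dels, n_barcodes)  # 12 3 16
print(total)                          # 240
```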

Practical tips

Use the num_states parameter to cap large sequential enumerations before they multiply downstream.
Use sync to pair pools that should iterate together instead of forming a Cartesian product.
Use sample or slice_states to reduce an oversized library after construction.
In random mode without the num_states parameter, each sequence gets a fresh random draw (×1, no multiplication). With num_states=N, the operation contributes N randomly chosen designs that multiply with the input pool (×N).
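
To see why capping sequential enumerations matters, the sketch below counts the sequences that carry exactly k substitutions in a region of length L as C(L, k) × 3^k. The formula is standard combinatorics and agrees with the single-mutation counts used on this page (k = 1 gives L × 3); whether poolparty enumerates multi-mutation states in exactly this way is an assumption, and the region length of 50 is an arbitrary illustration. The helper name substitution_count is hypothetical.

```python
from math import comb

def substitution_count(length, k):
    """Number of sequences with exactly k base substitutions in a region."""
    return comb(length, k) * 3 ** k

# Growth of a sequential enumeration over a 50-base region
for k in (1, 2, 3):
    print(k, substitution_count(50, k))
# 1 150
# 2 11025
# 3 529200
```

Any downstream chained operation multiplies these numbers again, so a cap applied early keeps the final library size manageable.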