Library Size
Every pool has a num_states property: the number of distinct sequences it
can produce. Each operation has a number of internal states (see Operation Modes)
that is determined by its mode and parameters. When operations are chained, the
output pool’s num_states is the product of the internal state counts along the
chain. Other composition patterns (stacking, synchronisation) follow different
rules, described below.
import poolparty as pp
pp.init()
Composition rules
Three rules determine how pool.num_states changes as operations are
applied.
Multiply (Cartesian product)
Chaining an operation pairs every input sequence with every internal state of
the operation, producing all combinations. The resulting num_states is
the input pool’s num_states multiplied by operation.num_states.
seqs = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential") # seqs (pool): 3 states
mutants = pp.mutagenize(seqs, num_mutations=1, mode="sequential") # mutagenize (op): 9 internal states
print(mutants.num_states) # 27 (3 × 9)
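To see where the 9 and the 27 come from, the states can be enumerated by hand. The following is a plain-Python sketch that does not use poolparty; `single_mutants` is a hypothetical helper, and it assumes a four-letter DNA alphabet in which each state replaces one position with one of the three alternative bases.

```python
BASES = "ACGT"

def single_mutants(seq):
    """Return every sequence that differs from seq at exactly one position."""
    out = []
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                out.append(seq[:i] + alt + seq[i + 1:])
    return out

seqs = ["AAA", "CCC", "GGG"]

# Pair every input sequence with every single-mutation state.
library = [m for s in seqs for m in single_mutants(s)]

print(len(single_mutants("AAA")))  # 9  (3 positions × 3 alternative bases)
print(len(library))                # 27 (3 inputs × 9 states)
```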
Add (disjoint union)
stack (or the + operator) places its input pools side by side. Sequences
from each branch appear in the output but are not combined with each other,
so the resulting num_states is the sum of the inputs’ num_states.
a = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential") # a (pool): 3 states
b = pp.from_seqs(["TTT", "ATA", "GAG", "CTC"], mode="sequential") # b (pool): 4 states
combined = pp.stack([a, b])
print(combined.num_states) # 7 (3 + 4)
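The difference between adding and multiplying can be illustrated with ordinary lists. This is only an analogy in plain Python, not poolparty code, and the output order shown here is illustrative rather than a statement about how stack orders its branches.

```python
a = ["AAA", "CCC", "GGG"]
b = ["TTT", "ATA", "GAG", "CTC"]

# Disjoint union: each sequence from either branch appears once, uncombined.
stacked = a + b

# For contrast, a Cartesian product pairs each element of a with each of b.
paired = [(x, y) for x in a for y in b]

print(len(stacked))  # 7  (3 + 4)
print(len(paired))   # 12 (3 × 4)
```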
No change (×1)
Fixed-mode operations transform each input sequence in exactly one
deterministic way, so operation.num_states is 1 and the number of sequences
stays the same.
seqs = pp.from_seqs(["ACG", "TGA", "CCC"], mode="sequential") # seqs (pool): 3 states
flipped = pp.rc(seqs) # rc (op): 1 internal state
print(flipped.num_states) # 3 (3 × 1)
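A one-to-one transformation of this kind can be sketched in plain Python. The sketch assumes that rc stands for reverse complement, which the source does not state explicitly; `reverse_complement` is a hypothetical stand-in and not the library's implementation.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

seqs = ["ACG", "TGA", "CCC"]

# Exactly one output per input, so the count is unchanged.
flipped = [reverse_complement(s) for s in seqs]

print(flipped)       # ['CGT', 'TCA', 'GGG']
print(len(flipped))  # 3
```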
Per-category behaviour
| Category | Effect | Operation(s) |
|---|---|---|
| Source | sets initial size | |
| Mutagenesis | multiplies | |
| Scanning | multiplies | |
| Regions | multiplies | |
| Regions | unchanged | |
| ORF | multiplies | |
| ORF | unchanged | |
| Composition | multiplies | |
| Composition | adds | |
| State | multiplies | |
| State | reduces | |
| State | unchanged | |
| Utilities | unchanged | |
Synchronisation
Without sync, joining two independent 3-state pools produces
3 × 3 = 9 states (Cartesian product). After syncing, the pools iterate in
lockstep, producing only 3 paired states. See sync for details.
left = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")
# Without sync: 3 × 3 = 9
print(pp.join([left, right]).num_states) # 9
left = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")
pp.sync([left, right])
paired = pp.join([left, right])
print(paired.num_states) # 3
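The contrast between the two cases resembles the difference between `itertools.product` and `zip` in plain Python. The sketch below is an analogy only: it assumes that joining concatenates the two sequences, and it does not show how poolparty implements sync.

```python
from itertools import product

left = ["AAA", "CCC", "GGG"]
right = ["TTT", "AAA", "CCC"]

# Independent pools: every left state meets every right state.
unsynced = [l + r for l, r in product(left, right)]

# Lockstep iteration: the i-th left state is paired only with the i-th right state.
synced = [l + r for l, r in zip(left, right)]

print(len(unsynced))  # 9
print(synced)         # ['AAATTT', 'CCCAAA', 'GGGCCC']
```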
Worked example
A realistic pipeline that uses chaining (multiply), stack (add), and a final chained step (multiply again):
wt = pp.from_seq("ACGT<cre>ATCG</cre>TTTT<bc/>GGGG") # wt.num_states: 1
# Branch 1: sequential mutagenesis
mutants = wt.mutagenize(region="cre", num_mutations=1, mode="sequential")
# mutagenize (op): 4 positions × 3 alt bases = 12 internal states
# mutants.num_states: 1 × 12 = 12
# Branch 2: deletion scan
dels = wt.deletion_scan(region="cre", deletion_length=2, mode="sequential")
# deletion_scan (op): 3 window positions = 3 internal states
# dels.num_states: 1 × 3 = 3
# Stack branches (addition)
combined = pp.stack([mutants, dels])
# combined.num_states: 12 + 3 = 15
# Add barcodes (Cartesian product)
barcoded = combined.insert_kmers(region="bc", length=2, mode="sequential")
# insert_kmers (op): 4² = 16 internal states
# barcoded.num_states: 15 × 16 = 240
print(barcoded.num_states) # 240
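The same arithmetic can be checked without building any pools, which may help when estimating a library's size before constructing it. The helper functions below are hypothetical and are not part of poolparty. Their formulas are inferred from the comments in the example above (positions × 3 alternative bases, number of window positions, 4 to the power of the k-mer length) and assume a four-letter alphabet.

```python
def count_single_mutants(length, alphabet_size=4):
    """Single substitutions: each position can take any other base."""
    return length * (alphabet_size - 1)

def count_windows(length, window):
    """Number of start positions for a window that fits inside the region."""
    return length - window + 1

def count_kmers(k, alphabet_size=4):
    """Number of distinct k-mers over the alphabet."""
    return alphabet_size ** k

cre = "ATCG"  # contents of the cre region in the example

n_mutants = 1 * count_single_mutants(len(cre))  # chaining multiplies: 1 × 12
n_dels = 1 * count_windows(len(cre), 2)         # chaining multiplies: 1 × 3
n_combined = n_mutants + n_dels                 # stacking adds: 12 + 3
n_barcoded = n_combined * count_kmers(2)        # chaining multiplies: 15 × 16

print(n_mutants, n_dels, n_combined, n_barcoded)  # 12 3 15 240
```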
Practical tips
- Use the num_states parameter to cap large sequential enumerations before they multiply downstream.
- Use sync to pair pools that should iterate together instead of forming a Cartesian product.
- Use sample or slice_states to reduce an oversized library after construction.
- In random mode without the num_states parameter, each sequence gets a fresh random draw (×1, no multiplication). With num_states=N, the operation contributes N randomly chosen designs that multiply with the input pool (×N).
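The two random-mode behaviours in the last tip can be modelled in plain Python. This is a conceptual sketch, not poolparty code: `random_kmer` is a hypothetical helper, the appended 4-mer stands in for whatever a random operation would produce, and the N designs are drawn independently here, which may differ from how the library selects them.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pool = ["AAA", "CCC", "GGG"]

def random_kmer(k):
    """Draw a random DNA string of length k."""
    return "".join(rng.choice("ACGT") for _ in range(k))

# Without num_states: one fresh draw per input sequence (×1).
fresh = [s + random_kmer(4) for s in pool]

# With num_states=N: N designs are chosen once, then crossed with every input (×N).
N = 5
designs = [random_kmer(4) for _ in range(N)]
crossed = [s + d for s in pool for d in designs]

print(len(fresh))    # 3  (3 × 1)
print(len(crossed))  # 15 (3 × 5)
```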