Library Size
Every pool has a num_states property — the number of distinct sequences it
can produce. Each operation has an internal state (see Operation Modes) whose
count is determined by its mode and parameters. How num_states composes
when operations are combined depends on the operation type. Three rules cover
composition: multiplication, addition, and no change. These are described below;
sync and state-reducing operations, which lower the count, are treated separately.
import poolparty as pp
pp.init()
Composition rules
Multiply (Cartesian product)
Chaining an operation pairs every input sequence with every internal state of
the operation, producing all combinations. The resulting num_states is
the input pool’s num_states multiplied by operation.num_states.
seqs = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential") # seqs (pool): 3 states
mutants = pp.mutagenize(seqs, num_mutations=1, mode="sequential") # mutagenize (op): 9 internal states
print(mutants.num_states) # 27 (3 × 9)
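The same count can be checked without the library. The sketch below is plain Python, not poolparty's implementation; `single_mutants` is a hypothetical helper that enumerates every single-base substitution, assuming the alphabet is A, C, G, T.

```python
BASES = "ACGT"

def single_mutants(seq):
    """Return every sequence that differs from seq at exactly one position."""
    out = []
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                out.append(seq[:pos] + alt + seq[pos + 1:])
    return out

inputs = ["AAA", "CCC", "GGG"]           # 3 input states
per_input = len(single_mutants("AAA"))   # 3 positions x 3 alternative bases

# Pair every input with every (position, alternative base) choice.
library = [m for seq in inputs for m in single_mutants(seq)]
print(per_input, len(library))           # 9 27
```

Each input contributes its own 9 variants, so the total is the product of the two counts.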
Add (disjoint union)
stack (or the + operator) places its input pools side by side. Sequences
from each branch appear in the output but are not combined with each other,
so the resulting num_states is the sum of the inputs’ num_states.
a = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential") # a (pool): 3 states
b = pp.from_seqs(["TTT", "ATA", "GAG", "CTC"], mode="sequential") # b (pool): 4 states
combined = pp.stack([a, b])
print(combined.num_states) # 7 (3 + 4)
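To see why stacking adds rather than multiplies, here is a minimal plain-Python sketch that treats each pool as a list. It does not use poolparty, and the pairing shown for contrast is only an illustration of what stacking avoids.

```python
a = ["AAA", "CCC", "GGG"]
b = ["TTT", "ATA", "GAG", "CTC"]

# Stacking keeps each branch's sequences as they are; nothing is paired.
stacked = a + b

# Pairing every element of a with every element of b, for contrast only.
paired = [(x, y) for x in a for y in b]

print(len(stacked), len(paired))  # 7 12
```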
Unchanged (×1)
Fixed-mode operations transform each input sequence in exactly one
deterministic way, so operation.num_states is 1 and the pool’s
num_states is unchanged.
seqs = pp.from_seqs(["ACG", "TGA", "CCC"], mode="sequential") # seqs (pool): 3 states
flipped = pp.rc(seqs) # rc (op): 1 internal state
print(flipped.num_states) # 3 (3 × 1)
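A one-to-one transform cannot change the count. The following plain-Python sketch makes that concrete for reverse complementation; `reverse_complement` is a hypothetical stand-in written for this illustration, not the library's rc.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

seqs = ["ACG", "TGA", "CCC"]
flipped = [reverse_complement(s) for s in seqs]

print(flipped)                   # ['CGT', 'TCA', 'GGG']
print(len(seqs), len(flipped))   # 3 3
```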
Worked example
The following pipeline combines chaining (multiply), stack (add), and a final chained step (multiply again):
wt = pp.from_seq("ACGT<cre>ATCG</cre>TTTT<bc/>GGGG") # wt.num_states: 1
# Branch 1: sequential mutagenesis
mutants = wt.mutagenize(region="cre", num_mutations=1,
                        mode="sequential")
# mutagenize (op): 4 positions × 3 alt bases = 12 internal states
# mutants.num_states: 1 × 12 = 12
# Branch 2: deletion scan
dels = wt.deletion_scan(region="cre", deletion_length=2,
                        mode="sequential")
# deletion_scan (op): 3 window positions = 3 internal states
# dels.num_states: 1 × 3 = 3
# Stack branches (addition)
combined = pp.stack([mutants, dels])
# combined.num_states: 12 + 3 = 15
# Add barcodes (Cartesian product)
barcoded = combined.insert_kmers(region="bc", length=2,
                                 mode="sequential")
# insert_kmers (op): 4² = 16 internal states
# barcoded.num_states: 15 × 16 = 240
print(barcoded.num_states) # 240
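The arithmetic in the comments above can be cross-checked by brute-force enumeration. This is a plain-Python sketch that does not call poolparty. The constants `LEFT`, `CRE`, `MID`, and `RIGHT` are names introduced here for the segments of the wild-type string, and the barcode is assumed to sit where the `<bc/>` tag appears.

```python
from itertools import product

BASES = "ACGT"
LEFT, CRE, MID, RIGHT = "ACGT", "ATCG", "TTTT", "GGGG"

# Branch 1: every single-base substitution in the cre region.
mutants = [CRE[:i] + alt + CRE[i + 1:]
           for i, ref in enumerate(CRE) for alt in BASES if alt != ref]

# Branch 2: every 2-base deletion window in the cre region.
dels = [CRE[:i] + CRE[i + 2:] for i in range(len(CRE) - 2 + 1)]

# Stack: the branches sit side by side.
combined = mutants + dels

# Barcodes: every 2-mer, paired with every stacked variant.
barcodes = ["".join(k) for k in product(BASES, repeat=2)]
library = [LEFT + cre + MID + bc + RIGHT
           for cre in combined for bc in barcodes]

print(len(mutants), len(dels), len(combined), len(barcodes), len(library))
# 12 3 15 16 240
```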
Synchronisation
Without sync, joining two independent 3-state pools produces
3 × 3 = 9 states (Cartesian product). After syncing, the pools iterate in
lockstep, producing only 3 paired states. See sync for details.
left = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")
# Without sync: 3 × 3 = 9
print(pp.join([left, right]).num_states) # 9
# Start a fresh context to demonstrate the synced case
pp.init()
left = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")
pp.sync([left, right])
paired = pp.join([left, right])
print(paired.num_states) # 3
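The difference between the two cases mirrors `itertools.product` versus `zip`. The sketch below is plain Python and rests on one assumption: that synced pools pair state i of one pool with state i of the other. It illustrates the counting only, not how sync is implemented.

```python
from itertools import product

left = ["AAA", "CCC", "GGG"]
right = ["TTT", "AAA", "CCC"]

# Independent pools: every left state meets every right state.
independent = [l + r for l, r in product(left, right)]

# Lockstep pools: the i-th left state meets only the i-th right state.
lockstep = [l + r for l, r in zip(left, right)]

print(len(independent), len(lockstep))  # 9 3
print(lockstep)                         # ['AAATTT', 'CCCAAA', 'GGGCCC']
```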
Per-category behaviour
| Category | Effect | Operation(s) |
|---|---|---|
| Source | sets initial size | |
| Mutagenesis | multiplies | |
| Scanning | multiplies | |
| Regions | multiplies | |
| Regions | multiplies or unchanged | |
| Regions | unchanged | |
| ORF | multiplies | |
| ORF | unchanged | |
| Composition | multiplies | |
| Composition | adds | |
| State | multiplies | |
| State | reduces | |
| State | unchanged | |
| Utilities | unchanged | |
Practical tips
- Use the num_states parameter to cap large sequential enumerations before they multiply downstream.
- Use sync to pair pools that should iterate together instead of forming a Cartesian product.
- Use sample or slice_states to reduce an oversized library after construction.
- In random mode, passing num_states=N draws N fixed random designs that multiply with the input pool. Without num_states, each sequence gets a fresh draw and pool.num_states is unchanged.
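Before building a large pipeline, it can help to estimate its size on paper. The helpers below are hypothetical and not part of poolparty; they only encode the multiply and add rules. The capped case assumes that limiting an operation to N states leaves min(original, N) states.

```python
from math import prod

def chained(*sizes):
    """Chaining multiplies state counts."""
    return prod(sizes)

def stacked(*sizes):
    """Stacking adds state counts."""
    return sum(sizes)

# Sizes from the worked example: 12 mutants, 3 deletions, 16 barcodes.
full = chained(stacked(12, 3), 16)

# Assumption: capping the barcode step at 4 states leaves min(16, 4) = 4.
capped = chained(stacked(12, 3), min(16, 4))

print(full, capped)  # 240 60
```

Capping an early step pays off most, because every later chained step multiplies whatever count it receives.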