Library Size

Every pool has a num_states property: the number of distinct sequences it can produce. Each operation has an internal state (see Operation Modes) whose count is determined by its mode and parameters. How num_states composes when operations are combined depends on the operation type. Three rules cover composition: multiplication, addition, and no change. A few State operations (sample, filter, slice_states) reduce the count instead; they are listed under Per-category behaviour.

import poolparty as pp
pp.init()

Composition rules

Multiply (Cartesian product)

Chaining an operation pairs every input sequence with every internal state of the operation, producing all combinations. The resulting num_states is the input pool’s num_states multiplied by operation.num_states.

seqs    = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")  # seqs (pool): 3 states
mutants = pp.mutagenize(seqs, num_mutations=1, mode="sequential")  # mutagenize (op): 9 internal states
print(mutants.num_states)   # 27  (3 × 9)
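
The 3 × 9 count can be checked by hand. The sketch below is plain Python and does not call poolparty; the helper single_mutants is a hypothetical name introduced here, and it assumes a single mutation means substituting one position with one of the three other bases.

```python
ALPHABET = "ACGT"

def single_mutants(seq):
    """Return every sequence that differs from seq at exactly one position."""
    return [
        seq[:i] + base + seq[i + 1:]
        for i in range(len(seq))
        for base in ALPHABET
        if base != seq[i]
    ]

inputs = ["AAA", "CCC", "GGG"]

# Each 3-mer has 3 positions x 3 alternative bases = 9 single mutants.
per_input = len(single_mutants(inputs[0]))

# Pairing every input with every mutation gives the Cartesian product.
library = [mutant for seq in inputs for mutant in single_mutants(seq)]

print(per_input, len(library))  # 9 27
```

For these inputs all 27 sequences happen to be distinct, so the enumerated count matches the product of the two state counts.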

Add (disjoint union)

stack (or the + operator) places its input pools side by side. Sequences from each branch appear in the output but are not combined with each other, so the resulting num_states is the sum of the inputs’ num_states.

a = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")          # a (pool): 3 states
b = pp.from_seqs(["TTT", "ATA", "GAG", "CTC"], mode="sequential")  # b (pool): 4 states
combined = pp.stack([a, b])
print(combined.num_states)  # 7  (3 + 4)
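
To see why stacking adds rather than multiplies, the plain-Python sketch below (no poolparty calls) contrasts a union of two lists with their Cartesian product. Modelling pools as lists is an assumption made only for illustration.

```python
a = ["AAA", "CCC", "GGG"]
b = ["TTT", "ATA", "GAG", "CTC"]

# Stack-like behaviour: each branch contributes its own sequences unchanged.
stacked = a + b

# Contrast: combining every member of a with every member of b.
crossed = [x + y for x in a for y in b]

print(len(stacked), len(crossed))  # 7 12
```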

Unchanged (×1)

Fixed-mode operations transform each input sequence in exactly one deterministic way. Their operation.num_states is 1, so the pool’s num_states is multiplied by 1 and stays the same.

seqs    = pp.from_seqs(["ACG", "TGA", "CCC"], mode="sequential")  # seqs (pool): 3 states
flipped = pp.rc(seqs)                                              # rc (op): 1 internal state
print(flipped.num_states)   # 3  (3 × 1)

Worked example

The following pipeline combines chaining (multiply), stack (add), and a final chained step (multiply again):

wt = pp.from_seq("ACGT<cre>ATCG</cre>TTTT<bc/>GGGG")        # wt.num_states: 1

# Branch 1: sequential mutagenesis
mutants = wt.mutagenize(region="cre", num_mutations=1,
                        mode="sequential")
# mutagenize (op): 4 positions × 3 alt bases = 12 internal states
# mutants.num_states: 1 × 12 = 12

# Branch 2: deletion scan
dels = wt.deletion_scan(region="cre", deletion_length=2,
                        mode="sequential")
# deletion_scan (op): 3 window positions = 3 internal states
# dels.num_states: 1 × 3 = 3

# Stack branches (addition)
combined = pp.stack([mutants, dels])
# combined.num_states: 12 + 3 = 15

# Add barcodes (Cartesian product)
barcoded = combined.insert_kmers(region="bc", length=2,
                                 mode="sequential")
# insert_kmers (op): 4² = 16 internal states
# barcoded.num_states: 15 × 16 = 240

print(barcoded.num_states)  # 240
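
The arithmetic in the comments above can be reproduced without building any pools. This is a plain-Python sketch of the counting only; the variable names are illustrative and the code does not use poolparty.

```python
ALPHABET = "ACGT"
cre = "ATCG"          # contents of the <cre> region
deletion_length = 2
barcode_length = 2

# Branch 1: one substitution at any of 4 positions, 3 alternative bases each.
n_mutants = len(cre) * (len(ALPHABET) - 1)

# Branch 2: number of start positions for a 2-base window inside the region.
n_deletions = len(cre) - deletion_length + 1

# Stacking adds the branches; the barcode step multiplies the result.
n_barcodes = len(ALPHABET) ** barcode_length
total = (n_mutants + n_deletions) * n_barcodes

print(n_mutants, n_deletions, n_barcodes, total)  # 12 3 16 240
```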

Synchronisation

Without sync, joining two independent 3-state pools produces 3 × 3 = 9 states (Cartesian product). After syncing, the pools iterate in lockstep, producing only 3 paired states. See sync for details.

left  = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")

# Without sync: 3 × 3 = 9
print(pp.join([left, right]).num_states)   # 9

# Start a fresh context to demonstrate the synced case
pp.init()

left  = pp.from_seqs(["AAA", "CCC", "GGG"], mode="sequential")
right = pp.from_seqs(["TTT", "AAA", "CCC"], mode="sequential")

pp.sync([left, right])
paired = pp.join([left, right])
print(paired.num_states)                   # 3
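
The difference between the two cases is the same as the difference between itertools.product and zip. The sketch below is plain Python, not poolparty, and assumes that joining two pools concatenates their sequences.

```python
from itertools import product

left = ["AAA", "CCC", "GGG"]
right = ["TTT", "AAA", "CCC"]

# Unsynced: every left sequence meets every right sequence.
unsynced = [l + r for l, r in product(left, right)]

# Synced: the two lists advance in lockstep, pairing item i with item i.
synced = [l + r for l, r in zip(left, right)]

print(len(unsynced), len(synced))  # 9 3
print(synced)  # ['AAATTT', 'CCCAAA', 'GGGCCC']
```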

Per-category behaviour

Category     Effect                   Operation(s)
-----------  -----------------------  ------------
Source       sets initial size        from_seq, from_seqs, from_fasta, from_iupac, from_motif, get_kmers, get_barcodes
Mutagenesis  multiplies               mutagenize, shuffle_seq, recombine, flip
Scanning     multiplies               deletion_scan, insertion_scan, replacement_scan, shuffle_scan, mutagenize_scan, subseq_scan, and multi-window variants
Regions      multiplies               region_scan, region_multiscan
Regions      multiplies or unchanged  replace_region: multiplies with sync=False (Cartesian product); unchanged with sync=True (default, 1:1 pairing)
Regions      unchanged                insert_tags, remove_tags, annotate_region, apply_at_region, extract_region
ORF          multiplies               mutagenize_orf, reverse_translate
ORF          unchanged                translate, annotate_orf, stylize_orf
Composition  multiplies               join
Composition  adds                     stack
State        multiplies               repeat
State        reduces                  sample, filter, slice_states
State        unchanged                shuffle_states, sync, score, materialize
Utilities    unchanged                rc, upper, lower, swapcase, stylize, clear_gaps, clear_annotation, slice_seq, add_prefix


Practical tips

  • Use the num_states parameter to cap large sequential enumerations before they multiply downstream.

  • Use sync to pair pools that should iterate together instead of forming a Cartesian product.

  • Use sample or slice_states to reduce an oversized library after construction.

  • In random mode, passing num_states=N draws N fixed random designs that multiply with the input pool. Without num_states, each sequence gets a fresh draw and pool.num_states is unchanged.
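
Before building a large pipeline, it can help to estimate how the size grows stage by stage. The sketch below is plain Python with a hypothetical helper, running_sizes. It assumes every stage multiplies, and that capping a stage with num_states behaves like taking the minimum of the cap and the stage's full state count; that is a simplification, not a statement about poolparty's implementation.

```python
def running_sizes(stage_counts):
    """Return the cumulative product of per-stage state counts."""
    sizes = []
    total = 1
    for count in stage_counts:
        total *= count
        sizes.append(total)
    return sizes

# Stages: 3 source sequences, 9 single mutants, 16 two-base barcodes.
uncapped = running_sizes([3, 9, 16])

# Same pipeline with the mutagenesis stage capped at 4 states (assumed min()).
capped = running_sizes([3, min(9, 4), 16])

print(uncapped)  # [3, 27, 432]
print(capped)    # [3, 12, 192]
```

Capping an early stage shrinks every later product, which is why the cap is most effective when applied before further multiplying steps.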