generate_library

Evaluate a pool pipeline and return the resulting sequences as a pandas.DataFrame with name and seq columns (or a plain list[str] when seqs_only=True). This is a terminal operation: it triggers all upstream computation and produces concrete output. When an upstream operation can be randomized (for example mutagenize(..., mode="random")), set its mode explicitly so that the draws match the intent of the example.

import poolparty as pp
pp.init()

Parameters

Parameter                     Type                          Default     Description
----------------------------  ----------------------------  ----------  -----------
pool                          Pool | DnaPool | ProteinPool  (required)  Pool to evaluate.
num_cycles                    int                           1           Number of complete cycles through the state space. Each cycle visits every state exactly once.
num_seqs                      int | None                    None        Exact number of sequences to generate. Overrides num_cycles when provided.
seed                          int | None                    None        Random seed for reproducible output (see examples).
init_state                    int | None                    None        Starting state index. None begins from state 0.
seqs_only                     bool                          False       If True, return a plain list[str] instead of a DataFrame.
discard_null_seqs             bool                          False       If True, skip sequences that were filtered out (NullSeq).
max_iterations                int | None                    None        Maximum iterations before stopping (useful with filters that reject most draws).
min_acceptance_rate           float | None                  None        If the acceptance rate drops below this threshold, generation stops early.
attempts_per_rate_assessment  int                           100         Number of draws between acceptance-rate checks.


Note

Only the most commonly used parameters are shown above. For the full parameter list, see generate_library() in the API Reference.
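
The early-stopping parameters (max_iterations, min_acceptance_rate, attempts_per_rate_assessment) are easier to reason about in code. The following is a toy rejection loop in plain Python, not poolparty's implementation: the function name draw_with_limits, the cumulative way the acceptance rate is computed, and the choice to return a partial result are all assumptions made for illustration.

```python
import random


def draw_with_limits(draw, accept, num_seqs, max_iterations=None,
                     min_acceptance_rate=None,
                     attempts_per_rate_assessment=100):
    """Toy rejection loop (hypothetical, not poolparty's internals)."""
    accepted = []
    attempts = 0
    while len(accepted) < num_seqs:
        # Hard cap on the total number of draws.
        if max_iterations is not None and attempts >= max_iterations:
            break
        candidate = draw()
        attempts += 1
        if accept(candidate):
            accepted.append(candidate)
        # Assumption: the rate is cumulative and is only checked every
        # `attempts_per_rate_assessment` draws.
        if (min_acceptance_rate is not None
                and attempts % attempts_per_rate_assessment == 0
                and len(accepted) / attempts < min_acceptance_rate):
            break
    return accepted, attempts


rng = random.Random(0)


def random_8mer():
    """Draw a random 8-base DNA string."""
    return "".join(rng.choice("ACGT") for _ in range(8))


# A filter that rejects every draw: the loop stops at the first rate check
# instead of running forever.
seqs, attempts = draw_with_limits(
    random_8mer, lambda s: False, num_seqs=5, min_acceptance_rate=0.01)
print(len(seqs), attempts)  # prints: 0 100
```

In this model, max_iterations bounds the worst case directly, while min_acceptance_rate gives up sooner when a filter is clearly too strict.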

Examples

Basic usage: generate sequences from a mutagenized pool

Build a mutagenized pool and call generate_library to collect the output into a DataFrame.

wt   = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
df   = pp.generate_library(pool, num_seqs=5)
print(df.to_string())
Output (5 rows):
   name       seq
0  None  ATCGGTCG
1  None  ATCGAACG
2  None  ATCGCTCG
3  None  GTCGATCG
4  None  ACCGATCG

Controlling output size with num_seqs

Pass num_seqs= to generate an exact number of sequences regardless of the pool’s state-space size.

wt   = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
df   = pp.generate_library(pool, num_seqs=3)
print(len(df))
print(df.to_string())
Output:
3
   name       seq
0  None  ATCGGTCG
1  None  ATCGAACG
2  None  ATCGCTCG

Reproducible output with seed

Pass seed= to fix the per-row draws for a given pool. Within one session, the same seed and the same pool object yield the same rows. Operation IDs feed into the internal seed sequence, so to reproduce the rows of a fresh interpreter run, call pp.init() first, then rebuild the pipeline and call generate_library with the same seed.

wt   = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
df   = pp.generate_library(pool, num_seqs=3, seed=42)
print(df.to_string())
Output (seed=42):
   name       seq
0  None  ATCGAACG
1  None  ACCGATCG
2  None  ATCGATCT
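
Why operation IDs matter for reproducibility can be illustrated with a toy model. This is a sketch in plain Python under loud assumptions: op_rng, draw_rows, and the way the seed and operation ID are combined are hypothetical, and poolparty's actual seed derivation may differ.

```python
import random


def op_rng(seed, op_id):
    """Hypothetical: derive a per-operation RNG from a user seed and an op ID."""
    # random.Random hashes a str seed deterministically, so the same
    # (seed, op_id) pair gives the same stream in every interpreter run.
    return random.Random(f"{seed}:{op_id}")


def draw_rows(wt, seed, op_id, n):
    """Draw n single-substitution variants of wt (toy stand-in for a pool)."""
    rng = op_rng(seed, op_id)
    rows = []
    for _ in range(n):
        pos = rng.randrange(len(wt))
        new_base = rng.choice([b for b in "ACGT" if b != wt[pos]])
        rows.append(wt[:pos] + new_base + wt[pos + 1:])
    return rows


wt = "ATCGATCG"
same_a = draw_rows(wt, seed=42, op_id=0, n=3)
same_b = draw_rows(wt, seed=42, op_id=0, n=3)
other = draw_rows(wt, seed=42, op_id=1, n=3)

print(same_a == same_b)  # True: same seed and same operation ID
print(same_a, other)     # a different operation ID usually changes the draws
```

Under this model, a pipeline rebuilt in the same session would receive new operation IDs and could therefore produce different rows for the same seed, which is consistent with the advice to call pp.init() before rebuilding.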

Get a plain list with seqs_only=True

When only the sequence strings are needed (e.g. to pass directly to another function), set seqs_only=True to skip DataFrame construction.

wt   = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
seqs = pp.generate_library(pool, num_seqs=4, seed=7, seqs_only=True)
print(seqs)
Output (seed=7, seqs_only=True):
['ATCGATAG', 'GTCGATCG', 'ATCGAGCG', 'ATCGCTCG']
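
As an illustration of handing the plain list to another function, the sketch below computes GC content for the four sequences printed above. gc_content is a hypothetical helper written for this example, not part of poolparty, and the list is copied from the output rather than regenerated.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


# Copied from the seqs_only=True output above.
seqs = ['ATCGATAG', 'GTCGATCG', 'ATCGAGCG', 'ATCGCTCG']
for seq in seqs:
    print(seq, gc_content(seq))
# ATCGATAG 0.375
# GTCGATCG 0.625
# ATCGAGCG 0.625
# ATCGCTCG 0.625
```

Because the return value is an ordinary list[str], no DataFrame column access is needed before passing it on.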

Chain a full pipeline: mutagenize → filter → generate_library

Compose multiple operations and materialise the result in a single call.

wt      = pp.from_seq("ATCGATCG")
mutants = pp.mutagenize(wt, num_mutations=1, mode="random")
# Keep only sequences that differ from the wild type at exactly one position.
singles = pp.filter(
    mutants,
    lambda s: sum(a != b for a, b in zip(s, "ATCGATCG")) == 1,
)
df      = pp.generate_library(singles, num_seqs=5, seed=0, discard_null_seqs=True)
print(df.to_string())
Output (seed=0):
   name       seq
0  None  ATCGGTCG
1  None  ATCGAACG
2  None  ATCGCTCG
3  None  GTCGATCG
4  None  ACCGATCG
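
The filter predicate above counts mismatches against the wild type. A standalone version makes that check easy to run outside the pipeline; hamming is a hypothetical helper written for this example, and the sequences are copied from the output above.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


WT = "ATCGATCG"
rows = ["ATCGGTCG", "ATCGAACG", "ATCGCTCG", "GTCGATCG", "ACCGATCG"]
print([hamming(seq, WT) for seq in rows])  # [1, 1, 1, 1, 1]
```

Unlike the inline lambda, which silently truncates to the shorter string via zip, this helper raises on a length mismatch.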

See also generate_library() in the API Reference.