generate_library
Evaluate a pool pipeline and return the resulting sequences as a
pandas.DataFrame with name and seq columns (or a plain list
when seqs_only=True). This is a terminal operation: it triggers all
upstream computation and produces concrete output. When an upstream
operation can be randomized (for example mutagenize(..., mode="random")),
set mode explicitly so that the draws match what the example intends.
import poolparty as pp
pp.init()
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | Pool to evaluate. |
| | | | Number of complete cycles through the state space. Each cycle visits every state exactly once. |
| num_seqs | | | Exact number of sequences to generate. Overrides |
| seed | | | Random seed for reproducible output (see examples). |
| | | | Starting state index. |
| | | | If |
| | | | If |
| | | | Maximum iterations before stopping (useful with filters that reject most draws). |
| | | | If the acceptance rate drops below this threshold, generation stops early. |
| | | | Number of draws between acceptance-rate checks. |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see generate_library() in the
API Reference.
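The last three rows of the table describe stopping controls for filtered pools. The sketch below is a plain-Python illustration of how such controls can interact in a rejection-sampling loop. It is not poolparty's implementation, and the names sample_with_filter, max_iterations, min_acceptance_rate and attempts_per_check are hypothetical, chosen only to mirror the descriptions above.

```python
import random


def sample_with_filter(draw, accept, num_seqs, max_iterations=10_000,
                       min_acceptance_rate=0.01, attempts_per_check=100):
    """Collect up to num_seqs accepted draws (all parameter names are hypothetical)."""
    accepted = []
    attempts = 0
    while len(accepted) < num_seqs and attempts < max_iterations:
        seq = draw()
        attempts += 1
        if accept(seq):
            accepted.append(seq)
        # Every attempts_per_check draws, compare the running acceptance
        # rate with the threshold and stop early if it is too low.
        if attempts % attempts_per_check == 0:
            if len(accepted) / attempts < min_acceptance_rate:
                break
    return accepted, attempts


rng = random.Random(0)


def draw():
    """Draw a random 8-mer; stands in for one evaluation of a pool."""
    return "".join(rng.choice("ACGT") for _ in range(8))


# A filter that can never pass: the loop stops at the first rate check
# instead of running all the way to max_iterations.
rejected, tries = sample_with_filter(draw, lambda s: "N" in s, num_seqs=5)

# A filter that always passes: the loop stops as soon as num_seqs is reached.
kept, kept_tries = sample_with_filter(draw, lambda s: True, num_seqs=5)

print(len(rejected), tries)
print(len(kept), kept_tries)
```

In this sketch the hard iteration cap is the backstop, while the acceptance-rate check gives a much earlier exit when a filter rejects nearly everything.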
Examples
Basic usage: generate sequences from a mutagenized pool
Build a mutagenized pool and call generate_library to collect the output
into a DataFrame.
wt = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
df = pp.generate_library(pool, num_seqs=5)
print(df.to_string())
| | name | seq |
|---|---|---|
| 0 | None | ATCGGTCG |
| 1 | None | ATCGAACG |
| 2 | None | ATCGCTCG |
| 3 | None | GTCGATCG |
| 4 | None | ACCGATCG |
Controlling output size with num_seqs
Pass num_seqs= to generate an exact number of sequences regardless of
the pool’s state-space size.
wt = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
df = pp.generate_library(pool, num_seqs=3)
print(len(df))
print(df.to_string())
3
| | name | seq |
|---|---|---|
| 0 | None | ATCGGTCG |
| 1 | None | ATCGAACG |
| 2 | None | ATCGCTCG |
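To make "state-space size" concrete, the following plain-Python sketch enumerates every single-base substitution of the same 8-mer and then takes an exact number of sequences from it. It is an illustration only: single_mutants is a hypothetical helper, and wrapping around to the first state once the space is exhausted is an assumption of this sketch, not documented poolparty behaviour.

```python
from itertools import cycle, islice


def single_mutants(wt, alphabet="ACGT"):
    """Yield every single-base substitution of wt, one per state."""
    for pos, ref in enumerate(wt):
        for alt in alphabet:
            if alt != ref:
                yield wt[:pos] + alt + wt[pos + 1:]


wt = "ATCGATCG"

# One complete cycle: 8 positions x 3 alternative bases.
states = list(single_mutants(wt))

# An exact count smaller than the state space truncates the cycle.
first_three = list(islice(cycle(states), 3))

# An exact count larger than the state space wraps around (assumed here).
thirty = list(islice(cycle(states), 30))

print(len(states), len(first_three), len(thirty))
```

The state space of this pool has 24 members, so asking for 3 sequences visits only part of it and asking for 30 must revisit some states.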
Reproducible output with seed
Pass seed= to fix the per-row draws for a given pool. Within one session,
the same seed and the same pool object yield the same rows. Operation IDs
enter the internal seed sequence, so to match a fresh interpreter run, call
pp.init() and rebuild the pipeline before calling generate_library with the
same seed.
wt = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
df = pp.generate_library(pool, num_seqs=3, seed=42)
print(df.to_string())
| | name | seq |
|---|---|---|
| 0 | None | ATCGAACG |
| 1 | None | ACCGATCG |
| 2 | None | ATCGATCT |
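The role of operation IDs in seeding can be illustrated with a small stand-alone sketch. It assumes, purely for illustration, that each row's generator is derived by hashing the seed, an operation ID and the row index. The functions row_rng, mutate_once and library are hypothetical, and poolparty's actual derivation may differ.

```python
import hashlib
import random


def row_rng(seed, op_id, row):
    """Derive a generator from (seed, operation ID, row index). Hypothetical scheme."""
    digest = hashlib.sha256(f"{seed}:{op_id}:{row}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def mutate_once(wt, rng):
    """Substitute one randomly chosen position with a different base."""
    pos = rng.randrange(len(wt))
    alt = rng.choice([base for base in "ACGT" if base != wt[pos]])
    return wt[:pos] + alt + wt[pos + 1:]


def library(wt, seed, op_id, n):
    """Build n rows, each drawn from its own derived generator."""
    return [mutate_once(wt, row_rng(seed, op_id, row)) for row in range(n)]


first = library("ATCGATCG", seed=42, op_id=0, n=3)
again = library("ATCGATCG", seed=42, op_id=0, n=3)    # same seed, same ID
shifted = library("ATCGATCG", seed=42, op_id=1, n=3)  # same seed, different ID

print(first == again)
print(first, shifted)
```

Under this scheme the same seed reproduces the same rows only when the operation ID is also the same, which is why resetting IDs before rebuilding a pipeline matters for matching an earlier run.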
Get a plain list with seqs_only=True
When only the sequence strings are needed (e.g. to pass directly to another
function), set seqs_only=True to skip DataFrame construction.
wt = pp.from_seq("ATCGATCG")
pool = pp.mutagenize(wt, num_mutations=1, mode="random")
seqs = pp.generate_library(pool, num_seqs=4, seed=7, seqs_only=True)
print(seqs)
['ATCGATAG', 'GTCGATCG', 'ATCGAGCG', 'ATCGCTCG']
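As a small example of passing the list straight to another function, the sketch below computes the GC fraction of each sequence. The list is copied literally from the output above so that the snippet runs without poolparty, and gc_fraction is a hypothetical helper written for this illustration.

```python
def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


# Copied from the printed output above.
seqs = ['ATCGATAG', 'GTCGATCG', 'ATCGAGCG', 'ATCGCTCG']

gc = {seq: gc_fraction(seq) for seq in seqs}
for seq, frac in gc.items():
    print(seq, frac)
```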
Chain a full pipeline: mutagenize → filter → generate_library
Compose multiple operations and materialise the result in a single call.
wt = pp.from_seq("ATCGATCG")
mutants = pp.mutagenize(wt, num_mutations=1, mode="random")
singles = pp.filter(
    mutants,
    lambda s: sum(a != b for a, b in zip(s, "ATCGATCG")) == 1,
)
df = pp.generate_library(singles, num_seqs=5, seed=0, discard_null_seqs=True)
print(df.to_string())
| | name | seq |
|---|---|---|
| 0 | None | ATCGGTCG |
| 1 | None | ATCGAACG |
| 2 | None | ATCGCTCG |
| 3 | None | GTCGATCG |
| 4 | None | ACCGATCG |
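The predicate in this pipeline is a Hamming-distance test against the wild type. A stand-alone check of the rows printed above, with hamming as a hypothetical helper that also guards against unequal lengths (which a bare zip would silently truncate), might look like this:

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


wt = "ATCGATCG"

# Copied from the printed output above.
printed = ["ATCGGTCG", "ATCGAACG", "ATCGCTCG", "GTCGATCG", "ACCGATCG"]

distances = [hamming(seq, wt) for seq in printed]
print(distances)
```

Every printed row differs from the wild type at exactly one position, which is what the filter is meant to guarantee.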
See generate_library().