:orphan:

generate_library
================

Evaluate a pool pipeline and return the resulting sequences as a ``pandas.DataFrame`` with ``name`` and ``seq`` columns (or a plain ``list`` when ``seqs_only=True``).

This is a *terminal* operation: it triggers all upstream computation and produces concrete output. Randomized upstream operations (for example ``mutagenize(..., mode="random")``) should set ``mode`` explicitly so that the draws match the intent of each example.

The examples on this page assume the following setup:

.. code-block:: python

   import poolparty as pp

   pp.init()

----

Parameters
----------

.. list-table::
   :widths: 20 18 12 50
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``pool``
     - ``Pool | DnaPool | ProteinPool``
     - *(required)*
     - Pool to evaluate.
   * - ``num_cycles``
     - ``int``
     - ``1``
     - Number of complete cycles through the state space. Each cycle visits every state exactly once.
   * - ``num_seqs``
     - ``int | None``
     - ``None``
     - Exact number of sequences to generate. Overrides ``num_cycles`` when provided.
   * - ``seed``
     - ``int | None``
     - ``None``
     - Random seed for reproducible output (see the examples below).
   * - ``init_state``
     - ``int | None``
     - ``None``
     - Starting state index. ``None`` starts from state 0.
   * - ``seqs_only``
     - ``bool``
     - ``False``
     - If ``True``, return a plain ``list[str]`` instead of a DataFrame.
   * - ``discard_null_seqs``
     - ``bool``
     - ``False``
     - If ``True``, omit sequences that were filtered out (``NullSeq``).
   * - ``max_iterations``
     - ``int | None``
     - ``None``
     - Maximum number of iterations before stopping (useful with filters that reject most draws).
   * - ``min_acceptance_rate``
     - ``float | None``
     - ``None``
     - Stop generation early if the acceptance rate drops below this threshold.
   * - ``attempts_per_rate_assessment``
     - ``int``
     - ``100``
     - Number of draws between acceptance-rate checks.

----

.. note::

   Only the most commonly used parameters are shown above. For the full parameter list, see :func:`~poolparty.generate_library` in the :doc:`API Reference `.

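
The interplay of ``max_iterations``, ``min_acceptance_rate``, and ``attempts_per_rate_assessment`` can be pictured as a rejection-sampling loop. The sketch below is a hypothetical model, not poolparty's implementation: ``sample_until`` and its ``draw``/``accept`` callables are illustrative names, the acceptance rate is assumed to be cumulative (accepted draws divided by all draws so far), and rejected draws are simply dropped, roughly as with ``discard_null_seqs=True``.

.. code-block:: python

   from __future__ import annotations

   import random
   from typing import Callable, Optional


   def sample_until(
       draw: Callable[[random.Random], str],
       accept: Callable[[str], bool],
       num_seqs: int,
       seed: Optional[int] = None,
       max_iterations: Optional[int] = None,
       min_acceptance_rate: Optional[float] = None,
       attempts_per_rate_assessment: int = 100,
   ) -> tuple[list[str], int]:
       """Hypothetical model of filtered generation with early stopping.

       Returns the accepted sequences and the number of draws attempted.
       """
       rng = random.Random(seed)
       accepted: list[str] = []
       attempts = 0
       while len(accepted) < num_seqs:
           # Hard cap on total draws, regardless of how many were accepted.
           if max_iterations is not None and attempts >= max_iterations:
               break
           seq = draw(rng)
           attempts += 1
           if accept(seq):
               accepted.append(seq)
           # Every `attempts_per_rate_assessment` draws, check the cumulative
           # acceptance rate and give up if the filter rejects too much.
           if (
               min_acceptance_rate is not None
               and attempts % attempts_per_rate_assessment == 0
               and len(accepted) / attempts < min_acceptance_rate
           ):
               break
       return accepted, attempts

In this model, a filter that rejects every draw stops after 100 attempts when ``min_acceptance_rate=0.1``, and after exactly ``max_iterations`` attempts when only that cap is set, instead of looping forever.
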
Examples
--------

Basic usage: generate sequences from a mutagenized pool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Build a mutagenized pool and call ``generate_library`` to collect the output into a DataFrame.

.. code-block:: python

   wt = pp.from_seq("ATCGATCG")
   pool = pp.mutagenize(wt, num_mutations=1, mode="random")
   df = pp.generate_library(pool, num_seqs=5)
   print(df.to_string())

Output (5 rows):

.. code-block:: text

      name       seq
   0  None  ATCGGTCG
   1  None  ATCGAACG
   2  None  ATCGCTCG
   3  None  GTCGATCG
   4  None  ACCGATCG

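
Each row above differs from the wild type at a single position. A small pure-Python helper (hypothetical, not part of poolparty) can label each variant in the conventional 1-based ``<ref><position><alt>`` notation:

.. code-block:: python

   def describe_substitution(wt: str, mut: str) -> str:
       """Label a single-substitution variant as <ref><1-based position><alt>, e.g. 'A5G'."""
       if len(wt) != len(mut):
           raise ValueError("sequences must have the same length")
       # Collect every position where the variant differs from the wild type.
       diffs = [(i, ref, alt) for i, (ref, alt) in enumerate(zip(wt, mut)) if ref != alt]
       if len(diffs) != 1:
           raise ValueError(f"expected exactly one substitution, found {len(diffs)}")
       i, ref, alt = diffs[0]
       return f"{ref}{i + 1}{alt}"


   wt = "ATCGATCG"
   seqs = ["ATCGGTCG", "ATCGAACG", "ATCGCTCG", "GTCGATCG", "ACCGATCG"]  # seq column above
   labels = [describe_substitution(wt, s) for s in seqs]
   print(labels)  # → ['A5G', 'T6A', 'A5C', 'A1G', 'T2C']

To attach the labels to the returned DataFrame, something like ``df["mutation"] = df["seq"].map(lambda s: describe_substitution("ATCGATCG", s))`` should work.
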
Controlling output size with ``num_seqs``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pass ``num_seqs=`` to generate an exact number of sequences regardless of the pool's state-space size.

.. code-block:: python

   wt = pp.from_seq("ATCGATCG")
   pool = pp.mutagenize(wt, num_mutations=1, mode="random")
   df = pp.generate_library(pool, num_seqs=3)
   print(len(df))
   print(df.to_string())

Output:

.. code-block:: text

   3
      name       seq
   0  None  ATCGGTCG
   1  None  ATCGAACG
   2  None  ATCGCTCG

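
How ``num_cycles`` and ``num_seqs`` set the number of rows can be sketched as follows. This assumes, hypothetically, that a single-substitution pool over ``"ATCGATCG"`` has one state per possible substitution (8 positions × 3 alternative bases = 24 states); the real state-space size depends on how poolparty defines the pool, and ``single_substitutions`` and ``expected_num_rows`` are illustrative names.

.. code-block:: python

   from __future__ import annotations

   from typing import Optional


   def single_substitutions(wt: str, alphabet: str = "ACGT") -> list[str]:
       """Enumerate every sequence that differs from `wt` at exactly one position."""
       return [
           wt[:i] + alt + wt[i + 1:]
           for i, ref in enumerate(wt)
           for alt in alphabet
           if alt != ref
       ]


   def expected_num_rows(state_space_size: int, num_cycles: int = 1,
                         num_seqs: Optional[int] = None) -> int:
       """Row count under the documented rule: `num_seqs` overrides `num_cycles`."""
       if num_seqs is not None:
           return num_seqs
       # Each cycle visits every state exactly once.
       return num_cycles * state_space_size


   states = single_substitutions("ATCGATCG")
   print(len(states))                                   # → 24
   print(expected_num_rows(len(states), num_cycles=2))  # → 48
   print(expected_num_rows(len(states), num_seqs=3))    # → 3
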
Reproducible output with ``seed``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pass ``seed=`` to fix the per-row draws for a given pool: calling ``generate_library`` again on the same pool object with the same ``seed`` yields the same rows within one session. Operation IDs also feed into the internal seed sequence, so to reproduce the output of a fresh interpreter run, call ``pp.init()`` before rebuilding the pipeline and then pass the same ``seed``.

.. code-block:: python

   wt = pp.from_seq("ATCGATCG")
   pool = pp.mutagenize(wt, num_mutations=1, mode="random")
   df = pp.generate_library(pool, num_seqs=3, seed=42)
   print(df.to_string())

Output:

.. code-block:: text

      name       seq
   0  None  ATCGAACG
   1  None  ACCGATCG
   2  None  ATCGATCT

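
Why operation IDs matter for reproducibility can be illustrated with a toy seeding scheme. The sketch below is a hypothetical illustration, not poolparty's actual seeding (which this page does not document): it hashes the user ``seed``, an ``op_id``, and the row index into a per-row ``random.Random``. In this model, rebuilding a pipeline that assigns the operation a new ``op_id`` changes the rows even with the same ``seed``. ``row_rng``, ``op_id``, and ``random_single_mutants`` are illustrative names, and the sequences drawn here will not match the output above.

.. code-block:: python

   import hashlib
   import random


   def row_rng(seed: int, op_id: int, row: int) -> random.Random:
       """Derive an independent, reproducible RNG for one row of one operation."""
       key = f"{seed}:{op_id}:{row}".encode()
       digest = hashlib.sha256(key).digest()
       # The first 8 bytes of the hash become a 64-bit integer seed.
       return random.Random(int.from_bytes(digest[:8], "big"))


   def random_single_mutants(wt: str, n: int, seed: int, op_id: int) -> list:
       """Draw `n` random single-substitution variants of `wt`, one RNG per row."""
       out = []
       for row in range(n):
           rng = row_rng(seed, op_id, row)
           pos = rng.randrange(len(wt))
           alt = rng.choice([b for b in "ACGT" if b != wt[pos]])
           out.append(wt[:pos] + alt + wt[pos + 1:])
       return out


   a = random_single_mutants("ATCGATCG", 3, seed=42, op_id=0)
   b = random_single_mutants("ATCGATCG", 3, seed=42, op_id=0)
   c = random_single_mutants("ATCGATCG", 3, seed=42, op_id=1)
   print(a == b)  # → True: same seed and same operation ID
   print(c)       # a different operation ID selects a different random stream

Deriving one RNG per row (rather than one stream for the whole call) keeps row ``i`` stable even if fewer or more rows are requested, which is one plausible reading of "per-row draw".
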
Get a plain list with ``seqs_only=True``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When only the sequence strings are needed (e.g. to pass directly to another function), set ``seqs_only=True`` to skip DataFrame construction.

.. code-block:: python

   wt = pp.from_seq("ATCGATCG")
   pool = pp.mutagenize(wt, num_mutations=1, mode="random")
   seqs = pp.generate_library(pool, num_seqs=4, seed=7, seqs_only=True)
   print(seqs)

Output:

.. code-block:: text

   ['ATCGATAG', 'GTCGATCG', 'ATCGAGCG', 'ATCGCTCG']

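
As an example of handing the list straight to another function, the hypothetical helper below (not part of poolparty) formats the sequences as FASTA text; the ``seq_<i>`` record names are an arbitrary choice.

.. code-block:: python

   def to_fasta(seqs: list, prefix: str = "seq") -> str:
       """Format sequences as FASTA records named <prefix>_0, <prefix>_1, ..."""
       records = [f">{prefix}_{i}\n{s}" for i, s in enumerate(seqs)]
       return "\n".join(records) + "\n"


   seqs = ["ATCGATAG", "GTCGATCG", "ATCGAGCG", "ATCGCTCG"]  # the list printed above
   fasta = to_fasta(seqs)
   print(fasta, end="")
   # >seq_0
   # ATCGATAG
   # >seq_1
   # GTCGATCG
   # >seq_2
   # ATCGAGCG
   # >seq_3
   # ATCGCTCG

Writing the result to disk is then a one-liner, e.g. ``pathlib.Path("library.fasta").write_text(fasta)`` (the filename is illustrative).
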
Chain a full pipeline: mutagenize → filter → generate_library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compose multiple operations and materialise the result in a single call. Here ``discard_null_seqs=True`` omits any sequences the filter rejects.

.. code-block:: python

   wt = pp.from_seq("ATCGATCG")
   mutants = pp.mutagenize(wt, num_mutations=1, mode="random")
   singles = pp.filter(
       mutants,
       lambda s: sum(a != b for a, b in zip(s, "ATCGATCG")) == 1,
   )
   df = pp.generate_library(singles, num_seqs=5, seed=0, discard_null_seqs=True)
   print(df.to_string())

Output:

.. code-block:: text

      name       seq
   0  None  ATCGGTCG
   1  None  ATCGAACG
   2  None  ATCGCTCG
   3  None  GTCGATCG
   4  None  ACCGATCG

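
The lambda in this example relies on ``zip``, which stops at the shorter input, so a sequence with an extra trailing base can still count as exactly one mismatch. A named predicate that checks lengths first avoids this. ``is_single_mutant`` is a hypothetical helper, and passing it to ``pp.filter`` in place of the lambda assumes ``pp.filter`` accepts any callable that maps a sequence string to ``bool``, as the lambda does. The five rows printed above pass either check.

.. code-block:: python

   WT = "ATCGATCG"


   def is_single_mutant(seq: str, wt: str = WT) -> bool:
       """True if `seq` has the same length as `wt` and differs at exactly one position."""
       if len(seq) != len(wt):
           return False  # reject insertions/deletions outright
       return sum(a != b for a, b in zip(seq, wt)) == 1


   def zip_only_check(seq: str) -> bool:
       """Same test as the lambda above, without a length check."""
       return sum(a != b for a, b in zip(seq, WT)) == 1


   print(zip_only_check("ATCGATCAA"))    # → True  (9th base silently ignored)
   print(is_single_mutant("ATCGATCAA"))  # → False
   rows = ["ATCGGTCG", "ATCGAACG", "ATCGCTCG", "GTCGATCG", "ACCGATCG"]
   print(all(is_single_mutant(s) for s in rows))  # → True
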
See :func:`~poolparty.generate_library`.