filter
======

Retain only the sequences for which a predicate function returns ``True``; all
other sequences are replaced with a ``NullSeq`` sentinel.

Upstream pools are often built with ``mode="sequential"`` (deterministic
enumeration) or ``mode="random"`` (stochastic draws), depending on whether you
need a fixed walk through states or sampled variants.

.. code-block:: python

    import poolparty as pp
    pp.init()

.. note::

    Rejected sequences are **not removed** from the state space; they become
    ``NullSeq`` values that propagate silently through every downstream
    operation. By default ``generate_library`` still includes ``NullSeq`` rows
    (as empty values). Pass ``discard_null_seqs=True`` to exclude them from the
    output.

    The predicate receives the **tag-free** sequence string (region tags are
    stripped before evaluation).

----

Parameters
----------

.. list-table::
    :widths: auto
    :header-rows: 1

    * - Parameter
      - Type
      - Default
      - Description
    * - ``pool``
      - ``Pool | DnaPool | ProteinPool``
      - *(required)*
      - Input pool to filter.
    * - ``predicate``
      - ``Callable[[str], bool]``
      - *(required)*
      - Function taking the clean (tag-free) sequence string; return ``True``
        to keep the sequence.
    * - ``name``
      - ``str | None``
      - ``None``
      - Optional name for the filter operation.
    * - ``prefix``
      - ``str | None``
      - ``None``
      - Prefix for sequence names in the resulting pool.
    * - ``cards``
      - ``list[str] | dict[str, str] | None``
      - ``None``
      - Design card keys to include. Available keys: ``'passed'``.

----

.. note::

    Only the most commonly used parameters are shown above. For the full
    parameter list, see :func:`~poolparty.filter` in the
    :doc:`API Reference `.

Examples
--------

Filter 6-mers by GC content
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Keep only 6-mers whose GC count is at least 3 (GC content ≥ 50 %).

.. code-block:: python

    pool = pp.get_kmers(6, mode="sequential")
    high_gc = pp.filter(pool, lambda s: s.count("G") + s.count("C") >= 3)
    high_gc.print_library()
    df = pp.generate_library(high_gc, num_seqs=6, discard_null_seqs=True)

.. raw:: html