filter
======

Retain only the sequences for which a predicate function returns ``True``; all other sequences are replaced with a ``NullSeq`` sentinel.

.. code-block:: python

    import poolparty as pp
    pp.init()

.. note::

    Rejected sequences are **not removed** from the state space; they become ``NullSeq`` values that propagate silently through every downstream operation. By default ``generate_library`` still includes ``NullSeq`` rows (as empty values). Pass ``discard_null_seqs=True`` to exclude them from the output.

    The predicate receives the **tag-free** sequence string (region tags are stripped before evaluation).

----

Parameters
----------

.. list-table::
   :widths: auto
   :header-rows: 1

   * - Parameter
     - Type
     - Default
     - Description
   * - ``pool``
     - ``Pool | DnaPool | ProteinPool``
     - *(required)*
     - Input pool to filter.
   * - ``predicate``
     - ``Callable[[str], bool]``
     - *(required)*
     - Function taking the clean (tag-free) sequence string; return ``True`` to keep the sequence.
   * - ``name``
     - ``str | None``
     - ``None``
     - Optional name for the filter operation.
   * - ``prefix``
     - ``str | None``
     - ``None``
     - Prefix for sequence names in the resulting pool.
   * - ``cards``
     - ``list[str] | dict[str, str] | None``
     - ``None``
     - Design card keys to include. Available keys: ``'passed'``.

----

.. note::

    Only the most commonly used parameters are shown above. For the full parameter list, see :func:`~poolparty.filter` in the API Reference.

Examples
--------

Filter by GC content
~~~~~~~~~~~~~~~~~~~~

Keep only sequences whose GC count is at least 3 (GC content ≥ 50% for these six-base sequences). Sequences that fail the predicate become ``None`` (a ``NullSeq`` sentinel); pass ``discard_null_seqs=True`` to ``generate_library`` to exclude them from the final DataFrame.

.. code-block:: python

    seqs = pp.from_seqs(
        ["AAAAAA", "GCGCGC", "AAACCC", "TTTTTT", "GGCCAA"],
        mode="sequential",
    )
    # Keep sequences with at least three G or C bases.
    high_gc = pp.filter(seqs, lambda s: s.count("G") + s.count("C") >= 3)
    high_gc.print_library()
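
Because the predicate is an ordinary ``Callable[[str], bool]``, it can be written and checked on plain strings before it is handed to ``filter``. The following is a minimal sketch, not part of poolparty: ``gc_fraction`` and ``min_gc`` are hypothetical helper names introduced here for illustration, and the list comprehension only mimics the keep/reject decision (it does not model ``NullSeq`` replacement).

.. code-block:: python

    from typing import Callable


    def gc_fraction(seq: str) -> float:
        """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
        if not seq:
            return 0.0
        upper = seq.upper()
        return (upper.count("G") + upper.count("C")) / len(upper)


    def min_gc(threshold: float) -> Callable[[str], bool]:
        """Build a predicate that accepts sequences with GC fraction >= threshold."""
        def predicate(seq: str) -> bool:
            return gc_fraction(seq) >= threshold
        return predicate


    # Hypothetical stand-alone check on the same five sequences as the example.
    seqs = ["AAAAAA", "GCGCGC", "AAACCC", "TTTTTT", "GGCCAA"]
    keep = min_gc(0.5)
    kept = [s for s in seqs if keep(s)]

Since the predicate takes a single string and returns a ``bool``, ``min_gc(0.5)`` should be usable as the ``predicate`` argument in place of the lambda shown earlier. Expressing the threshold as a fraction rather than a count also keeps it meaningful when sequences differ in length.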