filter
Retain only the sequences for which a predicate function returns True; all
other sequences are replaced with a NullSeq sentinel. Upstream pools are
often built with mode="sequential" (deterministic enumeration) or
mode="random" (stochastic draws), depending on whether you need a fixed
walk through states or sampled variants.
import poolparty as pp
pp.init()
Note
Rejected sequences are not removed from the state space — they
become NullSeq values that propagate silently through every
downstream operation. By default generate_library still includes
NullSeq rows (as empty values). Pass discard_null_seqs=True to
exclude them from the output.
The predicate receives the tag-free sequence string (region tags are stripped before evaluation).
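The sentinel behaviour described above can be modelled in plain Python. This is only an illustrative sketch, not poolparty's implementation: `None` stands in for NullSeq, and the helper names `filter_with_sentinel` and `discard_nulls` are hypothetical.

```python
from typing import Callable, List, Optional


def filter_with_sentinel(
    seqs: List[str], predicate: Callable[[str], bool]
) -> List[Optional[str]]:
    """Keep every position; rejected sequences become None (stand-in for NullSeq)."""
    return [s if predicate(s) else None for s in seqs]


def discard_nulls(rows: List[Optional[str]]) -> List[str]:
    """Drop the sentinel rows, mimicking what discard_null_seqs=True is described to do."""
    return [r for r in rows if r is not None]


states = ["ATCG", "ATCGATCG", "GGCC", "TTTTAAAA", "ACG"]
filtered = filter_with_sentinel(states, lambda s: len(s) == 8)

print(filtered)                 # [None, 'ATCGATCG', None, 'TTTTAAAA', None]
print(discard_nulls(filtered))  # ['ATCGATCG', 'TTTTAAAA']
```

The filtered list has the same length as the input, which is the point of the note: the number of states does not shrink until the null rows are explicitly discarded.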
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | Input pool to filter. |
| | | (required) | Function taking the clean (tag-free) sequence string; return |
| | | | Optional name for the filter operation. |
| | | | Prefix for sequence names in the resulting pool. |
| | | | Design card keys to include. Available keys: |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see filter() in the
API Reference.
Examples
Filter 6-mers by GC content
Keep only 6-mers whose GC count is at least 3 (GC content ≥ 50 %).
pool = pp.get_kmers(6, mode="sequential")
high_gc = pp.filter(pool, lambda s: s.count("G") + s.count("C") >= 3)
high_gc.print_library()
df = pp.generate_library(high_gc, num_seqs=6, discard_null_seqs=True)
None
None
None
None ... (4096 total)
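The leading None rows are what one would expect if the 6-mers are enumerated in alphabetical order (A, C, G, T); that ordering is an assumption here, not something this page states. The plain-Python check below does not use poolparty. It enumerates all 6-mers with `itertools.product` and applies the same GC rule through a hypothetical predicate factory, `gc_count_at_least`.

```python
from itertools import product
from typing import Callable


def gc_count_at_least(min_gc: int) -> Callable[[str], bool]:
    """Build a predicate that accepts sequences with at least min_gc G/C bases."""
    def predicate(seq: str) -> bool:
        return seq.count("G") + seq.count("C") >= min_gc
    return predicate


keep = gc_count_at_least(3)

# All 4**6 = 4096 six-mers in alphabetical order (assumed enumeration order).
kmers = ["".join(bases) for bases in product("ACGT", repeat=6)]
passing = [k for k in kmers if keep(k)]

print(len(kmers), len(passing))      # 4096 2688
print(kmers[:4])                     # ['AAAAAA', 'AAAAAC', 'AAAAAG', 'AAAAAT']
print([keep(k) for k in kmers[:4]])  # [False, False, False, False]
```

Under that assumption, 2688 of the 4096 states pass the predicate, and the first four states all fail it because each contains at most one G or C.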
Filter by sequence length
When a pool may contain sequences of varying length, keep only those that are exactly 8 bases long.
seqs = pp.from_seqs(
["ATCG", "ATCGATCG", "GGCC", "TTTTAAAA", "ACG"],
mode="sequential",
)
trimmed = pp.filter(seqs, lambda s: len(s) == 8)
trimmed.print_library()
df = pp.generate_library(trimmed, discard_null_seqs=True)
ATCGATCG
None
TTTTAAAA
None
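Since the predicate is an ordinary Python callable, several checks can be folded into one before it is handed to the filter. The sketch below is plain Python with hypothetical helper names (`all_of`, `has_length`, `gc_fraction_at_least`); it shows one way to combine a length check with a GC check and makes no claim about how poolparty composes filters internally.

```python
from typing import Callable

Predicate = Callable[[str], bool]


def all_of(*predicates: Predicate) -> Predicate:
    """Return a predicate that is True only when every given predicate is True."""
    def combined(seq: str) -> bool:
        return all(p(seq) for p in predicates)
    return combined


def has_length(n: int) -> Predicate:
    """Accept sequences that are exactly n bases long."""
    return lambda seq: len(seq) == n


def gc_fraction_at_least(threshold: float) -> Predicate:
    """Accept non-empty sequences whose G/C fraction is at least threshold."""
    def predicate(seq: str) -> bool:
        if not seq:
            return False
        return (seq.count("G") + seq.count("C")) / len(seq) >= threshold
    return predicate


keep = all_of(has_length(8), gc_fraction_at_least(0.5))

for seq in ["ATCG", "ATCGATCG", "GGCC", "TTTTAAAA", "ACG"]:
    print(seq, keep(seq))
```

A combined predicate like `keep` could then be passed wherever a single lambda is used in the examples on this page.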
Exclude sequences containing a restriction site
Reject any 8-mer that contains the EcoRI recognition site GAATTC; rejected
sequences become NullSeq values rather than disappearing from the pool. (Here
get_kmers uses mode="sequential" with length 8; length 12 is too large
for sequential enumeration under the default state limit.)
pool = pp.get_kmers(8, mode="sequential")
no_ecori = pp.filter(pool, lambda s: "GAATTC" not in s)
no_ecori.print_library()
df = pp.generate_library(no_ecori, num_seqs=6, discard_null_seqs=True)
AAAAAAAG
AAAAAAAT
AAAAAACA
AAAAAACC ... (65536 total)
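EcoRI's site is its own reverse complement, so a single substring test covers both strands. For non-palindromic sites that shortcut no longer holds. The sketch below is plain Python, with hypothetical helpers `reverse_complement` and `lacks_sites`; it builds a predicate that rejects a sequence if any listed site appears on either strand. It assumes uppercase A/C/G/T input.

```python
from typing import Callable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def lacks_sites(*sites: str) -> Callable[[str], bool]:
    """Build a predicate that is True when no site occurs on either strand."""
    patterns = set()
    for site in sites:
        patterns.add(site)
        patterns.add(reverse_complement(site))

    def predicate(seq: str) -> bool:
        return not any(p in seq for p in patterns)

    return predicate


no_ecori = lacks_sites("GAATTC")
print(reverse_complement("GAATTC"))  # GAATTC (palindromic site)
print(no_ecori("AAGAATTC"))          # False
print(no_ecori("AAAAAAAA"))          # True

# A non-palindromic example: GGTCTC, whose reverse complement is GAGACC.
no_ggtctc = lacks_sites("GGTCTC")
print(no_ggtctc("AAGAGACC"))         # False: the site is on the opposite strand
```

For scale, only 48 of the 65536 possible 8-mers contain GAATTC (three start positions, 16 choices for the two free bases, and the site cannot overlap itself within 8 bases), so the vast majority of states pass this filter.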
Chain: mutagenize then filter for single-mutant sequences
Build single-point mutants of a wild-type sequence, then keep only those that differ from the wild type at exactly one position.
wt = pp.from_seq("ATCGATCG")
mutants = pp.mutagenize(wt, num_mutations=1, mode="random")
singles = pp.filter(
mutants,
lambda s: sum(a != b for a, b in zip(s, "ATCGATCG")) == 1,
)
singles.print_library()
df = pp.generate_library(singles, num_seqs=5, discard_null_seqs=True)
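The inline lambda above relies on `zip`, which silently truncates to the shorter input, so a sequence of a different length could be miscounted. A slightly more defensive predicate can be sketched in plain Python as follows; `hamming_distance` and `differs_by_exactly` are hypothetical helper names, not part of poolparty.

```python
from typing import Callable


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def differs_by_exactly(reference: str, n: int) -> Callable[[str], bool]:
    """Build a predicate accepting sequences exactly n mismatches from reference."""
    def predicate(seq: str) -> bool:
        # Reject length changes up front instead of letting zip truncate.
        return len(seq) == len(reference) and hamming_distance(seq, reference) == n
    return predicate


is_single_mutant = differs_by_exactly("ATCGATCG", 1)

print(is_single_mutant("ATCGATCG"))  # False: identical to the wild type
print(is_single_mutant("TTCGATCG"))  # True: one substitution
print(is_single_mutant("TTCGATCC"))  # False: two substitutions
print(is_single_mutant("TTCGATC"))   # False: length differs
```

The last case is where the two approaches differ: the zip-based lambda would count one mismatch for "TTCGATC" and accept it, while this predicate rejects it on length.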
See filter().