filter
Retain only the sequences for which a predicate function returns True; all
other sequences are replaced with a NullSeq sentinel.
```python
import poolparty as pp
pp.init()
```
Note
Rejected sequences are not removed from the state space — they
become NullSeq values that propagate silently through every
downstream operation. By default generate_library still includes
NullSeq rows (as empty values). Pass discard_null_seqs=True to
exclude them from the output.
The predicate receives the tag-free sequence string (region tags are stripped before evaluation).
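The position-preserving behaviour described in the note can be modelled in plain Python. This is only an illustrative sketch, not poolparty's implementation: `filter_keep_positions` and `drop_nulls` are hypothetical helpers, and `None` stands in for the NullSeq sentinel.

```python
from typing import Callable, List, Optional


def filter_keep_positions(
    seqs: List[Optional[str]],
    predicate: Callable[[str], bool],
) -> List[Optional[str]]:
    """Replace rejected sequences with None instead of dropping them.

    Hypothetical helper: None plays the role of the NullSeq sentinel.
    Entries that are already None stay None, mimicking how a rejected
    sequence propagates through later steps.
    """
    return [s if s is not None and predicate(s) else None for s in seqs]


def drop_nulls(seqs: List[Optional[str]]) -> List[str]:
    """Rough model of discard_null_seqs=True: remove the sentinel rows."""
    return [s for s in seqs if s is not None]


seqs = ["AAAAAA", "GCGCGC", "TTTTTT"]
kept = filter_keep_positions(seqs, lambda s: "G" in s)
print(kept)              # one entry per input; rejected ones are None
print(drop_nulls(kept))  # only the surviving sequences
```

The filtered list has the same length as the input, which is why downstream steps still see one slot per original sequence until the sentinels are explicitly discarded.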
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | Input pool to filter. |
| | | (required) | Function taking the clean (tag-free) sequence string; return |
| | | | Optional name for the filter operation. |
| | | | Prefix for sequence names in the resulting pool. |
| | | | Design card keys to include. Available keys: |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see filter() in the
API Reference.
Examples
Filter by GC content
Keep only sequences with at least 3 G or C bases out of 6 (GC content ≥ 50%).
Sequences that fail the predicate are printed as None (the NullSeq sentinel);
pass discard_null_seqs=True to generate_library to exclude them from
the final DataFrame.
```python
seqs = pp.from_seqs(
    ["AAAAAA", "GCGCGC", "AAACCC", "TTTTTT", "GGCCAA"],
    mode="sequential",
)
# Accept sequences with at least 3 G/C bases
high_gc = pp.filter(seqs, lambda s: s.count("G") + s.count("C") >= 3)
high_gc.print_library()
```

```text
None
GCGCGC
AAACCC
None
GGCCAA
```
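A count threshold such as `>= 3` corresponds to 50% GC only when every sequence has six bases. For pools with mixed lengths, a fraction-based predicate may be more robust. The sketch below is plain Python; `gc_fraction` and `min_gc` are hypothetical names, not part of poolparty.

```python
from typing import Callable


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def min_gc(threshold: float) -> Callable[[str], bool]:
    """Build a predicate accepting sequences with GC fraction >= threshold."""
    return lambda seq: gc_fraction(seq) >= threshold


at_least_half = min_gc(0.5)
seqs = ["AAAAAA", "GCGCGC", "AAACCC", "TTTTTT", "GGCCAA"]
print([at_least_half(s) for s in seqs])
```

A predicate built this way could then be passed as the second argument, for example `pp.filter(seqs, min_gc(0.5))`.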
Filter by sequence length
When a pool may contain sequences of varying length, keep only those that are exactly 8 bases long.
```python
seqs = pp.from_seqs(
    ["ATCG", "ATCGATCG", "GGCC", "TTTTAAAA", "ACG"],
    mode="sequential",
)
trimmed = pp.filter(seqs, lambda s: len(s) == 8)
trimmed.print_library()
# Drop the NullSeq rows when building the final DataFrame
df = pp.generate_library(trimmed, discard_null_seqs=True)
```

```text
None
ATCGATCG
None
TTTTAAAA
None
```
Exclude sequences containing a restriction site
Reject any 8-mer that contains the EcoRI recognition site GAATTC. (This
example calls get_kmers with length 8 and mode="sequential"; length 12 is
too large for sequential enumeration under the default state limit.)
```python
pool = pp.get_kmers(8, mode="sequential")
no_ecori = pp.filter(pool, lambda s: "GAATTC" not in s)
no_ecori.print_library()
# Drop the NullSeq rows when building the final DataFrame
df = pp.generate_library(no_ecori, num_seqs=6, discard_null_seqs=True)
```

```text
AAAAAAAA
AAAAAAAC
AAAAAAAG
AAAAAAAT
AAAAAACA
... (65536 total)
```
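EcoRI's site GAATTC is its own reverse complement, so checking one strand is enough. For a non-palindromic site, a predicate would typically need to test the reverse complement as well. The following is a hedged plain-Python sketch; `reverse_complement` and `lacks_site` are hypothetical helpers, and the BsaI site GGTCTC is used only as an illustration.

```python
from typing import Callable

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def lacks_site(site: str) -> Callable[[str], bool]:
    """Build a predicate that is True when site is absent from both strands."""
    site = site.upper()
    rc_site = reverse_complement(site)

    def predicate(seq: str) -> bool:
        seq = seq.upper()
        return site not in seq and rc_site not in seq

    return predicate


no_ecori = lacks_site("GAATTC")  # palindromic: reverse complement is identical
no_bsai = lacks_site("GGTCTC")   # non-palindromic: reverse complement is GAGACC

print(reverse_complement("GAATTC"))
print(no_ecori("AGAATTCA"))
print(no_bsai("AAGAGACC"))
```

Because the result is an ordinary function of one string, it could be used wherever the lambda appears above, for example `pp.filter(pool, lacks_site("GGTCTC"))`.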
Chain: mutagenize then filter by GC content
Generate all single-nucleotide mutants, then keep only those whose GC content is at least 5 out of 8 bases.
```python
wt = pp.from_seq("ATCGATCG")
mutants = pp.mutagenize(wt, num_mutations=1, mode="sequential")
# Accept mutants with at least 5 G/C bases out of 8
high_gc = pp.filter(mutants, lambda s: s.count("G") + s.count("C") >= 5)
high_gc.print_library(num_seqs=8)
```

```text
CTCGATCG
GTCGATCG
None
None
ACCGATCG
AGCGATCG
None
None
... (24 total)
```
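The counts in this example can be checked with a plain-Python enumeration. The wild type has four G/C bases, so a single substitution reaches five only when an A or T is replaced by G or C. The sketch below does not use poolparty; `single_mutants` and `gc_count` are hypothetical helpers, and the enumeration order (positions left to right, replacement bases alphabetical) is an assumption.

```python
from typing import Iterator


def single_mutants(seq: str, alphabet: str = "ACGT") -> Iterator[str]:
    """Yield every sequence that differs from seq at exactly one position.

    Assumed order: positions left to right, replacement bases alphabetical.
    """
    for i, original in enumerate(seq):
        for base in alphabet:
            if base != original:
                yield seq[:i] + base + seq[i + 1:]


def gc_count(seq: str) -> int:
    """Number of G and C bases in seq."""
    return seq.count("G") + seq.count("C")


mutants = list(single_mutants("ATCGATCG"))
passing = [m for m in mutants if gc_count(m) >= 5]

print(len(mutants))  # 8 positions x 3 alternative bases
print(len(passing))  # mutants that gain a G or C at an A/T position
print(passing)
```

Under these assumptions, 8 of the 24 mutants satisfy the predicate; the other 16 would appear as None unless discard_null_seqs=True is passed to generate_library.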
See filter() in the API Reference.