score
Evaluate a user-supplied function on each sequence and record the result
as a design card column. score is a passthrough operation: it adds
metadata and leaves the sequence itself unchanged.
The function receives the clean (tag-stripped) sequence string, or the
clean content of a named region when region is specified.
score also accepts built-in utilities such as pp.calc_gc,
pp.calc_dust, and pp.calc_complexity as the scoring function.
import poolparty as pp
pp.init()
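To make "clean (tag-stripped)" concrete, the sketch below removes region tags from a sequence string before it would be handed to a scoring function. It is illustrative only: strip_tags and TAG_PATTERN are hypothetical names, and the pattern assumes tags look like the <cre>...</cre> markup used in the examples on this page. It is not PoolParty's own implementation.

```python
import re

# Hypothetical helper, not part of PoolParty. Assumes region tags are written
# as <name> and </name>, as in the <cre>...</cre> example on this page.
TAG_PATTERN = re.compile(r"</?[A-Za-z_][A-Za-z0-9_]*>")

def strip_tags(seq: str) -> str:
    """Return the sequence with all opening and closing tags removed."""
    return TAG_PATTERN.sub("", seq)

clean = strip_tags("AAAA<cre>ATCGATCG</cre>TTTT")
print(clean)  # AAAAATCGATCGTTTT
```

Under this assumption, a scoring function only ever sees plain sequence characters, so it does not need to handle markup itself.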
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | The Pool or sequence string to score. |
| | | (required) | Scoring function |
| | | | Design card column name under which the result is stored. |
| | | | Region to score. A named tag (str), |
| | | | Prefix for auto-generated sequence names. |
| | | | Design card keys to include. The available key is the value of |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see score() in the
API Reference.
Examples
Custom scoring function
The scoring function takes a sequence string and returns any scalar value. Define it as a regular function so the pattern is explicit.
def gc_fraction(seq):
    # Fraction of G and C bases in the sequence
    return (seq.count("G") + seq.count("C")) / len(seq)

pool = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                    mode="sequential")
scored = pp.score(pool, gc_fraction, card_key="gc",
                  cards={"gc": "gc"})
df = scored.generate_library()
| name | seq | gc |
|---|---|---|
| None | AAAA | 0.0 |
| None | ACGT | 0.5 |
| None | GCGC | 1.0 |
| None | CCCC | 1.0 |
| None | ATAT | 0.0 |
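Because the scoring function may return any scalar, it does not have to produce a float. As a further sketch of the same (str) -> scalar pattern, the function below returns an integer: the length of the longest homopolymer run. The name longest_homopolymer is hypothetical and chosen only for illustration.

```python
from itertools import groupby

def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)

for s in ["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"]:
    print(s, longest_homopolymer(s))
```

Assuming integer results are stored like any other scalar, this function could be passed to pp.score in place of gc_fraction, with a card_key such as "homopolymer".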
The cards parameter controls how the card column is named in the
output. A dict {"gc": "gc"} maps the card key directly to the column
name. A list ["gc"] also works but prefixes the column with the
operation id (e.g., op[1]:score.gc); use the dict form to keep column
names clean.
Built-in scoring functions
PoolParty includes several scoring functions that match the same
(str) -> scalar pattern:
- pp.calc_gc: GC fraction
- pp.calc_complexity: linguistic complexity (0–1)
- pp.calc_dust: DUST low-complexity score (lower = more complex)
pool = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                    mode="sequential")
scored = pp.score(pool, pp.calc_gc, card_key="gc",
                  cards={"gc": "gc"})
df = scored.generate_library()
| name | seq | gc |
|---|---|---|
| None | AAAA | 0.0 |
| None | ACGT | 0.5 |
| None | GCGC | 1.0 |
| None | CCCC | 1.0 |
| None | ATAT | 0.0 |
Score only a named region
region restricts scoring to the tagged segment; the full sequence
passes through unchanged.
wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
muts = pp.mutagenize(wt, num_mutations=1, region="cre",
                     mode="random", num_states=5)
scored = pp.score(muts, pp.calc_gc, region="cre", card_key="cre_gc",
                  cards={"cre_gc": "cre_gc"})
df = scored.generate_library()
| name | seq | cre_gc |
|---|---|---|
| None | AAAA<cre>ATCGGTCG</cre>TTTT | 0.625 |
| None | AAAA<cre>ATCGAACG</cre>TTTT | 0.500 |
| None | AAAA<cre>ATCGCTCG</cre>TTTT | 0.625 |
| None | AAAA<cre>GTCGATCG</cre>TTTT | 0.625 |
| None | AAAA<cre>ACCGATCG</cre>TTTT | 0.625 |
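The cre_gc values in the table can be checked by hand: extract the text between the cre tags and compute its GC fraction. The sketch below does this in plain Python. region_content is a hypothetical helper that assumes a single, non-nested <name>...</name> region; it only mimics the documented behavior and is not how PoolParty extracts regions internally.

```python
import re

def region_content(seq: str, region: str) -> str:
    """Return the text between <region> and </region> (hypothetical helper)."""
    name = re.escape(region)
    match = re.search(rf"<{name}>(.*?)</{name}>", seq)
    if match is None:
        raise ValueError(f"region {region!r} not found")
    return match.group(1)

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

rows = [
    "AAAA<cre>ATCGGTCG</cre>TTTT",
    "AAAA<cre>ATCGAACG</cre>TTTT",
]
for row in rows:
    # Only the 8 bases inside the cre region are scored; the flanks are ignored.
    print(region_content(row, "cre"), gc_fraction(region_content(row, "cre")))
```

The first row has 5 G/C bases out of 8 (0.625) and the second has 4 out of 8 (0.5), matching the table.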
Multiple scores in a pipeline
Chain two score calls to record multiple metrics in one library.
pool = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                    mode="sequential")
scored = pp.score(pool, pp.calc_gc, card_key="gc",
                  cards={"gc": "gc"})
scored = pp.score(scored, pp.calc_complexity, card_key="complexity",
                  cards={"complexity": "complexity"})
df = scored.generate_library()
| name | seq | gc | complexity |
|---|---|---|---|
| None | AAAA | 0.00 | 0.36 |
| None | ACGT | 0.50 | 1.00 |
| None | GCGC | 1.00 | 0.72 |
| None | CCCC | 1.00 | 0.36 |
| None | ATAT | 0.00 | 0.72 |
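When more than two metrics are needed, the chained calls can be written as a loop. The sketch below is one possible way to do that. chain_scores is a hypothetical helper, and it assumes only the call pattern shown in the example above (pool and function as positional arguments, card_key and cards as keywords). The scoring operation is passed in as score_fn so the helper itself does not depend on PoolParty.

```python
from typing import Any, Callable, Mapping

def chain_scores(
    pool: Any,
    metrics: Mapping[str, Callable[[str], Any]],
    score_fn: Callable[..., Any],
) -> Any:
    """Apply score_fn once per metric, feeding each result into the next call."""
    scored = pool
    for key, fn in metrics.items():
        # Use the same name for the card key and the output column.
        scored = score_fn(scored, fn, card_key=key, cards={key: key})
    return scored

# Usage with PoolParty (assumes `import poolparty as pp` and an existing pool):
# scored = chain_scores(
#     pool,
#     {"gc": pp.calc_gc, "complexity": pp.calc_complexity},
#     pp.score,
# )
```

Metrics are applied in the mapping's insertion order, so the resulting columns should appear in the order the metrics are listed.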
See score().