score

Evaluate a user-supplied function on each sequence and record the result as a design card column. score is a passthrough operation: the sequence itself is unchanged and only metadata is added. The function receives the clean (tag-stripped) sequence string or, when region is specified, the clean content of that region.

Compatible with built-in utilities such as pp.calc_gc, pp.calc_dust, and pp.calc_complexity.
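The passthrough behavior described above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not PoolParty's implementation: strip_tags and score_one are hypothetical helpers, and the regular expression assumes tags of the simple form <name> and </name>.

```python
import re

# Hypothetical stand-in for tag stripping; PoolParty's own parser may differ.
_TAG = re.compile(r"</?[A-Za-z_]\w*>")

def strip_tags(seq: str) -> str:
    """Remove <name> and </name> markers, keeping the bases between them."""
    return _TAG.sub("", seq)

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a clean sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def score_one(seq: str, fn, card_key: str = "score"):
    """Return the sequence unchanged plus a one-entry card dict (illustrative only)."""
    return seq, {card_key: fn(strip_tags(seq))}

tagged = "AAAA<cre>ATCGATCG</cre>TTTT"
out_seq, card = score_one(tagged, gc_fraction, card_key="gc")
```

Here out_seq is identical to the input, tags included, while the scoring function only ever sees the 16 tag-free bases.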

import poolparty as pp
pp.init()

Parameters

Parameter	Type	Default	Description
`pool`	`Pool | str`	(required)	The Pool or sequence string to score.
`fn`	`callable`	(required)	Scoring function `(str) -> any`. Receives a clean (tag-free) sequence string and returns any scalar value to record.
`card_key`	`str`	`'score'`	Design card column name under which the result is stored.
`region`	`str | list | None`	`None`	Region to score. A named tag (str), `[start, stop]` interval, or `None` to score the full sequence.
`prefix`	`str | None`	`None`	Prefix for auto-generated sequence names.
`cards`	`list | dict | None`	`None`	Design card keys to include. The available key is the value of `card_key` (default `'score'`).

Note

Only the most commonly used parameters are shown above. For the full parameter list, see score() in the API Reference.

Examples

Custom scoring function

The scoring function takes a sequence string and returns any scalar value. Define it as a regular function so the pattern is explicit.

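def gc_fraction(seq):
    # Fraction of G and C bases; an empty string scores 0.0 rather than dividing by zero.
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)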

pool   = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                      mode="sequential")
scored = pp.score(pool, gc_fraction, card_key="gc",
                  cards={"gc": "gc"})
df     = scored.generate_library()

df — 5 rows × 3 columns

name	seq	gc
None	AAAA	0.0
None	ACGT	0.5
None	GCGC	1.0
None	CCCC	1.0
None	ATAT	0.0

The cards parameter controls how the card column is named in the output. A dict {"gc": "gc"} maps the card key directly to the column name. A list ["gc"] also works but prefixes the column with the operation id (e.g., op[1]:score.gc); use the dict form to keep column names clean.
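A scoring function that needs extra arguments does not fit the single-argument (str) -> scalar pattern directly. One common way to adapt it, sketched here with the standard library only, is functools.partial. The motif_count function and the "CG" motif are hypothetical names chosen for illustration.

```python
from functools import partial

def motif_count(seq: str, motif: str) -> int:
    """Count non-overlapping occurrences of motif in seq."""
    return seq.count(motif)

# Bind the extra argument so the result takes a single sequence string.
count_cg = partial(motif_count, motif="CG")

counts = [count_cg(s) for s in ["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"]]
```

Because count_cg is an ordinary callable taking one string, it should be usable as fn in the same way as gc_fraction above, for example with card_key="cg_count". This assumes pp.score accepts any callable, as the parameter table indicates.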

Built-in scoring functions

PoolParty includes several scoring functions that match the same (str) -> scalar pattern:

pp.calc_gc — GC fraction
pp.calc_complexity — linguistic complexity (0–1)
pp.calc_dust — DUST low-complexity score (lower = more complex)

pool   = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                      mode="sequential")
scored = pp.score(pool, pp.calc_gc, card_key="gc",
                  cards={"gc": "gc"})
df     = scored.generate_library()

df — 5 rows × 3 columns

name	seq	gc
None	AAAA	0.0
None	ACGT	0.5
None	GCGC	1.0
None	CCCC	1.0
None	ATAT	0.0

Score only a named region

region restricts scoring to the tagged segment; the full sequence passes through unchanged.

wt     = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
muts   = pp.mutagenize(wt, num_mutations=1, region="cre",
                       mode="random", num_states=5)
scored = pp.score(muts, pp.calc_gc, region="cre", card_key="cre_gc",
                  cards={"cre_gc": "cre_gc"})
df     = scored.generate_library()

df — 5 rows × 3 columns

name	seq	cre_gc
None	AAAA<cre>ATCGGTCG</cre>TTTT	0.625
None	AAAA<cre>ATCGAACG</cre>TTTT	0.500
None	AAAA<cre>ATCGCTCG</cre>TTTT	0.625
None	AAAA<cre>GTCGATCG</cre>TTTT	0.625
None	AAAA<cre>ACCGATCG</cre>TTTT	0.625
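The values in the table above can be checked by hand: only the eight bases between <cre> and </cre> are counted. The following sketch reproduces the first two rows in plain Python. region_content is a hypothetical helper written for this check, not a PoolParty function, and it assumes the region is marked with a single simple tag pair.

```python
import re

def region_content(seq: str, name: str) -> str:
    """Return the bases between <name> and </name> (hypothetical helper)."""
    pattern = rf"<{re.escape(name)}>(.*?)</{re.escape(name)}>"
    match = re.search(pattern, seq)
    if match is None:
        raise ValueError(f"region {name!r} not found in sequence")
    return match.group(1)

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a clean sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

rows = [
    "AAAA<cre>ATCGGTCG</cre>TTTT",
    "AAAA<cre>ATCGAACG</cre>TTTT",
]
region_scores = [gc_fraction(region_content(s, "cre")) for s in rows]
```

The flanking AAAA and TTTT bases do not contribute: scoring the full clean sequence would give a lower GC fraction than the region-only value.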

Multiple scores in a pipeline

Chain two score calls to record multiple metrics in one library.

pool   = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                      mode="sequential")
scored = pp.score(pool,   pp.calc_gc,        card_key="gc",
                  cards={"gc": "gc"})
scored = pp.score(scored, pp.calc_complexity, card_key="complexity",
                  cards={"complexity": "complexity"})
df     = scored.generate_library()

df — 5 rows × 4 columns

name	seq	gc	complexity
None	AAAA	0.00	0.36
None	ACGT	0.50	1.00
None	GCGC	1.00	0.72
None	CCCC	1.00	0.36
None	ATAT	0.00	0.72
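Once several scores are recorded, they can be combined to filter the library. The sketch below copies the table above into a plain list of dicts so that it runs without PoolParty. The thresholds are arbitrary values chosen for illustration, and the actual return type of generate_library() is not assumed here.

```python
# Rows copied from the table above; thresholds are illustrative only.
rows = [
    {"seq": "AAAA", "gc": 0.00, "complexity": 0.36},
    {"seq": "ACGT", "gc": 0.50, "complexity": 1.00},
    {"seq": "GCGC", "gc": 1.00, "complexity": 0.72},
    {"seq": "CCCC", "gc": 1.00, "complexity": 0.36},
    {"seq": "ATAT", "gc": 0.00, "complexity": 0.72},
]

GC_MIN, GC_MAX = 0.4, 0.6   # keep moderately GC-balanced sequences
COMPLEXITY_MIN = 0.7        # drop low-complexity sequences

kept = [
    r["seq"]
    for r in rows
    if GC_MIN <= r["gc"] <= GC_MAX and r["complexity"] >= COMPLEXITY_MIN
]
```

With these thresholds only ACGT passes both filters.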

See score().