score

Evaluate a user-supplied function on each sequence and record the result as a design card column. score is a passthrough operation: it adds metadata and leaves the sequence itself unchanged. The function receives the clean (tag-stripped) sequence string or, when region is specified, the clean content of that region.

Compatible with built-in utilities such as pp.calc_gc, pp.calc_dust, and pp.calc_complexity.
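
The passthrough behavior described above can be illustrated without PoolParty. The sketch below is a plain-Python emulation, not the library's implementation: strip_tags and score_passthrough are hypothetical helpers, and the regular expression assumes region tags always look like <name> and </name>.

```python
import re

# Assumption: region tags are written as <name> ... </name>.
TAG_PATTERN = re.compile(r"</?\w+>")

def strip_tags(seq):
    """Return the sequence with region tags removed (hypothetical helper)."""
    return TAG_PATTERN.sub("", seq)

def score_passthrough(seqs, fn, card_key="score"):
    """Emulate score: keep each sequence as-is and record fn's result."""
    rows = []
    for seq in seqs:
        # fn sees the tag-free string; the stored sequence keeps its tags.
        rows.append({"seq": seq, card_key: fn(strip_tags(seq))})
    return rows

rows = score_passthrough(["AAAA<cre>ATCG</cre>TTTT"], len, card_key="length")
print(rows)
```

The stored sequence still carries its tags, while the recorded length counts only the 12 bases.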

import poolparty as pp
pp.init()

Parameters

Parameter   Type                Default     Description
pool        Pool | str          (required)  The Pool or sequence string to score.
fn          callable            (required)  Scoring function (str) -> any. Receives a clean (tag-free) sequence string and returns any scalar value to record.
card_key    str                 'score'     Design card column name under which the result is stored.
region      str | list | None   None        Region to score. A named tag (str), [start, stop] interval, or None to score the full sequence.
prefix      str | None          None        Prefix for auto-generated sequence names.
cards       list | dict | None  None        Design card keys to include. The available key is the value of card_key (default 'score').


Note

Only the most commonly used parameters are shown above. For the full parameter list, see score() in the API Reference.

Examples

Custom scoring function

The scoring function takes a sequence string and returns any scalar value. Define it as a regular function so the pattern is explicit.

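def gc_fraction(seq):
    # Fraction of G and C bases; an empty sequence scores 0.0 rather than dividing by zero.
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0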

pool   = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                      mode="sequential")
scored = pp.score(pool, gc_fraction, card_key="gc",
                  cards={"gc": "gc"})
df     = scored.generate_library()
df: 5 rows × 3 columns

name  seq   gc
None  AAAA  0.0
None  ACGT  0.5
None  GCGC  1.0
None  CCCC  1.0
None  ATAT  0.0

The cards parameter controls how the card column is named in the output. A dict {"gc": "gc"} maps the card key directly to the column name. A list ["gc"] also works but prefixes the column with the operation id (e.g., op[1]:score.gc); use the dict form to keep column names clean.
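
Any function that takes a sequence string and returns a scalar fits the same pattern. As a further illustration, here is a minimal sketch of a second custom metric, the longest homopolymer run. The name max_homopolymer is hypothetical and not part of PoolParty.

```python
from itertools import groupby

def max_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty string)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)

for seq in ["AAAA", "ACGT", "GCGC", "AACCC"]:
    print(seq, max_homopolymer(seq))
```

Assuming the call pattern shown in the example above, this could be recorded with pp.score(pool, max_homopolymer, card_key="homopolymer", cards={"homopolymer": "homopolymer"}).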

Built-in scoring functions

PoolParty includes several scoring functions that match the same (str) -> scalar pattern:

  • pp.calc_gc — GC fraction

  • pp.calc_complexity — linguistic complexity (0–1)

  • pp.calc_dust — DUST low-complexity score (lower = more complex)

pool   = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                      mode="sequential")
scored = pp.score(pool, pp.calc_gc, card_key="gc",
                  cards={"gc": "gc"})
df     = scored.generate_library()
df: 5 rows × 3 columns

name  seq   gc
None  AAAA  0.0
None  ACGT  0.5
None  GCGC  1.0
None  CCCC  1.0
None  ATAT  0.0
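
A metric that needs extra arguments does not match the (str) -> scalar pattern directly. One common way to adapt it, sketched here in plain Python, is functools.partial. count_motif is a hypothetical helper, and whether PoolParty accepts partial objects as fn is an assumption; a lambda or a small def wrapper would serve the same purpose.

```python
from functools import partial

def count_motif(seq, motif):
    """Count non-overlapping occurrences of motif in seq (hypothetical helper)."""
    return seq.count(motif)

# Fix the extra argument so the resulting callable takes only the sequence.
count_cg = partial(count_motif, motif="CG")

print(count_cg("ACGTCG"))
```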

Score only a named region

region restricts scoring to the tagged segment; the full sequence passes through unchanged.

wt     = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
muts   = pp.mutagenize(wt, num_mutations=1, region="cre",
                      mode="random", num_states=5)
scored = pp.score(muts, pp.calc_gc, region="cre", card_key="cre_gc",
                 cards={"cre_gc": "cre_gc"})
df     = scored.generate_library()
df: 5 rows × 3 columns

name  seq                          cre_gc
None  AAAA<cre>ATCGGTCG</cre>TTTT  0.625
None  AAAA<cre>ATCGAACG</cre>TTTT  0.500
None  AAAA<cre>ATCGCTCG</cre>TTTT  0.625
None  AAAA<cre>GTCGATCG</cre>TTTT  0.625
None  AAAA<cre>ACCGATCG</cre>TTTT  0.625
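
The cre_gc values can be checked by hand. The sketch below reproduces the first row in plain Python; region_content is a hypothetical helper that assumes <name>...</name> tag syntax, and it is not how PoolParty resolves regions internally.

```python
import re

def region_content(seq, name):
    """Return the text between <name> and </name> (hypothetical helper)."""
    tag = re.escape(name)
    match = re.search(rf"<{tag}>(.*?)</{tag}>", seq)
    if match is None:
        raise ValueError(f"region {name!r} not found")
    return match.group(1)

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

tagged = "AAAA<cre>ATCGGTCG</cre>TTTT"
cre = region_content(tagged, "cre")
region_gc = gc_fraction(cre)                      # region only
full_gc = gc_fraction("AAAA" + cre + "TTTT")      # whole sequence, for comparison
print(cre, region_gc, full_gc)
```

The region alone has 5 G/C bases out of 8 (0.625), whereas the full 16-base sequence would score 0.3125, which is why the region argument matters here.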

Multiple scores in a pipeline

Chain two score calls to record multiple metrics in one library.

pool   = pp.from_seqs(["AAAA", "ACGT", "GCGC", "CCCC", "ATAT"],
                      mode="sequential")
scored = pp.score(pool,   pp.calc_gc,        card_key="gc",
                  cards={"gc": "gc"})
scored = pp.score(scored, pp.calc_complexity, card_key="complexity",
                  cards={"complexity": "complexity"})
df     = scored.generate_library()
df: 5 rows × 4 columns

name  seq   gc    complexity
None  AAAA  0.00  0.36
None  ACGT  0.50  1.00
None  GCGC  1.00  0.72
None  CCCC  1.00  0.36
None  ATAT  0.00  0.72
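
When more than two metrics are needed, the repeated calls can be folded into a loop. The helper below is a sketch under assumptions: chain_scores is a hypothetical name, and fake_score is a stand-in for pp.score so that the example runs without PoolParty (it treats a pool as a list of row dicts).

```python
def chain_scores(pool, metrics, score):
    """Call score once per (card_key, fn) pair, feeding each result into the next."""
    for key, fn in metrics.items():
        pool = score(pool, fn, card_key=key, cards={key: key})
    return pool

# Stand-in for pp.score: adds one column per call and leaves "seq" untouched.
def fake_score(pool, fn, card_key="score", cards=None):
    return [{**row, card_key: fn(row["seq"])} for row in pool]

rows = [{"seq": s} for s in ["AAAA", "ACGT"]]
metrics = {"length": len, "a_count": lambda s: s.count("A")}
result = chain_scores(rows, metrics, fake_score)
print(result)
```

With PoolParty, and assuming the keyword arguments used in the example above, the equivalent call would be chain_scores(pool, {"gc": pp.calc_gc, "complexity": pp.calc_complexity}, pp.score).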

See also: score() in the API Reference.