materialize

Eagerly generate sequences from a pool and cache them in a new, standalone pool whose state space is exactly the set of stored sequences. The resulting pool is independent of its parent pools, so it can be used as a cheap starting point for any number of independent downstream pipelines.

```python
import poolparty as pp
pp.init()
```
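As a rough mental model of what materialize does, the plain-Python sketch below draws a fixed number of sequences from a toy parent generator up front. It stores them in a list-backed pool whose states are exactly those stored sequences. It does not use poolparty: `ListPool`, `materialize_sketch`, and `random_point_mutant` are hypothetical names, and the real implementation may differ.

```python
import random
from typing import Callable, List


class ListPool:
    """Hypothetical stand-in for a materialized pool: its states are a fixed list."""

    def __init__(self, seqs: List[str]):
        self.seqs = list(seqs)  # own copy, so the parent is never consulted again

    @property
    def num_states(self) -> int:
        return len(self.seqs)

    def state(self, i: int) -> str:
        # State i is the i-th stored sequence, wrapping around like a cycle
        return self.seqs[i % len(self.seqs)]


def materialize_sketch(draw: Callable[[random.Random], str], num_seqs: int, seed: int) -> ListPool:
    """Call the parent's generator num_seqs times eagerly and freeze the results."""
    rng = random.Random(seed)
    return ListPool([draw(rng) for _ in range(num_seqs)])


def random_point_mutant(rng: random.Random, wt: str = "ATCGATCG") -> str:
    # Toy parent pool: one random substitution in the wild-type sequence
    pos = rng.randrange(len(wt))
    alt = rng.choice([b for b in "ACGT" if b != wt[pos]])
    return wt[:pos] + alt + wt[pos + 1:]


cached = materialize_sketch(random_point_mutant, num_seqs=20, seed=42)
print(cached.num_states)  # 20
```

Because `ListPool` holds its own copy of the sequences, anything built on top of it reads from that list rather than calling the parent generator again.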

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| pool | Pool | (required) | Input pool to materialize. |
| num_seqs | `int \| None` | None | Number of sequences to generate and cache. Provide either num_seqs or num_cycles. |
| num_cycles | `int \| None` | None | Number of complete cycles through the state space. Alternative to num_seqs. |
| seed | `int \| None` | None | Random seed for reproducible generation. |
| discard_null_seqs | bool | True | If True, skip filtered-out (NullSeq) sequences. |
| max_iterations | `int \| None` | None | Maximum number of iterations before generation stops (useful with filters that reject most draws). |
| min_acceptance_rate | `float \| None` | None | Stop generation early if the acceptance rate drops below this threshold. |
| attempts_per_rate_assessment | int | 100 | Number of draws between acceptance-rate checks. |
| name | `str \| None` | None | Name for the materialized pool. |
| prefix | `str \| None` | None | Prefix for the operation node name in the pool graph. |
| cards | `dict \| list \| None` | None | Design card columns to include in the library output. |
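How the stopping parameters interact is easiest to see as a generation loop. The sketch below is a hypothetical model, not poolparty's code. It assumes that each iteration makes one draw, that `None` stands in for a NullSeq, and that one cycle equals one pass over the parent's `num_states` states. It also assumes the acceptance rate is the running fraction of accepted draws, checked every `attempts_per_rate_assessment` attempts. `generate_sketch` and `every_third_rejected` are illustrative names.

```python
from typing import Callable, List, Optional


def generate_sketch(
    draw: Callable[[int], Optional[str]],  # returns None for a filtered-out (NullSeq) draw
    num_states: int,
    num_seqs: Optional[int] = None,
    num_cycles: Optional[int] = None,
    discard_null_seqs: bool = True,
    max_iterations: Optional[int] = None,
    min_acceptance_rate: Optional[float] = None,
    attempts_per_rate_assessment: int = 100,
) -> List[Optional[str]]:
    # Exactly one of num_seqs / num_cycles must be given
    if (num_seqs is None) == (num_cycles is None):
        raise ValueError("provide exactly one of num_seqs or num_cycles")
    # Assumption: one cycle is one pass over every state of the parent pool
    target = num_seqs if num_seqs is not None else num_cycles * num_states

    out: List[Optional[str]] = []
    attempts = accepted = 0
    # With no cap and a filter that rejects everything, this would loop forever;
    # max_iterations and min_acceptance_rate exist to prevent that.
    while len(out) < target:
        if max_iterations is not None and attempts >= max_iterations:
            break  # hard cap on the number of draws
        seq = draw(attempts)
        attempts += 1
        if seq is None:
            if not discard_null_seqs:
                out.append(None)  # keep the NullSeq placeholder
        else:
            accepted += 1
            out.append(seq)
        # Periodically check whether the filter is rejecting too much
        if (
            min_acceptance_rate is not None
            and attempts % attempts_per_rate_assessment == 0
            and accepted / attempts < min_acceptance_rate
        ):
            break
    return out


def every_third_rejected(i: int) -> Optional[str]:
    # Toy draw: attempts 0, 3, 6, ... are rejected by a filter
    return None if i % 3 == 0 else f"seq{i}"


print(len(generate_sketch(every_third_rejected, num_states=4, num_seqs=5)))  # 5
```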


Note

Only the most commonly used parameters are shown above. For the full parameter list, see materialize() in the API Reference.

Examples

Materialize before applying downstream scans

Pre-compute an expensive mutagenize result once and reuse it across multiple downstream operations without re-running the mutation logic each time.

```python
wt      = pp.from_seq("ATCGATCG")
mutants = pp.mutagenize(wt, num_mutations=1)

# Freeze 20 mutants into a standalone pool
cached  = pp.materialize(mutants, num_seqs=20, seed=42)

# Apply different downstream operations to the same cached pool
scan_a  = pp.deletion_scan(cached, deletion_length=2)
scan_b  = pp.mutagenize(cached, num_mutations=1)

df_a    = pp.generate_library(scan_a, num_seqs=6)
df_b    = pp.generate_library(scan_b, num_seqs=6)

cached.print_library()
```

```
cached: seq_length=8, num_states=20
ATCGAACG
ACCGATCG
ATCGATCT
ATCGACCG
ATCGAGCG
... (20 total)
```
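The payoff of caching can be illustrated without poolparty. In this sketch, a call counter on a hypothetical `expensive_mutant` function shows that the costly upstream step runs once per cached sequence, however many downstream libraries are built from the cache. `deletion_variants` and `point_variants` are toy stand-ins for deletion_scan and mutagenize, not their real implementations.

```python
import random

call_count = 0


def expensive_mutant(rng: random.Random, wt: str = "ATCGATCG") -> str:
    """Toy stand-in for costly upstream logic; counts how often it runs."""
    global call_count
    call_count += 1
    pos = rng.randrange(len(wt))
    alt = rng.choice([b for b in "ACGT" if b != wt[pos]])
    return wt[:pos] + alt + wt[pos + 1:]


def deletion_variants(seq: str, length: int = 2) -> list:
    # Toy downstream step A: delete each window of `length` bases
    return [seq[:i] + seq[i + length:] for i in range(len(seq) - length + 1)]


def point_variants(seq: str) -> list:
    # Toy downstream step B: every single-base substitution
    return [seq[:i] + b + seq[i + 1:] for i in range(len(seq)) for b in "ACGT" if b != seq[i]]


rng = random.Random(42)
cached = [expensive_mutant(rng) for _ in range(20)]  # costly step runs here only

lib_a = [v for s in cached for v in deletion_variants(s)]
lib_b = [v for s in cached for v in point_variants(s)]
print(call_count)  # 20: both libraries reused the same cached list
```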

Reproducible caching with seed

Pass seed= so that re-running the same script produces the same materialized pool every time.

```python
wt     = pp.from_seq("ATCGATCG")
pool   = pp.mutagenize(wt, num_mutations=1, mode="random")
cached = pp.materialize(pool, num_seqs=5, seed=0)
cached.print_library()
```

```
cached: seq_length=8, num_states=5
ATCGGTCG
ATCGAACG
ATCGCTCG
GTCGATCG
ACCGATCG
```
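Reproducibility of this kind generally comes from driving every random choice from a generator seeded with a fixed value. The sketch below shows that principle with Python's `random.Random`; it is a toy model, so it will not reproduce the exact sequences printed above, and `draw_mutants` is a hypothetical helper.

```python
import random
from typing import List


def draw_mutants(seed: int, n: int, wt: str = "ATCGATCG") -> List[str]:
    """Toy stand-in for seeded random mutagenesis: one substitution per draw."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    out = []
    for _ in range(n):
        pos = rng.randrange(len(wt))
        alt = rng.choice([b for b in "ACGT" if b != wt[pos]])
        out.append(wt[:pos] + alt + wt[pos + 1:])
    return out


run1 = draw_mutants(seed=0, n=5)
run2 = draw_mutants(seed=0, n=5)
assert run1 == run2  # same seed, same frozen set of sequences
```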

Materialize after filtering

Combine filter with materialize to lock in the accepted sequences. The materialized pool contains only the sequences that passed the predicate, with NullSeq entries already discarded.

```python
wt      = pp.from_seq("ATCGATCG")
mutants = pp.mutagenize(wt, num_mutations=1, mode="random", num_states=20)
passed  = pp.filter(mutants, lambda s: s.count("G") + s.count("C") >= 4)
cached  = pp.materialize(passed, num_seqs=5, seed=0, discard_null_seqs=True)
cached.print_library()
```

```
cached: seq_length=8, num_states=5
ATCGGTCG
ATCGAACG
ATCGCTCG
GTCGATCG
ACCGATCG
```
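To see which sequences the predicate in this example keeps, the sketch below enumerates all 24 single-base substitutions of ATCGATCG rather than the 20 states drawn by the example's mutagenize call. It marks failures with `None` as a stand-in for NullSeq and then drops them, mirroring discard_null_seqs=True. The wild type has exactly 4 G/C bases, so any mutation at an A or T position passes, while changing a C or G to A or T lowers the count to 3 and fails. That leaves 16 of the 24 mutants. The helper names are hypothetical.

```python
from typing import List, Optional

WT = "ATCGATCG"


def single_mutants(wt: str) -> List[str]:
    # All 3 * len(wt) single-base substitutions, in position order
    return [wt[:i] + b + wt[i + 1:] for i in range(len(wt)) for b in "ACGT" if b != wt[i]]


def gc_at_least_4(s: str) -> bool:
    # Same predicate as the lambda in the example above
    return s.count("G") + s.count("C") >= 4


# A filtered pool yields None (stand-in for NullSeq) where the predicate fails
filtered: List[Optional[str]] = [s if gc_at_least_4(s) else None for s in single_mutants(WT)]

# Drop the placeholders before caching, as discard_null_seqs=True does
cached = [s for s in filtered if s is not None]
print(len(filtered), len(cached))  # 24 16
```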

See materialize() in the API Reference.