sample
Draw a fixed number of sequences from a pool’s state space, optionally with a random seed for reproducibility, and with replacement (cycling) when more sequences are requested than the pool contains.
import poolparty as pp
pp.init()
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | Input pool to sample from. |
| | | | Number of sequences to draw. Provide either |
| | | | Explicit list of state indices to select. Overrides |
| | | | Random seed for reproducible sampling. |
| | | | If |
| | | | Prefix for the operation node name in the pool graph. |
| | | | Iteration priority for downstream multi-pool iteration. |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see sample() in the
API Reference.
Examples
Sample 5 Sequences from a 256-Sequence Pool
Draw a small random subset from all 256 4-mers to obtain a manageable representative sample. Without a seed, the selected sequences change on each evaluation.
import poolparty as pp
pp.init()
kmers = pp.get_kmers(length=4, mode="sequential")
subset = pp.sample(kmers, num_seqs=5)
subset.print_library()
TTGA
ATCG
GGTA
AGGC
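For intuition about the size of the state space, the same idea can be sketched with the Python standard library alone. This is only an illustration and not poolparty's implementation: the names all_kmers and subset are made up for the sketch, and the 4-mers are assumed to be built over the DNA alphabet ACGT.

```python
import itertools
import random

# Enumerate every DNA 4-mer: 4 bases at each of 4 positions gives 4**4 sequences.
all_kmers = ["".join(bases) for bases in itertools.product("ACGT", repeat=4)]
print(len(all_kmers))  # 256

# Draw 5 distinct sequences with the unseeded module-level RNG,
# so the printed subset differs from run to run.
subset = random.sample(all_kmers, 5)
print(subset)
```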
Sample with a Fixed Seed for Reproducibility
Provide a seed to guarantee the same subset is selected every time the pipeline is evaluated.
import poolparty as pp
pp.init()
kmers = pp.get_kmers(length=4, mode="sequential")
subset = pp.sample(kmers, num_seqs=5, seed=42)
subset.print_library()
AACG
CACG
ATGC
GTTA
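The effect of a fixed seed can be illustrated outside poolparty with Python's standard random module. The helper draw below is hypothetical and only stands in for a seeded sampling step; how poolparty handles the seed internally is not described here.

```python
import itertools
import random

all_kmers = ["".join(bases) for bases in itertools.product("ACGT", repeat=4)]

def draw(pool, num_seqs, seed=None):
    """Hypothetical helper: draw num_seqs distinct items using a private RNG."""
    rng = random.Random(seed)  # seed=None falls back to OS-provided entropy
    return rng.sample(pool, num_seqs)

# The same seed yields the same subset on every call.
first = draw(all_kmers, 5, seed=42)
second = draw(all_kmers, 5, seed=42)
print(first == second)  # True
```

Using a private random.Random instance rather than the module-level functions keeps the seeded draw independent of any other random calls in the program.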
Sample More Sequences Than the Pool Has States (Cycling)
When num_seqs exceeds the pool’s state count and with_replacement=True
(the default), states are resampled with replacement so the requested count is
always honoured.
import poolparty as pp
pp.init()
small = pp.from_seqs(["AAAA", "CCCC", "GGGG"], mode="sequential")
large = pp.sample(small, num_seqs=9, seed=0)
large.print_library()
GGGG
CCCC
AAAA
CCCC
CCCC
GGGG
AAAA
CCCC
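Why replacement is needed here can be seen in a minimal standard-library sketch, given as an analogy rather than as poolparty's actual mechanism. The variable names are illustrative only.

```python
import random

states = ["AAAA", "CCCC", "GGGG"]
rng = random.Random(0)

# With replacement: any number of draws can be requested from 3 states.
drawn = rng.choices(states, k=9)
print(len(drawn))  # 9

# Without replacement: asking for 9 distinct items out of 3 cannot succeed.
try:
    rng.sample(states, 9)
    oversample_failed = False
except ValueError:
    oversample_failed = True
print(oversample_failed)  # True
```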
Sample from a Stochastic Pool
Use sample on a mutagenized pool to select a reproducible subset of
stochastic draws, combining random mutation with deterministic sampling. The
mutagenized sequences themselves still vary between runs unless the upstream
stochastic pool is seeded.
import poolparty as pp
pp.init()
wt = pp.from_seq("ATCGATCG")
mutants = wt.mutagenize(num_mutations=1)
sampled = pp.sample(mutants, num_seqs=4, seed=7)
sampled.print_library()
ATCGTTCG
ATGGATCG
GTCGATCG
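The two independent sources of randomness described above can be sketched in plain Python. Everything in this block is an assumption made for illustration: mutate_once and sample_mutants are hypothetical helpers, the stochastic pool is modelled as a fixed list of 20 draws, and none of it reflects how poolparty represents stochastic pools internally.

```python
import random

BASES = "ACGT"
WT = "ATCGATCG"

def mutate_once(seq, rng):
    """Hypothetical helper: replace one random position with a different base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def sample_mutants(wt, num_seqs, mutation_seed=None, sample_seed=7, num_draws=20):
    """Hypothetical helper: mutate upstream, then pick a seeded subset downstream."""
    mutation_rng = random.Random(mutation_seed)  # upstream stochastic step
    draws = [mutate_once(wt, mutation_rng) for _ in range(num_draws)]
    sampling_rng = random.Random(sample_seed)    # downstream sampling step
    picked = sampling_rng.sample(range(num_draws), num_seqs)
    return picked, [draws[i] for i in picked]

# Seeding both steps reproduces the indices and the sequences.
idx_a, seqs_a = sample_mutants(WT, 4, mutation_seed=1)
idx_b, seqs_b = sample_mutants(WT, 4, mutation_seed=1)

# Seeding only the sampler reproduces the indices; the sequences may still differ.
idx_c, seqs_c = sample_mutants(WT, 4, mutation_seed=None)

print(idx_a == idx_b == idx_c)  # True
print(seqs_a == seqs_b)         # True
```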
See sample().