sample
Draw a fixed number of sequences from a pool’s state space. An optional random seed makes the draw reproducible, and sampling with replacement allows more sequences to be requested than the pool contains.
import poolparty as pp
pp.init()
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  | (required) | Input pool to sample from. |
|  |  |  | Number of sequences to draw. Provide either |
|  |  |  | Explicit list of state indices to select. Overrides |
|  |  |  | Random seed for reproducible sampling. |
|  |  |  | If |
|  |  |  | Prefix for the operation node name in the pool graph. |
|  |  |  | Enumeration order when combined with other pools. |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see sample() in the
API Reference.
Examples
Sample 5 Sequences from a 256-Sequence Pool
Draw a small random subset from all 256 possible 4-mers to obtain a manageable, representative sample. Without a seed, the selected sequences can differ on each evaluation.
kmers = pp.get_kmers(length=4, mode="sequential")
subset = pp.sample(kmers, num_seqs=5)
subset.print_library()
TTGA
ATCG
GGTA
AGGC
Sample with a Fixed Seed for Reproducibility
Provide a seed to guarantee the same subset is selected every time the pipeline is evaluated.
kmers = pp.get_kmers(length=4, mode="sequential")
subset = pp.sample(kmers, num_seqs=5, seed=42)
subset.print_library()
AACG
CACG
ATGC
GTTA
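The effect of a fixed seed can also be illustrated outside poolparty with the Python standard library. The following is a minimal sketch, not poolparty's implementation: `draw_subset` is a hypothetical helper, and it assumes sampling without replacement from the 256 possible 4-mers.

```python
import itertools
import random

# All 4^4 = 256 DNA 4-mers, in lexicographic order.
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=4)]


def draw_subset(states, num_seqs, seed=None):
    """Hypothetical helper: draw num_seqs distinct states.

    A private random.Random instance keeps the draw independent of the
    global random state, so the same seed yields the same subset.
    """
    rng = random.Random(seed)
    return rng.sample(states, num_seqs)


first = draw_subset(KMERS, 5, seed=42)
second = draw_subset(KMERS, 5, seed=42)
print(first == second)  # True: identical seed, identical subset
```

Calling `draw_subset(KMERS, 5)` without a seed initialises the generator from system entropy, so repeated calls will usually return different subsets.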
Sample More Sequences Than the Pool Has States (Cycling)
When num_seqs exceeds the pool’s state count and with_replacement=True
(the default), states are resampled with replacement so the requested count is
always honoured.
small = pp.from_seqs(["AAAA", "CCCC", "GGGG"], mode="sequential")
large = pp.sample(small, num_seqs=9, seed=0)
large.print_library()
GGGG
CCCC
AAAA
CCCC
CCCC
GGGG
AAAA
CCCC
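To see why drawing with replacement is needed when the request exceeds the number of states, consider this standard-library sketch. It is illustrative only and does not reflect how poolparty implements sampling; `draw_with_replacement` is a hypothetical helper.

```python
import random


def draw_with_replacement(states, num_seqs, seed=None):
    """Hypothetical helper: draw num_seqs states, allowing repeats."""
    rng = random.Random(seed)
    return rng.choices(states, k=num_seqs)


pool = ["AAAA", "CCCC", "GGGG"]
drawn = draw_with_replacement(pool, 9, seed=0)
print(len(drawn))  # 9: the requested count is met despite only 3 states

# Without replacement, the same request cannot be satisfied.
try:
    random.Random(0).sample(pool, 9)
except ValueError as exc:
    print("sampling without replacement failed:", exc)
```

Because each draw is independent, some states may appear more often than others in the result.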
Sample from a Stochastic Pool
Use sample on a mutagenized pool to select a reproducible subset of
stochastic draws, combining random mutation with deterministic sampling. The
mutagenized sequences themselves still vary between runs unless the upstream
stochastic pool is seeded.
wt = pp.from_seq("ATCGATCG")
mutants = wt.mutagenize(num_mutations=1)
sampled = pp.sample(mutants, num_seqs=4, seed=7)
sampled.print_library()
ATCGAACG
ATCGCTCG
GTCGATCG
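The distinction between a seeded selection and unseeded upstream randomness can be sketched with the standard library alone. This is an assumption-laden illustration rather than poolparty's mechanism: `point_mutant` and `sampled_mutants` are hypothetical helpers, and the upstream pool is modelled as a plain list of 20 single-point mutants.

```python
import random


def point_mutant(seq, rng):
    """Return seq with one random position changed to a different base."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def sampled_mutants(wt, num_states, num_seqs, sample_seed, upstream_seed=None):
    """Hypothetical pipeline: generate mutants, then pick a seeded subset."""
    upstream = random.Random(upstream_seed)  # unseeded when upstream_seed is None
    mutants = [point_mutant(wt, upstream) for _ in range(num_states)]
    picker = random.Random(sample_seed)  # the selection step is always seeded
    indices = picker.sample(range(num_states), num_seqs)
    return indices, [mutants[i] for i in indices]


idx_a, seqs_a = sampled_mutants("ATCGATCG", 20, 4, sample_seed=7)
idx_b, seqs_b = sampled_mutants("ATCGATCG", 20, 4, sample_seed=7)
print(idx_a == idx_b)  # True: the selected positions are reproducible
# seqs_a and seqs_b may still differ, because the mutations were not seeded.

_, seqs_c = sampled_mutants("ATCGATCG", 20, 4, sample_seed=7, upstream_seed=1)
_, seqs_d = sampled_mutants("ATCGATCG", 20, 4, sample_seed=7, upstream_seed=1)
print(seqs_c == seqs_d)  # True: seeding both stages fixes the sequences too
```

In this sketch, full reproducibility requires seeding both the mutation step and the selection step.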
See sample().