from_motif
Samples sequences from a position-probability matrix (PPM), supplied as a pandas DataFrame with one column per base (A, C, G, T) and one row per position. Each row is normalised automatically, so its values need not sum exactly to 1. Sampling is always stochastic; sequential enumeration is not supported.
import poolparty as pp
pp.init()
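The row normalisation and per-position sampling described above can be pictured with a small pure-Python sketch. This is not poolparty's implementation: the helper names normalise_rows and sample_sequence are invented for illustration, and the sketch assumes each position is drawn independently of the others.

```python
import random

BASES = ("A", "C", "G", "T")

def normalise_rows(rows):
    """Rescale each row of non-negative weights so that it sums to 1."""
    normalised = []
    for row in rows:
        total = sum(row[b] for b in BASES)
        if total <= 0:
            raise ValueError("each position needs at least one positive weight")
        normalised.append({b: row[b] / total for b in BASES})
    return normalised

def sample_sequence(rows, rng):
    """Draw one base per position, independently, from the normalised rows."""
    ppm = normalise_rows(rows)
    return "".join(
        rng.choices(BASES, weights=[pos[b] for b in BASES])[0] for pos in ppm
    )

# Row 0 sums to 2.0 rather than 1.0; normalisation rescales A to 0.8.
rows = [
    {"A": 1.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.05, "C": 0.80, "G": 0.10, "T": 0.05},
]
rng = random.Random(0)  # hypothetical seed, only to make the sketch repeatable
draws = [sample_sequence(rows, rng) for _ in range(5)]
```

Each entry of draws is a two-base string, one base per row of the matrix.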
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| | | (required) | |
| pool | | | Background pool or sequence string. When provided with |
| region | | | Region to replace in |
| | | | Prefix for auto-generated sequence names. |
| | | | Must be |
| num_states | | | Number of sequences to sample. |
| | | | Enumeration order when combined with other pools. |
| | | | Display style applied to every generated sequence. |
| | | | Design card columns to include in library output. |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see from_motif() in the
API Reference.
Examples
Uniform motif
Every base is equally likely at every position, which is equivalent to sampling uniformly random sequences.
import pandas as pd
pfm = pd.DataFrame(
{"A": [0.25, 0.25], "C": [0.25, 0.25],
"G": [0.25, 0.25], "T": [0.25, 0.25]}
)
pool = pp.from_motif(pfm, num_states=4)
pool.print_library()
CC
GC
TC
Biased motif
Position 0 strongly prefers A (80%), position 1 prefers C (80%).
Draws cluster near the consensus AC but vary stochastically.
import pandas as pd
pfm = pd.DataFrame(
{"A": [0.80, 0.05], "C": [0.05, 0.80],
"G": [0.10, 0.10], "T": [0.05, 0.05]}
)
pool = pp.from_motif(pfm, num_states=5)
pool.print_library()
AC
AC
AC
TC
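To see why draws cluster near AC, the probability of any sequence can be worked out by multiplying the per-position probabilities. The sketch below assumes positions are sampled independently (the usual reading of a PPM, but an assumption here); sequence_probability is a hypothetical helper, not part of poolparty.

```python
from itertools import product
from math import prod

BASES = "ACGT"

# Same probabilities as the biased motif above, one dict per position.
ppm = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.80, "G": 0.10, "T": 0.05},
]

def sequence_probability(seq, ppm):
    """Probability of drawing `seq` when positions are sampled independently."""
    return prod(pos[base] for base, pos in zip(seq, ppm))

# Enumerate all 16 dinucleotides and rank them from most to least likely.
ranked = sorted(
    ("".join(p) for p in product(BASES, repeat=len(ppm))),
    key=lambda s: sequence_probability(s, ppm),
    reverse=True,
)
p_consensus = sequence_probability("AC", ppm)  # 0.80 * 0.80, about 0.64
```

Under that assumption roughly 64% of draws are expected to be the consensus AC, with the remainder spread over the other 15 dinucleotides.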
Fixing num_states for a reproducible library
Set num_states to pre-commit the number of draws, then fix the seed in
generate_library() for fully reproducible output.
import pandas as pd
pfm4 = pd.DataFrame({
"A": [0.8, 0.05, 0.05, 0.1],
"C": [0.05, 0.8, 0.05, 0.1],
"G": [0.1, 0.1, 0.8, 0.1],
"T": [0.05, 0.05, 0.1, 0.7],
})
pool = pp.from_motif(pfm4, num_states=5)
pool.print_library()
df = pool.generate_library(seed=42)
ACGG
ACGT
ACGT
TCGT
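The idea behind seeding can be illustrated independently of poolparty: with a fixed number of draws and a fixed seed, a pseudo-random sampler produces the same sequences every time. The following is a minimal standard-library sketch; sample_library is a hypothetical function, and how generate_library() consumes its seed internally is not shown in this page.

```python
import random

BASES = "ACGT"
# Probabilities from pfm4 above, one list per position in A, C, G, T order.
PPM = [
    [0.80, 0.05, 0.10, 0.05],
    [0.05, 0.80, 0.10, 0.05],
    [0.05, 0.05, 0.80, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]

def sample_library(ppm, num_draws, seed):
    """Draw `num_draws` sequences using a private, seeded random generator."""
    rng = random.Random(seed)
    return [
        "".join(rng.choices(BASES, weights=row)[0] for row in ppm)
        for _ in range(num_draws)
    ]

first = sample_library(PPM, num_draws=5, seed=42)
second = sample_library(PPM, num_draws=5, seed=42)
assert first == second  # same seed and draw count give the same library
```

Changing either the seed or the number of draws may change the result, which is why both are fixed in the example above.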
Sampling into a named region
Provide pool and region to draw motif sequences into a fixed context.
import pandas as pd
pfm = pd.DataFrame(
{"A": [0.7, 0.1], "C": [0.1, 0.7],
"G": [0.1, 0.1], "T": [0.1, 0.1]}
)
bg = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = pp.from_motif(pfm, pool=bg, region="insert", num_states=4)
pool.print_library()
GCGC<insert>AC</insert>GCGC
GCGC<insert>CA</insert>GCGC
GCGC<insert>AC</insert>GCGC
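Conceptually, each output above is the background string with the content of the insert region swapped for one sampled motif. A rough standard-library sketch of that substitution follows; replace_region is a hypothetical helper, and poolparty's own region handling may work quite differently.

```python
import re

def replace_region(sequence, region, new_content):
    """Swap the text between <region> and </region>, keeping the tags."""
    name = re.escape(region)
    pattern = re.compile(rf"(<{name}>)(.*?)(</{name}>)")
    # A function replacement avoids any backslash handling in new_content.
    result, count = pattern.subn(
        lambda m: m.group(1) + new_content + m.group(3), sequence, count=1
    )
    if count == 0:
        raise ValueError(f"region {region!r} not found")
    return result

background = "GCGC<insert>XX</insert>GCGC"
filled = replace_region(background, "insert", "AC")
# filled == "GCGC<insert>AC</insert>GCGC"
```

The flanking GCGC context and the region tags are left untouched; only the placeholder XX is replaced.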
Pool method shorthand
When inserting into a region, the same operation is available as a method
on any DnaPool. The call bg.insert_from_motif(...) is equivalent
to pp.from_motif(..., pool=bg): it simply passes self as the
background pool.
import pandas as pd
pfm = pd.DataFrame(
{"A": [0.7, 0.1], "C": [0.1, 0.7],
"G": [0.1, 0.1], "T": [0.1, 0.1]}
)
bg = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = bg.insert_from_motif(pfm, region="insert", num_states=4)
pool.print_library()
GCGC<insert>AC</insert>GCGC
GCGC<insert>CA</insert>GCGC
GCGC<insert>AC</insert>GCGC
See from_motif() and insert_from_motif().