from_motif

Sample sequences from a position-probability matrix (PPM), supplied as a pandas DataFrame with base columns (A, C, G, T) and one row per position. Each row is normalised automatically so values need not sum exactly to 1. Sampling is always stochastic; sequential enumeration is not supported.

import poolparty as pp
pp.init()

Parameters

Parameter	Type	Default	Description
`prob_df`	`DataFrame`	(required)	`pandas.DataFrame` with columns in `{'A','C','G','T'}` and one row per sequence position. Missing columns are treated as probability 0. Each row is normalised to sum to 1.
`pool`	`Pool | str | None`	`None`	Background pool or sequence string. When provided with `region`, the sampled sequence replaces the content of that region.
`region`	`str | list | None`	`None`	Region to replace in `pool`: a marker name or `[start, stop]` interval. Required when `pool` is provided.
`prefix`	`str | None`	`None`	Prefix for auto-generated sequence names.
`mode`	`str`	`'random'`	Must be `'random'`; sequential enumeration is not supported.
`num_states`	`int | None`	`None`	Number of sequences to sample. `None` means a single independent draw each call.
`iter_order`	`int | None`	`None`	Enumeration order when combined with other pools.
`style`	`str | None`	`None`	Display style applied to every generated sequence.
`cards`	`dict | list | None`	`None`	Design card columns to include in library output.
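
The `prob_df` handling described in the table (missing base columns count as probability 0, and each row is rescaled to sum to 1) can be pictured with plain pandas. This is only a sketch of the arithmetic, not poolparty's internal code; `normalise_ppm` and `BASES` are hypothetical names introduced for illustration.

```python
import pandas as pd

BASES = ["A", "C", "G", "T"]

def normalise_ppm(prob_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add missing base columns as 0, then scale rows to sum to 1."""
    full = prob_df.reindex(columns=BASES, fill_value=0.0).astype(float)
    row_sums = full.sum(axis=1)
    if (row_sums <= 0).any():
        raise ValueError("every position needs at least one positive weight")
    # Divide each row by its own sum so it becomes a probability distribution.
    return full.div(row_sums, axis=0)

# Raw weights with no G or T column and rows that do not sum to 1.
raw = pd.DataFrame({"A": [8, 1], "C": [1, 3]})
ppm = normalise_ppm(raw)
print(ppm)
```

Under this reading, raw counts or unnormalised weights can be passed directly, since only the ratios within each row matter.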

Note

Only the most commonly used parameters are shown above. For the full parameter list, see from_motif() in the API Reference.

Examples

Uniform motif

Every base has equal probability at every position, which is equivalent to sampling uniformly random sequences.

import pandas as pd

pfm = pd.DataFrame(
    {"A": [0.25, 0.25], "C": [0.25, 0.25],
     "G": [0.25, 0.25], "T": [0.25, 0.25]}
)
pool = pp.from_motif(pfm, num_states=4)
pool.print_library()

pool: seq_length=2, num_states=4
GC
CC
GC
TC

Biased motif

Position 0 strongly prefers A (80%), position 1 prefers C (80%). Draws cluster near the consensus AC but vary stochastically.

import pandas as pd

pfm = pd.DataFrame(
    {"A": [0.80, 0.05], "C": [0.05, 0.80],
     "G": [0.10, 0.10], "T": [0.05, 0.05]}
)
pool = pp.from_motif(pfm, num_states=5)
pool.print_library()

pool: seq_length=2, num_states=5
AC
AC
AC
AC
TC
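
To see why most draws land on AC, the consensus and its probability can be read straight off the matrix with pandas. This is a small illustrative calculation that assumes positions are sampled independently; it does not call poolparty.

```python
import pandas as pd

pfm = pd.DataFrame(
    {"A": [0.80, 0.05], "C": [0.05, 0.80],
     "G": [0.10, 0.10], "T": [0.05, 0.05]}
)

# Most probable base at each position, joined into the consensus sequence.
consensus = "".join(pfm.idxmax(axis=1))

# Probability of drawing exactly the consensus, assuming independent positions.
p_consensus = float(pfm.max(axis=1).prod())

print(consensus, round(p_consensus, 2))  # AC 0.64
```

Roughly two out of three draws are therefore expected to be the exact consensus, with the rest differing at one or both positions.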

Fixing `num_states` for a reproducible library

Set num_states to fix the number of draws in advance, then pass a seed to generate_library() for fully reproducible output.

import pandas as pd

pfm4 = pd.DataFrame({
    "A": [0.8, 0.05, 0.05, 0.1],
    "C": [0.05, 0.8, 0.05, 0.1],
    "G": [0.1,  0.1, 0.8,  0.1],
    "T": [0.05, 0.05, 0.1, 0.7],
})
pool = pp.from_motif(pfm4, num_states=5)
pool.print_library()
df = pool.generate_library(seed=42)

pool: seq_length=4, num_states=5
ACAA
ACGG
ACGT
ACGT
TCGT
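
The effect of a fixed seed can be illustrated without poolparty. The sketch below uses Python's standard `random` module as a stand-in sampler; `sample_library` is a hypothetical function, and the sequences it produces will not match those of `generate_library(seed=42)`, whose random number generator is not documented here.

```python
import random

BASES = "ACGT"
# Rows are positions, columns follow BASES order (same values as pfm4 above).
PPM = [
    [0.80, 0.05, 0.10, 0.05],
    [0.05, 0.80, 0.10, 0.05],
    [0.05, 0.05, 0.80, 0.10],
    [0.10, 0.10, 0.10, 0.70],
]

def sample_library(ppm, n, seed):
    """Hypothetical stand-in sampler: draw n sequences, one base per position."""
    rng = random.Random(seed)  # a private generator, so the seed fully determines the draws
    return [
        "".join(rng.choices(BASES, weights=row, k=1)[0] for row in ppm)
        for _ in range(n)
    ]

first = sample_library(PPM, 5, seed=42)
second = sample_library(PPM, 5, seed=42)
print(first == second)  # True
```

The same seed and the same number of draws give the same library, which is why both num_states and the seed need to be fixed.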

Sampling into a named region

Provide pool and region to draw motif sequences into a fixed context.

import pandas as pd

pfm = pd.DataFrame(
    {"A": [0.7, 0.1], "C": [0.1, 0.7],
     "G": [0.1, 0.1], "T": [0.1, 0.1]}
)
bg   = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = pp.from_motif(pfm, pool=bg, region="insert", num_states=4)
pool.print_library()

pool: seq_length=10, num_states=4
GCGC<insert>AC</insert>GCGC
GCGC<insert>AC</insert>GCGC
GCGC<insert>CA</insert>GCGC
GCGC<insert>AC</insert>GCGC
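
Conceptually, each draw replaces the placeholder between the region markers and leaves the flanking sequence untouched. The string-level sketch below shows that substitution for a marker-named region; `replace_region` is a hypothetical helper written for illustration and is not part of poolparty.

```python
import re

def replace_region(seq: str, region: str, new_content: str) -> str:
    """Hypothetical helper: swap the text between <region>...</region> tags."""
    name = re.escape(region)
    pattern = re.compile(rf"(<{name}>).*?(</{name}>)")
    # Keep the opening and closing tags, replace only what sits between them.
    result, count = pattern.subn(
        lambda m: m.group(1) + new_content + m.group(2), seq, count=1
    )
    if count == 0:
        raise ValueError(f"region {region!r} not found")
    return result

print(replace_region("GCGC<insert>XX</insert>GCGC", "insert", "AC"))
# GCGC<insert>AC</insert>GCGC
```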

Pool method shorthand

When inserting into a region, the same operation is available as a method on any DnaPool. The call bg.insert_from_motif(...) is equivalent to pp.from_motif(..., pool=bg): it simply passes self as the background pool.

import pandas as pd

pfm = pd.DataFrame(
    {"A": [0.7, 0.1], "C": [0.1, 0.7],
     "G": [0.1, 0.1], "T": [0.1, 0.1]}
)
bg   = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = bg.insert_from_motif(pfm, region="insert", num_states=4)
pool.print_library()