get_kmers

Enumerate every k-mer of a given length over the DNA alphabet (A, C, G, T). By default the pool samples uniformly at random; pass mode='sequential' to iterate through all 4^k k-mers in lexicographic order.

import poolparty as pp
pp.init()

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| length | int | (required) | k-mer length. Total possible k-mers = 4^length. |
| pool | Pool \| None | None | Background pool. When provided with region, each k-mer replaces the content of that region. |
| region | str \| None | None | Region to replace in pool. Required when pool is provided. |
| style | str \| None | None | Display style applied to every k-mer. |
| case | str | 'upper' | 'upper' (default) or 'lower' output case. |
| prefix | str \| None | None | Prefix for auto-generated sequence names. |
| mode | str | 'random' | 'sequential' iterates all 4^length k-mers in lexicographic order; 'random' samples uniformly at random. |
| num_states | int \| None | None | Cap on total states. With mode='sequential', takes the first N. |
| iter_order | int \| None | None | Dimension-name ordering for downstream multi-pool iteration. |
| cards | dict \| list \| None | None | Design card columns to include in library output. |


Note

Only the most commonly used parameters are shown above. For the full parameter list, see get_kmers() in the API Reference.

Examples

All dinucleotides (length=2, sequential)

mode='sequential' enumerates all 16 dinucleotides in lexicographic order.

pool = pp.get_kmers(length=2, mode="sequential")
pool.print_library()
pool: seq_length=2, num_states=16
AA
AC
AG
AT
CA
... (16 total)
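
For intuition, the lexicographic ordering shown in this example can be reproduced in plain Python. This is a minimal sketch using only the standard library, not poolparty's implementation. The helper name `enumerate_kmers` is hypothetical, and its `num_states` argument only mimics the documented behavior of taking the first N k-mers in sequential mode.

```python
from itertools import islice, product

DNA_ALPHABET = "ACGT"  # already sorted, so product() yields lexicographic order


def enumerate_kmers(length, num_states=None):
    """Hypothetical helper: list k-mers in lexicographic order.

    If num_states is given, keep only the first num_states k-mers;
    otherwise return all 4**length of them.
    """
    kmers = ("".join(chars) for chars in product(DNA_ALPHABET, repeat=length))
    return list(islice(kmers, num_states))


dinucleotides = enumerate_kmers(2)
print(len(dinucleotides))  # 16
print(dinucleotides[:5])   # ['AA', 'AC', 'AG', 'AT', 'CA']
```

Because the k-mers are generated lazily, a cap such as `enumerate_kmers(10, num_states=100)` never materializes the full set of 4^10 sequences.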

Random subset of 4-mers with num_states

In random mode, num_states caps the pool at that many uniformly sampled k-mers, so you can draw a subset of a large k-mer space without enumerating all 4^4 = 256 4-mers.

pool = pp.get_kmers(length=4, mode="random", num_states=8)
pool.print_library()
pool: seq_length=4, num_states=8
TGGC
TCAC
AGCC
GTTC
ATTC
TTAA
GGAG
TAAG
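
The idea of uniform sampling can be sketched in plain Python as follows. This is an illustration under stated assumptions, not poolparty's sampling code: the helper `sample_kmers` and its `seed` argument are hypothetical, and the sketch assumes independent draws, so the same k-mer may appear more than once. Whether poolparty allows repeats is not stated in this page.

```python
import random

DNA_ALPHABET = "ACGT"


def sample_kmers(length, num_states, seed=None):
    """Hypothetical helper: draw num_states k-mers uniformly at random.

    Each position is chosen independently and uniformly, so each of the
    4**length possible k-mers is equally likely on every draw.
    Draws are independent, so repeats are possible (an assumption here).
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choices(DNA_ALPHABET, k=length))
        for _ in range(num_states)
    ]


subset = sample_kmers(length=4, num_states=8, seed=0)
print(len(subset))  # 8
```

Fixing the seed makes the draw repeatable, which is convenient when a sampled library needs to be regenerated.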

Lowercase k-mers

case='lower' produces lowercase output, useful for visual distinction when k-mers are joined with uppercase flanking sequences.

pool = pp.get_kmers(length=2, mode="sequential", case="lower")
pool.print_library()
pool: seq_length=2, num_states=16
aa
ac
ag
at
ca
... (16 total)
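
To see why lowercase helps, the following plain-Python sketch joins lowercase dinucleotides with uppercase flanks. It does not use poolparty, and the flanking sequences are hypothetical values chosen only for illustration.

```python
from itertools import product

# Hypothetical flanking sequences, chosen only for illustration.
LEFT_FLANK = "GCGC"
RIGHT_FLANK = "GCGC"

# All 16 dinucleotides in lexicographic order, converted to lowercase.
lower_kmers = ["".join(chars).lower() for chars in product("ACGT", repeat=2)]

# The variable k-mer stands out against the uppercase context.
joined = [LEFT_FLANK + kmer + RIGHT_FLANK for kmer in lower_kmers]
print(joined[:3])  # ['GCGCaaGCGC', 'GCGCacGCGC', 'GCGCagGCGC']
```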

Inserting k-mers into a named region

Provide pool and region to place every k-mer inside a fixed context, creating a combinatorial library in one step.

bg   = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = pp.get_kmers(length=2, mode="sequential", pool=bg, region="insert")
pool.print_library()
pool: seq_length=10, num_states=16
GCGC<insert>AA</insert>GCGC
GCGC<insert>AC</insert>GCGC
GCGC<insert>AG</insert>GCGC
GCGC<insert>AT</insert>GCGC
GCGC<insert>CA</insert>GCGC
... (16 total)
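
Conceptually, region replacement swaps the placeholder between the region tags for each k-mer in turn. The sketch below mimics that with the standard `re` module. It is not how poolparty handles regions internally; `replace_region` and `strip_tags` are hypothetical helpers, and the tag syntax is taken from the example above.

```python
import re
from itertools import product


def replace_region(template, region, content):
    """Hypothetical helper: swap the content of <region>...</region>."""
    name = re.escape(region)
    pattern = re.compile(rf"(<{name}>).*?(</{name}>)")
    # A function replacement avoids backslash handling in the new content.
    return pattern.sub(lambda m: m.group(1) + content + m.group(2), template, count=1)


def strip_tags(seq):
    """Hypothetical helper: remove region tags, leaving only the bases."""
    return re.sub(r"</?[^>]+>", "", seq)


template = "GCGC<insert>XX</insert>GCGC"
library = [
    replace_region(template, "insert", "".join(chars))
    for chars in product("ACGT", repeat=2)
]
print(library[0])                   # GCGC<insert>AA</insert>GCGC
print(len(library))                 # 16
print(len(strip_tags(library[0])))  # 10
```

The tag-free length of 10 agrees with the seq_length reported in the output above, which suggests region tags are not counted toward sequence length.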

Pool method shorthand

When inserting into a region, the same operation is available as a method on any DnaPool. The call bg.insert_kmers(...) is equivalent to pp.get_kmers(..., pool=bg); it simply passes self as the background pool.

bg   = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = bg.insert_kmers(length=2, region="insert", mode="sequential")
pool.print_library()
pool: seq_length=10, num_states=16
GCGC<insert>AA</insert>GCGC
GCGC<insert>AC</insert>GCGC
GCGC<insert>AG</insert>GCGC
GCGC<insert>AT</insert>GCGC
GCGC<insert>CA</insert>GCGC
... (16 total)
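
The "passes self" relationship between the method and function forms is an ordinary delegation pattern. The toy sketch below illustrates it. `ToyPool` and `toy_get_kmers` are hypothetical stand-ins that only record their arguments; they are not poolparty's DnaPool or get_kmers.

```python
class ToyPool:
    """Hypothetical stand-in for a background pool; not poolparty's DnaPool."""

    def __init__(self, template):
        self.template = template

    def insert_kmers(self, **kwargs):
        # The method form forwards to the function form with pool=self.
        return toy_get_kmers(pool=self, **kwargs)


def toy_get_kmers(length, pool=None, region=None):
    """Hypothetical function form: records its arguments instead of building a library."""
    return {"length": length, "pool": pool, "region": region}


bg = ToyPool("GCGC<insert>XX</insert>GCGC")
via_method = bg.insert_kmers(length=2, region="insert")
via_function = toy_get_kmers(length=2, pool=bg, region="insert")
print(via_method == via_function)  # True
```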

See get_kmers() and insert_kmers().