get_kmers
Enumerate every k-mer of a given length over the DNA alphabet (A, C, G, T).
By default the pool samples uniformly at random; pass mode='sequential'
to iterate through all 4^k k-mers in lexicographic order.
import poolparty as pp
pp.init()
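To make the size of the k-mer space concrete, here is a minimal plain-Python sketch of sequential enumeration. It does not use poolparty and is not meant to reflect how the library is implemented; the helper name enumerate_kmers is hypothetical and exists only for illustration.

```python
from itertools import product

# Alphabet listed in sorted order, so product() yields lexicographic order.
DNA_ALPHABET = "ACGT"


def enumerate_kmers(k):
    """Return all 4**k k-mers over A, C, G, T in lexicographic order.

    Hypothetical helper for illustration only; not part of poolparty.
    """
    return ["".join(letters) for letters in product(DNA_ALPHABET, repeat=k)]


dinucleotides = enumerate_kmers(2)
print(len(dinucleotides))  # 16, i.e. 4**2
print(dinucleotides[:5])   # ['AA', 'AC', 'AG', 'AT', 'CA']
```

The count grows as 4**k (16 for k=2, 256 for k=4), which is why capping or sampling the space becomes useful for longer k-mers.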
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| length | | (required) | k-mer length. Total possible k-mers = 4^length. |
| pool | | | Background pool. When provided with |
| region | | | Region to replace in |
| | | | Display style applied to every k-mer. |
| | | | |
| | | | Prefix for auto-generated sequence names. |
| | | | |
| num_states | | | Cap on total states. With |
| | | | Dimension-name ordering for downstream multi-pool iteration. |
| | | | Design card columns to include in library output. |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see get_kmers() in the
API Reference.
Examples
All dinucleotides (length=2, sequential)
mode='sequential' enumerates all 16 dinucleotides in lexicographic order.
pool = pp.get_kmers(length=2, mode="sequential")
pool.print_library()
AA
AC
AG
AT
CA ... (16 total)
Random subset of 4-mers with num_states
Cap a large k-mer space using num_states in random mode to draw a
representative subset without enumerating all 256 4-mers.
pool = pp.get_kmers(length=4, mode="random", num_states=8)
pool.print_library()
TCAC
AGCC
GTTC
ATTC
TTAA
GGAG
TAAG
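The idea of drawing a fixed number of k-mers without enumerating the whole space can be sketched in plain Python. This is an illustration under stated assumptions, not poolparty's sampling code: the helper sample_kmers is hypothetical, and it samples each position independently, so duplicates are possible. Whether the library samples with or without replacement is not stated here.

```python
import random


def sample_kmers(length, num_states, seed=None):
    """Draw num_states k-mers uniformly at random over A, C, G, T.

    Hypothetical helper for illustration only; not part of poolparty.
    Each base is drawn independently, so the full 4**length space is
    never materialized. Sampling is with replacement (an assumption).
    """
    rng = random.Random(seed)  # a local generator keeps results reproducible
    return [
        "".join(rng.choice("ACGT") for _ in range(length))
        for _ in range(num_states)
    ]


subset = sample_kmers(length=4, num_states=8, seed=0)
print(subset)
```

Passing a seed makes the draw repeatable, which helps when a sampled library needs to be regenerated later.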
Lowercase k-mers
case='lower' produces lowercase output, useful for visual distinction
when k-mers are joined with uppercase flanking sequences.
pool = pp.get_kmers(length=2, mode="sequential", case="lower")
pool.print_library()
aa
ac
ag
at
ca ... (16 total)
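The visual distinction mentioned above is easy to see in a plain-Python sketch. The helper with_flanks and the GCGC flanks are hypothetical, chosen for illustration; this does not call poolparty.

```python
def with_flanks(kmer, left="GCGC", right="GCGC"):
    """Join a lowercased k-mer with uppercase flanking sequences.

    Hypothetical helper for illustration only; not part of poolparty.
    """
    return f"{left}{kmer.lower()}{right}"


print(with_flanks("AC"))  # GCGCacGCGC
```

The variable positions stand out in lowercase against the fixed uppercase context.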
Inserting k-mers into a named region
Provide pool and region to place every k-mer inside a fixed context,
creating a combinatorial library in one step.
bg = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = pp.get_kmers(length=2, mode="sequential", pool=bg, region="insert")
pool.print_library()
GCGC<insert>AA</insert>GCGC
GCGC<insert>AC</insert>GCGC
GCGC<insert>AG</insert>GCGC
GCGC<insert>AT</insert>GCGC
GCGC<insert>CA</insert>GCGC ... (16 total)
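Conceptually, region insertion swaps the placeholder inside the tagged region for each k-mer in turn. The plain-Python sketch below mimics that behavior with a regular expression. It is an assumption-laden illustration, not the library's implementation: the helper insert_into_region is hypothetical, and it only handles a single, non-nested region written in the tag syntax shown in the example above.

```python
import re
from itertools import product


def insert_into_region(background, region, kmers):
    """Replace the content of <region>...</region> with each k-mer.

    Hypothetical helper for illustration only; not part of poolparty.
    Assumes exactly one non-nested region with the given name.
    """
    pattern = re.compile(
        rf"(<{re.escape(region)}>).*?(</{re.escape(region)}>)"
    )
    return [
        # Keep the opening and closing tags, swap only what lies between.
        pattern.sub(lambda m, k=kmer: m.group(1) + k + m.group(2), background)
        for kmer in kmers
    ]


all_2mers = ["".join(p) for p in product("ACGT", repeat=2)]
library = insert_into_region("GCGC<insert>XX</insert>GCGC", "insert", all_2mers)
print(len(library))  # 16
print(library[0])    # GCGC<insert>AA</insert>GCGC
```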
Pool method shorthand
When inserting into a region, the same operation is available as a method
on any DnaPool. The call bg.insert_kmers(...) is equivalent to
pp.get_kmers(..., pool=bg) — it simply passes self as the
background pool.
bg = pp.from_seq("GCGC<insert>XX</insert>GCGC")
pool = bg.insert_kmers(length=2, region="insert", mode="sequential")
pool.print_library()
GCGC<insert>AA</insert>GCGC
GCGC<insert>AC</insert>GCGC
GCGC<insert>AG</insert>GCGC
GCGC<insert>AT</insert>GCGC
GCGC<insert>CA</insert>GCGC ... (16 total)
See get_kmers() and insert_kmers().