get_barcodes

Generate a pool of DNA barcodes satisfying distance and quality constraints. All barcodes are pre-generated at construction time using a greedy random algorithm, so the resulting pool is a sequential leaf with num_states == num_barcodes.

Constraints available: minimum edit (Levenshtein) distance, minimum Hamming distance (fixed-length only), GC content range, maximum homopolymer run length, and minimum edit distance from a set of user-supplied sequences to avoid (e.g. adapters).

import poolparty as pp
pp.init()
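
The greedy random strategy described above can be illustrated in plain Python. The sketch below is not poolparty's implementation; `levenshtein` and `greedy_barcodes` are hypothetical helpers written for this page, and the acceptance rule (keep a random candidate only if it stays far enough from every accepted barcode and from every sequence to avoid) is an assumption based on the description.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def greedy_barcodes(num_barcodes, length, min_edit_distance,
                    avoid_sequences=(), avoid_min_distance=0,
                    seed=None, max_attempts=100_000):
    """Hypothetical helper: draw random candidates and keep those that fit."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_attempts):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # Reject candidates that sit too close to a sequence to avoid.
        if any(levenshtein(candidate, s) < avoid_min_distance for s in avoid_sequences):
            continue
        # Reject candidates that sit too close to an already accepted barcode.
        if any(levenshtein(candidate, kept) < min_edit_distance for kept in accepted):
            continue
        accepted.append(candidate)
        if len(accepted) == num_barcodes:
            return accepted
    raise ValueError(f"found only {len(accepted)} of {num_barcodes} barcodes")


pool = greedy_barcodes(
    num_barcodes=10,
    length=8,
    min_edit_distance=3,
    avoid_sequences=["AGATCGGAAG", "CTGTCTCTTA"],
    avoid_min_distance=4,
    seed=42,
)
print(pool)
```

Because candidates are accepted in the order they are drawn, the result depends on the seed, and tight constraints can exhaust the attempt budget before the pool is full. That is the situation in which the sketch raises ValueError.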

Parameters

Parameter             Type                        Default     Description
--------------------  --------------------------  ----------  -----------
num_barcodes          int                         (required)  Number of barcodes to generate.
length                int | list[int]             (required)  Barcode length. A single int gives fixed-length barcodes; a list of ints generates variable-length barcodes padded to the maximum length.
length_proportions    list[float] | None          None        Target fraction of each length in the length list. Values are normalised to sum to 1. None distributes evenly. Ignored when length is a single int.
min_edit_distance     int | None                  None        Minimum Levenshtein distance between any two barcodes. Works for both fixed- and variable-length sets.
min_hamming_distance  int | None                  None        Minimum Hamming distance between same-length barcodes. Cannot be combined with variable-length length lists; use min_edit_distance instead.
gc_range              tuple[float, float] | None  None        (min_gc, max_gc) as fractions in [0, 1]. Barcodes outside this range are rejected.
max_homopolymer       int | None                  None        Maximum consecutive identical bases allowed. Barcodes with longer runs are rejected.
avoid_sequences       list[str] | None            None        External sequences (e.g. adapters) that barcodes must stay away from. Requires avoid_min_distance.
avoid_min_distance    int | None                  None        Minimum edit distance from every sequence in avoid_sequences. Required when avoid_sequences is provided.
padding_char          str                         '-'         Character used to pad shorter variable-length barcodes to the maximum length.
padding_side          str                         'right'     'right' appends padding; 'left' prepends it.
seed                  int | None                  None        Random seed for reproducible barcode generation.
max_attempts          int                         100000      Maximum candidate attempts before raising a ValueError. Increase this value or relax the constraints if generation fails.
name                  str | None                  None        Operation name.
style                 str | None                  None        Display style applied to barcode sequences.
iter_order            int | None                  None        Dimension-name ordering for downstream multi-pool iteration.
prefix                str | None                  None        Prefix for auto-generated sequence names.
cards                 list | dict | None          None        Design card keys to include. Available keys: 'barcode_index', 'barcode'.


Note

Only the most commonly used parameters are shown above. For the full parameter list, see get_barcodes() in the API Reference.
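
To make the semantics of length_proportions, padding_char and padding_side concrete, here is a small plain-Python sketch. `normalise` and `pad` are hypothetical helpers, not poolparty functions. They mirror only what the parameter descriptions state; how poolparty turns the normalised fractions into per-length barcode counts is not shown here.

```python
def normalise(proportions):
    """Scale the weights so that they sum to 1, as described for length_proportions."""
    total = sum(proportions)
    return [p / total for p in proportions]


def pad(seq, width, padding_char="-", padding_side="right"):
    """Pad seq to width; 'right' appends padding and 'left' prepends it."""
    if padding_side == "right":
        return seq.ljust(width, padding_char)
    if padding_side == "left":
        return seq.rjust(width, padding_char)
    raise ValueError("padding_side must be 'right' or 'left'")


print(normalise([2, 1, 1]))                     # [0.5, 0.25, 0.25]
print(pad("ATTACA", 8))                         # ATTACA--
print(pad("ATTACA", 8, padding_side="left"))    # --ATTACA
```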

Examples

Basic fixed-length barcodes

Generate 10 length-8 barcodes with minimum edit distance 3.

barcodes = pp.get_barcodes(
    num_barcodes=10,
    length=8,
    min_edit_distance=3,
    seed=42,
)
barcodes.print_library()
barcodes: seq_length=8, num_states=10
AAGCCCAA
TAAACCAC
TCTGACTG
GCCGAATA
GGGATATA
GGCAACGA
CATGTGCG
GCGACCCT
TGCGACAG
TGACGCTT
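
A quick independent sanity check on a fixed-length pool is its minimum pairwise Hamming distance. For equal-length sequences the edit distance never exceeds the Hamming distance, so a pool built with min_edit_distance=3 should also differ in at least 3 positions for every pair. The helpers below are hypothetical and are not part of poolparty; the pool is copied from the output above.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(pool):
    """Smallest Hamming distance over all unordered pairs in the pool."""
    return min(hamming(a, b) for a, b in combinations(pool, 2))


printed_pool = [
    "AAGCCCAA", "TAAACCAC", "TCTGACTG", "GCCGAATA", "GGGATATA",
    "GGCAACGA", "CATGTGCG", "GCGACCCT", "TGCGACAG", "TGACGCTT",
]
print(min_pairwise_hamming(printed_pool))
```

This is a necessary condition only: a pair can differ in many positions and still be a single insertion or deletion away from a shifted copy, which is why the edit-distance constraint is the stricter of the two.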

GC content and homopolymer constraints

Restrict GC content to 40–60 % and disallow runs of 3 or more identical bases.

barcodes = pp.get_barcodes(
    num_barcodes=20,
    length=10,
    min_edit_distance=3,
    gc_range=(0.4, 0.6),
    max_homopolymer=2,
    seed=0,
)
barcodes.print_library()
barcodes: seq_length=10, num_states=20
TTAGTTGTGC
TAGTGCTTGA
AATATGCGAC
GAGCGTATGC
CAATGCCTGT
... (20 total)
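
The two filters used in this example can be reproduced with a few lines of plain Python, which is handy for checking a pool after the fact. `gc_fraction`, `longest_run` and `passes` are hypothetical helpers, and treating both ends of gc_range as inclusive is an assumption. The five sequences are the ones shown in the output above.

```python
from itertools import groupby


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq):
    """Length of the longest run of consecutive identical bases."""
    return max(len(list(group)) for _, group in groupby(seq))


def passes(seq, gc_range=(0.4, 0.6), max_homopolymer=2):
    """True if seq satisfies both constraints (range bounds assumed inclusive)."""
    min_gc, max_gc = gc_range
    return min_gc <= gc_fraction(seq) <= max_gc and longest_run(seq) <= max_homopolymer


shown = ["TTAGTTGTGC", "TAGTGCTTGA", "AATATGCGAC", "GAGCGTATGC", "CAATGCCTGT"]
for seq in shown:
    print(seq, gc_fraction(seq), longest_run(seq), passes(seq))
```

All five sequences pass: their GC fractions are 0.4, 0.4, 0.4, 0.6 and 0.5, and no run is longer than 2.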

Avoiding adapter sequences

Keep all barcodes at least edit distance 4 from a set of adapter sequences to prevent ligation artefacts.

adapters = ["AGATCGGAAG", "CTGTCTCTTA"]
barcodes = pp.get_barcodes(
    num_barcodes=50,
    length=8,
    min_edit_distance=3,
    avoid_sequences=adapters,
    avoid_min_distance=4,
    seed=1,
)
barcodes.print_library()
barcodes: seq_length=8, num_states=50
CAGATTTT
GCAGAAAA
TCTACTTC
GCCTGATA
CGAGTCGG
... (50 total)

Variable-length barcodes

Mix 6-mer and 8-mer barcodes in a 1:1 ratio; shorter barcodes are right-padded with -. To obtain the unpadded sequences, apply clear_gaps() before calling generate_library().

barcodes = pp.get_barcodes(
    num_barcodes=10,
    length=[6, 8],
    length_proportions=[0.5, 0.5],
    min_edit_distance=3,
    seed=7,
    prefix="bc",
)
barcodes.print_library()
barcodes: seq_length=8, num_states=10
CTAAAGAC
ATTACA--
AACATACA
GTCAGC--
CGAAAC--
TGTTGGCC
AGTGTG--
ATCGCT--
AAGGGTTA
GTAAGTGT
cleaned = barcodes.clear_gaps()
df = cleaned.generate_library()
df: 10 rows × 2 columns
name  seq
bc_0  CTAAAGAC
bc_1  ATTACA
bc_2  AACATACA
bc_3  GTCAGC
bc_4  CGAAAC
bc_5  TGTTGGCC
bc_6  AGTGTG
bc_7  ATCGCT
bc_8  AAGGGTTA
bc_9  GTAAGTGT
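
The requested 1:1 length mix can be confirmed from the padded output with plain string operations. This sketch works on the printed sequences only and does not call poolparty; removing the padding character by hand stands in for what clear_gaps() achieves in the example above.

```python
from collections import Counter

padded = [
    "CTAAAGAC", "ATTACA--", "AACATACA", "GTCAGC--", "CGAAAC--",
    "TGTTGGCC", "AGTGTG--", "ATCGCT--", "AAGGGTTA", "GTAAGTGT",
]

# Drop the padding character to recover the underlying barcodes.
unpadded = [seq.replace("-", "") for seq in padded]

# Count how many barcodes of each length the pool contains.
length_counts = Counter(len(seq) for seq in unpadded)
print(dict(length_counts))  # {8: 5, 6: 5}
```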

See get_barcodes().