get_barcodes
Generate a pool of DNA barcodes satisfying distance and quality
constraints. All barcodes are pre-generated at construction time using a
greedy random algorithm, so the resulting pool is a sequential leaf with
num_states == num_barcodes.
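The greedy random strategy can be illustrated with a short standalone sketch. This is not poolparty's implementation: the helper name `greedy_barcodes`, its parameters (including `max_attempts`), and the use of Hamming distance as the only constraint are assumptions made for illustration. Candidates are drawn at random and kept only if they are far enough from every barcode already accepted.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_barcodes(num_barcodes, length, min_distance, seed=None,
                    max_attempts=100_000):
    """Hypothetical helper: greedy random selection of fixed-length barcodes."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_attempts):
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # Greedy step: keep the candidate only if it is far enough from
        # every barcode accepted so far. Accepted barcodes are never revisited.
        if all(hamming(candidate, kept) >= min_distance for kept in accepted):
            accepted.append(candidate)
            if len(accepted) == num_barcodes:
                return accepted
    raise RuntimeError("could not find enough barcodes within max_attempts")


pool = greedy_barcodes(num_barcodes=10, length=8, min_distance=3, seed=42)
print(len(pool))
```

Because the selection is greedy, the result depends on the order in which candidates are drawn, which is why fixing the seed makes the pool reproducible.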
Constraints available: minimum edit (Levenshtein) distance, minimum Hamming distance (fixed-length only), GC content range, maximum homopolymer run length, and minimum edit distance from a set of user-supplied sequences to avoid (e.g. adapters).
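The difference between the two distance constraints is easiest to see on a shifted sequence. The sketch below uses textbook implementations written for illustration; `hamming` and `levenshtein` are not poolparty functions. Hamming distance compares sequences position by position, while edit distance also counts insertions and deletions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute or match
            ))
        previous = current
    return previous[-1]


a = "ACGTACGT"
b = "CGTACGTA"  # a shifted left by one base
print(hamming(a, b))      # 8: every position mismatches
print(levenshtein(a, b))  # 2: delete the leading A, append an A
```

Two barcodes can therefore look maximally different under Hamming distance while being only two edits apart, which is the case the edit-distance constraint covers.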
import poolparty as pp
pp.init()
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_barcodes | | (required) | Number of barcodes to generate. |
| length | | (required) | Barcode length. A single |
| length_proportions | | | Target fraction of each length in length. |
| min_edit_distance | | | Minimum Levenshtein distance between any two barcodes. Works for both fixed- and variable-length sets. |
| | | | Minimum Hamming distance between same-length barcodes. Cannot be combined with variable-length length. |
| gc_range | | | |
| max_homopolymer | | | Maximum consecutive identical bases allowed. Barcodes with longer runs are rejected. |
| avoid_sequences | | | External sequences (e.g. adapters) that barcodes must stay away from. Requires avoid_min_distance. |
| avoid_min_distance | | | Minimum edit distance from every sequence in avoid_sequences. |
| | | | Character used to pad shorter variable-length barcodes to the maximum length. |
| | | | |
| seed | | | Random seed for reproducible barcode generation. |
| | | | Maximum candidate attempts before raising a |
| | | | Operation name. |
| | | | Display style applied to barcode sequences. |
| | | | Dimension-name ordering for downstream multi-pool iteration. |
| prefix | | | Prefix for auto-generated sequence names. |
| | | | Design card keys to include. Available keys: |
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see get_barcodes() in the
API Reference.
Examples
Basic fixed-length barcodes
Generate 10 barcodes of length 8 with a minimum pairwise edit distance of 3.
barcodes = pp.get_barcodes(
    num_barcodes=10,
    length=8,
    min_edit_distance=3,
    seed=42,
)
barcodes.print_library()
TAAACCAC
TCTGACTG
GCCGAATA
GGGATATA
GGCAACGA
CATGTGCG
GCGACCCT
TGCGACAG
TGACGCTT
GC content and homopolymer constraints
Restrict GC content to 40–60% and disallow runs of 3 or more identical bases.
barcodes = pp.get_barcodes(
    num_barcodes=20,
    length=10,
    min_edit_distance=3,
    gc_range=(0.4, 0.6),
    max_homopolymer=2,
    seed=0,
)
barcodes.print_library()
TAGTGCTTGA
AATATGCGAC
GAGCGTATGC
CAATGCCTGT
... (20 total)
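The two sequence-quality constraints can be checked independently of the library. The helpers below, `gc_fraction` and `longest_run`, are illustrative names and not part of poolparty; they are applied to the four barcodes shown above.

```python
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest stretch of identical consecutive bases."""
    return max(len(list(group)) for _, group in groupby(seq))


shown = ["TAGTGCTTGA", "AATATGCGAC", "GAGCGTATGC", "CAATGCCTGT"]
for seq in shown:
    # Each line prints the barcode, its GC fraction, and its longest run.
    print(seq, gc_fraction(seq), longest_run(seq))
```

All four sequences have a GC fraction between 0.4 and 0.6 and no run longer than 2, consistent with `gc_range=(0.4, 0.6)` and `max_homopolymer=2`.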
Avoiding adapter sequences
Keep every barcode at an edit distance of at least 4 from each adapter sequence to prevent ligation artefacts.
adapters = ["AGATCGGAAG", "CTGTCTCTTA"]
barcodes = pp.get_barcodes(
    num_barcodes=50,
    length=8,
    min_edit_distance=3,
    avoid_sequences=adapters,
    avoid_min_distance=4,
    seed=1,
)
barcodes.print_library()
GCAGAAAA
TCTACTTC
GCCTGATA
CGAGTCGG
... (50 total)
Variable-length barcodes
Mix 6-mer and 8-mer barcodes in a 1:1 ratio; shorter barcodes are
right-padded with -. To obtain the unpadded sequences, apply
clear_gaps() before calling generate_library().
barcodes = pp.get_barcodes(
    num_barcodes=10,
    length=[6, 8],
    length_proportions=[0.5, 0.5],
    min_edit_distance=3,
    seed=7,
    prefix="bc",
)
barcodes.print_library()
CTAAAGAC
ATTACA--
AACATACA
GTCAGC--
CGAAAC--
TGTTGGCC
AGTGTG--
ATCGCT--
AAGGGTTA
GTAAGTGT
cleaned = barcodes.clear_gaps()
df = cleaned.generate_library()
| name | seq |
|---|---|
| bc_0 | CTAAAGAC |
| bc_1 | ATTACA |
| bc_2 | AACATACA |
| bc_3 | GTCAGC |
| bc_4 | CGAAAC |
| bc_5 | TGTTGGCC |
| bc_6 | AGTGTG |
| bc_7 | ATCGCT |
| bc_8 | AAGGGTTA |
| bc_9 | GTAAGTGT |
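The relationship between the padded and unpadded forms can be mimicked with plain string operations. This is only a sketch of the idea: `pad_right` and `strip_gaps` are hypothetical helpers, not poolparty's `clear_gaps()`, and the pad character `-` is taken from the output above.

```python
def pad_right(seq: str, width: int, pad_char: str = "-") -> str:
    """Right-pad a sequence to a common width."""
    return seq.ljust(width, pad_char)


def strip_gaps(seq: str, pad_char: str = "-") -> str:
    """Remove every pad character from a sequence."""
    return seq.replace(pad_char, "")


barcodes = ["CTAAAGAC", "ATTACA", "AACATACA", "GTCAGC"]
width = max(len(b) for b in barcodes)

padded = [pad_right(b, width) for b in barcodes]
print(padded)  # ['CTAAAGAC', 'ATTACA--', 'AACATACA', 'GTCAGC--']

# Stripping the gaps recovers the original sequences.
print([strip_gaps(p) for p in padded] == barcodes)  # True
```

Padding keeps every sequence in the pool the same width, and removing the pad character afterwards yields the true 6-mers and 8-mers.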
See get_barcodes().