Sequence Names

Every sequence produced by generate_library() has a name — a dot-separated string built from segments contributed by each operation. Names let you trace exactly how a sequence was constructed.

All examples assume:

import poolparty as pp
pp.init()

How names are built

Each operation can contribute a name segment via its prefix parameter. Segments are collected from source to downstream and joined with dots:

name = "prefix_A.prefix_B.prefix_C"

If an operation has prefix=None (the default), it contributes nothing to the name. If no operation in the pipeline sets a prefix, the name column is None.
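The joining rule can be modeled in a few lines of plain Python. This is an illustrative sketch only, not poolparty's implementation; the helper name join_segments is hypothetical and does not exist in the library.

```python
from typing import Optional, Sequence


def join_segments(segments: Sequence[Optional[str]]) -> Optional[str]:
    """Join name segments with dots, skipping operations that contributed nothing.

    Hypothetical helper for illustration; not part of poolparty.
    """
    # An operation with prefix=None is modeled as a None segment and dropped.
    kept = [s for s in segments if s is not None]
    # If no operation contributed a segment, the name itself is None.
    return ".".join(kept) if kept else None


print(join_segments(["prefix_A", "prefix_B", "prefix_C"]))  # prefix_A.prefix_B.prefix_C
print(join_segments(["prefix_A", None, "prefix_C"]))        # prefix_A.prefix_C
print(join_segments([None, None]))                          # None
```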


The prefix parameter

Most operations accept a prefix parameter. How the prefix is formatted depends on the operation’s mode:

Fixed mode (single deterministic output):

Contributes the prefix string as-is.

pool = pp.from_seq("ACGT", prefix="wt")
df   = pool.generate_library()
df — 1 row × 2 columns
name  seq
wt    ACGT

Sequential mode (one state per variant):

Appends a zero-padded state index. The width adjusts to the number of states so names sort correctly.

pool = pp.from_seqs(["AAAA", "CCCC", "GGGG"], mode="sequential", prefix="var")
df   = pool.generate_library()
df — 3 rows × 2 columns
name   seq
var_0  AAAA
var_1  CCCC
var_2  GGGG

With more states the padding grows:

pool = pp.from_iupac("NNNN", mode="sequential", prefix="seq")
df   = pool.generate_library()
# names: "seq_000", "seq_001", ..., "seq_255"

Random mode (random draws):

Appends a zero-padded global draw index. The padding width is set by the number of sequences requested.

wt   = pp.from_seq("ATCGATCG")
pool = wt.mutagenize(num_mutations=1, prefix="mut")
df   = pool.generate_library(num_seqs=50)
# names: "mut_00", "mut_01", ..., "mut_49"
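The padding widths in the sequential and random examples above are consistent with a simple rule: pad to the number of digits in the largest index. The sketch below reproduces the example names under that assumption. The helper indexed_names is hypothetical and is not a poolparty function; the library's actual width rule may differ in edge cases.

```python
from typing import List


def indexed_names(prefix: str, count: int) -> List[str]:
    """Build '<prefix>_<index>' names padded to a common width.

    Hypothetical helper for illustration; not part of poolparty.
    Assumes the width equals the digit count of the largest index.
    """
    if count < 1:
        return []
    width = len(str(count - 1))  # digits needed for the largest index
    return [f"{prefix}_{i:0{width}d}" for i in range(count)]


print(indexed_names("var", 3))        # ['var_0', 'var_1', 'var_2']
print(indexed_names("seq", 256)[-1])  # seq_255
print(indexed_names("mut", 50)[:2])   # ['mut_00', 'mut_01']
```

Because every index in a segment has the same width, a plain string sort of the names matches numeric order.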

Chaining operations

When multiple operations in a pipeline set prefix, each contributes a segment and they are joined with dots:

wt   = pp.from_seq("ATCGATCG", prefix="bg")
muts = wt.mutagenize(num_mutations=1, num_states=3, prefix="mut")
df   = muts.generate_library()
df — 3 rows × 2 columns
name      seq
bg.mut_0  CTCGATCG
bg.mut_1  GTCGATCG
bg.mut_2  TTCGATCG

Add more segments with add_prefix:

tagged = muts.add_prefix("final")
df     = tagged.generate_library()
df — 3 rows × 2 columns
name            seq
bg.mut_0.final  CTCGATCG
bg.mut_1.final  GTCGATCG
bg.mut_2.final  TTCGATCG
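Since chained names are dot-joined segments, they can be split back apart to trace how a sequence was built. The following is a minimal sketch in plain Python; split_name is a hypothetical helper, not a poolparty function. It assumes prefixes contain no dots and that a trailing "_<digits>" on a segment is an index.

```python
from typing import List, Optional, Tuple


def split_name(name: str) -> List[Tuple[str, Optional[int]]]:
    """Split a dot-joined name into (prefix, index) pairs.

    Hypothetical helper for illustration; not part of poolparty.
    The index is None for segments without a trailing '_<digits>'.
    """
    parts = []
    for segment in name.split("."):
        head, sep, tail = segment.rpartition("_")
        if sep and head and tail.isdigit():
            parts.append((head, int(tail)))
        else:
            # No numeric suffix: treat the whole segment as a fixed prefix.
            parts.append((segment, None))
    return parts


print(split_name("bg.mut_1.final"))  # [('bg', None), ('mut', 1), ('final', None)]
```

A fixed prefix that itself ends in an underscore and digits (for example "v_2") would be misread as an indexed segment, so this approach works best when prefixes avoid that pattern.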

Custom sequence names with from_seqs

from_seqs accepts a seq_names parameter for explicit names that override the prefix logic:

pool = pp.from_seqs(
    ["ATCG", "ATAG", "AACG"],
    seq_names=["wt", "mut_A", "mut_B"],
    mode="sequential",
)
df = pool.generate_library()
df — 3 rows × 2 columns
name   seq
wt     ATCG
mut_A  ATAG
mut_B  AACG

Scan operation names

Scan operations can contribute compound names with separate segments for the position index and the variant index. These segments are controlled by additional prefix parameters, such as prefix_position and prefix_insert in this example:

wt   = pp.from_seq("ACGTACGT")
alt  = pp.from_seqs(["A", "C", "G", "T"], mode="sequential", prefix="base")
scan = wt.replacement_scan(replacement_pool=alt, mode="sequential",
                           prefix="scan", prefix_position="pos",
                           prefix_insert="ins")
df   = scan.generate_library(num_seqs=8)
# names: "scan_00.pos_0.base_0", "scan_01.pos_0.base_1", ...
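One reading consistent with the two names shown above is that the scan enumerates positions in the outer loop and variants in the inner loop, with each index padded to the width of its own range. The sketch below reproduces those two names under that assumption. It is not poolparty's implementation: scan_names and its parameters are hypothetical, the count of 8 positions is assumed from the 8-nt sequence, and the library's actual ordering and widths may differ.

```python
from itertools import product
from typing import List


def scan_names(n_positions: int, n_variants: int,
               scan: str = "scan", pos: str = "pos", var: str = "base") -> List[str]:
    """Enumerate compound names, position-major (an assumed ordering).

    Hypothetical helper for illustration; not part of poolparty.
    """
    total = n_positions * n_variants
    # Each index is padded to the digit count of its own largest value.
    w_scan = len(str(total - 1))
    w_pos = len(str(n_positions - 1))
    w_var = len(str(n_variants - 1))
    names = []
    for k, (p, v) in enumerate(product(range(n_positions), range(n_variants))):
        names.append(
            f"{scan}_{k:0{w_scan}d}.{pos}_{p:0{w_pos}d}.{var}_{v:0{w_var}d}"
        )
    return names


names = scan_names(n_positions=8, n_variants=4)
print(names[:2])  # ['scan_00.pos_0.base_0', 'scan_01.pos_0.base_1']
```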

Pool.named() vs prefix

These control different things:

  • pool.named("my_pool") sets the pool’s metadata name — used for display, DAG visualization, and internal tracking. It does not affect the name column in the output.

  • prefix="label" on an operation affects the sequence names in the generated DataFrame.

pool = pp.from_seq("ACGT", prefix="bg").named("my_pool")
print(pool.name)   # "my_pool" (pool metadata)
df = pool.generate_library()
df — 1 row × 2 columns
name  seq
bg    ACGT

The pool is called "my_pool" (used in DAG display), but the sequence’s name in the output is "bg" (from the prefix parameter).