Sequence Names
Every sequence produced by generate_library() has a name: a
dot-separated string built from segments contributed by each operation.
Names let you trace exactly how a sequence was constructed.
All examples assume:
import poolparty as pp
pp.init()
How names are built
Each operation can contribute a name segment via its prefix parameter.
Segments are collected from source to downstream and joined with dots:
name = "prefix_A.prefix_B.prefix_C"
If an operation has prefix=None (the default), it contributes nothing
to the name. If no operation in the pipeline sets a prefix, the name
column is None.
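The joining rule described above can be illustrated in a few lines of plain Python. This is a minimal sketch of the rule as stated in this section, not poolparty's actual implementation; the helper name join_segments is hypothetical.

```python
from typing import Optional, Sequence


def join_segments(segments: Sequence[Optional[str]]) -> Optional[str]:
    """Join name segments with dots, skipping operations that set no prefix.

    Hypothetical helper for illustration only; not part of poolparty.
    """
    # Operations with prefix=None contribute nothing to the name.
    kept = [s for s in segments if s is not None]
    # If no operation contributed a segment, the name itself is None.
    return ".".join(kept) if kept else None


print(join_segments(["prefix_A", None, "prefix_C"]))  # prefix_A.prefix_C
print(join_segments([None, None]))                    # None
```

Segments are listed from source to downstream, so the leftmost segment always belongs to the earliest operation in the pipeline.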
The prefix parameter
Most operations accept a prefix parameter. How the prefix is formatted
depends on the operation’s mode:
- Fixed mode (one output):
  The name is the prefix string itself, with no index.
  pool = pp.from_seq("ACGT", prefix="wt")
  df = pool.generate_library()
  | name | seq |
  |---|---|
  | wt | ACGT |
- Sequential mode (one output per variant):
  Appends an underscore and a numeric index to the prefix (var_0, var_1, …). Each index corresponds to a specific variant, so var_3 always identifies the same sequence. Indexes are zero-padded when there are many variants so that names sort correctly.
  pool = pp.from_seqs(["AAAA", "CCCC", "GGGG"], mode="sequential", prefix="var")
  df = pool.generate_library()
  | name | seq |
  |---|---|
  | var_0 | AAAA |
  | var_1 | CCCC |
  | var_2 | GGGG |
  With more states the padding grows:
  pool = pp.from_iupac("NNNN", mode="sequential", prefix="seq")
  df = pool.generate_library()
  # names: "seq_000", "seq_001", ..., "seq_255"
- Random mode (sampled outputs):
  Appends an underscore and a running counter (mut_0, mut_1, …). The counter reflects the draw order, not a specific variant. With a fixed seed, the same counter always produces the same sequence, but the mapping depends on the seed.
  wt = pp.from_seq("ATCGATCG")
  pool = wt.mutagenize(num_mutations=1, prefix="mut")
  df = pool.generate_library(num_seqs=4)
  | name | seq |
  |---|---|
  | mut_0 | ATCGGTCG |
  | mut_1 | ATCGAACG |
  | mut_2 | ATCGCTCG |
  | mut_3 | GTCGATCG |
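The index formatting used in sequential and random modes can be sketched in plain Python. The helper indexed_names is hypothetical, and the padding width (the number of digits in the largest index) is an assumption inferred from the examples in this section, not a documented rule.

```python
from typing import List


def indexed_names(prefix: str, n: int) -> List[str]:
    """Return n names of the form '<prefix>_<index>' with zero-padded indexes.

    Hypothetical helper; the padding width is an assumption inferred
    from the documented examples, not taken from poolparty itself.
    """
    if n <= 0:
        return []
    width = len(str(n - 1))  # digits needed for the largest index
    return [f"{prefix}_{i:0{width}d}" for i in range(n)]


print(indexed_names("var", 3))        # ['var_0', 'var_1', 'var_2']
print(indexed_names("seq", 256)[:2])  # ['seq_000', 'seq_001']
print(indexed_names("seq", 256)[-1])  # seq_255
```

Because every index has the same width, a plain string sort of the names matches the numeric order of the indexes.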
Chaining operations
When multiple operations in a pipeline set prefix, each contributes a
segment and they are joined with dots:
wt = pp.from_seq("ATCGATCG", prefix="bg")
muts = wt.mutagenize(num_mutations=1, num_states=3, prefix="mut")
df = muts.generate_library()
| name | seq |
|---|---|
| bg.mut_0 | CTCGATCG |
| bg.mut_1 | GTCGATCG |
| bg.mut_2 | TTCGATCG |
Add more segments with add_prefix:
tagged = muts.add_prefix("final")
df = tagged.generate_library()
| name | seq |
|---|---|
| bg.mut_0.final | CTCGATCG |
| bg.mut_1.final | GTCGATCG |
| bg.mut_2.final | TTCGATCG |
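Since names are dot-joined segments, a name can be split back into its parts to trace how a sequence was built. The sketch below uses only the standard library; parse_name is a hypothetical helper, and it assumes that prefixes contain no dots and that an index, when present, is the run of digits after the last underscore in a segment.

```python
from typing import List, Optional, Tuple


def parse_name(name: str) -> List[Tuple[str, Optional[int]]]:
    """Split a dot-separated name into (prefix, index) pairs.

    Hypothetical helper for illustration; not part of poolparty.
    """
    parsed = []
    for segment in name.split("."):
        prefix, sep, tail = segment.rpartition("_")
        if sep and tail.isdigit():
            parsed.append((prefix, int(tail)))  # e.g. "mut_1" -> ("mut", 1)
        else:
            parsed.append((segment, None))      # e.g. "bg" -> ("bg", None)
    return parsed


print(parse_name("bg.mut_1.final"))
# [('bg', None), ('mut', 1), ('final', None)]
```

A fixed-mode prefix that itself ends in an underscore followed by digits (for example "rep_2") would be read as an indexed segment by this sketch, so it is safest to avoid that pattern in prefixes if names will be parsed this way.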
Custom sequence names with from_seqs
from_seqs accepts a seq_names parameter for explicit names that
override the prefix logic:
pool = pp.from_seqs(
    ["ATCG", "ATAG", "AACG"],
    seq_names=["wt", "mut_A", "mut_B"],
    mode="sequential",
)
df = pool.generate_library()
| name | seq |
|---|---|
| wt | ATCG |
| mut_A | ATAG |
| mut_B | AACG |
Scan operation names
Scan operations can contribute compound names with separate segments for the position index and the variant index. These are controlled by additional prefix parameters:
wt = pp.from_seq("ACGTACGT")
alt = pp.from_seqs(["A", "C", "G", "T"], mode="sequential", prefix="base")
scan = wt.replacement_scan(
    replacement_pool=alt,
    mode="sequential",
    prefix="scan",
    prefix_position="pos",
    prefix_insert="ins",
)
df = scan.generate_library(num_seqs=8)
# names: "scan_00.pos_0.base_0", "scan_01.pos_0.base_1", ...
Pool.named() vs prefix
These are different things:
- pool.named("my_pool") sets the pool’s metadata name, used for display, DAG visualization, and internal tracking. It does not affect the name column in the output.
- prefix="label" on an operation affects the sequence names in the generated DataFrame.
pool = pp.from_seq("ACGT", prefix="bg").named("my_pool")
print(pool.name) # "my_pool" (pool metadata)
df = pool.generate_library()
| name | seq |
|---|---|
| bg | ACGT |
The pool is called "my_pool" (used in DAG display), but the sequence’s
name in the output is "bg" (from the prefix parameter).