Sequence Regions

You often want to perform different operations on different parts of a sequence. Regions let you mark specific segments with XML-style tags so that operations can target them by name.

import poolparty as pp
pp.init()

Tag syntax

PoolParty supports two forms of region tag:

Opening/closing pairs enclose a segment of the sequence:

wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
wt.print_library()
pool[0]: seq_length=16, num_states=1
AAAA<cre>ATCGATCG</cre>TTTT

Self-closing tags mark a zero-length insertion point:

wt = pp.from_seq("ACGT<ins/>ACGT")
wt.print_library()
pool[0]: seq_length=8, num_states=1
ACGT<ins/>ACGT

Tags can be written inline when creating a pool with from_seq or from_seqs, or added programmatically with insert_tags or annotate_region.
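The two tag forms can be illustrated with a minimal plain-Python sketch that does not use PoolParty. The helper names bare_sequence and check_tags, and the tag grammar in TAG_RE, are hypothetical assumptions for illustration only, not PoolParty's API. Stripping the tags reproduces the seq_length values printed above (16 and 8), because tags do not count toward sequence length.

```python
import re

# Assumed tag grammar for this sketch: <name>, </name>, or <name/>.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)(/?)>")

def bare_sequence(tagged: str) -> str:
    """Drop every region tag, keeping only the bases."""
    return TAG_RE.sub("", tagged)

def check_tags(tagged: str) -> None:
    """Raise ValueError if an opening tag is never closed, or a closing tag was never opened."""
    open_names: set[str] = set()
    for m in TAG_RE.finditer(tagged):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue                          # <name/> stands alone
        if not closing:
            if name in open_names:
                raise ValueError(f"<{name}> opened twice")
            open_names.add(name)              # <name> starts a region
        elif name in open_names:
            open_names.remove(name)           # </name> ends it
        else:
            raise ValueError(f"</{name}> has no matching <{name}>")
    if open_names:
        raise ValueError(f"unclosed region(s): {sorted(open_names)}")

for s in ["AAAA<cre>ATCGATCG</cre>TTTT", "ACGT<ins/>ACGT"]:
    check_tags(s)
    print(len(bare_sequence(s)), bare_sequence(s))  # 16 AAAAATCGATCGTTTT, then 8 ACGTACGT
```

This sketch only checks that each name is opened and closed once; whether PoolParty permits overlapping or repeated region names is not covered here.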


Targeting operations with region=

Many operations accept a region parameter that restricts the operation to the tagged region. Flanking sequences are left unchanged:

wt      = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
mutants = wt.mutagenize(num_mutations=1, region="cre", mode="sequential")
mutants.print_library(num_seqs=4)
pool[1]: seq_length=16, num_states=24
AAAA<cre>CTCGATCG</cre>TTTT
AAAA<cre>GTCGATCG</cre>TTTT
AAAA<cre>TTCGATCG</cre>TTTT
AAAA<cre>AACGATCG</cre>TTTT
... (24 total)

Only the 8 bases inside <cre> are mutated; the flanking AAAA and TTTT remain intact. See Region Operations for operations that create and manage region tags.
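The effect of region= can be pictured with a minimal plain-Python sketch. The function mutagenize_region below is a hypothetical stand-in, not PoolParty's implementation. It assumes that sequential mode walks the region left to right and, at each position, tries the other bases in A, C, G, T order; under that assumption it reproduces the 24 states and the first four sequences printed above.

```python
import re

BASES = "ACGT"

def mutagenize_region(tagged: str, region: str) -> list[str]:
    """Hypothetical sketch: all single-point mutants of one tagged region, in sequential order.

    Only the bases between <region> and </region> change; the flanks and tags are copied as-is.
    """
    name = re.escape(region)
    m = re.search(rf"<{name}>(.*?)</{name}>", tagged)
    if m is None:
        raise ValueError(f"region {region!r} not found")
    before, inner, after = tagged[:m.start(1)], m.group(1), tagged[m.end(1):]
    mutants = []
    for i, base in enumerate(inner):          # walk region positions left to right
        for alt in BASES:                     # try each other base in A, C, G, T order
            if alt != base:
                mutants.append(before + inner[:i] + alt + inner[i + 1:] + after)
    return mutants

mutants = mutagenize_region("AAAA<cre>ATCGATCG</cre>TTTT", "cre")
print(len(mutants))  # 24
for s in mutants[:4]:
    print(s)
```

The count follows directly: 8 positions × 3 alternative bases = 24.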


Persistence through the DAG

Region tags persist through the DAG of pool operations and remain valid even when upstream operations change the content within a region. This means multiple operations can target the same region in series:

wt      = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
mutants = wt.mutagenize(num_mutations=1, region="cre", mode="sequential")
dels    = mutants.deletion_scan(deletion_length=3, region="cre", mode="sequential")
dels.print_library(num_seqs=4)
pool[4]: seq_length=16, num_states=144
AAAA<cre>---GATCG</cre>TTTT
AAAA<cre>C---ATCG</cre>TTTT
AAAA<cre>CT---TCG</cre>TTTT
AAAA<cre>CTC---CG</cre>TTTT
... (144 total)

Here mutagenize produces 24 single-point mutants of the cre region, and deletion_scan then slides a 3-bp deletion across the same region (8 − 3 + 1 = 6 positions per mutant), giving 24 × 6 = 144 total sequences. The cre tag is valid at both steps.
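The 24 × 6 arithmetic can be checked with a small plain-Python sketch that chains two hypothetical helpers, point_mutants and deletion_scan_region; neither is part of PoolParty. In this sketch the tag simply travels inside each string, so the second step can find cre again even though its contents now differ from the wild type. PoolParty may track regions differently internally. Deleted bases are written as "-", so every sequence keeps a length of 16, as in the output above.

```python
import re

BASES = "ACGT"

def split_region(tagged: str, region: str) -> tuple[str, str, str]:
    """Split a tagged sequence into (text before content, region content, text after content)."""
    name = re.escape(region)
    m = re.search(rf"<{name}>(.*?)</{name}>", tagged)
    if m is None:
        raise ValueError(f"region {region!r} not found")
    return tagged[:m.start(1)], m.group(1), tagged[m.end(1):]

def point_mutants(tagged: str, region: str) -> list[str]:
    """All single-base substitutions inside the region, position by position."""
    before, inner, after = split_region(tagged, region)
    return [before + inner[:i] + alt + inner[i + 1:] + after
            for i, base in enumerate(inner)
            for alt in BASES if alt != base]

def deletion_scan_region(tagged: str, region: str, length: int) -> list[str]:
    """Slide a deletion of `length` bases across the region, marking deleted bases with '-'."""
    before, inner, after = split_region(tagged, region)
    return [before + inner[:i] + "-" * length + inner[i + length:] + after
            for i in range(len(inner) - length + 1)]

wt = "AAAA<cre>ATCGATCG</cre>TTTT"
mutants = point_mutants(wt, "cre")
# The second step re-locates <cre> in each mutant, not in the wild type.
dels = [d for mut in mutants for d in deletion_scan_region(mut, "cre", 3)]
print(len(mutants), len(dels))  # 24 144
for s in dels[:4]:
    print(s)
```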


Inspecting regions

Every pool tracks which regions are present in its sequences via the pool.regions property:

wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT<ins/>GGGG")
wt.regions
{Region(name='cre', seq_length=8), Region(name='ins', seq_length=0)}

Each Region object records the region’s name and the length of its content (0 for self-closing tags). See Region in the API Reference for full details.
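A Region-like record can be sketched with a frozen dataclass. RegionInfo and find_regions are hypothetical names chosen for this illustration; the real Region class may carry more fields (see the API Reference). The sketch measures content length the way described above: the number of bases between an opening and a closing tag, or 0 for a self-closing tag.

```python
import re
from dataclasses import dataclass

# Assumed tag grammar: <name>, </name>, or <name/>.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)(/?)>")

@dataclass(frozen=True)
class RegionInfo:
    """Hypothetical stand-in for Region: a name plus the length of the tagged content."""
    name: str
    seq_length: int

def find_regions(tagged: str) -> set[RegionInfo]:
    """Collect one RegionInfo per tag pair or self-closing tag."""
    regions: set[RegionInfo] = set()
    open_at: dict[str, int] = {}   # region name -> start position in the untagged sequence
    pos = 0                        # bases seen so far, excluding tag text
    last = 0                       # end of the previous tag in the tagged string
    for m in TAG_RE.finditer(tagged):
        pos += m.start() - last    # add the bases between the previous tag and this one
        last = m.end()
        closing, name, self_closing = m.groups()
        if self_closing:
            regions.add(RegionInfo(name, 0))
        elif closing:
            regions.add(RegionInfo(name, pos - open_at.pop(name)))
        else:
            open_at[name] = pos
    return regions

for r in sorted(find_regions("AAAA<cre>ATCGATCG</cre>TTTT<ins/>GGGG"), key=lambda r: r.name):
    print(r)
```

Freezing the dataclass makes instances hashable, so they can be collected in a set, mirroring the set shown for wt.regions.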