Sequence Regions
You often want to perform different operations on different parts of a sequence. Regions let you mark specific segments with XML-style tags so that operations can target them by name.
import poolparty as pp
pp.init()
Tag syntax
PoolParty supports two forms of region tag:
Opening/closing pairs enclose a segment of the sequence:
wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
wt.print_library()
Self-closing tags mark a zero-length insertion point:
wt = pp.from_seq("ACGT<ins/>ACGT")
wt.print_library()
Tags can be written inline when creating a pool with from_seq or
from_seqs, or added programmatically with insert_tags
or annotate_region.
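To make the two tag forms concrete, here is a minimal pure-Python sketch of how such tags could be located in a string. It does not use PoolParty and is not PoolParty's parser: the regular expression, the `find_regions` helper, and the assumption that tags are never nested are all hypothetical and chosen only for illustration.

```python
import re

# Matches either <name>content</name> or <name/>.
# Assumption: tag names are word characters and tags are not nested.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>|<(\w+)/>")


def find_regions(tagged_seq):
    """Return (bare_sequence, {name: (start, end)}) in untagged coordinates."""
    pieces = []
    spans = {}
    pos = 0      # read position in the tagged string
    out_len = 0  # length of the bare sequence emitted so far
    for m in TAG_RE.finditer(tagged_seq):
        prefix = tagged_seq[pos:m.start()]
        pieces.append(prefix)
        out_len += len(prefix)
        if m.group(1) is not None:   # opening/closing pair
            name, content = m.group(1), m.group(2)
        else:                        # self-closing tag, zero-length region
            name, content = m.group(3), ""
        spans[name] = (out_len, out_len + len(content))
        pieces.append(content)
        out_len += len(content)
        pos = m.end()
    pieces.append(tagged_seq[pos:])
    return "".join(pieces), spans


paired = find_regions("AAAA<cre>ATCGATCG</cre>TTTT")
point = find_regions("ACGT<ins/>ACGT")
print(paired)  # ('AAAAATCGATCGTTTT', {'cre': (4, 12)})
print(point)   # ('ACGTACGT', {'ins': (4, 4)})
```

A paired tag yields a span with distinct start and end, while a self-closing tag yields a span whose start equals its end, which is what makes it usable as an insertion point.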
Targeting operations with region=
Many operations accept a region parameter that restricts the operation to
the tagged region. Flanking sequences are left unchanged:
wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
mutants = wt.mutagenize(num_mutations=1, region="cre", mode="sequential")
mutants.print_library(num_seqs=4)
AAAA<cre>CTCGATCG</cre>TTTT
AAAA<cre>GTCGATCG</cre>TTTT
AAAA<cre>TTCGATCG</cre>TTTT
AAAA<cre>AACGATCG</cre>TTTT ... (24 total)
Only the 8 bases inside <cre> are mutated (8 positions × 3 alternative
bases = 24 mutants); the flanking AAAA and TTTT remain intact. See
Region Operations for the full list of region-aware operations.
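The restriction to a region can be illustrated without PoolParty. The sketch below enumerates every single-base substitution inside a half-open index range of a plain string; `single_mutants` and its parameters are hypothetical names, and the enumeration order is an arbitrary choice that need not match what `mutagenize` produces.

```python
def single_mutants(seq, start, end, alphabet="ACGT"):
    """Yield every single-base substitution within seq[start:end]."""
    for i in range(start, end):
        for base in alphabet:
            if base != seq[i]:
                # Only position i changes; everything else is copied as-is.
                yield seq[:i] + base + seq[i + 1:]


# Tags stripped: the cre region occupies indices 4 to 11 (end index 12).
wt = "AAAAATCGATCGTTTT"
mutants = list(single_mutants(wt, 4, 12))

print(len(mutants))  # 24
print(all(m.startswith("AAAA") and m.endswith("TTTT") for m in mutants))  # True
```

Because the loop never visits indices outside the range, the flanks are preserved by construction rather than by a later filtering step.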
Persistence through the DAG
Region tags persist through the DAG and remain valid even when upstream operations change the content within a region. This means multiple operations can target the same region in series:
wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
mutants = wt.mutagenize(num_mutations=1, region="cre", mode="sequential")
dels = mutants.deletion_scan(deletion_length=3, region="cre", mode="sequential")
dels.print_library(num_seqs=4)
AAAA<cre>---GATCG</cre>TTTT
AAAA<cre>C---ATCG</cre>TTTT
AAAA<cre>CT---TCG</cre>TTTT
AAAA<cre>CTC---CG</cre>TTTT ... (144 total)
Here mutagenize produces 24 single-point mutants of the cre region,
and deletion_scan then slides a 3-bp deletion across the same region
(8 − 3 + 1 = 6 positions per mutant), giving 24 × 6 = 144 total sequences.
The cre tag remains valid at both steps.
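The 24 × 6 = 144 count can be checked with a small standalone sketch. It operates on the region content alone and uses `-` as a gap character, as in the output above; both helper functions are hypothetical stand-ins and make no claim about how PoolParty implements `mutagenize` or `deletion_scan`.

```python
def single_mutants(region, alphabet="ACGT"):
    """All single-base substitutions of a region string."""
    return [
        region[:i] + base + region[i + 1:]
        for i in range(len(region))
        for base in alphabet
        if base != region[i]
    ]


def deletion_scan(region, deletion_length, gap="-"):
    """Slide a fixed-length deletion across the region, marking it with gaps."""
    n_positions = len(region) - deletion_length + 1
    return [
        region[:i] + gap * deletion_length + region[i + deletion_length:]
        for i in range(n_positions)
    ]


cre = "ATCGATCG"
mutants = single_mutants(cre)
# Apply the second operation to every output of the first.
library = [d for m in mutants for d in deletion_scan(m, 3)]

print(len(mutants), len(library))     # 24 144
print(deletion_scan("CTCGATCG", 3)[1])  # C---ATCG
```

Every sequence in this sketch keeps the region at 8 characters because deleted bases are replaced by gap characters, so the same index range still describes the region after each step.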
Inspecting regions
Every pool tracks which regions are present in its sequences via the
pool.regions property:
wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT<ins/>GGGG")
wt.regions
{Region(name='cre', seq_length=8), Region(name='ins', seq_length=0)}
Each Region object records the region’s name and the
length of its content (0 for self-closing tags). See
Region in the API Reference for full details.
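As a rough mental model of what a set of name and length records looks like, the sketch below builds one from a tagged string in plain Python. `RegionSketch` and `regions_of` are hypothetical names invented for this example; the real Region class may carry more fields and is documented in the API Reference.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in a set
class RegionSketch:
    name: str
    seq_length: int


# Assumption: tags are not nested and names are word characters.
TAG_RE = re.compile(r"<(\w+)>([^<]*)</\1>|<(\w+)/>")


def regions_of(tagged_seq):
    """Collect one record per tag found in the string."""
    found = set()
    for m in TAG_RE.finditer(tagged_seq):
        if m.group(1) is not None:
            found.add(RegionSketch(m.group(1), len(m.group(2))))
        else:
            found.add(RegionSketch(m.group(3), 0))  # self-closing tag
    return found


regions = regions_of("AAAA<cre>ATCGATCG</cre>TTTT<ins/>GGGG")
print(sorted(regions, key=lambda r: r.name))
# [RegionSketch(name='cre', seq_length=8), RegionSketch(name='ins', seq_length=0)]
```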