clear_gaps

Removes every gap and non-molecular character (-, ., spaces, and any other character outside the DNA alphabet) from each sequence. XML region tags are preserved intact; only the characters between tags are filtered. Because the output length depends on how many characters are removed, the resulting pool does not carry a fixed seq_length and reports seq_length=None.

import poolparty as pp
pp.init()
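
The filtering rule described above can be illustrated with a small standalone sketch. It is not poolparty's implementation: the helper name `strip_non_dna`, the choice of `ACGT` (upper and lower case) as the DNA alphabet, and the regular expression used to recognise tags are all assumptions made for illustration.

```python
import re

# Assumption: the "DNA alphabet" is taken to be A, C, G, T in either case.
DNA_ALPHABET = frozenset("ACGTacgt")

# Assumption: anything of the form <...> counts as an XML region tag.
TAG_PATTERN = re.compile(r"(<[^<>]*>)")


def strip_non_dna(seq: str) -> str:
    """Drop characters outside DNA_ALPHABET while leaving tags untouched."""
    cleaned = []
    # Splitting on a capturing group keeps the tags in the result list.
    for piece in TAG_PATTERN.split(seq):
        if TAG_PATTERN.fullmatch(piece):
            cleaned.append(piece)  # tag: copy through unchanged
        else:
            cleaned.append("".join(ch for ch in piece if ch in DNA_ALPHABET))
    return "".join(cleaned)


print(strip_non_dna("AT--CG--AT"))              # ATCGAT
print(strip_non_dna("AC<cre>G-.T A</cre>C-G"))  # AC<cre>GTA</cre>CG
```

The second call shows why the output length is not fixed: the number of characters removed depends on the content of each sequence, while the hypothetical `<cre>` tags pass through unchanged.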

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `pool` | `Pool \| str` | (required) | The Pool (or plain sequence string) to clear gaps from. |
| `region` | `str \| list \| None` | `None` | Restrict gap removal to a named region or `[start, stop]` pair. |
| `remove_tags` | `bool \| None` | `None` | When `True` and `region` is a name, strip the constraint region tags from the output. |
| `iter_order` | `float \| None` | `None` | Dimension-name ordering for downstream multi-pool iteration. |
| `prefix` | `str \| None` | `None` | Prefix for auto-generated sequence names. |

Note

Only the most commonly used parameters are shown above. For the full parameter list, see clear_gaps() in the API Reference.

Examples

Remove gap markers from a deletion_scan result

A deletion_scan replaces deleted bases with - markers. Pipe the result through clear_gaps to produce gapless sequences of varying length.

wt    = pp.from_seq("ATCGATCG")
dels  = pp.deletion_scan(wt, deletion_length=2, mode="sequential")
clean = pp.clear_gaps(dels)
clean.print_library()

clean: seq_length=None, num_states=7
CGATCG
AGATCG
ATATCG
ATCTCG
ATCGCG
ATCGAG
ATCGAT
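
To see where the seven sequences above come from, the scan can be reproduced in plain Python. This is only a sketch: the helper `gapped_deletions` is hypothetical, and it assumes that the sequential mode visits every start position from left to right, which matches the listing above.

```python
def gapped_deletions(seq: str, deletion_length: int) -> list[str]:
    """Return one gapped variant per start position, left to right."""
    last_start = len(seq) - deletion_length
    return [
        seq[:start] + "-" * deletion_length + seq[start + deletion_length:]
        for start in range(last_start + 1)
    ]


gapped = gapped_deletions("ATCGATCG", 2)
# Removing the '-' markers yields the gapless sequences.
gapless = [variant.replace("-", "") for variant in gapped]

for before, after in zip(gapped, gapless):
    print(before, "->", after)
```

Each gapped variant keeps the original length of 8, while every gapless result has length 6.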

Clear gaps from a manually gapped sequence

Strip dash characters from a sequence that was constructed with explicit alignment gaps.

wt    = pp.from_seq("AT--CG--AT")
clean = pp.clear_gaps(wt)
clean.print_library()

clean: seq_length=None, num_states=1
ATCGAT

Chain clear_gaps with another operation

Remove gaps first, then apply rc to produce gapless reverse-complement sequences ready for downstream analysis.

wt    = pp.from_seq("AT--CG")
clean = pp.clear_gaps(wt)
rev   = pp.rc(clean)
rev.print_library()

rev: seq_length=None, num_states=1
CGAT
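
The same result can be checked by hand with a short standalone sketch. The `reverse_complement` helper below is an illustrative assumption and says nothing about how `pp.rc` is implemented; it only confirms the expected output for this input.

```python
# Translation table for the standard Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


gapped = "AT--CG"
gapless = gapped.replace("-", "")   # ATCG
print(reverse_complement(gapless))  # CGAT
```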

See clear_gaps().