clear_gaps

Remove all gap and non-molecular characters (-, ., spaces, and any other characters outside the DNA alphabet) from sequences. XML region tags are preserved intact; only characters between tags are filtered. Because the output length varies with the number of gaps removed, the resulting pool does not carry a fixed seq_length.
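The following pure-Python sketch illustrates the filtering rule described above. It is not poolparty's implementation. It drops characters outside an assumed DNA alphabet and copies region tags through unchanged. The function name clear_gaps_sketch, the alphabet (A, C, G, T in either case), and the attribute-free tag pattern are all assumptions made for this example. The real alphabet may differ, for example by admitting IUPAC codes such as N.

```python
import re

# Assumption: only A, C, G, T (either case) count as the DNA alphabet here.
# poolparty's actual alphabet may differ (e.g. IUPAC ambiguity codes).
DNA_ALPHABET = set("ACGTacgt")

# Assumption: region tags look like <name>, </name>, or <name/> with no attributes.
TAG_RE = re.compile(r"</?[A-Za-z_][\w.-]*\s*/?>")


def clear_gaps_sketch(seq: str) -> str:
    """Drop non-DNA characters between tags; copy tags through unchanged."""
    out = []
    pos = 0
    for match in TAG_RE.finditer(seq):
        # Filter the stretch of sequence that precedes this tag.
        out.append("".join(c for c in seq[pos:match.start()] if c in DNA_ALPHABET))
        # Keep the tag itself verbatim.
        out.append(match.group())
        pos = match.end()
    # Filter whatever follows the last tag (or the whole string if there are no tags).
    out.append("".join(c for c in seq[pos:] if c in DNA_ALPHABET))
    return "".join(out)


print(clear_gaps_sketch("AT-.C G<orf>A--TG</orf>C.C"))  # ATCG<orf>ATG</orf>CC
```

The output length depends on how many characters each input loses. This is why a pool built this way cannot advertise a single fixed length.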

The examples below assume this setup:

import poolparty as pp
pp.init()

Parameters

Parameter    Type               Default     Description
pool         Pool | str         (required)  The Pool (or plain sequence string) to clear gaps from.
region       str | list | None  None        Restrict gap removal to a named region or [start, stop] pair.
remove_tags  bool | None        None        When True and region is a name, strip the constraint region tags from the output.
iter_order   float | None       None        Enumeration order when combined with other pools.
prefix       str | None         None        Prefix for auto-generated sequence names.
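A small pure-Python sketch can model how region and remove_tags might interact. The function clear_gaps_in_region is hypothetical and is not part of poolparty. It rests on two assumptions that this page does not confirm. First, that a [start, stop] pair follows Python's half-open slicing convention. Second, that a named region is delimited by <name>...</name> tags. Check clear_gaps() in the API Reference for the actual semantics.

```python
import re
from typing import Sequence, Union

# Assumption: A/C/G/T (either case) is the alphabet that survives filtering.
DNA_ALPHABET = set("ACGTacgt")


def _strip(s: str) -> str:
    """Keep only characters in the assumed DNA alphabet."""
    return "".join(c for c in s if c in DNA_ALPHABET)


def clear_gaps_in_region(
    seq: str,
    region: Union[str, Sequence[int], None] = None,
    remove_tags: bool = False,
) -> str:
    """Hypothetical model of clear_gaps' region / remove_tags parameters."""
    if region is None:
        # No region: filter everything (this simplified branch assumes no tags).
        return _strip(seq)
    if isinstance(region, str):
        # Named region (assumed <name>...</name>): filter only its contents.
        name = re.escape(region)
        match = re.search(rf"<{name}>(.*?)</{name}>", seq, re.S)
        if match is None:
            raise ValueError(f"region {region!r} not found")
        inner = _strip(match.group(1))
        if not remove_tags:
            # Re-wrap the cleaned contents in the original tags.
            inner = f"<{region}>{inner}</{region}>"
        return seq[: match.start()] + inner + seq[match.end():]
    # [start, stop] pair: assumed half-open, like a Python slice.
    start, stop = region
    return seq[:start] + _strip(seq[start:stop]) + seq[stop:]


print(clear_gaps_in_region("AT--<orf>A-TG</orf>--CG", region="orf"))  # AT--<orf>ATG</orf>--CG
print(clear_gaps_in_region("AT--<orf>A-TG</orf>--CG", region="orf", remove_tags=True))  # AT--ATG--CG
print(clear_gaps_in_region("AT--CG--AT", region=[0, 5]))  # ATCG--AT
```

Gaps outside the chosen region are left alone, which is the point of restricting the operation.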


Note

Only the most commonly used parameters are shown above. For the full parameter list, see clear_gaps() in the API Reference.

Examples

Remove gap markers from a deletion_scan result

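A deletion_scan replaces deleted bases with - markers. Pipe the result through clear_gaps to produce gapless sequences. Every deletion in this example removes two bases, so each output happens to be 6 bp long. clear_gaps does not rely on that, which is why the pool reports seq_length=None.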

wt   = pp.from_seq("ATCGATCG")
dels = pp.deletion_scan(wt, deletion_length=2, mode="sequential")
clean = pp.clear_gaps(dels)
clean.print_library()
clean: seq_length=None, num_states=7
CGATCG
AGATCG
ATATCG
ATCTCG
ATCGCG
ATCGAG
ATCGAT
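The following pure-Python sketch models a sequential deletion scan followed by gap removal. It shows where the seven outputs above come from. It assumes that sequential mode places one window of deletion_length - markers at every start position, from left to right. deletion_scan_sketch and clear are illustrative names, not poolparty functions.

```python
from typing import List


def deletion_scan_sketch(seq: str, deletion_length: int) -> List[str]:
    """Slide a window of '-' markers across seq, one start position at a time."""
    return [
        seq[:i] + "-" * deletion_length + seq[i + deletion_length:]
        for i in range(len(seq) - deletion_length + 1)
    ]


def clear(seq: str) -> str:
    """Remove '-' gap markers (narrower than clear_gaps, which also drops '.', spaces, etc.)."""
    return seq.replace("-", "")


gapped = deletion_scan_sketch("ATCGATCG", deletion_length=2)
clean = [clear(s) for s in gapped]
for s in clean:
    print(s)
print(len(clean))                      # 7
print(sorted({len(s) for s in clean}))  # [6]
```

An 8 bp sequence has 8 - 2 + 1 = 7 start positions for a 2 bp window, which matches num_states=7.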

Clear gaps from a manually gapped sequence

Strip dash characters from a sequence that was constructed with explicit alignment gaps.

wt    = pp.from_seq("AT--CG--AT")
clean = pp.clear_gaps(wt)
clean.print_library()
clean: seq_length=None, num_states=1
ATCGAT

Chain clear_gaps with another operation

Remove gaps first, then apply rc to produce gapless reverse-complement sequences ready for downstream analysis.

wt    = pp.from_seq("AT--CG")
clean = pp.clear_gaps(wt)
rev   = pp.rc(clean)
rev.print_library()
rev: seq_length=None, num_states=1
CGAT
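The following pure-Python sketch reproduces the clear-then-reverse-complement chain on a plain string. It is a sketch under stated assumptions, not poolparty's rc. It assumes only A, C, G, and T (either case) need complementing and that any other character passes through unchanged. rc_sketch and clear are hypothetical helpers.

```python
# Assumption: only A/C/G/T (either case) are complemented; other characters,
# including '-', pass through str.translate unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def rc_sketch(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def clear(seq: str) -> str:
    """Remove '-' gap markers."""
    return seq.replace("-", "")


wt = "AT--CG"
print(rc_sketch(clear(wt)))  # CGAT
# Under this sketch's assumptions, reversing first and clearing afterwards agrees.
print(clear(rc_sketch(wt)))  # CGAT
```

In this model, gap characters are neither complemented nor removed by the reverse complement, so the two orders give the same result. This page does not say whether poolparty's rc treats gap characters the same way.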

See clear_gaps().