insertion_scan
==============
Insert sequences from ``insertion_pool`` at every position along the background
sequence (or within a named region). Unlike :func:`~poolparty.replacement_scan`,
no background bases are removed, so each output sequence is longer than the
input by the length of the insert.

Set ``replace=True`` to overwrite a window of ``ins_length`` bases at each
position instead of inserting. The output length then equals the background
length, and the call is equivalent to :func:`~poolparty.replacement_scan`.

.. code-block:: python

    import poolparty as pp
    pp.init()
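The difference between the two modes can be sketched in plain Python. The
helpers ``insert_at`` and ``replace_at`` below are hypothetical illustrations of
the string arithmetic only; they are not part of poolparty and do not describe
its implementation.

.. code-block:: python

    def insert_at(background: str, insert: str, pos: int) -> str:
        """Place insert before index pos; no background base is removed."""
        return background[:pos] + insert + background[pos:]


    def replace_at(background: str, insert: str, pos: int) -> str:
        """Overwrite len(insert) bases of background, starting at index pos."""
        return background[:pos] + insert + background[pos + len(insert):]


    wt = "ACGTACGT"
    print(insert_at(wt, "TT", 2))   # ACTTGTACGT (length 10)
    print(replace_at(wt, "TT", 2))  # ACTTACGT   (length 8)

Insertion grows the sequence by the insert length, while replacement keeps the
background length unchanged.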
----
Parameters
----------
.. list-table::
    :header-rows: 1
    :widths: auto

    * - Parameter
      - Type
      - Default
      - Description
    * - ``pool``
      - ``Pool | str``
      - *(required)*
      - The background Pool to scan. Can also be a plain sequence string.
    * - ``insertion_pool``
      - ``Pool | str``
      - *(required)*
      - Pool or sequence string whose content is inserted at each scanned
        position. An *L*-mer has *L* + 1 valid insertion sites (before
        each base and after the last).
    * - ``positions``
      - ``list[int] | None``
      - ``None``
      - Explicit list of insertion positions. ``None`` = all valid positions.
    * - ``region``
      - ``str | list | None``
      - ``None``
      - Restrict insertions to a named region or ``[start, stop]`` interval.
        Flanking sequences are never modified.
    * - ``replace``
      - ``bool``
      - ``False``
      - When ``True``, a window of ``ins_length`` bases is replaced at each
        position (equivalent to :func:`~poolparty.replacement_scan`). Valid
        positions = background length − insert length + 1; output length =
        background length.
    * - ``style``
      - ``str | None``
      - ``None``
      - Named display style applied to inserted bases.
    * - ``prefix``
      - ``str | None``
      - ``None``
      - Prefix for auto-generated sequence names.
    * - ``mode``
      - ``str``
      - ``'random'``
      - ``'sequential'`` iterates positions then inserts in order; ``'random'``
        shuffles the (position × insert) product.
    * - ``num_states``
      - ``int | None``
      - ``None``
      - Number of output states. ``None`` auto-computes in sequential mode
        or defaults to 1 in random mode.
    * - ``iter_order``
      - ``int | None``
      - ``None``
      - Enumeration order when combined with other pools.
----
.. note::

    Only the most commonly used parameters are shown above. For the full
    parameter list, see :func:`~poolparty.insertion_scan` in the
    :doc:`API Reference `.
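As a rough check on library size, the number of (position, insert) combinations
follows from the site counts given in the parameter table. ``scan_size`` is a
hypothetical helper written for this page. It assumes no ``positions`` or
``region`` restriction, and it is an assumption (consistent with the examples
below) that the auto-computed ``num_states`` in sequential mode equals this
product.

.. code-block:: python

    def scan_size(background_len: int, insert_len: int, num_inserts: int,
                  replace: bool = False) -> int:
        """Count (position, insert) combinations for an unrestricted scan."""
        if replace:
            # A window of insert_len bases must fit inside the background.
            num_positions = background_len - insert_len + 1
        else:
            # One site before each base, plus one after the last base.
            num_positions = background_len + 1
        return max(num_positions, 0) * num_inserts


    print(scan_size(8, 1, 4))                # 36
    print(scan_size(8, 2, 16))               # 144
    print(scan_size(8, 2, 4, replace=True))  # 28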
Examples
--------
Single-base insertions at every position
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An 8-mer has 9 insertion sites, so 9 sites × 4 bases gives 36 sequences, each
of length 9.

.. code-block:: python

    wt = pp.from_seq("ACGTACGT")
    bases = pp.from_seqs(["A", "C", "G", "T"], mode="sequential")
    scan = wt.insertion_scan(insertion_pool=bases, mode="sequential", style="red")
    scan.print_library()
.. raw:: html
AACGTACGT
AACGTACGT
ACAGTACGT
ACGATACGT
ACGTAACGT
... (36 total)
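In the listing above, the position varies fastest. The plain-Python sketch
below reproduces that order under this assumption, without calling poolparty.
It also explains why the first two rows read the same: inserting ``A`` before
or after the leading ``A`` yields ``AACGTACGT`` either way, so the 36 states
contain fewer than 36 distinct strings.

.. code-block:: python

    wt = "ACGTACGT"
    bases = ["A", "C", "G", "T"]

    # Assumed order, inferred from the listing above: inserts in the outer
    # loop, positions in the inner loop.
    library = [wt[:pos] + base + wt[pos:]
               for base in bases
               for pos in range(len(wt) + 1)]

    print(len(library))       # 36 states
    print(library[:5])        # the five rows shown above
    print(len(set(library)))  # 28 distinct strings

If uniqueness matters downstream, deduplicate on the sequence string rather
than on the (position, insert) pair.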
All-dinucleotide insertions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use ``from_iupac("NN")`` to enumerate all 16 dinucleotide inserts.
9 sites × 16 inserts = 144 sequences, each of length 10.

.. code-block:: python

    wt = pp.from_seq("ACGTACGT")
    nn = pp.from_iupac("NN", mode="sequential")
    scan = wt.insertion_scan(insertion_pool=nn, mode="sequential", style="red")
    scan.print_library()
.. raw:: html
AAACGTACGT
AAACGTACGT
ACAAGTACGT
ACGAATACGT
ACGTAAACGT
... (144 total)
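For intuition, the 16 inserts behind ``NN`` can be enumerated with
``itertools.product``. The ``IUPAC`` table and the ``expand_iupac`` helper are
illustrative stand-ins, not poolparty's ``from_iupac``, and the enumeration
order shown is an assumption.

.. code-block:: python

    from itertools import product

    # Subset of the standard IUPAC nucleotide codes used on this page.
    IUPAC = {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "N": "ACGT",
    }


    def expand_iupac(pattern: str) -> list[str]:
        """Expand a degenerate pattern into every concrete sequence it matches."""
        return ["".join(combo)
                for combo in product(*(IUPAC[code] for code in pattern))]


    inserts = expand_iupac("NN")
    wt = "ACGTACGT"
    library = [wt[:pos] + ins + wt[pos:]
               for ins in inserts
               for pos in range(len(wt) + 1)]

    print(len(inserts), len(library))  # 16 144
    print(library[:3])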
Insert-and-replace mode (replace=True)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``replace=True`` replaces a window equal in width to the insert (here 2
bases) at each position. For an 8-mer with a 2-base insert there are
8 − 2 + 1 = 7 valid positions, so 7 positions × 4 inserts gives 28 sequences,
and the output length stays 8. This is equivalent to calling
:func:`~poolparty.replacement_scan`.

.. code-block:: python

    wt = pp.from_seq("ACGTACGT")
    bases = pp.from_seqs(["AA", "CC", "GG", "TT"], mode="sequential")
    scan = wt.insertion_scan(insertion_pool=bases, replace=True, mode="sequential",
                             style="red")
    scan.print_library()
.. raw:: html
AAGTACGT
AAATACGT
ACAAACGT
ACGAACGT
ACGTAAGT
... (28 total)
Insertion scan within a named region
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Restrict insertion sites to the ``cre`` region, which is marked inline in the
background sequence. The 8-base region has 9 valid insertion sites, so 9 sites
× 4 bases gives 36 sequences; flanks are never altered.

.. code-block:: python

    # The region tags delimit the 8-base window that is scanned.
    wt = pp.from_seq("AAAA<cre>ATCGATCG</cre>TTTT")
    bases = pp.from_seqs(["A", "C", "G", "T"], mode="sequential")
    scan = wt.insertion_scan(insertion_pool=bases, region="cre", mode="sequential",
                             style="red")
    scan.print_library()
.. raw:: html
AAAA<cre>AATCGATCG</cre>TTTT
AAAA<cre>AATCGATCG</cre>TTTT
AAAA<cre>ATACGATCG</cre>TTTT
AAAA<cre>ATCAGATCG</cre>TTTT
AAAA<cre>ATCGAATCG</cre>TTTT
... (36 total)
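The region restriction can be mimicked in plain Python, assuming the region is
delimited by inline ``<cre>`` and ``</cre>`` tags as in the output above. The
regular expression and the ``scan_region`` helper are hypothetical and only
imitate the behavior described here.

.. code-block:: python

    import re


    def scan_region(tagged: str, region: str, inserts: list[str]) -> list[str]:
        """Insert each sequence at every site inside the named region only."""
        name = re.escape(region)
        match = re.fullmatch(rf"(.*)<{name}>(.*)</{name}>(.*)", tagged)
        if match is None:
            raise ValueError(f"region {region!r} not found")
        left, body, right = match.groups()
        # Flanks (left, right) are copied through untouched.
        return [
            f"{left}<{region}>{body[:pos]}{ins}{body[pos:]}</{region}>{right}"
            for ins in inserts
            for pos in range(len(body) + 1)
        ]


    library = scan_region("AAAA<cre>ATCGATCG</cre>TTTT", "cre",
                          ["A", "C", "G", "T"])
    print(len(library))  # 36
    print(library[0])    # AAAA<cre>AATCGATCG</cre>TTTT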
Explicit position list
~~~~~~~~~~~~~~~~~~~~~~~
Limit the scan to specific insertion sites. Here 3 sites × 4 bases gives 12
sequences; position 8 is the site after the last base.

.. code-block:: python

    wt = pp.from_seq("ACGTACGT")
    bases = pp.from_seqs(["A", "C", "G", "T"], mode="sequential")
    scan = wt.insertion_scan(insertion_pool=bases, positions=[0, 4, 8],
                             mode="sequential", style="red")
    scan.print_library()
.. raw:: html
AACGTACGT
ACGTAACGT
ACGTACGTA
CACGTACGT
ACGTCACGT
... (12 total)
Random motif insertion (mode="random")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``mode='random'`` draws insertion positions stochastically. Here a degenerate
6-base IUPAC motif (``R`` = A|G, ``Y`` = C|T) is inserted at random
positions along a 12-mer.

.. code-block:: python

    wt = pp.from_seq("ACGTACGTACGT")
    motif = pp.from_iupac("RRYYYY")
    scan = wt.insertion_scan(insertion_pool=motif, mode="random", num_states=5,
                             style="red")
    scan.print_library()
.. raw:: html
ACGTACGTACGTGACCCT
ACGTACGTACGTGATCTT
ACGACCTTGTACGTACGT
ACGAACTTTTACGTACGT
ACGTACAATTCCGTACGT
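A sketch of random-mode sampling in plain Python, assuming each state draws one
concrete motif and one insertion site uniformly and independently. poolparty's
actual sampling scheme and seeding are not described on this page, so the
sequences printed here will not match the listing above.

.. code-block:: python

    import random
    from itertools import product

    IUPAC = {"R": "AG", "Y": "CT"}  # only the codes used by this motif

    wt = "ACGTACGTACGT"
    motif = "RRYYYY"
    inserts = ["".join(combo)
               for combo in product(*(IUPAC[code] for code in motif))]

    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    library = []
    for _ in range(5):
        ins = rng.choice(inserts)
        pos = rng.randrange(len(wt) + 1)  # 13 possible insertion sites
        library.append(wt[:pos] + ins + wt[pos:])

    print(len(inserts))  # 64 concrete motifs
    for seq in library:
        print(seq)       # each sequence is 18 bases long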
See :func:`~poolparty.insertion_scan`.