Quickstart Guide ================ This guide introduces the core concepts of PoolParty through practical examples. Installation ------------ .. code-block:: bash pip install poolparty Basic Concepts -------------- PoolParty uses **Pools** to represent collections of DNA sequences. Pools are: - **Lazy**: Sequences are generated on-demand, not stored in memory - **Composable**: Pools can be combined using operations like ``join``, ``+``, and ``*`` - **Stateful**: Each pool tracks its position in a combinatorial space via StateTracker Getting Started --------------- First, import PoolParty and initialize a session: .. code-block:: python import poolparty as pp # Initialize PoolParty (creates a default Party context) pp.init() Creating Pools -------------- From a Single Sequence ~~~~~~~~~~~~~~~~~~~~~~ Create a pool containing a single sequence: .. code-block:: python # Create a pool from a single sequence wt = pp.from_seq("ATCGATCGATCG") # Generate and display df = wt.generate_library() print(df[["seq"]]) From Multiple Sequences ~~~~~~~~~~~~~~~~~~~~~~~ Create a pool that selects from multiple sequences: .. code-block:: python # Create a pool from multiple sequences variants = pp.from_seqs(["AAAA", "CCCC", "GGGG", "TTTT"]) df = variants.generate_library() print(df[["seq"]]) K-mer Pools ~~~~~~~~~~~ Generate all k-mers of a given length. The ``mode="sequential"`` argument tells PoolParty to enumerate every k-mer rather than sampling randomly (see :doc:`operations/modes`). .. code-block:: python kmers = pp.get_kmers(length=3, mode="sequential") # all 64 3-mers df = kmers.generate_library() print(f"Generated {len(df)} sequences") print(df[["seq"]].head(10)) Combining Pools --------------- Concatenation with ``join`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Join pools to create composite sequences: .. code-block:: python # Create components pp.init() # Reset to fresh state promoter = pp.from_seq("ATCG") barcode = pp.get_kmers(length=4, mode="sequential") # all 256 4-mers # Join them together library = pp.join([promoter, barcode]) df = library.generate_library() print(f"Generated {len(df)} sequences") print(df[["seq"]].head(5)) Using the ``+`` Operator ~~~~~~~~~~~~~~~~~~~~~~~~ Pools can also be concatenated with ``+``: .. code-block:: python pp.init() left = pp.from_seq("AAA") middle = pp.from_seqs(["G", "C"]) right = pp.from_seq("TTT") combined = left + middle + right df = combined.generate_library() print(df[["seq"]]) Mutagenesis ----------- Random Mutations ~~~~~~~~~~~~~~~~ Apply random mutations to a sequence. Operations can be called as methods on a Pool — ``wt.mutagenize(...)`` is equivalent to ``pp.mutagenize(wt, ...)``. .. code-block:: python pp.init() # Start with a wild-type sequence wt = pp.from_seq("ATCGATCGATCG") # Create single-mutation variants mutants = wt.mutagenize(num_mutations=1) df = mutants.generate_library() print(f"Generated {len(df)} single mutants") print(df[["seq"]].head(10)) Scan Operations --------------- Scan operations tile across sequence positions. Replacement Scan ~~~~~~~~~~~~~~~~ Replace each position with alternative bases: .. code-block:: python pp.init() wt = pp.from_seq("ATCG") alt = pp.from_seqs(["A", "C", "G", "T"], mode="sequential") # Replace each position with all 4 bases scan = wt.replacement_scan(replacement_pool=alt, mode="sequential") df = scan.generate_library() print(df[["name", "seq"]]) Deletion Scan ~~~~~~~~~~~~~ Systematically delete portions of a sequence: .. code-block:: python pp.init() wt = pp.from_seq("ATCGATCG") # Delete 2-nt windows across the sequence deletions = wt.deletion_scan(deletion_length=2, mode="sequential") df = deletions.generate_library() print(df[["name", "seq"]]) Working with Regions -------------------- PoolParty supports XML-like region tagging for targeting specific parts of sequences. See :doc:`regions` for a full explanation of tag syntax and region behaviour. Tagging Regions ~~~~~~~~~~~~~~~ .. code-block:: python pp.init() # Define a sequence with a tagged region seq = "AAAAATCGATCGTTTT" wt = pp.from_seq(seq) # Apply mutations only to the CRE region mutants = wt.mutagenize(num_mutations=1, region="cre") df = mutants.generate_library() print(f"Generated {len(df)} CRE mutants") print(df[["seq"]].head(5)) Generating Libraries -------------------- The ``generate_library()`` method produces a pandas DataFrame with sequence information: .. code-block:: python pp.init() # Create a simple library promoter = pp.from_seq("ATCG") barcode = pp.from_iupac("MMM", mode="sequential") # M = A or C library = pp.join([promoter, barcode]) # Generate with full metadata df = library.generate_library() print("Columns available:") print(df.columns.tolist()) print() print(df[["name", "seq"]].head()) Initialisation and Context Management -------------------------------------- ``pp.init()`` — persistent context ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``pp.init()`` creates a long-lived Party context that stays active for the rest of the session. This is the recommended approach for notebooks and interactive scripts. .. code-block:: python import poolparty as pp pp.init() pool = pp.from_seq("ACGT") df = pool.generate_library() .. note:: If you need a clean slate — for example, at the top of a new notebook cell block or after an experiment — call ``pp.init()`` again. This tears down the previous context and starts fresh: **all prior pools and operations are discarded**. ``pp.init()`` accepts: - ``genetic_code`` (``str | dict``, default ``"standard"``) — genetic code for ORF operations. - ``log_level`` (``str | None``, default ``None``) — if set, configures logging (``"DEBUG"``, ``"INFO"``, ``"WARNING"``, etc.). ``with pp.Party()`` — scoped context ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For isolation — running independent experiments, writing reusable functions, or testing — use ``with pp.Party()``. The context is cleaned up automatically when the block exits. .. code-block:: python with pp.Party() as party: pool = pp.from_seq("ACGT") df = pool.generate_library() # context closed Contexts nest automatically: .. code-block:: python with pp.Party() as outer: wt = pp.from_seq("ACGT") with pp.Party() as inner: other = pp.from_seq("TTTT") # inner is active # outer is active again; wt is still usable .. list-table:: :widths: 50 25 25 :header-rows: 1 * - Scenario - ``pp.init()`` - ``with pp.Party()`` * - Interactive notebook or REPL - Recommended - * - Multiple independent experiments - - Recommended * - Inside a reusable function - - Recommended * - Quick reset (discard all pools) - Call again - Start a new ``with`` block Configuration ~~~~~~~~~~~~~ These functions apply to whichever Party is currently active: .. list-table:: :widths: 40 60 :header-rows: 1 * - Function - Description * - ``pp.clear_pools()`` - Discard all pools and operations without resetting configuration. * - ``pp.toggle_styles(on=True)`` - Enable or disable inline sequence styling. * - ``pp.toggle_cards(on=True)`` - Enable or disable design card computation. * - ``pp.set_text_progress(on=True)`` - Use text-based progress bars instead of notebook widgets. * - ``pp.configure_logging(level)`` - Set the logging level for ``poolparty`` and ``statetracker``. * - ``pp.set_genetic_code(genetic_code)`` - Change the genetic code (affects ORF operations). .. code-block:: python # Disable cards and styles for a performance-sensitive run pp.init() pp.toggle_cards(on=False) pp.toggle_styles(on=False) pool = pp.from_iupac("NNNNNNNN", mode="sequential") df = pool.to_df(num_cycles=1) # no card columns, no style overhead Next Steps ---------- - Browse the :doc:`operations/index` for the full list of composable operations - See :doc:`pool` for Pool properties and export methods (``to_df``, ``to_file``) - Check out `StateTracker `_ for understanding the underlying state algebra