from_fasta

Extract one or more genomic regions from a FASTA file and create a pool. Each region is given as a tuple (chrom, start, stop, strand), where [start, stop) is a 0-based half-open interval: the base at start is included and the base at stop is excluded. A single tuple gives a fixed pool; a list of tuples gives a sequential pool that iterates through all extracted sequences.

import poolparty as pp
pp.init()
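The half-open convention matches Python slicing, which can be checked on a plain string without any FASTA file. The sketch below is illustrative only: the toy sequence and the helpers `extract` and `one_based_to_half_open` are hypothetical names, not part of poolparty.

```python
# Illustrative only: a toy chromosome held in a plain string (positions 0..9).
chrom_seq = "ACGTACGTAC"


def extract(seq: str, start: int, stop: int) -> str:
    """Return the 0-based half-open interval [start, stop) of seq."""
    if not 0 <= start <= stop <= len(seq):
        raise ValueError(f"interval [{start}, {stop}) is outside 0..{len(seq)}")
    return seq[start:stop]


def one_based_to_half_open(start_1: int, end_1: int) -> tuple:
    """Convert a 1-based inclusive interval (as in GFF/GTF) to 0-based half-open."""
    # Only the start shifts; the inclusive end equals the exclusive stop.
    return start_1 - 1, end_1


region = extract(chrom_seq, 2, 6)
print(region)       # bases at positions 2, 3, 4, 5 -> GTAC
print(len(region))  # stop - start -> 4
```

The length of an extracted region is always stop - start, which is why the 1000 to 1020 interval in the first example below yields 20 bases.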

Parameters

Parameter (type, default): description

fasta_path (str, required): Path to the FASTA file. Indexed via pyfaidx on first use; a .fai index file is created automatically if absent.

coordinates (tuple | list[tuple], required): A single tuple (chrom, start, stop, strand) or a list of such tuples. start/stop are 0-based integers. strand is '+' or '-'; '-' triggers automatic reverse complementation.

pool (Pool | None, default None): Background pool. When provided with region, the extracted sequence replaces the content of that region.

region (str | None, default None): Region to replace in pool. Required when pool is provided.

remove_tags (bool | None, default None): If True, strip region tags from the output (single-coordinate mode only).

prefix (str | None, default None): Prefix for auto-generated sequence names. Names follow {prefix}_{chrom}:{start}-{stop}({strand}).

style (str | None, default None): Display style applied to every extracted sequence.

iter_order (int | None, default None): Dimension-name ordering for downstream multi-pool iteration.

cards (dict | list | None, default None): Design card columns to include in library output.
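BED files already use 0-based half-open intervals, so their rows map directly onto the (chrom, start, stop, strand) tuples expected by coordinates. The following is a minimal sketch of such a conversion; `bed_to_coordinates` is a hypothetical helper written for illustration, not a poolparty function, and it assumes the strand sits in the sixth BED column when present.

```python
from typing import List, Tuple

Coord = Tuple[str, int, int, str]


def bed_to_coordinates(bed_text: str, default_strand: str = "+") -> List[Coord]:
    """Turn BED lines into (chrom, start, stop, strand) tuples."""
    coords: List[Coord] = []
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, stop = fields[0], int(fields[1]), int(fields[2])
        # Strand is the sixth BED column; fall back when it is absent or '.'.
        strand = default_strand
        if len(fields) > 5 and fields[5] in ("+", "-"):
            strand = fields[5]
        coords.append((chrom, start, stop, strand))
    return coords


bed_text = (
    "chr1\t1000\t1010\tenh1\t0\t+\n"
    "chr2\t5000\t5010\tenh2\t0\t-\n"
    "chr3\t200\t210\n"
)
coords = bed_to_coordinates(bed_text)
print(coords)
```

The resulting list can then be passed as the coordinates argument of pp.from_fasta.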

Note

For circular genomes, start > stop indicates wrap-around across the origin — the extracted sequence runs from start to the end of the chromosome and continues from the beginning to stop.
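The wrap-around rule can be pictured with plain string slicing. This is a sketch under the assumptions stated in the note, using a toy sequence and a hypothetical helper `extract_circular`; how the library treats start == stop is not specified here, and the sketch simply returns an empty string in that case.

```python
def extract_circular(seq: str, start: int, stop: int) -> str:
    """Extract [start, stop) from a circular sequence.

    When start > stop, the interval wraps across the origin.
    """
    if start <= stop:
        return seq[start:stop]
    # Tail of the chromosome, then the head up to (not including) stop.
    return seq[start:] + seq[:stop]


plasmid = "AACCGGTTAC"  # toy circular sequence, length 10
wrapped = extract_circular(plasmid, 8, 3)
print(wrapped)       # positions 8, 9, 0, 1, 2 -> ACAAC
print(len(wrapped))  # (10 - 8) + 3 -> 5
```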


Note

Only the most commonly used parameters are shown above. For the full parameter list, see from_fasta() in the API Reference.

Examples

Single region, forward strand

Extract 20 bases from chromosome 1, forward strand.

pool = pp.from_fasta(
    "genome.fa",
    coordinates=("chr1", 1000, 1020, "+"),
)
pool.print_library()
pool: seq_length=20, num_states=1
ATCGATCGATCGATCGATCG

Note

Output above is illustrative — actual sequences depend on the content of the FASTA file.

Reverse-strand extraction

strand='-' automatically returns the reverse complement of the interval — use this for genes encoded on the minus strand.

pool = pp.from_fasta(
    "genome.fa",
    coordinates=("chr2", 5000, 5010, "-"),
)
pool.print_library()
pool: seq_length=10, num_states=1
CGTAGCTAGC
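For reference, reverse complementation itself is a small operation. The sketch below shows it in plain Python; the plus-strand bases are made up so that the result matches the illustrative output above, and `reverse_complement` is a hypothetical helper rather than the routine poolparty uses internally.

```python
# Translation table for unambiguous DNA bases; other characters (e.g. N)
# pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


plus_strand = "GCTAGCTACG"  # hypothetical bases on the forward strand
print(reverse_complement(plus_strand))  # CGTAGCTAGC
```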

Multiple coordinates — sequential pool

A list of tuples creates a sequential pool. Each sequence is named automatically; prefix prepends a custom label.

coords = [
    ("chr1", 1000, 1010, "+"),
    ("chr2", 5000, 5010, "-"),
    ("chr3", 200,  210,  "+"),
]
pool = pp.from_fasta("genome.fa", coordinates=coords, prefix="enh")
pool.print_library()
pool: seq_length=10, num_states=3
ATCGATCGAT
CGTAGCTAGC
GCATGCATGC
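The names produced for this pool can be predicted from the documented pattern {prefix}_{chrom}:{start}-{stop}({strand}). The sketch below applies that pattern directly; `region_name` is a hypothetical helper, and the naming used when no prefix is given is not covered here because it is not specified above.

```python
def region_name(prefix: str, chrom: str, start: int, stop: int, strand: str) -> str:
    """Format a name following {prefix}_{chrom}:{start}-{stop}({strand})."""
    return f"{prefix}_{chrom}:{start}-{stop}({strand})"


coords = [
    ("chr1", 1000, 1010, "+"),
    ("chr2", 5000, 5010, "-"),
    ("chr3", 200, 210, "+"),
]
names = [region_name("enh", *coord) for coord in coords]
for name in names:
    print(name)
# enh_chr1:1000-1010(+)
# enh_chr2:5000-5010(-)
# enh_chr3:200-210(+)
```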

Inserting into a named region

Provide pool and region to place extracted sequences inside a fixed flanking context — useful for tiling genomic sequences into a library vector.

vector = pp.from_seq("GCGCGC<insert>XXXXXXXXXX</insert>GCGCGC")
coords = [("chr1", 1000, 1010, "+"), ("chr1", 2000, 2010, "+")]
pool   = pp.from_fasta("genome.fa", coordinates=coords,
                       pool=vector, region="insert")
pool.print_library()
pool: seq_length=22, num_states=2
GCGCGC<insert>ATCGATCGAT</insert>GCGCGC
GCGCGC<insert>TTGGAACCTA</insert>GCGCGC
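Conceptually, region replacement swaps the content between a pair of region tags and leaves the flanks alone. The following is a rough sketch of that idea using a regular expression on a plain string; `replace_region` and its `remove_tags` flag are hypothetical stand-ins that mimic the documented behaviour, not poolparty's implementation.

```python
import re


def replace_region(template: str, region: str, insert: str,
                   remove_tags: bool = False) -> str:
    """Replace the content of <region>...</region> in template with insert."""
    tag = re.escape(region)
    pattern = re.compile(rf"<{tag}>.*?</{tag}>", re.DOTALL)
    if pattern.search(template) is None:
        raise ValueError(f"region {region!r} not found in template")
    replacement = insert if remove_tags else f"<{region}>{insert}</{region}>"
    # A callable replacement avoids backslash interpretation in the insert.
    return pattern.sub(lambda _match: replacement, template, count=1)


vector = "GCGCGC<insert>XXXXXXXXXX</insert>GCGCGC"
tagged = replace_region(vector, "insert", "ATCGATCGAT")
untagged = replace_region(vector, "insert", "ATCGATCGAT", remove_tags=True)
print(tagged)         # GCGCGC<insert>ATCGATCGAT</insert>GCGCGC
print(untagged)       # GCGCGCATCGATCGATGCGCGC
print(len(untagged))  # 6 + 10 + 6 -> 22
```

The untagged length of 22 agrees with the seq_length reported in the output above, which suggests that region tags do not count toward sequence length.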

See from_fasta().