from_fasta
Extract one or more genomic regions from a FASTA file and create a pool.
Each region is a (chrom, start, stop, strand) tuple whose coordinates form a
0-based, half-open interval [start, stop). A single tuple gives a fixed
pool; a list of tuples gives a sequential pool that iterates through all
extracted sequences.
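As a rough illustration of the coordinate convention (plain Python on a made-up sequence, not poolparty's own code), a 0-based half-open interval maps directly onto string slicing, and the extracted length is always stop - start. The toy sequence and the helper name `extract` are assumptions made for this sketch.

```python
# Toy chromosome; in practice the sequence would come from the FASTA file.
chrom_seq = "ACGTACGTAC"

def extract(seq, start, stop):
    """Hypothetical helper: return the 0-based half-open interval [start, stop)."""
    return seq[start:stop]

region = extract(chrom_seq, 2, 6)
print(region)       # GTAC
print(len(region))  # 4, i.e. stop - start
```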
import poolparty as pp
pp.init()
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
|  |  | (required) | Path to the FASTA file. Indexed via |
|  |  | (required) | A single tuple |
|  |  |  | Background pool or sequence string. When provided with |
|  |  |  | Region to replace in |
|  |  |  | If |
|  |  |  | Prefix for auto-generated sequence names. Names follow |
|  |  |  | Display style applied to every extracted sequence. |
|  |  |  | Enumeration order when combined with other pools. |
|  |  |  | Design card columns to include in library output. |
Note
For circular genomes, start > stop indicates wrap-around across
the origin — the extracted sequence runs from start to the end of
the chromosome and continues from the beginning to stop.
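The wrap-around rule can be pictured with a small sketch in plain Python. This is only an illustration of the behaviour described in the note, under the assumption that the whole chromosome is available as a string; `extract_circular` and the toy plasmid are hypothetical names, not part of poolparty.

```python
def extract_circular(seq, start, stop):
    """Hypothetical helper: half-open extraction on a circular sequence."""
    if start > stop:
        # Wrap-around: from start to the end, then from the beginning to stop.
        return seq[start:] + seq[:stop]
    return seq[start:stop]

plasmid = "AACCGGTTAC"
print(extract_circular(plasmid, 2, 5))  # CCG (ordinary interval)
print(extract_circular(plasmid, 8, 3))  # ACAAC (crosses the origin)
```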
Note
Only the most commonly used parameters are shown above. For the full
parameter list, see from_fasta() in the
API Reference.
Examples
Single region, forward strand
Extract 20 bases from chromosome 1, forward strand.
pool = pp.from_fasta(
    "genome.fa",
    coordinates=("chr1", 1000, 1020, "+"),
)
pool.print_library()
Note
Output above is illustrative — actual sequences depend on the content of the FASTA file.
Reverse-strand extraction
strand='-' automatically returns the reverse complement of the interval
— use this for genes encoded on the minus strand.
pool = pp.from_fasta(
    "genome.fa",
    coordinates=("chr2", 5000, 5010, "-"),
)
pool.print_library()
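For readers who want to see what minus-strand extraction amounts to, here is a minimal sketch in plain Python: slice the interval on the forward strand, then reverse-complement it. The helper names and the toy sequence are assumptions for illustration, and only unambiguous A/C/G/T bases are handled.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_stranded(seq, start, stop, strand):
    """Hypothetical helper: slice [start, stop) and honour the strand flag."""
    sub = seq[start:stop]
    return reverse_complement(sub) if strand == "-" else sub

chrom_seq = "AACCGGTTAC"
print(extract_stranded(chrom_seq, 0, 4, "+"))  # AACC
print(extract_stranded(chrom_seq, 0, 4, "-"))  # GGTT
```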
Multiple coordinates — sequential pool
A list of tuples creates a sequential pool. Each sequence is named
automatically; prefix prepends a custom label.
coords = [
    ("chr1", 1000, 1010, "+"),
    ("chr2", 5000, 5010, "-"),
    ("chr3", 200, 210, "+"),
]
pool = pp.from_fasta("genome.fa", coordinates=coords, prefix="enh")
pool.print_library()
CGTAGCTAGC
GCATGCATGC
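The difference between passing one tuple and passing a list can be thought of as a normalization step followed by extraction in the given order. The sketch below is plain Python over a toy in-memory genome; `normalize_coordinates`, `extract_all`, and the `genome` dictionary are hypothetical, and only forward-strand regions are handled to keep it short.

```python
def normalize_coordinates(coordinates):
    """Hypothetical helper: wrap a single (chrom, start, stop, strand) tuple in a list."""
    if isinstance(coordinates, tuple):
        return [coordinates]
    return list(coordinates)

# Toy genome standing in for an indexed FASTA file.
genome = {"chr1": "ACGTACGTAC", "chr2": "TTTTGGGGCC"}

def extract_all(coordinates):
    """Return forward-strand slices in the order the coordinates were given."""
    return [
        genome[chrom][start:stop]
        for chrom, start, stop, _strand in normalize_coordinates(coordinates)
    ]

print(extract_all(("chr1", 0, 4, "+")))                         # ['ACGT']
print(extract_all([("chr1", 0, 4, "+"), ("chr2", 4, 8, "+")]))  # ['ACGT', 'GGGG']
```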
Inserting into a named region
Provide pool and region to place extracted sequences inside a fixed
flanking context — useful for tiling genomic sequences into a library vector.
vector = pp.from_seq("GCGCGC<insert>XXXXXXXXXX</insert>GCGCGC")
coords = [("chr1", 1000, 1010, "+"), ("chr1", 2000, 2010, "+")]
pool = pp.from_fasta(
    "genome.fa",
    coordinates=coords,
    pool=vector,
    region="insert",
)
pool.print_library()
GCGCGC<insert>TTGGAACCTA</insert>GCGCGC
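Conceptually, each extracted sequence replaces the placeholder between the region tags while the flanks stay fixed. The following sketch mimics that with a regular expression on the tagged string from the example; it is not how poolparty implements regions, `fill_region` is a hypothetical helper, and the second insert sequence is made up.

```python
import re

def fill_region(template, region, insert_seq):
    """Hypothetical helper: replace the content between <region>...</region> tags."""
    name = re.escape(region)
    pattern = re.compile(rf"(<{name}>)(.*?)(</{name}>)")
    # Keep the opening and closing tags, swap only the content between them.
    filled, count = pattern.subn(
        lambda m: m.group(1) + insert_seq + m.group(3), template, count=1
    )
    if count == 0:
        raise ValueError(f"region {region!r} not found in template")
    return filled

vector = "GCGCGC<insert>XXXXXXXXXX</insert>GCGCGC"
for seq in ["TTGGAACCTA", "ACGTACGTAC"]:
    print(fill_region(vector, "insert", seq))
```

Each printed line keeps the GCGCGC flanks and the region tags, with only the tagged content changing from one sequence to the next.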
See from_fasta().