Seed Readers
Seed readers are engine-side adapters that turn a configured seed source into tabular seed rows. The engine attaches a SeedSource and secret resolver, asks the reader for column names and dataset size, then streams batches into generation.
Related pages: seeds, Seed Datasets, and Build Your Own.
Core Contracts
SeedReader
Bases: ABC, Generic[SourceT]
Base class for reading a seed dataset.
Seeds are read using duckdb. Reader implementations define duckdb connection setup details
and how to get a URI that can be queried with duckdb (i.e. "... FROM
The Data Designer engine automatically supplies the appropriate SeedSource
and a SecretResolver to use for any secret fields in the config via
attach(...). Subclasses that need per-attachment setup can override
on_attach(...) without needing to call super().
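The attach/on_attach relationship described above can be pictured as a small template-method pattern. The sketch below is a self-contained illustration, not the library's code: `SketchSeedReader`, `CachingReader`, the dict-shaped source, and the secret resolver modelled as a plain callable are all assumptions made for the example.

```python
from abc import ABC
from typing import Callable, Generic, Optional, TypeVar

SourceT = TypeVar("SourceT")


class SketchSeedReader(ABC, Generic[SourceT]):
    """Hypothetical stand-in for SeedReader; not the library's implementation."""

    def __init__(self) -> None:
        # Nothing is required at construction time; the engine attaches later.
        self._source: Optional[SourceT] = None
        self._resolve_secret: Optional[Callable[[str], str]] = None

    def attach(self, source: SourceT, secret_resolver: Callable[[str], str]) -> None:
        # Store the collaborators, then give subclasses a chance to react.
        self._source = source
        self._resolve_secret = secret_resolver
        self.on_attach()

    def on_attach(self) -> None:
        # The default hook is a no-op, so overrides need not call super().
        pass


class CachingReader(SketchSeedReader[dict]):
    """Hypothetical subclass that needs per-attachment setup."""

    def on_attach(self) -> None:
        # Reset state derived from any previously attached source.
        self.column_cache = None
        self.token = self._resolve_secret(self._source["token_name"])


reader = CachingReader()
reader.attach({"token_name": "HF_TOKEN"}, lambda name: f"<secret for {name}>")
print(reader.token)
```

Because the base hook does nothing, a subclass can override it freely, and the constructor stays free of source and secret arguments.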
Methods:
| Name | Description |
|---|---|
| attach | Attach a source and secret resolver to the instance. |
| create_filesystem_context | Create a rooted filesystem context for directory-backed seed readers. |
| get_column_names | Returns the seed dataset's column names. |
| get_seed_type | Return the seed_type of the source class this reader is generic over. |
| on_attach | Hook for subclasses that need per-attachment setup. |
attach(source, secret_resolver)
Attach a source and secret resolver to the instance.
This is called internally by the engine so that these objects do not need to be provided in the reader's constructor.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
create_filesystem_context(root_path)
Create a rooted filesystem context for directory-backed seed readers.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
get_column_names()
Returns the seed dataset's column names.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
get_seed_type()
Return the seed_type of the source class this reader is generic over.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
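One common way for a generic class to discover the type it was parameterized with is to inspect `__orig_bases__` using `typing.get_origin` and `typing.get_args`. The sketch below shows that general technique only. Whether the library resolves `seed_type` this way is not stated here, and `SketchReader`, `LocalSource`, and the class-level `seed_type` attribute are hypothetical names chosen for the example.

```python
from typing import Generic, TypeVar, get_args, get_origin

SourceT = TypeVar("SourceT")


class LocalSource:
    """Hypothetical source class carrying a seed_type marker."""

    seed_type = "local"


class SketchReader(Generic[SourceT]):
    """Hypothetical generic reader; not the library's SeedReader."""

    @classmethod
    def get_seed_type(cls) -> str:
        # Walk the MRO and look for a parameterized SketchReader[...] base.
        for klass in cls.__mro__:
            for base in getattr(klass, "__orig_bases__", ()):
                origin = get_origin(base)
                if isinstance(origin, type) and issubclass(origin, SketchReader):
                    for arg in get_args(base):
                        # Skip unresolved TypeVars; accept concrete source classes.
                        if isinstance(arg, type) and hasattr(arg, "seed_type"):
                            return arg.seed_type
        raise TypeError(f"{cls.__name__} is not bound to a concrete source type")


class LocalReader(SketchReader[LocalSource]):
    pass


print(LocalReader.get_seed_type())
```

An unparameterized class such as `SketchReader` itself raises `TypeError` in this sketch, since there is no concrete source type to report.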
on_attach()
Hook for subclasses that need per-attachment setup.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
FileSystemSeedReader
Bases: SeedReader[FileSystemSourceT], ABC
Base class for filesystem-derived seed readers.
Plugin authors implement build_manifest(...) to describe the cheap logical
rows available under the configured filesystem root. Readers that need
expensive enrichment can optionally override hydrate_row(...) to emit one
record dict or an iterable of record dicts per manifest row. When emitted
records change the manifest schema, output_columns must declare the exact
hydrated output schema for each emitted record. The framework owns
attachment-scoped filesystem context reuse, manifest sampling, partitioning,
randomization, batching, and DuckDB registration details.
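The split between a cheap manifest and optional hydration can be illustrated with plain functions. This is a minimal sketch under stated assumptions: the free-function form, the signatures, the `OUTPUT_COLUMNS` constant, and the column names are hypothetical and do not reflect the actual `build_manifest(...)`, `hydrate_row(...)`, or `output_columns` interfaces.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def build_manifest(root: Path) -> List[Row]:
    # Cheap pass: list files and metadata only, without reading file contents.
    return [
        {"relative_path": p.relative_to(root).as_posix(), "size_bytes": p.stat().st_size}
        for p in sorted(root.rglob("*.txt"))
    ]


def hydrate_row(root: Path, manifest_row: Row) -> Iterable[Row]:
    # Expensive pass: read the file and emit one record per non-empty line.
    text = (root / manifest_row["relative_path"]).read_text(encoding="utf-8")
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            yield {
                "relative_path": manifest_row["relative_path"],
                "line_number": line_number,
                "text": line,
            }


# Hydration changes the schema here (size_bytes is dropped, two columns are
# added), so the exact output columns are declared up front.
OUTPUT_COLUMNS = ["relative_path", "line_number", "text"]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("alpha\n\nbeta\n", encoding="utf-8")
    (root / "b.txt").write_text("gamma\n", encoding="utf-8")
    manifest = build_manifest(root)
    records = [rec for row in manifest for rec in hydrate_row(root, row)]

print(len(manifest), len(records))
```

In this toy setup one manifest row fans out into several records, which is the case where a declared output schema matters most.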
SeedReaderFileSystemContext
Filesystem and root path available to filesystem seed-reader plugins.
SeedReaderBatch
Bases: Protocol
Batch object returned by seed readers and convertible to a DataFrame.
SeedReaderBatchReader
Bases: Protocol
Reader that yields seed batches until exhausted.
PandasSeedReaderBatch
Seed-reader batch backed by an in-memory pandas DataFrame.
Methods:
| Name | Description |
|---|---|
| to_pandas | Return the batch as a pandas DataFrame. |
to_pandas()
Return the batch as a pandas DataFrame.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
create_seed_reader_output_dataframe
Create a DataFrame and verify hydrated records match the declared output schema.
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
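The schema check that this function is described as performing can be approximated in a few lines. The sketch below is an assumption-laden illustration, not the library's implementation: `check_output_schema` is a hypothetical helper, it raises a plain `ValueError` rather than any library error type, and it stops short of building the DataFrame so that it depends only on the standard library.

```python
from typing import Any, Dict, List, Sequence


def check_output_schema(
    records: Sequence[Dict[str, Any]], output_columns: Sequence[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: verify every record has exactly the declared columns."""
    expected = set(output_columns)
    for index, record in enumerate(records):
        actual = set(record)
        if actual != expected:
            missing = sorted(expected - actual)
            extra = sorted(actual - expected)
            raise ValueError(
                f"record {index} does not match declared columns: "
                f"missing={missing}, extra={extra}"
            )
    # Reorder keys so every record follows the declared column order.
    return [{name: record[name] for name in output_columns} for record in records]


rows = check_output_schema(
    [{"text": "alpha", "path": "a.txt"}, {"path": "b.txt", "text": "gamma"}],
    output_columns=["path", "text"],
)
print(rows)
```

Failing fast on the first mismatched record keeps the error close to the hydration step that produced it. The validated rows could then be handed to a DataFrame constructor.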
Built-In Readers
LocalFileSeedReader
Bases: SeedReader[LocalFileSeedSource]
HuggingFaceSeedReader
Bases: SeedReader[HuggingFaceSeedSource]
DataFrameSeedReader
Bases: SeedReader[DataFrameSeedSource]
DirectorySeedReader
Bases: FileSystemSeedReader[DirectorySeedSource]
FileContentsSeedReader
Bases: FileSystemSeedReader[FileContentsSeedSource]
AgentRolloutSeedReader
Bases: FileSystemSeedReader[AgentRolloutSeedSource]
Registry and Errors
SeedReaderRegistry
Source code in packages/data-designer-engine/src/data_designer/engine/resources/seed_reader.py
SeedReaderError
Bases: DataDesignerError