0.0.4 • Published 7 months ago
@cli-upkaran/dataprep-core
Core data preparation logic, types, and interfaces for `cli-upkaran`.
This package provides the shared foundation for the data preparation pipelines used by various `cli-upkaran` commands. It defines the structure for adapters (data sources) and transformers (data modifiers).
Features
- Defines core data structures (e.g., `DataSource`, `Document`).
- Defines interfaces for Adapters and Transformers.
- Provides utilities for common tasks like text splitting, whitespace removal, and token counting (using `js-tiktoken`).
- Orchestrates the data preparation pipeline: Adapters -> Transformers -> Formatters.
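The text utilities listed above might look roughly like the following. This is a minimal sketch, not the package's exported API: `normalizeWhitespace` and `splitText` are hypothetical names, and this splitter measures chunk size in characters rather than tokens.

```typescript
// Hypothetical helper: collapse runs of whitespace (spaces, tabs, newlines)
// into single spaces and trim both ends.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical helper: split text into chunks of at most `chunkSize` characters.
// Consecutive chunks share `overlap` characters, so each chunk starts
// `chunkSize - overlap` characters after the previous one.
function splitText(text: string, chunkSize: number, overlap = 0): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end, so no trailing chunk is a pure overlap.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `splitText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. To size chunks by tokens instead of characters, a counter built on `js-tiktoken` (for instance `getEncoding("cl100k_base").encode(text).length`) could replace `text.length` as the measure; how the package itself wires this up is not shown here.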
Installation
This package is intended as a dependency for `cli-upkaran` commands and adapters/transformers.

```sh
pnpm add @cli-upkaran/dataprep-core
```
Concepts
- Adapters: Responsible for fetching raw data from a source (e.g., filesystem, website, database) and converting it into a stream of `DataSource` objects.
- Transformers: Modify the content or metadata of `Document` objects within `DataSource`s (e.g., chunking, cleaning, metadata extraction).
- Formatters: Take the final processed `DataSource` objects and format them for output (e.g., Markdown, JSON).
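The three roles above can be sketched with a few small interfaces. The shapes below are assumptions for illustration, not the package's actual `DataSource`/`Document` definitions, and `inMemoryAdapter`, `trimTransformer`, `markdownFormatter`, and `runPipeline` are hypothetical names. To keep the sketch short, the adapter yields a synchronous `Iterable`.

```typescript
// Make this file a module so the local Document/Transformer types below
// don't merge with the DOM's global interfaces of the same name.
export {};

// Hypothetical shapes for illustration; the package's real types may differ.
interface Document {
  content: string;
  metadata: Record<string, unknown>;
}

interface DataSource {
  id: string;
  documents: Document[];
}

// Adapter: yields DataSource objects from some origin (files, URLs, ...).
interface Adapter {
  fetch(): Iterable<DataSource>;
}

// Transformer: returns a DataSource whose Documents have been modified.
interface Transformer {
  transform(source: DataSource): DataSource;
}

// Formatter: renders the processed DataSources as output text.
interface Formatter {
  format(sources: DataSource[]): string;
}

// Hypothetical toy adapter that yields in-memory sources instead of reading real data.
function inMemoryAdapter(sources: DataSource[]): Adapter {
  return {
    *fetch() {
      yield* sources;
    },
  };
}

// Hypothetical toy transformer: trims each Document and records its length in metadata.
const trimTransformer: Transformer = {
  transform: (source) => ({
    ...source,
    documents: source.documents.map((doc) => {
      const content = doc.content.trim();
      return { content, metadata: { ...doc.metadata, length: content.length } };
    }),
  }),
};

// Hypothetical toy formatter: one Markdown section per DataSource.
const markdownFormatter: Formatter = {
  format: (sources) =>
    sources
      .map((s) => `## ${s.id}\n\n${s.documents.map((d) => d.content).join("\n\n")}`)
      .join("\n\n"),
};

// Adapters -> Transformers -> Formatter, with transformers applied in list order.
function runPipeline(
  adapters: Adapter[],
  transformers: Transformer[],
  formatter: Formatter,
): string {
  const processed: DataSource[] = [];
  for (const adapter of adapters) {
    for (const source of adapter.fetch()) {
      processed.push(transformers.reduce((acc, t) => t.transform(acc), source));
    }
  }
  return formatter.format(processed);
}
```

For example, `runPipeline([inMemoryAdapter(sources)], [trimTransformer], markdownFormatter)` returns one `## <id>` Markdown section per source. An adapter that reads files or fetches web pages would more naturally return an `AsyncIterable<DataSource>` consumed with `for await`, but the Adapters -> Transformers -> Formatters ordering stays the same.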
Usage
Command plugins interact with this core library to execute data preparation pipelines.
```typescript
import {
  runDataPrepPipeline,
  type DataPrepAdapterOptions,
  // ... other types
} from '@cli-upkaran/dataprep-core';

// Hypothetical options shape for a command plugin; adjust to what your command accepts.
interface MyDataCommandOptions {
  outputFormat: string;
  outputFile?: string;
}

async function runMyDataCommand(options: MyDataCommandOptions): Promise<void> {
  // Configure adapter(s)
  const adapterOptions: DataPrepAdapterOptions = {
    adapterType: 'filesystem', // or 'website', etc.
    // ... adapter-specific config from options ...
  };

  // Configure transformer(s)
  const transformerOptions = { /* ... */ };

  // Configure formatter
  const formatterOptions = { format: options.outputFormat };

  // Execute the pipeline: adapters -> transformers -> formatter
  await runDataPrepPipeline({
    adapterConfigs: [adapterOptions],
    transformerConfigs: [transformerOptions],
    formatterConfig: formatterOptions,
    outputFile: options.outputFile,
  });
}
```
(Note: The exact API (`runDataPrepPipeline`) is illustrative and may differ from the actual implementation.)
Contributing
See the main CONTRIBUTING.md in the root of the repository.
License
MIT - See the main LICENSE file in the root of the repository.