1.0.0 • Published 7 months ago
@docshield/modern-migration v1.0.0
Modern Migration
Modern Migration is a Node.js package that transforms unstructured data into structured data using LLMs, with a human review step. It is designed for data migration scenarios where values must be standardized under human oversight.
Features
- LLM-powered transformations with Claude 3.5 Sonnet
- Interactive review interface for human verification
- Batch processing with configurable sizes
- Rate limiting to respect API constraints
- Type safety with comprehensive TypeScript types
- ID tracking to maintain relationships with source records
- Real-time progress tracking
- Audit trail of all transformations
Table of Contents
- Installation
- Quick Start
- Core Concepts
- Configuration
- Examples
- API Reference
- Development
- Contributing
- License
Installation
npm install modern-migration
Prerequisites
- Node.js >= 18.0.0
- An Anthropic API key for Claude access
Quick Start
import { ModernMigration } from 'modern-migration'

// 1. Create a migration instance
const migration = new ModernMigration({
  migrationName: 'state-codes',
  options: ['AL', 'AK', 'AZ', 'AR', 'CA' /* ... other codes */],
  prompt: 'Convert state name to two-letter code',
  confidenceThreshold: 0.95, // Optional, defaults to 0.95
  apiKey: process.env.ANTHROPIC_API_KEY,
})

// 2. Prepare your data with IDs
const items = [
  { id: '1', value: 'California' },
  { id: '2', value: 'New York' },
  { id: '3', value: 'Texas' },
]

// 3. Transform the values
const results = await migration.transform(items)
// Results:
// [
//   { id: '1', original: 'California', transformed: 'CA' },
//   { id: '2', original: 'New York', transformed: 'NY' },
//   { id: '3', original: 'Texas', transformed: 'TX' }
// ]

// 4. Use the results to update your database
// (`db` is assumed to be an already-connected MongoDB database handle)
await db.collection('states').bulkWrite(
  results.map(({ id, transformed }) => ({
    updateOne: {
      filter: { _id: id },
      update: { $set: { stateCode: transformed } },
    },
  })),
)
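If the source records live in memory rather than in a database, the same ID-based join can be done by hand. The following is a minimal sketch, not part of the package: `applyResults` and `SourceRecord` are hypothetical names, and the result shape is taken from the Results comment in the Quick Start.

```typescript
// Shape of one result, as shown in the Quick Start "Results" comment.
interface TransformResult {
  id: string
  original: string
  transformed: string
}

// Hypothetical source record; only `id` is needed for the join.
interface SourceRecord {
  id: string
  [field: string]: unknown
}

// Attach each transformed value to its source record by ID.
// Records without a matching result are returned unchanged.
function applyResults(
  records: SourceRecord[],
  results: TransformResult[],
  targetField: string,
): SourceRecord[] {
  const byId = new Map<string, string>()
  for (const result of results) {
    byId.set(result.id, result.transformed)
  }
  return records.map((record) => {
    const value = byId.get(record.id)
    return value === undefined ? record : { ...record, [targetField]: value }
  })
}

const updated = applyResults(
  [
    { id: '1', state: 'California' },
    { id: '9', state: 'Unknown' },
  ],
  [{ id: '1', original: 'California', transformed: 'CA' }],
  'stateCode',
)
console.log(updated)
```

Building the lookup as a `Map` keeps the join linear in the number of records, which matters once a migration covers more than a handful of rows.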
Core Concepts
Transform Pipeline
- Input with IDs: Provide an array of objects with id and value properties
- Batch Processing: Values are processed in configurable batches
- LLM Transformation: Claude processes each value with the provided prompt
- ID Association: Results are associated with original input IDs
- Review Interface: A web interface opens for human review of transformations
- Output: Returns results with original IDs, input values, and transformed values
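The batch-processing step in the pipeline above can be pictured as a simple chunking pass over the input items. This is an illustrative sketch only: `toBatches` is a hypothetical helper, and the package's internal batching may differ.

```typescript
// Split a list into consecutive batches of at most `batchSize` items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer')
  }
  const batches: T[][] = []
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize))
  }
  return batches
}

// 25 items with IDs, in the { id, value } shape the pipeline expects.
const sampleItems = Array.from({ length: 25 }, (_, i) => ({
  id: String(i + 1),
  value: `value-${i + 1}`,
}))

const batches = toBatches(sampleItems, 10)
console.log(batches.map((batch) => batch.length)) // [ 10, 10, 5 ]
```

Because every item keeps its `id` inside its batch, results can be matched back to the original input regardless of which batch they were processed in.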
Review Process
- The review interface automatically opens in your browser (default port: 3001)
- Review each batch of transformations
- Approve the batch when satisfied
- After all batches are approved, the transform method resolves with final results
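The "resolve only after every batch is approved" rule can be modelled with a small tracker. This is a sketch under stated assumptions: `ReviewTracker` is a hypothetical class for illustration, not the package's own review state.

```typescript
// Tracks which review batches have been approved.
class ReviewTracker {
  private readonly totalBatches: number
  private readonly approved = new Set<number>()

  constructor(totalBatches: number) {
    this.totalBatches = totalBatches
  }

  // Mark one batch (0-based index) as approved; approving twice is harmless.
  approve(batchIndex: number): void {
    if (!Number.isInteger(batchIndex) || batchIndex < 0 || batchIndex >= this.totalBatches) {
      throw new RangeError(`No such batch: ${batchIndex}`)
    }
    this.approved.add(batchIndex)
  }

  // Number of batches still waiting for approval.
  get pending(): number {
    return this.totalBatches - this.approved.size
  }

  // True once every batch has been approved.
  get isComplete(): boolean {
    return this.pending === 0
  }
}

const tracker = new ReviewTracker(3)
tracker.approve(0)
tracker.approve(1)
tracker.approve(1) // repeated approval does not double-count
const completeBeforeLast = tracker.isComplete
tracker.approve(2)
const completeAfterLast = tracker.isComplete
console.log({ completeBeforeLast, completeAfterLast })
```

In a real implementation the promise returned by transform would be resolved at the moment `isComplete` first becomes true.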
Configuration
Basic Configuration
interface TransformConfig {
  migrationName: string // Unique identifier for this migration
  options: string[] // Valid output options
  prompt: string // LLM prompt for transformation
  confidenceThreshold?: number // Default: 0.95
  apiKey?: string // Anthropic API key
}
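Before constructing a migration it can help to check a config object and fill in the documented default. The sketch below is hypothetical (`normalizeConfig` is not a package export), and the 0 to 1 range for `confidenceThreshold` is an assumption inferred from the example values 0.95 and 0.92.

```typescript
// Copied from the Basic Configuration interface above.
interface TransformConfig {
  migrationName: string
  options: string[]
  prompt: string
  confidenceThreshold?: number
  apiKey?: string
}

// Documented default for confidenceThreshold.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.95

// Validate a config and apply the default threshold.
// Assumption: thresholds are probabilities in the range [0, 1].
function normalizeConfig(
  config: TransformConfig,
): TransformConfig & { confidenceThreshold: number } {
  if (config.migrationName.trim() === '') {
    throw new Error('migrationName must not be empty')
  }
  if (config.options.length === 0) {
    throw new Error('options must list at least one valid output')
  }
  const confidenceThreshold = config.confidenceThreshold ?? DEFAULT_CONFIDENCE_THRESHOLD
  if (!(confidenceThreshold >= 0 && confidenceThreshold <= 1)) {
    throw new RangeError('confidenceThreshold must be between 0 and 1')
  }
  return { ...config, confidenceThreshold }
}

const normalized = normalizeConfig({
  migrationName: 'state-codes',
  options: ['AL', 'AK', 'AZ'],
  prompt: 'Convert state name to two-letter code',
})
console.log(normalized.confidenceThreshold) // 0.95
```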
Advanced Configuration
Create a modernmigration.config.js file:
module.exports = {
  outputDir: './migration-output',
  rateLimit: {
    requestsPerMinute: 50, // API rate limit
    retryAttempts: 3, // Retry attempts on failure
  },
  review: {
    port: 3001, // Web interface port
    reviewBatchSize: 10, // Number of items to show in review UI
    transformBatchSize: 10, // Number of items in each LLM request
  },
}
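The rate limit and batch size together put a floor on how long a migration takes. The sketch below works through that arithmetic; it assumes requests are spaced evenly across each minute and that each transform batch is one LLM request, and the helper names are hypothetical.

```typescript
// Minimum gap between requests so that `requestsPerMinute` is never exceeded.
function minIntervalMs(requestsPerMinute: number): number {
  if (!(requestsPerMinute > 0)) {
    throw new RangeError('requestsPerMinute must be positive')
  }
  return Math.ceil(60000 / requestsPerMinute)
}

// Lower bound on total time spent waiting on the rate limiter.
// The first request starts immediately; each later one waits one interval.
function estimateMinDurationMs(
  totalItems: number,
  transformBatchSize: number,
  requestsPerMinute: number,
): number {
  const requests = Math.ceil(totalItems / transformBatchSize)
  return Math.max(0, requests - 1) * minIntervalMs(requestsPerMinute)
}

// With the values from the config above: 50 requests/minute, 10 items/request.
console.log(minIntervalMs(50)) // 1200
console.log(estimateMinDurationMs(1000, 10, 50)) // 118800
```

For 1,000 items this gives 100 requests and roughly two minutes of enforced spacing, before LLM latency, retries, and human review time are counted.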
Examples
The project includes several examples demonstrating common use cases:
Running Examples
- Clone the repository:
git clone https://github.com/yourusername/modern-migration.git
cd modern-migration
- Install dependencies:
npm install
- Create a .env file:
ANTHROPIC_API_KEY=your-api-key-here
- Run examples:
# Run the basic state codes example
npm run example:state-codes
# Run other specialized examples
npm run example:company-names
npm run example:product-categorization
npm run example:address-parser
npm run example:transaction-categorizer
# Or run the default example with instructions
npm run examples
Available Examples
Modern Migration includes examples for various real-world use cases:
- State Code Transformation: Convert state names to two-letter codes
- Company Name Standardization: Normalize company names to official versions
- Product Categorization: Assign taxonomy codes to product descriptions
- Address Parsing: Parse and classify address formats
- Transaction Categorization: Categorize financial transactions
Example: State Code Transformation
import { ModernMigration } from 'modern-migration'

const migration = new ModernMigration({
  migrationName: 'state-codes',
  options: ['AL', 'AK', 'AZ' /* ... */],
  prompt: 'Convert state name to two-letter code',
  confidenceThreshold: 0.95,
  apiKey: process.env.ANTHROPIC_API_KEY,
})

const stateNames = [
  'New York',
  'California',
  'Mass.',
  'Fla.',
  'Washington, D.C.', // Will be flagged for review
]

// transform() takes { id, value } items (see Quick Start), so give each name an ID
const items = stateNames.map((value, index) => ({ id: String(index + 1), value }))
const results = await migration.transform(items)
Example: Company Name Standardization
import { ModernMigration } from 'modern-migration'

const migration = new ModernMigration({
  migrationName: 'company-names',
  options: ['Apple', 'Microsoft', 'Google', 'IBM', 'Other' /* ... */],
  prompt: 'Standardize company names to their official names',
  confidenceThreshold: 0.92,
  apiKey: process.env.ANTHROPIC_API_KEY,
})

const companyNames = [
  'Apple Inc.',
  'MSFT',
  'International Business Machines',
  'The Adobe Company',
]

// transform() takes { id, value } items (see Quick Start), so give each name an ID
const items = companyNames.map((value, index) => ({ id: String(index + 1), value }))
const results = await migration.transform(items)
More examples can be found in the examples directory.
Development
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Start development server
npm run dev
# Format code
npm run format
# Run linter
npm run lint
Project Structure
modern-migration/
├── src/
│   ├── config/      # Configuration management
│   ├── transform/   # Core transformation logic
│   ├── web/         # Web review interface
│   └── index.ts     # Main entry point
├── examples/        # Example implementations
├── tests/           # Test suite
└── dist/            # Compiled output
Contributing
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
Development Process
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the ISC License - see the LICENSE file for details.
Acknowledgments
- Anthropic for Claude API
- LangChain for LLM tooling
- All contributors who have helped with code, bug reports, and suggestions
Built with ❤️ using Claude