@rvanbaalen/pdf-renamer v1.0.0
PDF Renamer
A Node.js tool that reads the text content of invoice PDFs and renames them using a consistent, human-readable format:
yyyy-mm-dd - {Company name} - Invoice {invoice number}.pdf
Here, yyyy-mm-dd is the invoice date, {Company name} is the name of the company that issued the invoice, and {invoice number} is the invoice number.
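To illustrate the pattern, the hypothetical helper below assembles a filename from values that have already been extracted. The function name, its object parameter, and the replacement of "/" and ":" with "-" are assumptions made for this sketch, not documented pdf-renamer behavior.

```javascript
// Hypothetical helper (not part of pdf-renamer's API): builds the target filename
// from values that were already extracted from the invoice.
function buildInvoiceFilename({ date, company, invoiceNumber }) {
  // Assumption: "/" and ":" are replaced because they are problematic in macOS filenames.
  const clean = (value) => String(value).replace(/[\/:]/g, "-").trim();
  return `${clean(date)} - ${clean(company)} - Invoice ${clean(invoiceNumber)}.pdf`;
}

console.log(
  buildInvoiceFilename({ date: "2023-01-15", company: "Paddle.com", invoiceNumber: "12345-6789" })
);
// → 2023-01-15 - Paddle.com - Invoice 12345-6789.pdf
```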
Features
- Extracts the invoice date, issuing company, and invoice number from various PDF invoice types
- Renames files with a consistent naming pattern
- Processes single files or entire directories
- Extensible architecture with modular rules for different PDF types
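The single-file-or-directory behavior can be sketched as below. This is a minimal illustration, not pdf-renamer's code: the function name collectPdfPaths is hypothetical, and the non-recursive scan and case-insensitive ".pdf" check are assumptions. It is written as a CommonJS script so it runs with plain node.

```javascript
// Hypothetical sketch (not pdf-renamer's code): expand a CLI argument into the PDF paths to process.
const fs = require("node:fs");
const path = require("node:path");

function collectPdfPaths(target) {
  // A file argument is passed through unchanged.
  if (fs.statSync(target).isFile()) {
    return [target];
  }
  // Assumption: directories are scanned non-recursively and the .pdf check ignores case.
  return fs
    .readdirSync(target, { withFileTypes: true })
    .filter((entry) => entry.isFile() && entry.name.toLowerCase().endsWith(".pdf"))
    .map((entry) => path.join(target, entry.name))
    .sort();
}

// Example: list the PDFs in the directory given on the command line (default: ".").
console.log(collectPdfPaths(process.argv[2] ?? "."));
```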
Requirements
- Node.js v22 or higher
- macOS (uses the macOS-specific date command)
- pdftotext (can be installed via Homebrew: brew install poppler)
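Because pdftotext is an external dependency, a script can check for it before doing any work. The sketch below is a hypothetical pre-flight check, not something pdf-renamer is documented to do; isCommandAvailable is an illustrative name, and the check only confirms that the process can be started (the exit status is ignored).

```javascript
// Hypothetical pre-flight check (not part of pdf-renamer): confirms that an
// external command such as pdftotext can be started before processing files.
const { spawnSync } = require("node:child_process");

function isCommandAvailable(command, args = ["-v"]) {
  // spawnSync reports a missing executable through result.error (e.g. ENOENT)
  // instead of throwing, so only the error field is inspected here.
  const result = spawnSync(command, args, { stdio: "ignore" });
  return !result.error;
}

if (!isCommandAvailable("pdftotext")) {
  console.error("pdftotext not found; install it with: brew install poppler");
}
```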
Usage
You can run pdf-renamer without installing it by using npx:
Single file
npx @rvanbaalen/pdf-renamer /path/to/invoice.pdf
Multiple files
# Renames all PDF files in the current directory
npx @rvanbaalen/pdf-renamer .
# Renames all PDF files in the specified directory
npx @rvanbaalen/pdf-renamer /path/to/directory
Options
# Show help
npx @rvanbaalen/pdf-renamer --help
# Show version
npx @rvanbaalen/pdf-renamer --version
# List all available rule extractors (add-ons)
npx @rvanbaalen/pdf-renamer --addons
Installation
Install globally (optional)
If you prefer, you can install the tool globally:
npm install -g @rvanbaalen/pdf-renamer
Then use it without the npx prefix:
pdf-renamer /path/to/invoice.pdf
Supported PDF Types
PDF Renamer includes extractors for various invoice types. To see all available extractors:
npx @rvanbaalen/pdf-renamer --addons
Currently, PDF Renamer supports the following invoice types:
- Paddle.com invoices and remittance advice
- LanguageTooler invoices
- Stripe invoices
- Other formats (run --addons for the complete list)
More invoice types can be added by creating custom extractors.
Extending with Custom Rules
PDF Renamer uses a modular architecture that makes it easy to add support for new PDF types. Each PDF type has its own extractor class that handles the extraction of information from that specific format.
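One plausible way such a registry is used is to try each extractor in order and pick the first whose canHandle() returns true. The sketch below illustrates that idea only; the stub classes, their constructor taking already-extracted text, and findExtractor are hypothetical and are not taken from pdf-renamer's source.

```javascript
// Hypothetical sketch of extractor selection: try each registered extractor
// class in order and use the first one whose canHandle() returns true.
// The real pdf-renamer dispatch logic may differ.
class StubExtractor {
  constructor(content) {
    this.content = content; // text previously extracted from the PDF (assumption)
  }
}

class PaddleLikeExtractor extends StubExtractor {
  canHandle() {
    return this.content.includes("Paddle.com");
  }
}

class StripeLikeExtractor extends StubExtractor {
  canHandle() {
    return this.content.includes("Stripe");
  }
}

const EXTRACTORS = [PaddleLikeExtractor, StripeLikeExtractor];

function findExtractor(content) {
  for (const ExtractorClass of EXTRACTORS) {
    const extractor = new ExtractorClass(content);
    if (extractor.canHandle()) return extractor;
  }
  return null; // no rule matched, so the file would be left unrenamed
}

console.log(findExtractor("Invoice from Stripe, Inc.")?.constructor.name); // StripeLikeExtractor
```

Because the first match wins in this sketch, more specific extractors would need to come before more general ones in the list.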
Creating a Custom Extractor
- Create a new file in the rules/ directory, e.g., rules/MyCompanyExtractor.js
- Extend the BaseExtractor class and implement the required methods
- Add your extractor to the list in rules/index.js
Here's an example of a custom extractor:
/**
 * MyCompanyExtractor class
 *
 * Handles extraction for MyCompany PDF invoices
 */
import { BaseExtractor } from './BaseExtractor.js';
export class MyCompanyExtractor extends BaseExtractor {
  // Provide a description of what this extractor handles
  getDescription() {
    return 'Handles MyCompany invoice PDFs';
  }
  // Determine if this extractor can handle this PDF
  canHandle() {
    return this.content.includes('MyCompany') ||
      this.content.includes('specific text that identifies this PDF type');
  }
  // Extract the date from the PDF
  getDate() {
    const layoutContent = this.extractContentWithLayout();
    const dateMatch = layoutContent.match(/Invoice Date:\s*(.*)/);
    return dateMatch ? dateMatch[1].trim() : '';
  }
  // Specify the date format for conversion
  getDateFormat() {
    return '%B %d, %Y'; // For dates like "January 1, 2023"
  }
  // Get the prefix for the new filename
  getFilenamePrefix() {
    return 'MyCompany - ';
  }
  // Get the invoice details for the new filename
  getInvoiceDetails() {
    const invoiceNumber = this.getInvoiceNumber();
    return invoiceNumber ? `Invoice ${invoiceNumber}` : '';
  }
  // Helper method to extract invoice number
  getInvoiceNumber() {
    const layoutContent = this.extractContentWithLayout();
    const invoiceMatch = layoutContent.match(/Invoice #:\s*(.*)/);
    return invoiceMatch ? invoiceMatch[1].trim() : '';
  }
}
Registering Your Custom Extractor
After creating your extractor, add it to the list in rules/index.js:
import { MyCompanyExtractor } from './MyCompanyExtractor.js';
// Add to the EXTRACTORS array
export const EXTRACTORS = [
  // ...existing extractors
  MyCompanyExtractor,
];
BaseExtractor API
The BaseExtractor class provides the following methods:
- extractContent(): Extracts the text content from the PDF
- extractContentWithLayout(): Extracts the text content with layout preservation
- executeCommand(command): Executes a command and returns the result
- getDescription(): Returns a description of what this extractor handles
- canHandle(): Determines if this extractor can handle the given PDF
- getDate(): Gets the date from the PDF
- getDateFormat(): Gets the date format string for date conversion
- getFilenamePrefix(): Gets the prefix for the new filename
- getInvoiceDetails(): Gets the invoice details for the new filename
You must implement the last five methods (canHandle(), getDate(), getDateFormat(), getFilenamePrefix(), and getInvoiceDetails()) in your custom extractor.
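Given the macOS date requirement, the string returned by getDateFormat() is presumably used with the macOS date command to normalize the value from getDate() to yyyy-mm-dd. To show what that conversion does without depending on macOS, here is a portable JavaScript sketch. toIsoDate is a hypothetical name, it supports only the %B, %b, %d, %m, and %Y directives, and it is an illustrative stand-in rather than the tool's implementation.

```javascript
// Illustrative stand-in (not pdf-renamer's implementation): converts a date string to
// yyyy-mm-dd using a strftime-style format. Only %B, %b, %d, %m and %Y are supported.
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

function toIsoDate(input, format) {
  const directives = [];
  // Escape regex metacharacters in the literal parts of the format...
  const escaped = format.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // ...then turn each supported directive into a capture group, remembering its order.
  const pattern = escaped.replace(/%([BbdmY])/g, (_, directive) => {
    directives.push(directive);
    return directive === "B" || directive === "b" ? "([A-Za-z]+)" : "(\\d{1,4})";
  });
  const match = new RegExp(`^${pattern}$`).exec(input.trim());
  if (!match) return null;

  let year = 0;
  let month = 0;
  let day = 0;
  directives.forEach((directive, i) => {
    const value = match[i + 1];
    if (directive === "Y") year = Number(value);
    else if (directive === "m") month = Number(value);
    else if (directive === "d") day = Number(value);
    else {
      // %B needs the full month name, %b its three-letter abbreviation.
      const name = value.toLowerCase();
      const index = MONTHS.findIndex((m) =>
        directive === "B" ? m === name : name.length === 3 && m.startsWith(name));
      month = index + 1; // stays 0 when the name is unknown
    }
  });
  if (!year || month < 1 || month > 12 || day < 1 || day > 31) return null;
  const pad = (n) => String(n).padStart(2, "0");
  return `${year}-${pad(month)}-${pad(day)}`;
}

console.log(toIsoDate("January 1, 2023", "%B %d, %Y")); // 2023-01-01
console.log(toIsoDate("15/03/2024", "%d/%m/%Y")); // 2024-03-15
```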
Development
To contribute to the project:
# Clone the repository
git clone https://github.com/rvanbaalen/pdf-renamer.git
cd pdf-renamer
# Install dependencies
npm install
# Run the script locally
node pdf-renamer.js /path/to/your/invoice.pdf
Troubleshooting
If the script fails to rename a file, check the following:
- Make sure the PDF is a recognized invoice type (use --addons to see the supported types)
- Verify that pdftotext is installed and working correctly (brew install poppler)
- Check whether the PDF content is extractable (not scanned or image-based)
- Try adding a custom extractor for your specific PDF format
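Before running a new extractor on real files, it can help to check its matching logic against text copied from pdftotext output. The sketch below is one way to do that: FakeBaseExtractor is a hypothetical stand-in for BaseExtractor that serves canned text instead of reading a PDF, and the subclass repeats the date and invoice-number regular expressions from the MyCompanyExtractor example. The sample text is invented for illustration.

```javascript
// Hypothetical stand-in for BaseExtractor that serves canned text instead of
// running pdftotext, so the parsing logic can be checked without a real PDF.
class FakeBaseExtractor {
  constructor(text) {
    this.content = text;
  }
  extractContentWithLayout() {
    return this.content;
  }
}

// Same matching logic as the MyCompanyExtractor example, on top of the stub.
class MyCompanyExtractorUnderTest extends FakeBaseExtractor {
  canHandle() {
    return this.content.includes("MyCompany");
  }
  getDate() {
    const m = this.extractContentWithLayout().match(/Invoice Date:\s*(.*)/);
    return m ? m[1].trim() : "";
  }
  getInvoiceNumber() {
    const m = this.extractContentWithLayout().match(/Invoice #:\s*(.*)/);
    return m ? m[1].trim() : "";
  }
  getInvoiceDetails() {
    const n = this.getInvoiceNumber();
    return n ? `Invoice ${n}` : "";
  }
}

// Invented sample resembling layout-preserved pdftotext output.
const sample = [
  "MyCompany B.V.",
  "Invoice Date:   January 1, 2023",
  "Invoice #:      MC-0042",
].join("\n");

const ex = new MyCompanyExtractorUnderTest(sample);
console.log(ex.canHandle(), ex.getDate(), ex.getInvoiceDetails());
// → true January 1, 2023 Invoice MC-0042
```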
License
MIT