0.0.1 • Published 5 months ago

@jankhoj/maloomscan v0.0.1

License
Apache-2.0

maloomscan: Transcripts to Intelligent Notes

Note: This utility is currently in pre-release status. Features and interfaces may change before the final release.

maloomscan is a powerful command-line utility that transforms audio recordings into intelligent, context-enhanced notes. It uses AI to transcribe, classify, and enhance audio content, making it more useful and actionable.

How It Works

maloomscan processes each audio file through three distinct phases:

1. Locate Phase

The locate phase handles the initial setup for processing:

  • Extracts the creation time from the audio file metadata
  • Calculates a unique hash identifier for the file
  • Determines the appropriate output directory based on configured structure
  • Constructs the base filename for the output files
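
The hash and base-filename steps above can be sketched as follows. This is a minimal illustration, not maloomscan's actual implementation: the helper names `shortHash` and `baseFilename`, the choice of SHA-256, the 8-character length, and the underscore separator are all assumptions.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: a short content hash for an audio file's bytes.
// SHA-256 and the 8-character length are assumptions, not documented behavior.
function shortHash(contents: Uint8Array, length: number = 8): string {
  return createHash("sha256").update(contents).digest("hex").slice(0, length);
}

// Hypothetical helper: join whichever filename parts are enabled, hash last.
function baseFilename(parts: string[], hash: string): string {
  return parts.concat([hash]).join("_");
}

const sampleBytes = Buffer.from("abc"); // stand-in for real audio data
console.log(baseFilename(["2023-07-15", "143027", "weekly_standup"], shortHash(sampleBytes)));
```

Hashing the file contents rather than the path means a renamed or moved recording would still map to the same identifier.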

2. Classify Phase

The classify phase transforms audio into structured data:

  • Transcribes the audio file using OpenAI's Whisper model
  • Sends the transcription to an AI model with a "classifier" persona
  • Analyzes the content to identify the note type (meeting, email, call, etc.)
  • Extracts key metadata like subject, attendees, and sections
  • Stores the classification results as a JSON file for reference

3. Compose Phase

The compose phase creates the final intelligent note:

  • Takes the classified transcription from the previous phase
  • Selects type-specific instructions based on classification (meeting, email, etc.)
  • Applies a "you" persona to represent the speaker's voice
  • Generates a well-structured, enhanced markdown note
  • Formats the content according to the note type's requirements

Each phase builds on the previous one, gradually transforming raw audio into a useful, structured note. If you run in debug mode, you can examine the intermediate files created during each phase.

Features

  • Simple command-line interface
  • Support for multiple audio formats
  • Automatic content classification
  • Type-specific note formatting
  • Recursive directory processing
  • Configurable AI models

Installation

Requirements

  • Node.js
  • OpenAI API key (set in .env file)

Option 1: Install from npm (Recommended)

# Install globally
npm install -g @jafarisimran/maloomscan

# Create a .env file with your OpenAI API key in your working directory
echo "OPENAI_API_KEY=your-api-key" > .env

Option 2: Install from Source

# Clone the repository
git clone https://github.com/jafarisimran/maloomscan.git
cd maloomscan

# Install dependencies
npm install

# Build the project
npm run build

# Create a .env file with your OpenAI API key
echo "OPENAI_API_KEY=your-api-key" > .env

Usage

If installed globally:

# Run directly
maloomscan --input-directory ./recordings --output-directory ./notes

# Or run with npx
npx @jafarisimran/maloomscan --input-directory ./recordings --output-directory ./notes

If not installed globally:

# Run with npx without installing
npx @jafarisimran/maloomscan --input-directory ./recordings --output-directory ./notes

Additional options:

# Process files recursively
maloomscan --input-directory ./recordings --output-directory ./notes --recursive

# Enable verbose logging
maloomscan --input-directory ./recordings --output-directory ./notes --verbose

# Specify custom AI model
maloomscan --input-directory ./recordings --output-directory ./notes --model gpt-4

Output Files

maloomscan generates two output files for each processed audio file:

  1. JSON Classification File (filename.json):

    • Contains structured data about the transcript
    • Includes classification of the note type (meeting, call, email, etc.)
    • Stores extracted metadata such as:
      • Meeting attendees
      • Subject/topic
      • Conference tool used (Zoom, Teams, etc.)
      • Recipients (for email type)
      • Tasks and their urgency/status
      • Content sections
    • Preserves the original transcript text
  2. Markdown Note File (filename.md):

    • Contains the enhanced, formatted version of the transcript
    • Organized according to the note type
    • Includes relevant sections, headers, and formatting
    • Ready for use in note-taking applications or knowledge bases
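
The README does not give the schema of the classification file, so the shape below is only a sketch of what such a file might hold. The note types come from the customization tables in this README; every field name is a hypothetical choice made for illustration.

```typescript
// Note types listed in this README's "Note Types" table.
type NoteType = "meeting" | "call" | "update" | "idea" | "email" | "document" | "other";

// Hypothetical field names; the real JSON keys may differ.
interface ClassificationFile {
  type: NoteType;
  subject: string;
  attendees?: string[];
  conferenceTool?: string;
  recipients?: string[];
  tasks?: { description: string; urgency?: string; status?: string }[];
  sections?: { title: string; content: string }[];
  transcript: string; // original transcript text
}

const NOTE_TYPES: string[] = ["meeting", "call", "update", "idea", "email", "document", "other"];

// Runtime check for the required fields before trusting parsed JSON.
function isClassificationFile(value: unknown): value is ClassificationFile {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.type === "string" &&
    NOTE_TYPES.indexOf(record.type) !== -1 &&
    typeof record.subject === "string" &&
    typeof record.transcript === "string"
  );
}

console.log(isClassificationFile({ type: "meeting", subject: "Weekly standup", transcript: "..." })); // prints true
```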

The output files are saved to the directory specified with the --output-directory option.

Command Line Options

maloomscan provides a variety of command line options to customize its behavior:

| Option | Description | Default |
| --- | --- | --- |
| -i, --input-directory <dir> | Input directory containing audio files | ./recordings |
| -o, --output-directory <dir> | Output directory for generated files | ./notes |
| -r, --recursive | Process files recursively in input directory | false |
| -a, --audio-extensions [ext...] | Audio extensions to process | mp3,mp4,wav,m4a |
| --model <model> | OpenAI model to use for all operations | gpt-4 |
| --classify-model <model> | Specific model for classification phase | Same as --model |
| --compose-model <model> | Specific model for composition phase | Same as --model |
| --transcription-model <model> | OpenAI transcription model | whisper-1 |
| --output-structure <type> | Output directory structure | none |
| --filename-options [options...] | Filename format options | date,time,subject |
| --context-directories [dirs...] | Directories to search for context files | [] |
| --config-dir <dir> | Configuration directory | ./.maloomscan |
| --overrides | Allow overrides of default configuration | false |
| --openai-api-key <key> | OpenAI API key | From env var |
| --timezone <timezone> | Timezone for date calculations | UTC |
| --dry-run | Perform a dry run without saving files | false |
| --verbose | Enable verbose logging | false |
| --debug | Enable debug logging | false |
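
As an illustration of the --audio-extensions option, the sketch below filters filenames against the default extension list from the table. The helper name `isAudioFile` and the case-insensitive matching are assumptions; the README does not say how maloomscan compares extensions.

```typescript
// Default list taken from the options table; the matching logic is an assumption.
const DEFAULT_AUDIO_EXTENSIONS: string[] = ["mp3", "mp4", "wav", "m4a"];

function isAudioFile(filename: string, extensions: string[] = DEFAULT_AUDIO_EXTENSIONS): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".env"
  const extension = filename.slice(dot + 1).toLowerCase();
  return extensions.indexOf(extension) !== -1;
}

const candidates = ["standup.mp3", "notes.txt", "CALL.WAV", ".env"];
console.log(candidates.filter((file) => isAudioFile(file)));
// prints [ 'standup.mp3', 'CALL.WAV' ]
```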

Examples

Process all audio files in the current directory:

maloomscan --input-directory . --output-directory ./notes

Process files recursively with verbose logging:

maloomscan --input-directory ./recordings --output-directory ./notes --recursive --verbose

Specify different models for classification and composition phases:

maloomscan --classify-model gpt-4 --compose-model gpt-3.5-turbo --input-directory ./recordings

Use a single model for all AI operations:

maloomscan --model gpt-4-turbo --input-directory ./recordings
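
The fallback between --model and the phase-specific model options, as described in the options table, can be sketched like this. The function and property names are hypothetical; only the defaults (gpt-4, whisper-1, "Same as --model") come from this README.

```typescript
// Hypothetical option bag mirroring the model-related CLI flags.
interface ModelOptions {
  model?: string;
  classifyModel?: string;
  composeModel?: string;
  transcriptionModel?: string;
}

// Phase-specific models win; otherwise fall back to --model, then to the documented default.
function resolveModels(options: ModelOptions): { classify: string; compose: string; transcription: string } {
  const base = options.model ?? "gpt-4";
  return {
    classify: options.classifyModel ?? base,
    compose: options.composeModel ?? base,
    transcription: options.transcriptionModel ?? "whisper-1",
  };
}

console.log(resolveModels({ model: "gpt-4-turbo", composeModel: "gpt-3.5-turbo" }));
```

With these inputs the classify phase would use gpt-4-turbo, the compose phase gpt-3.5-turbo, and transcription whisper-1.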

Organize output files by date structure:

maloomscan --input-directory ./recordings --output-structure month

Customize filename format:

maloomscan --input-directory ./recordings --filename-options time subject

Use a custom configuration directory:

maloomscan --input-directory ./recordings --config-dir ~/my-maloomscan-config

Add context from existing knowledge:

maloomscan --input-directory ./recordings --context-directories ./my-notes ./project-docs

Debugging and Verbose Output

maloomscan provides options to control the level of output detail:

Verbose Mode

Use --verbose when you need more detailed information about the processing:

maloomscan --input-directory ./recordings --verbose

Verbose mode provides additional information about each step of the process, including:

  • File discovery and validation
  • Transcription progress
  • Classification decisions
  • Composition details

Debug Mode

Use --debug when you need to inspect the actual prompts and responses sent to and received from the AI models:

maloomscan --input-directory ./recordings --debug

Important: Debug mode creates additional files in your output directory:

  • filename.request.json: The prompts sent to the AI models
  • filename.response.json: The raw responses received from the AI models

Debug mode is particularly useful for:

  • Troubleshooting issues with classification or composition
  • Understanding how the AI interprets your audio content
  • Customizing or extending maloomscan's functionality

Output Organization

maloomscan provides flexible options for organizing your output files using the --output-structure and --filename-options parameters.

Output Directory Structure

The --output-structure option determines how files are organized in subdirectories:

| Option | Description | Example Path |
| --- | --- | --- |
| none | All files in the output directory (default) | ./notes/meeting.md |
| year | Organize by year | ./notes/2023/meeting.md |
| month | Organize by year and month | ./notes/2023/07/meeting.md |
| day | Organize by year, month, and day | ./notes/2023/07/15/meeting.md |

Examples:

# All files in a flat structure
maloomscan --output-structure none

# Organize by year
maloomscan --output-structure year

# Organize by year and month
maloomscan --output-structure month

# Organize by year, month, and day
maloomscan --output-structure day
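
The mapping from --output-structure to a subdirectory can be sketched as below. This is an assumption-laden illustration: `outputDirFor` is a hypothetical helper, and UTC is used only because UTC is the documented default for --timezone.

```typescript
type OutputStructure = "none" | "year" | "month" | "day";

// Zero-pad to two digits, matching example paths such as ./notes/2023/07/15.
function twoDigits(value: number): string {
  return value < 10 ? "0" + value : String(value);
}

// Hypothetical helper: derive the output directory from the recording's creation time.
function outputDirFor(baseDir: string, structure: OutputStructure, created: Date): string {
  const year = String(created.getUTCFullYear());
  const month = twoDigits(created.getUTCMonth() + 1);
  const day = twoDigits(created.getUTCDate());
  const segments: Record<OutputStructure, string[]> = {
    none: [],
    year: [year],
    month: [year, month],
    day: [year, month, day],
  };
  return [baseDir].concat(segments[structure]).join("/");
}

const recordedAt = new Date(Date.UTC(2023, 6, 15, 14, 30, 27));
console.log(outputDirFor("./notes", "month", recordedAt)); // prints ./notes/2023/07
```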

Important Note on Filenames: All output files include a mandatory hash code in the filename that uniquely identifies and relates the output to the original audio file input. This hash ensures that files can be properly tracked and associated with their source recordings.

For example, an actual filename might look like:

2023-07-15_143027_weekly_standup_a7f3b2c1.md

Where:

  • 2023-07-15 is the date (if date option enabled)
  • 143027 is the time (if time option enabled)
  • weekly_standup is the subject (if subject option enabled)
  • a7f3b2c1 is the hash code (always included)

The examples below use simplified filenames for clarity.

Filename Options

The --filename-options parameter controls what information is included in filenames:

| Option | Description | Example |
| --- | --- | --- |
| date | Include date (YYYY-M-D) | 2023-7-15-2d4ee3.md |
| time | Include time (HHmm) | 1430-2d4ee3.md |
| subject | Include subject from classification | 2d4ee3-weekly_standup.md |
| date, subject | Include both the date and the subject | 2023-7-15-2d4ee3-weekly_standup.md |

You can combine these options in any order:

# Include only date in filename
maloomscan --filename-options date

# Include date and subject
maloomscan --filename-options date subject

# Include all options
maloomscan --filename-options date time subject

Important Note on Date Format: When using the date filename option, the date format changes based on the selected --output-structure to avoid redundant information:

| Output Structure | Date Format in Filename | Example |
| --- | --- | --- |
| none | YYYY-M-D | 2023-7-15-2d4ee3.md |
| year | M-D | 7-15-2d4ee3.md |
| month | D | 15-2d4ee3.md |
| day | (date option disabled) | This will cause an error |

This ensures that date information already represented in the directory structure is not duplicated in the filename.
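
The date-format rule can be sketched as a small function. The name `filenameDatePart` is hypothetical, and throwing for the day structure simply mirrors the table's "This will cause an error" entry; the tool itself may handle that case differently.

```typescript
type OutputStructure = "none" | "year" | "month" | "day";

// Returns only the date components not already encoded in the directory path.
// Values are not zero-padded, following the YYYY-M-D format shown in the table.
function filenameDatePart(structure: OutputStructure, created: Date): string {
  const year = created.getUTCFullYear();
  const month = created.getUTCMonth() + 1;
  const day = created.getUTCDate();
  switch (structure) {
    case "none":
      return `${year}-${month}-${day}`;
    case "year":
      return `${month}-${day}`;
    case "month":
      return `${day}`;
    case "day":
      throw new Error("the date filename option cannot be combined with --output-structure day");
  }
}

const createdAt = new Date(Date.UTC(2023, 6, 15));
console.log(filenameDatePart("year", createdAt)); // prints 7-15
```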

Example Combinations

Here are some examples of how different combinations will structure your files:

Example 1: Flat structure with date and subject

maloomscan --output-structure none --filename-options date subject

Result:

./notes/2023-07-15_weekly_standup.md
./notes/2023-07-16_client_meeting.md

Example 2: Monthly organization with time and subject

maloomscan --output-structure month --filename-options time subject

Result:

./notes/2023/07/143027_weekly_standup.md
./notes/2023/07/093045_client_meeting.md
./notes/2023/08/113012_planning_session.md

Example 3: Daily organization with just subject

maloomscan --output-structure day --filename-options subject

Result:

./notes/2023/07/15/weekly_standup.md
./notes/2023/07/16/client_meeting.md
./notes/2023/08/02/planning_session.md

Note: When using --output-structure day, the date filename option becomes redundant and is automatically disabled.

Context-Enhanced Notes

maloomscan can enhance your notes with relevant context from existing files using the --context-directories option. This feature allows the AI to access and reference information from your knowledge base when processing audio recordings.

How Context Directories Work

When you specify one or more context directories, maloomscan will:

  1. Search those directories for relevant files based on the content of your recording
  2. Extract information from the most relevant files
  3. Use this information to provide additional context for the AI when composing your note
  4. Create more informed, connected notes that reference your existing knowledge

This is particularly useful for:

  • Meeting notes that reference previous meetings
  • Project updates that need historical context
  • Ideas that build on previous concepts
  • Any recording that would benefit from connection to your existing notes
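
The README does not describe how relevance is determined, so the following is only a conceptual sketch of "search for relevant files" using simple keyword overlap. The real tool may well use a different technique; `rankContextFiles`, `tokenize`, and the `ContextFile` shape are all hypothetical.

```typescript
// Hypothetical in-memory representation of a context file.
interface ContextFile {
  path: string;
  text: string;
}

// Lowercase words longer than three characters; a crude stand-in for real text analysis.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((word) => word.length > 3);
}

// Score each file by how many distinct transcript words it shares, best first.
function rankContextFiles(transcript: string, files: ContextFile[], limit: number = 3): ContextFile[] {
  const wanted = new Set(tokenize(transcript));
  const scored = files.map((file) => {
    let score = 0;
    new Set(tokenize(file.text)).forEach((word) => {
      if (wanted.has(word)) score += 1;
    });
    return { file, score };
  });
  return scored
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.file);
}

const ranked = rankContextFiles("budget review for the apollo project", [
  { path: "my-notes/groceries.md", text: "grocery list milk eggs" },
  { path: "my-notes/kickoff.md", text: "project kickoff" },
  { path: "my-notes/apollo.md", text: "Apollo project budget notes" },
]);
console.log(ranked.map((file) => file.path));
// prints [ 'my-notes/apollo.md', 'my-notes/kickoff.md' ]
```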

Examples

Basic usage with a single context directory:

maloomscan --input-directory ./recordings --context-directories ./my-notes

Using multiple context directories:

maloomscan --input-directory ./recordings --context-directories ./my-notes ./project-docs ./reference-materials

Combined with other options:

maloomscan --input-directory ./recordings --output-directory ./enhanced-notes --context-directories ./my-notes --model gpt-4

Best Practices

For optimal results with context directories:

  • Organize your context files in a way that makes semantic sense
  • Use descriptive filenames and clear content in your context files
  • Consider using the --verbose flag to see which context files are being used
  • Start with smaller context directories before scaling to larger knowledge bases

Configuration and Customization

maloomscan can be customized using a configuration directory.

Configuration Directory

By default, maloomscan looks for configuration files in ./.maloomscan. You can specify a different location:

maloomscan --config-dir ~/my-maloomscan-config

Customizing Instructions

maloomscan uses AI instructions to classify and process your audio recordings. You can customize these instructions in three ways:

  1. Append content - Add additional instructions at the end
  2. Prepend content - Add additional instructions at the beginning
  3. Override content - Completely replace the default instructions (requires --overrides flag)

File Naming Convention

For any instruction file, three variations are supported:

| Variation | Filename Format | Example | Effect |
| --- | --- | --- | --- |
| Override | filename.md | email.md | Completely replaces default content |
| Prepend | filename-pre.md | email-pre.md | Adds content before default content |
| Append | filename-post.md | email-post.md | Adds content after default content |
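
The naming convention can be expressed as a small path helper. `customizationPaths` is a hypothetical function written for illustration; it only applies the -pre and -post suffix rule described in the table.

```typescript
// Hypothetical helper: list the three customization files for one instruction file.
// relativePath is a path inside the configuration directory, e.g. "/instructions/types/email.md".
function customizationPaths(configDir: string, relativePath: string): { override: string; prepend: string; append: string } {
  const stem = relativePath.replace(/\.md$/, "");
  return {
    override: configDir + stem + ".md",
    prepend: configDir + stem + "-pre.md",
    append: configDir + stem + "-post.md",
  };
}

console.log(customizationPaths("./.maloomscan", "/instructions/types/email.md").prepend);
// prints ./.maloomscan/instructions/types/email-pre.md
```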

Example: Customizing Email Type Instructions

To customize how maloomscan processes recordings identified as "email" type:

  1. Create a directory structure in your config directory:

    .maloomscan/
    └── instructions/
        └── types/
            ├── email.md           # Complete override (requires --overrides)
            ├── email-pre.md       # Content to prepend
            └── email-post.md      # Content to append
  2. Run maloomscan with the --overrides flag if using complete overrides:

    maloomscan --input-directory ./recordings --overrides

Example: Customizing You Persona

The "you" persona represents how the system interprets your recordings. To customize:

  1. Create the following structure:

    .maloomscan/
    └── personas/
        └── you/
            ├── traits.md           # Override traits
            ├── traits-pre.md       # Prepend traits
            ├── traits-post.md      # Append traits
            ├── instructions.md     # Override instructions
            ├── instructions-pre.md # Prepend instructions
            └── instructions-post.md # Append instructions
  2. The traits files allow you to customize how the system views the persona of the speaker (you):

    • By default, it assumes you are summarizing your own notes
    • It values everything you record as important
    • It tries to be thorough and accurate in processing notes
  3. The instructions files allow you to customize how the system processes your recordings:

    • How to handle unclear content
    • How to format and structure the output
    • Special rules for processing certain types of content

Warning About Overrides

Complete overrides (using filename.md without the -pre or -post suffix) will replace core functionality and require the --overrides flag. Use with caution:

maloomscan --input-directory ./recordings --overrides

Without this flag, maloomscan will refuse to run if override files exist to prevent accidental changes to core functionality.
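
The interaction between prepend, append, and override content and the --overrides flag can be sketched as follows. This is a simplified model, not the tool's code: `buildInstructions` is a hypothetical name, and applying the prepend and append text around an override is an assumption.

```typescript
// Hypothetical container for the contents of the three optional customization files.
interface Customization {
  override?: string;
  pre?: string;
  post?: string;
}

// Refuse to proceed when an override exists but --overrides was not given.
function buildInstructions(defaultText: string, custom: Customization, allowOverrides: boolean): string {
  if (custom.override !== undefined && !allowOverrides) {
    throw new Error("Override file found; rerun with --overrides to allow it");
  }
  const core = custom.override ?? defaultText;
  return [custom.pre, core, custom.post]
    .filter((part): part is string => part !== undefined)
    .join("\n\n");
}

console.log(buildInstructions("Default email rules.", { post: "Sign off with my initials." }, false));
```

Failing early when an override file is present without the flag matches the refusal behavior described above and keeps an accidental file from silently replacing the defaults.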

Available Customization Files

maloomscan supports customization for the following components:

Personas

| Persona | Description | File Paths in configDir |
| --- | --- | --- |
| classifier | Determines the type of content in your recordings. This is the persona used when analyzing and classifying a raw transcript. | /personas/classifier/traits.md, /personas/classifier/instructions.md |
| you | Represents the speaker/recorder of the content. This is the persona used when composing a final version of a note. | /personas/you/traits.md, /personas/you/instructions.md |

Instructions

| Instruction | Description | File Path in configDir |
| --- | --- | --- |
| classify | Instructions for classifying content | /instructions/classify.md |
| compose | Instructions for composing enhanced notes | /instructions/compose.md |

Note Types

| Type | Description | File Path in configDir |
| --- | --- | --- |
| meeting | Meeting notes and summaries | /instructions/types/meeting.md |
| call | Call transcripts and summaries | /instructions/types/call.md |
| update | Status updates and reports | /instructions/types/update.md |
| idea | Personal ideas and brainstorming | /instructions/types/idea.md |
| email | Email drafts and correspondence | /instructions/types/email.md |
| document | Document outlines and drafts | /instructions/types/document.md |
| other | Miscellaneous content types | /instructions/types/other.md |

Remember that for each file path above, you can create three versions:

  • The base file (e.g., /personas/you/traits.md) for complete override
  • A "-pre" version (e.g., /personas/you/traits-pre.md) to prepend content
  • A "-post" version (e.g., /personas/you/traits-post.md) to append content

All paths are relative to your configuration directory (default: ./.maloomscan).

Customization Examples

Here are practical examples of how to customize maloomscan for specific needs:

Example 1: Pirate Language for Email Notes

Let's say you want all your email notes to be composed in pirate language. Create the following file:

.maloomscan/
└── personas/
    └── you/
        └── traits-pre.md

Add this content to the traits-pre.md file:

Everything you generate is using Pirate language. And you have to say "arrrr" every few sentences. You are a pirate captain on the Black Flag.

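When maloomscan processes an audio file classified as "email" type, the output will now use pirate language. Because the "you" persona is applied when composing every note, other note types will likely be affected as well; to limit the change to email notes, put the same text in instructions/types/email-pre.md instead.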

Example 2: Spanish Translation for Ideas

To have all "idea" type notes translated to Spanish, create:

.maloomscan/
└── instructions/
    └── types/
        └── idea-post.md

Add this content to the idea-post.md file:

Translate all the output to Spanish, maintaining the same markdown formatting structure.

Verifying Customizations

To verify your customizations are being applied correctly:

  1. Run maloomscan in debug mode:

    maloomscan --input-directory ./recordings --debug
  2. Check the output directory for the .request.json files to see if your customized instructions are included in the prompts sent to the AI.

  3. Examine the original source code (especially the files in src/prompt/) to understand the default configurations and how your changes might affect them.
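
Step 2 can be automated with a small check like the one below. The structure of the .request.json files is not documented here, so this sketch makes no assumption about it beyond valid JSON: it parses the file and searches every string value. The `messages` layout in the sample is purely hypothetical.

```typescript
// Recursively look for the snippet in any string inside the parsed JSON.
function containsText(value: unknown, needle: string): boolean {
  if (typeof value === "string") return value.indexOf(needle) !== -1;
  if (Array.isArray(value)) return value.some((item) => containsText(item, needle));
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    return Object.keys(record).some((key) => containsText(record[key], needle));
  }
  return false;
}

// rawJson is the text content of a filename.request.json file.
function requestIncludes(rawJson: string, snippet: string): boolean {
  return containsText(JSON.parse(rawJson), snippet);
}

// Hypothetical request body, only to exercise the function.
const sampleRequest = JSON.stringify({
  messages: [{ role: "system", content: "You are a pirate captain.\nSay \"arrrr\" often." }],
});
console.log(requestIncludes(sampleRequest, 'Say "arrrr"')); // prints true
```

Parsing before searching matters because quotes and newlines are escaped in the raw file, so a plain text search for a multi-line or quoted snippet could miss it.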

⚠️ Important Caution

Customizing instructions can be powerful but comes with risks:

  • Overriding default instructions may introduce errors or unexpected behavior
  • Complex customizations could conflict with core processing logic
  • Changes to key instructions might reduce the quality of transcription classification
  • Significant modifications might require adjustments to the --model parameter

Start with small changes to the -pre.md and -post.md files before attempting complete overrides, and always test thoroughly with the --debug flag to monitor effects.
