0.3.0 • Published 4 months ago

@udx/mcurl v0.3.0

Weekly downloads
-
License
MIT
Repository
github
Last release
4 months ago

mCurl

curl but in markdown - fetches content from URLs and converts to markdown.

By default, mCurl returns the body/article/content of any URL that's HTML and converts it to valid Markdown. If the response is in JSON, it also converts it to Markdown in a readable format.

Features

  • Converts HTML to Markdown, extracting main content by default
  • Converts JSON to Markdown in a structured, readable format
  • Pluggable handlers for different content types (extensible architecture)
  • Selector support for extracting specific parts of HTML
  • Global config file support (~/.udx/mcurl.yml) for domain-specific settings
  • Options to include site's menu/header/footer
  • Designed for AI tool integration and chaining with SERP API responses
  • Customizable headers, user agents, and authentication

Installation

# Install globally
npm install -g @udx/mcurl
# After installation, the command is available as 'mcurl'

After installation, you can use the command mcurl directly.

Usage

# Basic usage - fetch a URL and convert to markdown
mcurl https://udx.io

# Extract specific content using CSS selector
mcurl --selector "article.main-content" https://udx.io/about

# Include header and footer
mcurl --include header,footer https://udx.io

# Use custom headers
mcurl --header "Authorization: Bearer token" https://api.example.com

# Save output to file
mcurl --output result.md https://udx.io

# Use custom user agent
mcurl --user-agent "MyBot/1.0" https://udx.io

# Verbose output
mcurl --verbose https://udx.io

# Use like curl with JSON response converted to markdown
mcurl -X POST "https://api.example.com/endpoint" -H "Content-Type: application/json" -d '{"query": "example"}'

# WARNING: Never use internal development ports (like localhost:3456) in production or share them externally
# Always use public API endpoints for production use

# Example with Elasticsearch-like query (for development environments)
mcurl -X POST "http://localhost:3456/harvest-v1/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {"range": {"date": {"gte": "2024-01-01", "lte": "2024-12-31"}}}
      ]
    }
  },
  "aggs": {
    "by_billable": {
      "terms": {
        "field": "billable"
      },
      "aggs": {
        "hours_sum": {"sum": {"field": "hours"}},
        "cost_estimate": {"sum": {"script": {"source": "doc[\"hours\"].value * 85"}}},
        "revenue": {"sum": {"field": "billableAmount"}}
      }
    }
  }
}'

# Complex Elasticsearch-like queries (for production)
mcurl -X POST "https://api.example.com/search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {"range": {"date": {"gte": "2024-01-01", "lte": "2024-12-31"}}}
      ]
    }
  },
  "aggs": {
    "by_billable": {
      "terms": {
        "field": "billable"
      },
      "aggs": {
        "hours_sum": {"sum": {"field": "hours"}},
        "cost_estimate": {"sum": {"script": {"source": "doc[\"hours\"].value * 85"}}},
        "revenue": {"sum": {"field": "billableAmount"}}
      }
    }
  }
}'

Configuration File

mCurl supports a global configuration file at ~/.udx/mcurl.yml that allows you to specify default options and domain-specific settings:

# Global defaults
user-agent: "mCurl/1.0"
timeout: 30000
format: markdown

# Domain-specific settings
domains:
  "udx.io":
    headers:
      - "Cookie: session=abc123"
    selector: "main.content"
  
  "api.example.com":
    headers:
      - "Authorization: Bearer YOUR_TOKEN"
    user-agent: "CustomBot/2.0"

Examples

HTML Example

# Fetch and convert HTML to markdown
mcurl https://udx.io

Output:

# UDX - Digital Experience Agency

We build digital experiences that transform businesses and drive growth.

## Our Services

- Web Development
- Digital Marketing
- UX Design
- Cloud Solutions

JSON Example

# Fetch and convert JSON to markdown
mcurl https://udx.io/wp-json/udx/v2/works/search?query=&page=0

Output:

# JSON Response

## results

| id | title | excerpt | ... |
| --- | --- | --- | --- |
| 123 | Project A | This is project A | ... |
| 456 | Project B | This is project B | ... |
...

## pagination

```json
{
  "currentPage": 0,
  "totalPages": 5,
  "totalItems": 48
}
## API

mCurl can be used programmatically:

```javascript
import { registerContentHandler } from 'mcurl';

// Register a custom content handler
registerContentHandler('application/pdf', async (response, options) => {
  // Custom PDF handling logic
  return '# PDF Content\n\nPDF content extracted...';
});

Integration with AI Tools

mCurl is designed to be easily integrated with AI tools like GPT, Cursor, or Devin:

# Fetch content and pipe to AI tools
mcurl https://udx.io | cursor prompt "Summarize this content"

# Chain with SERP API for knowledge building
curl "https://serpapi.com/search?q=UDX+Digital+Agency&api_key=$SERPAPI_API_KEY" | \
  jq -r '.organic_results[0].link' | \
  xargs mcurl | \
  cursor prompt "Extract key information about UDX"

See the examples directory for more advanced integration examples, including:

  • SERP API knowledge building
  • Custom content handlers
  • Configuration examples
  • Automated documentation generation

More Examples

For more detailed examples and use cases, check out the examples directory:

# View examples
cd examples
node serp-knowledge-builder.js "Your search query" output.md

License

MIT