0.3.0 • Published 4 months ago
@udx/mcurl v0.3.0
mCurl
curl but in markdown - fetches content from URLs and converts to markdown.
By default, mCurl returns the body/article/content of any URL that's HTML and converts it to valid Markdown. If the response is in JSON, it also converts it to Markdown in a readable format.
Features
- Converts HTML to Markdown, extracting main content by default
- Converts JSON to Markdown in a structured, readable format
- Pluggable handlers for different content types (extensible architecture)
- Selector support for extracting specific parts of HTML
- Global config file support (~/.udx/mcurl.yml) for domain-specific settings
- Options to include site's menu/header/footer
- Designed for AI tool integration and chaining with SERP API responses
- Customizable headers, user agents, and authentication
Installation
# Install globally
npm install -g @udx/mcurl
# After installation, the command is available as 'mcurl'
After installation, you can use the command mcurl
directly.
Usage
# Basic usage - fetch a URL and convert to markdown
mcurl https://udx.io
# Extract specific content using CSS selector
mcurl --selector "article.main-content" https://udx.io/about
# Include header and footer
mcurl --include header,footer https://udx.io
# Use custom headers
mcurl --header "Authorization: Bearer token" https://api.example.com
# Save output to file
mcurl --output result.md https://udx.io
# Use custom user agent
mcurl --user-agent "MyBot/1.0" https://udx.io
# Verbose output
mcurl --verbose https://udx.io
# Use like curl with JSON response converted to markdown
mcurl -X POST "https://api.example.com/endpoint" -H "Content-Type: application/json" -d '{"query": "example"}'
# WARNING: Never use internal development ports (like localhost:3456) in production or share them externally
# Always use public API endpoints for production use
# Example with Elasticsearch-like query (for development environments)
mcurl -X POST "http://localhost:3456/harvest-v1/_search" -H "Content-Type: application/json" -d '{
"size": 0,
"query": {
"bool": {
"must": [
{"range": {"date": {"gte": "2024-01-01", "lte": "2024-12-31"}}}
]
}
},
"aggs": {
"by_billable": {
"terms": {
"field": "billable"
},
"aggs": {
"hours_sum": {"sum": {"field": "hours"}},
"cost_estimate": {"sum": {"script": {"source": "doc[\"hours\"].value * 85"}}},
"revenue": {"sum": {"field": "billableAmount"}}
}
}
}
}'
# Complex Elasticsearch-like queries (for production)
mcurl -X POST "https://api.example.com/search" -H "Content-Type: application/json" -d '{
"size": 0,
"query": {
"bool": {
"must": [
{"range": {"date": {"gte": "2024-01-01", "lte": "2024-12-31"}}}
]
}
},
"aggs": {
"by_billable": {
"terms": {
"field": "billable"
},
"aggs": {
"hours_sum": {"sum": {"field": "hours"}},
"cost_estimate": {"sum": {"script": {"source": "doc[\"hours\"].value * 85"}}},
"revenue": {"sum": {"field": "billableAmount"}}
}
}
}
}'
Configuration File
mCurl supports a global configuration file at ~/.udx/mcurl.yml
that allows you to specify default options and domain-specific settings:
# Global defaults
user-agent: "mCurl/1.0"
timeout: 30000
format: markdown
# Domain-specific settings
domains:
"udx.io":
headers:
- "Cookie: session=abc123"
selector: "main.content"
"api.example.com":
headers:
- "Authorization: Bearer YOUR_TOKEN"
user-agent: "CustomBot/2.0"
Examples
HTML Example
# Fetch and convert HTML to markdown
mcurl https://udx.io
Output:
# UDX - Digital Experience Agency
We build digital experiences that transform businesses and drive growth.
## Our Services
- Web Development
- Digital Marketing
- UX Design
- Cloud Solutions
JSON Example
# Fetch and convert JSON to markdown
mcurl https://udx.io/wp-json/udx/v2/works/search?query=&page=0
Output:
# JSON Response
## results
| id | title | excerpt | ... |
| --- | --- | --- | --- |
| 123 | Project A | This is project A | ... |
| 456 | Project B | This is project B | ... |
...
## pagination
```json
{
"currentPage": 0,
"totalPages": 5,
"totalItems": 48
}
## API
mCurl can be used programmatically:
```javascript
import { registerContentHandler } from 'mcurl';
// Register a custom content handler
registerContentHandler('application/pdf', async (response, options) => {
// Custom PDF handling logic
return '# PDF Content\n\nPDF content extracted...';
});
Integration with AI Tools
mCurl is designed to be easily integrated with AI tools like GPT, Cursor, or Devin:
# Fetch content and pipe to AI tools
mcurl https://udx.io | cursor prompt "Summarize this content"
# Chain with SERP API for knowledge building
curl "https://serpapi.com/search?q=UDX+Digital+Agency&api_key=$SERPAPI_API_KEY" | \
jq -r '.organic_results[0].link' | \
xargs mcurl | \
cursor prompt "Extract key information about UDX"
See the examples
directory for more advanced integration examples, including:
- SERP API knowledge building
- Custom content handlers
- Configuration examples
- Automated documentation generation
More Examples
For more detailed examples and use cases, check out the examples
directory:
# View examples
cd examples
node serp-knowledge-builder.js "Your search query" output.md
License
MIT