# llm-stream-processor-js

`@mingzilla/llm-stream-processor-js` v1.0.4
A lightweight utility for processing streaming responses from Large Language Models (LLMs), with special handling for `<think>` blocks and content parsing.
## Features
- Process streaming LLM responses with callback-based event handling
- Intelligently parse and separate `<think>` blocks from final content
- Automatic JSON detection in content responses
- Support for chunk prefixes and end delimiters common in SSE streams
- Zero dependencies
- Works directly in the browser without bundling
- TypeScript declarations included
## Installation
### Direct inclusion in HTML
```html
<script src="https://cdn.jsdelivr.net/gh/mingzilla/llm-stream-processor-js@latest/llm-stream-processor.js"></script>
```

### TypeScript Support for Direct Inclusion
If you're using TypeScript with direct script inclusion, you can reference the type definitions in one of these ways:
1. Download the definition file and place it in your project, then reference it in your `tsconfig.json`:

   ```json
   {
     "compilerOptions": {
       "typeRoots": ["./typings", "./node_modules/@types"]
     }
   }
   ```

   And create a folder structure:

   ```
   your-project/
   ├── typings/
   │   └── llm-stream-processor-js/
   │       └── index.d.ts   // Copy contents from llm-stream-processor.d.ts
   ```

2. Reference the declaration file directly using a triple-slash directive:

   ```typescript
   /// <reference path="./typings/llm-stream-processor.d.ts" />
   ```

3. Use the CDN for the declaration file:

   ```typescript
   // In your TypeScript file
   declare module 'llm-stream-processor-js';

   // Then add a reference in your HTML
   // <script src="https://cdn.jsdelivr.net/gh/mingzilla/llm-stream-processor-js@latest/llm-stream-processor.js"></script>
   ```
### NPM
```bash
npm install @mingzilla/llm-stream-processor-js
```
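Then load the module in your code. A minimal sketch, assuming the package's main entry exports the `LlmStreamProcessor` class used throughout this README (the exact export shape is an assumption; verify it against the bundled type declarations):

```javascript
// Hypothetical import shape -- check the package's own .d.ts for the exact export.
const LlmStreamProcessor = require('@mingzilla/llm-stream-processor-js');

const processor = LlmStreamProcessor.createInstance({
  chunkPrefix: 'data: ',  // optional, as described below
  endDelimiter: '[DONE]'  // optional, as described below
});
```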
## Usage

### Basic Usage with Streaming API
It works well with api-client-js.
```javascript
// Create a stream processor instance
const processor = LlmStreamProcessor.createInstance({
  chunkPrefix: "data: ",  // Optional: Strip this prefix from each chunk (common in SSE)
  endDelimiter: "[DONE]"  // Optional: String that signals the end of the stream
});

// Process a streaming response from an LLM API
let contentWithoutThinkBlock;
ApiClient.stream(
  ApiClientInput.postJson('https://api.example.com/llm/generate', {
    prompt: "Explain quantum computing. <think>I should start with the basics.</think>"
  }, {
    'Accept': 'text/event-stream'
  }),
  () => console.log('Stream started'), // onStart
  (chunk) => {
    // Process each chunk through the LLM processor
    processor.processChunk(
      chunk,
      () => console.log('Processing started'),
      () => console.log('Think block started'),
      (thinkChunk) => console.log('Think chunk:', thinkChunk),
      (fullThinkText) => console.log('Think complete:', fullThinkText),
      () => console.log('Content started'),
      (contentChunk) => {
        console.log('Content chunk:', contentChunk);
        // Update the UI with new content
        document.getElementById('response').innerText += contentChunk;
      },
      (fullContent, parsedJson) => {
        console.log('Content complete:', fullContent);
        contentWithoutThinkBlock = fullContent;
      },
      (fullThink, fullContent, parsedJson) => console.log('All complete'),
      (error) => console.error('Error:', error)
    );
  },
  (fullResponse) => {
    // When the stream is complete, finalize processing.
    // This triggers the 'Content complete' callback.
    processor.finalize();
    // If you want to exclude the <think> block from the fullResponse, do the following:
    fullResponse.body = contentWithoutThinkBlock;
    // ...
  },
  (error) => {
    processor.finalize(); // Finalize so the error case also triggers completion callbacks
    console.error('Stream error:', error);
  }
);
```

### Handling Server-Sent Events (SSE)
Many LLM APIs use Server-Sent Events (SSE) for streaming. The processor can handle SSE format:
```javascript
const processor = LlmStreamProcessor.createInstance({
  chunkPrefix: "data: ",  // Remove "data: " prefix from SSE events
  endDelimiter: "[DONE]"  // Common end signal in SSE streams
});

// Now process chunks as they come in...
```
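For illustration, here is one way raw SSE frames might be fed through the `processor` configured above. This is a sketch, not part of the library's docs: the payload is a made-up OpenAI-like frame, and the no-op callbacks simply fill the nine positional slots described in the API Reference below.

```javascript
const noop = () => {};
const callbacks = [
  noop,                                     // onStart
  noop,                                     // onThinkStart
  noop,                                     // onThinkChunk
  noop,                                     // onThinkFinish
  noop,                                     // onContentStart
  (text) => console.log('content:', text), // onContentChunk
  noop,                                     // onContentFinish
  noop,                                     // onFinish
  (error) => console.error(error)           // onFailure
];

// "data: " matches chunkPrefix, so it is stripped before the frame is handled
processor.processChunk('data: {"choices":[{"delta":{"content":"Hi"}}]}', ...callbacks);

// "[DONE]" matches endDelimiter, which signals the end of the stream
processor.processChunk('data: [DONE]', ...callbacks);
```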
### Extracting JSON From Responses

The processor automatically attempts to parse JSON in the content:
```javascript
processor.processChunk(
  chunk,
  // ...other callbacks...
  (fullContent, parsedJson) => {
    if (parsedJson) {
      // The response contained valid JSON
      console.log('Parsed JSON:', parsedJson);

      // For example, extracting choices from an OpenAI-like response
      if (parsedJson.choices && parsedJson.choices[0]) {
        const generatedText = parsedJson.choices[0].message.content;
        document.getElementById('response').innerText = generatedText;
      }
    }
  },
  // ...other callbacks...
);
```

## API Reference
### LlmStreamProcessor
The main class for processing LLM streaming responses.
#### Static Methods
- `createInstance(options)`: Create a new processor instance with optional configuration
#### Instance Methods
- `processChunk(rawChunk, callbacks...)`: Process a raw server response that may contain multiple JSON-formatted messages
- `read(chunk, callbacks...)`: Process a chunk of plain text content, not raw JSON responses (see the sketch below)
- `finalize()`: Finalize processing and trigger completion callbacks
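To make the distinction concrete: `processChunk()` expects raw transport frames (e.g., SSE lines), while `read()` expects text that has already been unwrapped. A minimal sketch, assuming the nine positional callbacks listed under Callback Parameters below; the no-op handlers are illustrative:

```javascript
const noop = () => {};
const processor = LlmStreamProcessor.createInstance({});

processor.read(
  'Hello <think>plan the reply</think> world', // already-unwrapped plain text
  noop,                                        // onStart
  noop,                                        // onThinkStart
  noop,                                        // onThinkChunk
  (think) => console.log('think:', think),     // onThinkFinish -- text inside the think block
  noop,                                        // onContentStart
  (text) => console.log('content:', text),     // onContentChunk -- text outside the think block
  noop,                                        // onContentFinish
  noop,                                        // onFinish
  (error) => console.error(error)              // onFailure
);
processor.finalize(); // trigger the completion callbacks
```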
#### Configuration Options
When creating a processor with `createInstance()`, you can provide:
- `chunkPrefix`: String prefix to strip from each chunk (e.g., `"data: "` for SSE)
- `endDelimiter`: String that signals the end of the stream (e.g., `"[DONE]"`)
#### Callback Parameters for `processChunk()` and `read()`
- `onStart`: Called when processing begins
- `onThinkStart`: Called when a think block starts
- `onThinkChunk`: Called with each chunk inside a think block
- `onThinkFinish`: Called when a think block completes
- `onContentStart`: Called when content outside think blocks starts
- `onContentChunk`: Called with each chunk outside think blocks
- `onContentFinish`: Called when content is finished, with optional parsed JSON
- `onFinish`: Called when all processing is complete
- `onFailure`: Called if an error occurs
## How It Works
- The processor identifies `<think>` and `</think>` tags in the stream
- Content inside these tags is separated and provided in think-related callbacks
- Content outside these tags is treated as the actual response
- When the stream completes, the processor attempts to parse any JSON in the content
- All accumulated content is provided to completion callbacks (see the sketch below)
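A small sketch tying these steps together, assuming the callback order from the API Reference above: JSON content arrives split across two plain-text chunks, the processor accumulates it, and `finalize()` hands the full content (and, if parsing succeeds, the parsed object) to the completion callbacks.

```javascript
const noop = () => {};
const processor = LlmStreamProcessor.createInstance({});

const callbacks = [
  noop,                                           // onStart
  noop,                                           // onThinkStart
  noop,                                           // onThinkChunk
  (think) => console.log('think:', think),        // onThinkFinish
  noop,                                           // onContentStart
  noop,                                           // onContentChunk
  (content, json) => console.log(content, json),  // onContentFinish
  noop,                                           // onFinish
  (error) => console.error(error)                 // onFailure
];

// The JSON payload is split mid-token across two plain-text chunks;
// the processor accumulates them.
processor.read('<think>check the schema</think>{"answer"', ...callbacks);
processor.read(': 42}', ...callbacks);

// finalize() triggers the completion callbacks; if the accumulated content
// parses as JSON, onContentFinish should receive the parsed object.
processor.finalize();
```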
## License
MIT
## Author
Ming Huang (mingzilla)