@cheepcode/ask-screen v0.0.7
ask-screen
A TypeScript library for asking an AI model about what is on the browser screen. It helps agents author end-to-end tests and interact with websites.
Uses OpenAI's API to ask questions and get answers. Requires an OpenAI API key. Works with Playwright or as a standalone browser library.
Installation
npm install ask-screen
Usage
Initialize the AskScreen instance:
import { AskScreen } from "ask-screen";
const askScreen = new AskScreen({
  openaiApiKey: "your-openai-api-key",
  openaiModel: "o4-mini", // optional, defaults to "o4-mini"
  scale: 0.75, // optional, defaults to 0.75
  page: playwrightPage, // optional Playwright page instance
});
Get a description of the screen:
const description = await askScreen.description();
console.log(description);
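The `imageUrlBase64` option in this README's examples is shown as a string of the form `data:image/png;base64,...`. As a minimal sketch of how such a string could be built from raw PNG bytes in Node, the helper below wraps the bytes in a data URL. `pngToDataUrl` is a hypothetical name and not part of ask-screen, and the assumption that the library accepts exactly this format rests only on the example values in this README.

```typescript
// Hypothetical helper, not part of ask-screen: wraps raw PNG bytes in a data URL.
// Assumes a Node runtime, where the global Buffer class is available.
function pngToDataUrl(pngBytes: Uint8Array): string {
  const base64 = Buffer.from(pngBytes).toString("base64");
  return `data:image/png;base64,${base64}`;
}

// The 8-byte PNG file signature, used here only as stand-in input.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(pngToDataUrl(pngSignature)); // data:image/png;base64,iVBORw0KGgo=
```

With Playwright, `await page.screenshot({ type: "png" })` resolves to a Buffer, which could be passed to a helper like this one directly.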
// Or provide your own image
const descriptionFromImage = await askScreen.description({
  imageUrlBase64: "data:image/png;base64,...",
});
Ask a single yes/no question about what is on the screen:
const answer = await askScreen.boolean({
  question: 'Is there a button with text "Click me" on the screen?',
});
console.log(answer);
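Because the library is aimed at end-to-end tests, a yes/no answer can be turned into a test assertion. The sketch below assumes that `boolean()` resolves to a JavaScript boolean, which this README does not state explicitly. `BooleanAsker`, `expectOnScreen`, and `stubAsker` are hypothetical names introduced for illustration. The stub stands in for a real `AskScreen` instance so the example runs without an API key.

```typescript
// Minimal shape assumed for this sketch: only the boolean() call shown above.
interface BooleanAsker {
  boolean(args: { question: string }): Promise<boolean>;
}

// Hypothetical helper: throws when the screen check is answered with "no".
async function expectOnScreen(asker: BooleanAsker, question: string): Promise<void> {
  const answer = await asker.boolean({ question });
  if (!answer) {
    throw new Error(`Screen check failed: ${question}`);
  }
}

// Stub standing in for a real AskScreen instance so the sketch runs offline.
const stubAsker: BooleanAsker = {
  boolean: async ({ question }) => question.includes("Click me"),
};

expectOnScreen(stubAsker, 'Is there a button with text "Click me" on the screen?')
  .then(() => console.log("check passed"));
```

In a real test, the `askScreen` instance would be passed in place of `stubAsker`, so a "no" answer fails the test with the question in the error message.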
// Or provide your own image
const answerFromImage = await askScreen.boolean({
  question: 'Is there a button with text "Click me" on the screen?',
  imageUrlBase64: "data:image/png;base64,...",
});
Ask a numeric question about what is on the screen:
const answer = await askScreen.numeric({
  question: "How many buttons are on the screen?",
});
console.log(answer);
Ask a multiple choice question about what is on the screen:
const answer = await askScreen.multipleChoice({
  question: "Which of the following text elements do you see on the screen?",
  options: [
    'A button with text "Click me"',
    'A text input with placeholder "Enter your name"',
    'A checkbox with label "I agree to the terms and conditions"',
  ],
});
// Returns the 0-based index of the selected option
console.log(answer);
Ask an open-ended question about what is on the screen:
const answer = await askScreen.open({
  question: "What are the top stories on the homepage?",
});
console.log(answer);
License
This project is licensed under the MIT License. See the LICENSE file for details.
Copyright 2025 Lovetap, LLC.