0.0.7 • Published 5 months ago

@cheepcode/ask-screen v0.0.7

ask-screen

A TypeScript library for asking an AI model about what is on the browser screen. It helps agents author end-to-end tests and interact with websites.

Uses OpenAI's API to ask questions and get answers. Requires an OpenAI API key. Works with Playwright or as a standalone browser library.

Installation

npm install @cheepcode/ask-screen

Usage

Initialize the AskScreen instance:

import { AskScreen } from "@cheepcode/ask-screen";

const askScreen = new AskScreen({
  openaiApiKey: "your-openai-api-key",
  openaiModel: "o4-mini", // optional, defaults to "o4-mini"
  scale: 0.75, // optional, defaults to 0.75
  page: playwrightPage, // optional Playwright page instance
});

Get a description of the screen:

const description = await askScreen.description();
console.log(description);

// Or provide your own image (named differently so both calls can share one scope)
const descriptionFromImage = await askScreen.description({
  imageUrlBase64: "data:image/png;base64,...",
});
console.log(descriptionFromImage);
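
If the image starts out as raw PNG bytes rather than a ready-made string, it has to be encoded first. The following is a minimal sketch, assuming that `imageUrlBase64` expects the `data:image/png;base64,...` form shown above. `toPngDataUrl` is a hypothetical helper and is not part of ask-screen.

```typescript
// Hypothetical helper (not part of ask-screen): encode raw PNG bytes as a
// base64 data URL of the form "data:image/png;base64,...".
function toPngDataUrl(bytes: Uint8Array): string {
  // Build a binary string one byte at a time so btoa can encode it.
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i] as number);
  }
  return "data:image/png;base64," + btoa(binary);
}

// The first four bytes of the PNG signature, used here only as sample input.
const sampleBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
console.log(toPngDataUrl(sampleBytes));
```

With Playwright, the bytes could come from `await page.screenshot()`, which resolves to a Buffer (a Uint8Array subclass), and the resulting string could then be passed as `imageUrlBase64`.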

Ask a single yes/no question about what is on the screen:

const answer = await askScreen.boolean({
  question: 'Is there a button with text "Click me" on the screen?',
});
console.log(answer);

// Or provide your own image (named differently so both calls can share one scope)
const answerFromImage = await askScreen.boolean({
  question: 'Is there a button with text "Click me" on the screen?',
  imageUrlBase64: "data:image/png;base64,...",
});
console.log(answerFromImage);
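
In an end-to-end test, the screen may not show the expected state on the first check. The following is a rough sketch of polling a yes/no check until it passes, assuming `askScreen.boolean` resolves to a plain boolean. `waitForYes` and its parameters are hypothetical and not part of ask-screen.

```typescript
// Hypothetical helper (not part of ask-screen): repeat an async yes/no check
// until it returns true or the attempts run out.
type BooleanCheck = () => Promise<boolean>;

async function waitForYes(
  check: BooleanCheck,
  attempts: number = 5,
  delayMs: number = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await check()) {
      return true;
    }
    // Pause between attempts, but not after the final one.
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

A call might look like `await waitForYes(() => askScreen.boolean({ question: "Is the dialog visible?" }))`. Each attempt presumably triggers a model request, so a small attempt count keeps cost and latency bounded.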

Ask a numeric question about what is on the screen:

const answer = await askScreen.numeric({
  question: "How many buttons are on the screen?",
});
console.log(answer);

Ask a multiple choice question about what is on the screen:

const answer = await askScreen.multipleChoice({
  question: "Which of the following text elements do you see on the screen?",
  options: [
    'A button with text "Click me"',
    'A text input with placeholder "Enter your name"',
    'A checkbox with label "I agree to the terms and conditions"',
  ],
});
// Returns the 0-based index of the selected option
console.log(answer);
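
Because the result is a 0-based index, it is usually mapped back to the option text. The following is a minimal sketch, assuming the returned value is a number. `optionAt` is a hypothetical helper and is not part of ask-screen.

```typescript
// Hypothetical helper (not part of ask-screen): map a 0-based index back to
// the option it refers to, rejecting indexes outside the list.
function optionAt<T>(options: readonly T[], index: number): T {
  if (!Number.isInteger(index) || index < 0 || index >= options.length) {
    throw new RangeError(
      "Index " + index + " is outside the " + options.length + " available options",
    );
  }
  return options[index] as T;
}

const sampleOptions = [
  'A button with text "Click me"',
  'A text input with placeholder "Enter your name"',
];
console.log(optionAt(sampleOptions, 1));
```

Validating the index before using it guards against a model reply that falls outside the list.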

Ask an open-ended question about what is on the screen:

const answer = await askScreen.open({
  question: "What are the top stories on the homepage?",
});
console.log(answer);

License

This project is licensed under the MIT License. See the LICENSE file for details.

Copyright 2025 Lovetap, LLC.
