0.0.7 • Published 10 months ago

@cheepcode/ask-screen v0.0.7

ask-screen

A TypeScript library for asking an AI model about what is on the browser screen. It helps agents author end-to-end tests and interact with websites.

It uses OpenAI's API to answer questions, so an OpenAI API key is required. It works with Playwright or as a standalone browser library.

Installation

npm install @cheepcode/ask-screen

Usage

Initialize the AskScreen instance:

import { AskScreen } from "@cheepcode/ask-screen";

const askScreen = new AskScreen({
  openaiApiKey: "your-openai-api-key",
  openaiModel: "o4-mini", // optional, defaults to "o4-mini"
  scale: 0.75, // optional, defaults to 0.75
  page: playwrightPage, // optional Playwright page instance you have already created
});
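
Hard-coding the key as in the snippet above is fine for a quick trial, but in a test suite it is usually read from the environment. The following is a minimal sketch of that idea, not part of ask-screen: `requireApiKey` is a hypothetical helper, and the variable name `OPENAI_API_KEY` is an assumption you can change.

```typescript
// Hypothetical helper, not part of ask-screen: reads the key from an
// environment-like object and fails early if it is missing or blank.
function requireApiKey(
  env: Record<string, string | undefined>,
  name = "OPENAI_API_KEY", // assumed variable name
): string {
  const raw = env[name];
  const value = raw === undefined ? "" : raw.trim();
  if (value === "") {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}
```

In Node, the result could then be passed as `openaiApiKey: requireApiKey(process.env)`, so a missing key fails at startup rather than on the first question.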

Get a description of the screen:

const description = await askScreen.description();
console.log(description);

// Or provide your own image
const descriptionFromImage = await askScreen.description({
  imageUrlBase64: "data:image/png;base64,...",
});
console.log(descriptionFromImage);
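
The `imageUrlBase64` value in the example above has the shape of a PNG data URL. If you only have raw image bytes, one way to build such a string is sketched below. `toPngDataUrl` is a hypothetical helper, not part of ask-screen, and it assumes the library accepts a standard `data:image/png;base64,` URL as the example suggests.

```typescript
// Hypothetical helper, not part of ask-screen: encodes raw PNG bytes
// as a data URL of the form "data:image/png;base64,<payload>".
function toPngDataUrl(bytes: Uint8Array): string {
  let binary = "";
  // Build a binary string one byte at a time so btoa can encode it.
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return `data:image/png;base64,${btoa(binary)}`;
}
```

With Playwright, `page.screenshot()` resolves to a Buffer, which is a `Uint8Array`, so its result could be passed to this helper directly. For very large images a chunked or streaming encoder may be preferable to per-byte string concatenation.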

Ask a single yes/no question about what is on the screen:

const answer = await askScreen.boolean({
  question: 'Is there a button with text "Click me" on the screen?',
});
console.log(answer);

// Or provide your own image
const answerFromImage = await askScreen.boolean({
  question: 'Is there a button with text "Click me" on the screen?',
  imageUrlBase64: "data:image/png;base64,...",
});
console.log(answerFromImage);
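
In an end-to-end test, a yes/no question like the one above may need to be repeated while the page finishes loading. The sketch below shows one possible polling loop. `waitUntilTrue` is a hypothetical helper, not part of ask-screen, and it assumes `boolean()` resolves to a JavaScript boolean, which the README does not state explicitly.

```typescript
// Hypothetical helper, not part of ask-screen: repeats a yes/no check
// until it answers true or the attempts run out.
// `ask` stands in for a call such as () => askScreen.boolean({ question }).
async function waitUntilTrue(
  ask: () => Promise<boolean>,
  maxAttempts = 5,
  delayMs = 1000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await ask()) {
      return true;
    }
    // Pause between attempts, but not after the last one.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

Each attempt would be a separate model call, so keeping `maxAttempts` small limits both latency and API cost.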

Ask a numeric question about what is on the screen:

const answer = await askScreen.numeric({
  question: "How many buttons are on the screen?",
});
console.log(answer);

Ask a multiple choice question about what is on the screen:

const answer = await askScreen.multipleChoice({
  question: "Which of the following text elements do you see on the screen?",
  options: [
    'A button with text "Click me"',
    'A text input with placeholder "Enter your name"',
    'A checkbox with label "I agree to the terms and conditions"',
  ],
});
// Returns the 0-based index of the selected option
console.log(answer);
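
Because `multipleChoice` returns a 0-based index rather than the option text, it is often convenient to map the index back to the option. A minimal sketch follows. `selectedOption` is a hypothetical helper, not part of ask-screen, and it assumes the returned value is a plain number.

```typescript
// Hypothetical helper, not part of ask-screen: maps a 0-based index back
// to the option text and rejects anything that is not a valid index.
function selectedOption(options: readonly string[], index: number): string {
  // Non-integers (including NaN) never match an array slot.
  const option = Math.floor(index) === index ? options[index] : undefined;
  if (option === undefined) {
    throw new RangeError(
      `Index ${index} does not match any of the ${options.length} options`,
    );
  }
  return option;
}
```

Keeping the options in a variable and calling `selectedOption(options, answer)` lets a test compare against the option text instead of a bare number.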

Ask an open-ended question about what is on the screen:

const answer = await askScreen.open({
  question: "What are the top stories on the homepage?",
});
console.log(answer);

License

This project is licensed under the MIT License. See the LICENSE file for details.

Copyright 2025 Lovetap, LLC.

Versions

0.0.7 (10 months ago)
0.0.6 (10 months ago)
0.0.5 (10 months ago)
0.0.4 (10 months ago)
0.0.3 (10 months ago)
0.0.2 (10 months ago)
0.0.1 (10 months ago)