0.0.4 • Published 11 years ago

openbookprices-fetchers v0.0.4

License
AGPLv3

OpenBookPrices Price Fetchers

This repository contains the code that is used to fetch the price and other information for a particular book from a particular vendor.

How the price is fetched varies by vendor, from calling an API to scraping the vendor's website.

The extracted information is returned in a standard form that can then be cleaned up as needed.

Return format

The response is a hash, looking something like this:

{
  "args": {
    "country": "GB",
    "currency": "GBP",
    "isbn": "9780330508537",
    "vendor": "foyles"
  },
  "authors": [
    "Douglas Adams"
  ],
  "title": "The Hitchhiker's Guide to the Galaxy",
  "url": "http://www.foyles.co.uk/witem/fiction-poetry/the-hitchhikers-guide-to-the-galaxy,douglas-adams-9780330508537",
  "entries": [
    {
      "countries": [
        "GB"
      ],
      "currency": "GBP",
      "offers": {
        "new": {
          "availabilityNote": "Despatched in 1 business day.",
          "price": 5.83,
          "shipping": 2.5,
          "shippingNote": "Free second class delivery in the UK for orders over £10",
          "total": 8.33
        }
      },
      "isbn": "9780330508537",
      "timestamp": 1371993232,
      "ttl": 86400,
      "url": "http://www.foyles.co.uk/witem/fiction-poetry/the-hitchhikers-guide-to-the-galaxy,douglas-adams-9780330508537",
      "vendor": "foyles"
    },
    ...
  ]
}

Guide to the keys

authors and title

Text. These fields are optional, but it is nice if they are included.

entries

Array. The entries array should contain an entry for each combination of country and currency (note that countries is an array, as many countries will often share the same price). The canSell availability flag is true if the vendor can supply this book, false otherwise. Exactly what canSell means is something that each scraper needs to determine, but generally true means that the vendor can sell the book, and false means that it cannot (it doesn't stock it, the book was not found on its site, it is out of stock, etc.).

The 'shipping' and hence 'total' returned assume that you are only buying the one book. The shippingNote should be used to clarify if there are discounts to be had for buying more (as in the above example).
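
As an illustration of how a consumer might read this structure, here is a minimal sketch that looks up the entry for a country and currency and checks that an offer's total is the price plus the single-book shipping. The field names follow the sample response above; the helper names (findEntry, roundMoney, totalIsConsistent) are made up for this example and are not part of this repository, and the rounding assumes a currency with two decimal places.

```javascript
// Sketch only: helper names are hypothetical, field names follow the sample response.

// Return the first entry covering the given country and currency, or null.
function findEntry(response, country, currency) {
  const entries = response.entries || [];
  const match = entries.find(
    (entry) => entry.countries.includes(country) && entry.currency === currency
  );
  return match || null;
}

// Round to two decimal places to avoid floating point noise.
// Assumption: the currency uses two decimal places.
function roundMoney(amount) {
  return Math.round(amount * 100) / 100;
}

// True if total equals price plus shipping for a one-book order.
function totalIsConsistent(offer) {
  return roundMoney(offer.price + offer.shipping) === roundMoney(offer.total);
}

// Trimmed copy of the sample response shown earlier.
const sampleResponse = {
  entries: [
    {
      countries: ["GB"],
      currency: "GBP",
      offers: { new: { price: 5.83, shipping: 2.5, total: 8.33 } },
      vendor: "foyles",
    },
  ],
};

const gbEntry = findEntry(sampleResponse, "GB", "GBP");
console.log(gbEntry.vendor, totalIsConsistent(gbEntry.offers.new));
```

A lookup for a combination that the vendor does not list (for example "US" and "USD" with the sample above) returns null, so callers should check for that before reading offers.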

The url is the address of the page on the vendor's site for this book.

Note that the format is similar to, but different from, the format returned by the OpenBookPrices API.

Proxy

During development it is convenient to run a proxy that caches all requests, so that running a scraper takes much less time and is more polite to the target site. Polipo works well for this:

# install using your package manager of choice, in this case brew
brew install polipo

# run polipo in a separate terminal telling it to cache everything once fetched
polipo -- relaxTransparency=true logLevel=0xFF idleTime=1s

# in the terminal where you run the scripts set the env variable
export http_proxy=http://localhost:8123/

# When you want to clear the cache just delete the files (adapt to your system)
rm -r /usr/local/var/cache/polipo/*
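
Whether a given fetcher picks up http_proxy depends on the HTTP client it uses, which this README does not state. As a rough sketch of what routing a plain HTTP request through such a proxy involves, the hypothetical helper below (proxyRequestOptions is not part of this repository) builds the options object that Node's http.request would need: the connection goes to the proxy host and port, and the full target URL is sent as the request path.

```javascript
// Sketch only: proxyRequestOptions is a hypothetical helper for illustration.
// It handles plain http:// targets; https through a proxy needs CONNECT tunnelling.
function proxyRequestOptions(targetUrl, proxyUrl) {
  const target = new URL(targetUrl);

  // No proxy configured: connect to the target directly.
  if (!proxyUrl) {
    return {
      host: target.hostname,
      port: Number(target.port) || 80,
      path: target.pathname + target.search,
      headers: { Host: target.host },
    };
  }

  // Proxy configured: connect to the proxy and ask it for the absolute URL.
  const proxy = new URL(proxyUrl);
  return {
    host: proxy.hostname,
    port: Number(proxy.port) || 80,
    path: target.href,
    headers: { Host: target.host },
  };
}

// Reads the same environment variable that is exported in the shell above.
const requestOptions = proxyRequestOptions(
  "http://www.foyles.co.uk/",
  process.env.http_proxy
);
console.log(requestOptions);
```

The resulting object can be passed to http.request from Node's core http module; with the proxy running, a repeated request for the same URL should then be answered from the proxy's cache.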