1.23.0 • Published 2 years ago

homelike-feed-processor v1.23.0

Overview

Feed Manager Service provides partner services with files containing information about our apartments. Every hour, the service generates these files in the format each partner requires, uploads them to an AWS S3 bucket, and exposes API endpoints for downloading them. In some cases, the service also uploads the apartment information directly to partner services.

Installation

cmd: npm i
cmd: npm start

cmd: pm2 logs (to check the logs)
cmd: pm2 kill or pm2 restart (to kill or restart the process)

Test Status

cmd: npm test

Deployment / CI

Automatic production deploy when pushed to production branch.

Automatic staging deploy when pushed to staging branch.

Please check config folder for configs for different environments.

Monitoring

cmd: awslogs get -S -w homelike-feed-processor --profile hlnm-staging --aws-region eu-central-1 -SG --query log

Note that this requires an hlnm-staging AWS profile in your ~/.aws/credentials file.

Code structure

Path: /

| File | Description |
| ------ | ------ |
| process.debug.yml | contains env variables for development |
| process.yml | contains env variables for staging and production |

Path: /.circleci

| File | Description |
| ------ | ------ |
| config.yml | contains circleci deployment config |

Path: /deploy

| File | Description |
| ------ | ------ |
| values-production.yaml | contains deployment config values for production |
| values.yaml | contains deployment config values for develop and staging |

Path: /jest

| File | Description |
| ------ | ------ |
| setup.js | contains jest test suite config |

Path: /seeds

| File | Description |
| ------ | ------ |
| ./ | contains seeded data files for feeds |
| configuration-feed.seed.json | contains default queries for feeds. Saved into the feedConfiguration collection if it is empty. |

Path: /src

| File | Description |
| ------ | ------ |
| /feeds | contains folders for all feeds with their configuration files, exclusive transformers, loaders, filters, etc. |
| /feeds/index.js | contains exportable collection of feeds configurations |
| feed-manager.js | **Unused**. Contains methods for feed processing |
| feed-manager.streamable.js | contains methods for feed processing using streams |

Path: /transformers

| File | Description |
| ------ | ------ |
| ./ | contains transformers for feeds |
| index.js | contains exportable collection of the transformers |
| apartment.transformer.js | contains base transformer that is used in every feed. |
| description-energy-certificate.transformer.js | contains a transformer that is used to add energy certificate information into description field. |
| furnishing.transformer.js | contains a transformer that is used to add furnishing information into description field. |
| image.transformer.js | contains a transformer that is used to add links to media (photos, floor_plans, tours). |
| json.transformer.js | contains a transformer that is used to convert items to strings in json format. |
| xml.transformer.js | contains a transformer that is used to convert items to strings in xml format. |

Path: /filters

| File | Description |
| ------ | ------ |
| ./ | contains filters for feeds |
| index.js | contains exportable collection of the filters |
| /schemas | contains schemas for filters |

Path: /processors

| File | Description |
| ------ | ------ |
| ./ | contains processors for feeds |
| index.js | contains exportable collection of the processors |
| csv.processor.js | contains a processor for saving the feed result file in csv format |
| json.processor.js | contains a processor for saving the feed result file in json format |
| xml.processor.js | contains a processor for saving the feed result file in xml format |
| openimmo-xml.processor.js | contains a processor for saving the feed result file in openimmo xml format |
| sublet-xml.processor.js | contains a processor for saving the feed result file in sublet xml format |

Path: /jobs

| File | Description |
| ------ | ------ |
| feed-manager.job.js | contains cron job that runs feed processing every hour |
| multi-feed-manager.runner.js | contains a manually called job that runs processing for all feeds |
| single-feed-manager.runner.js | contains a manually called job that runs processing for single feed |

Path: /endpoints

| File | Description |
| ------ | ------ |
| index.js | contains configuration for all endpoints |
| /routers/base.router.js | contains routers for getting feeds, running feed processing and ping |
| /routers/adwords.router.js | contains routers for adwords feed |
| /routers/configuration.router.js | contains routers for managing feeds configuration |
| /middlewares | contains middlewares for endpoints |

Path: /uploaders

| File | Description |
| ------ | ------ |
| ./ | contains uploaders for feeds |

Path: /services

| File | Description |
| ------ | ------ |
| ./ | contains utilities for working with external services |
| apartments.service.js | contains utilities for getting data from homelike db |
| auth.service.js | contains utilities for api authentications |
| cloudwatch.service.js | contains utilities for working with cloudwatch service |
| feed-configuration.service.js | contains utilities for working with feedConfiguration collection |
| ftp.service.js | contains utilities for working with ftp servers |
| logger.js | contains configured logger |
| s3.service.js | contains utilities for working with s3 service |
| translation.service.js | contains utilities for working with translation service |

Feed processing

Feed processing is triggered every hour by the jobs/feed-manager.job.js cron job. A feed is processed only if the isProcessable() function from its configuration returns true. Processing can also be started manually, for all feeds through the /feeds/run endpoint or for a single feed through the /feeds/run/:feed endpoint.

Feeds are processed by the processFeeds() and processFeed() functions from /src/feed-manager.streamable.js.
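
The scheduling rule above can be sketched as follows. This is a hypothetical illustration rather than the service's actual code: selectFeedsToRun and the manual flag are names made up for this example, and only isProcessable comes from the feed configuration.

```javascript
// Hypothetical helper: picks the feeds that should be processed in this run.
// `manual` mirrors a run triggered through /feeds/run, where isProcessable()
// is not checked.
const selectFeedsToRun = (feeds, { manual = false } = {}) =>
  feeds.filter(feed => manual || feed.isProcessable() === true);

// Made-up feed configs for the demonstration
const feeds = [
  { name: 'feed_a', isProcessable: () => true },
  { name: 'feed_b', isProcessable: () => false }
];

console.log(selectFeedsToRun(feeds).map(f => f.name));
// [ 'feed_a' ]
console.log(selectFeedsToRun(feeds, { manual: true }).map(f => f.name));
// [ 'feed_a', 'feed_b' ]
```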

Configuration

The feed (channel) configurations are the most important part of feed processing. Each feed configuration must be placed in src/feeds/feed_name/index.js and contains the following fields:

| Field | Type | Mandatory | Description |
| ------ | ------ | ------ | ------ |
| name | String | | Feed name. |
| isProcessable | Function | | Function that runs before feed processing starts and decides whether to run it. Should return a boolean value. Not checked when feed processing is started manually |
| auth | Object | | Contains authorization config. Fields: basic - mandatory field that contains _username and password_ fields used to receive feed files; ssl - path to ssl folder; ftp - credentials for ftp servers |
| lang | String | ✅ | Feed language. Used to get description |
| format | String | | Format of result file |
| query | Function | ✅ | Function that should return an object for the mongodb find request. Unused when there is a query for this feed in the feedConfiguration collection |
| options | Object | | Additional data for feed |
| load | Function | | Function that runs before feed processing instead of the usual load by query from the database. Must return a stream with data |
| preFilters | Array | | List of functions that run before transformations. Items are removed from the data stream if at least one pre-filter returns false |
| transformers | Array | | List of functions for data transformations |
| postFilters | Array | | List of functions that run after transformations. Items are removed from the data stream if at least one post-filter returns false |
| processor | Function | | Function that converts transformed data to the needed format (csv, xml, json) for saving to S3 |
| upload | Function | | Function that runs after data has been uploaded to S3 |

Each feed config should be wrapped with the feedDefaults() function from /utils/feed.utils.js. This function provides a default feed configuration, any field of which can be overridden.

The index.js of each feed must export the result of the applyExtensions() function from /utils/feed.utils.js, which takes the base config (def) and, if needed, a list of extensions (extensions). The extensions let you duplicate the feed config and override specific fields.

All configs are collected in the exportable collection in /src/feeds/index.js.

Example config:

  const { feedDefaults, applyExtensions } = require('../../../utils/feed.utils');

  const defaults = feedDefaults({
    name: 'my_feed_name',
    isProcessable: () => true,
    auth: {
      basic: { username: 'username', password: 'password' }
    },
    lang: 'en',
    query: () => ({ isPublished: true }),
    format: 'csv',
    options: {
      image: { height: 600 }
    },
    load: loader,
    preFilters: [
      preFilterOne,
      preFilterTwo
      // list of the filters that are executed before any transformations
    ],
    postFilters: [
      postFilterOne,
      postFilterTwo
      // list of the filters that are executed after all transformations
    ],
    transformers: [
      toBaseApartment,
      toLocalizedDescriptions
      // list of transformers
    ],
    processor: toCsvStream,
    upload: uploader
  });
  
  module.exports = applyExtensions({
    defaults,
    extensions: [ // list of extension (if needed), it allows you to duplicate feed config and override any specific fields
      { lang: 'de' },
      { lang: 'es' }
    ]
  });
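
To illustrate what the extensions do, here is a minimal sketch of how such a helper could expand one base config into several feeds. It is an assumption, not the real code from /utils/feed.utils.js: the name expandConfig and the choice to return the base config followed by its variants are hypothetical.

```javascript
// Hypothetical stand-in for applyExtensions(): returns the base config plus
// one shallow copy per extension, with the extension's fields overriding
// the fields of the base config.
const expandConfig = ({ defaults, extensions = [] }) => [
  defaults,
  ...extensions.map(extension => ({ ...defaults, ...extension }))
];

const configs = expandConfig({
  defaults: { name: 'my_feed_name', lang: 'en', format: 'csv' },
  extensions: [{ lang: 'de' }, { lang: 'es' }]
});

console.log(configs.map(c => `${c.name}_${c.lang}.${c.format}`));
// [ 'my_feed_name_en.csv', 'my_feed_name_de.csv', 'my_feed_name_es.csv' ]
```

Under this assumption, the example config above would yield three feeds that differ only in language.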

Loaders

By default, the data for a feed is loaded from the database through the getApartmentsStream() function from /services/apartments.service.js. A feed can specify a custom loader instead of the standard one; the loader must return a stream with the data to process.

For now, all loaders are stored in the folders of specific feeds.

Example loader:

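 const { Readable } = require('stream');

 module.exports = feed => {
   // Object-mode stream that is filled manually below
   const stream = new Readable({ objectMode: true, read() {} });

   // ... your loader body here; it should produce apartmentsList
   const apartmentsList = []; // placeholder

   apartmentsList.forEach(it => stream.push(it));

   stream.push(null); // signals the end of the data

   return stream;
 }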

Filters

preFilters and postFilters can be used to validate each record being processed. If a filter returns false, the checked record is excluded from the current feed processing.

Some filters are stored in the /filters folder. The rest are stored in the folders of specific feeds.

Example filter:

 module.exports = feed => apartment => {
   // ... your filter body here

   return isValid(apartment); // true OR false; isValid is a placeholder for your own check
 }
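
As a rough sketch of how a list of such filters could be applied to the data stream, the helper below drops every record for which at least one filter returns false. The names toFilterStage, hasTitle and hasPrice are hypothetical, and the real service may wire filters into its stream differently.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical helper: builds an object-mode stream stage that keeps a record
// only if every filter returns a truthy value for it.
const toFilterStage = (feed, filters) => {
  // Filters are curried: feed => apartment => boolean
  const checks = filters.map(filter => filter(feed));
  return new Transform({
    objectMode: true,
    transform(apartment, _encoding, callback) {
      if (checks.every(check => check(apartment))) {
        callback(null, apartment); // keep the record
      } else {
        callback(); // drop the record
      }
    }
  });
};

// Example filters made up for this sketch
const hasTitle = feed => apartment => Boolean(apartment.title);
const hasPrice = feed => apartment => apartment.price > 0;

const collect = async () => {
  const source = Readable.from([
    { title: 'A', price: 100 },
    { title: '', price: 100 },
    { title: 'C', price: 0 }
  ]);
  const stage = toFilterStage({ name: 'demo' }, [hasTitle, hasPrice]);
  const kept = [];
  for await (const apartment of source.pipe(stage)) {
    kept.push(apartment);
  }
  return kept;
};

collect().then(kept => console.log(kept));
// [ { title: 'A', price: 100 } ]
```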

Transformers

Transformers are used to transform data to the required format.

Some transformers are stored in the /transformers folder. The rest are stored in folders of specific feeds.

Example transformer:

 module.exports = feed => apartment => {
   // Copy the record instead of mutating the input
   const newApartment = { ...apartment };

   // ... your transformations here

   return newApartment;
 }

Processors

Processors convert transformed data to the required file format (csv, xml, json, etc.).
A processor should return a list of objects, each containing:

  • stream - stream with resulting data to save to S3 bucket.
  • suffix - string with the file format in which the resulting file will be saved.

Some processors are stored in the /processors folder. The rest are stored in folders of specific feeds.

Example processor:

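const { PassThrough } = require('stream');

module.exports = (feed) => (stream) => {
  // Replace PassThrough with your stream writer; it is only a placeholder here
  const writer = new PassThrough({ objectMode: true });

  return [{ stream: stream.pipe(writer), suffix: `.${feed.format}` }];
};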

Uploads

Uploaders start after saving the resulting feed file to S3 and can be used to upload the resulting data to other sources like FTP server, partner api, etc.

Some uploaders are stored in the /uploaders folder. The rest are stored in folders of specific feeds.

Example uploader:

const { getLatestFeed } = require('../services/s3.service');

module.exports = feed => async (countOfItems) => {
  // Read the latest result file for this feed back from S3
  const { fileStream } = await getLatestFeed(feed.name, feed.lang, feed.format);

  // ... your uploader body here
};

Get Feed Result File

The feed result file is available through the /feeds/:feed endpoint.
:feed should have the following format: "{name_of_feed}_{lang}.{format}".

Examples: /feeds/facebook_de.csv, /feeds/idealista_es.json, etc.
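
A small sketch of how the :feed parameter could be split into its parts. parseFeedParam is a hypothetical helper, not a function from this service, and it assumes that the language code itself never contains an underscore or a dot.

```javascript
// Hypothetical parser for the :feed parameter, e.g. "facebook_de.csv".
// The language is taken as the part after the LAST underscore, so feed names
// that contain underscores themselves (e.g. "my_feed_name") still work.
const parseFeedParam = (param) => {
  const match = /^(.+)_([^_.]+)\.([^.]+)$/.exec(param);
  if (!match) return null; // the parameter does not follow the expected format
  const [, name, lang, format] = match;
  return { name, lang, format };
};

console.log(parseFeedParam('facebook_de.csv'));
// { name: 'facebook', lang: 'de', format: 'csv' }
console.log(parseFeedParam('my_feed_name_en.json'));
// { name: 'my_feed_name', lang: 'en', format: 'json' }
console.log(parseFeedParam('invalid'));
// null
```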

Jobs

jobs/feed-manager.job.js

Main feed manager job that runs processing for all feeds every hour.

Endpoints

Authorization

Most requests are authorized with Basic Auth.

managerAuth:

| | URL | Username | Password |
| ------ | ------ | ------ | ------ |
| local | http://localhost:8080 | manager | 19da< R0adm2!@ASteS4$@s |
| staging | https://staging-feed-manager-ui.homelike.xyz/ | manager | 19da<R0adm2!@ASteS4$@s |
| production | https://feeds-ui.services.thehomelike.com/ | manager | 19da<R0adm2!@ASteS4$@s |

adwordsAuth:

| | URL | Username | Password |
| ------ | ------ | ------ | ------ |
| local | http://localhost:8080 | devs@homelike.cc | Homelikedev2017 |
| staging | https://staging-feed-manager-ui.homelike.xyz/ | devs@homelike.cc | Homelikedev2017 |
| production | https://feeds-ui.services.thehomelike.com/ | devs@homelike.cc | Homelikedev2017 |
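
For clients that build the request by hand, the Basic Auth header can be produced as sketched below. basicAuthHeader is a hypothetical helper name, and the credentials in the example are placeholders.

```javascript
// Builds the value of the Authorization header for HTTP Basic Auth:
// "Basic " followed by base64("username:password").
const basicAuthHeader = (username, password) =>
  `Basic ${Buffer.from(`${username}:${password}`, 'utf8').toString('base64')}`;

// Placeholder credentials, not the real ones
const headers = { Authorization: basicAuthHeader('user', 'pass') };

console.log(headers.Authorization);
// Basic dXNlcjpwYXNz
```

The resulting headers object can be passed to any HTTP client when calling the endpoints listed below.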

Ping

The endpoint to check service status.

Method: GET
URL: '/ping'

Jobs force start

Endpoints to force start feed processing.

Method: GET;
URLs: '/feeds/run', '/feeds/run/:feed_name'
Authentication: managerAuth

Get Feed Result File

Endpoint for getting the feed result file. :feed_name should have the following format: "{name_of_feed}_{lang}.{format}".

Method: GET;
URLs: '/feeds/:feed_name'
Authentication: depends on feed auth configuration

Get Adwords Feed Result Files

Get links for getting Adwords Feed Result Files:

Method: GET;
URLs: '/adwords'
Authentication: adwordsAuth

Get Adwords Feed Result File by country and city:

Method: GET;
URLs: '/adwords/:country/:city.csv'
Authentication: adwordsAuth

Get Adwords Feed customizer:

Method: GET;
URLs: '/adwords/customizer.csv'
Authentication: adwordsAuth

Get Adwords Feed price-extension:

Method: GET;
URLs: '/adwords/price-extension.csv'
Authentication: adwordsAuth

Get feed config

Get feed config from feedConfiguration collection.

Method: GET;
URLs: '/feeds/config', '/feeds/config/:feed_name'
Authentication: managerAuth

Update feed config

Update feed config in feedConfiguration collection.

Method: PATCH;
URLs: '/feeds/config'
Authentication: managerAuth

Payload body:

{
  "feed_name": "facebook_en",
  "query": {
    "removed": {
      "$exists": false
    }
  }
}

Payload body fields:

  • feed_name string: name of feed with language
  • query object: query object that will be used to get data from the homelike database

Get feed preview file

Get a preview of the feed result file generated by the provided query.

Method: POST;
URLs: '/feeds/config'
Authentication: managerAuth

Payload body:

{
  "feed_name": "facebook_en",
  "query": {
    "removed": {
      "$exists": false
    }
  },
  "count": 10
}

Payload body fields:

  • feed_name string: name of feed with language
  • query object: query object that will be used to get data from the homelike database
  • count number: number of items that will be taken for preview file

Count the number of found items

Get the number of items found by the provided query.

Method: POST;
URLs: '/feeds/config/count'
Authentication: managerAuth

Payload body:

{
  "query": {
    "removed": {
      "$exists": false
    }
  }
}

Payload body fields:

  • query object: query object that will be used to get data from the homelike database

Database

feedConfiguration

{
  "_id": "String",
  "feed_name": "String",
  "query": "Object"
}
  • feed_name - name of feed with language. Should look like `{feed_name}_{lang}`, for example facebook_en
  • query - query object that will be used to get data from the homelike database. Dates relative to the current date (Moment dates) are described in queries by an object with the following fields:
    • method - calculation operation applied to the current date. Can be only "add" or "subtract"
    • period - period (unit) of the calculation
    • count - number of periods to add or subtract
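
The relative date fields above can be resolved roughly as in the following sketch. It is an illustration only: resolveRelativeDate is a hypothetical name, the sketch uses the built-in Date object instead of Moment, and it handles just three period names ('days', 'months', 'years'). Month overflow with Date can also differ from Moment, which clamps to the end of the month.

```javascript
// Hypothetical resolver for a relative date descriptor such as
// { method: 'subtract', period: 'days', count: 7 }.
// Only a few period names are handled; all arithmetic is done in UTC.
const resolveRelativeDate = ({ method, period, count }, now = new Date()) => {
  if (method !== 'add' && method !== 'subtract') {
    throw new Error(`Unsupported method: ${method}`);
  }
  const delta = method === 'add' ? count : -count;
  const result = new Date(now.getTime()); // copy, so `now` is not mutated
  switch (period) {
    case 'days':
      result.setUTCDate(result.getUTCDate() + delta);
      break;
    case 'months':
      result.setUTCMonth(result.getUTCMonth() + delta);
      break;
    case 'years':
      result.setUTCFullYear(result.getUTCFullYear() + delta);
      break;
    default:
      throw new Error(`Unsupported period: ${period}`);
  }
  return result;
};

const now = new Date('2024-03-15T00:00:00Z');
const descriptor = { method: 'subtract', period: 'days', count: 20 };

console.log(resolveRelativeDate(descriptor, now).toISOString());
// 2024-02-24T00:00:00.000Z
```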

Master list

These are the master query properties that need to be present in all feeds. There is currently no mechanism that includes the master list in every feed automatically, so whenever you create a new feed, make sure to include these conditions in its query(). This list is incomplete.

const query = {
  ownerAccountId: {
    $nin: [ 
      "dbdf2bd9a63beb465435", // blueground
    ]
  }
}
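
Until such a mechanism exists, one possible way to avoid forgetting the master conditions is a small wrapper like the one below. This is a sketch under assumptions: withMasterQuery is a hypothetical helper that does not exist in the service, and combining the conditions with $and is just one option.

```javascript
// Master conditions copied from the list above (the list is incomplete).
const masterQuery = {
  ownerAccountId: { $nin: ['dbdf2bd9a63beb465435'] } // blueground
};

// Hypothetical helper: combines a feed-specific query with the master
// conditions using $and, so that neither side can silently overwrite the other.
const withMasterQuery = (feedQuery = {}) => ({ $and: [masterQuery, feedQuery] });

console.log(JSON.stringify(withMasterQuery({ isPublished: true })));
// {"$and":[{"ownerAccountId":{"$nin":["dbdf2bd9a63beb465435"]}},{"isPublished":true}]}
```

In a feed config, this could be used as query: () => withMasterQuery({ isPublished: true }).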