homelike-feed-processor v1.23.0
Overview
The Feed Manager Service provides partner services with files containing information about our apartments. Every hour, the service generates these files in the format each partner requires, uploads them to an AWS S3 bucket, and exposes API endpoints for downloading them. In some cases, the service also uploads apartment information directly to partner services.
Installation
cmd: npm i
cmd: npm start
cmd: pm2 logs
to check logs
cmd: pm2 kill or pm2 restart all
to kill or restart the process
Test Status
cmd: npm test
Deployment / CI
Automatic production deploy when pushed to the production branch.
Automatic staging deploy when pushed to the staging branch.
Please check the config folder for the configs of the different environments.
Monitoring
cmd: awslogs get -S -w homelike-feed-processor --profile hlnm-staging --aws-region eu-central-1 -SG --query log
Note that you must have the hlnm-staging AWS profile in your ~/.aws/credentials file.
Code structure
Path: /
File | Description |
---|---|
process.debug.yml | contains env variables for development |
process.yml | contains env variables for staging and production |
Path: /.circleci
File | Description |
---|---|
config.yml | contains circleci deployment config |
Path: /deploy
File | Description |
---|---|
values-production.yaml | contains deployment config values for production |
values.yaml | contains deployment config values for develop and staging |
Path: /jest
File | Description |
---|---|
setup.js | contains jest test suite config |
Path: /seeds
File | Description |
---|---|
./ | contains seeded data files for feeds |
configuration-feed.seed.json | contains default queries for feeds. Saved into the feedConfiguration collection if it is empty. |
Path: /src
File | Description |
---|---|
/feeds | contains folders for all feeds with their configuration files, exclusive transformers, loaders, filters, etc. |
/feeds/index.js | contains exportable collection of feeds configurations |
feed-manager.js | **Unused**. Contains methods for feed processing |
feed-manager.streamable.js | contains methods for feed processing using streams |
Path: /transformers
File | Description |
---|---|
./ | contains transformers for feeds |
index.js | contains exportable collection of the transformers |
apartment.transformer.js | contains base transformer that is used in every feed. |
description-energy-certificate.transformer.js | contains a transformer that is used to add energy certificate information into description field. |
furnishing.transformer.js | contains a transformer that is used to add furnishing information into description field. |
image.transformer.js | contains a transformer that is used to add links to media (photos, floor_plans, tours). |
json.transformer.js | contains a transformer that is used to convert items to strings in json format. |
xml.transformer.js | contains a transformer that is used to convert items to strings in xml format. |
Path: /filters
File | Description |
---|---|
./ | contains filters for feeds |
index.js | contains exportable collection of the filters |
/schemas | contains schemas for filters |
Path: /processors
File | Description |
---|---|
./ | contains processors for feeds |
index.js | contains exportable collection of the processors |
csv.processor.js | contains a processor for saving the feed result file in csv format |
json.processor.js | contains a processor for saving the feed result file in json format |
xml.processor.js | contains a processor for saving the feed result file in xml format |
openimmo-xml.processor.js | contains a processor for saving the feed result file in openimmo xml format |
sublet-xml.processor.js | contains a processor for saving the feed result file in sublet xml format |
Path: /jobs
File | Description |
---|---|
feed-manager.job.js | contains cron job that runs feed processing every hour |
multi-feed-manager.runner.js | contains a manually called job that runs processing for all feeds |
single-feed-manager.runner.js | contains a manually called job that runs processing for a single feed |
Path: /endpoints
File | Description |
---|---|
index.js | contains configuration for all endpoints |
/routers/base.router.js | contains routers for getting feeds, running feed processing and ping |
/routers/adwords.router.js | contains routers for adwords feed |
/routers/configuration.router.js | contains routers for managing feeds configuration |
/middlewares | contains middlewares for endpoints |
Path: /uploaders
File | Description |
---|---|
./ | contains uploaders for feeds |
Path: /services
File | Description |
---|---|
./ | contains utilities for working with external services |
apartments.service.js | contains utilities for getting data from homelike db |
auth.service.js | contains utilities for API authentication |
cloudwatch.service.js | contains utilities for working with cloudwatch service |
feed-configuration.service.js | contains utilities for working with feedConfiguration collection |
ftp.service.js | contains utilities for working with ftp servers |
logger.js | contains configured logger |
s3.service.js | contains utilities for working with s3 service |
translation.service.js | contains utilities for working with translation service |
Feed processing
All feed processing is initiated every hour by the jobs/feed-manager.job.js cron job, and a feed is processed only if the isProcessable() function from its configuration returns true. Processing of all feeds can also be started manually through the /feeds/run endpoint, or for a single feed through the /feeds/run/:feed endpoint.
Feeds are processed by the processFeeds() and processFeed() functions in /src/feed-manager.streamable.js.
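The scheduling rule above can be sketched as a filter over the feed configs. This is a hypothetical illustration, not the code in jobs/feed-manager.job.js: selectFeedsToProcess and its manual option are invented names, and treating a config without isProcessable as always processable is an assumption.

```javascript
// Hypothetical helper: picks the feed configs a run should process.
// Scheduled (hourly) runs honour isProcessable(); manual runs started
// through /feeds/run or /feeds/run/:feed skip the check.
function selectFeedsToProcess(feeds, { manual = false } = {}) {
  if (manual) {
    return feeds;
  }
  return feeds.filter(feed =>
    // Assumption: a config without isProcessable is always processed.
    typeof feed.isProcessable === 'function' ? feed.isProcessable() === true : true
  );
}

module.exports = { selectFeedsToProcess };
```

A scheduled run would pass only the feeds that opted in to processFeeds(), while a manual run passes every candidate.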
Configuration
Feed configurations are the most important part of feed processing. Each feed configuration must be placed in src/feeds/feed_name/index.js and contains the following fields:
Field | Type | Mandatory | Description |
---|---|---|---|
name | String | ✅ | Feed name. |
isProcessable | Function | | Function that runs before feed processing and determines whether to run it. Should return a boolean value. Not checked when feed processing is started manually. |
auth | Object | ✅ | Authorization config. Fields: basic (mandatory) contains the username and password fields used to receive feed files; ssl is the path to the SSL folder; ftp contains credentials for FTP servers. |
lang | String | ✅ | Feed language. Used to get the description. |
format | String | | Format of the result file. |
query | Function | ✅ | Function that should return an object for the MongoDB find request. Unused when there is a query for this feed in the feedConfiguration collection. |
options | Object | | Additional data for the feed. |
load | Function | | Function that runs before feed processing instead of the usual load by query from the database. Must return a stream with data. |
preFilters | Array | | List of functions that run before transformations. Items are removed from the data stream if at least one pre-filter returns false. |
transformers | Array | | List of functions for data transformations. |
postFilters | Array | | List of functions that run after transformations. Items are removed from the data stream if at least one post-filter returns false. |
processor | Function | | Function that converts transformed data to the needed format (csv, xml, json) for saving to S3. |
upload | Function | | Function that runs after data has been uploaded to S3. |
Each feed config should be wrapped with the feedDefaults() function from /utils/feed.utils.js. This function provides a default feed configuration whose fields can still be overridden.
Each feed's index.js must export the result wrapped with the applyExtensions() function from /utils/feed.utils.js, which takes the base config (defaults) and, if needed, a list of extensions (extensions). Extensions let you duplicate the feed config and override specific fields.
All configs are collected in the exportable collection in /src/feeds/index.js.
Example config:
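const { feedDefaults, applyExtensions } = require('../../../utils/feed.utils');
// loader, preFilterOne, preFilterTwo, postFilterOne, postFilterTwo, toBaseApartment,
// toLocalizedDescriptions, toCsvStream and uploader are placeholders for your own
// loader, filters, transformers, processor and uploader.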
const defaults = feedDefaults({
name: 'my_feed_name',
isProcessable: () => true,
auth: {
basic: { username: 'username', password: 'password' }
},
lang: 'en',
query: () => ({ isPublished: true }),
format: 'csv',
options: {
image: { height: 600 }
},
load: loader,
preFilters: [
preFilterOne,
preFilterTwo
// list of the filters that are executed before any transformations
],
postFilters: [
postFilterOne,
postFilterTwo
// list of the filters that are executed after all transformations
],
transformers: [
toBaseApartment,
toLocalizedDescriptions,
// list of transformers
],
processor: toCsvStream,
upload: uploader
});
module.exports = applyExtensions({
defaults,
extensions: [ // list of extensions (if needed): each one duplicates the feed config and overrides specific fields
{ lang: 'de' },
{ lang: 'es' }
]
});
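To make the roles of feedDefaults() and applyExtensions() concrete, here is a minimal sketch of how such helpers could behave; it is not the code in /utils/feed.utils.js. The default values (format 'csv', empty filter and transformer lists) and the return shape (an array with the base config followed by one shallow-merged copy per extension) are assumptions.

```javascript
// Hypothetical defaults: the real values live in /utils/feed.utils.js.
const DEFAULTS = {
  format: 'csv',
  options: {},
  preFilters: [],
  transformers: [],
  postFilters: [],
};

// Fills in default fields; anything set in `config` overrides the default.
function feedDefaults(config) {
  return { ...DEFAULTS, ...config };
}

// Returns the base config plus one copy per extension; each copy
// overrides only the fields listed in that extension (shallow merge).
function applyExtensions({ defaults, extensions = [] }) {
  return [defaults, ...extensions.map(extension => ({ ...defaults, ...extension }))];
}

module.exports = { feedDefaults, applyExtensions };
```

Applied to the example config above, this shape would yield three configs (en, de, es) that differ only in lang.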
Loaders
By default, the data for feed processing is loaded from the database through the getApartmentsStream() function from /services/apartments.service.js. It is also possible to specify a custom loader instead of the standard one; the loader must return a stream with the data to process.
For now, all loaders are stored in the folders of specific feeds.
Example loader:
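const { Readable } = require('stream');

module.exports = feed => {
  // ... your loader body here: build apartmentsList for this feed
  const stream = new Readable({ objectMode: true, read() {} });
  apartmentsList.forEach(it => stream.push(it));
  stream.push(null); // signal the end of the data
  return stream;
};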
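For large or paginated sources, a loader can produce its data lazily instead of pushing a prepared list. The sketch below assumes a hypothetical fetchPage(feed, page) function that resolves to an array of apartments (empty once the data is exhausted) and uses Node's Readable.from() with an async generator to expose all pages as one object-mode stream.

```javascript
const { Readable } = require('stream');

// Hypothetical: fetchPage(feed, page) resolves to an array of apartments,
// or an empty array once all data has been read.
function createPagedLoader(fetchPage) {
  return feed => {
    async function* pages() {
      for (let page = 0; ; page += 1) {
        const items = await fetchPage(feed, page);
        if (items.length === 0) return; // no more data: end the stream
        yield* items; // emit apartments one at a time
      }
    }
    // Readable.from() builds an object-mode stream from the async iterator.
    return Readable.from(pages());
  };
}

module.exports = { createPagedLoader };
```

Pages are requested only as the downstream consumer reads, so memory use stays bounded by one page.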
Filters
preFilters and postFilters can be used to validate each record being processed. If a filter returns false, the checked record is excluded from the current feed processing run.
Some filters are stored in the /filters folder. The rest are stored in the folders of specific feeds.
Example filter:
module.exports = feed => apartment => {
  // ... your filter body here
  return isValid(apartment); // true OR false (isValid is a placeholder)
};
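The keep-or-drop rule can be sketched as an object-mode Transform stream. createFilterStream is a hypothetical name, not the service's own pipeline code; it binds each filter factory to the feed and drops a record as soon as one filter returns a falsy value.

```javascript
const { Transform } = require('stream');

// Binds each `feed => apartment => boolean` filter to the feed, then keeps
// an apartment only if every filter passes (Array#every stops at the first failure).
function createFilterStream(feed, filters) {
  const predicates = filters.map(filter => filter(feed));
  return new Transform({
    objectMode: true,
    transform(apartment, _encoding, callback) {
      if (predicates.every(predicate => predicate(apartment))) {
        this.push(apartment);
      }
      callback();
    },
  });
}

module.exports = { createFilterStream };
```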
Transformers
Transformers are used to transform data to the required format.
Some transformers are stored in the /transformers folder. The rest are stored in the folders of specific feeds.
Example transformer:
module.exports = feed => apartment => {
  // ... your transformations here
  const newApartment = { ...apartment }; // build the transformed record
  return newApartment;
};
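Because every transformer has the shape feed => apartment => newApartment, a feed's transformers list can be applied as a left-to-right composition. This sketch assumes transformers are synchronous and run in list order; composeTransformers, toBase and withLang are hypothetical.

```javascript
// Binds every transformer to the feed, then runs them in list order,
// passing each one's output to the next.
function composeTransformers(feed, transformers) {
  const bound = transformers.map(transformer => transformer(feed));
  return apartment => bound.reduce((current, transform) => transform(current), apartment);
}

// Hypothetical transformers following the documented shape.
const toBase = feed => apartment => ({ id: apartment._id, title: apartment.title });
const withLang = feed => apartment => ({ ...apartment, lang: feed.lang });

module.exports = { composeTransformers, toBase, withLang };
```

Order matters: withLang here sees only the fields that toBase kept.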
Processors
Processors are used to convert transformed data to the needed file format (csv, xml, json, etc.).
A processor should return a list of objects, each containing:
- stream - stream with resulting data to save to S3 bucket.
- suffix - string with the file format in which the resulting file will be saved.
Some processors are stored in the /processors folder. The rest are stored in the folders of specific feeds.
Example processor:
module.exports = (feed) => (stream) => {
  const writer = createWriter(feed); // your stream writer; createWriter is a placeholder
  return [{ stream: stream.pipe(writer), suffix: `.${feed.format}` }];
};
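As a concrete instance of this contract, here is a minimal CSV processor sketch. It is not the bundled csv.processor.js: the explicit columns list is an assumption (the real processor may derive columns differently), and fields are quoted RFC 4180 style only when they contain a comma, quote or line break.

```javascript
const { Transform } = require('stream');

// Quotes a CSV field only when it contains a comma, quote or line break.
function toCsvField(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical processor factory: `columns` lists the apartment fields to export.
const createCsvProcessor = columns => feed => stream => {
  const header = columns.join(',') + '\n';
  let headerWritten = false;

  const writer = new Transform({
    writableObjectMode: true, // accepts apartment objects, emits CSV text
    transform(item, _encoding, callback) {
      if (!headerWritten) {
        this.push(header);
        headerWritten = true;
      }
      this.push(columns.map(column => toCsvField(item[column])).join(',') + '\n');
      callback();
    },
    flush(callback) {
      // Still emit a header for a feed with no items.
      if (!headerWritten) this.push(header);
      callback();
    },
  });

  return [{ stream: stream.pipe(writer), suffix: `.${feed.format}` }];
};

module.exports = { createCsvProcessor, toCsvField };
```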
Uploads
Uploaders start after saving the resulting feed file to S3 and can be used to upload the resulting data to other sources like FTP server, partner api, etc.
Some uploaders are stored in the /uploaders folder. The rest are stored in the folders of specific feeds.
Example uploader:
const { getLatestFeed } = require('../services/s3.service');

module.exports = feed => async (countOfItems) => {
  const { fileStream } = await getLatestFeed(feed.name, feed.lang, feed.format);
  // ... your uploader body here (e.g. send fileStream to an FTP server or partner API)
};
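A sketch of a complete uploader with its dependencies injected so it can run on its own. getLatestFeed stands in for the s3.service.js function with the same call shape as in the example above, send is a hypothetical function that delivers the file to the partner (FTP, HTTP API, etc.), and skipping the upload when the feed produced no items is an illustrative choice, not documented behavior.

```javascript
// Builds an uploader with the documented shape: feed => async countOfItems => ...
// getLatestFeed, send and logger are injected (hypothetical wiring).
function createUploader({ getLatestFeed, send, logger = console }) {
  return feed => async countOfItems => {
    // Illustrative guard: do not push an empty feed to the partner.
    if (!countOfItems) {
      logger.info(`Skipping upload for ${feed.name}_${feed.lang}: no items`);
      return false;
    }
    const { fileStream } = await getLatestFeed(feed.name, feed.lang, feed.format);
    const remoteName = `${feed.name}_${feed.lang}.${feed.format}`;
    await send(remoteName, fileStream);
    return true;
  };
}

module.exports = { createUploader };
```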
Get Feed Result File
We can get a feed result file using the /feeds/:feed endpoint. :feed should be in the following format: "{name_of_feed}_{lang}.{format}".
Examples: /feeds/facebook_de.csv, /feeds/idealista_es.json, etc.
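The {name_of_feed}_{lang}.{format} pattern can be parsed with a regular expression. This sketch (parseFeedFileName is a hypothetical name) assumes the language code is the part after the last underscore and the format the part after the last dot, so feed names may themselves contain underscores.

```javascript
// Splits "<name>_<lang>.<format>"; the name may contain underscores,
// while lang and format may not contain "_" or ".".
const FEED_FILE_PATTERN = /^(.+)_([^_.]+)\.([^_.]+)$/;

function parseFeedFileName(fileName) {
  const match = FEED_FILE_PATTERN.exec(fileName);
  if (!match) return null; // not a valid feed file name
  const [, name, lang, format] = match;
  return { name, lang, format };
}

module.exports = { parseFeedFileName };
```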
Jobs
jobs/feed-manager.job.js
Main feed manager job that runs processing for all feeds every hour.
Endpoints
Authorization
Most requests are authorized with Basic Auth.
managerAuth:
| | URL | Username | Password |
| ------ | ------ | ------ | ------ |
| local | http://localhost:8080 | manager | 19da<R0adm2!@ASteS4$@s |
| staging | https://staging-feed-manager-ui.homelike.xyz/ | manager | 19da<R0adm2!@ASteS4$@s |
| production | https://feeds-ui.services.thehomelike.com/ | manager | 19da<R0adm2!@ASteS4$@s |
adwordsAuth:
| | URL | Username | Password |
| ------ | ------ | ------ | ------ |
| local | http://localhost:8080 | devs@homelike.cc | Homelikedev2017 |
| staging | https://staging-feed-manager-ui.homelike.xyz/ | devs@homelike.cc | Homelikedev2017 |
| production | https://feeds-ui.services.thehomelike.com/ | devs@homelike.cc | Homelikedev2017 |
Ping
The endpoint to check service status.
Method: GET
URL: '/ping'
Jobs force start
Endpoints to force start feed processing.
Method: GET;
URLs: '/feeds/run', '/feeds/run/:feed_name'
Authentication: managerAuth
Get Feed Result File
Endpoint for getting a feed result file. :feed_name should be in the following format: "{name_of_feed}_{lang}.{format}".
Method: GET;
URLs: '/feeds/:feed_name'
Authentication: depends on feed auth configuration
Get Adwords Feed Result Files
Get links for getting Adwords Feed Result Files:
Method: GET;
URLs: '/adwords'
Authentication: adwordsAuth
Get Adwords Feed Result File by country and city:
Method: GET;
URLs: '/adwords/:country/:city.csv'
Authentication: adwordsAuth
Get Adwords Feed customizer:
Method: GET;
URLs: '/adwords/customizer.csv'
Authentication: adwordsAuth
Get Adwords Feed price-extension:
Method: GET;
URLs: '/adwords/price-extension.csv'
Authentication: adwordsAuth
Get feed config
Get feed config from the feedConfiguration collection.
Method: GET;
URLs: '/feeds/config', '/feeds/config/:feed_name'
Authentication: managerAuth
Update feed config
Update feed config in the feedConfiguration collection.
Method: PATCH;
URLs: '/feeds/config'
Authentication: managerAuth
Payload body:
{
"feed_name": "facebook_en",
"query": {
"removed": {
"$exists": false
}
}
}
Payload body fields:
- feed_name string: name of feed with language
- query object: query object that will be used to get data from the homelike database
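A sketch of calling this endpoint from Node.js 18+, where fetch() is global. buildUpdateConfigRequest and updateFeedConfig are hypothetical helpers; the base URL and managerAuth credentials are parameters, and sending the body as JSON with a Content-Type: application/json header is an assumption about what the endpoint accepts.

```javascript
// Assembles a PATCH /feeds/config request without sending it.
function buildUpdateConfigRequest(baseUrl, { username, password }, feedName, query) {
  // Basic Auth: base64("username:password") in the Authorization header.
  const token = Buffer.from(`${username}:${password}`).toString('base64');
  return {
    url: new URL('/feeds/config', baseUrl).toString(),
    init: {
      method: 'PATCH',
      headers: { Authorization: `Basic ${token}`, 'Content-Type': 'application/json' },
      body: JSON.stringify({ feed_name: feedName, query }),
    },
  };
}

// Sends the request (Node.js 18+ global fetch) and fails on non-2xx responses.
async function updateFeedConfig(...args) {
  const { url, init } = buildUpdateConfigRequest(...args);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`PATCH /feeds/config failed: ${response.status}`);
  return response;
}

module.exports = { buildUpdateConfigRequest, updateFeedConfig };
```

Keeping request assembly separate from sending makes the payload easy to inspect or log before it hits production.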
Get feed preview file
Get a feed preview result file generated with the provided query.
Method: POST;
URLs: '/feeds/config'
Authentication: managerAuth
Payload body:
{
"feed_name": "facebook_en",
"query": {
"removed": {
"$exists": false
}
},
"count": 10
}
Payload body fields:
- feed_name string: name of feed with language
- query object: query object that will be used to get data from the homelike database
- count number: number of items that will be taken for preview file
Count the number of found items
Get the number of items found by the provided query.
Method: POST;
URLs: '/feeds/config/count'
Authentication: managerAuth
Payload body:
{
"query": {
"removed": {
"$exists": false
}
}
}
Payload body fields:
- query object: query object that will be used to get data from the homelike database
Database
feedConfiguration
{
"_id": "String",
"feed_name": "String",
"query": "Object"
}
- feed_name - name of the feed with language. Should look like `{feed_name}_{lang}`, e.g. facebook_en
- query - query object that will be used to get data from the homelike database. Dates in queries are calculated relative to the current date with Moment and should be objects with the following fields:
  - method - the operation applied to the current date. Can only be "add" or "subtract"
  - period - the unit of time used for the calculation
  - count - the number of periods to add or subtract
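A dependency-free sketch of how such date objects could be resolved before the query is sent to MongoDB. The service itself uses Moment, whose add()/subtract() accept many more units; this sketch is a swap-in that supports only fixed-length periods (minutes, hours, days, weeks), and it assumes a date object is recognized by these three fields with Moment-style plural unit names. resolveQueryDates is a hypothetical name.

```javascript
// Milliseconds per supported period (months and years are not fixed-length).
const PERIOD_MS = {
  minutes: 60 * 1000,
  hours: 60 * 60 * 1000,
  days: 24 * 60 * 60 * 1000,
  weeks: 7 * 24 * 60 * 60 * 1000,
};

const isDateSpec = value =>
  value !== null && typeof value === 'object' &&
  (value.method === 'add' || value.method === 'subtract') &&
  typeof value.period === 'string' && typeof value.count === 'number';

// Recursively replaces { method, period, count } objects with Dates
// relative to `now`; all other values are copied unchanged.
function resolveQueryDates(query, now = new Date()) {
  if (Array.isArray(query)) return query.map(item => resolveQueryDates(item, now));
  if (query instanceof Date) return query;
  if (isDateSpec(query)) {
    const step = PERIOD_MS[query.period];
    if (step === undefined) throw new Error(`Unsupported period: ${query.period}`);
    const sign = query.method === 'add' ? 1 : -1;
    return new Date(now.getTime() + sign * query.count * step);
  }
  if (query !== null && typeof query === 'object') {
    return Object.fromEntries(
      Object.entries(query).map(([key, value]) => [key, resolveQueryDates(value, now)])
    );
  }
  return query;
}

module.exports = { resolveQueryDates };
```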
Master list
These are the master query properties that need to be included in all feeds. Right now we don't have a mechanism to include the master list in all feeds automatically, so whenever we create a new feed, make sure to include these queries in its query() function.
This list is incomplete.
const query = {
ownerAccountId: {
$nin: [
"dbdf2bd9a63beb465435", // blueground
]
}
}
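Until there is a mechanism for applying the master list automatically, one option is a small helper that each feed's query() passes its own query through. This is a hypothetical sketch: MASTER_QUERY mirrors the incomplete list above, and combining the two with MongoDB's $and (rather than merging keys) keeps a feed's own ownerAccountId condition from silently overwriting the master one.

```javascript
// Mirrors the (incomplete) master list above.
const MASTER_QUERY = {
  ownerAccountId: {
    $nin: [
      'dbdf2bd9a63beb465435', // blueground
    ],
  },
};

// Combines a feed's query with the master query using $and, so conditions
// on the same field from both queries still both apply.
function withMasterQuery(feedQuery = {}) {
  return { $and: [MASTER_QUERY, feedQuery] };
}

module.exports = { MASTER_QUERY, withMasterQuery };
```

In a feed config this would read query: () => withMasterQuery({ isPublished: true }).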