Global

Type Definitions

Configurations

Configurations Schema

Type:
  • Object
Properties:
Name Type Default Description
BITSKY_BASE_URL string http://localhost:9099

The BitSky Application URL

GLOBAL_ID string

The global ID of your Retailer Service. See Get a Retailer Service Global ID.

PORT number 8081

Express server port number

SERVICE_NAME string @bitskyai/retailer-sdk

Service name. This name is used in log output.

RETAILER_HOME string

Home folder of this retailer. Default is ${process.cwd()}/public.

LOG_LEVEL string info

Logging level. For the available logging levels, see Winston Logging Levels.

ERROR_LOG_FILE_NAME string error.log

Error log file name

COMBINED_LOG_FILE_NAME string combined.log

Combined log file name

DATA_FILE_NAME string data.json

Name of the file that collected data is saved to. Default is data.json.

CONNECTOR_TYPE string json

The connector defines how your data is stored. Default is json. Two connector types are currently supported: 'json' and 'mongodb'.

MONGODB_URL string mongodb://localhost:27017/retailer

MongoDB URL. Important: if MONGODB_URL is configured, MONGODB_HOST and MONGODB_NAME are ignored.

MONGODB_HOST string

MongoDB host, for example ds123456.mlab.com or 10.0.0.247. Default is undefined.

MONGODB_PORT string

MongoDB port number, for example 63410 or 27017. Default is undefined.

MONGODB_NAME string

MongoDB database name, for example retailer. Default is undefined.

MONGODB_USERNAME string

MongoDB username, for example admin. Default is undefined.

MONGODB_PASSWORD string

MongoDB password, for example 123456. Default is undefined.
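
The table lists each default separately. As a rough sketch of how they combine, the snippet below merges your overrides onto the documented defaults and checks CONNECTOR_TYPE against the two supported values. It assumes you assemble a plain configuration object yourself; CONFIG_DEFAULTS and resolveConfigs are hypothetical names, not SDK exports, and treating GLOBAL_ID as required is an assumption of this sketch.

```javascript
const path = require("path");

// Defaults copied from the Configurations table. Keys whose default is
// undefined (GLOBAL_ID, MONGODB_HOST, MONGODB_PORT, ...) are omitted.
const CONFIG_DEFAULTS = {
  BITSKY_BASE_URL: "http://localhost:9099",
  PORT: 8081,
  SERVICE_NAME: "@bitskyai/retailer-sdk",
  LOG_LEVEL: "info",
  ERROR_LOG_FILE_NAME: "error.log",
  COMBINED_LOG_FILE_NAME: "combined.log",
  DATA_FILE_NAME: "data.json",
  CONNECTOR_TYPE: "json",
  MONGODB_URL: "mongodb://localhost:27017/retailer",
};

const SUPPORTED_CONNECTORS = ["json", "mongodb"];

// Hypothetical helper (not part of the SDK): overrides win over defaults.
function resolveConfigs(overrides = {}) {
  const configs = {
    ...CONFIG_DEFAULTS,
    // Same as ${process.cwd()}/public on POSIX systems.
    RETAILER_HOME: path.join(process.cwd(), "public"),
    ...overrides,
  };

  if (!SUPPORTED_CONNECTORS.includes(configs.CONNECTOR_TYPE)) {
    throw new Error(`Unsupported CONNECTOR_TYPE: ${configs.CONNECTOR_TYPE}`);
  }
  // Assumption of this sketch: a Retailer Service cannot work without its global ID.
  if (!configs.GLOBAL_ID) {
    throw new Error("GLOBAL_ID is required");
  }
  return configs;
}

const configs = resolveConfigs({ GLOBAL_ID: "your-global-id", CONNECTOR_TYPE: "mongodb" });
```

Using object spread keeps the precedence explicit: anything you pass replaces the documented default, and everything else falls through unchanged.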

IndexOptions

Type:
  • object
Properties:
Name Type Attributes Default Description
title string <optional>
Retailer Service

Title of this retailer service

description string <optional>
A retailer server to crawl data from website

Description of this retailer service

githubURL string <optional>
https://github.com/bitskyai

Your GitHub repository URL

homeURL string <optional>
https://bitsky.ai

Your home page URL

docURL string <optional>
https://docs.bitsky.ai

Your documentation URL

copyright string <optional>
&copy; 2020 BitSky.ai

Copyright notice

items Array.<Item> <optional>

Additional links you want to render

Item

Type:
  • object
Properties:
Name Type Description
title string

Item title

url string

Item URL

description string

Item description
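
For illustration, here is an IndexOptions object whose items array holds Item entries. The property names follow the IndexOptions and Item tables; every value is a placeholder, and isValidItem is a hypothetical helper, not an SDK function. The sketch treats all three Item fields as required strings because the Item table marks none of them as optional.

```javascript
// Placeholder values; only the property names come from the tables above.
const indexOptions = {
  title: "My Retailer Service",
  description: "Collects product data from example.com",
  githubURL: "https://github.com/your-org/your-retailer",
  homeURL: "https://example.com",
  docURL: "https://example.com/docs",
  copyright: "&copy; 2020 Your Name",
  // Additional links to render, each shaped like an Item.
  items: [
    {
      title: "Status",
      url: "https://example.com/status",
      description: "Service status page",
    },
  ],
};

// Hypothetical helper: checks that an Item has title, url and description strings.
function isValidItem(item) {
  return ["title", "url", "description"].every(
    (key) => typeof item[key] === "string"
  );
}
```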

ParseFunReturn

Type:
  • object
Properties:
Name Type Attributes Description
tasks array <optional>

An array of Task objects to send to the BitSky application

data integer | string | Object | Array <optional>

Data you want to save. If data is empty, undefined, or null, nothing is saved. If data is an Object rather than an Array, it is saved by its property keys, which is useful when the parse function needs to extract multiple kinds of data. Data is saved to Configurations.DATA_PATH.

response object <optional>
Properties
Name Type Attributes Default Description
status number <optional>
200

HTTP response status code. Any value greater than 300 is considered a failure.

data integer | string | object | array <optional>

Data to send back. Use it only when returning an error; include the reason for the error to help with troubleshooting.
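
To show how the three optional fields fit together, here is a hypothetical helper that turns values your own parsing code has already extracted into a ParseFunReturn-shaped object. buildParseResult, articles, and nextPageUrl are illustrative names only; how the SDK hands the page to your parse function is not covered here. On success it returns data (an Object, so it is saved by property keys) plus an optional follow-up Task; on failure it returns only a response with a status above 300.

```javascript
// Hypothetical helper, not an SDK function.
function buildParseResult({ articles, nextPageUrl, globalId }) {
  if (!Array.isArray(articles)) {
    // Error case: status > 300 means failure; response.data explains why.
    return {
      response: {
        status: 500,
        data: { message: "articles is not an array" },
      },
    };
  }

  const result = {
    // An Object (not an Array), so it is saved by its property keys.
    data: {
      articles,
      stats: { count: articles.length },
    },
  };

  // Queue the next page as a new Task, if there is one.
  if (nextPageUrl) {
    result.tasks = [{ url: nextPageUrl, retailer: { globalId } }];
  }
  return result;
}

const ok = buildParseResult({
  articles: [{ title: "First article" }],
  nextPageUrl: "https://example.com/page/2",
  globalId: "your-global-id",
});
```

Omitting response on success relies on its documented default status of 200.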

Task

Type:
  • object
Properties:
Name Type Attributes Default Description
url string

URL of the web page to be processed

retailer object
Properties
Name Type Description
globalId string

The global ID of your Retailer Service

priority integer <optional>
100

Priority of this task. Priorities are compared only among tasks of the same Retailer Service, never across Retailer Services. A larger value means a lower priority: priority 1 is higher than priority 2.

suitableProducers array <optional>
["HEADLESSBROWSER"]

Types of producers that can execute this task

metadata object <optional>

Additional metadata for this task

Properties
Name Type Attributes Description
script string <optional>

Code to execute after the page loads. Only the HEADLESSBROWSER producer can execute code.

Your code should be an async function, like this:

async function(){
 await $$page.waitFor(5000);
}

This code makes the page wait 5 seconds.

Inside your code, four global variables are available. You CANNOT reassign them; for example, $$page = newPage causes your code to fail.

  1. $$page: Puppeteer page instance for the current page
  2. $task: Task information
  3. $$_: Lodash instance
  4. $logger: Winston logger, which you can use to write logs

Besides these four global variables, you can also use require to load Node.js built-in modules.

If your code returns a value other than undefined or null, that value is set as the dataset and sent back to your Retailer Service. If it returns undefined or null, or returns nothing at all, the whole page is sent back to your Retailer Service.

Example

In this example, the script waits 5 seconds and then sends the whole page back. This is useful for single-page applications, where you need to wait until the data finishes loading.

{
  metadata: {
    script: `
      async function(){
        await $$page.waitFor(5000);
      }
    `
  }
}

You can also define your code as a function and convert it to a string with toString().
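
The sketch below applies the toString() approach to build complete Task objects. The two scripts are ordinary function expressions that are never called in Node.js, only stringified, so the $$page references are resolved later inside the HEADLESSBROWSER producer. createTask is a hypothetical helper, and the URL and global ID are placeholders. The second script returns a value, so per the description above that value would be sent back as the dataset instead of the whole page.

```javascript
// Waits 5 seconds, then (returning nothing) lets the whole page be sent back.
// $$page.waitFor is used as in the example above.
const waitForData = async function () {
  await $$page.waitFor(5000);
};

// Returns the page title, which becomes the dataset instead of the page.
const collectTitle = async function () {
  return await $$page.title();
};

// Hypothetical helper: fills in the fields from the Task table.
// priority defaults to 100; a smaller number means a higher priority.
function createTask(url, globalId, script, priority = 100) {
  return {
    url,
    retailer: { globalId },
    priority,
    suitableProducers: ["HEADLESSBROWSER"],
    // Function.prototype.toString() yields the function's source text.
    metadata: { script: script.toString() },
  };
}

const spaTask = createTask("https://example.com/spa", "your-global-id", waitForData);
const titleTask = createTask("https://example.com/about", "your-global-id", collectTitle, 1);
```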

TriggerFunReturn

Type:
  • object
Properties:
Name Type Attributes Description
tasks array <optional>

An array of Task objects to send to the BitSky application
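
A trigger only needs to return the tasks field. As a minimal sketch, the hypothetical buildTriggerResult below turns a list of start URLs into one Task each. The priority assignment is an illustrative choice, not SDK behavior: because a smaller number means a higher priority, URLs earlier in the list are preferred.

```javascript
// Hypothetical helper, not an SDK function.
function buildTriggerResult(startUrls, globalId) {
  return {
    tasks: startUrls.map((url, index) => ({
      url,
      retailer: { globalId },
      // 1, 2, 3, ...: the first URL gets the highest priority.
      priority: index + 1,
    })),
  };
}

const triggerResult = buildTriggerResult(
  ["https://example.com/a", "https://example.com/b"],
  "your-global-id"
);
```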
