What are NetSuite Map/Reduce Scripts?
A Map/Reduce script is a server-side script type introduced in SuiteScript 2.0. It is similar to a scheduled script in how it is triggered and in its purpose: like scheduled scripts, Map/Reduce scripts can be invoked manually, at a predefined scheduled time, or from another script, and both script types are used to handle operations on large volumes of data.
Despite these similarities, Map/Reduce scripts offer several advantages over scheduled scripts. They provide dynamic governance handling, parallel processing, and automatic yielding, and they are best suited for situations where a large data set can be broken down into small, independent parts that can then be processed in parallel.
Even with automatic yielding, note that some aspects of governance cannot be handled automatically; these are categorized as hard and soft limits. Hard limits interrupt or terminate the current function's execution. For example, when the total data used by the script exceeds 50 MB, a STORAGE_SIZE_EXCEEDED error occurs and the script jumps to the summarize stage.
While other script types execute as a single continuous process, a Map/Reduce script executes over five distinct stages, four of which the developer controls. As a result, developers need to think about and implement their logic in stages. The stages of a Map/Reduce script are the following.
1. getInputData (Governance – 10,000 units)
This stage is invoked first when the Map/Reduce script is triggered, and it executes only once. Here, data is gathered and prepared for processing in the later stages. The data returned from this stage is an object which, in the subsequent stages, is converted into a list of key/value pairs. One common approach is to create or load a search and return the search object; the search then runs in the back end and each result is converted into a key/value pair. Alternatively, you can build an array of values as needed and return that instead.
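As a minimal sketch, a module skeleton with a getInputData entry point that returns a search of Pending Fulfillment sales orders could look like the following; the filters, columns, and status ID are assumptions for illustration, and the remaining entry points are sketched in the sections below:

/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search'], (search) => {

    // Runs once; the framework resolves the returned search object and
    // turns each result into a key/value pair for the map stage.
    const getInputData = () => {
        return search.create({
            type: search.Type.SALES_ORDER,
            filters: [
                ['mainline', 'is', 'T'], 'and',
                ['status', 'anyof', 'SalesOrd:B'] // Pending Fulfillment (assumed status id)
            ],
            columns: ['tranid', 'status']
        });
    };

    // map, reduce, and summarize are added in the sections that follow.
    return { getInputData };
});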
2. map (Governance – 1,000 units)
The logic in this stage is invoked once for each key/value pair returned from the previous stage. If a reduce function is also used, the data written out of the map function serves as input to the shuffle stage and then to the reduce stage.
The context passed to the map function contains the key and its value; when the input is a search, the value is the serialized (JSON) search result for that key.
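A sketch of a map entry point for this scenario (it belongs inside the same module as above; the exact shape of the serialized status value can vary by column type, so treat the parsing as an assumption):

// Invoked once per key/value pair from getInputData.
// context.value is the search result serialized as a JSON string.
const map = (context) => {
    const result = JSON.parse(context.value);
    const salesOrderId = result.id;
    const status = result.values.status.text; // e.g. 'Pending Fulfillment'

    // Write the status as the key and the internal id as the value, so the
    // shuffle stage groups the sales orders by status for the reduce stage.
    context.write({
        key: status,
        value: salesOrderId
    });
};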
3. shuffle
This is not a function that can be defined in the script. The shuffle stage sorts the key/value pairs sent from the map stage (or, if there is no map stage, from getInputData). Its main purpose is to group the key/value pairs by key, so that each unique key maps to an array of values. For example, the map stage might write the sales order status as the key and the internal ID of the sales order record as the value.
As a result, the shuffle stage groups all the sales order internal IDs by sales order status and provides the grouped result as input to the reduce stage, as illustrated below.
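Conceptually, the shuffle output handed to the reduce stage would look something like this (the statuses and internal IDs here are made up for illustration):

// Each unique key written by the map stage now points to an array of values.
// In the reduce stage these arrive as context.key and context.values.
const shuffledOutput = {
    'Pending Fulfillment': ['1012', '1015', '1021'],
    'Partially Fulfilled': ['1008', '1019']
};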
4. reduce (Governance – 5,000 units)
Similar to the map stage, this function is invoked once per key; unlike map, it receives a unique key together with the array of all values grouped under that key, and it can write data to the summarize stage.
The input from the shuffle stage arrives grouped by key; continuing the example above, all the sales orders would arrive grouped under their two statuses (see the reduce sketch below).
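A sketch of a reduce entry point consuming this grouped input (again part of the same module structure; the processing inside the loop is left as a placeholder):

// Invoked once per unique key; context.values holds the array of values
// that the shuffle stage grouped under that key.
const reduce = (context) => {
    const status = context.key;
    const salesOrderIds = context.values; // e.g. ['1012', '1015', '1021']

    salesOrderIds.forEach((soId) => {
        // process each sales order id here
    });

    // Pass a result on to the summarize stage.
    context.write({
        key: status,
        value: String(salesOrderIds.length)
    });
};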
5. summarize (Governance – 10,000 units)
This function can retrieve and log statistics about the script’s work. It can also take actions with data sent by the reduce stage.
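A sketch of a summarize entry point that logs these statistics and reads the data written by reduce (summary.usage, summary.seconds, summary.yields, and summary.output are standard properties of the summary object):

// Runs once after all map/reduce jobs complete.
const summarize = (summary) => {
    log.audit('Usage units consumed', summary.usage);
    log.audit('Total time (seconds)', summary.seconds);
    log.audit('Number of yields', summary.yields);

    // Iterate over the data written by the reduce stage.
    summary.output.iterator().each((key, value) => {
        log.audit('Reduce output', key + ': ' + value);
        return true; // continue iterating
    });
};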
Parallel Processing
In NetSuite, both scheduled and Map/Reduce scripts are powered by SuiteCloud, where execution happens through jobs, but the two script types are handled differently. Only one job is created to handle a scheduled script, whereas multiple jobs are created for a Map/Reduce script: the map and reduce stages can each be split across multiple jobs that work independently and in parallel across several processors. Only the getInputData and summarize stages are handled by a single job and executed once, which is why they are called serial stages.
Sample Script
The sample script starts its execution by invoking the getInputData() function. In this function, a search is created to get all Pending Fulfillment sales orders. The created search object is returned and serves as input to the map function.
The map function is invoked once for every sales order record returned from getInputData(), creates an item fulfillment for that sales order, and writes the corresponding sales order to the next stage (reduce) as a key/value pair.
The reduce function receives the fulfilled sales order as input from the map stage and creates an invoice for the sales order.
Once the script has passed through all these stages, the summarize stage is triggered, which logs some analytics such as governance units used, time taken, and the total number of yields.
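Putting the stages together, a sketch of such a script might look like the following. The search filters, the status ID, and the bare record.transform calls (with no line-level changes) are simplifying assumptions; a production script would add error handling and set whatever fields the account requires:

/**
 * @NApiVersion 2.1
 * @NScriptType MapReduceScript
 */
define(['N/search', 'N/record'], (search, record) => {

    // Gather all Pending Fulfillment sales orders (status id assumed).
    const getInputData = () => {
        return search.create({
            type: search.Type.SALES_ORDER,
            filters: [
                ['mainline', 'is', 'T'], 'and',
                ['status', 'anyof', 'SalesOrd:B'] // Pending Fulfillment
            ],
            columns: ['tranid']
        });
    };

    // Create an item fulfillment for each sales order, then hand the
    // sales order id on to the reduce stage.
    const map = (context) => {
        const result = JSON.parse(context.value);
        const salesOrderId = result.id;

        const fulfillment = record.transform({
            fromType: record.Type.SALES_ORDER,
            fromId: Number(salesOrderId),
            toType: record.Type.ITEM_FULFILLMENT,
            isDynamic: true
        });
        // Additional fields or lines may need to be set depending on the account.
        const fulfillmentId = fulfillment.save();

        context.write({
            key: salesOrderId,
            value: String(fulfillmentId)
        });
    };

    // Create an invoice for each fulfilled sales order.
    const reduce = (context) => {
        const salesOrderId = context.key;

        const invoice = record.transform({
            fromType: record.Type.SALES_ORDER,
            fromId: Number(salesOrderId),
            toType: record.Type.INVOICE,
            isDynamic: true
        });
        const invoiceId = invoice.save();

        context.write({
            key: salesOrderId,
            value: String(invoiceId)
        });
    };

    // Log overall statistics once all jobs have finished.
    const summarize = (summary) => {
        log.audit('Usage units consumed', summary.usage);
        log.audit('Total time (seconds)', summary.seconds);
        log.audit('Number of yields', summary.yields);
    };

    return { getInputData, map, reduce, summarize };
});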