Consumption Analytics Documentation

Step types

This topic describes the flow steps you can use in workbook flows. For more information about the fields used when defining a step, click the question marks in the Edit Step dialog box in Cloud Cruiser.

Aggregate Rows

Consolidates usage data for faster processing and more compact data storage by merging rows that fall within the same time interval and have matching dimensions, hereafter referred to as matching rows.

When a measure exists in two or more matching rows, a calculation is performed to set that measure's value in the aggregated row. By default the SUM calculation is used, but you can optionally use another. By default time frames are matched on identical end dates, but you can optionally choose another method. Input rows are sorted before being processed, and you can specify the dimensions on which to sort.

With advanced options, you can:

  • Specify dimensions that are not to be used for matching rows.
  • Use different calculations for certain measures.
  • Cause the processor to fail when it encounters a row with a usage interval longer than the selected Interval, and set a tolerance for this behavior.
  • Expand the usage interval to match the selected Interval.
  • Create a measure that counts the number of source rows for every aggregated row.
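
The default matching and SUM behavior can be sketched in Python. This is an illustrative sketch only: the row layout, the dimension and measure names, and the SUM-only calculation are assumptions, not the step's actual implementation, which is configured in the Cloud Cruiser UI.

```python
from collections import defaultdict

def aggregate_rows(rows, dimensions, measures):
    """Merge rows whose dimension values and interval end match,
    applying the default SUM calculation to each measure."""
    groups = defaultdict(list)
    for row in rows:
        # Default matching: identical dimension values and identical end date
        key = tuple(row[d] for d in dimensions) + (row["end"],)
        groups[key].append(row)
    aggregated = []
    for matching in groups.values():
        merged = dict(matching[0])
        for m in measures:
            merged[m] = sum(r[m] for r in matching)
        aggregated.append(merged)
    return aggregated
```

Two rows with the same dimension values and end date collapse into one row whose measures are summed; rows that differ in any dimension pass through unchanged.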
Clean Up Directory

Deletes files from the specified directory on the Cloud Cruiser server that are more than a specified number of days older than the select date, based on a date in their file names. This step deletes only files whose names include a date in the format YYYYMMDD, such as Sheet1_20140901_Sheet1_AggregationRows_out.ccr.

With advanced options you can clean up subdirectories and calculate file age from a different starting date.
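
The age check can be sketched in Python, assuming (as described above) that the file's age is taken from a YYYYMMDD date embedded in its name rather than from filesystem timestamps. The function name and parameters are illustrative, not the step's real interface.

```python
import os
import re
from datetime import datetime, timedelta

DATE_RE = re.compile(r"(\d{8})")  # YYYYMMDD embedded in the file name

def clean_up_directory(directory, select_date, max_age_days):
    """Delete files whose embedded YYYYMMDD date is more than
    max_age_days days before select_date; skip files with no date."""
    cutoff = select_date - timedelta(days=max_age_days)
    for name in os.listdir(directory):
        match = DATE_RE.search(name)
        if not match:
            continue  # only files with a date in the name are considered
        try:
            file_date = datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that are not a valid date
        if file_date < cutoff:
            os.remove(os.path.join(directory, name))
```

Files without a parsable date are left untouched, matching the documented behavior of only deleting files with a date in the name.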

Correlate Events

Treating input rows as create, modify, or delete events, this step writes output rows reflecting allocations for the select date and maintains a Next Day Start Feed of rows for existing allocations to correlate with the next day's collection. You must explicitly import this feed into the flow with an Import Collections step placed before this step, so that those rows can be included in the next day's run.

  • An input row with only a start time is treated as a create event. If a corresponding delete event is found, the two are merged into a single output row with start and end times. Otherwise, the output row is given an end time of 23:59:59 to indicate usage through the end of the day and a row with a start time of 00:00:00 is written to the next day start feed so that the allocations roll forward to the next day.
  • An input row with only a start time that matches a row with an earlier start time is treated as a modify event. Two output rows are created: one with the previous allocations starting at 00:00:00 and ending at the time of the input row, and one with the new allocations starting at that time and ending at 23:59:59. A row with the new allocations is written to the next day start feed.
  • An input row with only an end time is treated as a delete event. If no other input row is a matching create or modify event, the system finds a matching row in the next day start feed and merges the two into a single output row with start and end times.

Two rows match if their values match for every dimension in the Key Dimensions list. The measures in matching rows are all considered to belong to the same object, such as a VM.

Optionally, on a delete event this step can also delete allocations for contained objects that are dependent on the deleted object. For example, when a VM is deleted you can automatically delete its snapshots, even though no delete event for the snapshots was collected. To do this, specify the Contained Dimensions whose values match between the parent and child objects.

To safeguard against missed events, use the Control Feeds advanced option. A control feed is an inventory snapshot that shows which objects exist at a given point in time. You can tell the processor to trust the control feed as to whether an object exists even though you haven't collected a start or end event for it.

When your input data includes audit events that represent an inventory snapshot for a specified period, use the Start Condition advanced option to assign the correct start times to those inventory objects.

This step sorts input rows before processing.
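
The create/delete pairing and the next-day roll-forward can be sketched in Python. This is a deliberately simplified sketch: modify events, contained dimensions, control feeds, and merging against the previous day's feed are omitted, and the row layout and function signature are assumptions.

```python
def correlate_events(rows, key_dimensions):
    """Pair create events (start only) with delete events (end only)
    on matching key-dimension values; unmatched creates run to the end
    of the day and roll forward via the next-day start feed."""
    key = lambda r: tuple(r[d] for d in key_dimensions)
    creates = {key(r): r for r in rows if r.get("start") and not r.get("end")}
    output, next_day_feed = [], []
    for r in rows:
        if r.get("end") and not r.get("start"):       # delete event
            match = creates.pop(key(r), None)
            if match:
                # Merge create and delete into one row with both times
                output.append({**match, "end": r["end"]})
    for r in creates.values():                        # unmatched creates
        output.append({**r, "end": "23:59:59"})
        next_day_feed.append({**r, "start": "00:00:00"})
    return output, next_day_feed
```

An object created and deleted on the same day yields one output row spanning both times; an object that is still alive at day's end yields a row ending at 23:59:59 plus a 00:00:00 row in the next-day feed.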

Create Lookup File

Creates a lookup file from the values of specified dimensions. Each row in the worksheet becomes a row in the lookup file.

The step ignores duplicate source values and overlapping source ranges. When the lookup file is later consumed, the first match is used. If you expect your data to have duplicate or overlapping sources, consider sorting it before this step so that the rows in the lookup file are in the order you want.
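
The first-match-wins behavior for duplicate sources can be sketched in Python; the function name and the single source/target dimension pair are illustrative assumptions.

```python
def create_lookup_file(rows, source_dim, target_dim):
    """Build a lookup table from worksheet rows; the first occurrence
    of a source value wins and later duplicates are ignored."""
    lookup = {}
    for row in rows:
        # setdefault keeps the first value seen for each source
        lookup.setdefault(row[source_dim], row[target_dim])
    return lookup
```

Because the first occurrence wins, sorting the worksheet before this step controls which of several duplicate source values ends up in the lookup file.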

Delete Files

Deletes the specified files from the Cloud Cruiser server.

Delete Loads

Deletes published loads and their associated charges from the Cloud Cruiser database for the specified workbooks, as listed in the Processes field.
Import Collections

Imports data for the select date from the specified collections into the worksheet so that you can transform and publish it. To import all collections in the workbook, do not specify a collection.

For collections that produce more than one feed (sifting collections), enter the desired feed names in the Collections list.

This step deletes any existing rows in the current flow, so after processing it contains only the imported rows. If you need to merge the imported collections with other rows, use a Divert Row to Dataset processor and an Import Datasets step to do the merge in a new worksheet.

Import Datasets

Imports rows into the current flow from other flows and auxiliary datasets. This allows you to unify datasets that require the same processing so that you can develop the steps for that processing once.

This step deletes any existing rows in the current flow, so after processing it contains only the imported rows. If you need to merge the imported datasets with other rows, create a new worksheet and import this flow along with the other rows you want to merge with it.

Join Dataset

Joins data from the selected dataset to the current worksheet. If any values of the Key Dimension match in the current worksheet and specified dataset, those rows are joined. If you specify more than one Key Dimension, only rows that have matching values for all Key Dimensions are joined.

With advanced options, you can specify a subset of dimensions and measures to be joined. If you do not specify a dimension, all dimensions eligible to be joined are joined. If you do not specify a measure, all measures eligible to be joined are joined. You can also specify whether data in the current worksheet is overwritten (true) or retained (false) when the join occurs.
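
The all-keys-must-match join and the overwrite/retain option can be sketched in Python; the in-memory dict rows and the function signature are illustrative assumptions, not the step's real interface.

```python
def join_dataset(worksheet, dataset, key_dimensions, overwrite=True):
    """Join dataset rows onto worksheet rows when all key-dimension
    values match. overwrite=True replaces existing worksheet values
    with joined values; overwrite=False retains them."""
    key = lambda r: tuple(r[d] for d in key_dimensions)
    index = {key(r): r for r in dataset}
    for row in worksheet:
        match = index.get(key(row))
        if match is None:
            continue  # no dataset row matches all key dimensions
        for field, value in match.items():
            if overwrite or field not in row:
                row[field] = value
    return worksheet
```

Rows with no match in the dataset pass through unchanged; with overwrite=False, the join only adds fields the worksheet row does not already have.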

Run Script

Runs an external command on the Cloud Cruiser server as a separate process and waits for completion while logging output. This enables you to integrate with external tools and to create custom steps using other programming or scripting languages.

Standard output from the executed process is logged at the INFO level while error output is logged at the WARN level. Any non-zero exit code is considered a failure.

With advanced options you can specify a timeout and the environment variables for the process. You can also specify whether to run the command as part of worksheet simulation.

This step can add parameters to your command to control input, output, and simulation mode. For information, see the help for the Integrate Into Flow and Run in Simulation fields.
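
The documented behavior of this step, stdout logged at INFO, stderr at WARN, and any non-zero exit code treated as a failure, can be sketched in Python; the function name and failure exception are illustrative assumptions.

```python
import logging
import subprocess

log = logging.getLogger("run_script")

def run_script(command, timeout=None, env=None):
    """Run an external command, log stdout at INFO and stderr at WARN,
    and treat any non-zero exit code as a failure."""
    result = subprocess.run(command, capture_output=True, text=True,
                            timeout=timeout, env=env)
    for line in result.stdout.splitlines():
        log.info(line)
    for line in result.stderr.splitlines():
        log.warning(line)
    if result.returncode != 0:
        raise RuntimeError(f"command failed with exit code {result.returncode}")
    return result.returncode
```

The timeout and env parameters correspond to the advanced options described above: a maximum run time and explicit environment variables for the child process.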

Sort Rows

Sorts all rows in the flow by the dimensions you specify.

When the values of these dimensions match, the usage start time is used as the tiebreaker; you can optionally use the usage end time instead. You can also specify the size of sorting chunks to tune performance.
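
The dimension sort with a start-time tiebreaker can be sketched in Python; the row layout and field names are illustrative assumptions (chunked sorting for performance is omitted).

```python
def sort_rows(rows, dimensions, tiebreaker="start"):
    """Sort rows by the given dimensions, breaking ties on the usage
    start time (or end time, if selected)."""
    return sorted(rows,
                  key=lambda r: tuple(r[d] for d in dimensions) + (r[tiebreaker],))
```

Passing tiebreaker="end" switches the tiebreaker to the usage end time, mirroring the option described above.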


 (c) Copyright 2017-2020 Hewlett Packard Enterprise Development LP