How to do multivariate time series forecasting in BigQuery ML

Companies across industries rely heavily on time series forecasting to project product demand, forecast sales, project online subscriptions and cancellations, and for many other use cases. This makes time series forecasting one of the most popular models in BigQuery ML.

What is multivariate time series forecasting? If you want to forecast ice cream sales, for example, it is helpful to forecast using the external covariate "weather" along with the target metric "past sales." Multivariate time series forecasting in BigQuery lets you create more accurate forecasting models without having to move data out of BigQuery.

In time series forecasting, covariates or features besides the target time series are often used to produce better forecasts. Until now, BigQuery ML has supported only univariate time series modeling through the ARIMA_PLUS model (documentation), one of the most popular BigQuery ML models. While ARIMA_PLUS is widely used, forecasting using only the target variable is sometimes not sufficient, because some patterns inside the time series strongly depend on other features. We see strong customer demand for multivariate time series forecasting support that lets you forecast using covariates and side features.

We recently announced the public preview of multivariate time series forecasting with external regressors. We are introducing a new model type, ARIMA_PLUS_XREG, where XREG refers to external regressors or side features. You use the SELECT statement to choose the side features along with the target time series. The new model leverages the BigQuery ML linear regression model to include the side features and the BigQuery ML ARIMA_PLUS model to model the linear regression residuals.

The ARIMA_PLUS_XREG model supports the following capabilities:

- Automatic feature engineering for numerical, categorical, and array features.
- All the model capabilities of the ARIMA_PLUS model, such as detecting seasonal trends, holidays, and so on.

Headlight, an AI-powered ad agency, is using a multivariate forecasting model to determine conversion volumes for down-funnel metrics like subscriptions and cancellations based on cohort age. You can check out the customer video and demo here.

The following sections show some examples of the new ARIMA_PLUS_XREG model in BigQuery ML. We explore the bigquery-public-data.epa_historical_air_quality dataset, which has daily air quality and weather information, and use the model to forecast PM2.5 (a measure of air pollution from fine particulate matter) based on its historical data and some covariates, such as temperature and wind speed.

An example: forecast Seattle's air quality with weather information

Step 1. Create the dataset

The PM2.5, temperature, and wind speed data are in separate tables.
To simplify the queries, create a new table, bqml_test.seattle_air_quality_daily, by joining those tables. It has the following columns:

- date: the date of the observation
- pm25: the average PM2.5 value for each day
- wind_speed: the average wind speed for each day
- temperature: the highest temperature for each day

The new table has daily data from 2009-08-11 to 2022-01-31.

CREATE TABLE `bqml_test.seattle_air_quality_daily`
AS
WITH
  pm25_daily AS (
    SELECT
      avg(arithmetic_mean) AS pm25, date_local AS date
    FROM
      `bigquery-public-data.epa_historical_air_quality.pm25_nonfrm_daily_summary`
    WHERE
      city_name = 'Seattle'
      AND parameter_name = 'Acceptable PM2.5 AQI & Speciation Mass'
    GROUP BY date_local
  ),
  wind_speed_daily AS (
    SELECT
      avg(arithmetic_mean) AS wind_speed, date_local AS date
    FROM
      `bigquery-public-data.epa_historical_air_quality.wind_daily_summary`
    WHERE
      city_name = 'Seattle' AND parameter_name = 'Wind Speed - Resultant'
    GROUP BY date_local
  ),
  temperature_daily AS (
    SELECT
      avg(first_max_value) AS temperature, date_local AS date
    FROM
      `bigquery-public-data.epa_historical_air_quality.temperature_daily_summary`
    WHERE
      city_name = 'Seattle' AND parameter_name = 'Outdoor Temperature'
    GROUP BY date_local
  )
SELECT
  pm25_daily.date AS date, pm25, wind_speed, temperature
FROM pm25_daily
JOIN wind_speed_daily USING (date)
JOIN temperature_daily USING (date)

Step 2. Create the model

The CREATE MODEL query for the new multivariate model, ARIMA_PLUS_XREG, is very similar to that of the existing ARIMA_PLUS model. The major differences are the MODEL_TYPE option and the inclusion of feature columns in the SELECT statement.

CREATE OR REPLACE
  MODEL
    `bqml_test.seattle_pm25_xreg_model`
  OPTIONS (
    MODEL_TYPE = 'ARIMA_PLUS_XREG',
    time_series_timestamp_col = 'date',
    time_series_data_col = 'pm25')
AS
SELECT
  date,
  pm25,
  temperature,
  wind_speed
FROM
  `bqml_test.seattle_air_quality_daily`
WHERE
  date
    BETWEEN DATE('2012-01-01')
    AND DATE('2020-12-31')

Step 3. Forecast the future data

With the created model, you can use the ML.FORECAST function to forecast future data. Compared to the ARIMA_PLUS model, you have to specify the future covariates as an additional input.

SELECT
  *
FROM
  ML.FORECAST(
    MODEL
      `bqml_test.seattle_pm25_xreg_model`,
    STRUCT(30 AS horizon),
    (
      SELECT
        date,
        temperature,
        wind_speed
      FROM
        `bqml_test.seattle_air_quality_daily`
      WHERE
        date > DATE('2020-12-31')
    ))

Running this query returns the forecasting results for the requested horizon.

Step 4. Evaluate the model

You can use the ML.EVALUATE function to evaluate the forecasting errors.
You can set perform_aggregation to TRUE to get the aggregated error metrics, or to FALSE to see the per-timestamp errors.

SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `bqml_test.seattle_pm25_xreg_model`,
    (
      SELECT
        date,
        pm25,
        temperature,
        wind_speed
      FROM
        `bqml_test.seattle_air_quality_daily`
      WHERE
        date > DATE('2020-12-31')
    ),
    STRUCT(
      TRUE AS perform_aggregation,
      30 AS horizon))

Comparing the evaluation metrics of ARIMA_PLUS_XREG with those of the univariate ARIMA_PLUS model on the same data shows that ARIMA_PLUS_XREG performs better on all measured metrics for this specific dataset and date range.

Conclusion

In the example above, we demonstrated how to create a multivariate time series forecasting model, forecast future values with it, and evaluate the forecasted results. The ML.ARIMA_EVALUATE and ML.ARIMA_COEFFICIENTS table-valued functions are also helpful for investigating your model (a short sketch appears at the end of this post). Based on feedback from users, the model improves productivity in two ways: it shortens the time spent preprocessing data and lets users keep their data in BigQuery while doing machine learning, and it reduces the overhead for users who know SQL to do machine learning work in BigQuery. For more information about the ARIMA_PLUS_XREG model, please see the documentation.

What's Next?

In this blog post, we described the BigQuery ML multivariate time series forecasting model, which is now available in public preview, and showed a code demo that a data scientist, data engineer, or data analyst can use to enable the multivariate time series forecasting model. The following features are coming soon:

- Large-scale multivariate time series, i.e., training millions of models for millions of multivariate time series in a single CREATE MODEL statement
- Multivariate time series anomaly detection

Thanks to Xi Cheng, Honglin Zheng, Jiashang Liu, Amir Hormati, Mingge Deng, and Abhinav Khushraj from the BigQuery ML team. Also thanks to Weijie Shen from the Google Resource Efficiency Data Science team.
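As a short follow-up to the example above, here is a minimal sketch of how you might inspect the fitted model with the ML.ARIMA_EVALUATE and ML.ARIMA_COEFFICIENTS table-valued functions mentioned in the conclusion. It assumes the seattle_pm25_xreg_model created in Step 2; the exact output columns may vary between BigQuery ML releases.

-- Training statistics of the fitted model (order, AIC, seasonality, and so on).
SELECT
  *
FROM
  ML.ARIMA_EVALUATE(
    MODEL `bqml_test.seattle_pm25_xreg_model`,
    STRUCT(FALSE AS show_all_candidate_models));

-- Fitted model coefficients.
SELECT
  *
FROM
  ML.ARIMA_COEFFICIENTS(
    MODEL `bqml_test.seattle_pm25_xreg_model`);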
Source: Google Cloud Platform

Highlights from the BuildKit v0.11 Release

BuildKit v0.11 is now available, along with Buildx v0.10 and v1.5 of the Dockerfile syntax. We’ve released new features, bug fixes, performance improvements, and improved documentation for all of the Docker Build tools. Let’s dive into what’s new! We’ll cover the highlights, but you can get the whole story in the full changelogs.

In this post:

1. SLSA Provenance
2. Software Bill of Materials
3. SOURCE_DATE_EPOCH
4. OCI image layouts as named contexts
5. Cloud cache backends
6. OCI Image annotations
7. Build inspection
8. Bake features

1. SLSA Provenance

BuildKit can now create SLSA Provenance attestations to trace a build back to its source and make it easier to understand how it was created. Images built with new versions of Buildx and BuildKit include metadata like links to source code, build timestamps, and the materials used during the build. To attach the new provenance, BuildKit now defaults to creating OCI-compliant images.

Although docker buildx will add a provenance attestation to all new images by default, you can also opt into more detail. These additional details include your Dockerfile source, source maps, and the intermediate representations used by BuildKit. You can enable all of these new provenance records using the new --provenance flag in Buildx:

$ docker buildx build --provenance=true -t <myorg>/<myimage> --push .

Or manually set the provenance generation mode to either min or max (read more about the different modes):

$ docker buildx build --provenance=mode=max -t <myorg>/<myimage> --push .

You can inspect the provenance of an image using the imagetools subcommand. For example, here’s what it looks like on the moby/buildkit image itself:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ json .Provenance }}'
{
  "linux/amd64": {
    "SLSA": {
      "buildConfig": {

You can use this provenance to find key information about the build environment, such as the git repository it was built from:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ json (index .Provenance "linux/amd64").SLSA.invocation.configSource }}'
{
  "digest": {
    "sha1": "830288a71f447b46ad44ad5f7bd45148ec450d44"
  },
  "entryPoint": "Dockerfile",
  "uri": "https://github.com/moby/buildkit.git#refs/tags/v0.11.0"
}

Or even the CI job that built it in GitHub Actions:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ (index .Provenance "linux/amd64").SLSA.builder.id }}'

https://github.com/moby/buildkit/actions/runs/3878249653

Read the documentation to learn more about SLSA Provenance attestations or to explore BuildKit’s SLSA fields.

2. Software Bill of Materials

While provenance attestations help record how a build was completed, Software Bills of Materials (SBOMs) record what components are used. This is similar to tools like docker sbom, but instead of requiring you to perform your own scans, the author of the image can build the results into the image.

You can enable built-in SBOMs with the new --sbom flag in Buildx:

$ docker buildx build --sbom=true -t <myorg>/<myimage> --push .

By default, BuildKit uses docker/buildkit-syft-scanner (powered by Anchore’s Syft project) to build an SBOM from the resulting image. But any scanner that follows the BuildKit SBOM scanning protocol can be used here:

$ docker buildx build --sbom=generator=<custom-scanner> -t <myorg>/<myimage> --push .

Similar to SLSA provenance, you can use imagetools to query SBOMs attached to images. For example, if you list all of the discovered dependencies used in moby/buildkit, you get this:

$ docker buildx imagetools inspect moby/buildkit:latest --format '{{ range (index .SBOM "linux/amd64").SPDX.packages }}{{ println .name }}{{ end }}'
github.com/Azure/azure-sdk-for-go/sdk/azcore
github.com/Azure/azure-sdk-for-go/sdk/azidentity
github.com/Azure/azure-sdk-for-go/sdk/internal
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob

Read the SBOM attestations documentation to learn more.

3. SOURCE_DATE_EPOCH

Getting reproducible builds out of Dockerfiles has historically been quite tricky: a fully reproducible build requires bit-for-bit accuracy, producing exactly the same result each time. Even builds that are fully deterministic would get different timestamps between runs.

The new SOURCE_DATE_EPOCH build argument helps resolve this, following the standardized environment variable from the Reproducible Builds project. If the build argument is set or detected in the environment by Buildx, then BuildKit will set timestamps in the image config and layers to be the specified Unix timestamp. This helps you get perfect bit-for-bit reproducibility in your builds.

SOURCE_DATE_EPOCH is automatically detected by Buildx from the environment. To force all the timestamps in the image to the Unix epoch:

$ SOURCE_DATE_EPOCH=0 docker buildx build -t <myorg>/<myimage> .

Alternatively, to set it to the timestamp of the most recent commit:

$ SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct) docker buildx build -t <myorg>/<myimage> .

Read the documentation to find out more about how BuildKit handles SOURCE_DATE_EPOCH. 

4. OCI image layouts as named contexts

BuildKit has been able to export OCI image layouts for a while now. As of v0.11, BuildKit can import those results again using named contexts. This makes it easier to build contexts entirely locally — without needing to push intermediate results to a registry.

For example, suppose you want to build your own custom intermediate image based on Alpine that contains some development tools:

$ docker buildx build . -f intermediate.Dockerfile --output type=oci,dest=./intermediate,tar=false

This builds the contents of intermediate.Dockerfile and exports it into an OCI image layout into the intermediate/ directory (using the new tar=false option for OCI exports). To use this intermediate result in a Dockerfile, refer to it using any name you like in the FROM statement in your main Dockerfile:

FROM base
RUN … # use the development tools in the intermediate image

You can then connect this Dockerfile to your OCI layout using the new oci-layout:// URI scheme for the --build-context flag:

$ docker buildx build . -t <myorg>/<myimage> --build-context base=oci-layout://intermediate

Instead of resolving the base image from Docker Hub, BuildKit will read it from oci-layout://intermediate in the current directory, so you don’t need to push the intermediate image to a remote registry to be able to use it.

Refer to the documentation to find out more about using oci-layout:// with the --build-context flag.

5. Cloud cache backends

To get good build performance when building in ephemeral environments, such as CI pipelines, you need to store the cache in a remote backend. The newest release of BuildKit supports two new storage backends: Amazon S3 and Azure Blob Storage.

When you build images, you can provide the details of your S3 bucket or Azure Blob store to automatically store your build cache to pull into future builds. This build cache means that even though your CI or local runners might be destroyed and recreated, you can still access your remote cache to get quick builds when nothing has changed.

To use the new backends, you can specify them using the --cache-to and --cache-from flags:

$ docker buildx build --push -t <user>/<image> \
  --cache-to type=s3,region=<region>,bucket=<bucket>,name=<cache-image>[,parameters…] \
  --cache-from type=s3,region=<region>,bucket=<bucket>,name=<cache-image> .

$ docker buildx build --push -t <registry>/<image> \
  --cache-to type=azblob,name=<cache-image>[,parameters…] \
  --cache-from type=azblob,name=<cache-image>[,parameters…] .

You also don’t have to choose between one cache backend or the other. BuildKit v0.11 supports multiple cache exports at a time so you can use as many as you’d like.
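For example, here is a rough sketch (the backend parameters are placeholders, not values from the release notes) of exporting the same build cache to both S3 and a registry by repeating the --cache-to flag:

$ docker buildx build --push -t <registry>/<image> \
  --cache-to type=s3,region=<region>,bucket=<bucket>,name=<cache-image> \
  --cache-to type=registry,ref=<registry>/<cache-image> \
  --cache-from type=s3,region=<region>,bucket=<bucket>,name=<cache-image> .

You can likewise pass more than one --cache-from source if you want to import from several caches.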

Find more information in the Amazon S3 cache backend and the Azure Blob Storage cache backend documentation.

6. OCI Image annotations

OCI image annotations allow attaching metadata to container images at the manifest level. They’re a more generic alternative to labels, and they can be attached more easily to multi-platform images.

All BuildKit image exporters now support setting annotations on the exported images. To set the annotations of your choice, use the --output flag:

$ docker buildx build … \
  --output "type=image,name=foo,annotation.org.opencontainers.image.title=Foo"

You can set annotations at any level of the output, for example, on the image index:

$ docker buildx build … \
  --output "type=image,name=foo,annotation-index.org.opencontainers.image.title=Foo"

Or even set different annotations for each platform:

$ docker buildx build … \
  --output "type=image,name=foo,annotation[linux/amd64].org.opencontainers.image.title=Foo,annotation[linux/arm64].org.opencontainers.image.title=Bar"

You can find out more about creating OCI annotations on BuildKit images in the documentation.
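If you want to double-check the result, one approach (a sketch, assuming the annotated image was pushed to a registry as <myorg>/<myimage>) is to dump the raw manifest with imagetools and look for the annotations field:

$ docker buildx imagetools inspect <myorg>/<myimage> --raw
# The raw manifest (or image index) JSON should include an "annotations"
# object containing the org.opencontainers.image.* keys set above.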

7. Build inspection with --print

If you are starting out in a codebase that already has Dockerfiles, understanding how to use them can be tricky. Buildx supports the new --print flag to print details about a build. This flag can be used to get quick and easy information about required build arguments, secrets, and the targets that you can build.

For example, here’s how you get an outline of BuildKit’s Dockerfile:

$ BUILDX_EXPERIMENTAL=1 docker buildx build --print=outline https://github.com/moby/buildkit.git
TARGET: buildkit
DESCRIPTION: builds the buildkit container image

BUILD ARG VALUE DESCRIPTION
RUNC_VERSION v1.1.4
ALPINE_VERSION 3.17
BUILDKITD_TAGS defines additional Go build tags for compiling buildkitd
BUILDKIT_SBOM_SCAN_STAGE true

We can also list all the different targets to build:

$ BUILDX_EXPERIMENTAL=1 docker buildx build --print=targets https://github.com/moby/buildkit.git
TARGET DESCRIPTION
alpine-amd64
alpine-arm
alpine-arm64
alpine-s390x

Any frontend that implements the BuildKit subrequests interface can be used with the buildx --print flag. They can even define their own print functions, and aren’t just limited to outline or targets.

The --print feature is still experimental, so the interface may change, and we may add new functionality over time. If you have feedback, please open an issue or discussion on the docker/buildx GitHub repository; we’d love to hear your thoughts!

8. Bake features

The Bake file format for orchestrating builds has also been improved.

Bake now supports more powerful variable interpolation, allowing you to use fields from the same or other blocks. This can reduce duplication and make your bake files easier to read:

target "foo" {
dockerfile = target.foo.name + ".Dockerfile"
tags = [target.foo.name]
}

Bake also supports null values for build arguments and labels, letting them fall back to the defaults set in your Dockerfile so that your Bake definition doesn’t override them:

variable "GO_VERSION" {
default = null
}
target "default" {
args = {
GO_VERSION = GO_VERSION
}
}
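As a quick usage sketch based on the definitions above (assuming your Dockerfile declares a matching ARG GO_VERSION), you can either rely on the Dockerfile’s own default or override the value from the environment when invoking Bake:

# GO_VERSION is unset, so the null default applies and the
# Dockerfile's own ARG GO_VERSION default is used
$ docker buildx bake

# Override the build argument for this invocation only
$ GO_VERSION=1.20 docker buildx bake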

Read the Bake documentation to learn more. 

More improvements and bug fixes 

In this post, we’ve only scratched the surface of the new features in the latest release. Along with all the above features, the latest releases include quality-of-life improvements and bug fixes. Read the full changelogs to learn more:

BuildKit v0.11 changelog

Buildx v0.10 changelog

Dockerfile v1.5 changelog

We welcome bug reports and contributions, so if you find an issue in the releases, let us know by opening a GitHub issue or pull request, or get in contact in the #buildkit channel in the Docker Community Slack.
Source: https://blog.docker.com/feed/

Introducing the new Amazon EMR console

You can now use the redesigned Amazon EMR console. Amazon EMR is the industry-leading cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto. The redesigned console offers a new, simplified interface for launching and managing clusters that run big data processing workloads.
Source: aws.amazon.com