How to analyze Fastly real-time streaming logs with BigQuery

By Simon Wistow, VP of Product Strategy, Fastly

[Editor’s note: Today we hear from Fastly, whose edge cloud platform allows web applications to better serve global users with services for content delivery, streaming, security and load-balancing. In addition to improving response times for applications built on Google Cloud Platform (GCP), Fastly now supports streaming its logs to Google Cloud Storage and BigQuery, for deeper analysis. Read on to learn more about the integration and how to set it up in your environment.] 

Fastly’s collaboration with Google Cloud combines the power of GCP with the speed and flexibility of the Fastly edge cloud platform. Private interconnects with Google at 14 strategic locations across the globe give GCP and Fastly customers dramatically improved response times to Google services and storage for traffic going over these interconnects.

Today, we’ve announced our BigQuery integration: Fastly can now stream real-time logs to Google Cloud Storage and BigQuery, allowing companies to analyze unlimited amounts of edge data. If you’re a Fastly customer, you can get actionable insights into website page views per month and usage by demographic, geographic location and other dimensions. You can use this data to troubleshoot connectivity problems, pinpoint configuration areas that need performance tuning, identify the causes of service disruptions and improve your end users’ experience. You can even combine Fastly log data with other sources such as Google Analytics, Google Ads, or security and firewall logs by joining them in BigQuery tables. You can also save Fastly’s real-time logs to Cloud Storage for additional redundancy; in fact, many customers back up logs directly into Cloud Storage from Fastly.

A Fastly POP fronts a GCP-based application, and streams its logs to BigQuery

Let’s look at how to set up and start using Cloud Storage and BigQuery to analyze Fastly logs.

Fastly / BigQuery quick setup 

Before adding BigQuery as a logging endpoint for Fastly services, you need to register for a Google Cloud Platform account and create a Cloud Storage bucket. Once you’ve done that, follow these steps to integrate with Fastly.

Create a Google Cloud service account
BigQuery uses service accounts for third-party application authentication. To create a new service account, see Google’s guide on generating service account credentials. When you create the service account, set the key type to JSON.  

Obtain the private key and client email
Once you’ve created the service account, download the service account JSON file. This file contains the credentials for your BigQuery service account. Open the file and make a note of the private_key and client_email.
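If you prefer to pull those two values out of the file programmatically rather than copying them by hand, a minimal Python sketch like the one below works; the filename is hypothetical, so point it at the key file you actually downloaded.

import json

# Hypothetical filename; use the path of the key file you downloaded.
KEY_FILE = "fastly-bigquery-service-account.json"

with open(KEY_FILE) as f:
    creds = json.load(f)

# These are the two values the Fastly BigQuery endpoint form asks for.
print("client_email:", creds["client_email"])
print("private_key:", creds["private_key"])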

Enable the BigQuery API (if not already enabled)
To send your Fastly logs to BigQuery, you’ll need to enable the BigQuery API in the GCP API Manager.

Create the BigQuery dataset
After you’ve enabled the BigQuery API, follow these instructions to create a BigQuery dataset (a scripted alternative is sketched after the steps):

Log in to BigQuery.
Click the arrow next to your account name on the sidebar and select Create new dataset.

The Create Dataset window appears.
In the Dataset ID field, type a name for the dataset (e.g., fastly_bigquery), and click the OK button. 
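If you’d rather script this step, here’s a minimal sketch using the google-cloud-bigquery Python client. It assumes the library is installed (pip install google-cloud-bigquery) and that the GOOGLE_APPLICATION_CREDENTIALS environment variable points at your service account key file; the project and dataset IDs are placeholders.

from google.cloud import bigquery

# Placeholder project ID; authentication comes from GOOGLE_APPLICATION_CREDENTIALS.
client = bigquery.Client(project="my-gcp-project")

# Create the dataset that will hold the Fastly log table.
dataset = client.create_dataset("my-gcp-project.fastly_bigquery")
print("Created dataset:", dataset.dataset_id)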

Add a BigQuery table
After you’ve created the BigQuery dataset, you’ll need to add a BigQuery table. There are three ways of creating the schema for the table:

Build the schema field by field in the BigQuery web interface
Edit the schema as text in the BigQuery web interface
Use an existing table

We recommend creating a new table and building the schema in the user interface, though you can also edit a text-based representation of the schema and switch between the two views at any time. For your convenience, at the bottom of this blog post we’ve included an example of the logging format to use in the Fastly user interface and the corresponding BigQuery schema in text format; a scripted way to create the same table follows the steps below. Note: the data you send to BigQuery from Fastly must match the table’s schema, or fields may end up corrupted or silently dropped.

As per the BigQuery documentation, click the arrow next to the dataset name on the sidebar and select Create new table.

The Create Table page appears:

In the Source Data section, select Create empty table.
In the Table name field, type a name for the table (e.g., logs).
In the Schema section, use the interface to add fields and complete the schema.
Click the Create Table button.
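Alternatively, you can create the table programmatically. The sketch below uses the google-cloud-bigquery Python client to build the same schema shown at the end of this post; the project, dataset, and table names are placeholders.

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project ID

# Schema mirroring the example logging format at the end of this post.
schema = [
    bigquery.SchemaField("timestamp", "STRING"),
    bigquery.SchemaField("time_elapsed", "FLOAT"),
    bigquery.SchemaField("is_tls", "BOOLEAN"),
    bigquery.SchemaField("client_ip", "STRING"),
    bigquery.SchemaField("geo_city", "STRING"),
    bigquery.SchemaField("geo_country_code", "STRING"),
    bigquery.SchemaField("request", "STRING"),
    bigquery.SchemaField("host", "STRING"),
    bigquery.SchemaField("url", "STRING"),
    bigquery.SchemaField("request_referer", "STRING"),
    bigquery.SchemaField("request_user_agent", "STRING"),
    bigquery.SchemaField("request_accept_language", "STRING"),
    bigquery.SchemaField("request_accept_charset", "STRING"),
    bigquery.SchemaField("cache_status", "STRING"),
]

table = bigquery.Table("my-gcp-project.fastly_bigquery.logs", schema=schema)
client.create_table(table)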

Add BigQuery as a logging endpoint
Follow these instructions to add BigQuery as a logging endpoint (an API-based sketch follows the steps):

Review the information in our Setting Up Remote Log Streaming guide.
Click the BigQuery logo. The Create a BigQuery endpoint page appears:

Fill out the Create a BigQuery endpoint fields as follows:

In the Name field, supply a human-readable endpoint name.
In the Log format field, enter the data to send to BigQuery. See the example format section for details.
In the Email field, type the client_email address from the service account JSON file.
In the Secret key field, paste the private_key from the service account JSON file.
In the Project ID field, type the ID of your GCP project.
In the Dataset field, type the name of your BigQuery dataset.
In the Table field, type the name of your BigQuery table.
In the Template field, optionally type a strftime-compatible string to use as the template suffix for your table.

Click Create to create the new logging endpoint.
Click the Activate button to deploy your configuration changes. 
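You can also create the endpoint through the Fastly API rather than the web interface. The sketch below uses Python’s requests library and makes a few assumptions: the endpoint path (/service/{service_id}/version/{version_id}/logging/bigquery) and field names follow Fastly’s API conventions but should be confirmed against the Fastly API documentation, and the token, service ID, and version are placeholders. The call must target an editable (not yet activated) service version.

import json
import requests

FASTLY_TOKEN = "YOUR_FASTLY_API_TOKEN"   # placeholder API token
SERVICE_ID = "YOUR_SERVICE_ID"           # placeholder Fastly service ID
VERSION = 5                              # placeholder editable service version

# Reuse the service account key file downloaded earlier (hypothetical filename).
with open("fastly-bigquery-service-account.json") as f:
    creds = json.load(f)

# Assumed endpoint path and field names; confirm against Fastly's API docs.
url = f"https://api.fastly.com/service/{SERVICE_ID}/version/{VERSION}/logging/bigquery"
payload = {
    "name": "bigquery-logs",
    "project_id": "my-gcp-project",
    "dataset": "fastly_bigquery",
    "table": "logs",
    "user": creds["client_email"],        # the Email field in the UI
    "secret_key": creds["private_key"],   # the Secret key field in the UI
    "format": "{ ... }",  # paste the JSON log format string shown below
}

resp = requests.post(url, headers={"Fastly-Key": FASTLY_TOKEN}, data=payload)
resp.raise_for_status()
print(resp.json())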

Formatting JSON objects to send to BigQuery 
The data you send to BigQuery must be serialized as a JSON object, and every field in the JSON object must map to a field in your table’s schema. The JSON can contain nested data (e.g., the value of a key in your object can be another object). Here’s an example format string for sending data to BigQuery:
{
  "timestamp":"%{begin:%Y-%m-%dT%H:%M:%S%z}t",
  "time_elapsed":%{time.elapsed.usec}V,
  "is_tls":%{if(req.is_ssl, "true", "false")}V,
  "client_ip":"%{req.http.Fastly-Client-IP}V",
  "geo_city":"%{client.geo.city}V",
  "geo_country_code":"%{client.geo.country_code}V",
  "request":"%{req.request}V",
  "host":"%{req.http.Fastly-Orig-Host}V",
  "url":"%{cstr_escape(req.url)}V",
  "request_referer":"%{cstr_escape(req.http.Referer)}V",
  "request_user_agent":"%{cstr_escape(req.http.User-Agent)}V",
  "request_accept_language":"%{cstr_escape(req.http.Accept-Language)}V",
  "request_accept_charset":"%{cstr_escape(req.http.Accept-Charset)}V",
  "cache_status":"%{regsub(fastly_info.state, "^(HIT-(SYNTH)|(HITPASS|HIT|MISS|PASS|ERROR|PIPE)).*", "\\2\\3")}V"
}

Example BigQuery schema 
The textual BigQuery schema for the example format shown above would look something like this:

timestamp:STRING,time_elapsed:FLOAT,is_tls:BOOLEAN,client_ip:STRING,geo_city:STRING,geo_country_code:STRING,request:STRING,host:STRING,url:STRING,request_referer:STRING,request_user_agent:STRING,request_accept_language:STRING,request_accept_charset:STRING,cache_status:STRING

When creating your BigQuery table, click the "Edit as Text" link and paste this example in.
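Once logs start flowing, you can query the table directly. As a simple example, the sketch below computes the cache hit ratio by country with the google-cloud-bigquery Python client (the project, dataset, and table names match the placeholders used above); the same SQL can be pasted straight into the BigQuery web UI.

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # placeholder project ID

# Cache hit ratio by country across the logged requests.
sql = """
    SELECT
      geo_country_code,
      COUNT(*) AS requests,
      SUM(CASE WHEN cache_status = 'HIT' THEN 1 ELSE 0 END) / COUNT(*) AS hit_ratio
    FROM `my-gcp-project.fastly_bigquery.logs`
    GROUP BY geo_country_code
    ORDER BY requests DESC
    LIMIT 10
"""

for row in client.query(sql).result():
    print(row.geo_country_code, row.requests, round(row.hit_ratio, 3))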

Get started now
Congratulations! You’ve just configured Fastly to send its logs in real time to Cloud Storage and BigQuery, where you can easily analyze them to better understand how users are interacting with your applications. Please contact us with any questions. If you’re a current customer, we’d love to hear about how you’re using Fastly and GCP. And if you’re new to Fastly, you can try it out for free; simply sign up here to get going.
