General Availability: Azure Search parses JSON Blobs

Today, we are happy to announce general availability for JSON parsing with Azure Search’s Blob Storage indexer.

Azure Search has long supported indexers for a variety of data sources on Azure: DocumentDB, Azure SQL Database, Tables, and Blobs. Indexers allow Azure Search to automatically pull data (along with changes and deletions) into an Azure Search index without writing any code. The Blob indexer is particularly interesting because it can crack open and index a multitude of file types: Office documents, PDFs, HTML files, and more.

With today’s announcement, we are releasing the ability for the Blob Storage indexer to parse JSON content stored in blobs. This capability is not currently configurable in the Azure Portal. Note that support for parsing multiple documents from JSON arrays remains in preview.

Indexing JSON objects

With JSON parsing enabled, the Blob Storage indexer can index the properties of a JSON object, like the example below, into separate fields in your search index.

{
"text" : "A hopefully useful article explaining how to parse JSON blobs",
"datePublished" : "2016-04-13",
"tags" : [ "search", "storage", "howto" ]
}
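
For the sample blob above, the target index needs a field for each JSON property you want to index. Here is a minimal sketch in Python of what such an index definition could look like. The key field `id` and the attribute choices (`searchable`, `filterable`) are assumptions made for illustration and are not part of this announcement; the field types are one plausible choice for this data.

```python
import json

# Hypothetical index definition for the sample blob. Field names mirror the
# JSON property names, so properties and index fields line up by name.
index_definition = {
    "name": "my-target-index",
    "fields": [
        # Assumption: a string key field; how it gets populated is not shown here.
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "text", "type": "Edm.String", "searchable": True},
        {"name": "datePublished", "type": "Edm.DateTimeOffset", "filterable": True},
        {"name": "tags", "type": "Collection(Edm.String)", "searchable": True},
    ],
}

sample_blob = json.loads("""
{
  "text" : "A hopefully useful article explaining how to parse JSON blobs",
  "datePublished" : "2016-04-13",
  "tags" : [ "search", "storage", "howto" ]
}
""")

# Check that every property in the blob has a same-named field in the index.
field_names = {field["name"] for field in index_definition["fields"]}
missing = [name for name in sample_blob if name not in field_names]

print(json.dumps(index_definition, indent=2))
print("Properties without a matching field:", missing)
```

The check at the end is only a local sanity test: it confirms that the sample blob has no property without a same-named field in this hypothetical index.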

To set up JSON parsing, create a datasource as usual:

POST https://[service name].search.windows.net/datasources?api-version=2016-09-01
Content-Type: application/json
api-key: [admin key]

{
"name" : "my-blob-datasource",
"type" : "azureblob",
"credentials" : { "connectionString" : "DefaultEndpointsProtocol=https;AccountName=;AccountKey=;" },
"container" : { "name" : "my-container", "query" : "optional, my-folder" }
}

Then, create an indexer (https://docs.microsoft.com/rest/api/searchservice/create-indexer) and set the parsingMode parameter to json.

POST https://[service name].search.windows.net/indexers?api-version=2016-09-01
Content-Type: application/json
api-key: [admin key]

{
"name" : "my-json-indexer",
"dataSourceName" : "my-blob-datasource",
"targetIndexName" : "my-target-index",
"schedule" : { "interval" : "PT2H" },
"parameters" : { "configuration" : { "parsingMode" : "json" } }
}
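
The two REST calls above can also be scripted. The following is a minimal sketch in Python that uses only the standard library and builds both requests without sending them. The service name `my-service`, the helper `build_request`, and the bracketed credential placeholders are hypothetical; substitute your own values before sending anything.

```python
import json
import urllib.request

API_VERSION = "2016-09-01"


def build_request(service_name, admin_key, collection, body):
    """Build (but do not send) a POST request against the Azure Search REST API."""
    url = (
        f"https://{service_name}.search.windows.net/"
        f"{collection}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Placeholder values: replace with your own service name, key, and storage account.
SERVICE_NAME = "my-service"
ADMIN_KEY = "[admin key]"
CONNECTION_STRING = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=[account name];AccountKey=[account key];"
)

datasource_request = build_request(SERVICE_NAME, ADMIN_KEY, "datasources", {
    "name": "my-blob-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": CONNECTION_STRING},
    "container": {"name": "my-container"},
})

indexer_request = build_request(SERVICE_NAME, ADMIN_KEY, "indexers", {
    "name": "my-json-indexer",
    "dataSourceName": "my-blob-datasource",
    "targetIndexName": "my-target-index",
    "schedule": {"interval": "PT2H"},
    # This setting is what switches the Blob Storage indexer to JSON parsing.
    "parameters": {"configuration": {"parsingMode": "json"}},
})

for request in (datasource_request, indexer_request):
    print(request.get_method(), request.full_url)

# To actually create the resources, send the datasource first, then the indexer:
# urllib.request.urlopen(datasource_request)
# urllib.request.urlopen(indexer_request)
```

Building the requests separately from sending them makes it easy to inspect the URL and payload before touching a live service.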

Azure Search only supports primitive data types, string arrays, and GeoJSON points, which means that the Blob Storage indexer cannot index arbitrary JSON. However, it is possible to select parts of the JSON object and “lift” them to top-level fields of an Azure Search document. To learn more about this, visit our documentation on field mappings. 
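
To make the idea of "lifting" concrete, here is a small local simulation in Python. It is not the service's implementation. The mapping shape (`sourceFieldName`, `targetFieldName`, and slash-separated paths such as `/article/text`) is an assumption recalled from the field mappings documentation and should be verified there; the nested blob and the helpers `resolve_path` and `lift` are hypothetical.

```python
def resolve_path(document, path):
    """Follow a slash-separated path such as "/article/text" into nested dicts."""
    value = document
    for part in path.strip("/").split("/"):
        value = value[part]
    return value


def lift(document, field_mappings):
    """Return a flat document with one top-level field per mapping."""
    return {
        mapping["targetFieldName"]: resolve_path(document, mapping["sourceFieldName"])
        for mapping in field_mappings
    }


# Hypothetical blob whose useful properties are nested one level down.
nested_blob = {
    "article": {
        "text": "A hopefully useful article explaining how to parse JSON blobs",
        "datePublished": "2016-04-13",
        "tags": ["search", "storage", "howto"],
    }
}

# Assumed mapping shape; in a real indexer definition this list would sit
# under a "fieldMappings" property.
field_mappings = [
    {"sourceFieldName": "/article/text", "targetFieldName": "text"},
    {"sourceFieldName": "/article/datePublished", "targetFieldName": "date"},
    {"sourceFieldName": "/article/tags", "targetFieldName": "tags"},
]

flat_document = lift(nested_blob, field_mappings)
print(flat_document)
```

Each mapping selects one nested value and exposes it under a top-level name, producing a flat document made up of strings and string arrays that a search index can hold.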

Learn More

Read more about Azure Search and its capabilities and visit our documentation. Please visit our pricing page to learn about the various tiers of service to fit your needs.
Source: Azure
