
Sitecore Search - Multi-Language Content Import (Part 1)

Search

Published: 2023-09-06

On a multilingual site, you can set up a source so that a single search result carries information about the different language versions of one piece of content. We will introduce this procedure in two parts.

Crawl for localized content

Sometimes the same content is published in multiple languages. For example, Sitecore.com has the following pages:

  • https://www.sitecore.com/whats-new
  • https://www.sitecore.com/ja-jp/whats-new

Both pages present the same "What's New" content, one in English and one in Japanese. When search results are returned per language, it is useful to be able to switch between languages. In this article, we will show you how to crawl content in such cases.

The procedure is available on the following page.

Determine the ID of the content

By default, each content item is assigned a random string as its ID.

searchcontentid01.png

This ID can also be set explicitly in the source settings. The value must be unique, so here we build the ID from the page URL in the JavaScript of the source's Document Extractors.

Add the following code to create an ID from the URL:

JavaScript
function extract(request, response) {
    $ = response.body;

    let url = request.url;

    // Create the id from the URL
    let id = url.replaceAll('/', '_').replaceAll(':', '_').replaceAll('.', '_');

Then, in the object that the function returns, add the following line to set the id:

JavaScript
    return [{
        'id': id,
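
To see what these two fragments produce together, here is a minimal sketch that can be run locally in Node.js, outside the crawler. The helper name `urlToId` and the stubbed `request` and `response` objects are assumptions made for illustration only; they are not part of Sitecore Search, and the other attributes your source returns are left out.

```javascript
// Hypothetical helper: applies the same replacement chain as the fragment above.
function urlToId(url) {
    return url.replaceAll('/', '_').replaceAll(':', '_').replaceAll('.', '_');
}

function extract(request, response) {
    // Kept to mirror the original extractor; not used in this reduced sketch.
    const $ = response.body;

    return [{
        'id': urlToId(request.url)
        // The other attributes defined for your source would follow here.
    }];
}

// Local check with stubbed arguments (assumption: the real crawler passes
// a request with a url property and a response with a parsed body).
const urls = [
    'https://www.sitecore.com/whats-new',
    'https://www.sitecore.com/ja-jp/whats-new'
];
const ids = urls.map((url) => extract({ url: url }, { body: null })[0].id);

console.log(ids[0]); // https___www_sitecore_com_whats-new
console.log(ids[1]); // https___www_sitecore_com_ja-jp_whats-new
```

Because the locale segment `ja-jp` is still part of the URL, the English and Japanese pages receive different ids at this stage.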

Now save the changes and run the crawl for the source again.

Check crawl results

After a short time, the source is crawled again and the index is updated.

searchcontentid02.png

This confirms that the id is now based on the URL.

Summary

In this article, we used the URL to create the id. In the next article, we will remove the locale part of the URL so that the language versions of a piece of content share a single ID, producing data that can be searched in multiple languages.
