Crowdin Context Harvester CLI (Free, Beta)
By Crowdin (Verified Author)

A CLI for the extraction of contextual information for your keys using AI

About Context Harvester CLI

What does a typical localization file look like? A list of keys and values. In the best case, a key might look like "app.settings.general.clear_cache_button_title". A long-standing problem with localization resource files is that they do not carry enough contextual information for the linguist (or AI) to understand exactly where a string is used. It's recommended, but rarely done, that each key has a comment from the product team about where it's used, or an instruction on how to translate it. Because UI texts are usually short, they may not contain enough contextual information on their own, which makes them difficult to translate. That is one of the biggest reasons for poor localization.

Crowdin has many tools to extend context. If the file format of the localization resource allows it, developers can provide context information there. Context can also be edited in Crowdin's web editor. In practice, however, providing context for many keys in a large localization project is a tedious task. It is time consuming, and it is not easy for the developer tasked with it: the work involves jumping between different parts of a large codebase, trying to understand the logic and when the text would be presented to a product consumer.

This experimental Crowdin app aims to simplify that. Our thinking is that the code itself is probably the best explanation of how the text is used. Code is much more deterministic than human language and shows exactly how any translatable text from your UI will be displayed to the consumer.

The app pulls all the keys from your Crowdin project, then goes through your project code and uses an LLM to figure out how each key is used. The summary is then saved back to Crowdin so that both human linguists and AI can translate the text with more confidence.

This application is shared as an open source npm package, so you can check its security and run it locally to extract context. When using OpenAI as your AI provider, your code is never exposed to Crowdin. The extracted context can be verified before uploading to Crowdin.

Installation

npm i -g crowdin-context-harvester

Environment Variables

It's recommended that you set the following ENV variables for authentication instead of setting them as CLI arguments.

  • CROWDIN_TOKEN - must be granted the projects and AI scopes;
  • CROWDIN_ORG - for Crowdin Enterprise only. Example value: 'acme';
  • OPENAI_KEY - when using OpenAI for AI context extraction.
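
As a minimal sketch, the variables listed above could be exported in your shell before running the CLI. The values are placeholders, and missing_harvester_env is a hypothetical helper (not part of the CLI) that only reports which of the two non-Enterprise variables are still unset:

```shell
# Placeholder values; replace them with your own credentials.
export CROWDIN_TOKEN="<your-crowdin-token>"
export CROWDIN_ORG="acme"   # Crowdin Enterprise only
export OPENAI_KEY="<your-openai-token>"

# Hypothetical helper (not part of the CLI): print the name of every
# required variable that is unset or empty, one per line.
missing_harvester_env() {
    for name in CROWDIN_TOKEN OPENAI_KEY; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}
```

Running missing_harvester_env before a harvest prints nothing when both variables are set.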

Initial Setup

To configure the CLI, run:

crowdin-context-harvester configure

This command will guide you through setting up the necessary arguments for the harvest command.

Usage

After configuration, your command might look like this:

crowdin-context-harvester harvest \
    --token="<your-crowdin-token>" \
    --org="acme" \
    --project=<project-id> \
    --ai="openai" \
    --openAiKey="<your-openai-token>" \
    --model="gpt-4o" \
    --localFiles="**/*.*" \
    --localIgnore="/**/node_modules/**;bin;" \
    --crowdinFiles="*.json" \
    --screen="keys" \
    --output="csv"

Note: The org argument is required for Crowdin Enterprise only.

When this command is executed, the CLI pulls strings from all Crowdin files that match the --crowdinFiles glob pattern, then goes through every local file that matches --localFiles. Because of --screen="keys", it checks whether keys of the Crowdin strings are present in each of those files. When they are, the matching strings and the code file are sent to the LLM with a prompt to extract contextual information: how the strings are used in the code, how they appear to the end user in the UI, and so on.
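
The screening idea can be illustrated with plain grep. This is only a sketch of the concept, not the CLI's implementation; files_mentioning_key is a hypothetical helper that lists the local files containing an exact key, which are the only files worth sending to the model for that key:

```shell
# Hypothetical helper (illustration only): list files under a directory
# that contain the given key as a fixed string, not as a regex.
files_mentioning_key() {
    key="$1"
    dir="$2"
    grep -rlF -- "$key" "$dir"
}
```

For example, files_mentioning_key "app.settings.general.clear_cache_button_title" src prints the paths of the files under src that mention that key.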

Extracted context will be saved to a CSV file. Add the --csvFile argument to change the name of the resulting CSV file.

You can now review the extracted context and save the CSV. After reviewing, you can upload newly added context to Crowdin by running:

crowdin-context-harvester upload -p <project-id>

Examples

crowdin-context-harvester harvest --project=462

Pull all strings from the Crowdin project, look through all files in the local directory and try to find context for Crowdin strings.

crowdin-context-harvester harvest --project=462 --crowdinFiles="strings.xml"

Extract context for a Crowdin file, look through all local files.

crowdin-context-harvester harvest --project=462 --crowdinFiles="strings.xml" --localFiles="src/*"

Extract context for a Crowdin file, browse files in 'src' directory.

crowdin-context-harvester harvest --project=462 --croql='not (context contains "✨ AI Context")'

Extract context for strings that do not yet have AI extracted context.

crowdin-context-harvester harvest --project=462 --croql="added between '2023-12-06 13:44:14' and '2023-12-07 13:44:14'" --output=terminal

Extract context for strings added during a specified time period and print output to the terminal.

crowdin-context-harvester upload --project=462 --csvFile "crowdin-context.csv"

Upload revised AI extracted context from CSV to Crowdin.

crowdin-context-harvester reset -p 462 --crowdinFiles="*.json"

Clean AI context for all JSON files in Crowdin. Original context remains unchanged, only the AI context is removed.

Note: When uploading AI context from CSV or writing extracted context directly after harvesting, the AI context is rewritten, but the original context isn't changed. Of course, AI context can be easily cleaned as shown above.

Options

Both --localFiles and --localIgnore accept more than one glob pattern, separated by a ;.

--localFiles="../crowdin-context-harvester/**/index.js;../crowdin-context-harvester/**/cli.js"
--localIgnore="/**/node_modules/**;src"
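
To see how such a value breaks down into individual patterns, here is a small sketch. split_patterns is a hypothetical helper, and it assumes that empty segments (such as the one after a trailing ;) are ignored, which the CLI may or may not do:

```shell
# Hypothetical helper: print each non-empty pattern of a ;-separated list
# on its own line. Assumes empty segments are simply skipped.
split_patterns() {
    printf '%s\n' "$1" | tr ';' '\n' | sed '/^$/d'
}
```

Calling split_patterns '/**/node_modules/**;bin;' prints the two patterns /**/node_modules/** and bin. Keep the value quoted so the shell does not expand the globs itself.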

If neither --crowdinFiles nor --croql is specified, the CLI will try to extract context for all strings in the Crowdin project.

If --localFiles is not specified, the CLI will read all files from the current directory recursively.

Using CroQL

CroQL is a query language that allows you to filter Crowdin resources, in this case source strings. For example, this query selects all strings that do not yet have AI-provided context:

not (context contains "✨ AI Context")

Combining this CroQL query with the --autoConfirm argument might allow you to run this CLI automatically, for example as a GitHub Action that tries to find context information for any key that does not already have it.

Note: If you set the --croql argument, the use of --crowdinFiles is not allowed.
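
A scheduled job along these lines might wrap the command as sketched below. harvest_missing_context and the RUN variable are hypothetical, --autoConfirm is assumed to be a boolean flag, and the credentials are assumed to come from the environment variables described earlier. By default the function only prints the command so it can be inspected first:

```shell
# Hypothetical wrapper for an unattended run. It prints the command
# unless RUN=1 is set, in which case it executes it.
harvest_missing_context() {
    project_id="$1"
    # --croql is used without --crowdinFiles, as the two cannot be combined.
    set -- crowdin-context-harvester harvest \
        --project="$project_id" \
        --croql='not (context contains "✨ AI Context")' \
        --autoConfirm
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}
```

For example, harvest_missing_context 462 shows the command for project 462, and RUN=1 harvest_missing_context 462 runs it.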

Custom Prompt

The CLI provides an option -cp or --promptFile to use a custom prompt. This option requires a path to a file containing the custom prompt. If you want to read the prompt from the standard input, use "-" as the path.

The custom prompt text file should contain two placeholders: %strings% and %code%. When the command runs, they are replaced with the actual strings and code content, respectively. The AI model is also given a setContext function (tool) that it should call to return the result; you may want to instruct the model to always use it.

Here is an example of a custom prompt:

Extract the context for the following strings. 
Context is useful information for linguists working on these texts or for an AI that will translate them.
If none of the strings are relevant (neither keys nor strings are found in the code), do not provide context!
Please only look for exact matches of either a string text or a key in the code, do not try to guess the context!
Any context you provide should start with 'Used as...' or 'Appears as...'.
Always call the setContext function to return the context.
        
Strings:
%strings%

Code:
%code%
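
Since both placeholders are expected in the prompt file, a quick pre-flight check can catch a missing one before any tokens are spent. This is a sketch; check_prompt_file is a hypothetical helper, not a CLI feature:

```shell
# Hypothetical pre-flight check: succeeds only if the prompt file
# contains both the %strings% and the %code% placeholder.
check_prompt_file() {
    grep -q '%strings%' "$1" && grep -q '%code%' "$1"
}
```

It could be chained with the harvest command, for example: check_prompt_file prompt.txt && crowdin-context-harvester harvest --project=462 --promptFile=prompt.txt (prompt.txt is an assumed file name).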

AI Providers

This CLI requires the OpenAI AI Provider to extract context. You can either provide an OpenAI API key or a provider ID from Crowdin.

Handling Large Projects

For large projects, use the --screen option to filter keys or texts before sending them to the AI model:

crowdin-context-harvester harvest ... arguments ... --screen="keys"

Removing AI Context

To remove previously added AI context, use the reset command:

crowdin-context-harvester reset -p <project-id>

Categories
AI
Works with
  • Crowdin Enterprise
  • crowdin.com
Details

Released on May 20, 2024

Updated on May 31, 2024

Published by Crowdin

Identifier: crowdin-context-harvester-cli