By Crowdin
A CLI for the extraction of contextual information for your keys using AI
Context Harvester CLI makes it easy to get translation context from your code. Using an agentic AI approach, it automatically analyzes your project to see how each key is used. This tool-assisted, code-aware context helps both human linguists and AI models create more accurate, natural-sounding translations.
The tool is an open-source NPM package, so you can run it locally and check its security. If you use OpenAI, your code never goes to Crowdin. You can also review the extracted context before it's uploaded, giving you full control.
npm i -g crowdin-context-harvester
It's recommended that you set the following ENV variables for authentication instead of passing them as CLI arguments.
Crowdin:
CROWDIN_PERSONAL_TOKEN - Personal access token. Should be granted for projects and AI scopes.
CROWDIN_BASE_URL - For Crowdin Enterprise only. Should follow this format: https://<org-name>.api.crowdin.com
CROWDIN_PROJECT_ID - Crowdin Project ID.
OpenAI:
OPENAI_KEY - Your OpenAI API key.
OPENAI_BASE_URL - An optional OpenAI-compatible API base URL (defaults to https://api.openai.com/v1).
Google Gemini (Vertex AI):
GOOGLE_VERTEX_PROJECT - Your project identifier from the Google Cloud Console.
GOOGLE_VERTEX_LOCATION - Your project's location (e.g., us-central1).
GOOGLE_VERTEX_CLIENT_EMAIL - The client email of your Vertex AI service user.
GOOGLE_VERTEX_PRIVATE_KEY - The private key of your Vertex AI service user.
MS Azure OpenAI:
AZURE_RESOURCE_NAME - Your MS Azure resource name.
AZURE_API_KEY - Your MS Azure API key.
AZURE_DEPLOYMENT_NAME - Your MS Azure deployment name.
Anthropic:
ANTHROPIC_API_KEY - Your Anthropic API key.
Mistral:
MISTRAL_API_KEY - Your Mistral API key.
To configure the CLI, run:
crowdin-context-harvester configure
This command will guide you through setting up the necessary arguments for the harvest command.
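Before running configure or harvest, it can help to confirm that the environment variables listed above are actually set. The following is a minimal sketch of such a preflight check; the check_env helper is hypothetical and not part of the CLI, and which variables you pass to it depends on the AI provider you use.

```shell
# check_env is a hypothetical helper, not part of crowdin-context-harvester.
# It reports every named environment variable that is unset or empty and
# returns 0 only when all of them have a value.
check_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For an OpenAI setup, one possible call is check_env CROWDIN_PERSONAL_TOKEN CROWDIN_PROJECT_ID OPENAI_KEY, run before the harvest command.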
After configuration, your command might look like this:
crowdin-context-harvester harvest \
--token="<your-crowdin-token>" \
--url="https://acme.api.crowdin.com" \
--project=<project-id> \
--ai="openai" \
--openAiKey="<your-openai-token>" \
--model="gpt-4o" \
--concurrency=10
Note: The --url argument is required for Crowdin Enterprise only.
When this command is executed, the CLI will pull strings from your Crowdin project and, for each one, the AI agent will use tools to search and analyze your local code to determine how the string is used. This extracted context will then be saved to a CSV file.
Add the --csvFile argument to change the name of the resulting CSV file.
You can now review the extracted context and save the CSV. After reviewing, you can upload newly added context to Crowdin by running:
crowdin-context-harvester upload -p <project-id>
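The harvest, review, and upload steps described above can be chained in a small wrapper. This is only a sketch under assumptions: the function names, the HARVESTER override variable, and the default file name crowdin-context.csv are illustrative choices and not part of the CLI. Only the harvest and upload commands and the --project and --csvFile arguments come from this document.

```shell
# Run the CLI, or whatever command HARVESTER names (e.g. HARVESTER=echo
# for a dry run that only prints the arguments). HARVESTER is a
# hypothetical override introduced for this sketch.
harvester() {
  ${HARVESTER:-crowdin-context-harvester} "$@"
}

# Harvest context into a CSV, pause for a manual review, and upload
# only when the reviewer types "yes".
harvest_review_upload() {
  project_id=$1
  csv_file=${2:-crowdin-context.csv}   # assumed default name for this sketch
  harvester harvest --project="$project_id" --csvFile="$csv_file" || return 1
  printf 'Review %s, then type "yes" to upload: ' "$csv_file"
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "upload skipped"
    return 0
  fi
  harvester upload --project="$project_id" --csvFile "$csv_file"
}
```

Calling harvest_review_upload 462 would then run both steps for project 462, with the upload gated on the confirmation.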
crowdin-context-harvester harvest --project=462
Pull all strings from the Crowdin project, look through all files in the local directory and try to find context for Crowdin strings.
crowdin-context-harvester harvest --project=462 --crowdinFiles="strings.xml"
Extract context for a Crowdin file, look through all local files.
crowdin-context-harvester harvest --project=462 --crowdinFiles="strings.xml" --localFiles="src/*"
Extract context for a Crowdin file, looking only at files in the 'src' directory.
crowdin-context-harvester harvest --project=462 --croql='not (context contains "✨ AI Context")'
Extract context for strings that do not yet have AI extracted context.
crowdin-context-harvester harvest --project=462 --croql="added between '2023-12-06 13:44:14' and '2023-12-07 13:44:14'" --output=terminal
Extract context for strings added during a specified time period and print output to the terminal.
crowdin-context-harvester upload --project=462 --csvFile "crowdin-context.csv"
Upload revised AI extracted context from CSV to Crowdin.
crowdin-context-harvester reset -p 462 --crowdinFiles="*.json"
Clean AI context for all JSON files in Crowdin. Original context remains unchanged, only the AI context is removed.
Note: When uploading AI context from CSV or writing extracted context directly after harvesting, the AI context is rewritten, but the original context isn't changed. Of course, AI context can be easily cleaned as shown above.
The -j, --concurrency option, with a default of 10, allows you to configure the number of strings processed simultaneously. Tune this according to your AI provider's rate limits (RPM/TPM).
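As a rough way to pick a starting value, the concurrency can be bounded from the provider's requests-per-minute limit. The sketch below is a back-of-the-envelope estimate, not something the CLI computes. The max_concurrency helper is hypothetical, and the seconds per string and requests per string are figures you would have to measure or guess for your own project.

```shell
# Hypothetical helper: rough upper bound for --concurrency.
#   $1 = provider requests-per-minute limit
#   $2 = average seconds spent on one string (assumed, measure it yourself)
#   $3 = average AI requests made for one string (assumed)
max_concurrency() {
  rpm=$1
  secs_per_string=$2
  requests_per_string=$3
  # One worker issues about 60 * requests / secs requests per minute,
  # so the worker count is bounded by rpm * secs / (60 * requests).
  result=$(( rpm * secs_per_string / (60 * requests_per_string) ))
  # Never suggest fewer than one worker.
  [ "$result" -ge 1 ] || result=1
  echo "$result"
}

max_concurrency 500 12 4   # prints 25
```

Token-per-minute limits are not modeled here, so treat the result as a ceiling and lower it if the provider starts returning rate-limit errors.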
If --croql is not specified, the CLI will try to extract context for all strings in the Crowdin project.
Note: The --crowdinFiles, --localFiles, --localIgnore, and --screen arguments have been removed in this version. The agentic AI approach with tools makes them obsolete.
If neither --crowdinFiles nor --croql is specified, the CLI will try to extract context for all strings in the Crowdin project.
If --localFiles is not specified, the CLI will read all files from the current directory recursively.
CroQL is a query language that allows you to filter Crowdin resources, in this case source strings. For example, the following query selects all strings that do not yet have AI-provided context.
not (context contains "✨ AI Context")
Combining this CroQL query with the --autoConfirm argument might allow you to run this CLI automatically, for example as a GitHub Action that tries to find context for any key that does not already have it.
Note: If you set the --croql argument, the use of --crowdinFiles is not allowed.
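For an unattended run, such as a step in a CI job, the CroQL filter and --autoConfirm can be combined in one call. The wrapper below is a sketch only. The function name and the HARVESTER override are hypothetical, and it assumes --autoConfirm is a plain switch that takes no value, which this document does not state.

```shell
# Harvest context only for strings that have no AI context yet, without
# interactive confirmation. HARVESTER is a hypothetical override so the
# call can be dry-run (e.g. HARVESTER=echo); it is not a CLI feature.
harvest_missing_context() {
  project_id=$1
  # Assumption: --autoConfirm is a switch with no value.
  ${HARVESTER:-crowdin-context-harvester} harvest \
    --project="$project_id" \
    --croql='not (context contains "✨ AI Context")' \
    --autoConfirm
}
```

In a CI job, the authentication variables would come from the job's secrets, and the step would call harvest_missing_context with the project ID.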
The CLI provides an option, -cp or --promptFile, to use a custom prompt. This option requires a path to a file containing the custom prompt. If you want to read the prompt from the standard input, use "-" as the path.
The custom prompt text file should contain two placeholders: %strings% and %code%. When the command is run, these placeholders are replaced with the actual strings and code content respectively. Along with the prompt, the AI model is given the setContext function (tool), which it should call to return the result; you may want to instruct the model to always use it.
Here is an example of a custom prompt:
Extract the context for the following strings.
Context is useful information for linguists working on these texts or for an AI that will translate them.
If none of the strings are relevant (neither keys nor strings are found in the code), do not provide context!
Please only look for exact matches of either a string text or a key in the code, do not try to guess the context!
Any context you provide should start with 'Used as...' or 'Appears as...'.
Always call the setContext function to return the context.
Strings:
%strings%
Code:
%code%
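Because the prompt depends on both placeholders, a quick check before passing a file to --promptFile can catch a missing one early. The sketch below uses a hypothetical validate_prompt helper that is not part of the CLI; it only verifies that the literal placeholder text appears in the file.

```shell
# Hypothetical helper: succeed only if the prompt file contains both
# the %strings% and %code% placeholders.
validate_prompt() {
  prompt_file=$1
  status=0
  for placeholder in '%strings%' '%code%'; do
    # -F matches the placeholder literally, -q suppresses output.
    if ! grep -qF -- "$placeholder" "$prompt_file"; then
      echo "missing placeholder: $placeholder" >&2
      status=1
    fi
  done
  return "$status"
}
```

One possible use is to run validate_prompt on the file first and pass the same path to --promptFile only if the check succeeds.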
This CLI now supports all AI providers available in Crowdin for context extraction. You can provide an API key from your preferred provider or use a provider ID from Crowdin.
For large projects, use the --screen option to filter keys or texts before sending them to the AI model:
crowdin-context-harvester harvest ... arguments ... --screen="keys"
To remove previously added AI context, use the reset command:
crowdin-context-harvester reset