Crowdin Context Harvester CLI (Beta, Free)

By Crowdin (Verified Author)

A CLI that uses AI to extract contextual information for your translation keys

About Context Harvester CLI

Context Harvester CLI makes it easy to get translation context from your code. Using an agentic AI approach, it automatically analyzes your project to see how each key is used. This tool-assisted, code-aware context helps both human linguists and AI models create more accurate, natural-sounding translations.

The tool is an open-source NPM package, so you can run it locally and check its security. If you use OpenAI, your code never goes to Crowdin. You can also review the extracted context before it's uploaded, giving you full control.

Key Features

  • Agentic Context Extraction: The tool uses an agentic AI that, with the help of specialized tools like glob and grep, performs a series of targeted steps to locate and understand how strings are used in your code. This results in precise, deep, and code-aware context.
  • Per-String Processing: The harvester processes each string independently with configurable concurrency, which can significantly speed up runs on large projects.
  • Flexible Configuration: Get started fast with a simple configure command.
  • CroQL Query Support: Filter Crowdin resources with advanced queries.
  • Custom Prompting: Tailor context extraction with custom prompts.
  • Automation or Precision: Automatically save context, or review it first for total control.

Installation

npm i -g crowdin-context-harvester

Environment Variables

We recommend setting the following environment variables for authentication instead of passing credentials as CLI arguments.

  • CROWDIN_PERSONAL_TOKEN - Your Crowdin personal access token. It must be granted the projects and AI scopes.
  • CROWDIN_BASE_URL - For Crowdin Enterprise only. Use the format https://<org-name>.api.crowdin.com.
  • CROWDIN_PROJECT_ID - Your Crowdin project ID.

OpenAI:

  • OPENAI_KEY - Your OpenAI API key.
  • OPENAI_BASE_URL - An optional OpenAI-compatible API base URL (defaults to https://api.openai.com/v1).

Google Gemini (Vertex AI):

  • GOOGLE_VERTEX_PROJECT - Your project identifier from the Google Cloud Console.
  • GOOGLE_VERTEX_LOCATION - Your project's location (e.g., us-central1).
  • GOOGLE_VERTEX_CLIENT_EMAIL - The client email of your Vertex AI service user.
  • GOOGLE_VERTEX_PRIVATE_KEY - The private key of your Vertex AI service user.

MS Azure OpenAI:

  • AZURE_RESOURCE_NAME - Your MS Azure resource name.
  • AZURE_API_KEY - Your MS Azure API key.
  • AZURE_DEPLOYMENT_NAME - Your MS Azure deployment name.

Anthropic:

  • ANTHROPIC_API_KEY - Your Anthropic API key.

Mistral:

  • MISTRAL_API_KEY - Your Mistral API key.
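
Before a run, it can help to confirm that the variables for your chosen provider are set. The following is a minimal sketch, not part of the CLI: check_harvester_env is a hypothetical helper that prints the name of each variable that is unset or empty.

```shell
# Hypothetical helper (not part of the CLI): prints the name of every
# listed variable that is unset or empty, and returns non-zero if any is.
check_harvester_env() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Example for crowdin.com with OpenAI. CROWDIN_BASE_URL is left out
# because it applies to Crowdin Enterprise only.
#   check_harvester_env CROWDIN_PERSONAL_TOKEN CROWDIN_PROJECT_ID OPENAI_KEY
```

For another provider, pass that provider's variable names from the lists above instead of OPENAI_KEY.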

Initial Setup

To configure the CLI, run:

crowdin-context-harvester configure

This command will guide you through setting up the necessary arguments for the harvest command.

Usage

After configuration, your command might look like this:

crowdin-context-harvester harvest \
    --token="<your-crowdin-token>" \
    --url="https://acme.api.crowdin.com" \
    --project=<project-id> \
    --ai="openai" \
    --openAiKey="<your-openai-token>" \
    --model="gpt-4o" \
    --concurrency=10

Note: The --url argument is required for Crowdin Enterprise only.

When this command is executed, the CLI will pull strings from your Crowdin project and, for each one, the AI agent will use tools to search and analyze your local code to determine how the string is used. This extracted context will then be saved to a CSV file.

Add the --csvFile argument to change the name of the resulting CSV file.

You can now review the extracted context and save the CSV. After reviewing, you can upload newly added context to Crowdin by running:

crowdin-context-harvester upload -p <project-id>
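
The harvest, review, and upload steps can be wrapped in a small script. This is a sketch under assumptions: credentials come from the environment variables described earlier, the same --csvFile name is passed to both commands, and the run helper with its DRY_RUN switch is hypothetical (it only prints the command so you can inspect it first).

```shell
# Hypothetical wrapper: with DRY_RUN=1 it prints the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: harvest context into a CSV file for review.
#   $1 = Crowdin project ID, $2 = CSV file name
harvest_to_csv() {
  run crowdin-context-harvester harvest --project="$1" --csvFile "$2"
}

# Step 2: after reviewing and saving the CSV, upload it to Crowdin.
#   $1 = Crowdin project ID, $2 = CSV file name
upload_from_csv() {
  run crowdin-context-harvester upload --project="$1" --csvFile "$2"
}

# Usage (project ID 462 is the one used in this document's examples):
#   harvest_to_csv 462 crowdin-context.csv
#   ...review crowdin-context.csv in an editor...
#   upload_from_csv 462 crowdin-context.csv
```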

Examples

crowdin-context-harvester harvest --project=462

Pull all strings from the Crowdin project, look through all files in the local directory and try to find context for Crowdin strings.

crowdin-context-harvester harvest --project=462 --crowdinFiles="strings.xml"

Extract context for a single Crowdin file, searching all local files.

crowdin-context-harvester harvest --project=462 --crowdinFiles="strings.xml" --localFiles="src/*"

Extract context for a single Crowdin file, searching only files in the 'src' directory.

crowdin-context-harvester harvest --project=462 --croql='not (context contains "✨ AI Context")'

Extract context only for strings that do not yet have AI-extracted context.

crowdin-context-harvester harvest --project=462 --croql="added between '2023-12-06 13:44:14' and '2023-12-07 13:44:14'" --output=terminal

Extract context for strings added during a specified time period and print output to the terminal.

crowdin-context-harvester upload --project=462 --csvFile "crowdin-context.csv"

Upload the reviewed AI-extracted context from the CSV file to Crowdin.

crowdin-context-harvester reset -p 462 --crowdinFiles="*.json"

Remove AI context from all JSON files in Crowdin. The original context remains unchanged; only the AI context is removed.

Note: When you upload AI context from a CSV file or write extracted context directly after harvesting, the AI context is overwritten, but the original context is not changed. AI context can be removed at any time with the reset command shown above.

Options

  • The -j, --concurrency option, with a default of 10, allows you to configure the number of strings processed simultaneously. Tune this according to your AI provider's rate limits (RPM/TPM).

  • If --croql is not specified, the CLI will try to extract context for all strings in the Crowdin project.
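
As a rough way to pick a --concurrency value from a requests-per-minute (RPM) limit, you can estimate how many strings may be in flight at once. The sketch below is a back-of-envelope calculation, not a rule from the CLI: estimate_concurrency is a hypothetical helper, and the requests-per-string and seconds-per-string inputs are assumptions you would need to measure for your own project. It ignores token-per-minute (TPM) limits.

```shell
# Hypothetical helper: estimate a --concurrency value that keeps the
# request rate under a provider's requests-per-minute (RPM) limit.
#   $1 = RPM limit
#   $2 = assumed number of AI requests the agent makes per string
#   $3 = assumed seconds spent processing one string
estimate_concurrency() {
  rpm=$1
  requests_per_string=$2
  seconds_per_string=$3
  # Strings in flight = (strings per minute the limit allows) * (minutes per string).
  n=$(( rpm * seconds_per_string / (requests_per_string * 60) ))
  # Always allow at least one string at a time.
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# Example: 500 RPM, about 5 requests and 30 seconds per string.
#   estimate_concurrency 500 5 30   # prints 50
#   crowdin-context-harvester harvest --project=462 --concurrency="$(estimate_concurrency 500 5 30)"
```

Treat the result as a starting point and lower it if your provider starts returning rate-limit errors.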

Note: The --crowdinFiles, --localFiles, --localIgnore, and --screen arguments have been removed in this version. The agentic AI approach with tools makes them obsolete.

If neither --crowdinFiles nor --croql is specified, the CLI will try to extract context for all strings in the Crowdin project.

If --localFiles is not specified, the CLI will read all files from the current directory recursively.

Using CroQL

CroQL is a query language for filtering Crowdin resources, in this case source strings. For example, the following query selects all strings that do not yet have AI-provided context:

not (context contains "✨ AI Context")

Combining this CroQL query with the --autoConfirm argument can let you run the CLI unattended, for example as a GitHub Action that looks for context for any key that does not already have it.
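
A script step for such an unattended run might look like the sketch below. It assumes that --autoConfirm is a boolean flag that takes no value and that credentials come from the environment variables described earlier. The names harvest_missing_context and HARVESTER_BIN are hypothetical; the override exists only so the command can be inspected without calling the real CLI.

```shell
# CroQL filter from this document: strings without AI context yet.
# Single quotes keep the inner double quotes intact for the CLI.
CROQL='not (context contains "✨ AI Context")'

# Hypothetical CI entry point.
#   $1 = Crowdin project ID
harvest_missing_context() {
  # HARVESTER_BIN is a hypothetical override, handy for dry runs and tests.
  "${HARVESTER_BIN:-crowdin-context-harvester}" harvest \
    --project="$1" \
    --croql="$CROQL" \
    --autoConfirm
}

# Usage in a CI step:
#   harvest_missing_context 462
```

Keeping the query in a single-quoted variable and expanding it inside double quotes passes the whole filter to the CLI as one argument, including the spaces and inner quotes.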

Note: If you set the --croql argument, the use of --crowdinFiles is not allowed.

Custom Prompt

The CLI provides an option -cp or --promptFile to use a custom prompt. This option requires a path to a file containing the custom prompt. If you want to read the prompt from the standard input, use "-" as the path.

The custom prompt text file should contain two placeholders: %strings% and %code%. When the command runs, these placeholders are replaced with the actual strings and code content, respectively. Along with the prompt, the AI model is given the setContext function (tool), which it should call to return the result; you may want to instruct the model to always use it.

Here is an example of a custom prompt:

Extract the context for the following strings. 
Context is useful information for linguists working on these texts or for an AI that will translate them.
If none of the strings are relevant (neither keys nor strings are found in the code), do not provide context!
Please only look for exact matches of either a string text or a key in the code, do not try to guess the context!
Any context you provide should start with 'Used as...' or 'Appears as...'.
Always call the setContext function to return the context.
        
Strings:
%strings%

Code:
%code%
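
Because both placeholders are expected in the file, a quick check before a run can catch a prompt that is missing one. This is a minimal sketch: write_example_prompt and check_prompt_placeholders are hypothetical helpers, not part of the CLI, and the file name my-prompt.txt is only an example.

```shell
# Hypothetical helper: succeeds only if the prompt file contains both
# placeholders that the CLI replaces at run time.
check_prompt_placeholders() {
  grep -qF '%strings%' "$1" && grep -qF '%code%' "$1"
}

# Write a short custom prompt to the file given as $1. The quoted 'EOF'
# stops the shell from expanding anything inside the here-document.
write_example_prompt() {
  cat > "$1" <<'EOF'
Extract the context for the following strings.
Always call the setContext function to return the context.

Strings:
%strings%

Code:
%code%
EOF
}

# Usage:
#   write_example_prompt my-prompt.txt
#   check_prompt_placeholders my-prompt.txt &&
#     crowdin-context-harvester harvest --project=462 --promptFile my-prompt.txt
#
# Or read the prompt from standard input by passing "-" as the path:
#   crowdin-context-harvester harvest --project=462 --promptFile - < my-prompt.txt
```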

AI Providers

This CLI now supports all AI providers available in Crowdin for context extraction. You can provide an API key from your preferred provider or use a provider ID from Crowdin.

Handling Large Projects

For large projects, use the --screen option to filter keys or texts before sending them to the AI model:

crowdin-context-harvester harvest ... arguments ... --screen="keys"

Removing AI Context

To remove previously added AI context, use the reset command:

crowdin-context-harvester reset

Details

  • Categories: AI
  • Works with: Crowdin Enterprise, crowdin.com
  • Released on May 20, 2024
  • Updated on Aug 22, 2025
  • Published by Crowdin
  • Identifier: crowdin-context-harvester-cli
