JSONStructuredExtraction

Extract structured JSON with LLMs

Convert unstructured text into a JSON object with predefined fields. Provide a schema name and the list of fields to extract. Compatible with OpenAI, Gemini, and Ollama.

yaml
type: "io.kestra.plugin.ai.completion.JSONStructuredExtraction"

Extract person fields (Gemini)

yaml
id: json_structured_extraction
namespace: company.ai

tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
      - country
      - email
    prompt: |
      From the text below, extract the person's name, city, country, and email.
      If a field is missing, leave it blank.

      Text:
      "Hi! I'm John Smith from Paris, France. You can reach me at john.smith@example.com."
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash

Extract order details (OpenAI)

yaml
id: json_structured_extraction_order
namespace: company.ai

tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - customer_name
      - city
      - total_amount
    prompt: |
      Extract the order_id, customer_name, city, and total_amount from the message.
      For the total amount, keep only the number without the currency symbol.
      Return only JSON with the requested keys.

      Message:
      "Order #A-1043 for Jane Doe, shipped to Berlin. Total: 249.99 EUR."
    provider:
      type: io.kestra.plugin.ai.provider.OpenAI
      apiKey: "{{ kv('OPENAI_API_KEY') }}"
      modelName: gpt-5-mini
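Extract invoice fields (Ollama)

The task also works with local models served through Ollama. The example below is a minimal sketch: the provider type io.kestra.plugin.ai.provider.Ollama and its modelName/endpoint properties are assumed from the plugin's provider naming pattern and the "Model endpoint"/"Model name" settings listed further down; verify them against the plugin reference for your version.

yaml
id: json_structured_extraction_local
namespace: company.ai

tasks:
  - id: extract_invoice
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Invoice
    jsonFields:
      - invoice_number
      - vendor
      - due_date
    prompt: |
      Extract the invoice_number, vendor, and due_date from the text below.
      If a field is missing, leave it blank.

      Text:
      "Invoice INV-2024-118 from Acme GmbH is due on 2024-07-15."
    provider:
      type: io.kestra.plugin.ai.provider.Ollama # assumed provider type name
      modelName: llama3.2 # any locally pulled model, e.g. via `ollama pull llama3.2`
      endpoint: http://localhost:11434 # default local Ollama endpoint; property name assumed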
Properties

JSON Fields
SubType: string
List of fields to extract from the text

Text prompt
The input prompt for the AI model

Language Model Provider

Schema Name
The name of the JSON schema for structured extraction

Chat configuration
Default: {}

Outputs

Extracted JSON
The structured JSON output

Finish reason
Possible values: STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, OTHER

Schema Name
The schema name used for the structured JSON extraction

Token usage
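Downstream tasks can reference these outputs with Kestra expressions. The snippet below is a minimal sketch, assuming the output keys are named extractedJson and finishReason to match the labels above; check the outputs of a test execution for the exact names in your plugin version.

yaml
  - id: log_extraction
    type: io.kestra.plugin.core.log.Log
    # extractedJson and finishReason are assumed output key names (see the Outputs list above)
    message: |
      Extracted JSON: {{ outputs.extract_person.extractedJson }}
      Finish reason: {{ outputs.extract_person.finishReason }}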

Provider connection settings

Azure OpenAI
API endpoint: The Azure OpenAI endpoint in the format https://{resource}.openai.azure.com/
Model name
API Key
Client ID
Client secret
API version
Tenant ID

Google Vertex AI
Endpoint URL
Project location
Model name
Project ID

The remaining chat model providers (such as Google Gemini, OpenAI, and Ollama) each expose a subset of the following connection settings:
API Key
Model name
API base URL
Model endpoint
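For illustration, an Azure OpenAI provider block combining the settings above could look like the following sketch. The type name io.kestra.plugin.ai.provider.AzureOpenAI and the property names are assumed from the plugin's provider naming pattern; the resource name is hypothetical.

yaml
    provider:
      type: io.kestra.plugin.ai.provider.AzureOpenAI # assumed type name
      endpoint: https://my-resource.openai.azure.com/ # hypothetical Azure OpenAI resource
      apiKey: "{{ kv('AZURE_OPENAI_API_KEY') }}"
      modelName: gpt-4o-mini # the deployed model to use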

Log LLM requests

If true, prompts and configuration sent to the LLM will be logged at INFO level.

Log LLM responses

If true, raw responses from the LLM will be logged at INFO level.

Response format

Defines the expected output format. Default is plain text. Some providers allow requesting JSON or schema-constrained outputs, but support varies and may be incompatible with tool use. When using a JSON schema, the output will be returned under the key jsonOutput.

Seed

Optional random seed for reproducibility. Provide a positive integer (e.g., 42, 1234). Using the same seed with identical settings produces repeatable outputs.

Temperature

Controls randomness in generation. Typical range is 0.0–1.0. Lower values (e.g., 0.2) make outputs more focused and deterministic, while higher values (e.g., 0.7–1.0) increase creativity and variability.

Top-K

Limits sampling to the top K most likely tokens at each step. Typical values are between 20 and 100. Smaller values reduce randomness; larger values allow more diverse outputs.

Top-P (nucleus sampling)

Selects from the smallest set of tokens whose cumulative probability is ≤ topP. Typical values are 0.8–0.95. Lower values make the output more focused, higher values increase diversity.
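These generation settings are grouped under the task's Chat configuration property (default {}). The snippet below is a minimal sketch, assuming the property is named configuration and that the keys mirror the labels above (temperature, topK, topP, seed, logRequests, logResponses); adjust the names to your plugin version.

yaml
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
    prompt: "Extract the person's name and city from: Maria Lopez, based in Lisbon."
    configuration: # assumed property name for the Chat configuration above
      temperature: 0.2 # low temperature for focused, deterministic extraction
      topP: 0.9
      seed: 42 # fixed seed for repeatable outputs
      logRequests: true # log prompts and configuration at INFO level
      logResponses: false
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash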

DeepSeek
API Key
Model name
API base URL (default: https://api.deepseek.com/v1)

JSON Schema (used when type = JSON)

Provide a JSON Schema describing the expected structure of the response. In Kestra flows, define the schema in YAML (it is still a JSON Schema object). Example (YAML):

yaml
responseFormat: 
    type: JSON
    jsonSchema: 
      type: object
      required: ["category", "priority"]
      properties: 
        category: 
          type: string
          enum: ["ACCOUNT", "BILLING", "TECHNICAL", "GENERAL"]
        priority: 
          type: string
          enum: ["LOW", "MEDIUM", "HIGH"]

Note: Provider support for strict schema enforcement varies. If unsupported, guide the model about the expected output structure via the prompt and validate downstream.

Schema description (optional)

Natural-language description of the schema to help the model produce the right fields. Example: "Classify a customer ticket into category and priority."

Response format type
Default: TEXT
Possible values: TEXT, JSON

Specifies how the LLM should return output. Allowed values:

  • TEXT (default): free-form natural language.
  • JSON: structured output validated against a JSON Schema.

Amazon Bedrock
AWS Access Key ID
Model name
AWS Secret Access Key
Amazon Bedrock Embedding Model Type (default: COHERE; possible values: COHERE, TITAN)

Additional provider settings
API Key
Model name