JSONStructuredExtraction

Extract structured JSON with LLMs

Convert unstructured text into a JSON object with predefined fields. Provide a schema name and the list of fields to extract. Compatible with OpenAI, Gemini, and Ollama.

yaml
type: "io.kestra.plugin.ai.completion.JSONStructuredExtraction"

Extract person fields (Gemini)

yaml
id: json_structured_extraction
namespace: company.ai

tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
      - country
      - email
    prompt: |
      From the text below, extract the person's name, city, country, and email.
      If a field is missing, leave it blank.

      Text:
      "Hi! I'm John Smith from Paris, France. You can reach me at john.smith@example.com."
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash

Extract order details (OpenAI)

yaml
id: json_structured_extraction_order
namespace: company.ai

tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - customer_name
      - city
      - total_amount
    prompt: |
      Extract the order_id, customer_name, city, and total_amount from the message.
      For the total amount, keep only the number without the currency symbol.
      Return only JSON with the requested keys.

      Message:
      "Order #A-1043 for Jane Doe, shipped to Berlin. Total: 249.99 EUR."
    provider:
      type: io.kestra.plugin.ai.provider.OpenAI
      apiKey: "{{ kv('OPENAI_API_KEY') }}"
      modelName: gpt-5-mini
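Extract invoice fields (Ollama)

The task also works with local models served through Ollama. The example below is a minimal sketch: the provider type io.kestra.plugin.ai.provider.Ollama and its modelName/endpoint properties are assumed from the plugin's provider naming pattern and the "Model endpoint"/"Model name" settings listed further down; verify them against the plugin reference for your version.

yaml
id: json_structured_extraction_local
namespace: company.ai

tasks:
  - id: extract_invoice
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Invoice
    jsonFields:
      - invoice_number
      - vendor
      - due_date
    prompt: |
      Extract the invoice_number, vendor, and due_date from the text below.
      If a field is missing, leave it blank.

      Text:
      "Invoice INV-2024-118 from Acme GmbH is due on 2024-07-15."
    provider:
      type: io.kestra.plugin.ai.provider.Ollama # assumed provider type name
      modelName: llama3.2 # any locally pulled model, e.g. via `ollama pull llama3.2`
      endpoint: http://localhost:11434 # default local Ollama endpoint; property name assumed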
Properties

JSON Fields
SubType: string
List of fields to extract from the text

Text prompt
The input prompt for the AI model

Language Model Provider

Schema Name
The name of the JSON schema for structured extraction

Chat configuration
Default: {}

Outputs

Extracted JSON
The structured JSON output

Finish reason
Possible values: STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, OTHER

Schema Name
The schema name used for the structured JSON extraction

Token usage
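Downstream tasks can reference these outputs with Kestra expressions. The snippet below is a minimal sketch, assuming the output keys are named extractedJson and finishReason to match the labels above; check the outputs of a test execution for the exact names in your plugin version.

yaml
  - id: log_extraction
    type: io.kestra.plugin.core.log.Log
    # extractedJson and finishReason are assumed output key names (see the Outputs list above)
    message: |
      Extracted JSON: {{ outputs.extract_person.extractedJson }}
      Finish reason: {{ outputs.extract_person.finishReason }}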

Provider connection settings

Azure OpenAI
API endpoint: The Azure OpenAI endpoint in the format https://{resource}.openai.azure.com/
Model name
API Key
Client ID
Client secret
API version
Tenant ID

Google Vertex AI
Endpoint URL
Project location
Model name
Project ID

The remaining chat model providers (such as Google Gemini, OpenAI, and Ollama) each expose a subset of the following connection settings:
API Key
Model name
API base URL
Model endpoint
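For illustration, an Azure OpenAI provider block combining the settings above could look like the following sketch. The type name io.kestra.plugin.ai.provider.AzureOpenAI and the property names are assumed from the plugin's provider naming pattern; the resource name is hypothetical.

yaml
    provider:
      type: io.kestra.plugin.ai.provider.AzureOpenAI # assumed type name
      endpoint: https://my-resource.openai.azure.com/ # hypothetical Azure OpenAI resource
      apiKey: "{{ kv('AZURE_OPENAI_API_KEY') }}"
      modelName: gpt-4o-mini # the deployed model to use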

Log LLM requests

If true, prompts and configuration sent to the LLM will be logged at INFO level.

Log LLM responses

If true, raw responses from the LLM will be logged at INFO level.

Response format

Defines the expected output format. Default is plain text. Some providers allow requesting JSON or schema-constrained outputs, but support varies and may be incompatible with tool use. When using a JSON schema, the output will be returned under the key jsonOutput.

Seed

Optional random seed for reproducibility. Provide a positive integer (e.g., 42, 1234). Using the same seed with identical settings produces repeatable outputs.

Temperature

Controls randomness in generation. Typical range is 0.0–1.0. Lower values (e.g., 0.2) make outputs more focused and deterministic, while higher values (e.g., 0.7–1.0) increase creativity and variability.

Top-K

Limits sampling to the top K most likely tokens at each step. Typical values are between 20 and 100. Smaller values reduce randomness; larger values allow more diverse outputs.

Top-P (nucleus sampling)

Selects from the smallest set of tokens whose cumulative probability is ≤ topP. Typical values are 0.8–0.95. Lower values make the output more focused, higher values increase diversity.
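These generation settings are grouped under the task's Chat configuration property (default {}). The snippet below is a minimal sketch, assuming the property is named configuration and that the keys mirror the labels above (temperature, topK, topP, seed, logRequests, logResponses); adjust the names to your plugin version.

yaml
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
    prompt: "Extract the person's name and city from: Maria Lopez, based in Lisbon."
    configuration: # assumed property name for the Chat configuration above
      temperature: 0.2 # low temperature for focused, deterministic extraction
      topP: 0.9
      seed: 42 # fixed seed for repeatable outputs
      logRequests: true # log prompts and configuration at INFO level
      logResponses: false
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash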

DeepSeek
API Key
Model name
API base URL (default: https://api.deepseek.com/v1)

JSON Schema (used when type = JSON)

Provide a JSON Schema describing the expected structure of the response. In Kestra flows, define the schema in YAML (it is still a JSON Schema object). Example (YAML):

yaml
responseFormat: 
    type: JSON
    jsonSchema: 
      type: object
      required: ["category", "priority"]
      properties: 
        category: 
          type: string
          enum: ["ACCOUNT", "BILLING", "TECHNICAL", "GENERAL"]
        priority: 
          type: string
          enum: ["LOW", "MEDIUM", "HIGH"]

Note: Provider support for strict schema enforcement varies. If unsupported, guide the model about the expected output structure via the prompt and validate downstream.

Schema description (optional)

Natural-language description of the schema to help the model produce the right fields. Example: "Classify a customer ticket into category and priority."

Response format type
Default: TEXT
Possible values: TEXT, JSON

Specifies how the LLM should return output. Allowed values:

  • TEXT (default): free-form natural language.
  • JSON: structured output validated against a JSON Schema.

Amazon Bedrock
AWS Access Key ID
Model name
AWS Secret Access Key
Amazon Bedrock Embedding Model Type (default: COHERE; possible values: COHERE, TITAN)

Additional provider settings
API Key
Model name