What is the difference between Invoke and Stream modes in Eino?

Invoke mode waits for the component to finish and returns the complete result, which suits batch jobs and other scenarios where latency isn't critical. Stream mode returns a StreamReader that the caller reads chunk by chunk. This can reduce time-to-first-token and suits real-time conversations and interactive UIs, though the actual gain depends on the provider, the transport, and any buffering in between.

What happens when a downstream node doesn't support stream input?

The selected Eino revision may adapt a stream for a downstream node that accepts only complete values, but buffering, memory, cancellation, and error behavior must be verified for the node combination.

What's the difference between OnStartWithStream and OnStart callbacks?

OnStart receives the complete input value, while OnStartWithStream receives a StreamReader of the input. When a node receives input as a stream, the system calls OnStartWithStream instead of OnStart, allowing you to observe input data chunk by chunk in your callback.

How do I add a Callback to only a specific node?

Use the DesignateNode method on the callback option to scope it. For example, compose.WithCallbacks(handler).DesignateNode("node_1") applies the handler only to the node named "node_1", without affecting other nodes in the Graph.

Does Eino's OpenTelemetry integration require additional configuration?

When supported by the selected revision, an OTel callback can connect node lifecycle events to a TracerProvider. Configure exporters, sampling, redaction, retention, and access controls; span creation is not evidence of complete tracing.

Eino Streaming and Callback System: Production Observability in Go

2026-06-03 - QubitTool Tech Team

TL;DR

LLM outputs are inherently streaming: models generate tokens one at a time rather than as a complete response. How do you efficiently propagate stream data through an AI Agent orchestration pipeline? How do you inject logging, tracing, and monitoring without coupling them to business logic? Eino addresses both with StreamReader/StreamWriter primitives and a Callback aspect system. This article dissects these two core mechanisms, from low-level primitives to production-grade observability.

Key Takeaways
Why Streaming Matters
Eino Streaming Paradigm
Component Streaming Matrix
Streaming in Orchestration
Callback Aspect System
Production Observability
Practice: Full-Chain Logging and Latency Tracing
Best Practices
FAQ
Summary
Related Resources

Key Takeaways

Invocation modes: Components expose the invocation modes supported by their contracts; do not assume every component streams
StreamReader/StreamWriter: Type-safe generic streaming primitives that form the foundation of Eino's stream architecture
Stream conversion: Concatenation, splitting, and type conversion depend on the selected revision and node contracts
Callback hooks: OnStart / OnEnd / OnStartWithStream / OnEndWithStream cover node start and completion for both complete values and streams; failures are reported through a separate OnError hook
Scope control: Three granularity levels for callback injection — global, per-type, and per-node
OTel integration: Callback-based tracing requires exporter, sampling, redaction, retention, and access-control configuration

Why Streaming Matters

In a non-streaming client, users may see no output until generation completes. Streaming changes delivery behavior, but first-token timing depends on the provider and transport:

Dimension	Non-Streaming	Streaming
Time to first token	Workload- and provider-dependent	Workload-, provider-, and transport-dependent
Memory usage	Must buffer the complete response	Can process chunk by chunk; memory stays bounded only if chunks are not accumulated
User experience	Long wait with no feedback	Real-time typewriter effect

In Agentic Workflows, expose only user-safe progress or final content. Hidden chain-of-thought, secrets, tool arguments, and untrusted intermediate data should not be shown by default.

Streaming is not necessarily essential for multi-step agents. Tool calls often require a complete and validated request; any pipelining benefit must be demonstrated with the selected provider and cancellation policy.
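To make the tool-call point concrete, here is a minimal stdlib-only sketch. It assumes tool arguments arrive as JSON string fragments over a channel. The fragment shape and the `accumulateArgs` helper are hypothetical and are not Eino APIs. The sketch buffers the fragments and parses the arguments only after the stream ends, because a partial JSON document cannot be validated or executed safely.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// accumulateArgs joins streamed tool-argument fragments (hypothetical shape) and
// parses them only after the producer closes the channel, because a partial JSON
// document can neither be validated nor executed safely.
func accumulateArgs(fragments <-chan string) (map[string]any, error) {
	var sb strings.Builder
	for f := range fragments { // ends when the producer closes the channel
		sb.WriteString(f)
	}
	var args map[string]any
	if err := json.Unmarshal([]byte(sb.String()), &args); err != nil {
		return nil, fmt.Errorf("invalid tool arguments: %w", err)
	}
	return args, nil
}

func main() {
	ch := make(chan string, 3)
	for _, f := range []string{`{"city":`, `"Par`, `is"}`} {
		ch <- f
	}
	close(ch)
	args, err := accumulateArgs(ch)
	fmt.Println(args["city"], err) // Paris <nil>
}
```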

Eino Streaming Paradigm

Invoke vs Stream

Components such as ChatModel expose two invocation styles. Generate (Invoke mode) follows the request-response pattern familiar from traditional RPC: you call a method and block until the complete result is ready. Stream returns a StreamReader immediately and delivers results incrementally. Not every component implements both. Which modes a component supports, and how the orchestration layer converts between them, depends on the component contract and the selected revision:

// Invoke: block until the complete message is ready
result, err := chatModel.Generate(ctx, messages)
if err != nil {
    return err
}
fmt.Println(result.Content)

// Stream: returns a StreamReader for chunk-by-chunk reading
streamReader, err := chatModel.Stream(ctx, messages)
if err != nil {
    return err
}
defer streamReader.Close() // release the stream even if the loop exits early
for {
    chunk, err := streamReader.Recv()
    if err == io.EOF {
        break // normal end of stream
    }
    if err != nil {
        return err // transport or provider error mid-stream
    }
    fmt.Print(chunk.Content)
}

StreamReader and StreamWriter

StreamReader[T] and StreamWriter[T] are the core primitives of Eino's streaming system. They leverage Go generics for type-safe stream communication:

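// Create a stream pair; bufferSize is the buffer capacity
reader, writer := schema.Pipe[*schema.Message](bufferSize)

// Writer side (typically inside the component)
go func() {
    defer writer.Close() // signals completion: Recv then returns io.EOF
    for _, chunk := range chunks {
        // Send reports closed=true once the reader has called Close
        if closed := writer.Send(chunk, nil); closed {
            return
        }
    }
}()

// Reader side (consumed by downstream)
defer reader.Close()
for {
    msg, err := reader.Recv()
    if err == io.EOF {
        break
    }
    if err != nil {
        return err // errors sent by the writer surface here
    }
    process(msg) // process is a placeholder for downstream handling
}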

Key design decisions:

Type safety: Generic parameter [T] ensures compile-time type checking
Backpressure: bufferSize controls buffer capacity, preventing producer from running too far ahead
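Graceful shutdown: StreamWriter.Close() signals completion, after which Recv() returns io.EOF; StreamReader.Close() tells the producer to stop, and later Send calls report closed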
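A buffered channel can mimic these three properties. The following is a minimal stdlib-only sketch of the semantics, assuming a hypothetical generic `pipe` type. It is not Eino's implementation and omits features such as sending errors through the stream.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// pipe is a hypothetical, simplified analogue of a reader/writer stream pair.
type pipe[T any] struct {
	data      chan T        // bounded buffer: Send blocks while it is full (backpressure)
	done      chan struct{} // closed when the reader stops
	closeOnce sync.Once
}

func newPipe[T any](bufferSize int) *pipe[T] {
	return &pipe[T]{data: make(chan T, bufferSize), done: make(chan struct{})}
}

// Send blocks while the buffer is full and returns closed=true once the reader has stopped.
func (p *pipe[T]) Send(v T) (closed bool) {
	select { // check first so a stopped reader is reported deterministically
	case <-p.done:
		return true
	default:
	}
	select {
	case <-p.done:
		return true
	case p.data <- v:
		return false
	}
}

// CloseWrite signals completion; Recv returns io.EOF once the buffer drains.
func (p *pipe[T]) CloseWrite() { close(p.data) }

// Recv returns the next value, or io.EOF after CloseWrite and a drained buffer.
func (p *pipe[T]) Recv() (T, error) {
	v, ok := <-p.data
	if !ok {
		var zero T
		return zero, io.EOF
	}
	return v, nil
}

// CloseRead tells the producer to stop; safe to call more than once.
func (p *pipe[T]) CloseRead() { p.closeOnce.Do(func() { close(p.done) }) }

func main() {
	p := newPipe[string](2)
	go func() {
		defer p.CloseWrite()
		for _, s := range []string{"Hel", "lo", "!"} {
			if p.Send(s) {
				return
			}
		}
	}()
	defer p.CloseRead()
	for {
		s, err := p.Recv()
		if err == io.EOF {
			break
		}
		fmt.Print(s)
	}
	fmt.Println() // completes the line "Hello!"
}
```

A full buffer makes Send block, which is the backpressure described above. CloseRead also unblocks a producer waiting in Send, because Send then reports closed.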

Component Streaming Matrix

Different components have varying levels of stream support. Treat the matrix below as a starting point, and verify each component's contract, adapters, and buffering behavior in the selected revision:

Component Type	Complete-Value Input	Stream Input	Complete-Value Output	Stream Output
ChatModel	✅	❌	✅	✅
PromptTemplate	✅	❌	✅	❌
Retriever	✅	❌	✅	❌
ToolNode	✅	❌	✅	❌
Lambda	✅	Optional	✅	Optional
Transformer	✅	✅	✅	✅

Key insight: ChatModel commonly provides stream output, but downstream buffering, moderation, transport, and cancellation determine whether users receive incremental output.

Streaming in Orchestration

When you call an orchestration Graph in Stream mode, inspect which stream conversions the selected revision supports between each node:

Figure (flow diagram): a ChatModel's stream output (StreamReader[Message]) goes directly to a downstream node that supports stream input, such as a Transformer (direct passthrough). For an invoke-only node, such as a Lambda that expects a complete value, it is first concatenated. The same stream can also be forked to a callback observer and to another consumer.

Three Stream Conversions

1. Stream Concatenation

When a downstream node only accepts complete values, an adapter may collect and merge stream fragments:

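// Illustrative equivalent of an adapter's logic, not the engine's actual code
// (concatenate and nextNode are placeholders)
fullMessage := concatenate(streamReader) // collect and merge all chunks
nextNode.Invoke(ctx, fullMessage)         // pass the complete value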

2. Stream Fork/Split

When a single stream needs multiple consumers, an adapter may fork or buffer it:

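// One StreamReader duplicated into multiple copies (Copy on StreamReader is
// assumed here; verify it in the selected revision). Do not read
// originalReader itself after copying it.
readers := originalReader.Copy(consumerCount)
// Each consumer reads, and then closes, its own copy independently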

3. Stream Merge

When multiple upstream streams converge, define ordering, fairness, cancellation, and error semantics rather than assuming completion-order merging:

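// Multiple upstream streams merged into one; chunks arrive in whatever order
// the sources produce them (verify the helper name and semantics in the
// selected revision)
mergedReader := schema.MergeStreamReaders([]*schema.StreamReader[*schema.Message]{reader1, reader2, reader3})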
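The sketch below shows one way to make merge semantics explicit, assuming channel-based sources; `merge`, `tagged`, and `feed` are hypothetical helpers. Arrival order across sources stays nondeterministic, but every value carries its source index and its sequence number within that source. A consumer can therefore apply a documented ordering policy instead of relying on arrival order.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// tagged records which upstream produced a value and its position there.
type tagged[T any] struct {
	source int
	seq    int
	value  T
}

// merge fans in several channels. Order across sources is nondeterministic;
// order within one source is preserved and recorded in seq.
func merge[T any](srcs ...<-chan T) <-chan tagged[T] {
	out := make(chan tagged[T])
	var wg sync.WaitGroup
	for i, s := range srcs {
		wg.Add(1)
		go func(i int, s <-chan T) {
			defer wg.Done()
			seq := 0
			for v := range s {
				out <- tagged[T]{source: i, seq: seq, value: v}
				seq++
			}
		}(i, s)
	}
	go func() { wg.Wait(); close(out) }() // close only after every source ends
	return out
}

// feed returns a closed, pre-filled channel for demonstration.
func feed(vals ...string) <-chan string {
	c := make(chan string, len(vals))
	for _, v := range vals {
		c <- v
	}
	close(c)
	return c
}

func main() {
	var all []tagged[string]
	for t := range merge(feed("a1", "a2"), feed("b1")) {
		all = append(all, t)
	}
	// Example policy: order by source, then by sequence within the source.
	sort.Slice(all, func(i, j int) bool {
		if all[i].source != all[j].source {
			return all[i].source < all[j].source
		}
		return all[i].seq < all[j].seq
	})
	vals := make([]string, 0, len(all))
	for _, t := range all {
		vals = append(vals, t.value)
	}
	fmt.Println(strings.Join(vals, " ")) // a1 a2 b1
}
```

Sorting after the fact buffers the whole merged stream, which gives up incremental delivery. A streaming consumer could instead forward source 0 as it arrives and hold later sources' chunks, trading memory for determinism.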

Callback Aspect System

Why Callbacks

In production environments, you need to monitor every step of Agent execution: record inputs/outputs, measure latency, track exceptions. These are classic cross-cutting concerns — functionality that spans multiple modules but doesn't belong in any single one. If you scatter logging and tracing code throughout your business logic, you get tight coupling, code duplication, and maintenance nightmares.

Eino's Callback system provides an AOP (Aspect-Oriented Programming) style solution. Similar to middleware in web frameworks or interceptors in gRPC, callbacks let you define behavior that runs before and after every node execution without modifying the node's core logic. The key difference from traditional middleware is that Eino's callbacks are stream-aware — they understand both batch and streaming execution patterns.

Lifecycle Hooks

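For each node execution, Eino calls one start hook and one end hook, chosen by whether the node's input and output are streams. If the node fails, the error hook runs instead of an end hook: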

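Hook	Trigger Point	Parameters
`OnStart`	Node begins with a complete input value	`ctx`, `RunInfo`, `CallbackInput`
`OnStartWithStream`	Node begins with stream input	`ctx`, `RunInfo`, `StreamReader[CallbackInput]`
`OnEnd`	Node completes with a complete output value	`ctx`, `RunInfo`, `CallbackOutput`
`OnEndWithStream`	Node returns stream output (fires when the stream is returned, not when it is drained)	`ctx`, `RunInfo`, `StreamReader[CallbackOutput]`
`OnError`	Node returns an error	`ctx`, `RunInfo`, `error`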

Building with HandlerBuilder

handler := callbacks.NewHandlerBuilder().
    OnStartFn(func(ctx context.Context, info *callbacks.RunInfo, input callbacks.CallbackInput) context.Context {
        // Logs the full payload: fine for local debugging, redact before production
        log.Printf("[%s] execution started, component: %s, input: %v",
            info.Name, info.Component, input)
        return ctx
    }).
    OnEndFn(func(ctx context.Context, info *callbacks.RunInfo, output callbacks.CallbackOutput) context.Context {
        log.Printf("[%s] execution completed, component: %s, output: %v",
            info.Name, info.Component, output)
        return ctx
    }).
    OnEndWithStreamOutputFn(func(ctx context.Context, info *callbacks.RunInfo,
        sr *schema.StreamReader[callbacks.CallbackOutput]) context.Context {
        // The handler receives its own copy of the stream; close it when done
        // (here immediately, because the chunks are not inspected)
        sr.Close()
        log.Printf("[%s] stream output started, component: %s", info.Name, info.Component)
        return ctx
    }).
    Build()

Scope Control

Callbacks support three granularity levels of scope control:

// 1. Global: applies to all nodes in this Graph run
result, err := compiledGraph.Invoke(ctx, input,
    compose.WithCallbacks(handler))

// 2. Per component type: applies only to specific component types
//    (verify that the selected revision exposes DesignateType)
result, err = compiledGraph.Invoke(ctx, input,
    compose.WithCallbacks(handler).DesignateType(components.ComponentOfChatModel))

// 3. Per node name: applies only to the designated node
result, err = compiledGraph.Invoke(ctx, input,
    compose.WithCallbacks(handler).DesignateNode("llm_node"))

This layered design enables you to:

Inject OTel tracing globally (affects all nodes)
Record token usage only for ChatModel nodes
Add alerting logic only for specific critical nodes
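Here is a stdlib-only model of the three scopes. It assumes a hypothetical `scopedHandler` whose empty filter fields mean "match anything." It illustrates the matching rule only, not how Eino stores designated handlers.

```go
package main

import "fmt"

// scopedHandler is a hypothetical model of callback scoping: an empty filter
// field matches every node (nodeName) or every component type (component).
type scopedHandler struct {
	name      string
	nodeName  string
	component string
}

func (h scopedHandler) applies(nodeName, component string) bool {
	return (h.nodeName == "" || h.nodeName == nodeName) &&
		(h.component == "" || h.component == component)
}

// handlersFor returns, in registration order, the handlers that observe a node.
func handlersFor(hs []scopedHandler, nodeName, component string) []string {
	var out []string
	for _, h := range hs {
		if h.applies(nodeName, component) {
			out = append(out, h.name)
		}
	}
	return out
}

func main() {
	hs := []scopedHandler{
		{name: "otel"},                           // global
		{name: "tokens", component: "ChatModel"}, // per component type
		{name: "alert", nodeName: "llm_node"},    // per node
	}
	fmt.Println(handlersFor(hs, "llm_node", "ChatModel"))  // [otel tokens alert]
	fmt.Println(handlersFor(hs, "prompt", "ChatTemplate")) // [otel]
}
```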

Production Observability

OpenTelemetry Integration

When supported by the selected revision, an OpenTelemetry Callback Handler can map node lifecycle events to OTel Spans; exporters, sampling, redaction, retention, and access controls remain application responsibilities:

Figure (sequence diagram): the client calls Stream(ctx, input) on the CompiledGraph, which starts a graph.execute span. Each node's OnStart opens a span: prompt_node for the PromptTemplate, llm_node for the ChatModel, and tool_node for the ToolNode. OnEnd marks each node's completion, or OnEndWithStream for the ChatModel's streaming output. The graph span ends once all nodes complete, and the client receives a StreamReader.

Configuring the OTel Handler

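import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0" // match the semconv version your SDK ships

    // Illustrative import path: verify whether the selected Eino (or eino-ext)
    // revision ships an OTel callback handler, and under which package name.
    einootel "github.com/cloudwego/eino/callbacks/otel"
)

// initTracer installs a global TracerProvider and returns its Shutdown func,
// which flushes buffered spans; call it before the process exits.
func initTracer(ctx context.Context) (func(context.Context) error, error) {
    exporter, err := otlptracehttp.New(ctx,
        otlptracehttp.WithEndpoint("otel-collector:4318"),
        otlptracehttp.WithInsecure(), // plain HTTP for a local collector; use TLS elsewhere
    )
    if err != nil {
        return nil, err
    }
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        // Example policy: honor the parent's decision, otherwise sample 10% of traces
        sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceName("my-agent-service"),
        )),
    )
    otel.SetTracerProvider(tp)
    return tp.Shutdown, nil
}

// Create the OTel Callback Handler (constructor name depends on the package above)
otelHandler := einootel.NewHandler()

// Inject into Graph execution
result, err := compiledGraph.Invoke(ctx, input,
    compose.WithCallbacks(otelHandler))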

Depending on the handler implementation, each execution can produce Spans containing:

Node name and component type
Input/output summaries
Execution duration
Error information (if any)

Practice: Full-Chain Logging and Latency Tracing

The following illustrative skeleton shows latency timing without logging prompts, tool arguments, tokens, or other sensitive payloads:

package main

import (
    "context"
    "fmt"
    "io"
    "log"
    "time"

    "github.com/cloudwego/eino/callbacks"
    "github.com/cloudwego/eino/compose"
    "github.com/cloudwego/eino/schema"
)

// startKey is an unexported context key type, so values stored by this
// handler cannot collide with keys set by other packages.
type startKey struct{ node string }

// Build a Callback Handler with latency tracing
func buildTracingHandler() callbacks.Handler {
    return callbacks.NewHandlerBuilder().
        OnStartFn(func(ctx context.Context, info *callbacks.RunInfo, input callbacks.CallbackInput) context.Context {
            // The returned ctx is passed to this node's end hook
            ctx = context.WithValue(ctx, startKey{info.Name}, time.Now())
            log.Printf("[TRACE] ▶ Node=%s Type=%s started", info.Name, info.Component)
            return ctx
        }).
        OnEndFn(func(ctx context.Context, info *callbacks.RunInfo, output callbacks.CallbackOutput) context.Context {
            if start, ok := ctx.Value(startKey{info.Name}).(time.Time); ok {
                log.Printf("[TRACE] ◀ Node=%s Type=%s duration=%v",
                    info.Name, info.Component, time.Since(start))
            }
            return ctx
        }).
        OnEndWithStreamOutputFn(func(ctx context.Context, info *callbacks.RunInfo,
            sr *schema.StreamReader[callbacks.CallbackOutput]) context.Context {
            start, ok := ctx.Value(startKey{info.Name}).(time.Time)
            // This hook fires when the stream is returned, not when the first
            // chunk arrives, so read the handler's own copy in the background
            // to time the first chunk without blocking the main flow.
            go func() {
                defer sr.Close()
                first := true
                for {
                    if _, err := sr.Recv(); err != nil {
                        return // io.EOF or a stream error ends observation
                    }
                    if first && ok {
                        log.Printf("[TRACE] ⇥ Node=%s first_chunk_latency=%v",
                            info.Name, time.Since(start))
                        first = false
                    }
                }
            }()
            return ctx
        }).
        Build()
}

func main() {
    ctx := context.Background()

    // Build Graph (refer to the previous article for orchestration setup)
    g := compose.NewGraph[string, string]()

    // ... add nodes and edges ...

    compiled, err := g.Compile(ctx)
    if err != nil {
        log.Fatal(err)
    }

    // Inject tracing handler
    handler := buildTracingHandler()

    // Stream execution
    sr, err := compiled.Stream(ctx, "Hello, please analyze this code for performance issues",
        compose.WithCallbacks(handler))
    if err != nil {
        log.Fatal(err)
    }
    defer sr.Close() // close only after Stream has returned a reader

    // Consume streaming output
    for {
        chunk, err := sr.Recv()
        if err == io.EOF {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Print(chunk)
    }
    fmt.Println()
}

Example output:


[TRACE] ▶ Node=prompt_template Type=PromptTemplate started
[TRACE] ◀ Node=prompt_template Type=PromptTemplate duration=<duration>
[TRACE] ▶ Node=llm Type=ChatModel started
[TRACE] ⇥ Node=llm first_chunk_latency=<duration>
[TRACE] ▶ Node=output_parser Type=Lambda started
[TRACE] ◀ Node=output_parser Type=Lambda duration=<duration>

Best Practices

Streaming:

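Prefer Stream mode for user-facing interfaces where incremental output improves responsiveness, and confirm that proxies and transports actually deliver chunks incrementally
For intermediate nodes whose stream output is not observed incrementally, verify which conversions and buffering the selected revision applies; Invoke may be simpler there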
Set appropriate bufferSize — too large wastes memory, too small causes frequent backpressure

Callback Design:

Register universal observability (OTel, logging) as global Callbacks
Scope business-specific monitoring (token counting, cost tracking) to specific component types
Avoid blocking operations in Callbacks — use async channels to send data to background goroutines
Leverage context.Context to pass state between OnStart/OnEnd (e.g., timers)

Production Deployment:

Redact prompts, tool arguments, tokens, tenant identifiers, and personal data before export
Bound queue size and callback work; define cancellation and exporter failure behavior
Set retention, sampling, access control, and deletion policies for traces
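One way to export input/output summaries without payloads is to record a length plus a keyed fingerprint. Here is a minimal stdlib-only sketch, assuming a hypothetical `summarize` helper and a key loaded from your secret store. It uses a keyed HMAC rather than a plain hash, so short, guessable values such as tenant IDs cannot be recovered by brute-forcing the fingerprint.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// summarize returns a span-safe description of a payload: its byte length and
// a truncated HMAC-SHA256 fingerprint. Equal payloads get equal fingerprints,
// so spans can be correlated without exporting the content itself.
func summarize(key []byte, payload string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(payload)) // writes to a hash.Hash never return an error
	return fmt.Sprintf("len=%d fp=%s", len(payload), hex.EncodeToString(mac.Sum(nil))[:12])
}

func main() {
	key := []byte("example-key") // hypothetical: load from a secret store in practice
	fmt.Println(summarize(key, "Hello, please analyze this code"))
}
```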

FAQ

Q: How do you handle network interruptions in streaming mode?

A: StreamReader.Recv() typically returns a non-io.EOF error when the connection breaks, though the exact error depends on the provider and transport. Handle non-EOF errors explicitly in your consumption loop. Record how much output was already delivered so retry logic can decide whether to restart or resume, and show the user a clear notice.

Q: Do Callbacks impact main flow performance?

A: Callbacks execute synchronously in the same goroutine. If your callback logic is heavy (e.g., writing to a database), push data into an async queue inside the callback to avoid blocking the main flow.
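Here is a minimal stdlib-only sketch of that pattern, assuming a hypothetical `asyncSink` type. The callback only enqueues into a bounded queue, and a background goroutine performs the slow work. When the queue is full, events are dropped and counted, so a slow backend cannot stall node execution. Whether dropping, sampling, or blocking is acceptable is a policy decision for your deployment.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// asyncSink decouples slow work (e.g., database writes) from the callback.
type asyncSink struct {
	queue   chan string
	dropped atomic.Int64
	wg      sync.WaitGroup
}

// newAsyncSink starts one background worker that runs work for each event.
func newAsyncSink(size int, work func(string)) *asyncSink {
	s := &asyncSink{queue: make(chan string, size)}
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		for ev := range s.queue {
			work(ev)
		}
	}()
	return s
}

// Enqueue never blocks: when the bounded queue is full, the event is dropped
// and counted. It must not be called after Close.
func (s *asyncSink) Enqueue(ev string) {
	select {
	case s.queue <- ev:
	default:
		s.dropped.Add(1)
	}
}

// Close stops accepting events and waits until queued ones are processed.
func (s *asyncSink) Close() {
	close(s.queue)
	s.wg.Wait()
}

func main() {
	var processed []string // written only by the worker, read after Close
	sink := newAsyncSink(8, func(ev string) { processed = append(processed, ev) })
	sink.Enqueue("llm_node end") // what a callback would do instead of writing directly
	sink.Close()
	fmt.Println(processed, sink.dropped.Load()) // [llm_node end] 0
}
```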

Q: Can you register multiple Callback Handlers simultaneously?

A: Yes. Do not assume a particular execution order between handlers, such as an onion-style forward order for OnStart and reverse order for OnEnd, unless the selected revision documents one. Keep each handler independent of the others.

Q: Can stream Callbacks modify data in the stream?

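A: Not by design. OnEndWithStream receives its own copy of the stream, so reading that copy does not consume chunks from the stream that downstream nodes receive. The copies may still share the same chunk values (for example, message pointers), so callbacks should treat chunks as read-only. Each handler should also close its copy when done.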
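Independent copies of a stream are not the same as independent values. The following stdlib-only sketch uses a hypothetical `msg` type and `copyStream` helper: each consumer gets its own slice, but the elements are the same pointers. Whether the selected Eino revision shares chunk values between copies this way should be verified; treating chunks as read-only is safe either way.

```go
package main

import "fmt"

// msg is a hypothetical chunk type passed by pointer, as message chunks often are.
type msg struct{ Content string }

// copyStream gives each of n consumers its own slice of the chunks. The slices
// are independent, but they hold the same pointers (a shallow copy).
func copyStream(chunks []*msg, n int) [][]*msg {
	out := make([][]*msg, n)
	for i := range out {
		out[i] = append([]*msg(nil), chunks...)
	}
	return out
}

func main() {
	copies := copyStream([]*msg{{Content: "hello"}}, 2)
	copies[1] = copies[1][:0]                          // dropping chunks from one copy leaves the other intact
	fmt.Println(len(copies[0]), copies[0][0].Content) // 1 hello

	shared := copyStream([]*msg{{Content: "hello"}}, 2)
	shared[1][0].Content = "mutated by callback" // mutating a chunk is visible to every copy
	fmt.Println(shared[0][0].Content)            // mutated by callback
}
```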

Summary

Streaming and the Callback aspect system solve different problems: incremental delivery and observability. Both require explicit transport, cancellation, redaction, retention, and failure policies before production use.

StreamReader/StreamWriter primitives provide typed stream abstractions. Conversion behavior, buffering, and protocol differences still need explicit tests for the selected components.

Callbacks can reduce instrumentation coupling, but they do not guarantee full-chain observability. Validate hook coverage, sampling, stream behavior, exporter failures, and sensitive-data handling for the deployed revision.

With these two mechanisms mastered, you can:

Deliver real-time streaming response experiences to users
Track latency and exceptions at every execution step in production
Connect node-level spans to an OpenTelemetry tracing pipeline
Inspect each step's inputs, outputs, and timing, subject to your redaction policy

In the next article, we'll move into Eino DevKit: Build Your First Agent, applying the streaming and callback mechanisms from this article to build a real Agent.

Previous in series: Eino Orchestration Engine: Chain, Graph, and Workflow in Practice
Next in series: Eino DevKit: Build Your First Agent
Glossary: AI Agent | Agentic Workflow
Official Docs: Eino GitHub Repository | OpenTelemetry Go SDK
