I've just started exploring adding OpenTelemetry support to the Comet subproject of DataFusion. I'm excited to see the integration with Apache Arrow (Rust) and potentially DataFusion in the future.
julian-datable 2 hours ago [-]
Integrations with OTLP are critical to driving adoption, and they're probably one of the biggest pain points we've encountered when adopting it ourselves (and encouraging others to do the same).
Adopting OTLP without third-party support is pretty time consuming, especially if your tech stack is large and/or varied.
Re runtimes: curious about this too. Feels like the right direction if you’re optimizing a telemetry pipeline.
akdor1154 1 hour ago [-]
Damn that's some scope creep if I ever saw it: 'try sending Arrow frames end to end' => 'rewrite the otel pipeline in rust'. Seems like the goals of the contributors don't exactly align with the goals of the project.
Kind of a bummer - one thing I was hoping to come out of this was better Arrow ecosystem support for golang.
KAdot 2 hours ago [-]
> We are interested in making OTAP pipelines safely embeddable, through strict controls on memory and through support for thread-per-core runtimes.
I'm curious about the thread-per-core runtimes: are there even any mature thread-per-core runtimes around in Rust?
Wow, anyone able to provide an ELI5?
OTel sounds amazing but this is flying over my head
phillipcarter 38 minutes ago [-]
Warning: this is an oversimplification.
Performance optimization and being able to "plug in" to the data ecosystem that Apache Arrow exists in.
OpenTelemetry is pretty great for a lot of uses, but the protocol over the wire is too chunky for some applications. From last year's post on the topic[0]:
> In a side-by-side comparison between OpenTelemetry Protocol (“OTLP”) and OpenTelemetry Protocol with Apache Arrow for similarly configured traces pipelines, we observe 30% improvement in compression. Although this study specifically focused on traces data, we have observed results for logs and metrics signals in production settings too, where OTel-Arrow users can expect 50% to 70% improvement relative to OTLP for similar pipeline configurations.
For your average set of apps and services running in a k8s cluster somewhere in the cloud, this is just a nice-to-have, but size on wire is a problem for a lot of systems out there today, and they are precluded from adopting OpenTelemetry until that's solved.
[0]: https://opentelemetry.io/blog/2024/otel-arrow-production/
Not sure, but it seems like it will produce Apache Arrow data and carry it across the data stack end to end from OTel. This would be great for creating data without a bunch of duplication/redundant processing steps, and for exporting it in a form that's ready to query.
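(Not from the project, just to make "ready to query" concrete: a minimal sketch assuming the arrow, datafusion, and tokio crates, with a made-up "spans" table standing in for telemetry that arrived as Arrow batches.)

    use std::sync::Arc;

    use arrow::array::{Int64Array, StringArray};
    use arrow::datatypes::{DataType, Field, Schema};
    use arrow::record_batch::RecordBatch;
    use datafusion::prelude::SessionContext;

    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
        // Pretend this batch came off the wire from an OTel-Arrow pipeline
        // (the columns are hypothetical, for illustration only).
        let schema = Arc::new(Schema::new(vec![
            Field::new("service", DataType::Utf8, false),
            Field::new("duration_ns", DataType::Int64, false),
        ]));
        let batch = RecordBatch::try_new(
            schema,
            vec![
                Arc::new(StringArray::from(vec!["checkout", "checkout", "auth"])),
                Arc::new(Int64Array::from(vec![1_200, 3_400, 560])),
            ],
        )?;

        // No re-encoding step: register the batch and query it with SQL.
        let ctx = SessionContext::new();
        ctx.register_batch("spans", batch)?;
        ctx.sql("SELECT service, avg(duration_ns) FROM spans GROUP BY service")
            .await?
            .show()
            .await?;
        Ok(())
    }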
piterrro 2 hours ago [-]
Unless I don't understand that fully (which could be the case):
This idea could fly if downstream readers are able to read it. JSON is great because anything can read it, process, transform, and serialize it without having to know the intricacies of the protocol.
What's the point of using a binary, columnar format for data in transit?
You don't do high performance without knowing the data schema.
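For what it's worth, here's a rough sketch (mine, not from the post) of what a binary columnar format in transit looks like with the arrow crate: the IPC stream format carries the schema in-band, so the receiving side can recover it from the bytes alone.

    use std::io::Cursor;
    use std::sync::Arc;

    use arrow::array::Int64Array;
    use arrow::datatypes::{DataType, Field, Schema};
    use arrow::ipc::reader::StreamReader;
    use arrow::ipc::writer::StreamWriter;
    use arrow::record_batch::RecordBatch;

    fn main() -> Result<(), arrow::error::ArrowError> {
        let schema = Arc::new(Schema::new(vec![Field::new(
            "duration_ns",
            DataType::Int64,
            false,
        )]));
        let batch = RecordBatch::try_new(
            schema.clone(),
            vec![Arc::new(Int64Array::from(vec![1_200, 3_400, 560]))],
        )?;

        // "Send": the stream starts with a schema message, then batches.
        let mut wire = Vec::new();
        let mut writer = StreamWriter::try_new(&mut wire, &schema)?;
        writer.write(&batch)?;
        writer.finish()?;

        // "Receive": the reader recovers the schema from the stream itself.
        let reader = StreamReader::try_new(Cursor::new(wire), None)?;
        println!("decoded schema: {:?}", reader.schema());
        for batch in reader {
            println!("got {} rows", batch?.num_rows());
        }
        Ok(())
    }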
odie5533 2 hours ago [-]
Is Arrow better than Parquet or Protobuf?
theLiminator 1 hour ago [-]
Arrow is an in-memory columnar format, kinda orthogonal to Parquet (which is an at-rest format). Protobuf is a better comparison, but it's more message-oriented and not suited for analytics.
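A sketch of that split (my own example, assuming the arrow, parquet, and bytes crates; the in-memory "file" is just for brevity): build batches in Arrow, persist them as Parquet, read them back as Arrow.

    use std::sync::Arc;

    use arrow::array::Int64Array;
    use arrow::datatypes::{DataType, Field, Schema};
    use arrow::record_batch::RecordBatch;
    use bytes::Bytes;
    use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
    use parquet::arrow::ArrowWriter;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let schema = Arc::new(Schema::new(vec![Field::new(
            "duration_ns",
            DataType::Int64,
            false,
        )]));
        let batch = RecordBatch::try_new(
            schema.clone(),
            vec![Arc::new(Int64Array::from(vec![1_200, 3_400, 560]))],
        )?;

        // Arrow batch -> Parquet bytes (the at-rest form).
        let mut file = Vec::new();
        let mut writer = ArrowWriter::try_new(&mut file, schema, None)?;
        writer.write(&batch)?;
        writer.close()?;

        // Parquet bytes -> Arrow batches again (the in-memory form).
        let reader = ParquetRecordBatchReaderBuilder::try_new(Bytes::from(file))?.build()?;
        for batch in reader {
            println!("read back {} rows", batch?.num_rows());
        }
        Ok(())
    }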
arccy 27 minutes ago [-]
the blog post comparison is against OTLP, which is protobuf
There's glommio, and ByteDance also has their very fast monoio. https://github.com/bytedance/monoio
Both integrate io_uring support for very fast IO.
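To make the thread-per-core part concrete, a minimal sketch with monoio (my own, going off its docs as I recall them, so treat the builder details as an assumption): one OS thread per core, each running its own single-threaded io_uring-backed runtime, so tasks never migrate across threads.

    use std::thread;

    fn main() {
        let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
        let handles: Vec<_> = (0..cores)
            .map(|core| {
                thread::spawn(move || {
                    // FusionDriver uses io_uring when available, epoll otherwise.
                    let mut rt = monoio::RuntimeBuilder::<monoio::FusionDriver>::new()
                        .build()
                        .expect("failed to build per-core runtime");
                    rt.block_on(async move {
                        println!("runtime up on core thread {core}");
                        // tasks spawned here stay on this thread: no work stealing,
                        // so no cross-core synchronization on the hot path
                    });
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
    }

A real deployment would also pin each thread to a CPU (e.g. with the core_affinity crate); glommio builds that placement in.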