Apache Avro is a row-oriented data serialization system developed within the Apache Hadoop project. It provides rich data structures and a compact, fast binary data format.
Technical Details
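Avro schemas are defined in JSON. In Avro data files (object container files), the writer's schema is stored in the file header alongside the data, so a reader can decode the file without any out-of-band information. Because a reader can resolve the writer's schema against its own, schemas can evolve while remaining compatible. The data itself is stored in a compact binary format that contains no field names or type tags: values are written one after another in the order the schema declares them.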
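The binary encoding described above can be sketched in plain Python for a small subset of Avro types (record, int/long, boolean, string). This is a minimal illustration, not the API of the official `avro` or `fastavro` packages: the `User` schema and the helper names `write_long`, `read_long`, `encode` and `decode` are hypothetical. It follows the Avro specification's rules that ints and longs are zig-zag encoded variable-length integers, strings are a length prefix followed by UTF-8 bytes, and record fields are concatenated in schema order. It omits the container-file header, where the schema itself would be stored.

```python
import json

# Hypothetical example schema in Avro's JSON schema syntax.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name",   "type": "string"},
    {"name": "age",    "type": "int"},
    {"name": "active", "type": "boolean"}
  ]
}
""")


def write_long(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint (low bits first)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Inverse of write_long; returns (value, next position)."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if b < 0x80:  # continuation bit clear: this was the last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode(schema, value) -> bytes:
    """Encode `value` according to a (subset of an) Avro schema."""
    kind = schema["type"] if isinstance(schema, dict) else schema
    if kind == "record":
        # Fields are concatenated in schema order; no names or tags are written.
        return b"".join(encode(f["type"], value[f["name"]]) for f in schema["fields"])
    if kind in ("int", "long"):
        return write_long(value)
    if kind == "boolean":
        return b"\x01" if value else b"\x00"
    if kind == "string":
        raw = value.encode("utf-8")
        return write_long(len(raw)) + raw  # length prefix, then UTF-8 bytes
    raise NotImplementedError(kind)


def decode(schema, buf: bytes, pos: int = 0):
    """Decode one value; returns (value, next position)."""
    kind = schema["type"] if isinstance(schema, dict) else schema
    if kind == "record":
        record = {}
        for f in schema["fields"]:
            record[f["name"]], pos = decode(f["type"], buf, pos)
        return record, pos
    if kind in ("int", "long"):
        return read_long(buf, pos)
    if kind == "boolean":
        return buf[pos] == 1, pos + 1
    if kind == "string":
        n, pos = read_long(buf, pos)
        return buf[pos:pos + n].decode("utf-8"), pos + n
    raise NotImplementedError(kind)


user = {"name": "Ada", "age": 36, "active": True}
data = encode(SCHEMA, user)
print(data.hex())                        # 064164614801
print(len(data), len(json.dumps(user)))  # 6 42
```

For this one small record the binary form takes 6 bytes against 42 for the equivalent JSON text, mainly because field names are never written. The flip side is that the bytes are meaningless without the schema, which is why Avro data files carry the schema in their header.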
Advantages
- Compact binary serialization
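- Schema stored with the data in container files, making the files self-describing
- Support for schema evolution through resolution of the writer's schema against the reader's schema
- Dynamic typing: data can be read and written without generated code, although code generation is available as an optional optimization for statically typed languages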
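Schema evolution works because a reader always has the writer's schema available and matches fields by name. The sketch below, assuming a simplified subset of the Avro specification's resolution rules, decodes a record written with an older schema and returns it in the shape of a newer reader schema: shared fields are kept, fields only the reader knows are filled from their `default`, and fields only the writer knew are dropped. The schemas and the function names `read_value` and `resolve` are hypothetical, and real Avro additionally checks type compatibility, applies type promotions (for example int to long) and honors aliases, none of which is shown here.

```python
import json

# Hypothetical writer's (old) schema: what the bytes were encoded with.
WRITER = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "name", "type": "string"},
  {"name": "age",  "type": "int"}
]}
""")

# Hypothetical reader's (new) schema: drops "age", adds "email" with a default.
READER = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "name",  "type": "string"},
  {"name": "email", "type": "string", "default": "unknown"}
]}
""")


def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Zig-zag varint decoding used by Avro for int and long."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if b < 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def read_value(kind: str, buf: bytes, pos: int):
    """Decode a primitive (only int/long/string in this sketch)."""
    if kind in ("int", "long"):
        return read_long(buf, pos)
    if kind == "string":
        n, pos = read_long(buf, pos)
        return buf[pos:pos + n].decode("utf-8"), pos + n
    raise NotImplementedError(kind)


def resolve(writer: dict, reader: dict, buf: bytes, pos: int = 0):
    """Read one record written with `writer`, returned in `reader`'s shape."""
    # The bytes carry no field names, so decoding must follow the writer schema.
    written = {}
    for field in writer["fields"]:
        written[field["name"]], pos = read_value(field["type"], buf, pos)
    # Match by name: keep shared fields, fill reader-only fields from defaults;
    # writer-only fields are simply not copied.
    result = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in written:
            result[name] = written[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} has no value and no default")
    return result, pos


# ("Ada", 36) encoded with WRITER: 0x06 is length 3 (zig-zag), 0x48 is 36.
old_bytes = b"\x06Ada\x48"
print(resolve(WRITER, READER, old_bytes)[0])  # {'name': 'Ada', 'email': 'unknown'}
```

The design choice worth noting is the `default`: a field added to a schema without one makes old data unreadable under the new schema, which is why evolving Avro schemas usually give new fields defaults.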
Limitations
- Not human-readable: the binary data can only be interpreted with its schema and an Avro-aware tool
- Less widely supported than formats like JSON or CSV
- More complex to implement than simpler formats
- Less efficient than columnar formats such as Parquet for analytical queries that read only a few columns
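The row-oriented layout explains the last limitation. In the simplified sketch below (a hypothetical two-field record and a hypothetical `read_column` helper, not a benchmark), extracting only the `age` field from records stored back to back still requires walking over every `name` string, because a string's size is known only after reading its length prefix and varints have no fixed width either. A columnar format such as Parquet stores each column contiguously, so a query reading only `age` could skip `name` entirely. Avro container files do let a reader skip whole blocks of records, but not individual fields inside them.

```python
def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Zig-zag varint decoding used by Avro for int and long."""
    z = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if b < 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


# Simplified row layout: (field name, type) pairs in schema order.
FIELDS = [("name", "string"), ("age", "int")]


def read_column(fields, buf: bytes, count: int, wanted: str):
    """Collect `wanted` from `count` consecutive row-encoded records.

    Returns (values, bytes_scanned). Unwanted fields are still walked over.
    """
    values = []
    pos = 0
    for _ in range(count):
        for name, kind in fields:
            if kind == "string":
                n, pos = read_long(buf, pos)  # must read the length prefix...
                if name == wanted:
                    values.append(buf[pos:pos + n].decode("utf-8"))
                pos += n                      # ...to find where the next field starts
            else:  # int/long: a varint, so no fixed width either
                value, pos = read_long(buf, pos)
                if name == wanted:
                    values.append(value)
    return values, pos


# Two records, ("Ada", 36) and ("Grace", 45), stored back to back.
rows = b"\x06Ada\x48" + b"\x0aGrace\x5a"
ages, scanned = read_column(FIELDS, rows, 2, "age")
print(ages, scanned, len(rows))  # [36, 45] 12 12
```

Every byte of the buffer is scanned even though only the two small integers were needed.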