Apache Avro is a row-oriented object container storage format for Hadoop as well as a remote procedure call and data serialization framework. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Avro is optimized for write operations and includes a wire format for communication between nodes.
An Avro object container file begins with a header, which is made up of:
- 4 magic bytes: ASCII “Obj” followed by the byte 1
- File metadata including the schema definition
- A sync marker: 16 randomly generated bytes, unique to the file
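The header layout can be sketched with a short Python round trip that uses only the standard library. This is a minimal illustration, not the official Avro implementation: it assumes the encoding described in the Avro specification (magic bytes "Obj" plus the byte 1, metadata stored as a map of byte strings with zigzag varint lengths, then the sync marker). The helper names (write_header, read_header and so on) and the User schema are hypothetical and chosen only for this example.

```python
import io
import json
import os

MAGIC = b"Obj\x01"   # ASCII "Obj" plus the version byte 1
SYNC_SIZE = 16       # length of the sync marker in bytes


def write_long(out, n):
    """Write an Avro long: zigzag-encode, then emit a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    while z > 0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Inverse of write_long."""
    z, shift = 0, 0
    while True:
        b = inp.read(1)
        if not b:
            raise EOFError("truncated varint")
        z |= (b[0] & 0x7F) << shift
        if b[0] < 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def write_bytes(out, data):
    """Write a length-prefixed byte string."""
    write_long(out, len(data))
    out.write(data)


def read_bytes(inp):
    """Read a length-prefixed byte string."""
    n = read_long(inp)
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated byte string")
    return data


def write_header(out, schema, sync, codec="null"):
    """Write magic bytes, the metadata map, and the sync marker."""
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"),
            "avro.codec": codec.encode("utf-8")}
    out.write(MAGIC)
    write_long(out, len(meta))      # one map block holding every entry
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)              # a zero count ends the map
    out.write(sync)


def read_header(inp):
    """Parse a header and return (metadata dict, sync marker)."""
    if inp.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(inp)
        if count == 0:
            break
        if count < 0:               # negative count: a block byte size follows
            read_long(inp)
            count = -count
        for _ in range(count):
            key = read_bytes(inp).decode("utf-8")
            meta[key] = read_bytes(inp)
    sync = inp.read(SYNC_SIZE)
    if len(sync) != SYNC_SIZE:
        raise EOFError("truncated sync marker")
    return meta, sync


# Hypothetical record schema, used only for this round trip.
schema = {"type": "record", "name": "User",
          "fields": [{"name": "name", "type": "string"}]}
sync = os.urandom(SYNC_SIZE)
buf = io.BytesIO()
write_header(buf, schema, sync)
buf.seek(0)
meta, parsed_sync = read_header(buf)
print(json.loads(meta["avro.schema"])["name"], parsed_sync == sync)
```

The sketch stops at the header. In a real file, data blocks follow it, and production code would normally rely on an Avro library rather than hand-rolled encoding.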
Besides JSON, Avro provides its own interface description language (IDL), called Avro IDL, for defining data types and protocols. Avro IDL eases adoption by users who are accustomed to traditional IDLs, whose syntax more closely resembles C/C++.
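To make the contrast concrete, the sketch below places a small protocol in IDL syntax next to a JSON declaration of the same protocol. Both were written by hand for illustration. The Greeter protocol, the Greeting record and the example namespace are hypothetical, and the Python code only serializes the JSON form; it does not parse or convert the IDL text.

```python
import json

# IDL form, kept as a plain string because nothing in this sketch parses it.
AVRO_IDL = """
@namespace("example")
protocol Greeter {
  record Greeting {
    string message;
  }
  Greeting hello(string name);
}
"""

# The same protocol declared in JSON, written here as a Python dict.
PROTOCOL = {
    "protocol": "Greeter",
    "namespace": "example",
    "types": [
        {
            "type": "record",
            "name": "Greeting",
            "fields": [{"name": "message", "type": "string"}],
        }
    ],
    "messages": {
        "hello": {
            "request": [{"name": "name", "type": "string"}],
            "response": "Greeting",
        }
    },
}

protocol_json = json.dumps(PROTOCOL, indent=2)
print(protocol_json)
```

The IDL version states the record and the message signature in a few lines that read like a C-style declaration, while the JSON version spells out the same structure as nested objects.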
Avro is a top-level project sponsored by the Apache Software Foundation (ASF).