Question - What is a SequenceFile in Hadoop?
Answer -
A SequenceFile is a flat file of binary key-value pairs that is used extensively as a MapReduce input and output format. Temporary map outputs are also stored internally as SequenceFiles. The SequenceFile class provides nested Reader, Writer, and Sorter classes for reading, writing, and sorting these files. There are three SequenceFile formats, depending on the compression type:
- Uncompressed key-value records
- Record-compressed key-value records: only the values are compressed; keys are stored uncompressed
- Block-compressed key-value records: keys and values are collected in separate blocks, and each block is compressed. The block size is configurable (the io.seqfile.compress.blocksize property)
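
The difference between the three formats can be sketched in plain Java. This is only an illustration under stated assumptions, not Hadoop's on-disk layout: it omits the file header, record lengths, and sync markers, and java.util.zip.Deflater stands in for a Hadoop compression codec. The class and method names (SequenceFileModesSketch, uncompressedSize, recordCompressedSize, blockCompressedSize, sampleRecords) are hypothetical and exist only for this example. In Hadoop itself, the mode is selected with SequenceFile.CompressionType (NONE, RECORD, or BLOCK).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical sketch: compares payload sizes for the three compression modes.
final class SequenceFileModesSketch {

    // DEFLATE-compress a byte array (assumption: stand-in for a Hadoop codec).
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Uncompressed: keys and values are stored as-is.
    static int uncompressedSize(List<String[]> records) {
        int total = 0;
        for (String[] kv : records) {
            total += utf8(kv[0]).length + utf8(kv[1]).length;
        }
        return total;
    }

    // Record compression: each value is compressed on its own; keys stay raw.
    static int recordCompressedSize(List<String[]> records) {
        int total = 0;
        for (String[] kv : records) {
            total += utf8(kv[0]).length + deflate(utf8(kv[1])).length;
        }
        return total;
    }

    // Block compression: keys and values accumulate in separate buffers, and
    // each buffer is compressed once the block reaches blockSizeBytes.
    static int blockCompressedSize(List<String[]> records, int blockSizeBytes) {
        ByteArrayOutputStream keys = new ByteArrayOutputStream();
        ByteArrayOutputStream values = new ByteArrayOutputStream();
        int total = 0;
        for (String[] kv : records) {
            byte[] k = utf8(kv[0]);
            byte[] v = utf8(kv[1]);
            keys.write(k, 0, k.length);
            values.write(v, 0, v.length);
            if (keys.size() + values.size() >= blockSizeBytes) {
                total += deflate(keys.toByteArray()).length
                       + deflate(values.toByteArray()).length;
                keys.reset();
                values.reset();
            }
        }
        if (keys.size() + values.size() > 0) { // flush the final partial block
            total += deflate(keys.toByteArray()).length
                   + deflate(values.toByteArray()).length;
        }
        return total;
    }

    // Made-up sample data: 10-byte keys and 200-byte repetitive values.
    static List<String[]> sampleRecords(int count) {
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            value.append("status=OK;");
        }
        List<String[]> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add(new String[] {String.format("key-%06d", i), value.toString()});
        }
        return records;
    }

    public static void main(String[] args) {
        List<String[]> records = sampleRecords(1000);
        System.out.println("uncompressed bytes:      " + uncompressedSize(records));
        System.out.println("record-compressed bytes: " + recordCompressedSize(records));
        System.out.println("block-compressed bytes:  " + blockCompressedSize(records, 64 * 1024));
    }
}
```

On repetitive data like this sample, block compression tends to give the smallest result because the compressor sees many keys and many values at once, whereas record compression pays a per-record overhead and leaves the keys untouched.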