golang: Parse a large JSON file in a streaming manner

Hi all,

I recently tried to parse a large JSON file in Go and used

'ioutil.ReadAll(file)' followed by 'json.Unmarshal(data, &msgs)'

This led to a huge spike in memory usage, as the entire file is read into memory in one go.

I wanted to parse the file in a streaming fashion, i.e. read the file bit by bit and decode the JSON data as it arrives, much like the Jackson streaming API.

I did some digging and found

func NewDecoder(r io.Reader) *Decoder {...} in the encoding/json package.

The decoder reads from the io.Reader through its own buffer, so it can decode one value at a time instead of holding the whole file in memory.

Here is some sample code that does this:

https://github.com/jaihind213/golang/blob/master/main.go

The difference shows up in the memory footprint:

We print memory usage as we parse a ~900MB file generated by generate_data.sh.
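One way to print such numbers is runtime.ReadMemStats. This is a sketch under the assumption that the heap-in-use figure corresponds to the HeapInuse field of runtime.MemStats. The heapInUseMB helper is a hypothetical name, and the linked code may measure things differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapInUseMB returns the bytes in in-use heap spans, converted to MiB.
// Assumption: this is the figure of interest; HeapAlloc and Sys are other
// fields of runtime.MemStats that could be reported instead.
func heapInUseMB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapInuse) / (1024 * 1024)
}

func main() {
	fmt.Printf("before: HeapInuse = %.2f MiB\n", heapInUseMB())

	// Allocate and touch ~64 MiB so the heap visibly grows.
	buf := make([]byte, 64<<20)
	for i := range buf {
		buf[i] = 1
	}
	fmt.Printf("after:  HeapInuse = %.2f MiB\n", heapInUseMB())

	// Keep buf reachable until after the second measurement.
	runtime.KeepAlive(buf)
}
```

Calling a helper like this every N records during parsing gives a rough picture of how the heap grows. ReadMemStats briefly stops the world, so it should not be called on every record.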

Streaming Fashion:

[Screenshots: memory usage during the streaming run, 2019-05-07 at 11.20.15 through 11.31.36]

Simple parsing by reading everything into memory:

[Screenshots: memory usage during the read-everything run, 2019-05-07 at 11.32.28 and 11.32.54]

Conclusion:

The streaming parser keeps Heap_inUse very low, whereas the readAllInMem approach uses a massive amount of heap for the same file.
