Commit dca80f9

Update 2022-04-30-generating-large-json-files.md
1 parent 83f0dbe commit dca80f9

1 file changed (+2 -2 lines)

_posts/2022-04-30-generating-large-json-files.md

Lines changed: 2 additions & 2 deletions
@@ -5,9 +5,9 @@ updated: 2022-04-30 20:01
comments: true

---
- I recently had a customer with a requirement to expose an API that was capable of handling millions of JSON objects at a time. When the system was designed, their inbound API only supported CSV. This design decision was made because of the large amounts of data involved. With CSV files you include a header row, which means the column names are not duplicated for each row, unlike JSON, where every object repeats each property name alongside its value. They gave me a small CSV file which contained a header row and an additional 145 objects. The file weighed in at just over 17KB. Converting that to JSON gave me 6527 lines of JSON (a 4400 percent increase in lines), weighing in at just over 205KB (a 1200 percent increase in file size). Even on a small scale, the difference is noticeable. This data needed to be read and put into a database, which is the subject of my next blog post, so stay tuned.
+ I recently had a customer with a requirement to expose an API that was capable of handling millions of JSON objects at a time. When the system was designed, their inbound API only supported CSV. This design decision was made because of the large amounts of data involved. With CSV files you include a header row, which means the column names are not duplicated for each row, unlike JSON, where every object repeats each property name alongside its value. They gave me a small CSV file which contained a header row and an additional 145 objects. The file weighed in at just over `17KB`. Converting that to JSON gave me 6527 lines of JSON (a `4400 percent increase` in lines), weighing in at just over `205KB` (a `1200 percent increase` in file size). Even on a small scale, the difference is noticeable. This data needed to be read and put into a database, which is the subject of my next blog post, so stay tuned.

- While I was testing a few solutions it became clear that I needed a way to test with large JSON datasets, but the biggest ones I could find online ranged from 25MB to 100MB and I wanted at least a few gigabytes of data. With large data you quickly run into problems: in C#, for example, the maximum size of a CLR object is 2GB, even on 64-bit systems, and even then fragmentation of the large object heap can cause allocations of less than 2GB to throw an OutOfMemoryException. In short, this means that you can't just make a list, add objects to it and then serialize it to disk. Instead, you need to stream the data one object at a time. The object I envisioned was the following:
+ While I was testing a few solutions it became clear that I needed a way to test with large JSON datasets, but the biggest ones I could find online ranged from `25MB` to `100MB` and I wanted at least a few gigabytes of data. With large data you quickly run into problems: in C#, for example, the maximum size of a CLR object is 2GB, even on 64-bit systems, and even then fragmentation of the large object heap can cause allocations of less than 2GB to throw an OutOfMemoryException. In short, this means that you can't just make a list, add objects to it and then serialize it to disk. Instead, you need to stream the data one object at a time. The object I envisioned was the following:

```json
{
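
As an aside on the streaming point in the second changed paragraph: the commit does not include the implementation, but a minimal sketch of the idea, writing a large JSON array to disk one object at a time with `System.Text.Json`'s `Utf8JsonWriter`, could look like the following. The output path, record shape, and object count here are made-up illustrations, not the post's actual code.

```csharp
using System.IO;
using System.Text.Json;

// Write the array straight to a FileStream so no single CLR object
// ever has to hold the whole dataset in memory.
using var stream = File.Create("large.json");   // hypothetical output path
using var writer = new Utf8JsonWriter(stream);

writer.WriteStartArray();

for (var i = 0; i < 10_000_000; i++)            // arbitrary object count
{
    // Placeholder record shape; the post's real object is not shown in this diff.
    writer.WriteStartObject();
    writer.WriteNumber("id", i);
    writer.WriteString("name", $"Customer {i}");
    writer.WriteEndObject();

    // Flush periodically so the writer's internal buffer stays small.
    if (i % 100_000 == 0)
    {
        writer.Flush();
    }
}

writer.WriteEndArray();
writer.Flush();
```

Newtonsoft.Json's `JsonTextWriter` supports the same incremental style if that library is preferred; the key point is that only one object is materialized at a time.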
