Open
118 commits
fb9ac3d
Update _config.yml
93tilinfinity Jan 23, 2021
63557b1
Update _config.yml
93tilinfinity Jan 23, 2021
8a64740
1st commit from local
93tilinfinity Jan 23, 2021
9609b70
commit post 1
93tilinfinity Jan 23, 2021
e96dbbe
changing homepage layout
93tilinfinity Jan 23, 2021
074fbe4
date post layout
93tilinfinity Jan 23, 2021
d8a50ad
ANOTHER COMMIT
93tilinfinity Jan 23, 2021
b75bd23
post name change
93tilinfinity Jan 23, 2021
ce00562
update formats
93tilinfinity Jan 24, 2021
e49f990
index.html format
93tilinfinity Jan 24, 2021
a04077f
homepage change
93tilinfinity Jan 24, 2021
a962731
Delete 2014-3-3-Hello-World.md
93tilinfinity Jan 24, 2021
3018189
Update workspace.xml
93tilinfinity Jan 24, 2021
5f00877
second post
93tilinfinity Jan 24, 2021
2b40204
blog 2 update
93tilinfinity Jan 24, 2021
90779dd
blog 2 update 2
93tilinfinity Jan 24, 2021
63258a1
update bio
93tilinfinity Jan 25, 2021
ecacf1b
change home links
93tilinfinity Jan 25, 2021
6a33b67
update links
93tilinfinity Jan 25, 2021
93a5f5f
update links 2
93tilinfinity Jan 25, 2021
5428808
images upload
93tilinfinity Jan 25, 2021
73f2490
update image
93tilinfinity Jan 25, 2021
0c0bab7
add images
93tilinfinity Jan 25, 2021
3101ba8
update older blogs
93tilinfinity Jan 25, 2021
7b816f8
logo changes
93tilinfinity Jan 25, 2021
4f10773
logo change
93tilinfinity Jan 25, 2021
5ed489b
update logo again
93tilinfinity Jan 25, 2021
f1df4bb
fixing logo
93tilinfinity Jan 25, 2021
ca69b14
changing fonts
93tilinfinity Jan 25, 2021
b05279d
css changes
93tilinfinity Jan 25, 2021
5f90d1f
add youtube svg
93tilinfinity Jan 25, 2021
507d4de
homepage change
93tilinfinity Jan 25, 2021
465c348
design changes
93tilinfinity Jan 25, 2021
5716f6b
design changes
93tilinfinity Jan 25, 2021
df066d2
mobile design update
93tilinfinity Jan 25, 2021
93e31cb
bio update
93tilinfinity Jan 26, 2021
cfe74ad
bio update
93tilinfinity Jan 26, 2021
32d73ee
bio changes again
93tilinfinity Jan 26, 2021
9ae134e
bio update
93tilinfinity Jan 26, 2021
f52da02
trying to get emojis to work
93tilinfinity Jan 26, 2021
a9f2332
commit jemoji
93tilinfinity Jan 26, 2021
35e8e9b
delete gemfile locally
93tilinfinity Jan 26, 2021
efde94a
bio update
93tilinfinity Jan 26, 2021
d273aeb
bio update
93tilinfinity Jan 26, 2021
a269c0b
another bio update
93tilinfinity Jan 26, 2021
3015e73
bio update again
93tilinfinity Jan 26, 2021
d22bf29
bio update
93tilinfinity Jan 26, 2021
b551fbb
remove links
93tilinfinity Jan 26, 2021
6c757f6
Update CNAME
93tilinfinity Jan 26, 2021
ac8f97b
Update CNAME
93tilinfinity Jan 26, 2021
b24e7e2
Delete CNAME
93tilinfinity Jan 26, 2021
783bdbb
Create CNAME
93tilinfinity Jan 26, 2021
a95d148
update workspace
93tilinfinity Jan 26, 2021
668b84a
bio update
93tilinfinity Jan 26, 2021
8f857da
b
93tilinfinity Jan 26, 2021
a6726cc
upd
93tilinfinity Jan 26, 2021
4347f0f
bio update
93tilinfinity Jan 26, 2021
d33ead6
bio
93tilinfinity Jan 26, 2021
fd62135
new post
93tilinfinity Jan 27, 2021
48319c7
post title updates
93tilinfinity Jan 27, 2021
e91f89e
new blog, who dis
93tilinfinity Jan 28, 2021
d4ef030
add tag cloud
93tilinfinity Jan 28, 2021
1ed2826
change css id to class
93tilinfinity Jan 28, 2021
592625f
last settings
93tilinfinity Jan 28, 2021
65fb799
new post
93tilinfinity Feb 3, 2021
78a55cb
change tags
93tilinfinity Feb 5, 2021
a17de53
tag and bio update
93tilinfinity Feb 5, 2021
ce35d59
blog updates
93tilinfinity Feb 5, 2021
40da852
tb update
93tilinfinity Feb 5, 2021
915b3f7
width edit
93tilinfinity Feb 5, 2021
45f2193
tb update
93tilinfinity Feb 5, 2021
bca73da
update tb
93tilinfinity Feb 5, 2021
2d12e5c
tb update
93tilinfinity Feb 5, 2021
27adb7e
new blog
93tilinfinity Feb 27, 2021
0915aba
name update
93tilinfinity Feb 27, 2021
1c98f96
name update
93tilinfinity Feb 27, 2021
6186f3c
name update
93tilinfinity Feb 27, 2021
71541cc
update name
93tilinfinity Feb 27, 2021
9c5115c
edit article
93tilinfinity Feb 27, 2021
d16da1b
name change
93tilinfinity Feb 27, 2021
7cd5cbc
change md ftype
93tilinfinity Feb 27, 2021
952c519
add wp link
93tilinfinity Feb 27, 2021
587f0ab
article update
93tilinfinity Mar 1, 2021
ac05074
new article
93tilinfinity Apr 30, 2021
8db49e0
title change
93tilinfinity Apr 30, 2021
6aab8f3
update
93tilinfinity Apr 30, 2021
b4ef496
Update 2021-4-30-info-data-queries-views.md
93tilinfinity Apr 30, 2021
8e96ae7
update
93tilinfinity Apr 30, 2021
88f943a
yes
93tilinfinity Apr 30, 2021
dc52740
new blog
93tilinfinity May 3, 2021
962376a
bold
93tilinfinity May 3, 2021
42996db
udpate
93tilinfinity May 3, 2021
87685fb
law of demeter article
93tilinfinity Aug 16, 2021
de842c4
bio update
93tilinfinity Aug 16, 2021
5d4ae45
bio update
93tilinfinity Aug 16, 2021
652fec9
blog update
93tilinfinity Aug 17, 2021
2932779
blog update
93tilinfinity Aug 17, 2021
6b8ac97
bio update
93tilinfinity Jan 12, 2022
93f4e3d
format bio
93tilinfinity Jan 12, 2022
5c1c411
bio update
93tilinfinity Jan 12, 2022
cc1c094
bio update
93tilinfinity Jan 13, 2022
cf38a5e
bio update
93tilinfinity Jan 13, 2022
04a4e6b
bio update
93tilinfinity Jan 13, 2022
e610e28
bio update
93tilinfinity Jan 13, 2022
eb4f892
link fix
93tilinfinity Jan 13, 2022
40c0e28
stop building shitty systems
93tilinfinity Mar 24, 2022
4c9ba34
blog update
93tilinfinity Mar 24, 2022
b558429
blog update
93tilinfinity Mar 24, 2022
553dbe8
blog update
93tilinfinity Mar 24, 2022
e038fb1
blog update
93tilinfinity Mar 24, 2022
755b37d
mercenary change
93tilinfinity Mar 25, 2022
0b85b05
edit
93tilinfinity Mar 25, 2022
e8a005e
Update CNAME
93tilinfinity Feb 24, 2025
9aa2276
Update CNAME
93tilinfinity Feb 24, 2025
a888484
Delete CNAME
93tilinfinity Feb 24, 2025
af35849
Create CNAME
93tilinfinity Feb 24, 2025
4ac8fae
update bio 2025
93tilinfinity Feb 24, 2025
18f7fd9
remove hidden folders
93tilinfinity Feb 24, 2025
2 changes: 1 addition & 1 deletion CNAME
Original file line number Diff line number Diff line change
@@ -1 +1 @@

www.neilchandarana.com
22 changes: 13 additions & 9 deletions _config.yml
@@ -3,13 +3,13 @@
#

# Name of your site (displayed in the header)
name: Your Name
name: Neil Chandarana

# Short bio or description (displayed in the header)
description: Web Developer from Somewhere
description: A home for poorly researched ideas on data systems

# URL of your avatar or profile pic (you could use your GitHub profile pic)
avatar: https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/jekyll-logo.png
avatar: #https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/jekyll-logo.png

#
# Flags below are optional
@@ -18,17 +18,17 @@ avatar: https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/je
# Includes an icon in the footer for each username you enter
footer-links:
dribbble:
email:
email: [email protected]
facebook:
flickr:
github: barryclark/jekyll-now
instagram:
github: 93tilinfinity
instagram: neilchanda
linkedin:
pinterest:
rss: # just type anything here for a working RSS icon
twitter: jekyllrb
twitter:
stackoverflow: # your stackoverflow profile, e.g. "users/50476/bart-kiers"
youtube: # channel/<your_long_string> or user/<user-name>
youtube: channel/UCf05rA2fSzzL1H_RrZ7LN7w # channel/<your_long_string> or user/<user-name>
googleplus: # anything in your profile username that comes after plus.google.com/


@@ -41,7 +41,7 @@ google_analytics:

# Your website URL (e.g. http://barryclark.github.io or http://www.barryclark.co)
# Used for Sitemap.xml and your RSS feed
url:
url: # http://neilchandarana.com

# If you're hosting your site at a Project repository on GitHub pages
# (http://yourusername.github.io/repository-name)
@@ -76,6 +76,10 @@ sass:
gems:
- jekyll-sitemap # Create a sitemap using the official Jekyll sitemap gem
- jekyll-feed # Create an Atom feed using the official Jekyll feed gem
- jemoji

plugins:
- jemoji

# Exclude these files from your production _site
exclude:
23 changes: 23 additions & 0 deletions _includes/collecttags.html
@@ -0,0 +1,23 @@
{% assign rawtags = "" %}
{% for post in site.posts %}
{% assign ttags = post.tags | join:'|' | append:'|' %}
{% assign rawtags = rawtags | append:ttags %}
{% endfor %}
{% assign rawtags = rawtags | split:'|' | sort %}

{% assign tags = "" %}
{% for tag in rawtags %}
{% if tag != "" %}
{% if tags == "" %}
{% assign tags = tag | split:'|' | sort %}
{% endif %}
{% unless tags contains tag %}
{% assign tags = tags | join:'|' | append:'|' | append:tag | split:'|' %}
{% endunless %}
{% endif %}
{% endfor %}





12 changes: 12 additions & 0 deletions _includes/tagcloud.html
@@ -0,0 +1,12 @@
{% capture temptags %}
{% for tag in site.tags %}
{{ tag[1].size | plus: 1000 }}#{{ tag[0] }}#{{ tag[1].size }}
{% endfor %}
{% endcapture %}
{% assign sortedtemptags = temptags | split:' ' | sort | reverse %}
Tags:
{% for temptag in sortedtemptags %}
{% assign tagitems = temptag | split: '#' %}
{% capture tagname %}{{ tagitems[1] }}{% endcapture %}
[<a href="/tag/{{ tagname }}"><code style="color:#a94064"><nobr>#{{ tagname }}</nobr></code></a>]
{% endfor %}
22 changes: 14 additions & 8 deletions _layouts/default.html
@@ -2,39 +2,45 @@
<html>
<head>
<title>{% if page.title %}{{ page.title }} – {% endif %}{{ site.name }} – {{ site.description }}</title>

{% include meta.html %}

<!--[if lt IE 9]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

<link rel="stylesheet" type="text/css" href="{{ site.baseurl }}/style.css" />
<link rel="alternate" type="application/rss+xml" title="{{ site.name }} - {{ site.description }}" href="{{ site.baseurl }}/feed.xml" />

<!-- Created with Jekyll Now - http://github.com/barryclark/jekyll-now -->
{% include collecttags.html %}
</head>

<body>
<div class="wrapper-masthead">
<div class="container">
<header class="masthead clearfix">
<a href="{{ site.baseurl }}/" class="site-avatar"><img src="{{ site.avatar }}" /></a>

<!-- <a href="{{ site.baseurl }}/" class="site-avatar"><img src="/images/logo-removebg-preview.png" /></a>-->
<div class="site-info">
<!-- <h1>-->
<!-- <a href="{{ site.baseurl }}/">-->
<!-- <img src="/images/logo-removebg-preview.png" alt="Neil Chandarana logo" />-->
<!-- </a>-->
<!-- </h1>-->
<h1 class="site-name"><a href="{{ site.baseurl }}/">{{ site.name }}</a></h1>
<p class="site-description">{{ site.description }}</p>
</div>

<nav>
<a href="{{ site.baseurl }}/">Blog</a>
<a href="{{ site.baseurl }}/about">About</a>
<!-- <a href="{{ site.baseurl }}/links/">Links</a>-->
<a href="{{ site.baseurl }}/bio/">Bio</a>
</nav>
</header>
</div>
</div>

<div id="main" role="main" class="container">
<!-- {% if page.url == "/" %}-->
<!-- <div class="tagcloud">-->
<!-- {% include tagcloud.html %}-->
<!-- </div>-->
<!-- {% endif %}-->
{{ content }}
</div>

19 changes: 15 additions & 4 deletions _layouts/post.html
@@ -5,12 +5,23 @@
<article class="post">
<h1>{{ page.title }}</h1>

<div class="entry">
{{ content }}
<div class="post_container">

<div class="meta1">
{{ page.date | date: "%B %e, %Y" }}
</div>
<div class="meta2">
{% for tag in page.tags %}
{% capture tag_name %}{{ tag }}{% endcapture %}
<!-- <a href="/tag/{{ tag_name }}"><code style="color:#a94064"><nobr>#{{ tag_name }}</nobr></code>&nbsp;</a>-->
<code style="color:#a94064"><nobr>#{{ tag_name }}</nobr></code>&nbsp;
{% endfor %}
</div>

</div>

<div class="date">
Written on {{ page.date | date: "%B %e, %Y" }}
<div class="entry">
{{ content }}
</div>

{% include disqus.html %}
10 changes: 0 additions & 10 deletions _posts/2014-3-3-Hello-World.md

This file was deleted.

49 changes: 49 additions & 0 deletions _posts/2021-1-23-a-paradigm-for-life.md
@@ -0,0 +1,49 @@
---
layout: post
title: 'Input-> System-> Output: A paradigm for life'
tags: thinking
---
I came across an article that claimed a piece of analysis failed because of ‘bad’ data.

It reminded me of a phrase I used to hear on the trading floor, “US non-farm payrolls beat estimates”.

Is it possible for data to be inherently good or bad? I am not convinced. The state of data should only be that they
exist, and are held as the truth because they exist. In other words, US non-farm payrolls didn't beat estimates, your
estimate was too low.

There must be another reason to explain the system’s unexpected performance. Consider the following paradigm.

![input,system,output](/images/blog_01_2021/PXL_20210122_185257907~2.jpg)

When applied to a machine learning or data science project, it might give something like the following 4 steps.

![ML system flow](/images/blog_01_2021/PXL_20210125_144448228~3.jpg)

Putting my paranoid/Trader cap on, 4 steps mean 4 modes of failure.

The rest of this blog shall pay close attention to mode 2, otherwise labelled as DevOps, data architecture or data engineering.
I never know which one to use so I’ll stick with ‘move data from there to here’.

The aim of this blog is to challenge myself and hopefully the reader to think about big data systems at a layer of
abstraction that remains invariant in time.

Learn to fish and you’ll never be hungry, or something like that.

Stick around and you can expect to explore:
* Database principles
* Types of failure
* Lambda architecture
* Where today’s tools fit in
* Lots of hand-drawn diagrams...no one does them anymore

<!---

layout: post
title: You're up and running!

Next you can update your site name, avatar and other options using the _config.yml file in the root of your repository (shown below).

![_config.yml]({{ site.baseurl }}/images/config.png)

The easiest way to make your first post is to edit this one. Go into /_posts/ and update the Hello World markdown file. For more instructions head over to the [Jekyll Now repository](https://github.com/barryclark/jekyll-now) on GitHub.
--->
38 changes: 38 additions & 0 deletions _posts/2021-1-24-scaling-relational-databases-queues.md
@@ -0,0 +1,38 @@
---
layout: post
title: 'Queueing a relational database'
tags: relationaldatabase
---

Neil’s notes:
* Hitting the database directly is bad
* Adding a queue helps
* Batching updates helps

Suppose you decide to build a simple web analytics application.

The application you have in mind tracks the number of pageviews for any URL a customer wishes to track. Additionally,
the application should be able to report the top 100 URLs by number of pageviews.

You start with the following data schema:

![Data schema](/images/blog_01_2021/PXL_20210125_145337225~2.jpg)

You build the back end to consist of a Relational Database Management System (say MySQL), a table with the above schema
and a web server. Each time a web page tracked by your application is loaded, the web page pings your web server with
the pageview, and your web server increments the corresponding row in the database.

![A web application](/images/blog_01_2021/PXL_20210125_145356099~2.jpg)

The web application is a success, and traffic is growing like wildfire. You start getting lots of emails from your monitoring systems all with the same error:

“Timeout error on inserting to the database”

Your logs tell you that write requests are timing out because the database cannot keep up with the load. Your system is losing valuable information and you need to act quickly.

You realise that it is inefficient to perform a single increment per request. You draw up and implement a queuing system between the web server and the database so whenever a pageview is received, that event is added to the queue. You also create a worker that reads 100 events at a time off the queue and batches them into a single update request to your database.

![A web application with queue](/images/blog_01_2021/PXL_20210125_145409766~2.jpg)

It works well: the web server no longer pings the database directly and you no longer receive timeout notifications!
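
The queue-plus-worker idea can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the real system: `drain_batch`, `apply_batch` and the `pageviews` dict are hypothetical names, and the dict stands in for the MySQL table. The point is that the worker collapses up to 100 events into one write per URL.

```python
import queue
from collections import Counter

BATCH_SIZE = 100  # the worker in the post reads 100 events at a time


def drain_batch(events: queue.Queue, batch_size: int = BATCH_SIZE) -> Counter:
    """Pull up to batch_size URLs off the queue and count them per URL."""
    counts: Counter = Counter()
    for _ in range(batch_size):
        try:
            counts[events.get_nowait()] += 1
        except queue.Empty:
            break  # queue drained before the batch filled up
    return counts


def apply_batch(pageviews: dict, counts: Counter) -> None:
    """Stand-in for one batched UPDATE: one write per URL, not per event."""
    for url, n in counts.items():
        pageviews[url] = pageviews.get(url, 0) + n


# The web server only enqueues; the worker is the only writer.
events = queue.Queue()
for url in ["/home", "/about", "/home"]:
    events.put(url)

pageviews = {}
apply_batch(pageviews, drain_batch(events))
```

Three pageview events become two writes here; under real traffic the saving grows with how often the same URL repeats inside a batch.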

33 changes: 33 additions & 0 deletions _posts/2021-1-25-scaling-relational-databases-sharding.md
@@ -0,0 +1,33 @@
---
layout: post
title: 'Sharding a relational database'
tags: relationaldatabase
---

Neil’s notes:
* Queues can’t escape overload, the database is the bottleneck
* Spread the write load horizontally across multiple machines (sharding)
* Sharding requires significant day 1 work
* Re-sharding cost is very real

![Sharded Application](/images/blog_01_2021/PXL_20210125_150212123~2.jpg)

Your application goes viral. Your worker can’t keep up with the writes so you add more workers and parallelise
the updates. This works for a short time but you realise the bottleneck is in the database.

A Google search on scaling write-heavy databases brings up sharding, or horizontal partitioning: spreading the write load
by partitioning the table across multiple servers using a shard key.

How to shard:
* Choose a hash function
* Shard_id = hash(key) mod n_shards
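
A rough Python sketch of that mapping follows. The post does not name a hash function, so MD5 here is my assumption, chosen only because it is stable across runs (Python's built-in `hash()` is salted per process for strings by default). `shard_for` and `moved_fraction` are hypothetical helper names.

```python
import hashlib


def shard_for(key: str, n_shards: int) -> int:
    """Map a key to a shard id via shard_id = hash(key) mod n_shards."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)


urls = [f"https://example.com/page/{i}" for i in range(1000)]
print(shard_for(urls[0], 4))        # shard id in the range 0..3
print(moved_fraction(urls, 4, 5))   # fraction of rows a 4 -> 5 re-shard relocates
```

With plain modulo hashing, changing `n_shards` relocates most keys, which is one way to see why the re-sharding cost in the notes above is very real.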

You write a script to map all the rows of your table through your hash function and split the table into 4 shards.
It takes a while to run and you don’t want new pageview increments to unbalance your uniform partition.
You decide to turn off the worker until the partition completes, caching the interim pageview increments.

Your application now needs to know how to find the shard for each key so you wrap a library around your
database handling code that reads the number of shards from a new configuration file you created.

You also write additional logic to query the top 100 URLs from each shard and merge them to return a global
top 100 URLs.
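
The merge step might look something like the sketch below, assuming each shard can be read as a mapping of URL to pageview count (`top_n_global` is a hypothetical name, and plain dicts stand in for the per-shard queries). Because each URL lives on exactly one shard, the global top N must be contained in the union of the per-shard top N lists.

```python
import heapq
from itertools import chain


def top_n_global(shards, n=100):
    """Take the top n from every shard, then the top n of the combined candidates."""
    per_shard = (
        heapq.nlargest(n, shard.items(), key=lambda kv: kv[1]) for shard in shards
    )
    return heapq.nlargest(n, chain.from_iterable(per_shard), key=lambda kv: kv[1])


shards = [{"/a": 5, "/b": 1}, {"/c": 3}, {"/d": 9}]
print(top_n_global(shards, n=2))  # [('/d', 9), ('/a', 5)]
```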
29 changes: 29 additions & 0 deletions _posts/2021-1-27-scaling-relational-databases-issues.md
@@ -0,0 +1,29 @@
---
layout: post
title: "Mo' data, mo' problems"
tags: relationaldatabase
---

Neil’s notes:
* Bugs inevitably make it to production code
* Fault-tolerance might be more important than I initially thought
* Complexity comes in more than one form; setting up a system + operating it + ?

_“I used to build cool features for customers. Now I’m spending all my time dealing with problems reading and writing data”._

As the application becomes more and more popular, you keep having to re-shard the database into more and more shards to keep up with the write load. Each re-shard gets more and more painful because there are so many more shards, active workers and config changes to co-ordinate.

You forget to update the config file and that causes pageview increments to be written to the wrong shards. You have to write a one-off script to manually comb through the data and fix the incorrect increments.

Eventually, there are so many shards across so many servers that occasionally the disk on one of your servers goes bad. While that machine is down, the data on its shard are unavailable to your application. You do a couple of things to address this:

* You set up a pending queue to divert increments for an unavailable shard. That pending queue flushes every 5 minutes.
* You back up each shard to a replica without write ability. Users can still use the application when the master shard goes down.

Job done? Not quite.

While working on your queue/worker code, you notice you accidentally deployed a bug that increments the pageview count by 3 instead of by 1. There is no way of knowing which data got corrupted so your backups don’t help.

After all your efforts to make the system tolerant to machine faults and scalable, there is still no resilience to human faults.

Unfortunately, as the system complexity increases, the fault probability increases. On top of that, additional work is pushed to you both in operating the database and developing the application code.
39 changes: 39 additions & 0 deletions _posts/2021-1-28-data-vs-information.md
@@ -0,0 +1,39 @@
---
layout: post
title: 'Is there a difference between Data and Information?'
tags: thinking
---

**TL;DR:** Yes.

In the last few posts, I jumped straight into the pitfalls of old school databases. It’s not profound or original material and more practically serves as a history lesson in data systems. I never enjoyed how history was taught at school; looking back, I believe it was because I never had any concept of a particular event’s interconnectedness with other events around it in both time and geography.

Sharing this blog with a few friends, the feedback was more or less _“blog goes over my head but I like the bio”_. That’s no fun.

This blog is about data systems. To give a flavour of interconnectedness, data systems are the backbone of almost every application used today. They are like the sewage system: undoubtedly important but, for most people, rarely given a thought.

**What is a data system?**

A data system answers questions based on information acquired in the past up to the present.

A bank account answers: What is my current balance? What transactions did I make in January?

A social media network answers: How many followers do I have? What is this person’s name?

Data systems don’t just memorise + regurgitate, they combine different bits and pieces of information together to produce answers. For example, your current balance combines information on all transactions on your account.

**What bits and pieces do data systems store?**

A crucial observation is that some information can be derived from other pieces of information.

Your follower list is derived from the sequence of every follow and unfollow action each and every user has ever made on your profile.
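
As a small illustration, here is one way that derivation could look in Python. The event shape, a `(user, action)` pair in chronological order, and the name `followers` are my assumptions for the sketch.

```python
def followers(events):
    """Replay follow/unfollow events in order to derive the current follower set."""
    current = set()
    for user, action in events:
        if action == "follow":
            current.add(user)
        elif action == "unfollow":
            current.discard(user)  # no error if the user was not following
    return current


history = [("ann", "follow"), ("bob", "follow"), ("ann", "unfollow")]
print(followers(history))  # {'bob'}
```

The follower list is never stored in this sketch; it is recomputed from the actions, which are the underived pieces.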

You can keep tracing information back till eventually you get information that is not derived from anything. This is what I’ll call data.

**Data vs information: Is there a difference?**

Unfortunately, the words data and information are often used interchangeably in media or within industry. Don’t accept poor word choice. Data are small blocks of truth that alone may have no meaning but, within the right context and combined in the right way, become information that does have meaning. Viewing information as a big object made up of irreducible blocks of data gives clarity on what data are and are not.

Wrapping up, data and information are not the same. Anything you could ever imagine doing with data can be expressed as a function that takes in data and returns answers. So, the crux of everything you can learn about data systems is the following general-purpose definition:

Query = function(all data)
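
To make that definition concrete, here is a hedged Python sketch using the bank account questions from earlier. The transaction shape, an ISO date string paired with an amount in pence, and both function names are assumptions for illustration only. Each query is just a function that takes all the data and returns an answer.

```python
# All data: every transaction ever made, as (ISO date, amount in pence).
transactions = [
    ("2021-01-05", 120_000),   # salary paid in
    ("2021-01-09", -4_550),    # groceries
    ("2021-02-01", -80_000),   # rent
]


def current_balance(all_data):
    """Query: what is my current balance?"""
    return sum(amount for _, amount in all_data)


def transactions_in_month(all_data, month):
    """Query: what transactions did I make in a given month (e.g. '2021-01')?"""
    return [t for t in all_data if t[0].startswith(month)]


print(current_balance(transactions))                   # 35450
print(transactions_in_month(transactions, "2021-01"))  # the two January rows
```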