KhaleelKhan.github.io/index.xml at master · KhaleelKhan/KhaleelKhan.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Khaleel Khan&#39;s CV</title>
    <link>https://khaleelkhan.com/</link>
    <description>Recent content on Khaleel Khan&#39;s CV</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator>
    <language>en-us</language>
    <copyright>Khaleel Khan &amp;copy; {year}; </copyright>
    <lastBuildDate>Mon, 10 Jun 2019 00:00:00 +0000</lastBuildDate>

	    <atom:link href="https://khaleelkhan.com/index.xml" rel="self" type="application/rss+xml" />


    <item>
      <title>Coverletter Creator</title>
      <link>https://khaleelkhan.com/project/coverletter-creator/</link>
      <pubDate>Mon, 10 Jun 2019 00:00:00 +0000</pubDate>

      <guid>https://khaleelkhan.com/project/coverletter-creator/</guid>
      <description></description>
    </item>

    <item>
      <title>Game of Cows and Bulls</title>
      <link>https://khaleelkhan.com/project/cows-and-bulls/</link>
      <pubDate>Fri, 26 Oct 2018 00:00:00 +0000</pubDate>

      <guid>https://khaleelkhan.com/project/cows-and-bulls/</guid>
      <description>

&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Bulls_and_Cows&#34; target=&#34;_blank&#34;&gt;Cows and Bulls&lt;/a&gt; is a code breaker game where the player has to guess a 4 digit code. In this case I have 4 letter words, this makes playing for humans much more than random guessing and some strategy is required. I have switched cows and bulls compared to wiki definition, but concept remains same.&lt;/p&gt;

&lt;h1 id=&#34;gui&#34;&gt;GUI&lt;/h1&gt;

&lt;h2 id=&#34;solver&#34;&gt;Solver&lt;/h2&gt;

&lt;p&gt;In this tab the human is the game host, he holds the secret 4 letter word while the computer guesses. User has to provide how many cows and bulls the guess has.&lt;/p&gt;


&lt;figure&gt;

&lt;img src=&#34;images/cnb_solver.png&#34; /&gt;


&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  &lt;h4&gt;Cows and Bulls Solver&lt;/h4&gt;

&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;Ofcourse, a computer crunching numbers is rather uninteresting, so I have a Data page to track how the program is pruning number of availabe guesses. The guesser had two algorithms, Simple and Advanced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Simple - The next guess is naively selected without any preprocessing. This is fast in processing each step but might take longer.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Advanced - The next guess here is selected such that it reduces the remaining choices. Each current valid guess is evaluated and scored according to the number of choices it can eliminate. Best scoring guess if put forward to the user.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;figure&gt;

&lt;img src=&#34;images/cnb_graph.png&#34; /&gt;


&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  &lt;h4&gt;Cows and Bulls Graph&lt;/h4&gt;

&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;English language has around 2000 valid 4 letter words, each guess more than halves the avalanle choices. I notices the program take at max 7 tries regardless of the complexity.&lt;/p&gt;

&lt;h2 id=&#34;game&#34;&gt;Game&lt;/h2&gt;

&lt;p&gt;In this mode the system is the host and user has to guess the code. The programs provides number of cows and bulls for each guess. All previous guess and corresponding scores are printed for reference.&lt;/p&gt;


&lt;figure&gt;

&lt;img src=&#34;images/cnb_game.png&#34; /&gt;


&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  &lt;h4&gt;Cows and Bulls Game&lt;/h4&gt;

&lt;/figcaption&gt;

&lt;/figure&gt;


&lt;figure&gt;

&lt;img src=&#34;images/cnb_succes.png&#34; /&gt;


&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  &lt;h4&gt;Game Solved!&lt;/h4&gt;

&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;h1 id=&#34;code&#34;&gt;Code&lt;/h1&gt;

&lt;p&gt;Core of the processor is this calcuator. This takes a guess and a chosen code and generates number of cows and bulls.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;def score_calc(self, guess, chosen):
	bulls = cows = 0
	for g, c in zip(guess, chosen):
		if g == c:
			cows += 1
		elif g in chosen:
			bulls += 1
	return cows, bulls
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Thus any code that does not match either cows or bulls is eliminated from the dictionary of choices. Now the question is selecting the chosen code, this depends on the strategy we want to employ.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;self.choices = [c for c in self.choices if self.score_calc(c, self.ans) == score]
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;simple-strategy&#34;&gt;Simple Strategy&lt;/h2&gt;

&lt;p&gt;In the simple strategy the chosen code is simply the top most item after elimination. Then repeat till a score of 4 cows is generated and voila! that is the correct guess.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;if self.strategyComboBox.currentIndex() == 0 :
	return self.choices[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;advanced-strategy&#34;&gt;Advanced Strategy&lt;/h2&gt;

&lt;p&gt;In advanced strategy instead of selecting the top item, each item is assumed to be the chosen code and evaluated by eliminating the non matchnig codes. This is similar to simple strategy, however this is repeated for all item and the item which produces the smallest eliminated list is chosen for next guess.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;if self.strategyComboBox.currentIndex() == 1 :
	max_unique = 0
	max_g = &amp;quot;&amp;quot;
	for g in self.choices:
		results_list = []
		for c in self.choices:
			(cows,bulls) = self.score_calc(g,c)
			results_list.append(str(cows) + str(bulls))
			if len(set(results_list)) &amp;lt; len(results_list):
				break
		if len(set(results_list)) == len(results_list):
			return  g
		elif len(set(results_list)) &amp;gt; max_unique:
			max_unique  = len(set(results_list))
			max_g = g
	return max_g
&lt;/code&gt;&lt;/pre&gt;
</description>
    </item>

    <item>
      <title>Khaleel&#39;s Master Thesis</title>
      <link>https://khaleelkhan.com/project/khaleels-master-thesis/</link>
      <pubDate>Wed, 26 Sep 2018 00:00:00 +0000</pubDate>

      <guid>https://khaleelkhan.com/project/khaleels-master-thesis/</guid>
      <description>&lt;p&gt;This thesis details the design and implementation of a digitally controllable sine wave oscillator which
can generate frequencies in the range of 0.7 to 2.7 GHz with an output power of -20 dBm across 50
Ω load. The design and layout was done in the Cadence Virtuoso design environment using the 0.13
μ m BiCMOS SG13GS technology from IHP.&lt;/p&gt;

&lt;p&gt;The Oscillator consists of 2 stages: Digital ring oscillator and an Output buffer. The design is based on harmonic boost technique which uses lower frequency
digitally controllable ring oscillator to generate square waves and combine in the output buffer. This
technique avoids the use of large capacitors or inductors, thus reducing the required chip area, which is
important as this oscillator is intended to be used as part of analog built in self test.&lt;/p&gt;

&lt;p&gt;The final oscillator is designed to have high spectral purity that is, the second and third harmonics are
less than -30 dBc even in the worst case. Phase noise of this oscillator is -99.07 dBc/Hz measured at 1
MHz offset from the highest frequency. The range of frequencies are 631.8 MHz to 2.75 GHz with an
output power of -20.06 to 21.07 dBm. The system consumes a maximum of 40.71 mW and can be
turned off when not in use, it consumes maximum 2 nW of power in this state. Layout is created for
the final design and is found to occupy a total area of 119 x 136 μm 2 . The system is simulated at all
digital control settings and across PVT corners to verify its robustness. All simulations are based on
post layout extraction.&lt;/p&gt;
</description>
    </item>

    <item>
      <title>Microprocessor Design in Verilog</title>
      <link>https://khaleelkhan.com/project/microprocessor-design-in-verilog/</link>
      <pubDate>Thu, 27 Apr 2017 00:00:00 +0000</pubDate>

      <guid>https://khaleelkhan.com/project/microprocessor-design-in-verilog/</guid>
      <description>

&lt;h1 id=&#34;introduction&#34;&gt;Introduction&lt;/h1&gt;

&lt;h2 id=&#34;motivation&#34;&gt;Motivation&lt;/h2&gt;

&lt;p&gt;The purpose of this lab is to implement a THUMB processor with multiple pipeline stages that executes the given C-programs assembled for the ARM THUMB instruction set in Verilog.Appropriate attention is taken to reduce the clock cycles for lower instruction latency.The features of this processor are 5 stage pipeline,hazard detection and data
forwarding.The processor is designed such a way that the CPU runs at a
maximum frequency of 2.857 GHz.&lt;/p&gt;

&lt;h2 id=&#34;project-summary&#34;&gt;Project Summary&lt;/h2&gt;

&lt;p&gt;Major milestones defined as per the project have been completed.
Following table gives a short summary of our design and goals achieved.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Value/Result&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Number of Stages&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Synthesizable&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Max Frequency&lt;/td&gt;
&lt;td&gt;2.857 GHz&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Power&lt;/td&gt;
&lt;td&gt;2.0583 mW&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Can process count32&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Can process memcpy46&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;h1 id=&#34;implementation&#34;&gt;Implementation&lt;/h1&gt;

&lt;h2 id=&#34;design&#34;&gt;Design&lt;/h2&gt;

&lt;p&gt;For the simulation of CPU program, the given data files which contains
the instructions were read into the instruction register of the CPU.The
processor designed is a 5 stage pipeline design. The stages are
Instruction Fetch, Instruction decode,Execute,Memory Access and Write
back.&lt;/p&gt;

&lt;p&gt;The instruction fetch stage is where a program counter will pull the
next instruction from the current location in to the program memory. In
addition the program counter was updated with either the next
instruction location sequentially or the instruction location as
determined by a branch.&lt;/p&gt;

&lt;p&gt;The instruction decode stage the control unit determines the values for
the control lines that must be set to process the instruction. The
decoded register addresses are sent to the register file and the data in
the resister is passed to the ALU inputs.&lt;/p&gt;

&lt;p&gt;The opcode that is been fetched is sent to the ALU for execution.If
required,branch addresses are also calculated and the forwarding unit
determines whether the output of the ALU or the memory unit should be
forwarded to the ALU inputs.For this purpose a 2:1 Multiplexer is used.&lt;/p&gt;

&lt;p&gt;In the Write back stage, calculated values are written back to the
specific registers or memory.&lt;/p&gt;

&lt;p&gt;The CPU also contains a hazard detection block to determine when a stall
cycle must be added.This is enabled when the output of a previous load
instruction is required for the current execution.The hazard detection
block will also prohibit the program counter from updating it’s next
calculated value.&lt;/p&gt;

&lt;p&gt;Forwarding unit monitors the output of ALU and system memory and
determines whether this value has to be given as a ALU input.If the
recently calculated value is needed in the current execution before it
is written to the register file it will be sent to the appropriate ALU
input.&lt;/p&gt;


&lt;figure&gt;

&lt;img src=&#34;Images/Figure5.png&#34; /&gt;


&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  &lt;h4&gt;Source: &lt;span&gt;[@CDACpipe]&lt;/span&gt;&lt;/h4&gt;

&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;h2 id=&#34;functional-description&#34;&gt;Functional Description&lt;/h2&gt;

&lt;h5 id=&#34;pipeline&#34;&gt;Pipeline&lt;/h5&gt;

&lt;p&gt;The Pipelined model is an architecture which allows throughput of the
processor to be increased dramatically by reusing the idle stages during
processing of instructions.All the stages of a pipeline are executed in
parallel with Registers inserted between the stages. This enables
several operations to take place simultaneously, and the processing and
memory systems to operate continuously[@TDMI_DS].The stages in the
pipeline are instruction fetch, decode, execute and writeback.&lt;/p&gt;

&lt;h3 id=&#34;instruction-fetch-stage&#34;&gt;Instruction Fetch Stage&lt;/h3&gt;

&lt;p&gt;The instruction fetch stage is responsible for reading the instruction
memory and sending the next instruction to the next stage in pipeline,or
a stall if a branch has been detected in order to avoid incorrect
execution. It consist of three components : instruction memory, program
counter.&lt;/p&gt;

&lt;h4 id=&#34;program-counter&#34;&gt;Program Counter&lt;/h4&gt;

&lt;p&gt;The program counter is incremented by 2 after every instruction is
executed. In cases where a jump is required the PC is modified directly
by the output of ALU via a multiplexer.&lt;/p&gt;

&lt;h3 id=&#34;instruction-decode-stage&#34;&gt;Instruction Decode Stage&lt;/h3&gt;

&lt;p&gt;In decode stage the fetched instruction is decoded and it is responsible
for assigning the different sections of instructions into their proper
representation based on different instruction types.The decode stage
consist of control unit, the hazard detection unit , the sign extender
and the register file, and is responsible for connecting all these
components together. It splits the instruction into various parts and
feeds them into the corresponding components.Registers Rn and Rm are fed
to the register file, the immediate data is fed to the sign extender,
and the ALU opcodes and the function codes are sent to the control unit.
The output of these corresponding components are clocked and then stored
for next stage.The codes used in the decode section are listed in the
table.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Opcode&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Instruction or instruction class&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;00xxxx&lt;/td&gt;
&lt;td&gt;Shift (immediate), add, subtract, move, and compare&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;010000&lt;/td&gt;
&lt;td&gt;Data-processing&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;010001&lt;/td&gt;
&lt;td&gt;Special data instructions and branch and exchange&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;01001x&lt;/td&gt;
&lt;td&gt;Load from Literal Pool, see LDR (literal)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;0101xx&lt;/td&gt;
&lt;td&gt;Load/store single data item&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;10100x&lt;/td&gt;
&lt;td&gt;Generate PC-relative address&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;10101x&lt;/td&gt;
&lt;td&gt;Generate SP-relative address, see ADD (SP plus immediate)&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;1011xx&lt;/td&gt;
&lt;td&gt;Miscellaneous 16-bit instructions&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;1101xx&lt;/td&gt;
&lt;td&gt;Conditional branch, and Supervisor Call&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;11100x&lt;/td&gt;
&lt;td&gt;Unconditional Branch&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;: Source: &lt;span&gt;[@arm_manual]&lt;/span&gt;&lt;/p&gt;

&lt;h3 id=&#34;control-unit&#34;&gt;Control Unit&lt;/h3&gt;

&lt;p&gt;Control is the hardware that tells the datapath what to do, in terms of
switching, operation selection, data movement between ALU components
[@Patterson].It takes the given opcode from the instruction and
translate into individual instruction control lines needed for the
remaining stages.All control signals can be set based on the opcode
bits.&lt;/p&gt;

&lt;h3 id=&#34;hazard-detection-and-forwarding-unit&#34;&gt;Hazard Detection and Forwarding Unit&lt;/h3&gt;

&lt;p&gt;Hazard occurs when we read a value that was just written from memory,as
the value wont be available for execution until the end of the memory
stage. It introduces stall cycle by replacing control lines with zero
and disabling the program counter from updating. When a branch is
detected the hazard detection unit will allow the program counter to
write, but will feed it the branch address instead of the next counted
value.&lt;/p&gt;

&lt;p&gt;The forwarding unit is responsible for choosing what input is to be fed
into the ALU.It takes the input from the decode stage ,the value that
the ALU has fed to the write back stage as well as the register numbers
corresponding to all of these and determines if there is any conflict
exist.It will decide which of this values must be send to ALU.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;Images/examle_of_hazard.jpg&#34; alt=&#34;Source:
&amp;lt;span&amp;gt;[@Patterson]&amp;lt;/span&amp;gt;&#34; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;Images/example_of_stall_insertion.jpg&#34; alt=&#34;Source:
&amp;lt;span&amp;gt;[@Patterson]&amp;lt;/span&amp;gt;&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;sign-extender-and-shifter&#34;&gt;Sign Extender and Shifter&lt;/h3&gt;

&lt;p&gt;The sign extender takes the immediate value and sign extends it if the
current instruction is signed operation. It also has a shifted output
for branch address calculation.&lt;/p&gt;

&lt;h3 id=&#34;register-file&#34;&gt;Register File&lt;/h3&gt;

&lt;p&gt;The data storage in the CPU is the register bank contained within the
Instruction decode stage. This bank of registers is directly referenced
from the ARM Thumb instructions and is designed to allow to access the
data and avoid the use of much slower data memory.The registers are
defined as being written in the negative edge of the clock and read in
the positive edge.This is done to avoid hazards when one instruction is
attempting to write to the register bank while the other is reading.&lt;/p&gt;

&lt;h3 id=&#34;execute-stage&#34;&gt;Execute Stage&lt;/h3&gt;

&lt;p&gt;The execute stage is responsible for performing the specified
operations.The execute stage consist of ALU , branch determiner and the
forwarding unit. It connects these components together so that the ALU
processes the data properly given inputs chosen by the forwarding unit
and will notify the decode stage if a branch is indeed to be taken.&lt;/p&gt;

&lt;h5 id=&#34;alu&#34;&gt;ALU&lt;/h5&gt;

&lt;p&gt;The ALU is responsible for performing the actual calculation specified
by the instruction. It takes two 16 bit inputs and opcode from the
decode block and gives a single 16 bit output along with the Program
status registers.&lt;/p&gt;

&lt;h5 id=&#34;branch-determiner&#34;&gt;Branch Determiner&lt;/h5&gt;

&lt;p&gt;The Branch determiner is responsible for looking at the output of ALU ,
and the type of instruction it is decoding and determining whether the
system is to branch or not.&lt;/p&gt;

&lt;p&gt;For example in case of BLE (Branch if less than or equal) branch must be
taken when if flags Z set, or N set and V clear, or N clear and V
set.[@TDMI_DS]&lt;/p&gt;

&lt;p&gt;Implementation of forwarding block is shown in Figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;Images/forwarding.jpg&#34; alt=&#34;Source:
&amp;lt;span&amp;gt;[@Patterson]&amp;lt;/span&amp;gt;&#34; /&gt;&lt;/p&gt;

&lt;h3 id=&#34;writeback-stage&#34;&gt;Writeback Stage&lt;/h3&gt;

&lt;p&gt;The write back stage is responsible for writing the calculated value
back to the proper register.It has input control lines that tells
whether the instruction writes back the output of ALU to memory or not.&lt;/p&gt;

&lt;h3 id=&#34;push-and-pop-instruction&#34;&gt;Push and Pop Instruction&lt;/h3&gt;

&lt;p&gt;Pushing to stack and Popping data back from stack to registers is
implemented as a FSM shown in the following Figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;Images/fsm.pdf&#34; alt=&#34;FSM for Push and Pop instructions&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;rtl-verification&#34;&gt;RTL verification&lt;/h2&gt;

&lt;p&gt;Test benches are used to simulate design without the need of any
physical hardware. The biggest benefit of this is that it inspects every
signal that is in the design. The overall CPU block is responsible for
tying all of the stages together as well as providing the access to the
outside world that the test bench uses to load instruction memory and
monitor the register bank for test verification.&lt;br /&gt;
The test bench for the CPU involved two different sections in order to
allow the testing of the CPU block as shown in Figure [fig:testbench].
The first section that was part of the test bench was the code that was
responsible for loading the instruction memory within the CPU.This
memory is what would run the instructions through the pipeline once the
CPU was allowed to start. The instructions that were loaded included
register based and immediate adds, subtracts (both signed and unsigned),
reading and writing data memory, and a loop that would force the CPU to
jump back to the start of instruction memory and execute those same
instructions again[@MIPS_paper]. The different adds were important
because each exercised different parts of the CPU including the data
forwarding unit, multiple registers and different functions within the
ALU itself.The jump instruction is important and also in that it
exercised the branch detection unit, hazard detection unit as well as
the ability of the instruction fetch stage to be able to jump to an
address and continue execution with only the input of a single stall
cycle.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;Images/testbench-crop.pdf&#34; alt=&#34;RTL Verification Block diagram&#34; /&gt;&lt;/p&gt;

&lt;h2 id=&#34;synthesis&#34;&gt;Synthesis&lt;/h2&gt;

&lt;p&gt;A synthesis tool takes an RTL hardware description and standard cell
library as input and produces a gate level netlist as output[@iitkgp].
The resulting gate level list is completely a structural description
with standard cells of the design.It is not necessary that the Verilog
is functionally correct ,it must be written in such a way that it
directs the synthesis tool to generate good hardware. Verilog are tied
to particular clock cycles. The synthesized netlist exhibits the same
clock-by-clock cycle behavior, allowing the RTL testbench to be easily
re-used for gate-level simulation. Design Vision was used for
synthesizing our designed processor.&lt;/p&gt;

&lt;h1 id=&#34;unpipelined-processor&#34;&gt;Unpipelined Processor&lt;/h1&gt;

&lt;p&gt;The motive for multi cycle implementation for THUMB processor is to
improve the performance of Single cycle Thumb processor which executes
all instructions in 1 cycle. The main problem in Single cycle processor
is, as all instructions are executed concurrently in one cycle, the
components cannot be used more than once in a cycle. To make this design
more efficient, sharing of the component can be made possible by making
it have multiple input and outputs selected by a multiplexer.The control
signals for such multiplexers are decided using the finite state
machine. The individual components are connected as per the figure.&lt;/p&gt;

&lt;p&gt;&lt;img src=&#34;Images/unpipelined_blk_diagram.png&#34; alt=&#34;Unpipelined block diagram&amp;lt;span
data-label=&amp;quot;fig:unpipe&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&#34; /&gt;&lt;/p&gt;

&lt;h1 id=&#34;evaluation&#34;&gt;Evaluation&lt;/h1&gt;

&lt;p&gt;The comparison of the obtained specifications are listed in Table
below. Features like power, Maximum operating frequency and
processing time is better in un pipelined architecture compared to
pipelined architecture. However the processing time is the basis for
performance and throughput .An interesting aspect about the processing
time achieved in pipelined and un pipelined stages is the that, though 5
stage pipeline is used the execution time or the performance has not
increased 5 folds as expected. This is the result of adding stalls in
Hazard block. Hence the measure of performance is not just dependent on
the number of pipelined stages but also the type of instructions
executed and the frequency at which stalls are inserted in the data
path.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Feature&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Pipelined&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Un-Pipelined&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Area&lt;/td&gt;
&lt;td&gt;3783.4 units&lt;/td&gt;
&lt;td&gt;8649 units&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Power&lt;/td&gt;
&lt;td&gt;2.0583 mW&lt;/td&gt;
&lt;td&gt;2.1109 mW&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Critical time&lt;/td&gt;
&lt;td&gt;.35 nS&lt;/td&gt;
&lt;td&gt;.25 nS&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Maximum Frequency&lt;/td&gt;
&lt;td&gt;2.857 GHZ&lt;/td&gt;
&lt;td&gt;4 GHz&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Execution time&lt;/td&gt;
&lt;td&gt;2184956 pS&lt;/td&gt;
&lt;td&gt;337300 pS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Comparison of Pipelined and Unpipelined&lt;/p&gt;

&lt;p&gt;Following Timing report is obtained for a Clock period of 0.35 nS.
$$\therefore  Max_{frequency} = \frac{1}{0.35 nS} = 2.86 GHz$$ .&lt;/p&gt;

&lt;h1 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;As a result of this work,Thumb processor with pipelined stages is
implemented using Verilog Hardware description language and a throughput
better than Unpipelined architecture is achieved.The design is
synthesize-able and achieves the required goals of the project.&lt;/p&gt;

&lt;h2 id=&#34;future-work&#34;&gt;Future work&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The processor performance can optimized by inserting stalls
intelligently .&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Dynamic Branch prediction algorithm can be implemented inside the
processor&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Gate level simulation needs further debugging&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&#34;referrences&#34;&gt;Referrences&lt;/h1&gt;

&lt;p&gt;This appendix documents the software tools used for this project.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Version&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ModelSim&lt;/td&gt;
&lt;td&gt;SE-64 10.3d&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Synopsis Design Vision&lt;/td&gt;
&lt;td&gt;2013.12-SP5-3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The Thumb instruction set,&lt;br /&gt;
&lt;em&gt;&lt;a href=&#34;http://apt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/05_Thumb.pdf&#34; target=&#34;_blank&#34;&gt;http://apt.cs.manchester.ac.uk/ftp/pub/apt/peve/PEVE05/Slides/05_Thumb.pdf&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Pipelined Processor Design,04/25/07, Luke Harvey and Stephanie
Spielbauer&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Addison Wesley - &lt;em&gt;ARM System-on-Chip Architecture&lt;/em&gt;, 2Ed.pdf. .&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Organization of Computer Systems: Pipelining. [Online]. Available:
&lt;em&gt;&lt;a href=&#34;https://www.cise.ufl.edu/ mssz/CompOrg/CDA-pipe.html&#34; target=&#34;_blank&#34;&gt;https://www.cise.ufl.edu/ mssz/CompOrg/CDA-pipe.html&lt;/a&gt;&lt;/em&gt;. [Accessed:
02-Aug-2017].&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Organization of Computer Systems: Processor &amp;amp; Datapath.
[Online]. Available:
&lt;em&gt;&lt;a href=&#34;https://www.cise.ufl.edu/ mssz/CompOrg/CDA-proc.html&#34; target=&#34;_blank&#34;&gt;https://www.cise.ufl.edu/ mssz/CompOrg/CDA-proc.html&lt;/a&gt;&lt;/em&gt;. [Accessed:
01-Aug-2017].&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition Errata
markup Copyright 1996-1998, 2000, 2004-2011 ARM Limited.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;ARM7TDMI Data Sheet Copyright ARM Limited.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Patterson, D.A. and J.L. Hennesey. Computer Organization and Design: The
Hardware/Software Interface, Second Edition, San Francisco, CA: Morgan
Kaufman (1998).&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Vhdl Implementation of A Mips-32 Pipeline Processor, Kirat Pal Singh,
Shivani Parmar&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;IIT Lab tutorial
&lt;a href=&#34;http://www.facweb.iitkgp.ernet.in/ isg/TESTING/SLIDES/Tutorial2b.pdf&#34; target=&#34;_blank&#34;&gt;http://www.facweb.iitkgp.ernet.in/ isg/TESTING/SLIDES/Tutorial2b.pdf&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
    </item>

  </channel>
</rss>