Skip to content

Latest commit

 

History

History
105 lines (80 loc) · 5.49 KB

File metadata and controls

105 lines (80 loc) · 5.49 KB

Exercise 11

Part A

Implement a class HuffmanIso that extends Huffman. In such class, override the methods writeTrie and readTrie. When writing/reading a char for a given leaf node, use the charset ISO-8859-1 instead of the current UTF-16 in Huffman.

Note 0: you would lose Unicode support by doing it.

Note 1: for coding/decoding, you can use the method getBytes in String with charset StandardCharsets.ISO_8859_1.

Write a test class HuffmanIsoTest with the following two tests:

  • testCompareOnShortNorwegianSentence in which you compare both HuffmanIso and Huffman on the text string "Jeg ønsker å få en god karakter i denne eksamenen" encoded in UTF-8. Verify that HuffmanIso can compress it, whereas Huffman actually makes it bigger. Explain why.

  • testCompareOnBook in which you compare both HuffmanIso and Huffman on the text of the Odyssey book encoded in UTF-8. Verify that both HuffmanIso and Huffman do compress it, but their difference in compression ratio is minimal (i.e., less than 0.001). Explain why.

Part B

Consider a string representation for exam grades, in the format of <id><grade>, where id is a number starting from 0 (can assume NO MORE than 500 students in an exam, and there can be exams with just a couple of students), and grades in the range A-F. A valid string would be for example: 0A1F2F3C12F13B14B27A201B497A. You can assume that the ids are sorted, but there can be holes in the sequence (e.g., representing students that did not submit).

Write a class GradeCompressorImp that implements the given GradeCompressor interface.

You MUST come up with an efficient compression algorithm which is specialized and customized for this problem domain (i.e., do not use either Huffman nor LZW, but rather use DnaCompressor as inspiration). For example, if there is just 1 student, you should not use more than 2 bytes in total for the compression. If there are 100 students, should not use more than 150 bytes. In your implementation, you can rely on and use the classes BitWriter and BitReader.

Write a test class GradeCompressorTest that extends the given GradeCompressorTestTemplate test suite. If your implementation of GradeCompressorImp is correct, all tests should pass.

Part C

Write a compression algorithm for a set of simple chess moves. Each chess move consists of where: - the number of the move (i.e. 1 for the 1st, 2 for the 2nd, 3 for the 3rd, etc.) - which of the pieces is being moved (pawn, rook, knight, bishop, queen, King) , - designation of the respective square, in the form E2, H1, A8, etc. Squares are designated with a letter A-H and a number 1-8. - a move that results in a check will be marked with ! e.g. "1pe2e4" - Move 1. Pawn from e2 to e4. "2pe7e5" - Move 2. Pawn from e7 to e5.

1pc2c32pd7d63qd1a4!4pb7b5 1pf2f32pe7e53pg2g44qd8h4!

Write a class called ChessCompressor that extends the GenericCompressor to perform the compression.

You MUST come up with an efficient compression algorithm which is specialized and customized for this problem domain (i.e., do not use either Huffman nor LZW, but rather use DnaCompressor as inspiration).

Write a class called ChessCompressorTest to test your work.

Part D

Write a compression algorithm for a catalog of archival documents. The idea is that there is a large amount of documents stored in archives throughout Norway, and a central repository is needed to keep track of it. There are 7 possible archives, for this exercise: NationalArchive – NAT OsloArchive – OSL BergenArhive – BER KristiansandArchive – KRS TrondheimArchive – TRO TromsoArchive – TRM MaritimeArchive - MAR For each document you have: - the date at which the document was written. The date is in the format: year – month – day. Where the month is written as a 3-letter code: (JAN, FEB, MAR, etc.). - which archive stores the document (Can be one of 7 archives, for this exercise, see above). Location – where the document is stored. Consists of: - a 3 letter code that identifies the archive where the document is stored. (see the list of archives above). - a 2 digit code that identifies which building stores the document. (NOTE: for this exercise, no archive has more than 14 buildings, numbered 1 to 15). - the number of the room. We can safely assume that no building has more than 1023 rooms. - the number of the shelf. We can safely assume that no room has more than 2047 shelves. Each document is described by a separate line with the format: ; ; ; ; ; ; Example inputs: 1815-JUN-18; OsloArchive; OSL; 13; 42; 1500; 1805-DEC-02; NationalArchive; NAT; 07; 84; 1780; 1805-OCT-21; MaritimeArchive; MAR; 03; 126; 1111; 1814-MAY-17; NationalArchive; NAT; 01; 01; 0001; 1905-JUN-07; NationalArchive; NAT; 02; 42; 0042;

You MUST come up with an efficient compression algorithm which is specialized and customized for this problem domain (i.e., do not use either Huffman nor LZW, but rather use DnaCompressor as inspiration).

Write a class called ArchiveCompressor that extends the GenericCompressor to perform the compression. Write a class called ArchiveCompressorTest to test your work.

Solutions

Solutions to this exercise can be found in the solutions module, under the org.pg4200.sol11 package.