Commit 8861dd0

Prepping for release 0.2.6-alpha.2
1 parent 9b797be commit 8861dd0

8 files changed

+88
-35
lines changed

CHANGES.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
## Changelog
2+
3+
### 0.2.5-alpha (2024-04-19)
4+
- Initial release
5+
- The tool is based on early, non-public drafts of the EA-PDF specification, specifically 0.2 and 0.3.
6+
- The tool is capable of creating an EA-PDF from a single email message in the EML format or a collection
7+
of email messages in the MBOX format.
8+
- This is [alpha-level](https://en.wikipedia.org/wiki/Software_release_life_cycle#Alpha) software, so expect bugs and missing features.
9+
- If you want to report bugs or make feature requests, please use the [GitHub Issue Tracker](https://github.com/UIUCLibrary/ea-pdf/issues).
10+
11+
### 0.2.6-alpha (2024-06-04)
12+
- This release is still based on the 0.2 and 0.3 drafts of the EA-PDF specification.
13+
- Miscellaneous quality improvements and bug fixes, such as:
14+
- Better deal with XEP-generated error messages, such as a missing license file.
15+
- Update to the latest NuGet packages.
16+
- Improvements to testing.
17+
- To better support legacy PDF readers, synchronize document-level metadata (Creator, Producer, Dates, etc.) between the XMP and the Document Info Dictionary.
18+
- XMP metadata now more closely matches the requirements of the 0.2 spec; added a GUID to identify messages; removed the DACS metadata extension schema.
19+
- Each Content Set as defined by the spec now starts on a new page.
20+
- Changed the Front Matter to include the name and version of the three primary tools used to create the EA-PDF.
21+
- Major changes to how the DPart and DPM metadata is created and attached to the PDF.
22+
- Removed the dependency on the NDepend.Path library in order to better support Linux-style file paths.

EaPdf/EaPdf.csproj

Lines changed: 2 additions & 0 deletions
@@ -16,6 +16,8 @@
1616
<Product>UIUCLibrary.$(AssemblyName)</Product>
1717
</PropertyGroup>
1818

19+
<Import Project="../VersionInfo.xml"/>
20+
1921
<ItemGroup>
2022
<PackageReference Include="AngleSharp.Css" Version="0.17.0" />
2123
<PackageReference Include="CsvHelper" Version="32.0.1" />

EaPdfCmd/EaPdfCmd.csproj

Lines changed: 15 additions & 1 deletion
@@ -1,4 +1,4 @@
1-
<Project Sdk="Microsoft.NET.Sdk">
1+
<Project Sdk="Microsoft.NET.Sdk">
22

33
<PropertyGroup>
44
<OutputType>Exe</OutputType>
@@ -15,6 +15,20 @@
1515
<PackageId>UIUCLibrary.$(AssemblyName)</PackageId>
1616
</PropertyGroup>
1717

18+
<Import Project="../VersionInfo.xml" />
19+
20+
<ItemGroup>
21+
<None Include="..\CHANGES.md" Link="CHANGES.md">
22+
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
23+
</None>
24+
<None Include="..\LICENSE.md" Link="LICENSE.md">
25+
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
26+
</None>
27+
<None Include="..\README.md" Link="ABOUT_PROJECT.md">
28+
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
29+
</None>
30+
</ItemGroup>
31+
1832
<ItemGroup>
1933
<PackageReference Include="CommandLineParser" Version="2.9.1" />
2034
<PackageReference Include="EmailValidation" Version="1.0.10" />

EaPdfCmd/README.MD

Lines changed: 29 additions & 22 deletions
@@ -292,10 +292,11 @@ included in the distribution that can be used as a starting point.
292292

293293
#### EmailToEaxsProcessorSettings
294294

295-
The conversion of email messages to PDFs consists of two steps: 1) converting emails to XML, and 2) converting
296-
XML to PDFs; this settings section controls the emails are converted to XML files. These settings should not
297-
need to be changed unless your final product is just the XML files and not the PDFs. Certain of these settings
298-
are required for the conversion to PDFs, so they should not be changed unless you are sure of what you are doing.
295+
The conversion of email messages to PDFs consists of two steps: 1) converting emails to XML and extracting any
296+
attachments, and 2) converting the XML and attachments to PDFs; this settings section controls how the emails are
297+
converted to XML files. These settings should not need to be changed unless your final product is just the XML
298+
files and not the PDFs. Certain of these settings are required for the conversion to PDFs, so they should not be
299+
changed unless you are sure of what you are doing.
299300

300301
##### HashAlgorithmName
301302

@@ -341,14 +342,16 @@ is false; FO processors require all attachments be base64 encoded, so this defau
341342

342343
##### PreserveTextAttachmentTransferEncoding
343344

344-
This only applies to textual attachments wrapped in XML, internally or externally. If true, textual content (7bit and 8bit) are always
345-
serialized as UTF-8 text inside the XML; if false, all textual content is serialized as base64 when saved inside the XML. The default is
346-
false; XSL FO processors require all attachments be base64 encoded, so this default should not be changed.
345+
This only applies to textual attachments (MIME Type 'text/*') wrapped in XML, internally or externally. If true, textual content
346+
(7bit and 8bit) is always serialized as UTF-8 text inside the XML; if false, all textual content is serialized as base64 when
347+
saved inside the XML. The default is false; XSL FO processors require all attachments be base64 encoded, so this default should not
348+
be changed.
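
The choice described by this setting can be pictured with a minimal Python sketch. It is an illustration only, not the tool's actual code: the function name `serialize_text_attachment` and its parameters are hypothetical.

```python
import base64


def serialize_text_attachment(content: bytes, charset: str,
                              preserve_text_encoding: bool) -> tuple[str, str]:
    """Return (encoding_label, text_to_store_in_xml) for a text/* attachment.

    Hypothetical helper: mirrors the README's description of
    PreserveTextAttachmentTransferEncoding, not the tool's implementation.
    """
    if preserve_text_encoding:
        # true: keep 7bit/8bit textual content as readable text
        return "utf-8", content.decode(charset)
    # false (the default): always wrap the content as base64
    return "base64", base64.b64encode(content).decode("ascii")


print(serialize_text_attachment(b"hello", "ascii", False))
```

With the default of false, every textual attachment reaches the FO processor in the same base64 form, which is why the README advises against changing it.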
347349

348350
##### IncludeSubFolders
349351

350-
For MBOX files, subfolders in the same directory as the MBOX file and which match the name of the MBOX file will also be processed,
351-
including all of its files and subfolders recursively. For a folder of EML files, all subfolders will also be processed recursively.
352+
For MBOX files, subfolders in the same directory as the MBOX file and which match the name of the MBOX file (this is how the Mozilla
353+
Thunderbird email client stores emails) will also be processed, including all of its files and subfolders recursively. For a folder
354+
of EML files, all subfolders will also be processed recursively.
352355

353356
##### ExternalContentFolder
354357

@@ -374,7 +377,7 @@ to PDF or other display formats. The default is true.
374377

375378
##### LogToXmlThreshold
376379

377-
LogLevels equal to or above this threshold will also be written to the output XML file as comments in the XML. The default is
380+
Messages with LogLevels equal to or above this threshold will also be written to the output XML file as comments in the XML. The default is
378381
Information. This should not be changed unless you have a specific requirement. The valid values are Trace, Debug, Information,
379382
Warning, Error, Critical, None. Note that this is not affected by the log level setting in the Logging section or the `-l, --log-level`
380383
command line option.
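
As a rough sketch of the threshold rule, assuming the levels are ordered exactly as listed above (lowest to highest), the filter could look like the following Python. The names `LogLevel` and `should_write_to_xml` are illustrative and are not taken from the tool.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    # Ordered as in the README's list of valid values
    TRACE = 0
    DEBUG = 1
    INFORMATION = 2
    WARNING = 3
    ERROR = 4
    CRITICAL = 5
    NONE = 6


def should_write_to_xml(message_level: LogLevel,
                        threshold: LogLevel = LogLevel.INFORMATION) -> bool:
    """True when a message at message_level would be copied into the XML."""
    if threshold == LogLevel.NONE:
        # A threshold of None disables XML logging entirely
        return False
    return message_level >= threshold


print(should_write_to_xml(LogLevel.WARNING))
```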
@@ -396,12 +399,14 @@ the email.
396399

397400
Extra non-standard HTML character entities to add to the list of entities that are converted to their Unicode equivalent. The key is the
398401
name of the entity, and the value is the Unicode code point. Standard HTML entities are already handled by the code and XSLT processor, so
399-
this is only needed if you encounter non-standard entities. The default is a single entity: `"QUOT": 134`.
402+
this is only needed if you encounter non-standard entities. The default is a single entity: `"QUOT": 134`. Note that HTML entities are
403+
case-sensitive, so for example, `quot` and `QUOT` are different entities.
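
To illustrate the case-sensitive lookup, here is a small Python sketch that replaces only the entities present in the extra mapping and leaves everything else untouched. The mapping value is the default quoted above; the function name and regular expression are assumptions made for the example.

```python
import re

# Default mapping quoted in the README: entity name -> Unicode code point
EXTRA_ENTITIES = {"QUOT": 134}

# Matches named entities such as &QUOT; or &nbsp;
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_extra_entities(text: str, extra: dict[str, int] = EXTRA_ENTITIES) -> str:
    """Replace entities found in `extra`; the lookup is case-sensitive."""
    def repl(match: re.Match) -> str:
        code_point = extra.get(match.group(1))
        # Unknown names (including differently cased ones) are left as-is
        return chr(code_point) if code_point is not None else match.group(0)

    return _ENTITY.sub(repl, text)


print(repr(replace_extra_entities("a&QUOT;b and a&quot;b")))
```

In this sketch `&quot;` passes through unchanged, so a standard-entity pass (handled elsewhere, per the README) would still be able to process it.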
400404

401405
##### ForceParse
402406

403407
Force the message parser to run even if the file does not appear to be a valid message file format. This might be useful for debugging. The
404-
default is false.
408+
default is false. Before parsing starts, the system will snoop on the first few bytes of the file to determine if it is a valid message file
409+
type. If it is not, the file will be skipped. If this setting is true, the file will be parsed regardless of the file type.
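
The README does not say which bytes are inspected, so the following Python sketch is only a guess at the general idea: it assumes an MBOX file starts with a `From ` separator line and an EML file starts with an RFC 5322 style header field. Both helper names are hypothetical.

```python
import re

# An RFC 5322 field name (printable ASCII except ':') followed by a colon
_HEADER_START = re.compile(rb"^[!-9;-~]+:")


def looks_like_message_file(head: bytes) -> bool:
    """Assumed heuristic: MBOX 'From ' line or an EML header field."""
    return head.startswith(b"From ") or _HEADER_START.match(head) is not None


def should_parse(head: bytes, force_parse: bool = False) -> bool:
    """ForceParse=true bypasses the check; otherwise unknown files are skipped."""
    return force_parse or looks_like_message_file(head)


print(should_parse(b"%PDF-1.7"), should_parse(b"%PDF-1.7", force_parse=True))
```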
405410

406411
#### EaxsToEaPdfProcessorSettings
407412

@@ -417,20 +422,21 @@ Note that the root XSLT file includes other XSLT files, namely:
417422

418423
- eaxs_xhtml2fo.xsl which converts XHTML to XSL-FO
419424
- eaxs_helpers.xsl which contains common functions and templates
425+
- eaxs_contentset_helpers.xsl which contains templates that generate identifiers and links for content sets
420426
- eaxs_helpers_test.xsl which contains templates to test the helpers
421427

422428
These files should be in the same directory as the root XSLT file.
423429

424-
In general this file should not be changed. However, someone proficient in XSLT could modify this file to change the appearance of the
430+
In general these files should not be changed. However, someone proficient in XSLT could modify them to change the appearance of the
425431
PDF output. But caution should be used as the XSLT is complex and the output has specific features used during post-processing to ensure
426432
the PDFs conform to the EA-PDF specification.
427433

428434
##### XsltXmpFilePath
429435

430-
This is the path to the XSLT file that converts the XML files to XMP metadata for individual folders and messages. If the path is relative,
431-
it is relative to the directory containing this configuration file.
436+
This is the path to the XSLT file that converts the XML files to DPart DPM metadata for individual folders, messages, and content sets.
437+
If the path is relative, it is relative to the directory containing this configuration file.
432438

433-
The XML created by this XSLT contain the core email metadata fields required by the EA-PDF specification. During post-processing, these
439+
The XML created by this XSLT represents the DPart hierarchy required by the EA-PDF specification. During post-processing, these
434440
metadata are inserted into the PDF file.
435441

436442
##### XsltRootXmpFilePath
@@ -439,12 +445,13 @@ This is the path to the XSLT file that converts the XML files to document-level
439445
directory containing this configuration file.
440446

441447
These metadata contain the top-level metadata required by the EA-PDF specification, such as PDF/A and PDF/mail conformance levels, PDF version,
442-
creator, dates, etc. This XSLT also copies the custom XMP schema, EaPdfXmpSchema.xmp, into the output; this file must be included in the
443-
same directory as this XSLT. During post-processing, these metadata are inserted into the PDF file.
448+
creator, dates, etc. It also includes lists of source files, messages, and attachments included in the PDF. This XSLT also copies the
449+
custom XMP schema, EaPdfXmpSchema.xmp, into the output; this file must be included in the same directory as this XSLT. During post-processing,
450+
these metadata are inserted into the PDF file.
444451

445452
##### LanguageFontMapping
446453

447-
This section contains the mapping of Unicode language scripts and typefaces (Serif, SansSerif, or Monospace) to font families. This is used
454+
This section contains the mapping of Unicode language scripts and primary typefaces (Serif, SansSerif, or Monospace) to font families. This is used
448455
to select the appropriate font family for a given script and desired typeface. The font family names must exist in the FO processor's font
449456
configuration. The outer element name is an ISO 15924 4-letter code for the script. A special entry with key `Default` should be in the
450457
dictionary with all three typefaces specified. This will be used as the default if a script entry is not found in the list. Usually, this
@@ -453,7 +460,7 @@ for `Latn` (Latin) will be used as the default. If neither `Default` nor `Latn`
453460
be used as the default which may produce unintended results. Regardless of original order, the mappings are sorted alphabetically when
454461
loaded from the config file.
455462

456-
The supported typefaces are `Serif`, `SansSerif`, or `Monospace`. Some scripts may not support all typefaces. If a given typeface is
463+
The supported primary typefaces are `Serif`, `SansSerif`, or `Monospace`. Some scripts may not support all typefaces. If a given typeface is
457464
desired but not present in the mapping, the first typeface in the list will be used instead. The typeface value is a comma-separated list
458465
of font family names to be used for the script and typeface; these names must exist in the FO processor's font configuration.
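
The lookup rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the mapping is abbreviated (the `Arab` entry lists only `Serif`), the function name is hypothetical, and the final fallback used when neither `Default` nor `Latn` exists is not visible in this diff, so the sketch simply raises an error at that point.

```python
# Abbreviated, illustrative mapping: script code -> typeface -> font families
FONT_MAPPING = {
    "Arab": {"Serif": "Traditional Arabic, serif"},
    "Default": {"Serif": "serif", "SansSerif": "sans-serif", "Monospace": "monospace"},
}


def select_font_family(mapping: dict, script: str, typeface: str) -> str:
    """Pick the font-family list for a script and primary typeface."""
    # Script entry first, then `Default`, then `Latn`
    entry = mapping.get(script) or mapping.get("Default") or mapping.get("Latn")
    if entry is None:
        # The README describes a further fallback that is not shown here
        raise KeyError(f"no mapping for {script!r} and no Default or Latn entry")
    if typeface in entry:
        return entry[typeface]
    # Requested typeface missing: use the first typeface listed for the script
    return next(iter(entry.values()))


print(select_font_family(FONT_MAPPING, "Cyrl", "SansSerif"))
print(select_font_family(FONT_MAPPING, "Arab", "Monospace"))
```

The returned string is a comma-separated family list, so the FO processor can still fall back from the first family to the next when a glyph is missing.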
459466

@@ -475,7 +482,7 @@ Below is an example of the default font mapping:
475482
}
476483
```
477484
For the `Default` entry, the `Serif`, `SansSerif`, and `Monospace` typefaces are mapped to the `serif`, `sans-serif`, and `monospace` font
478-
families, respectively.
485+
families, respectively. The `serif`, `sans-serif`, and `monospace` fonts must be defined in the FO processor's font configuration.
479486

480487
Similarly, `Arab` scripts that need a `Serif` typeface are mapped to the `Traditional Arabic, serif` font families. If an
481488
appropriate character can be found in the `Traditional Arabic` font, it will be used; if not, the `serif` font will be used. Similarly
@@ -486,5 +493,5 @@ Note that in the above example, the `serif`, `san-serif`, `monospace`, `Traditio
486493
included in the distribution have examples of how to declare these font families. You should refer to the documentation for the FO
487494
processor you are using for more information on how to configure fonts.
488495

489-
Also included with the distributed files is a `Fonts` directory that contains various open source fonts that can be used with application.
496+
Also included with the distributed files is a `Fonts` directory that contains various open source fonts that can be used with the application.
490497

Email2Pdf.sln

Lines changed: 2 additions & 0 deletions
@@ -10,10 +10,12 @@ EndProject
1010
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{100AE2D8-47A2-44F6-B8AE-B01180FA323C}"
1111
ProjectSection(SolutionItems) = preProject
1212
BONEYARD.md = BONEYARD.md
13+
CHANGES.md = CHANGES.md
1314
DEV_NOTES.md = DEV_NOTES.md
1415
DPM.md = DPM.md
1516
LICENSE.md = LICENSE.md
1617
README.md = README.md
18+
VersionInfo.xml = VersionInfo.xml
1719
EndProjectSection
1820
EndProject
1921
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "EaPdfCmd", "EaPdfCmd\EaPdfCmd.csproj", "{ECF4C91E-C503-457F-99E2-BC16B91801A3}"

LICENSE.md

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ Copyright (c) 2024 University of Illinois Board of Trustees. All rights reserved
55
Developed by:
66

77
University of Illinois Library at Urbana-Champaign
8-
University of Illinois
98
http://www.library.illinois.edu/
109

1110
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation

README.md

Lines changed: 4 additions & 11 deletions
@@ -1,6 +1,9 @@
11
# Email2Pdf
22

3-
Code for Creating Email Archives Conforming to the EA-PDF (PDF/mail) Specification
3+
## Code for Creating Email Archives Conforming to the EA-PDF (PDF/mail) Specification
4+
5+
The source code can be found in GitHub, https://github.com/UIUCLibrary/ea-pdf/, and is licensed under the University of Illinois/NCSA Open Source License
6+
found in the LICENSE.md.
47

58
This solution contains projects used to transform email files (currently EML or MBOX) into archival PDF files
69
that conform to the EA-PDF specification as output. The [EA-PDF specification](https://pdfa.org/resource/ea-pdf/) (not yet published) describes standard
@@ -22,13 +25,3 @@ Refer to the README.md file in the EaPdfCmd project for more information on how
2225
Also note that most of the sample emails used in the tests are not currently included in the code repository. For
2326
the time being you will need to substitute your own emails for testing purposes.
2427

25-
## Changelog
26-
27-
### 0.2.5-alpha (2024-04-19)
28-
- Initial release
29-
- The tool is based on early, non-public drafts of the EA-PDF specification, specifically 0.2 and 0.3.
30-
- The tool is capable of creating an EA-PDF from a single email message in the EML format or a collection
31-
of email messages in the MBOX format.
32-
- This is [alpha-level](https://en.wikipedia.org/wiki/Software_release_life_cycle#Alpha) software, so expect bugs and missing features.
33-
- If you want to report bugs or make feature requests, please us the [GitHub Issue Tracker](https://github.com/UIUCLibrary/ea-pdf/issues).
34-

VersionInfo.xml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
<Project>
2+
3+
<!-- Properties common to the class library and command line app -->
4+
<!-- import into the .csproj file with <Import Project="VersionInfo.xml" /> -->
5+
6+
<PropertyGroup>
7+
<VersionPrefix>0.2.6</VersionPrefix>
8+
<VersionSuffix>alpha.2</VersionSuffix>
9+
<IncludeSourceRevisionInInformationalVersion>true</IncludeSourceRevisionInInformationalVersion><!-- Include Git Commit ID -->
10+
<Company>University of Illinois Board of Trustees</Company>
11+
<Copyright>Copyright $([System.DateTime]::Now.ToString('yyyy')) University of Illinois Board of Trustees. All Rights Reserved.</Copyright>
12+
</PropertyGroup>
13+
14+
</Project>
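
By default the .NET SDK builds the package version as `VersionPrefix-VersionSuffix` when a suffix is set, so the two properties in this file should yield the release named in the commit message. The short Python sketch below reproduces that combination from an inline copy of the relevant properties; the function name `package_version` is illustrative and the script is not part of the build.

```python
import xml.etree.ElementTree as ET

# Inline excerpt of the properties added in VersionInfo.xml
VERSION_INFO = """
<Project>
  <PropertyGroup>
    <VersionPrefix>0.2.6</VersionPrefix>
    <VersionSuffix>alpha.2</VersionSuffix>
  </PropertyGroup>
</Project>
"""


def package_version(xml_text: str) -> str:
    """Combine VersionPrefix and VersionSuffix the way the SDK default does."""
    root = ET.fromstring(xml_text)
    prefix = root.findtext("PropertyGroup/VersionPrefix")
    suffix = root.findtext("PropertyGroup/VersionSuffix")
    # No suffix means the version is just the prefix
    return f"{prefix}-{suffix}" if suffix else prefix


print(package_version(VERSION_INFO))
```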
