CMCL to Unicode bidirectional converter

Help and documentation

Unicode Coptic superlinear stroke best practices

A single-letter superlinear stroke must be rendered with the combining character U+0304.
```
ⲁ̄ = U+2C81 U+0304 ?
```
A conjoining superlinear stroke (also known as “Bindestrich”) must be rendered using the combining character U+FE24 after the first character and U+FE25 after the last character.
```
ⲓ︤ⲥ︥ = U+2C93 U+FE24 U+2CA5 U+FE25 ?
```
If the Bindestrich covers more than two characters, each in-between character should be followed by U+FE26.
```
ⲓ︤ⲏ︦ⲙ︥ = U+2C93 U+FE24 U+2C8F U+FE26 U+2C99 U+FE25 ?
```
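The three rules above can be sketched in a few lines of Python. This is only an illustration, not the converter's own code: the function names `add_superlinear_stroke` and `code_points` are hypothetical, and the sketch assumes the input contains base letters only, with no combining marks already attached.

```python
MACRON = "\u0304"      # single-letter superlinear stroke
LEFT_HALF = "\uFE24"   # after the first letter of a Bindestrich
CONJOINING = "\uFE26"  # after each in-between letter
RIGHT_HALF = "\uFE25"  # after the last letter


def add_superlinear_stroke(letters: str) -> str:
    """Return `letters` covered by one superlinear stroke.

    Assumption: `letters` holds base letters only (no combining marks).
    """
    if not letters:
        return letters
    if len(letters) == 1:
        return letters + MACRON
    middle = "".join(ch + CONJOINING for ch in letters[1:-1])
    return letters[0] + LEFT_HALF + middle + letters[-1] + RIGHT_HALF


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(add_superlinear_stroke("\u2c81")))
# U+2C81 U+0304
print(code_points(add_superlinear_stroke("\u2c93\u2ca5")))
# U+2C93 U+FE24 U+2CA5 U+FE25
print(code_points(add_superlinear_stroke("\u2c93\u2c8f\u2c99")))
# U+2C93 U+FE24 U+2C8F U+FE26 U+2C99 U+FE25
```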
U+0305 might be used to obtain a single-letter superlinear stroke that is exactly as wide as the letter it is placed above, so that two or more such strokes join into a Bindestrich over two or more letters. This practice is not recommended and, if adopted, should be explicitly declared.
```
ⲁ̅ = U+2C81 U+0305 ?
```
```
ⲓ̅ⲥ̅ = U+2C93 U+0305 U+2CA5 U+0305 ?
```
```
ⲓ̅ⲏ̅ⲙ̅ = U+2C93 U+0305 U+2C8F U+0305 U+2C99 U+0305 ?
```

The correct use of U+0305 is to mark letters as numerals.

ⲁ̅ = U+2C81 U+0305 ?

ⲃ̅ = U+2C83 U+0305 ?

ⲅ̅ = U+2C85 U+0305 ?

ⲇ̅ = U+2C87 U+0305 ?

Pay attention: U+0305 looks very similar to U+FE26 (at least in the Antinoou font), but these two strokes must not be mixed up and must not be used interchangeably!
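Because the two strokes are hard to tell apart by eye, a small check can help. The sketch below flags runs of stroked letters in which U+0305 appears together with the Bindestrich characters U+FE24, U+FE25 or U+FE26. It is an illustration only: `mixed_stroke_runs` is a hypothetical helper, and it assumes that each stroked letter carries exactly one combining mark and that a numeral is never written directly next to a Bindestrich without a separator.

```python
import re

# A run of "letter + stroke" pairs, where the stroke is U+0305 or U+FE24..U+FE26.
STROKED_RUN = re.compile("(?:[^\u0305\uFE24-\uFE26][\u0305\uFE24-\uFE26])+")


def mixed_stroke_runs(text: str) -> list:
    """Return the stroked runs that contain both U+0305 and a Bindestrich mark."""
    mixed = []
    for match in STROKED_RUN.finditer(text):
        run = match.group()
        has_overline = "\u0305" in run
        has_conjoining = any("\uFE24" <= ch <= "\uFE26" for ch in run)
        if has_overline and has_conjoining:
            mixed.append(run)
    return mixed


# U+FE26 replaced by U+0305 in the middle of a Bindestrich: flagged.
suspect = "\u2c93\uFE24\u2c8f\u0305\u2c99\uFE25"
print(mixed_stroke_runs(suspect) == [suspect])
```

A numeral and a Bindestrich separated by a space form two distinct runs, so a text that uses both correctly is not flagged.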

Caveats, known issues and “won't-fixes”

Special attention must be paid to diacritics, particularly to superlinear strokes (see above). When converting from CMCL to Unicode, the converter works properly and picks the correct form in most cases. The reverse conversion, from Unicode to CMCL, will not always restore the original input, especially in the most complex cases.

For example, CMCL a_ (Coptonew: a_) is correctly converted to Antinoou ⲁ̄ (Unicode U+2C81 U+0304), but the reverse conversion does not restore the original: Antinoou ⲁ̄ (Unicode U+2C81 U+0304) is converted to CMCL a+ (Coptonew: a+). This should not be considered a bug, and no fix will be provided.
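The round-trip behaviour in this example can be reproduced with a toy model. The two tables below contain only the pairs stated in the example (a and _ going forward, ⲁ and U+0304 going back). The function names are hypothetical, and the converter's real tables and logic are not shown here and may work quite differently.

```python
# Toy tables holding only the pairs documented in the example above.
CMCL_TO_UNICODE = {"a": "\u2c81", "_": "\u0304"}
UNICODE_TO_CMCL = {"\u2c81": "a", "\u0304": "+"}


def cmcl_to_unicode(text: str) -> str:
    """Map each CMCL character to Unicode; unknown characters pass through."""
    return "".join(CMCL_TO_UNICODE.get(ch, ch) for ch in text)


def unicode_to_cmcl(text: str) -> str:
    """Map each Unicode character back to CMCL; unknown characters pass through."""
    return "".join(UNICODE_TO_CMCL.get(ch, ch) for ch in text)


def round_trips(cmcl: str) -> bool:
    """True if CMCL -> Unicode -> CMCL gives back the original string."""
    return unicode_to_cmcl(cmcl_to_unicode(cmcl)) == cmcl


print(unicode_to_cmcl(cmcl_to_unicode("a_")))  # a+
print(round_trips("a_"))                       # False
print(round_trips("a"))                        # True
```

A check of this kind can be run on a CMCL text before conversion to list the strings whose original encoding would be lost on the way back.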

The same is true for other combinations, e.g.:

CMCL (Coptonew)		Antinoou (Unicode)		CMCL (Coptonew)

Coptonew nomina sacra shortcuts

ASCII shortcut	Unicode output

CMCL's entities

| Regex | Verbose explanation | Meaning | Replace policy | Examples |
| --- | --- | --- | --- | --- |
| `&([0-9]{1,2})n;` | An integer of one or two digits followed by n | Lacuna of known length | Plus-minus character (±, U+00B1) followed by the number of missing characters, enclosed in square brackets | `&2n;` = `[±2]` |
| `&([0-9]{1,2})\?;` | An integer of one or two digits followed by ? | Lacuna of supposed length | Space and dot, repeated for the supposed length, enclosed in parentheses | `&2?;` = `( . .)` |
| `&\?(cap\|capitale);` | ? followed by the string cap or capitale | Unknown capital character | Space followed by dot (same output as entity `&1?;`) | `&?cap;` = `.` |
| `&[0-9]{1,2}b;` | An integer of one or two digits followed by b | Blank space of known length | Not to be rendered | `&2b;` = (no output) |
| `&([a-z]{1})\?;` | One alphabetic character followed by a question mark | Uncertain alphabetic character | The alphabetic character followed by a subliteral dot (U+0323) | `&a?;` = `ạ` |
| `&coppa;` | String: coppa | Character coppa | Character coppa (U+03D9) | `&coppa;` = `ϙ` |
| `&(basilios\|Crs\|Cs\|eiote\|ekklHsia\|fq\|i:lHm\|iHl\|iHs\|ilHm\|is\|isrl\|iws\|js\|monaCos\|oute\|pna);` | One of the following strings (comma separated): basilios, Crs, Cs, eiote, ekklHsia, fq, i:lHm, iHl, iHs, ilHm, is, isrl, iws, js, monaCos, oute, pna | | The same string (CMCL encoding system) converted to Unicode | `&ekklHsia;` = `ⲉⲕⲕⲗⲏⲥⲓⲁ` |
| `&ebol_compresso;` | String: ebol_compresso | CMCL's ebol equivalent in Unicode | ⲉⲃⲟⲗ | `&ebol_compresso;` = `ⲉⲃⲟⲗ` |
| `&etcompresso;` | String: etcompresso | CMCL's et equivalent in Unicode | ⲉⲧ | `&etcompresso;` = `ⲉⲧ` |
| `&Hspir;` | String: Hspir | Heta with combining dot above (U+2C8F U+0307) | ⲏ̇ | `&Hspir;` = `ⲏ̇` |
| `&.b;` | | Simple dot | . | `&.b;` = `.` |
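A few of the entity rules in the table can be expressed directly with Python's `re` module. The sketch below covers only a subset of the rows and follows the replace policies as stated in the table. The names `ENTITY_RULES` and `expand_entities` are hypothetical, and this is not the converter's actual implementation.

```python
import re

# (pattern, replacement function) pairs for a subset of the table rows.
ENTITY_RULES = [
    # Lacuna of known length: [±N]
    (re.compile(r"&([0-9]{1,2})n;"), lambda m: f"[\u00b1{m.group(1)}]"),
    # Lacuna of supposed length: " ." repeated N times, in parentheses
    (re.compile(r"&([0-9]{1,2})\?;"), lambda m: "(" + " ." * int(m.group(1)) + ")"),
    # Blank space of known length: not rendered
    (re.compile(r"&[0-9]{1,2}b;"), lambda m: ""),
    # Uncertain alphabetic character: letter + combining dot below (U+0323)
    (re.compile(r"&([a-z]{1})\?;"), lambda m: m.group(1) + "\u0323"),
    # Character coppa (U+03D9)
    (re.compile(r"&coppa;"), lambda m: "\u03d9"),
    # Heta with combining dot above (U+2C8F U+0307)
    (re.compile(r"&Hspir;"), lambda m: "\u2c8f\u0307"),
]


def expand_entities(text: str) -> str:
    """Apply each rule in turn and return the expanded text."""
    for pattern, replace in ENTITY_RULES:
        text = pattern.sub(replace, text)
    return text


print(expand_entities("&2n;"))  # [±2]
print(expand_entities("&2?;"))  # ( . .)
```

The rules are kept in a list rather than merged into one pattern so that each row of the table stays readable and can be tested on its own.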

cmcl2unicode is open-source software, available for download or fork on GitHub. Please report any issue you might encounter here.

How to cite

This software is archived in Zenodo. Please cite it by referring to the DOI: 10.5281/zenodo.76262299

PAThs
Tracking Papyrus and Parchment Paths. http://paths.uniroma1.it

An Archaeological Atlas of Coptic Literature. Literary Texts in their Geographical Context: Production, Copying, Usage, Dissemination and Preservation
