I know the Unicode data includes properties such as "this character has numerical value X", and sometimes a sort order for the characters. But I don't think it says which characters are used in which alphabets or writing systems, or which characters act as numbers, and so on.
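To make concrete what I mean by the per-character data that already exists, here is a minimal sketch using Python's standard `unicodedata` module. The `describe` helper and the sample characters are just my own illustration, not part of any dataset.

```python
import unicodedata

def describe(ch):
    """Collect the per-character properties the standard library exposes."""
    return {
        "char": ch,
        "name": unicodedata.name(ch, None),        # official Unicode name, or None
        "category": unicodedata.category(ch),      # general category, e.g. "Nd", "Lo"
        "numeric": unicodedata.numeric(ch, None),  # numeric value, or None if unassigned
    }

# Sample characters: a digit, a Roman numeral, Hebrew alef, and a-umlaut.
for ch in ["5", "\u2167", "\u05d0", "\u00e4"]:
    print(describe(ch))
```

This gives names, categories, and numeric values per code point, but nothing about which language's alphabet a letter belongs to or what a letter means in a traditional numeral system.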
Has anyone collected this information and structured it as CSV or JSON? I know you can find it on Wikipedia in scattered, unstructured tables, but I would like to find it already aggregated if possible.
For example, the Finnish alphabet uses the Latin Unicode block but has different letters and a different order than the English alphabet. In the Hebrew alphabet, certain characters are given numerical meaning. Does any of this data exist in structured form somewhere? In Tibetan, each character has a pronunciation associated with it, and characters sometimes also have names in the native language. That sort of thing.
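As a rough sketch of the kind of structure I am hoping to find, something like the following would do. The field names (`language`, `script`, `letters`, `value`) are a hypothetical schema I made up for illustration, and the Hebrew entry lists only the first three letters.

```python
import json
import string

# Hypothetical record: the Finnish alphabet in its own order (a-z, then å, ä, ö).
finnish = {
    "language": "fi",
    "script": "Latin",
    "letters": list(string.ascii_lowercase) + ["\u00e5", "\u00e4", "\u00f6"],
}

# Hypothetical record: Hebrew letters with their traditional numerical values.
hebrew_sample = {
    "language": "he",
    "script": "Hebrew",
    "letters": [
        {"char": "\u05d0", "name": "alef", "value": 1},
        {"char": "\u05d1", "name": "bet", "value": 2},
        {"char": "\u05d2", "name": "gimel", "value": 3},
    ],
}

# ensure_ascii=False keeps the actual characters in the JSON output.
print(json.dumps([finnish, hebrew_sample], ensure_ascii=False, indent=2))
```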
I saw https://character-table.netlify.app/ but it looks like simple Unicode mappings, so I'm not sure whether anything else exists. Omniglot has charts, which are spreadsheets covering many alphabets, but they would need heavy rework, as the XLS format they are in is not easily convertible to JSON.
