Importing a VCF file with quoted-printable encoding

Question

The documentation has an example importing a VCF address book file which works fine:

Import[ "ExampleData/wolfram.vcf" ]

{{FormattedName->Wolfram Research, Inc.,Organization->Wolfram Research, Inc.,Email->info@wolfram.com,Phone->217-398-0700,Fax->217-398-0747,Address1->100 Trade Center Drive,City->Champaign,State->IL,ZIPCode->61820,Country->USA}}

But in my case:

Import["F:\\mathematica\\send_contact.vcf"]

{{NameLast->=E6=B5=8B=E8=AF=95,FormattedName->=E6=B5=8B=E8=AF=95,Phone->12345 678 9}}

What is going on with the =E6=B5=8B=E8=AF=95? Is this a bug, or am I using Import incorrectly?

You can get my .vcf file from this link.

The ".vcf" file is a test file exported from my cell phone. If you import that file into a cell phone you will get a number like the one in this picture, and if we use Import, the following result is obtained:

{{NameLast -> =试, FormattedName -> ="I don't know this item", Phone -> 12345 678 9}}

Since @bill s mentioned that it could be a missing font issue, I made another test vcf file with only characters from the English alphabet. The output is normal this time.

{{"NameLast" -> "test name", "FormattedName" -> "test name","Phone" -> "12345 678 9"}}

So is the problem caused by the VCF file not being compatible with Chinese characters? How can we interpret =E6=B5=8B=E8=AF=95 to obtain the original Chinese characters?

It may be a text encoding issue. VCF files are plain text: do you know how yours is encoded? I'm on a tablet so I can't check your file for myself... Unfortunately, however, I couldn't find reference to how one could specify an encoding when importing either. — MarcoB, Apr 01 '16 at 12:51
@george2079 But if you search for "VCF address book" (OP's description) then it's unambiguous. — C. E., Apr 01 '16 at 13:00

score 3 · Accepted Answer · edited Apr 13 '17 at 12:55

Following @george2079's suggestion, I am posting a solution from a friend as an answer, though I'm sure there are better ways to do this. I am accepting my own answer only for the benefit of readers. If anyone posts a better solution, I'll change the acceptance.

$Version

"10.3.1 for Microsoft Windows (64-bit) (December 21, 2015)"

string = First@Import["file address"];
Rule @@@ Transpose@{Keys[string], 
   URLDecode[StringReplace[Values[string], "=" -> "%"], 
    CharacterEncoding -> "UTF-8"]}

{NameLast->测试,FormattedName->测试,Phone->12345 678 9}
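
A variation on the same idea, as a minimal sketch rather than a definitive implementation: it decodes only "=" followed by two hex digits, so a literal "%" in a value is left alone, and it does not rely on URLDecode. The name qpDecode and the sample rules are made up for illustration. It assumes the rest of each value is plain ASCII and that the bytes are UTF-8 (as the CHARSET tag in the file says), and it does not handle quoted-printable soft line breaks.

```mathematica
(* qpDecode is a hypothetical helper, not a built-in function. *)
(* Assumption: apart from the =HH escapes, the value is plain ASCII. *)
qpDecode[s_String] := FromCharacterCode[
  ToCharacterCode[
   (* turn each =HH escape into the single character with that byte value *)
   StringReplace[s,
    "=" ~~ h : Repeated[HexadecimalCharacter, {2}] :>
     FromCharacterCode[FromDigits[h, 16]]]],
  (* then reinterpret the resulting byte values as UTF-8 *)
  "UTF-8"]

(* Sample data shaped like the Import result in the question *)
rules = {"NameLast" -> "=E6=B5=8B=E8=AF=95", "Phone" -> "12345 678 9"};

(* Decode every string value, leaving the keys untouched *)
Replace[rules, (k_ -> v_String) :> (k -> qpDecode[v]), {1}]
(* {"NameLast" -> "测试", "Phone" -> "12345 678 9"} *)
```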

answered Apr 01 '16 at 16:17

yode

Very nice! Thanks for posting this as an answer. (+1) – MarcoB Apr 01 '16 at 16:55

score 1 · Answer 2 · answered Apr 01 '16 at 13:42

Here is the plain text of the VCF file from your link:

BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95;;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95
TEL;HOME:12345 678 9
END:VCARD

Given this, Mathematica's answer is not surprising. Perhaps the odd characters are representatives of a font that is not installed on your computer?

bill s

I don't think the problem is rooted in the "odd characters". The characters I entered are very common. – yode Apr 01 '16 at 13:50
What I intended to suggest is that a coding like "=E6=B5=8B=E8=AF=95" might be a representation from a font that is not being displayed properly. – bill s Apr 01 '16 at 13:53
Bill, @yode, this is Quoted-Printable encoding, as indicated by the ENCODING tag in the VCF. This is called "PrintableASCII" in Mathematica. It is a way of encoding 8-bit characters for transmission over a 7-bit line (e.g. the Internet). Yode, you will need some post-processing of your Chinese characters. See e.g. http://mathematica.stackexchange.com/q/25867/27951. – MarcoB Apr 01 '16 at 14:21
What you have are two 3-byte utf-8 characters. Looking here: http://www.ansell-uebersetzungen.com/gbuni.html you can look up the same characters as in your screen dump. – george2079 Apr 01 '16 at 14:21
URLDecode[StringReplace["=E6=B5=8B=E8=AF=95=E4=B8=80=E4=B8=8B","="->"%"],CharacterEncoding->"UTF-8"] works well. Thanks, all of you. @george2079 @MarcoB @bill s – yode Apr 01 '16 at 15:57
You should make that an answer. (The CharacterEncoding option throws a warning for me, by the way, but it works. Possibly a version issue.) – george2079 Apr 01 '16 at 16:01
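
To make the byte correspondence from the comments concrete, here is a small sketch that goes in the opposite direction: it takes the two characters from the question, lists their UTF-8 bytes, and writes each byte as "=" plus two hex digits. This is only an illustration of how the =E6=B5=8B=E8=AF=95 string arises, not a full quoted-printable encoder (a real encoder leaves most printable ASCII characters unescaped).

```mathematica
(* UTF-8 bytes of the two characters from the question *)
bytes = ToCharacterCode["测试", "UTF-8"]
(* {230, 181, 139, 232, 175, 149} *)

(* Write each byte as "=" followed by two uppercase hex digits *)
StringJoin["=" <> ToUpperCase[IntegerString[#, 16, 2]] & /@ bytes]
(* "=E6=B5=8B=E8=AF=95" *)
```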

george2079 · Answer 3 · 2016-04-01T17:18:04.067

Out of curiosity I worked out the encoding, at least partly. It takes the last 4 bits of the first byte and the last 6 bits of each of the remaining two bytes in every triplet, so we can decode directly like this:

cdecode[s_String] :=
 FromCharacterCode@FromDigits[
     Join @@ MapThread[IntegerDigits[FromDigits[#1, 16], 2][[#2 ;;]] &,
       {StringTake[#,Array[{3 # - 1, 3 #} &, 3]], {-4, -6, -6}}],
      2] & /@
  StringTake[s,Array[{9 # - 8, 9 #} &, Floor[StringLength@s/9] ]]//StringJoin
cdecode["=E6=B5=8B=E8=AF=95=E4=B8=80=E4=B8=8B"]

same string

No doubt URLDecode is the more robust way to go. Note that there are 8 bits that have been ignored here. Presumably the 'E' signifies the start of a 3-byte code. That is ignored here and should be checked.
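
For reference, the 8 ignored bits are the fixed UTF-8 marker bits: a 3-byte sequence has the layout 1110xxxx 10xxxxxx 10xxxxxx, so the leading hex digit E is the 1110 prefix. A minimal sketch of the check mentioned above, for a single triplet of byte values, could look like this. The name utf8Triple is made up for illustration, and the definition only covers 3-byte sequences; it stays unevaluated for anything else.

```mathematica
(* utf8Triple is a hypothetical helper, not a built-in function. *)
(* It only matches when the marker bits are 1110xxxx 10xxxxxx 10xxxxxx. *)
utf8Triple[{b1_Integer, b2_Integer, b3_Integer}] /;
   BitAnd[b1, 16^^F0] == 16^^E0 &&
    BitAnd[b2, 16^^C0] == 16^^80 &&
    BitAnd[b3, 16^^C0] == 16^^80 :=
 FromCharacterCode[
  BitShiftLeft[BitAnd[b1, 16^^0F], 12] +  (* last 4 bits of the first byte *)
   BitShiftLeft[BitAnd[b2, 16^^3F], 6] +  (* last 6 bits of the second byte *)
   BitAnd[b3, 16^^3F]]                    (* last 6 bits of the third byte *)

utf8Triple[{16^^E6, 16^^B5, 16^^8B}]
(* "测", code point 16^^6D4B *)
```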
