
The documentation has an example importing a VCF address book file which works fine:

Import[ "ExampleData/wolfram.vcf" ]

{{FormattedName->Wolfram Research, Inc.,Organization->Wolfram Research, Inc.,Email->info@wolfram.com,Phone->217-398-0700,Fax->217-398-0747,Address1->100 Trade Center Drive,City->Champaign,State->IL,ZIPCode->61820,Country->USA}}

But in my case:

Import["F:\\mathematica\\send_contact.vcf"]

{{NameLast->=E6=B5=8B=E8=AF=95,FormattedName->=E6=B5=8B=E8=AF=95,Phone->12345 678 9}}

What is the =E6=B5=8B=E8=AF=95? Is this a bug, or am I using Import incorrectly?

You can get my .vcf file from this link.


The ".vcf" file is a test file exported from my cellphone. If you import that file into a cell phone, you get a contact like the one in this picture; if we use Import, we obtain the following:

{{NameLast -> =试, FormattedName -> ="I don't know this item", Phone -> 12345 678 9}}

Since @bill s mentioned that it could be a missing font issue, I made another test vcf file with only characters from the English alphabet. The output is normal this time.

{{"NameLast" -> "test name", "FormattedName" -> "test name","Phone" -> "12345 678 9"}}

So is the problem caused by the VCF file not being compatible with Chinese characters? How can we interpret =E6=B5=8B=E8=AF=95 to obtain the original Chinese characters?

MarcoB
yode
  • It may be a text encoding issue. VCF files are plain text: do you know how yours is encoded? I'm on a tablet so I can't check your file for myself... Unfortunately, however, I couldn't find reference to how one could specify an encoding when importing either. – MarcoB Apr 01 '16 at 12:51
  • @george2079 But if you search for "VCF address book" (OP's description) then it's unambiguous. – C. E. Apr 01 '16 at 13:00
  • @george2079 It is exported by my cellphone. – yode Apr 01 '16 at 13:42
  • @MarcoB Sorry, actually I don't know. – yode Apr 01 '16 at 13:48
  • @MarcoB Thanks for your edit. – yode Apr 01 '16 at 17:04

3 Answers


Following @george2079's suggestion, I am posting the solution I got from a friend as an answer, but I'm sure there are better ways to do this. I am accepting my own answer only for the benefit of readers. If anyone posts a better solution, I'll change the acceptance.

$Version

"10.3.1 for Microsoft Windows (64-bit) (December 21, 2015)"

string = First@Import["file address"];
Rule @@@ Transpose@{Keys[string], 
   URLDecode[StringReplace[Values[string], "=" -> "%"], 
    CharacterEncoding -> "UTF-8"]}

{NameLast->测试,FormattedName->测试,Phone->12345 678 9}
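For readers who would rather not go through URLDecode, the same decoding can be sketched directly on the bytes. This is only a minimal sketch: `qpDecode` is a hypothetical helper name (not a built-in), and it assumes that the decoded bytes are UTF-8, that every unescaped character is plain ASCII, and that the value contains no quoted-printable soft line breaks.

```mathematica
(* Hypothetical helper, not a built-in. Assumptions: the bytes are UTF-8,
   unescaped characters are ASCII, and there are no soft line breaks. *)
qpDecode[s_String] := FromCharacterCode[
  Flatten@StringCases[s, {
     (* "=XX" escape: one byte written as two hexadecimal digits *)
     "=" ~~ h : Repeated[HexadecimalCharacter, {2}] :> FromDigits[h, 16],
     (* any other character stands for its own byte value *)
     c : Except["="] :> ToCharacterCode[c]}],
  "UTF-8"]

qpDecode["=E6=B5=8B=E8=AF=95"]

(* Apply it to the values of a rule list like the one Import returns *)
{"NameLast" -> "=E6=B5=8B=E8=AF=95", "Phone" -> "12345 678 9"} /.
 (k_ -> v_String) :> (k -> qpDecode[v])
```

Values without any `=XX` escapes, such as the phone number, pass through unchanged.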

yode

Here is the plain text of the VCF file from your link:

BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95;;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95
TEL;HOME:12345 678 9
END:VCARD

Given this, Mathematica's answer is not surprising. Perhaps the odd characters are representatives of a font that is not installed on your computer?
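The parameters in front of the colon on the N and FN lines describe how the value is stored. As a rough illustration (the variable names here are made up for the example, and the line is copied from the file text above), one property line can be split into its parameters and its raw value like this:

```mathematica
(* One property line copied from the file text above *)
line = "FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=E6=B5=8B=E8=AF=95";

(* Split at the first colon only: property name and parameters, then value *)
{head, value} = StringSplit[line, ":", 2];

(* Parameters are separated by semicolons; the first item is the property name *)
params = StringSplit[head, ";"];

(* True when the value is stored as quoted-printable escapes *)
encodedQ = MemberQ[params, "ENCODING=QUOTED-PRINTABLE"]
```

Here `value` is still the escaped text "=E6=B5=8B=E8=AF=95", which matches what Import returns.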

bill s
  • I don't think the problem is rooted in the "odd characters". The characters I entered are very common. – yode Apr 01 '16 at 13:50
  • What I intended to suggest is that a coding like "=E6=B5=8B=E8=AF=95" might be a representation from a font that is not being displayed properly. – bill s Apr 01 '16 at 13:53
  • Bill, @yode, this is Quoted-Printable encoding, as suggested by the ENCODING tag in the VCF. This is called "PrintableASCII" in Mathematica. It is a way of encoding 8-bit characters to transmit on a 7-bit transmission line (e.g. the Internet). Yode, you will need some post-processing of your Chinese characters. See e.g. http://mathematica.stackexchange.com/q/25867/27951. – MarcoB Apr 01 '16 at 14:21
  • What you have are two 3-byte utf-8 characters. Looking here: http://www.ansell-uebersetzungen.com/gbuni.html you can look up the same characters as in your screen dump. – george2079 Apr 01 '16 at 14:21
  • URLDecode[StringReplace["=E6=B5=8B=E8=AF=95=E4=B8=80=E4=B8=8B","="->"%"],CharacterEncoding->"UTF-8"] works well. Thanks, all of you. @george2079 @MarcoB @bill s – yode Apr 01 '16 at 15:57
  • You should make that an answer. (The CharacterEncoding option throws a warning for me, by the way, but it works; possibly a version issue.) – george2079 Apr 01 '16 at 16:01

Out of curiosity I worked out the encoding, at least partly. It takes the last 4 bits of the first byte and the last 6 bits of each of the remaining two bytes in every triplet, so we can decode directly like this:

cdecode[s_String] :=
 FromCharacterCode@FromDigits[
     Join @@ MapThread[IntegerDigits[FromDigits[#1, 16], 2][[#2 ;;]] &,
       {StringTake[#,Array[{3 # - 1, 3 #} &, 3]], {-4, -6, -6}}],
      2] & /@
  StringTake[s,Array[{9 # - 8, 9 #} &, Floor[StringLength@s/9] ]]//StringJoin
cdecode["=E6=B5=8B=E8=AF=95=E4=B8=80=E4=B8=8B"]

same string

No doubt URLDecode is the more robust way to go. Note that 8 bits per triplet are ignored here. Presumably the 'E' signifies the start of a 3-byte code; that is ignored here and should be checked.
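A minimal sketch of that check, working on byte values rather than on the hex text: `utf8Triplet` is a hypothetical name, and the definition only fires when the first byte starts with the bits 1110 (the 'E') and the other two start with 10, which is the standard UTF-8 layout for a 3-byte sequence.

```mathematica
(* Hypothetical helper: code point of one 3-byte UTF-8 sequence.
   The condition checks the marker bits that the decoder above skips. *)
utf8Triplet[{b1_Integer, b2_Integer, b3_Integer}] /;
  BitAnd[b1, 16^^F0] == 16^^E0 &&   (* first byte is 1110xxxx *)
   BitAnd[b2, 16^^C0] == 16^^80 &&  (* second byte is 10xxxxxx *)
   BitAnd[b3, 16^^C0] == 16^^80 :=  (* third byte is 10xxxxxx *)
 BitShiftLeft[BitAnd[b1, 16^^0F], 12] +  (* last 4 bits of byte 1 *)
  BitShiftLeft[BitAnd[b2, 16^^3F], 6] +  (* last 6 bits of byte 2 *)
  BitAnd[b3, 16^^3F]                     (* last 6 bits of byte 3 *)

(* The bytes E6 B5 8B and E8 AF 95 from the file *)
FromCharacterCode[
 utf8Triplet /@ {{16^^E6, 16^^B5, 16^^8B}, {16^^E8, 16^^AF, 16^^95}}]
```

A triplet that does not have these marker bits is left unevaluated instead of being silently decoded into a wrong character.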

george2079