StringCases functionality

Question

I have a list of strings in which each element has this form:

{"created_at":"Thu Aug 08 20:53:26 +0000 2013","id":365576505679568896,"id_str":"365576505679568896","text":"Who wears it better? #TBT http:\/\/t.co\/vAXNgiRmYo","source":"web"

I am trying to extract some specific parts of each element. I prepared this function:

extract[string_] := StringCases[string, {
    "\"created_at\":\"" ~~ Shortest@x__ ~~ "\",\"id\":" :> x,
    "\",\"id\":" ~~ Shortest@b__ ~~ ",\"id_str\":\"" :> b,
    ",\"id_str\":\"" ~~ Shortest@c__ ~~ "\",\"text\":\"" :> c,
    "\",\"text\":\"" ~~ Shortest@d__ ~~ "\",\"source\":\"" :> d,
    "\",\"source\":\"" ~~ Shortest@e__ ~~ "\",\"truncated\":" :> e
   }
]

And then

extract/@listOFelements

But for the element above, for example, I get this result:

{"Thu Aug 08 20:53:26 +0000 2013", "365576505679568896", "web"}

Some fields, such as the text flanked by "\",\"text\":\"" and "\",\"source\":\"", are not extracted from the string. How can I get StringCases to detect them?

Your first expression is not closed, it's missing the end part after web... — Murta, Aug 09 '13 at 10:47
Even the first brace is a part of the string!! The string ends with no brace, as I have manipulated it before. — Morry, Aug 09 '13 at 10:51
I can't copy the expression into MMA. It gives an error. Where is the "","truncated":" part? Can you correct it? — Murta, Aug 09 '13 at 10:53
If the whole of your "list of strings" is one string, you don't have a list of strings. — m_goldberg, Aug 09 '13 at 10:54
I am just copying the text directly from MMA to the web! And I have a list of elements, each of which has the same structure as the sample above. Only the information flanked by the parts in the formula differs! — Morry, Aug 09 '13 at 10:59

score 4 · Accepted Answer · answered Aug 09 '13 at 12:46


You say you manipulated the string before and that's why it's missing a brace at the end. It looks like you may have deformed a JSON string, in which case you did yourself a big disservice as such lists can be imported by MMA.

Let's first repair your string:

str = "{\"created_at\":\"Thu Aug 08 20:53:26 +0000 \
2013\",\"id\":365576505679568896,\"id_str\":\"365576505679568896\",\"\
text\":\"Who wears it better? #TBT \
http:\\/\\/t.co\\/vAXNgiRmYo\",\"source\":\"web\"";
repaired = str <> "}"

Now, import the string:

rules = ImportString[repaired, "JSON"];

Extract the information you want:

{"created_at", "id", "source"} /. rules

{"Thu Aug 08 20:53:26 +0000 2013", 365576505679568896, "web"}
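
To apply the same repair-and-import steps to every element of the list, a minimal sketch could look like the following. The helper name tweetFields is hypothetical (it is not part of the original answer), and the sketch assumes that every element is a JSON object missing only its closing brace, as in the sample above.

```mathematica
(* Sample element from the question: a JSON object without its closing brace *)
str = "{\"created_at\":\"Thu Aug 08 20:53:26 +0000 2013\"," <>
   "\"id\":365576505679568896,\"id_str\":\"365576505679568896\"," <>
   "\"text\":\"Who wears it better? #TBT http:\\/\\/t.co\\/vAXNgiRmYo\"," <>
   "\"source\":\"web\"";

(* Stand-in for the list from the question; here it holds just the one sample *)
listOFelements = {str};

(* Hypothetical helper: append the missing brace, import as JSON,
   then look up the requested keys in the resulting list of rules *)
tweetFields[s_String, keys_List] := keys /. ImportString[s <> "}", "JSON"]

(* One list of field values per element *)
tweetFields[#, {"created_at", "id", "text", "source"}] & /@ listOFelements
```

If the elements were truncated in some other way, the repair step s <> "}" would need to change accordingly.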

JSON is a very popular data format, so you would do well to remember it and recognize it where it pops up.

I also note that you've asked eight questions so far and have accepted no answer for any of those questions.

answered Aug 09 '13 at 12:46

C. E.


The reason I did not accept the solutions is that I realized none of them is a general solution, only a solution like the other answer to this post, which is not a consistent one!! – Morry Aug 09 '13 at 13:38
And I have another element which is not importable by this function!! – Morry Aug 09 '13 at 13:52
@Morry Perhaps the problem is the way you're asking your questions, no offense. Provide a representative sample and I will be happy to try to find a general solution. – C. E. Aug 09 '13 at 13:59
Please take a look at this case: https://dl.dropboxusercontent.com/u/76785824/test.nb – Morry Aug 09 '13 at 14:03
Maybe you are right.... You were not offensive at all. :) – Morry Aug 09 '13 at 14:03
@Morry I recognize that, that's from the Twitter API? The Twitter API uses JSON. Now there seems to be missing a bracket or something that causes ImportString to throw an error. Did you do anything to this string? – C. E. Aug 09 '13 at 14:04
If not, I would try to look at the JSON format and compare and see what differs. – C. E. Aug 09 '13 at 14:07
There is a huge string which is imported by xml = URLFetch["https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=BarackObama&count=50", "OAuthAuthentication" -> token] and then split by StringCases[xml, "{\"created_at\"" ~~ Shortest@x__ ~~ "\"}"] to make a list of tweets. Then I face the problem in this post with the second tweet and some others! – Morry Aug 09 '13 at 14:07
It is the complete output https://dl.dropboxusercontent.com/u/76785824/test2.nb – Morry Aug 09 '13 at 14:12
Don't do any splitting, all you have to do is this: ImportString[URLFetch[...],"JSON"] and it should work. It should return a list of tweets. – C. E. Aug 09 '13 at 14:16
@Morry I was able to verify with your second file that ImportString[URLFetch[...],"JSON"] works, or ImportString[xml,"JSON"]. The lesson here is; if they're nice enough to give you data in an importable format, don't ruin it. – C. E. Aug 09 '13 at 14:19
Great! The nested ImportString works perfectly! The very truth is I love the people here as well as Mathematica!! – Morry Aug 09 '13 at 14:22
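
The approach the comments above converge on, importing the whole response without splitting it first, can be sketched as follows. The string xml here is a small stand-in built from the values in the question; in the real case it would be the string returned by URLFetch, which is assumed to be a JSON array of tweet objects.

```mathematica
(* Stand-in for the raw API response: a JSON array with a single tweet object.
   In practice xml would come from URLFetch[...] as described in the comments. *)
xml = "[{\"created_at\":\"Thu Aug 08 20:53:26 +0000 2013\"," <>
   "\"id\":365576505679568896,\"source\":\"web\"}]";

(* Import the complete response; each tweet becomes a list of rules *)
tweets = ImportString[xml, "JSON"];

(* Pick the same fields from every tweet *)
({"created_at", "id", "source"} /. #) & /@ tweets
```

Mapping over tweets keeps the lookup separate for each tweet, so the result has one list of field values per tweet.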

score 1 · Answer 2 · answered Aug 09 '13 at 11:28


Assuming that your first expression is text, I prefer the RegularExpression approach, as follows:

str = "{\"created_at\":\"Thu Aug 08 20:53:26 +0000 \
2013\",\"id\":365576505679568896,\"id_str\":\"365576505679568896\",\"\
text\":\"Who wears it better? #TBT \
http:\/\/t.co\/vAXNgiRmYo\",\"source\":\"web"

re = RegularExpression;
StringCases[str, {
    re["\"created_at\":\"(.+?)\""] -> "$1",
    re["\"id\":(.+?),"] -> "$1",
    re["\"id_str\":\"(.+?)\""] -> "$1",
    re["\"text\":\"(.+?)\","] -> "$1"
}]

You get:

{"Thu Aug 08 20:53:26 +0000 2013","365576505679568896","365576505679568896","Who wears it better? #TBT http:\/\/t.co\/vAXNgiRmYo"}
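
If longer, more specific delimiters are wanted, one possible variant (a sketch, not part of the original answer) is to put the delimiters into lookbehind and lookahead assertions. StringCases does not return overlapping matches by default, so when one pattern consumes a delimiter that the next pattern also needs, the second pattern cannot match. Assertions check the delimiters without consuming them. The sample string below is rebuilt from the question's data.

```mathematica
(* Sample element from the question, written as one Mathematica string *)
str = "{\"created_at\":\"Thu Aug 08 20:53:26 +0000 2013\"," <>
   "\"id\":365576505679568896,\"id_str\":\"365576505679568896\"," <>
   "\"text\":\"Who wears it better? #TBT http:\\/\\/t.co\\/vAXNgiRmYo\"," <>
   "\"source\":\"web\"";

re = RegularExpression;

(* (?<=...) requires the left delimiter before the match and (?=...) requires
   the right delimiter after it; neither becomes part of the match, so
   neighbouring fields no longer compete for the same characters *)
StringCases[str, {
  re["(?<=\"created_at\":\").+?(?=\",\"id\":)"],
  re["(?<=\"id\":).+?(?=,\"id_str\":\")"],
  re["(?<=\"id_str\":\").+?(?=\",\"text\":\")"],
  re["(?<=\"text\":\").+?(?=\",\"source\":\")"]
}]
```

This still treats JSON as plain text, so a field value that itself contains one of the delimiter sequences would be cut short.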

answered Aug 09 '13 at 11:28

Murta


Something to consider: (25677) – Mr.Wizard Aug 09 '13 at 12:06
The fact is that my text is longer, and there would be some duplicates with this procedure. I would like to use more specific patterns like these: re["\"created_at\":\"(.+?)\",\"id\":"] -> "$1", re["\"id\":(.+?),\"id_str\":\""] -> "$1", re["\",\"text\":\"(.+?)\",\"source\":\""] -> "$1", re["\"id_str\":\"(.+?)\",\"text\":\""] -> "$1", which fail to produce the proper result! – Morry Aug 09 '13 at 12:18
Can you give us a better example in your question? – Murta Aug 09 '13 at 14:05
