Thursday, July 7, 2011

The easiest way to parse Google Spreadsheets with PHP

After tinkering with the available publishing options for Google Docs Spreadsheets, I landed on the following.

$jsonData = file_get_contents("");
var_dump(json_decode($jsonData, true));

This makes the assumption that your spreadsheet content is not private and is OK to "publish as a web page".

Wednesday, March 9, 2011

od6 and Finding Other Worksheet IDs

If you dissect the URLs that Google provides when publishing a Doc to the web, you may come across "od6".

"od6" is the worksheet-id of the 'default' sheet within your Google Spreadsheet. That is, the leftmost sheet visible within the interface.

If you have tried publishing all worksheets within the Doc, or any specific worksheet other than the default, as an ATOM or RSS feed, you would have noticed that only the first/default sheet is included in the feed.

To further complicate things, the interface that you use to publish from Google Docs grays-out the drop-down that otherwise would allow you to choose which sheet to publish when ATOM or RSS is selected as your output format. I find this very confusing, don't you?

To address this, you can grab the link that Google provides for your ATOM or RSS feed and edit the "0d6" part to represent the worksheet-id of the worksheet you would like to see. What do you change it to? Well, that's the hairy part. There doesn't seem to be a simple way to find that info.

To determine the worksheet IDs for your spreadsheet, you have to find the data within on of your feeds.
Do this:

Ask for your worksheet feed by going to:

Review the results and look for the various nodes describing your worksheets. Within each node, you should find the actual worksheet ID for each respective sheet. Worksheet IDs always start with "od" which  will usually be followed by a number.

Use the desired worksheet ID in place of "od6" and whalla! All better!

JSON and Google Docs

If you're following along, at this point I am using a PHP script that I wrote which takes the CSV output from Google Docs and spits out a lovely JSON object.

While doing additional research for the project, I came across a page that explained how to do this without my local PHP proxy. It seems that the bright minds at Google already addressed this issue and provided a way to attain a JSON feed of any document, AND they wrap the returned data in SCRIPT tags. You can also add a parameter to declare the name of the callback function that should be called when the data is finished loading.

The example in the documentation uses Google Calendars, but with some trial and error, I constructed a URL that works with the Spreadsheets....

Putting that as the SRC attribute of our script tag will initiate the transaction and will call my custom JavaScript function "dataLoaded()" once the data is finished loading.

You can change which individual worksheet is retrieved by changing the number "1" in the URL.
Changing the "1" to a "0" will retrieve the first/leftmost sheet and changing it to "2" will grab the third sheet from the left. Make sense?

There are inherent problems, though, with that. If you change the order of the sheets at any time down-the-road, your scripts will ask for the wrong sheet and will break. But don't worry, there is a solution for that as well. But it's a little convoluted, so it shall require its own post.

From here on out, I will use this method of retrieving data from Google Doc Spreadsheets for this project.
If that changes, I'll be sure to post about it.

Google Spreadsheets - Publish to the web

A key component to this framework will rely heavily on the "Publish to the web" function within Google Docs.

The concept behind this feature is pretty straight forward: You select which sheets to publish and the format that you would like, and Google provides a link to said output. You can even publish a single spreadsheet cell if you choose.

I experimented with several combinations of output format to determine which would meet the needs of this project.

Here's what I learned about the various output formats:

(Web Page)  - will allow you to publish all sheets (worksheets) within your spreadsheet or any one sheet.
The output is intended to be used as-is. Parsing out individual pieces isn't really worthwhile and restyling any of the elements is further complicated due to the heavy usage of inline styles.

(HTML to embed in a page) - This just spits out the same page as the previous option, but it wraps it in an iFrame. Meh.

(CSV) - Comma separated values. Parsing a CSV stream can be a real pain, but there are functions within PHP that can make it more straight forward. I didn't spend much time with this option because it seemed too likely to cause problems in the future should I decide to change the order of my columns.

(TXT) - Plain Text. This obviously has limited implementations. I found it perfect for a single-cell output.

(PDF) - Didn't try this one.

(ATOM & RSS) - These are separate output options, that behave very similarly. This is where I spent a lot of time because these feeds are typically easy to parse and work with. They do have a good amount of overhead (data in the feed that we will ignore), but that's a small price to pay for the benefits of these formats. It should be noted that these formats allow output either as 'List' or 'Cells'. For this project, I am using 'List'.

While this worked fine at first, I started noticing some strangeness as my testing progressed. The Goole interface, when this output format is selected, de-activates the sheet-selection drop-down. In other words, if you need to get a feed of any sheet other than the first sheet, you're out of luck. Rats!

I have since found a solution to this, but it is convoluted and will require its own post.

Since I NEED to get feeds of individual worksheets within my spreadsheet, I will need to engineer a different solution. So after some further testing, I found that I can get individual worksheet feeds to happend using the CSV output.

Now, anyone who has spent time parsing CSV in javascript can tell you that it can be a pain in the ass. And since I will have to send the feed through a local proxy (to avoid cross domain scripting restrictions), I might as well convert it to a JSON object before I spit it back out. And that's what I did. I wrote a simple PHP script that takes two arguments. The first is the feed url, the second is whether to use the first row as column headers.

This actually worked great! I could get at any worksheet that I needed, and the JSON data was super easy to work with within JavaScript.

But wait, there's more!

Wednesday, March 2, 2011

Cross domain issues

Ideally, this solution will be simple and self contained.
I wanted to pull this off without any locally hosted server-side scripts. But as I thought this through, it became apparent that I would need at least one php file to provide proxy and authentication functionality.

So I built a basic php file that would take the CSV that Google Docs spits out, and returns a JSON object. This solved the xhr problem and allowed me more options in addressing a strange limitation of the Google Docs publishing system.

I am willing to accept that this system may need one php file to make it work.
That being said, I stumbled apon this beauty while researching for this project.
By using a predefined URL with your spreadsheet's id and a few extra parameters, Google will return a json object wrapped in <script> tags. Wrapping the returned data gets us past the cross-domain issues. One of the extra parameters defines the callback function that you want called when the data is finished loading. This should be a great feature for read-only content.

I still need to determine how to handle authentication for read-write content situations.

Thursday, February 3, 2011

Google Docs CMS

More often than not, I find myself in need of a quick site that allows an otherwise tech un-savy client the ability to make changes to the content.

In the past, I've tried popular open source CMSs like WordPress, Drupal and Joomla. I've even programmed a few custom CMSs from scratch. But nothing ever seemed to fit the bill. There was either way too much overhead, too much setup time and/or the content management part was too complex for the client.

So having done all that, I was intrigued by a post (insert link) that claimed to use Google Spreadsheets in a novel way to allow a client to update a part of a webpage.

The concept seemed promising. Inherently limited due to its infancy, but very promising nonetheless.

I decided to see if I could develop it further to establish a system that would meet my recurring needs:  Minimal Setup (preferably none), No Database Setup or Administration, No Backend Programming, and Minimal Learning Curve For The Client.

I decided to see how far I could take the idea. And here's what I've learned.