Tag: "excel"

07/19/13

  09:17:00 am, by Jim Jenal - Founder & CEO   , 1332 words  
Categories: SCE, Ranting, 2012

Friday Geekfest - If you write it, will they come? SEO Insights for Solar

Today’s post has nothing to do with solar, except in a sort of meta sense. You see, the process of creating these posts is not really driven by some Grand Plan but rather what is going on now or at least strikes us as interesting.  But of course what we think is interesting is in no way guaranteed to interest You, the reader.  Every blogger, heck, every writer wants to create content that readers will want to read.  Finding that match is the key - but how?

We came across an interesting article on point the other day thanks to SEOMoz.  The article, titled: Two Amazing Bar Charts: % Content Consumption, % Share of Search, introduced us to a geeky way of attacking the question of what content connects with our readers.  The author, Avinash Kaushik, suggested that you could gain some useful insight into how well you were doing by combining data from two sources: your website analytics provider (Google Analytics in our case) and your content management system (in our case, b2Evolution).

By summarizing your posts by categories and then totaling unique page views for those categorized posts, you could see how much content you are creating in each category and how much those posts are driving traffic.  If you are on track, there should be a direct correlation - categories where you are writing more are getting the most visits.  But are they?  We decided to run the analysis for one year’s worth of posts, from last July through the end of June.

Seems simple enough, here’s where it gets geeky.  (Note: if you just want to see the results, feel free to skip down to the cool chart at the end of all the geek-speak.)

Talk to me, B2!

Way back when we started this blog we had to decide what software to use.  For better or worse, we chose b2Evolution, although most of what you see out there seems to be on Wordpress.  One of our biggest complaints about b2Evo is that it provides pretty meager statistics - and nothing like the data summary that we needed.  Of course the database itself must have what we needed, it was just a question of figuring out where.  Oh, and how would be important too!

The b2Evo database contains 52 separate tables - which one(s) have the information that we need?  After a fair amount of poking around and a couple of false starts, we determined that a query combining two tables would allow us to retrieve the data that we wanted: a new table that would list the URL of each post during the time period combined with the name of the primary category under which the post was filed.  (We frequently associate multiple categories with a given post, but each post has a primary category and that is what we were seeking for this analysis.)  For the mySQL geeks out there, here is the query we (finally) concocted:

SELECT post_ID, post_datecreated, post_main_cat_ID, post_urltitle, cat_name FROM evo_items__item
INNER JOIN evo_categories ON (post_main_cat_ID = cat_ID)
WHERE post_status = ‘published’ AND
post_datecreated > ‘2012-06-30 23:59:59′ AND
post_datecreated < ‘2013-07-01 00:00:00′
ORDER BY post_datecreated DESC

We were then able to export the results table to a text (CSV) file for importing into Excel.  Now we needed to get our analytics data.

Analyze This

Google Analytics (GA) is a much easier platform from which to extract lots and lots of data.  For this analysis what we needed was to filter by our one year start and end dates and look at the Content - Overview report.  That provided more than what we needed for this analysis, but significantly it gave us the URL of the viewed page (within the runonsun.solar domain) and unique pageviews.  We opted for unique pageviews to filter out those instances where someone visits the same page within a session as what we were after was the post-reader connection and unique pageviews is the best handle on that relationship.

Once again, we were able to export the data into a text file for importing into Excel.

Excelling at Posting

So after importing the two data files into separate tabs of our Excel workbook we were ready to start pairing things up.  But of course, things are never that easy.  For starters, the manner in which our URLs appeared from b2Evo and from GA were not at all the same and Excel’s look-up functionality could not resolve the difference.  Take this example:

b2Evo do-we-really-need-more-failed-solar-companies
GA /~runons5/blogs/blog1.php/solnews/do-we-really-need-more-failed-solar-companies

 

To match them up, we wanted to slice off everything to the right of the last ‘/’ and use that to do our matching.  Turns out that is a more complicated task than you might imagine.  A little probing on the Internet turned up this, somewhat counter-intuitive solution:

=RIGHT(A2, LEN(A2) - FIND("|",SUBSTITUTE(A2,"/","|",LEN(A2) - LEN(SUBSTITUTE(A2,"/","")))))

(I warned you this was going to be a geekfest!)

A new column was created in the GA data table and populated with the results of that lovely formula and voilà, we had URLs in both tables that would yield a match. Now our b2Evo data could include a value for the unique pageviews from each post that was written during the previous twelve months.  From there it was a simple matter of creating a pivot table based on categories, counting the number of posts in each category (listed as a percentage of the column total) and summing the unique pageviews in each category (again, displayed as a percentage of the column total).  We filtered for the top ten categories (by pageviews) and that accounted for 110 separate posts and 13,659 unique pageviews.

Put all that into a chart and here is what you get.

Drum Roll Please

Ta da…

Post categories vs unique pageviews

(Please click on this and view the full size image to do this justice!)

So having gone through this exercise, what does this tell us?  Well, my two largest categories for posts - Solar News and Ranting (which is where this post is categorized) - are also my two largest categories for views - so that is consistent with what we would like to see.  (Ok, perhaps a little less ranting - good luck with that!)

On the other end of the scale, we are clearly writing more about both Commercial Solar and Energy Storage than our present viewership supports.  That’s ok, we are looking to build readers in those vital categories so we are willing to be leading a bit there.

But look at the two major outliers: CSI 2012 and SCE.  The CSI 2012 category only had four posts - the three part data analysis series of CSI data that we did last year, plus a teaser post - yet it drove more than 15% of our views!  Perhaps rather than doing this once a year, which is what we have done the past two years, we should do this every six months?  That’s a lot more work - those posts are very labor intensive and require rigorous editing - but in terms of what our audience is reading, it would seem to be time well spent.

The SCE category is more interesting still since it really doesn’t involve as much effort as the CSI category does.  There were three posts in that category yet they accounted for more than 14% of all views!  Wow - didn’t see that coming!  Looks like it is time for us to spend a little more time looking at SCE and putting informative posts out there for our readers.  Message received!

Perhaps the most surprising result from this analysis was that we were surprised by the results.  We like to think that we have a pretty good handle on what our audience wants to see - and for the most part the above graph suggests that we do - but still there were surprises.  Like any other business metric, knowing what your customers want from your blog - or the products that you are selling - is key to being successful.

If you are a solar company writing a blog and you are serious about better connecting with your audience - colleagues and potential clients alike - you might want to try performing a similar analysis and see where it leads you.

Geek on!

 Permalink

05/25/12

  08:31:00 am, by Jim Jenal - Founder & CEO   , 1364 words  
Categories: All About Solar Power, Westridge PAC Project

Solar Eclipse - A Data Geek's View

On Sunday, May 20, 2012, an annular eclipse was visible in large parts of the United States, including here in Pasadena. To the data geeks here at Run on Sun, that gave us a view to the eclipse that was just a bit different - a Data Geek’s view, if you will.

One of the great things about having a solar power system that is monitored - particularly with a system like Enlighten from Enphase - is that it automatically captures a great deal of data and sometimes that data can be used to look at some surprising things - like a solar eclipse! So this will be a brief tutorial on how to take a raw data set and turn that into a graphical representation of the solar eclipse. (We used Excel 2010 for our analysis, but any tool that allows for the creation of pivot tables should allow you to do the same thing.)

Hypothesis

We started out with a simple hypothesis - since the annular eclipse covered as much as 80% of the sun’s surface (as seen from SoCal), we would anticipate that there would be a significant impact on the output of a solar power system during the eclipse. If we looked at the data from the days prior to the eclipse (excluding any days that were dominated by cloud cover during the relevant time when the eclipse occurred) we should see a noticeable decline in system output relative to the other days. If we graphed those days on the same graph, the eclipse should be an obvious outlier.

But to test our hypothesis, we would need to gather some data!

Data, data, data…

Enphase Enlighten reportsThe Enphase Enlighten website allows a system owner to access a variety of reports about how the system has been performing. One of those reports is titled “Site Recent Power Production” and it provides a comma separated values (csv) file that contains the site’s power output over five minute intervals for the preceding seven days.

Excel will open the .csv file and once you have it opened, you can begin the process of massaging the data into the form needed to “see” the eclipse. (If you care to follow along, you can download the dataset that we used by clicking on this link.)

raw data

The raw data consists of two columns: a time stamp value and the power produced at that moment in time in Watts. The time stamp combines the full date with the time of day, and as you can see here, the time increases in five minute intervals. Altogether, our week’s worth of data consists of 2,016 data pairs which we will ultimately want to graph.

However, we want to partition the data into seven discrete days (which will then be graphed on top of each other), with hours of the day and five-minute intervals on the X-axis. To achieve that end, we need to add some columns based on the time stamp value. Fortunately, Excel has a number of built-in functions that will let us do exactly what we want to do with a minimum of effort.

 

By-the-way, it is interesting to note that the system is already generating a significant amount of power by 5:45 in the morning! The system being monitored has 209 solar panels which means that at 5:45 we were receiving, on average, just over 2 Watts per panel.

This ability to produce power at very low levels is another way in which a microinverter system helps to maximize overall system yield.

adding hour and minute columns

We continue this process by creating columns for the hour and the minute. Fortunately, this is nothing more than using the built-in “hour” and “minute” functions from Excel.  By selecting our date/time field with hour and minute, we can establish the remaining data fields that we will need.

However, we now have lots of rows of data to process and making sense of that can be difficult—difficult, that is, unless you use pivot tables.

Pivot Table to the Rescue!

Now we are ready to create our pivot table. For those of you who do not know, pivot tables are a data geek’s best friend, and the means by which massive amounts of data gets distilled into a meaningful table - and ultimately a graph. There are lots of things that you can do with a pivot table but the one we will be using here is very simple.

On the insert tab choose pivot table. For the data source select the power column and the three new columns that we just created. You can embed the table on the current worksheet or in a new one.

our pivot table - order out of chaos

Then do the following:

  • Select the “Day” field and drag it to the “Column Labels” box.
  • Select the “Hour” field and drag it to the “Row Labels” box.
    • Do the same thing with the “Minute” field.
  • Now drag the “Power Produced” field to the “Values” box. Note that it gets labeled as “Count of Power Produced (W)” in the box and all of the data values are “1″s. What Excel is doing is counting how many power values there are for each five-minute interval - not what we want.
    • To change it, click on the down arrow next to the field and select “Value Field Settings…” Where it says “Summarize value field by” change from Count to Sum and click OK.
  • Click on the heading “Column Labels” and type Days.
  • Do the same for “Row Labels” and enter Hours/Minutes.
  • We don’t need the “Grand Total” so right mouse on that heading and select “PivotTable Options…” Click on the “Totals & Filters” tab and deselect both of the Grand Totals options.

If you’ve followed along correctly, you should have a pivot table that looks like the one on the right.

Click on the drop down filters and deselect the “blank” value for both Days and Hours/Mins. Also, we only want hours where the total power exceeded zero watts, so select Value Filters from the Hours/Minutes drop down. Select Greater Than… and enter 0.

Now we are ready to create our graphs.

A Picture is Still Worth a Thousand Words!

We have distilled our 2,016 rows of raw data into a more manageable form, but it is still hard to “see” our eclipse - time to graph this and let the data speak for itself!

first eclipse graph

From the PivotTable Tools menu, choose options and then click on PivotChart. Choose a Line chart from the pop-up. Right mouse on the chart and select “Move Chart…” to move it to a new worksheet. This looks pretty good, but Day 5 is very noisy - let’s exclude it. Click on the Day drop down and deselect Day 5. That’s our eclipse graph on the right.

Can you “see” the eclipse now? The chart is still quite busy; let’s exclude some additional days to see if we can make this cleaner.

Return to the Day drop down and exclude every day except days 3 & 7. Now our new graph just tracks two days and we excluded a lot of distractions from what we were seeking.

second eclipse graph

Here’s our new graph:

The graph clearly illustrates that these two days saw almost identical power production for the vast majority of the day, until late in the afternoon. The timing of that sole difference coincides with our eclipse - can we zoom in for a closer look?

Easily enough we can - all we need to do is limit the time span displayed to the time period associated with the eclipse. The eclipse began around 5:25, reached its maximum roughly an hour later and was over by an hour after that. By selecting the Hour drop down filter we can limit the time period to the four hour window between 5 and 8.

Here is our final “view” of the eclipse:

final view of the eclipse

Our final graph clearly depicts the impact of the eclipse on the power output of our solar array.

Monitoring systems are generally considered an important part of commercial solar systems since they help the system owner maximize their investment. However, this is just one example of how a monitoring system can also be used as a tool to observe and quantify other real world phenomenon - making this an important adjunct to a solar power system at a school - like this one at the Westridge School for Girls in Pasadena.

Help Save Rooftop Solar!

California Utilities are trying to kill rooftop solar on your home by gutting net metering - but you can stop them!
Join the fight by signing the petition today!

Sign the Petition!

Search

Jim Jenal is the Founder & CEO of Run on Sun, Pasadena's premier installer and integrator of top-of-the-line solar power installations.
Run on Sun also offers solar consulting services, working with consumers, utilities, and municipalities to help them make solar power affordable and reliable.

Ready to Save?

Let’s Get Started!

We're Social!



Follow Run on Sun on Twitter Like Run on Sun on Facebook
Run on Sun helps fight Climate Change
Complete website engine