Short takes: Windows Phone screenshots and force closing apps; Android static IP

I’m clearing down my browser tabs and dumping some of the things I found recently that I might need to remember again one day!

Taking a screenshot on Windows Phone

Windows Phone 7 didn’t have a screenshot capability (I had an app that worked on an unlocked phone) but Windows Phone 8 let me take screenshots with Windows+Power. For some reason this changed in Windows Phone 8.1 to Power+Volume Up. The phone does tell you the new combination when you try the old one but it’s worth noting…

Some search engines are more helpful than others

Incidentally, searching for this information is a lot more helpful in some search engines than in others…

One might think Microsoft could surface its own information a little more clearly in Bing, but there are other examples too (Google’s built-in calculator, cinema listings, etc.).

Force-quitting a Windows Phone app

Sometimes, apps just fail. In theory that’s not a problem but, in reality, they need to be force-closed. Again, Windows Phone didn’t previously allow this but recent updates have enabled a force-close: hold the back button down, then tap the circled X that appears to close the problem app.

Enabling a static IP on an Android device

Talking of long key presses… I recently blew up my home infrastructure server (user error with the power…) and, until I sort things out again, all of our devices are configured with static IP configurations. One device where I struggled to do this was my Hudl tablet, running Android. It seems the answer is to select the Wi-Fi connection I want to use, but to long-press it, at which point there are advanced options to modify the connection and configure static IP details.

Short takes: searching in Outlook; duplexing in Excel; merging in Word; and going wild in Salesforce

This week I’ve mostly been… working in pre-sales. Consequently, this is perhaps not the most exciting blog post I’ve written… but hey, it’s a post and there haven’t been many of them recently!

First up: searching Outlook

Since I changed jobs in April, my email volume has increased by 300x. My mail archive has more messages in it as we approach the end of June than it did for the whole of 2012, and most of them have been sent/received in the last three months.  In short, being able to quickly and accurately search Outlook is important to me.

Microsoft’s website has some good advice for narrowing search criteria for better results in Outlook – for example, if you’re looking for that email from Mark Wilson with the attachment you needed? Try from:"Mark Wilson" hasattachment:yes.
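A few more of the search keywords I find useful (these follow the standard Outlook/Windows Search query syntax, though exact keyword support may vary between Outlook versions):

from:"Mark Wilson" hasattachment:yes
subject:proposal received:last week
to:"Mark Wilson" read:no
sent:yesterday category:"pre-sales"

Keywords can be combined freely on one line, as in the first example above.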

Next: opening two Excel workbooks side by side

If someone sends you a spreadsheet that you need to complete, and there’s information to pull from another spreadsheet, it can be a nuisance to keep switching back and forth between windows inside the application. The answer is to use Task Manager (taskmgr.exe) to open a new copy of Excel so you now have two running processes.  Each one can be used to open a different workbook (e.g. on different monitors) and contents can be copied back and forth.
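Alternatively, later versions of Excel can be forced to start as a second process from the Run dialog or a command prompt (the /x switch is documented for recent Excel releases, so this may not work on older versions):

excel.exe /x

Each process then opens its own workbooks, just as with the Task Manager approach.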

Then: merging revision comments in Word

Perhaps you work in a team where instead of collaboratively editing one document, people each create their own versions with their own comments? Thankfully, Word 2010 (and probably other versions too) can merge the comments and changes into a single document. That single feature saved me hours this morning…

Finally: wildcards in Salesforce.com reports

My final tip from “Mark’s exciting week in pre-sales” (I jest) was gleaned whilst trying to create a report in Salesforce.com to show my team’s pipeline. I can’t rely on opportunities being correctly tagged, so I needed a report that used searches on a number of fields (and a filter to apply Boolean logic) but it was picking up some false positives. The problem was that one of the search criteria was also a partial match on some other results. By changing the “contains” criterion from thing to thing*, I got just the results that started with “thing” and not the ones that merely included “thing” (like “something”).

That explanation is not as clear as I’d like, but I don’t want to spill the beans on some proprietary information – just take a look at the Salesforce.com advice for refining search using wildcards.

Internet Explorer search provider for markwilson.it

Earlier today I had a go at creating a new search provider for Internet Explorer (IE) 7.0 so that I can search the markwilson.it website for information. It’s not of much practical use to anyone except me but it is incredibly easy to achieve and works well. This is the resulting OpenSearch XML that IE generated for me:

  <?xml version="1.0" encoding="UTF-8" ?>
  <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
    <ShortName>markwilson.it</ShortName>
    <Description>markwilson.it provider</Description>
    <InputEncoding>UTF-8</InputEncoding>
    <Url type="text/html" template="http://www.markwilson.co.uk/blog/index.php?s={searchTerms}" />
  </OpenSearchDescription>

There’s more information on adding search providers to IE 7 using OpenSearch 1.1 at the IEBlog.

Googling for information

Just before Christmas, someone showed me how to use Google (yes… the search engine) to convert values between units (e.g. kilometres into miles). I didn’t know that feature was there in Google and it turns out there are many more useful search features too – things like the weather forecast, films at the local cinema, etc. (as well as the well known stuff like definitions, inbound links and Google’s cache).

What happened to not being evil…

A few weeks back, I saw the number of browser visits to this site drop dramatically overnight whilst RSS subscriptions remained constant. Thankfully, traffic is now back up to the previous levels and there could be many reasons for this but I have to suspect it’s down to Google’s latest round of cat and mouse with the SEOs.

Webstats for the last few weeks, showing a sharp dip and return to normal and last year's numbers for comparison.

markwilson.it is not a big-shot technology website – just the blog of a guy who works in IT, writes down what he learns, and publishes it for others to read. I don’t charge for that content, largely because I don’t think anyone would pay for it but also because I don’t think that to do so would fit with the spirit of the Internet. I like it when I meet people that read my blog. And I like it when I write something and someone gives something back, like a comment that says it helped them, or that they have something to add to the story. I like it when I find myself in conversation with the public relations agencies of some of the world’s largest IT companies. I also like that the advertising revenues, though still small, have been enough to cover my hosting costs and maybe buy me the odd gadget. Or at least they did until Google made its latest round of changes.

Google is trying to penalise paid links and, at the time of writing, I have a few (clearly marked under the heading of sponsors). There’s nothing wrong with what Google is trying to achieve (increasing the quality of the results in its index) but it’s the way the company goes about it. I sell advertising here because I need to (somehow) monetise this site (although if I convert that into an hourly wage rate, I’m sure it will make me cry). Ironically, it seems to be OK to carry Google’s paid ads but not anybody else’s – even if they are relevant.

Prominent Google blogger, Matt Cutts, said (in 2005) that:

“Reputable sites that sell links won’t have their search engine rankings or PageRank penalized […] However, link-selling sites can lose their ability to give reputation (e.g. PageRank and anchortext).”

That’s fair enough. It seems that I can take some revenue from selling links but it won’t help the sites that I link to gain PageRank; however, if the paid links are relevant, there is a chance that people reading my site will click through to them and everyone’s a winner. Except that now that seems to have changed and selling links can hurt Google rankings. For what it’s worth, I have a disclosure notice and the advertising, sponsorship and other forms of compensation received do not influence the editorial content on this site. I also use rel="nofollow" tags where relevant to ensure that I follow Google’s directions (although I acknowledge the contribution that comments make to the blogosphere by removing the rel="nofollow" as appropriate). And after two months of tweaking links to fit Google’s model, this week my biggest sponsor ended our contract prematurely because they are dropping this form of advertising altogether.

Thanks for nothing Google. Cutts may be right when he asserts that:

“[…] Google has the right to do whatever we think is best (in our index, algorithms, or scoring) to return relevant results.”

but now they are hitting the small guys too. I can’t rely on AdSense alone. It varies too wildly (and has been declining in recent months, suggesting to me that people are spending less on Internet advertising – probably a reflection on the state of various western economies) and now you’ve started to hit the only form of regular income that this site has. What happened to the “don’t be evil” corporate motto?

I will continue to blog about things I find interesting. Maybe some other people will find it interesting too. Perhaps they will link back here and maybe the number of visitors will start to climb again as I gradually increase my placement in the Google index (however I look at things, I’m still 34.95% up on unique visits so far this month, compared to the same period last year, 47.71% up in pageviews with average pageviews and time on site also on the up, and a falling bounce rate – so the metrics all look good, it’s just the financials that are suffering). Until then, I guess I won’t be buying the MacBook Pro that I’ve had my eye on for so long.

Adding a meaningful description to web pages

One of the things that I noticed whilst reviewing the Google results for this site, was how the description for every page was shown using the first text available on the page – mostly the alternative text for the masthead photo (“Winter market scene from the small town of Porjus in northern Sweden – photograph by Andreas Viklund, edited by Alex Coles.”):

Screenshot showing duplicate descriptions

Clearly, that’s not very descriptive and so it won’t help much with people finding my site, linking to me, and ultimately improving the search engine placement for my pages, so I need to get a decent description listed for each page.

The WordPress documentation includes a page on meta tags in WordPress, including an explanation as to why they aren’t implemented by default (my template did include a meta description for each page which included the weblog title and tagline though). Even though meta tags are not a magic solution to search engine placement, I wanted to find a way to add a meaningful description for each page using <meta name="description" content="descriptionofcontent" /> and also <meta name="keywords" content="pagecontext" /> (although it should be noted that much of the available advice indicates that major search engines ignore the keywords tag due to abuse). Fortunately there is a WordPress plugin which is designed to make those changes – George Notaras’ Add-Meta-Tags. There’s plenty of speculation as to whether or not Google actually uses the description meta tag but recent advice seems to indicate that it is one of many factors involved in the description shown in search results (although it will not actually affect positioning).

I already had meta tags in place for content-type, robots, and geolocation but I added some more that I was previously using HTML comments for:

<meta http-equiv="content-language" content="en-gb" />
<meta name="author" content="Mark Wilson" />
<meta name="generator" content="WordPress" />
<meta name="publisher" content="markwilson.it" />
<meta name="contact" content="webmaster@markwilson.co.uk" />
<meta name="copyright" content="This work is licenced under the Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales License. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-sa/2.0/uk/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA" />

Incidentally, a comprehensive list of meta tags and an associated FAQ is available at Andrew Daviel’s Vancouver webpages.

After checking back a couple of weeks later, the same search returns something far more useful:

Screenshot showing a meaningful description in the search results

Unfortunately my PageRank has dropped too, and it’s possible that the duplicate entries for http://www.markwilson.co.uk/ and http://www.markwilson.co.uk/blog/ are causing the site to be penalised – Google’s Webmaster guidelines say “don’t create multiple pages, subdomains, or domains with substantially duplicate content”. The presence of those duplicate entries is actually a little odd, as checking the server headers for http://www.markwilson.co.uk/ reveals an HTTP 301 response (moved permanently), redirecting to http://www.markwilson.co.uk/blog/. Of course, it could be down to something entirely different, as PageRank is updated infrequently (there’s more information and links to some PageRank analysis tools at RSS Pieces but I use Page Rank Checker) and there have been a lot of changes to this site of late… only time (and building the volume of backlinks to http://www.markwilson.co.uk/blog/) will tell.

Coding horror

I just stumbled upon Jeff Atwood’s Coding Horror blog and it’s very interesting reading (even for those of us who write very little code). The article that I found was commenting on Jakob Nielsen’s latest tome on web usability. Although Nielsen makes some valid points, the comments are worth a read as they highlight some of the real compromises that website designers and website developers have to make.

I’m sure I could lose many hours reading Jeff’s writing but they all seem well-informed, to the point and interesting… these were just a few of the posts that grabbed my attention this afternoon:

  • When in doubt, make it public looks at how Web 2.0 is really just creating websites out of old Unix commands and that the new business models are really about taking what was once private and making it public!
  • SEOs: the new pornographers of the web looks at how much of the real search engine optimisation is just good web development and that many of the organisations focusing on SEO are all about money and connections – whether or not the assertions that Jeff makes in his post are correct, it’s an interesting view and certainly seems to have a lot of SEOs fighting their corner.
  • Why does Vista use all my memory? looks at Windows Vista’s approach to memory management (a feature called SuperFetch) and how grabbing all the available memory to use as a big cache is not necessarily a bad thing.

Removing duplicate search engine content using robots.txt

Here’s something that no webmaster wants to see:

Screenshot showing that Google cannot access the homepage due to a robots.txt restriction

It’s part of a screenshot from the Google Webmaster Tools that says “[Google] can’t currently access your home page because of a robots.txt restriction”. Arghh!

This came about because, a couple of nights back, I made some changes to the website in order to remove the duplicate content in Google. Google (and other search engines) don’t like duplicate content, so by removing the archive pages, categories, feeds, etc. from their indexes, I ought to be able to reduce the overall number of pages from this site that are listed and at the same time increase the quality of the results (and hopefully my position in the index). Ideally, I can direct the major search engines to only index the home page and individual item pages.

I based my changes on some information on the web that caused me a few issues – so this is what I did and, by following these notes, hopefully others won’t repeat my mistakes. There is a caveat though – use this advice with care – I’m not responsible for other people’s sites dropping out of the Google index (or other such catastrophes).

Firstly, I made some changes to the <head> section in my WordPress template:







Because WordPress content is generated dynamically, this tells the search engines which pages should be in, and which should be out, based on the type of page. So, basically, if this is a post page, another single page, or the home page, then go for it; otherwise follow the appropriate rule for Google, MSN or other spiders (Yahoo! and Ask will follow the standard robots directive), telling them not to index or archive the page but to follow any links and, additionally, for Google not to include any Open Directory information. This was based on advice from askapache.com but amended because the default indexing behaviour for spiders is index, follow (or all), so I didn’t need to specify specific rules for Google and MSN as in the original example (but I did need something there, otherwise the logic reads “if condition is met do nothing else do something” and the do nothing could be problematic).
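The template code itself hasn’t survived in this post, but based on the description above it would have looked something like this (a reconstruction using standard WordPress conditional tags – the meta values follow the description rather than being copied from the original):

<?php if (is_single() || is_page() || is_home()) { ?>
<?php /* posts, other single pages and the home page: default indexing applies */ ?>
<?php } else { ?>
<meta name="googlebot" content="noindex,noarchive,follow,noodp" />
<meta name="msnbot" content="noindex,noarchive,follow" />
<meta name="robots" content="noindex,noarchive,follow" />
<?php } ?>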

Next, following fiLi’s advice for using robots.txt to avoid content duplication, I started to edit my robots.txt file. I won’t list the file contents here – suffice to say that the final result is visible on my web server and for those who think that publishing the location of robots.txt is a bad idea (because the contents are effectively a list of places that I don’t want people to go to), then think of it this way: robots.txt is a standard file on many web servers, which by necessity needs to be readable and therefore should not be used for security purposes – that’s what file permissions are for (one useful analogy refers to robots.txt as a “no entry” sign – not a locked door)!

The main changes that I made were to block certain folders:

Disallow: /blog/page
Disallow: /blog/tags
Disallow: /blog/wp-admin
Disallow: /blog/wp-content
Disallow: /blog/wp-includes
Disallow: /*/feed
Disallow: /*/trackback

(the trailing slash is significant – robots.txt rules are prefix matches, so if the slash is missing then the directory itself is blocked along with anything whose path begins with that string, but if it is present then only the content within the directory is affected, including subdirectories).
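For example, taking the tags folder (the sub-paths here are hypothetical; comments run from a # to the end of the line):

Disallow: /blog/tags   # blocks /blog/tags, /blog/tags/wordpress and even /blog/tagsoup
Disallow: /blog/tags/  # blocks /blog/tags/wordpress but leaves /blog/tags itself accessible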

I also blocked certain file extensions:

Disallow: /*.css$
Disallow: /*.html$
Disallow: /*.js$
Disallow: /*.ico$
Disallow: /*.opml$
Disallow: /*.php$
Disallow: /*.shtml$
Disallow: /*.xml$

Then, I blocked URLs that include ? except those that end with ?:

Allow: /*?$
Disallow: /*?
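The interplay of those two rules is easy to get wrong, so here’s a quick sketch to check them. It’s a simplified model of Google-style pattern matching (* matches any run of characters, a trailing $ anchors the end, and the longest matching rule wins, with Allow winning ties) rather than a full robots.txt implementation:

```python
import re

def pattern_to_regex(pattern):
    """Convert a Google-style robots.txt path pattern to a regex:
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(url_path, rules):
    """rules: list of ('Allow'|'Disallow', pattern) tuples.
    The most specific (longest) matching rule wins; Allow wins ties."""
    best = None  # (pattern length, verdict)
    for verdict, pattern in rules:
        if pattern_to_regex(pattern).match(url_path):
            key = len(pattern)
            if best is None or key > best[0] or (key == best[0] and verdict == "Allow"):
                best = (key, verdict)
    return best is None or best[1] == "Allow"

rules = [("Allow", "/*?$"), ("Disallow", "/*?")]
print(is_allowed("/blog/index.php?s=robots", rules))  # query string -> False (blocked)
print(is_allowed("/blog/index.php?", rules))          # ends with '?' -> True (allowed)
print(is_allowed("/blog/", rules))                    # no '?' at all -> True (allowed)
```

Running this confirms the intent: URLs carrying a query string are excluded, while URLs ending in a bare ? (and everything else) remain crawlable.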

The problem at the head of this post came about because I blocked all .php files using

Disallow: /*.php$

As http://www.markwilson.co.uk/blog/ is equivalent to http://www.markwilson.co.uk/blog/index.php then I was effectively stopping spiders from accessing the home page. I’m not sure how to get around that as both URLs are serving the same content, but in a site of about 1500 URLs at the time of writing, I’m not particularly worried about a single duplicate instance (although I would like to know how to work around the issue). I resolved this by explicitly allowing access to index.php (and another important file – sitemaps.xml) using:

Allow: /blog/index.php
Allow: /sitemap.xml

It’s also worth noting that neither wildcards (* and $) nor Allow are part of the original robots.txt specification, so the file will fail validation. After a bit of research I found that the major search engines have each added support for their own enhancements to the robots.txt specification:

  • Google (Googlebot), Yahoo! (Slurp) and Ask (Teoma) support allow directives.
  • Googlebot, MSNbot and Slurp support wildcards.
  • Teoma, MSNbot and Slurp support crawl delays.

For that reason, I created multiple code blocks – one for each of the major search engines and a catch-all for other spiders, so the basic structure is:

# Google
User-agent: Googlebot
# Add directives below here

# MSN
User-agent: msnbot
# Add directives below here

# Yahoo!
User-agent: Slurp
# Add directives below here

# Ask
User-agent: Teoma
# Add directives below here

# Catch-all for other agents
User-agent: *
# Add directives below here

Just for good measure, I added a couple more directives for the Alexa archiver (do not archive the site) and Google AdSense (read everything to determine what my site is about and work out which ads to serve).

# Alexa archiver
User-agent: ia_archiver
Disallow: /

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

Finally, I discovered that Google, Yahoo!, Ask and Microsoft now all support sitemap autodiscovery via robots.txt:

Sitemap: http://www.markwilson.co.uk/sitemap.xml

This can be placed anywhere in the file, although Microsoft don’t actually do anything with it yet!

Having learned from my initial experiences of locking Googlebot out of the site, I checked the file using the Google robots.txt analysis tool and found that Googlebot was ignoring the directives under User-agent: * (no matter whether that section was first or last in the file). Thankfully, posts to the help groups for crawling, indexing and ranking and Google webmaster tools indicated that Googlebot will ignore generic settings if there is a specific section for User-agent: Googlebot. The workaround is to include all of the generic exclusions in each of the agent-specific sections – not exactly elegant but workable.
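So, in practice, each agent-specific section carries its own copy of the generic exclusions – something like this (showing just a couple of the directives for brevity):

User-agent: Googlebot
Disallow: /blog/wp-admin
Disallow: /*/feed

User-agent: *
Disallow: /blog/wp-admin
Disallow: /*/feed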

I have to wait now for Google to re-read my robots.txt file, after which it will be able to access the updated sitemap.xml file which reflects the exclusions. Shortly afterwards, I should start to see the relevance of the site:www.markwilson.co.uk results improve and hopefully soon after that my PageRank will reach the elusive 6.

Links

Google webmaster help center.
Yahoo! search resources for webmasters (Yahoo! Slurp).
About Ask.com: Webmasters.
Windows Live Search site owner help: guidelines for successful indexing and controlling which pages are indexed.
The web robots pages.

The search engine friendly way to merge domains

In common with many website owners, I have multiple domain names pointing at a single website (markwilson.co.uk, markwilson.me.uk and markwilson.it). There’s nothing wrong with that (it’s often used to present localised content or to protect a trademark) but certain search engines will penalise sites where it appears that multiple URLs are being used to present duplicate content (hence increasing the link count and inflating the position within the index).

The trick is to ensure that the domains are merged in a manner which is acceptable to the major search engines. It’s generally accepted that the way to do this is to choose the primary domain name (in my case, that’s markwilson.co.uk) and to rewrite any requests received by the web server(s) on any secondary domain names so that they are redirected to the primary domain name (using HTTP status code 301 – moved permanently).

For a site running on an Apache web server with the mod_rewrite module compiled, this is achieved using some directives in the .htaccess file. A description of the required code can be found in various locations, including Brian V Bonini’s 301 permanent redirect article but my site uses some code from a recent .net magazine article to combine the domain name rewrite with the placement of any missing www prefix:

Options +FollowSymLinks
RewriteEngine On
# Match the primary domain without the www prefix...
RewriteCond %{HTTP_HOST} ^primarydomain\.com$ [NC,OR]
# ...or the secondary domain, with or without the www prefix...
RewriteCond %{HTTP_HOST} ^(www\.)?secondarydomain\.com$ [NC]
# ...and issue a permanent (301) redirect to the canonical hostname
RewriteRule ^(.*)$ http://www.primarydomain.com/$1 [R=301,L]

After making the changes, it’s important to check the server headers (e.g. using the SEO Consultants check server headers tool) and ensure that the server is correctly returning an HTTP status code 301 that redirects to the primary domain name, hopefully resulting in an eventual HTTP status code 200 – OK:

#1 Server Response: http://www.secondarydomain.com/
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Tue, 27 Feb 2007 15:19:42 GMT
Server: serverdetails
Location: http://www.primarydomain.com/
Connection: close
Content-Type: text/html; charset=iso-8859-1
Redirect Target: http://www.primarydomain.com/

#2 Server Response: http://www.primarydomain.com/
HTTP Status Code: HTTP/1.1 200 OK
Date: Tue, 27 Feb 2007 15:19:44 GMT
Server: serverdetails
Connection: close
Content-Type: text/html
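The same check can be run from any machine with curl installed – the -I switch sends a HEAD request and prints the response headers, including the Location header on a 301 (the domain names here are placeholders, as above):

curl -I http://www.secondarydomain.com/
curl -I http://www.primarydomain.com/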

More problems since the Blogger upgrade

Since the middle of last year, I’ve been using a sitemap to help spiders to crawl around my little bit of the web. After looking into the various options, the easiest method for me (by far) was to use the XML-Sitemaps generator but the free version is limited to 500 pages. Upgrading to the paid version was the best $19.99 I ever spent as Google now indexes all my pages (therefore increasing my exposure on the ‘net and hence my advertising revenue, which may be small but is worth having).

Unfortunately, when I tried to run the generator yesterday, it refused to index my blog (which, at the time of writing, represents 98.8% of my website’s pages) but luckily (and this is another reason for having the paid XML Sitemap generator), within a few hours I had an answer to my problem from the administrator of the XML Sitemaps Forum – for some reason, my blog pages contained the following tag:

<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">

It’s no wonder that the pages were being skipped as this is a directive for robots that says not to index this page and not to follow links!

Now, I didn’t add that tag… so how did it get into my code? It seems that it was added by Blogger. Blogger uses a system of template tags to generate content, one of which is <$BlogMetaData$>, used to insert all of the blog’s meta data. This has been working for me up to now, but it seems that the upgrade has added the directive for robots not to index my pages, nor to follow links. According to Blogger’s help text, this is only inserted if a blog is set not to be added to listings, but mine has a very definite yes (I do want to be listed):

Screen shot showing that the blog is set to be listed

After replacing the template tag with the correct (manually-inserted) meta data, I was able to crawl the site successfully and create an updated sitemap.

I’m not denying that Blogger is a great system for people starting out with their own blog (and many of the new features are good for more advanced bloggers too) but it seems to me that considering it’s owned by Google (a company with many products that seem to be in perpetual beta) it has more than its fair share of problems and it looks as if a major upgrade has been rushed out of the door (I’ve already had to apologise to subscribers that old posts are creeping back in to the Atom and RSS feeds). I wanted to stay on the old platform for as long as possible but when I logged in a few days back I was given no choice but to upgrade.

Thankfully, my pages didn’t drop out of the Google index (as I upload the sitemap manually and so spotted the error) but this directive may well have affected the way in which other search engines index my site… luckily I caught it within a few days of the offending code being inserted.