Planning and deploying Microsoft Office SharePoint Server 2007

This content is 17 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

It’s been a few months since I attended a Microsoft event but last night I made the trip to Reading for a session on planning and deploying Microsoft Office SharePoint Server. As the event was hosted by a vendor (rather than by one of the IT Professional technical evangelist team), I was initially unsure of how useful it would be, but Steve Smith (an MVP who part-owns the consultancy Combined Knowledge and is very active in the UK SharePoint User Group) was an extremely knowledgeable and engaging speaker. Furthermore, he took the time during the mid-session break to answer each of my (many) questions, for which I’m extremely grateful! What follows is a summary of the content from last night’s event, combined with links and additional information from my own research.

Firstly, it’s worth clarifying that SharePoint is a brand for a group of products and technologies, and the two major products are:

  • Windows SharePoint Services (WSS) 3.0.
  • Microsoft Office SharePoint Server (MOSS) 2007.

It is important to note that WSS is a free-of-charge download for licensed Windows Server users, whereas MOSS requires separate licenses (server and client access). It should also be noted that MOSS replaces the previous SharePoint Portal Server (SPS) 2003 product and the change of name reflects that SharePoint is more than a portal – it’s a collaboration platform.

WSS integrates with Windows Server 2003 (at the time of writing, there are still some issues with the beta of Windows Server codenamed Longhorn), Internet Information Services (IIS) and the Microsoft .NET Framework 3.0 (including Windows Workflow Foundation) to provide a collaboration platform, using SQL Server 2000 or 2005 as its database. That’s a lot of dependencies and a lot of variables in the choice of server configuration; however, it’s worth noting that a separate database server is recommended (more on that in a moment) and that using SQL Server 2005 will provide performance gains over SQL Server 2000. WSS provides the ability to open, add, create and check in/out documents for collaboration at a departmental level; however, it is not a document management solution. It provides some foundation services (storage, security, management, deployment, a site model and extensibility) for a collaborative solution.

MOSS builds on WSS (indeed, the MOSS installation includes WSS and will overwrite an existing WSS installation) to provide shared services for portal services, enterprise content management (formerly provided by Content Management Server 2002), enterprise search and indexing, and business intelligence and forms (described as “a window on business systems”). What Microsoft’s marketing materials do not highlight is that MOSS can also provide a front end to enterprise document and records management (EDRM) solutions such as those provided by Meridio, (EMC) Documentum and Open Text.

In designing MOSS, Microsoft attempted to address a number of customer pain points that existed in SPS:

  • Poor resource utilisation and isolation.
  • Inconsistent setup.
  • Network support.
  • Difficult central administration.
  • Topology restrictions.
  • Upgrades.

Many of these have been addressed (for example, unlike with SPS, it’s simple to add another server to an existing infrastructure); however, upgrades are still not as simple as they could be and were anecdotally referred to as the most common reason for incidents to be logged with Microsoft Product Support Services (PSS) at the moment.

The WSS/MOSS administration design goals were:

  • Simplicity – easy setup using an existing SQL Server or installing a copy of SQL Server 2005 Express Edition.
  • Extensibility – a single object model so that moving from WSS to MOSS does not break SharePoint applications.
  • Consistency – no more “jumps” from WSS team sites to portal pages.
  • Resource optimisation – the ability to scale out by dedicating servers to specific tasks, e.g. indexing.
  • Delegation – the ability to delegate control over parts of the infrastructure to particular groups of users.

Steve Smith compared the changes between SPS 2003 and MOSS 2007 with the period when another Microsoft product – Exchange Server – reached maturity in the late 1990s; it was not until the release of Exchange Server 5 (which was actually the second product version) that it began to build market presence, and by version 5.5 it was arguably the de facto product for building a corporate messaging platform. Microsoft is hoping (and business interest is indicating) that MOSS 2007 could mark a similar turning point for SharePoint; however, it seems likely that many organisations will experience some difficulties as a consequence of poor design decisions made when they originally created their SharePoint superstructure, so it’s worth getting specialist advice from the outset.

Notice the term superstructure – that’s not one that was used at the event I attended but I was introduced to the term a few weeks back by my colleague Andy May and it seems appropriate for enterprise-wide applications that sit above the basic server and network infrastructure and provide services for true business applications – examples would be Exchange Server (messaging) and SharePoint (collaboration). Carlo Pescio described the semantic differences between infra- and super-structures in a recent blog post.

The need to plan ahead begins with the initial setup, where there is a choice between a basic and an advanced installation. Most administrators who intend to try out SharePoint, with a view to adapting the topology later as the organisation builds its knowledge and use of the product, could be expected to elect a basic installation; unfortunately, a basic installation uses SQL Server 2005 Express Edition as a local database server and cannot be scaled out. The alternative is to select an advanced installation, where there is a choice of complete (which actually allows the selection of services as required), web front end (which assumes that complete installations exist elsewhere in a web farm) or standalone (as for the basic installation). In most cases a complete installation will be the most appropriate selection; however, it requires an existing SQL Server (either local, or on another server). After determining the file location and electing whether or not to join the customer experience improvement programme, setup copies the binaries to the chosen location before launching a separate wizard to configure services.
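The setup choices above boil down to a small decision. The following Python sketch encodes them as a hypothetical helper; the function name, its parameters and the enum labels are illustrative assumptions rather than anything SharePoint setup exposes, and it deliberately simplifies things (for instance, it ignores adding a further complete server to an existing farm).

```python
from enum import Enum


class InstallType(Enum):
    """Setup options described in the post (labels are illustrative)."""
    BASIC = "basic"                            # local SQL Server 2005 Express; cannot scale out
    COMPLETE = "advanced: complete"            # services chosen as required; needs an existing SQL Server
    WEB_FRONT_END = "advanced: web front end"  # assumes complete installs elsewhere in the farm
    STANDALONE = "advanced: standalone"        # behaves like the basic installation


def choose_install_type(may_scale_out: bool,
                        farm_has_complete_servers: bool,
                        sql_server_available: bool) -> InstallType:
    """Hypothetical helper: map planning answers to one of the setup options."""
    if farm_has_complete_servers:
        # Complete installations already exist, so this server only needs to serve pages.
        return InstallType.WEB_FRONT_END
    if may_scale_out:
        if not sql_server_available:
            raise ValueError("a complete installation needs an existing SQL Server")
        return InstallType.COMPLETE
    # Only acceptable when the deployment will never need to grow beyond one server.
    return InstallType.BASIC


print(choose_install_type(may_scale_out=True,
                          farm_has_complete_servers=False,
                          sql_server_available=True).value)  # advanced: complete
```

Treating the basic installation as the fall-through case reflects the point made above: it is only a safe choice when the deployment will never need to scale out.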

Another design item is the concept of a server farm. A SharePoint server farm shares a single configuration database – that means that a fast network (<80ms latency: i.e. not geo-replicated) is required between the SharePoint servers and the database server(s). Microsoft also recommends that one domain controller should be provided for every three front-end SharePoint servers (and that doesn’t include any load on the DCs from other applications).
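The sizing guidance above is simple arithmetic, but it is easy to forget to round up. This is a minimal Python sketch, assuming the one-DC-per-three-front-end-servers ratio and the 80ms latency figure quoted in this post; the helper names are hypothetical, and the result excludes any domain controller load from other applications.

```python
import math

FRONT_ENDS_PER_DC = 3     # ratio quoted at the session (SharePoint load only)
MAX_DB_LATENCY_MS = 80    # latency figure quoted in this post


def domain_controllers_needed(front_end_servers: int) -> int:
    """Minimum DCs for the SharePoint front-end load; other applications are not counted."""
    if front_end_servers < 0:
        raise ValueError("front_end_servers must not be negative")
    # Round up: four front-end servers need two DCs, not one and a third.
    return math.ceil(front_end_servers / FRONT_ENDS_PER_DC)


def farm_latency_ok(measured_ms: float) -> bool:
    """True if the SharePoint-to-SQL Server latency is below the quoted threshold."""
    return measured_ms < MAX_DB_LATENCY_MS


for servers in (1, 3, 4, 7):
    print(servers, "front-end servers ->", domain_controllers_needed(servers), "DC(s)")
```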

SharePoint configuration needs to know whether a new farm is to be created or if the server is to join an existing farm. Advanced settings include options as to whether or not the server should host the administration website. Errors at this stage of setup generally relate to permissions with the SQL Server service account, which needs to be a local Administrator. I have to ask if software developers will ever learn to provide a list of rights for delegation in place of saying “make it an administrator” but if Microsoft don’t even follow that approach on their own server operating system then what chance is there for third party application providers?

SharePoint administration is provided through a web interface (over a dedicated port), or from the command line on the server (using the stsadm command). In the case of web administration, there is a three-tier model employed with tasks delineated based on roles, allowing for controlled delegation and secure isolation:

  • Central administration – this is where the IT department is most likely to retain control, for farm-level resource management and status. Aiming to reduce administration time through provision of a single point of administration with a consistent (and extensible) user interface for all SharePoint products, the central administration console provides:
    • Administrative task list – informing operators of tasks for action, including links to the appropriate user interface.
    • Home page topology view – a quick view of the servers in a farm and what is running on each one.
    • Services on a server page – for management of components running on a single server.
    • Flat menu structure – operations and application management with only those options available to the current user displayed.
    • Remote administration – web based administration interface and scheduled system updates.
  • Shared services – this (MOSS-only) level may be managed by whoever is responsible for IT within a business unit; determining the services that team sites can consume. The shared service goal is to separate services from portals and remove scaling limitations around the number of portals. Shared services act as a group, providing a logical and secure partition of the server farm and are required for site and cross-site level Office Server features. Shared services components are the shared service administration website and associated databases, providing:
    • Search.
    • Directory import.
    • User profiles.
    • Audiences.
    • Targeting.
    • Business data cataloguing.
    • Excel Calculation Services.
    • Usage reporting.
  • Site settings – management of a site or site collection within an hierarchy, e.g. a portal or a team site. Rights can be delegated on individual sites so a business user could have total (or partial) control over a tiny part of the overall SharePoint superstructure, without impacting on any other sites. It may sound counter-intuitive for an IT administrator to delegate control to business users but that’s often the best approach for administration at the site level.

One major change between SPS and WSS/MOSS is that there is no longer any requirement to create a site in IIS and then tell SharePoint to use the site. With the current SharePoint products, all management is performed through the SharePoint administration tools (with one exception: assigning certificates to SSL-secured sites, which is still done in IIS). SharePoint-aware IIS websites are no longer called virtual servers (server virtualisation has brought an entirely different meaning to that term) but are instead known as web applications.

Shared services are one of the key design elements for a MOSS implementation. It is possible to define multiple shared service providers; however, each is completely isolated from the others. This may be viewed as a limitation, but it is potentially useful (e.g. in an application service provider scenario, or for providing complete separation of one department’s collaborative web application from the rest of the enterprise for political or organisational reasons). Web applications can be re-associated with another shared service provider (e.g. to consume a new set of services), but they cannot consume services from more than one provider (with the exception of My Sites, through the concept of a trusted My Site location). Content that is “marooned” in another shared service provider needs to be recreated, or migrated using stsadm at the command line. The majority of SharePoint superstructures will use a single shared service provider.
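To make the one-provider-per-web-application rule concrete, here is a hypothetical Python model (the class and method names are my own, not part of any SharePoint API). It ignores the trusted My Site location exception and does not model content, which, as noted above, stays with the original provider and has to be recreated or migrated.

```python
class Farm:
    """Hypothetical model: every web application consumes exactly one SSP."""

    def __init__(self) -> None:
        self.providers: set[str] = set()
        self.association: dict[str, str] = {}  # web application -> SSP name

    def add_provider(self, name: str) -> None:
        self.providers.add(name)

    def associate(self, web_app: str, provider: str) -> None:
        """(Re-)associate a web application; any previous association is replaced."""
        if provider not in self.providers:
            raise KeyError(f"unknown shared service provider: {provider}")
        self.association[web_app] = provider

    def provider_for(self, web_app: str) -> str:
        return self.association[web_app]


farm = Farm()
farm.add_provider("SSP1")
farm.add_provider("SSP2")
farm.associate("intranet", "SSP1")
farm.associate("intranet", "SSP2")  # consumes SSP2's services from now on, never both
print(farm.provider_for("intranet"))  # SSP2
```

Using a plain dictionary keyed by web application makes "more than one provider at a time" impossible to represent, which is exactly the constraint described above.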

Another key design element is the definition of the hierarchy for the site structure. It is not normally appropriate for an IT department to define a structure by simply following an organisation chart; some business analysis is required to determine how the business actually functions (cross-group collaboration, etc.).

Despite expecting SQL service accounts to be administrators (!), Microsoft also suggests some best practices from a security perspective:

  • Use unique accounts for centralised administration, managing servers in the farm and service accounts – i.e. do not use generic administration accounts!
  • Enable Kerberos – not only is it viewed as more secure but it is faster than NTLM.
  • Enable SSL on sites (set within SharePoint but certificates are assigned within IIS).
  • Consider the management of the SPAdmin service – it requires access to various items within SharePoint but is a service account; therefore consider password resets and the level of access required on individual WSS/MOSS servers (stsadm can be used to globally reset passwords across all application pools as detailed in Microsoft knowledge base article 934838).

In terms of physical architecture, there is a balance to be struck between resilience, performance and cost; the main options (in order of increasing availability and performance) are:

  • Single server – potentially supporting many users but also a single point of failure. Serves content (sites), shared services, administration and all databases.
  • Small server farm (e.g. a single database server and one or two load-balanced SharePoint servers) – better resilience; however still reliant on a single database server.
  • Medium server farm (e.g. clustered SQL servers and SharePoint roles broken out onto multiple servers for front end web access and a middle tier for shared service provision, e.g. indexing). This solution potentially provides the best balance between performance, resilience and cost.
  • Large server farm – many dedicated servers for individual SharePoint roles providing a scalable solution for a global enterprise (but probably overengineering the solution for many organisations).

Due to the network requirements discussed previously, server farms need to be centralised (the user experience for remote users may be improved using hardware accelerators to cache content across the WAN). Other considerations for improving the user experience include not making the front page too “busy” (to improve the time it takes to render) and providing additional front-end web servers to render pages quickly and increase throughput to the back-end shared service and SQL servers. If SharePoint is to become the point of access for all information within a business then it will quickly be viewed as critical, and some thought should be given to the location of the various shared services. Load balancing across front-end servers can be achieved using Windows Server network load balancing (NLB) or a hardware-based load-balancing solution. Steve Smith demonstrated NLB at last night’s event; however, it’s also worth checking out Joel Oleson’s NLB and SharePoint configuration and troubleshooting tips. It’s also worth noting that SharePoint automatically handles load balancing of application roles where configured (clearly it won’t load balance a role that only exists on a single server, which is something to think about when considering placement of the central administration role in a small or medium server farm); a separate load-balancing solution is only required for client access to front-end servers.
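As SharePoint can only balance an application role across servers that actually host it, it is worth listing which roles sit on a single server. The sketch below is a hypothetical planning aid in Python: the server names and role labels are invented for illustration, and it works on a topology you describe by hand rather than querying a real farm.

```python
from collections import Counter


def roles_without_redundancy(farm: dict[str, set[str]]) -> set[str]:
    """Return the roles hosted on exactly one server (nothing to load balance them across)."""
    hosts = Counter(role for roles in farm.values() for role in roles)
    return {role for role, count in hosts.items() if count == 1}


# Hypothetical small farm: two load-balanced web front ends plus one application server.
small_farm = {
    "WFE1": {"web front end", "central administration"},
    "WFE2": {"web front end"},
    "APP1": {"search query", "excel calculation"},
}

for role in sorted(roles_without_redundancy(small_farm)):
    print("single point of failure:", role)
```

In this example only the web front-end role is duplicated, so everything else would need either a second host or an accepted risk.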

If it’s proving difficult to justify the cost of additional web servers, then some basic performance analysis can be undertaken using Microsoft’s web application stress tool (linked from Microsoft knowledge base article 231282) which can then be used to demonstrate the point at which user performance is likely to be impacted. Performance can also be improved by caching data (pages, graphics, etc.) on a per-site basis.

One potential method of scaling up rather than out is to use 64-bit versions of Windows Server 2003, SQL Server 2005 and SharePoint; however, it’s worth considering that IFilters (which are used to index non-native file formats) may only be available as 32-bit versions, which may limit the options for 64-bit deployments.

When thinking about other SharePoint roles, it’s worth considering that although individual roles can be started/stopped on SharePoint servers as required, certain roles have additional configuration items to be provided at startup and it’s better to plan the workload accordingly.

With regard to indexing, indexes can become large (10-15% of the size of the content being indexed), so the default location on the system drive is probably not ideal. Also, only one index server is allowed within the farm; however, if a separate query server is created, it will hold a copy of the index (albeit not necessarily the latest version), avoiding the creation of a single point of failure.
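The 10-15% rule of thumb makes it easy to check whether a proposed index volume is big enough before moving the index off the system drive. This Python sketch assumes those ratios (they are the session's estimate, not a guarantee) and plans for the upper bound; the function names are hypothetical.

```python
def index_size_range_gb(content_gb: float,
                        low_ratio: float = 0.10,
                        high_ratio: float = 0.15) -> tuple[float, float]:
    """Estimate index size bounds from the size of the content being indexed."""
    if content_gb < 0:
        raise ValueError("content_gb must not be negative")
    return content_gb * low_ratio, content_gb * high_ratio


def index_volume_big_enough(content_gb: float, free_gb: float) -> bool:
    """Plan against the upper estimate so the index volume does not fill up."""
    _, high = index_size_range_gb(content_gb)
    return high <= free_gb


low, high = index_size_range_gb(500)
print(f"500 GB of content -> roughly {low:.0f}-{high:.0f} GB of index")
```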

To help with management of a SharePoint superstructure, Microsoft Operations Manager (MOM) 2005 management packs exist for both WSS and MOSS; however it’s also worth considering other systems management elements as SharePoint has its own security threats against which to mitigate:

  • A SharePoint-aware anti-virus product is required to interface with the SharePoint object model (e.g. Microsoft Forefront Security for SharePoint).
  • Some additional content filtering (e.g. using ISA Server) may be required to prevent content from circumventing SharePoint’s simple protection which is based upon file-extension and size limits.

ISA Server can potentially bring other benefits to a SharePoint infrastructure. For example, whilst SharePoint does provide for extranet access, it may be appropriate to let ISA Server handle the security and caching elements of the connection and then pass simple (and fast) HTTP requests back to SharePoint. This is particularly convenient in a complex combined intranet and extranet scenario, where the need to access Active Directory for My Sites and personalisation can cause issues around forms-based authentication.

One point I’ve not mentioned yet is the change of name from Microsoft SharePoint Portal Server to Microsoft Office SharePoint Server 2007. Leaving aside the decision to drop portal from the name, the Microsoft Office part is significant because of the high level of integration between Microsoft Office and the SharePoint products and technologies; however it is worth noting that MOSS 2007 is not reliant on the 2007 Microsoft Office system although use of the latest products will allow the most complete user experience (Microsoft has published a fair, good, better, best white paper for Microsoft Office programs and SharePoint products and technologies).

The key message for me at last night’s presentation was that SharePoint needs to be planned in detail and that some outside help will probably be required. As yet, there is no prescriptive guidance from Microsoft (although this is rumoured to be in production – for details watch the Microsoft SharePoint products and technologies team blog which, somewhat curiously but in common with all the other Microsoft blogs, is hosted using Community Server and not SharePoint!) so it’s worth consulting with those who have done it before – either via various Internet resources linked throughout this post or by engaging with one of Microsoft’s authorised solution provider partners (and yes, I do work for one of them so there is a potential conflict of interest there but the views, thoughts and opinions expressed in this blog are purely personal).

One final area of interest for me, which I have not seen covered anywhere, is the SharePoint product roadmap. I can’t get anyone at Microsoft to comment on this (not even under NDA) but I understand that WSS 3.0 will ship within Windows Server codenamed Longhorn and that there are no new versions planned for the foreseeable future.

Further information

The elements of meaningful XHTML

I’m really trying to use good, semantic XHTML and CSS on this website but sometimes it’s hard work. Even so, the validation tools that I’ve used have helped me to increase my XHTML knowledge and most things can be tweaked; I’m really pleased that this page currently validates as both XHTML 1.1 and CSS2.

Last night I came across an interesting presentation by Tantek Çelik (of box model hack fame) that dates back to the 2005 South by Southwest (SxSW) interactive festival and discusses the elements of meaningful XHTML. Even though the slide deck is no substitute for hearing the original presentation, I think it’s worth a look for a few reasons:

  • It taught me about some XHTML elements that I wasn’t familiar with (e.g. <address>) and others I’m just getting to grips with (e.g. <cite>).
  • It highlighted some techniques which abuse the intended meaning for XHTML elements and how the same result should be achieved using semantically correct XHTML.
  • It introduced me to extending XHTML with microformats for linked licenses, social relationships, people, events, outlines and even presentations (thanks to the links provided by Creative Commons and the XHTML Friends Network, I already use linked licenses and social relationships on this site but now I understand the code a little better).
  • It reinforced that I’m doing the right thing!

Modifying wp-mobile to create content that validates as XHTML-MP

Yesterday, I wrote a post about using Alex King’s WordPress Mobile Edition plugin (wp-mobile) to generate WordPress content formatted for the mobile web. wp-mobile makes the code generation seamless; however I did have a few issues when I came to validating the output at the ready.mobi site. After a few hours (remember, I’m an infrastructure bod and my coding abilities are best described as weak) I managed to tweak the wp-mobile theme to produce code that validates perfectly.

Screen grab from the ready.mobi report for this website

The changes that I made to the wp-mobile index.php file can be seen at Paul Dixon’s PHP pastebin but are also detailed below:

  1. Add an XHTML Mobile Profile (XHTML-MP) document type declaration: <!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">. Incidentally, I didn’t include an XML declaration (which looks like: <?xml version="1.0" encoding="UTF-8"?>) as it kept on generating unexpected T_STRING PHP errors (when PHP’s short open tags are enabled, the leading <? is parsed as the start of a block of PHP code), and it seems that it is not strictly necessary if the UTF-8 character set is in use:

    “An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.”

    W3C recommendation for XHTML 1.0

  2. Add some caching controls: <?php header("Cache-Control: max-age=10"); ?>. 10 seconds is a little on the low side but it can be changed later, and it means that caching is unlikely to affect testing of subsequent changes.
  3. Remove <meta name="HandheldFriendly" value="true" />: this code doesn’t appear to do anything and is not valid XHTML-MP – media="handheld" can be used instead when linking the stylesheet (see below).
  4. Change the stylesheet link method: although <style type="text/css">@import url("<?php print(get_stylesheet_uri()); ?>"); </style> should work, I found that the validator was only completely satisfied with the form <link href="<?php print(get_stylesheet_uri()); ?>" rel="stylesheet" type="text/css" media="handheld" />.
  5. Provide access keys using accesskey="key" inside the <a> tag for each of the main menu items.
  6. Surround <?php ak_recent_posts(10); ?> with <ul> and </ul> tags – this bug took the most time to track down and was the final change necessary to make the markup validate as XHTML-MP.
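The last fix is easier to understand once you know that an <li> element is only valid inside a list such as <ul> or <ol>; presumably ak_recent_posts() emits bare list items, which the theme then has to wrap. The following Python sketch (using only the standard library html.parser module) counts list items that appear outside any list, so the before and after markup can be compared. The sample markup is hypothetical, and this is a quick check rather than a substitute for the ready.mobi or W3C validators.

```python
from html.parser import HTMLParser


class StrayListItemFinder(HTMLParser):
    """Counts <li> elements that are not inside a <ul> or <ol> (not valid XHTML-MP)."""

    def __init__(self) -> None:
        super().__init__()
        self.list_depth = 0   # number of <ul>/<ol> elements currently open
        self.stray_items = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self.list_depth += 1
        elif tag == "li" and self.list_depth == 0:
            self.stray_items += 1

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.list_depth > 0:
            self.list_depth -= 1


def count_stray_list_items(markup: str) -> int:
    finder = StrayListItemFinder()
    finder.feed(markup)
    finder.close()
    return finder.stray_items


# Hypothetical output of a recent-posts function, before and after wrapping in <ul>.
before = '<div><li><a href="/a">Post A</a></li><li><a href="/b">Post B</a></li></div>'
after = '<div><ul><li><a href="/a">Post A</a></li><li><a href="/b">Post B</a></li></ul></div>'
print(count_stray_list_items(before), count_stray_list_items(after))  # 2 0
```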

I also made some minor changes in order to fit my own page design (adding a legal notice, etc.) but in order to get the elusive 100% in the report for this site, there was one minor tweak required to style.css: removal of the height: 1px; rule for <hr>. I understand why it was there but the validator didn’t like it, suggesting that relative units should be used instead (I would argue that 1px is far more logical for a horizontal rule than the use of relative units but this change resulted in another pass on the report).

Right, enough of these mobile diversions – I’d better focus my development efforts on getting the rest of this site to be fully XHTML compliant…

Publishing WordPress content on the mobile web

A few nights back, I was reading a .net magazine article about developing websites enabled for mobile content.

As my blog is written primarily for technical people, it seems logical to assume that a reasonable proportion of its readers could make use of access from a mobile device, especially as the magazine article’s author, Brian Fling, believes that:

“[the mobile web] will revolutionize the way we gather and interact with information in the next three years”

Web 2.0 Expo: From Desktop to Device: Designing the Ubiquitous Mobile Experience

Basically, the catalyst for this comes down to a combination of increasing network speeds and mobile services, and the falling cost of providing data services.

It seems that there are basically two schools of thought when it comes to designing mobile content for the web. Some (most notably the W3C) believe that content should be device-agnostic; whilst that approach is perfectly laudable (a mobile browser is, after all, just another form of browser), others believe that the whole point of the mobile web is that device-specific functionality can be used to provide services that wouldn’t otherwise be available (e.g. location-based services).

Brian’s .net magazine article explains that there are four major methods of mobile web publishing:

  1. Small screen rendering
  2. Programmatically reformatting content
  3. Handheld stylesheets
  4. Mobile-specific site

As we work down the list, each of these methods is (potentially) more complex, but is also faster. Luckily, for WordPress users like myself, Alex King has written a WordPress Mobile Edition plugin, which applies a different stylesheet for mobile browsers to publish a mobile-friendly site. Using the Opera Mini live demo to simulate a mobile browser, this is what it did for my site:

This website, viewed in a simulated mobile phone browser
The mobile-optimised version of this website, viewed in a simulated mobile phone browser

The first image shows the content as it would be rendered using the default, small screen rendering (not bad, but not exactly ideal on a small screen); the second image uses the WordPress Mobile Edition plugin to display something more suitable for the mobile web. Not only is the display much simpler and easier to navigate on a handset, but the page size has dropped from 28KB to 1KB. Given that, I was a bit alarmed when I used the ready.mobi site to generate a report for this site, as the site only scored 3 out of 5 and was labelled as “will possibly display poorly on a mobile phone”. Even so, the user experience on my relatively basic (by modern standards) Nokia 6021 was actually quite good (especially considering that the device is not a smartphone and that it failed the handheld media type test), whereas viewing the normal (non-mobile) version generated a “memory full” error.

So, it seems that preparing a WordPress site for the mobile web is actually pretty simple. I have a couple of tweaks to make in order to improve the ready.mobi test results (quick fixes ought to include support for access keys and working out why the page heading is being tagged as <h3> when the standard site uses an <h1> tag) but there is certainly no need for me to develop a separate site for mobile devices, which is just as well as it’s taking me ages to finish the redevelopment of the site (and I can save myself a few quid by not registering the markwilson.mobi domain)!
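The post does not describe how the WordPress Mobile Edition plugin decides that a visitor is using a mobile browser. One common approach at the time was user-agent sniffing, sketched here in Python; the token list and stylesheet paths are illustrative assumptions, not the plugin's actual code, and the device-agnostic alternative favoured by the W3C would rely on a media="handheld" stylesheet instead.

```python
# Illustrative user-agent fragments only; real detection lists were much longer.
MOBILE_UA_TOKENS = ("opera mini", "nokia", "symbian", "blackberry", "windows ce", "midp")

MOBILE_STYLESHEET = "mobile/style.css"   # hypothetical path
DEFAULT_STYLESHEET = "style.css"         # hypothetical path


def is_mobile_browser(user_agent: str) -> bool:
    """Case-insensitive check for any known mobile token in the User-Agent header."""
    ua = user_agent.lower()
    return any(token in ua for token in MOBILE_UA_TOKENS)


def choose_stylesheet(user_agent: str) -> str:
    """Serve the simplified stylesheet to mobile browsers, the normal one otherwise."""
    return MOBILE_STYLESHEET if is_mobile_browser(user_agent) else DEFAULT_STYLESHEET


# A Nokia-style user-agent string (illustrative).
print(choose_stylesheet("Nokia6021/2.0 (05.41) Profile/MIDP-2.0 Configuration/CLDC-1.1"))
```

The weakness of this design is that any device missing from the token list silently gets the full site, which is one reason the device-agnostic camp prefers media types over sniffing.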

Links
The following links may be useful to anyone who is looking at developing content for the mobile web:

It may also be worth stopping by at Keni Barwick’s blog on all things mobile.

Coding horror

I just stumbled upon Jeff Atwood’s Coding Horror blog and it’s very interesting reading (even for those of us who write very little code). The article that I found was commenting on Jakob Nielsen’s latest tome on web usability. Although Nielsen makes some valid points, the comments are worth a read as they highlight some of the real compromises that website designers and website developers have to make.

I’m sure I could lose many hours reading Jeff’s writing, but his posts all seem well-informed, to the point and interesting. These were just a few of the posts that grabbed my attention this afternoon:

  • When in doubt, make it public looks at how Web 2.0 is really just creating websites out of old Unix commands and that the new business models are really about taking what was once private and making it public!
  • SEOs: the new pornographers of the web looks at how much of the real search engine optimisation is just good web development and that many of the organisations focusing on SEO are all about money and connections – whether or not the assertions that Jeff makes in his post are correct, it’s an interesting view and certainly seems to have a lot of SEOs fighting their corner.
  • Why does Vista use all my memory? looks at Windows Vista’s approach to memory management (a feature called SuperFetch) and explains why grabbing all the available memory to use as a big cache is not necessarily a bad thing.

Has the Leopard lost its spots?

If you read many Apple or Mac OS X forums, magazines or listen to Mac-related podcasts, soon enough you’ll come across a comment about how Windows Vista was late to market, only just competes with Mac OS X 10.4 (Tiger) and how OS X 10.5 (Leopard) will rewrite history and further boost Apple’s growth. Some of the podcasts I listen to even expected Apple to release Leopard at MacWorld in January and therefore beat Vista to market. Well, Leopard wasn’t ready at MacWorld (why would Apple rush it to market just to “beat Microsoft”, especially as Vista was already available to Microsoft’s business customers at that time?) – all Apple announced at MacWorld 2007 were some products that weren’t ready yet (although the Apple TV and AirPort Extreme have since begun shipping).

For a while now, Apple has said that Leopard will arrive in spring 2007. Well, spring is here, and there is no sign of Leopard, but when exactly does spring end? I could be generous and assume that the second quarter of the year counts as spring and maybe Leopard, iLife, iPhone, a new iSight camera and updates to the Macintosh and iPod product lines will be launched at Apple’s Worldwide Developers Conference in June. Nope. Leopard will be late. Except in the southern hemisphere, where it will be spring in October. Yes, October. Now, I’m no Microsoft apologist (although some of my friends may disagree) but I do feel an element of smugness here as the same Apple fanboys who poured scorn on Windows Vista weep while they have to wait until the autumn (at least) for a new version of OS X.

To Apple: shame on you. I’m not sure which annoys me more: that you dropped the ball and let down your existing customer base in order to enter the highly competitive smartphone market with an unproven product, or that you are hiding behind the development of the iPhone in a crude attempt to mask the hypocrisy of criticising Microsoft’s incessant delays on Vista and then delaying your own operating system update.

100 million iPod sales is a fantastic achievement, as is the resurgence in Apple’s computer sales but, by introducing uncertainty into the market, delaying releases of Mac-related products and failing to ship a new generation of iPods in order to follow a dream of becoming a consumer electronics giant, Apple risks losing it all. If they don’t get their act together soon then the winners will be Microsoft (PC operating systems), Nokia (phones) and Sony (consumer electronics).

Even before Apple announced that Leopard would not ship until October, there were rumours that all was not well in Cupertino – in TWiT episode 94 it was even suggested that the delays are caused not by a lack of resources but by Steve Jobs being personally involved in so many of the decisions at Apple, with only limited time himself. An interesting theory (there were others too, to which I give less credence).

From a personal perspective, I’ve been considering a new Mac purchase and was looking at Leopard to see if it’s worth waiting for – even before this announcement I’d been preparing to blog about Leopard because my conclusion is that it’s probably not worth the wait. It looks to have some nice features but it doesn’t seem to offer much that’s groundbreaking and I very much doubt that it can live up to Apple’s claims of “advancing the world’s most advanced operating system”. Now, before I get flamed, I’ll set out why I don’t see what the fuss is about, based on the Leopard Sneak Peek on the Apple website:

  • Time Machine. Looks good. Very pretty. Windows has had a backup utility since the mid-1990s (Apple make you pay for theirs) and has had the volume shadow copy service (VSS) for snapshots since Windows XP too – not as pretty as Time Machine but present in the operating system nevertheless.
  • Mail and iCal. The first new Mail features that Apple cites are based around HTML stationery, which either looks nice or tacky (depending on your point of view) but is pretty pointless as any decent mail client will block images in HTML mail for security reasons (at least until the message can be confirmed as safe). iCal’s collaboration functionality sounds good but in my experience the majority of non-geek users struggle to get any further with collaboration than e-mailing documents to one another. As for Notes and To Dos – have you ever heard of Outlook or Entourage? They may not be part of the operating system but, let’s face it, there aren’t many PCs in the world that don’t have Office on them. Regardless, I’ll concede that Mail and iCal are already better than their Windows equivalents.
  • Anti-phishing improvements in Mail and Safari. Check – already there in Windows, whether you use Internet Explorer or Firefox.
  • iChat. Fair enough – it is a great IM client and the new presentation features are miles ahead of what the competition offers, but using the iChat audio-visual features with non-iChat contacts involves a lot of hoops to jump through, and getting iChat to talk to certain IM networks is difficult too.
  • Spaces. Something similar has existed on Linux for as long as I’ve used it (which, admittedly, is not very long) and the technology is already available for Mac OS X using VirtueDesktops. It’s a pity that Apple pulled the rug from under Tony Arnold’s feet rather than making him an offer he couldn’t refuse, although the Leopard implementation does look pretty cool.
  • Dashboard. Nice. Should widgets be on a separate desktop or at the side of the screen? I guess that depends on your preference – personally I prefer the Apple implementation but I already have it in OS X 10.4 – either way, widgets weren’t invented by Apple (or Microsoft). As for users creating their own widgets… hmm… that sounds like a way to inject something nasty into my system (at the very least, user-generated widgets are unlikely to be frugal with system resources).
  • Spotlight. I hope it’s better than in Tiger – at the moment the productivity gurus recommend Quicksilver instead.
  • Accessibility. I understand that accessibility is a legal requirement (maybe that is just for websites). Maybe one day we’ll have a computer that can speak without sounding like a computer. Sorry but that new “Alex” voice still sounds very synthetic.
  • 64-bit. We’ve had 64-bit support for Windows since XP (albeit with limited driver support) and it’s been around in Linux for a while too; however the main advantage of 64-bit processing is access to more memory and unless we get some more Macs that support more than 2-3GB (at the time of writing, only the Mac Pro can use more than 3GB), what’s the big deal?
  • Core Animation. I’m not a developer but I understand that all the Core-* technologies are a way of exposing functionality to developers that encourages simple application development. Is that like the Microsoft .NET Framework or Java then?

Now I’m not saying that Windows is better than Mac OS X. That would be a purely subjective view; what I will say is that, even though I still use computers running Windows and Linux, my personal preference is to use my Mac as much as possible (probably just because it’s the computer with the large display, two processing cores and 2GB of memory, rather than any operating system considerations). Even so, I guess it means that I’m still a switcher – as is Kevin Ridgway, who thinks that people who prefer Windows are dumb. I just think this whole “my operating system is better than yours” nonsense is pointless and am disappointed that Apple has sunk to that level in their (admittedly rather funny) advertising. As for Kevin’s assertion that the latest version of OS X will be “an even more enticing reason to make the switch”, I just can’t see it.

Incidentally, for those who favour a third way (i.e. not Microsoft or Apple), a new version of the popular Ubuntu Linux distribution was released today…

Removing duplicate search engine content using robots.txt

Here’s something that no webmaster wants to see:

Screenshot showing that Google cannot access the homepage due to a robots.txt restriction

It’s part of a screenshot from the Google Webmaster Tools that says “[Google] can’t currently access your home page because of a robots.txt restriction”. Arghh!

This came about because, a couple of nights back, I made some changes to the website in order to remove the duplicate content in Google. Google (and other search engines) don’t like duplicate content, so by removing the archive pages, categories, feeds, etc. from their indexes, I ought to be able to reduce the overall number of pages from this site that are listed and at the same time increase the quality of the results (and hopefully my position in the index). Ideally, I can direct the major search engines to only index the home page and individual item pages.

I based my changes on some information on the web that caused me a few issues, so this is what I did; by following these notes, hopefully others won’t repeat my mistakes. There is a caveat, though: use this advice with care, as I’m not responsible for other people’s sites dropping out of the Google index (or other such catastrophes).

Firstly, I made some changes to the <head> section in my WordPress template:

Because WordPress content is generated dynamically, this tells the search engines which pages should be in, and which should be out, based on the type of page. So, basically, if this is a post page, another single page, or the home page then go for it; otherwise follow the appropriate rule for Google, MSN or other spiders (Yahoo! and Ask will follow the standard robots directive), telling them not to index or archive the page but to follow any links and, for Google, additionally not to include any Open Directory information. This was based on advice from askapache.com but amended, because the default behaviour for spiders is index, follow (equivalent to all), so I didn’t need explicit rules for Google and MSN as in the original example (but I did need something there, otherwise the logic reads “if condition is met donothing else dosomething” and the donothing could be problematic).
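The template code itself isn’t reproduced here, but the decision it makes can be sketched in Python. This is a minimal illustration under assumptions: the three flags stand in for WordPress’s is_home(), is_single() and is_page() conditional tags, the helper name robots_meta_tags is hypothetical, and it emits one tag per crawler name instead of checking the visiting user agent.

```python
def robots_meta_tags(is_home: bool, is_single: bool, is_page: bool) -> list[str]:
    """Return the robots <meta> tags for a page (hypothetical helper, not WordPress code).

    Home, single-post and static pages get no tags, so crawlers fall back to their
    default behaviour (index, follow). Everything else is kept out of the index and
    the archive while its links are still followed.
    """
    if is_home or is_single or is_page:
        return []  # nothing to emit: the crawler default of index, follow applies
    return [
        # Google additionally gets noodp: don't use Open Directory descriptions
        '<meta name="googlebot" content="noindex,noarchive,follow,noodp" />',
        '<meta name="msnbot" content="noindex,noarchive,follow" />',
        # Generic tag for Yahoo!, Ask and any other spider
        '<meta name="robots" content="noindex,noarchive,follow" />',
    ]
```

A category or date archive (all three flags false) gets all three tags; a single post gets none, so the crawler’s default applies.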

Next, following fiLi’s advice for using robots.txt to avoid content duplication, I started to edit my robots.txt file. I won’t list the file contents here – suffice to say that the final result is visible on my web server. For those who think that publishing the location of robots.txt is a bad idea (because the contents are effectively a list of places that I don’t want people to go to), think of it this way: robots.txt is a standard file on many web servers, which by necessity needs to be readable and therefore should not be used for security purposes – that’s what file permissions are for (one useful analogy refers to robots.txt as a “no entry” sign, not a locked door)!

The main changes that I made were to block certain folders:

Disallow: /blog/page
Disallow: /blog/tags
Disallow: /blog/wp-admin
Disallow: /blog/wp-content
Disallow: /blog/wp-includes
Disallow: /*/feed
Disallow: /*/trackback

(the trailing slash is significant: if it is missing then the directory itself is blocked, along with its contents and any other path that begins with the same characters, such as /blog/pages; if it is present then only the files within the directory, including subdirectories, are affected).
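A quick way to see the effect of that trailing slash is to model the original robots.txt matching rule, which is a plain prefix comparison on the URL path. The helper name and the /blog/pages-i-like path below are hypothetical, chosen only to show the difference.

```python
def rule_applies(rule_path: str, url_path: str) -> bool:
    """Original robots.txt semantics: a rule covers any path that starts with it."""
    return url_path.startswith(rule_path)


# Compare the two forms of the rule against a few sample paths
for rule in ("/blog/page", "/blog/page/"):
    for path in ("/blog/page", "/blog/page/2", "/blog/pages-i-like"):
        print(f"{rule!r:14} {path!r:22} blocked={rule_applies(rule, path)}")
```

Without the slash all three paths are caught; with it, only /blog/page/2 is.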

I also blocked certain file extensions:

Disallow: /*.css$
Disallow: /*.html$
Disallow: /*.js$
Disallow: /*.ico$
Disallow: /*.opml$
Disallow: /*.php$
Disallow: /*.shtml$
Disallow: /*.xml$

Then, I blocked URLs that include ? except those that end with ?:

Allow: /*?$
Disallow: /*?

The problem at the head of this post came about because I blocked all .php files using

Disallow: /*.php$

As http://www.markwilson.co.uk/blog/ is equivalent to http://www.markwilson.co.uk/blog/index.php, I was effectively stopping spiders from accessing the home page. I’m not sure how to get around that as both URLs serve the same content, but in a site of about 1500 URLs at the time of writing, I’m not particularly worried about a single duplicate instance (although I would like to know how to work around the issue). I resolved this by explicitly allowing access to index.php (and another important file – sitemap.xml) using:

Allow: /blog/index.php
Allow: /sitemap.xml
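Googlebot resolves conflicts such as Disallow: /*.php$ versus Allow: /blog/index.php by applying the most specific rule, i.e. the one with the longest path pattern, with Allow winning a tie (this is how Google documents its parser today; other engines may differ). A simplified Python sketch of that model, assuming * matches any run of characters, a trailing $ anchors the end of the URL and everything else, including ?, is literal; the function names and the /blog/wp-login.php sample path are hypothetical and percent-encoding is ignored:

```python
import re

# Rules taken from the post (Google-style syntax: '*' wildcard, '$' end anchor)
RULES = [
    ("Disallow", "/blog/wp-admin"),
    ("Disallow", "/*.php$"),
    ("Allow", "/*?$"),
    ("Disallow", "/*?"),
    ("Allow", "/blog/index.php"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start of the path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else (including '?') is literal
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply the longest matching rule; Allow wins ties; no match means allowed."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


for path in ("/blog/index.php", "/blog/wp-login.php", "/blog/?", "/blog/?p=123"):
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Running it reports /blog/index.php and /blog/? as allowed, and /blog/wp-login.php and /blog/?p=123 as blocked, which matches the intent of the rules above.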

It’s also worth noting that neither wildcards (* and the $ end-of-URL anchor) nor Allow are part of the original robots.txt standard, so the file will fail validation. After a bit of research I found that the major search engines have each added support for their own enhancements to the robots.txt specification:

  • Google (Googlebot), Yahoo! (Slurp) and Ask (Teoma) support allow directives.
  • Googlebot, MSNbot and Slurp support wildcards.
  • Teoma, MSNbot and Slurp support crawl delays.

For that reason, I created multiple sections, one for each of the major search engines plus a catch-all for other spiders, so the basic structure is:

# Google
User-agent: Googlebot
# Add directives below here

# MSN
User-agent: msnbot
# Add directives below here

# Yahoo!
User-agent: Slurp
# Add directives below here

# Ask
User-agent: Teoma
# Add directives below here

# Catch-all for other agents
User-agent: *
# Add directives below here
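Only one of these groups applies to a given crawler: it obeys the group whose User-agent value matches its name most specifically and falls back to User-agent: * only when no named group matches. A simplified sketch, assuming a case-insensitive prefix test for “matches” (real crawlers each have their own rules) and using a hypothetical two-group file:

```python
def select_group(groups, crawler):
    """Return the User-agent value whose rules this crawler obeys, or None if none applies.

    groups maps each User-agent value in robots.txt to its list of rules. Matching is a
    simplified, case-insensitive prefix test; the longest matching value wins and '*'
    is used only when no named group matches.
    """
    name = crawler.lower()
    best = None
    for token in groups:
        if token == "*" or not name.startswith(token.lower()):
            continue
        if best is None or len(token) > len(best):
            best = token
    if best is None and "*" in groups:
        best = "*"
    return best


# Hypothetical two-group file: a Googlebot section plus the catch-all
GROUPS = {
    "Googlebot": ["Allow: /blog/index.php", "Disallow: /*.php$"],
    "*": ["Disallow: /blog/wp-admin"],
}

print(select_group(GROUPS, "Googlebot"))  # Googlebot: the '*' rules are never consulted
print(select_group(GROUPS, "Teoma"))      # *
```

This is why a rule placed only under User-agent: * never reaches a crawler that has its own section.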

Just for good measure, I added a couple more directives for the Alexa archiver (do not archive the site) and Google AdSense (read everything to determine what my site is about and work out which ads to serve).

# Alexa archiver
User-agent: ia_archiver
Disallow: /

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

Finally, I discovered that Google, Yahoo!, Ask and Microsoft now all support sitemap autodiscovery via robots.txt:

Sitemap: http://www.markwilson.co.uk/sitemap.xml

This can be placed anywhere in the file, although Microsoft don’t actually do anything with it yet!
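Python’s standard library parser records Sitemap lines independently of the User-agent groups, which is a handy way to check that the directive is picked up wherever it sits in the file. This assumes Python 3.8 or later (when RobotFileParser.site_maps() was added); the file contents are a cut-down example.

```python
from urllib.robotparser import RobotFileParser

# Cut-down robots.txt with the Sitemap line placed before any User-agent group
ROBOTS_TXT = """\
Sitemap: http://www.markwilson.co.uk/sitemap.xml

User-agent: *
Disallow: /blog/wp-admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
sitemaps = parser.site_maps()  # list of Sitemap URLs, or None if there are none
print(sitemaps)  # ['http://www.markwilson.co.uk/sitemap.xml']
```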

Having learned from my initial experiences of locking Googlebot out of the site, I checked the file using the Google robots.txt analysis tool and found that Googlebot was ignoring the directives under User-agent: * (no matter whether that section was first or last in the file). Thankfully, posts to the help groups for crawling, indexing and ranking and Google webmaster tools indicated that Googlebot will ignore generic settings if there is a specific section for User-agent: Googlebot. The workaround is to include all of the generic exclusions in each of the agent-specific sections – not exactly elegant but workable.
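Since the workaround means repeating the same exclusions in every agent-specific section, it may be easier to generate the file than to maintain the copies by hand. A minimal sketch, assuming a hypothetical build_robots() helper and a cut-down rule set (Allow lines are listed first in case a crawler applies the first matching rule rather than the most specific one):

```python
# Exclusions that every crawler should obey (a subset of the rules from the post)
COMMON = [
    "Disallow: /blog/wp-admin",
    "Disallow: /blog/wp-includes",
]

# Agent-specific extras; Allow lines go first for parsers that stop at the first match
PER_AGENT = {
    "Googlebot": ["Allow: /blog/index.php", "Disallow: /*.php$"],
    "msnbot": ["Disallow: /*/feed"],
}


def build_robots(common, per_agent, sitemap=None):
    """Write robots.txt text, repeating the common rules inside every agent-specific group."""
    lines = []
    for agent, extras in per_agent.items():
        lines.append(f"User-agent: {agent}")
        lines.extend(extras)
        lines.extend(common)  # a named group replaces '*', so restate the generic rules
        lines.append("")
    lines.append("User-agent: *")
    lines.extend(common)
    if sitemap:
        lines.extend(["", f"Sitemap: {sitemap}"])
    return "\n".join(lines) + "\n"


robots_txt = build_robots(COMMON, PER_AGENT, "http://www.markwilson.co.uk/sitemap.xml")
print(robots_txt)
```

The output would then be saved as robots.txt and checked with the Google robots.txt analysis tool as before.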

I have to wait now for Google to re-read my robots.txt file, after which it will be able to access the updated sitemap.xml file which reflects the exclusions. Shortly afterwards, I should start to see the relevance of the site:www.markwilson.co.uk results improve and hopefully soon after that my PageRank will reach the elusive 6.

Links

Google webmaster help center.
Yahoo! search resources for webmasters (Yahoo! Slurp).
About Ask.com: Webmasters.
Windows Live Search site owner help: guidelines for successful indexing and controlling which pages are indexed.
The web robots pages.

Where are the WVP2 codecs for QuickTime on a Mac?

It’s generally accepted that Macs are great computers for graphic design and audio-visual work – so why is it so hard to play Windows Media content on a Mac? I know that QuickTime is the centre of Apple’s audio-visual experience – so why should Apple support competing formats – but perhaps I should really ask why the various software companies have seen fit to introduce such a myriad of audio and video codecs? I’m a techie and I can only just keep up – think about the poor consumer who just wants to share some family videos with the grandparents!

The trouble is that Microsoft, as the developer of the most widely installed operating system on the planet (with a correspondingly huge number of multimedia file formats as described in Microsoft knowledge base article 316922), has seen fit to dump development of Windows Media products for other platforms. Quoting part of the Wikipedia article on Windows Media Player:

Version 9 was the final version of Windows Media Player to be released for Mac OS X before development was cancelled by Microsoft. WMP for Mac OS X received widespread criticism from Mac users due to poor performance and features. Developed by the Windows Media team at Microsoft instead of the Macintosh Business Unit and released in 2003, on release the application lacked many basic features that were found in other media players such as Apple’s iTunes and QuickTime Player. It also lacked support for many media formats that version 9 of the Windows counterpart supported on release 10 months earlier.

The Mac version supported only Windows Media encoded media (up to version 9) enclosed in the ASF format, lacking support for all other formats such as MP4, MPEG, and Microsoft’s own AVI format. On the user interface front, it did not prevent screensavers from running during playback, it did not support file drag-and-drop, nor did it support playlists. While Windows Media Player 9 had added support for some files that use the WMV9 codec (also known as the WMV3 codec), in other aspects it was seen as having degraded in features from previous versions.

On January 12, 2006 Microsoft announced it had ceased development of Windows Media Player for Mac.[4] Microsoft now distributes a third-party plugin called WMV Player (produced and maintained by Flip4Mac) which allows some forms of Windows Media to be played within Apple’s QuickTime player and other QuickTime-aware applications.[5] Mac users can also use the free software media player VLC, which is also able to play WMV-3 / WMV-9 / VC-1 Windows Media files.

It seems that the Flip4Mac WMV Player, which should provide the missing Windows Media support for Mac users (as endorsed by Microsoft), does not support all Windows Media codecs; in particular, it refuses to play content encoded with the Windows Media Video 9 Image v2 (WVP2) codec.

I can understand Microsoft’s position – after all they want to preserve their market share – so why doesn’t Apple make it easier for switchers with legacy video content? As the iLife applications are such a selling point for Apple, why not make it easier to convert from the Windows equivalents?

My problem is that, for the last few years, I’ve been creating home video content using Windows Movie Maker and Photo Story. They may not be the best video applications in the world but they are fine for movies of holidays and the kids and are included with Windows XP (well, Movie Maker is – Photo Story is a free add-on). Nowadays, I have a Mac but I still want to play my old content. The resulting WMV content from Movie Maker hasn’t caused too many problems as it uses the Windows Media Audio 9.1 and Windows Media Video 9 (WMV3) codecs and simply needs appropriate QuickTime components to be installed. Unfortunately, the WVP2 video track in the Photo Story output refuses to play in either QuickTime (with WMV Player) or Windows Media Player for Mac OS X, and as far as I can tell there are no suitable codecs available.

In desperation, I went back to Photo Story and tried to export in another format but there is no such option (it supports various screen sizes and frame rates but they all seem to use the same codec).

One macKB thread suggests using Dr DivX to convert the file but the latest version of Dr DivX failed (on both Windows and Mac); similarly, the DivX Converter didn’t work for me.

Eventually, I found a utility that could convert the file for me (Advanced X Video Converter). It’s done a good job, although whilst the quality is acceptable for my home movies there are some visible compression artifacts (I used the H.264 video and 24-bit audio codecs to convert to a .MOV file). In fairness, the compression artifacts may also be visible in the original WMV file and, anyway, they are hardly surprising as the video was created from compressed JPEG and MP3 files, which were then compressed to WMV and once more to MOV, so the quality is certain to have suffered along the way. What’s possibly of greater concern is the resulting increase in file size – up from 19.5MB to 431.4MB.

I’m glad I got there in the end – for a while it seemed that I would have to keep a Windows virtual machine just to play old home movies – and there I was, naively believing that converting to digital capture and storage would save me from issues with legacy formats.

The extent of automotive IT

I know that modern cars contain several computers (obviously there is engine management, then there is access control, in-car entertainment, possibly satellite navigation, etc.) but I’ve just returned from the local Saab dealership after having to have the electric windows reprogrammed! Next month the car is going back to have some bits of trim replaced, a new key fob and a software upgrade!

Whatever next, Microsoft Windows Vista Sportswagon Edition? Or perhaps a new version of Ubuntu codenamed Speeding Saab (which turns the Warty Warthog, Hoary Hedgehog, Breezy Badger, Dapper Drake, Edgy Eft and Feisty Fawn into roadkill)?

(Sorry – I think last weekend’s sunshine has made me delirious.)

Missing QuickTime codecs

Earlier this week, I needed to play back an .AVI file in iTunes/Front Row. That’s not really a problem as it’s easy to convert the video to a .MOV file using Apple QuickTime Pro, but one major issue was a complete lack of sound.

Now, before I go any further I should explain that there is one common theme throughout the comments section of every site discussing media formats and players – someone always says something to the effect of “use VLC – it plays everything”. VLC is a great media player but:

  • I have Apple QuickTime Pro.
  • I use Apple iTunes and Front Row (both of which depend on QuickTime).
  • QuickTime components are available for many audio and video formats.

In other words, using VLC isn’t the right solution for me. QuickTime gave me a clue as to the problem as it informed me that:

Some necessary QuickTime software is missing. It may be available on the QuickTime Web site.
If you have a dialup connection to the internet, make sure it is active, then click the Continue button to check for the software.

I could have worked out for myself that I was missing a codec (and that message is pretty poorly written… should I not click continue if I don’t have a dialup connection? Maybe I’m reading the message too literally!) but clicking continue took me to the QuickTime components page and I didn’t know which one I needed. I was pretty sure that the video was an XviD movie and I already had the DivX codec (v6.4) as well as Christoph Nägeli’s XviD codec (v0.51) installed but then I found a big clue in the XviD FAQ:

It’s important to understand that video and audio are two separate things, which when combined make up movies. A movie consists of a video stream for the picture and an audio stream for the sound. The XviD codec is what makes it possible to decode the video stream, but it has nothing to do with decoding the audio stream. If the sound in a movie isn’t working you have to find out which audio codec is missing and install it.

The FAQ continues to explain how to use a Windows utility called GSpot to identify the necessary codecs but after reading Mike Peck’s article about playing XviD movies on an Intel Mac (and Paul Stamatiou’s follow-up post on getting Front Row to play XviD, DivX and 3ivX videos), I realised that the missing codec was for AC-3 (Dolby Digital). After installing the A52Codec (v1.7.2) for AC-3 playback and restarting QuickTime and iTunes I was able to watch my video, complete with the previously-missing audio stream.
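GSpot is a Windows utility, but the information it reports comes from the AVI file’s own header, so a rough equivalent can be sketched in Python for any platform. This is a minimal sketch, assuming a standard AVI (RIFF) file: it walks the header chunks, reads each video stream’s FourCC from its BITMAPINFOHEADER and each audio stream’s format tag from its WAVEFORMATEX structure, and names only a few tags (0x2000 is the tag used for AC-3 audio in AVI files). The function names are hypothetical, and it is no substitute for a full media inspector.

```python
import struct

# A few WAVEFORMATEX format tags commonly seen in AVI audio streams (not exhaustive)
AUDIO_TAGS = {0x0001: "PCM", 0x0055: "MP3", 0x2000: "AC-3"}


def iter_chunks(data, start, end):
    """Yield (fourcc, body_start, body_end) for each RIFF chunk between start and end."""
    pos = start
    while pos + 8 <= end:
        fourcc = data[pos:pos + 4]
        size = struct.unpack_from("<I", data, pos + 4)[0]
        body = pos + 8
        yield fourcc, body, min(body + size, end)
        pos = body + size + (size & 1)  # chunk bodies are padded to an even length


def describe_streams(data):
    """Return (kind, codec) for each stream declared in an AVI file's header."""
    if data[:4] != b"RIFF" or data[8:12] != b"AVI ":
        raise ValueError("not an AVI file")
    streams = []

    def walk(start, end):
        stream_type = None  # set by the 'strh' chunk that precedes each 'strf'
        for fourcc, body, stop in iter_chunks(data, start, end):
            if fourcc == b"LIST":
                if data[body:body + 4] != b"movi":  # skip the (large) media data
                    walk(body + 4, stop)
            elif fourcc == b"strh":
                stream_type = data[body:body + 4]
            elif fourcc == b"strf" and stream_type == b"vids":
                # biCompression sits 16 bytes into the BITMAPINFOHEADER
                streams.append(("video", data[body + 16:body + 20].decode("latin-1")))
            elif fourcc == b"strf" and stream_type == b"auds":
                tag = struct.unpack_from("<H", data, body)[0]  # WAVEFORMATEX.wFormatTag
                streams.append(("audio", AUDIO_TAGS.get(tag, f"unknown (0x{tag:04X})")))

    walk(12, len(data))
    return streams
```

Because the header usually sits near the start of the file, reading just the first megabyte of a (hypothetical) movie.avi is normally enough; an XviD movie with Dolby Digital sound would then be reported as something like [("video", "XVID"), ("audio", "AC-3")], pointing straight at the missing AC-3 component.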

I’m sure that over time I’ll need to add more codecs and one potentially useful resource is afreeCodec, offering downloads for Windows, Linux and Macintosh computers, games consoles and mobile phones.