| Overview | [Contents] |
Sitemap 0.84 is offered under the terms of the Attribution-ShareAlike Creative Commons License.
| XML Sitemap Format | [Contents] |
The XML Sitemap Format allows you to provide a list of URLs and include additional information about those URLs in your Sitemap. This additional information includes the date the content at that URL last changed, how often that content can be expected to change and how important that URL is relative to other URLs on your site.
The XML Sitemap Format uses the following XML tags:
- changefreq — how frequently the content at the URL is likely to change
- lastmod — the time the content at the URL was last modified
- loc — the URL location
- priority — the priority of the page relative to other pages on the same site
- url — this tag encapsulates the first four tags in this list
- urlset — this tag encapsulates the first five tags in this list
Note: All data values, including URLs, in your Sitemap files must be XML-encoded. The chart below provides a list of characters with their corresponding encoded values. You can use either the entity or the character code to XML encode a character. Please see the FAQ for more information about XML encoding.
| Character | Entity | Character Code |
|---|---|---|
| Ampersand | &amp; | &#38; |
| Single Quote | &apos; | &#39; |
| Double Quote | &quot; | &#34; |
| Greater Than | &gt; | &#62; |
| Less Than | &lt; | &#60; |
The following example shows a Sitemap in XML format. The Sitemap in the example contains a small number of URLs, each of which is identified using the loc XML tag. In this example, a different set of optional parameters has been provided for each URL.
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.yoursite.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
    <lastmod>2004-12-23</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
    <lastmod>2004-12-23T18:00:15+00:00</lastmod>
    <priority>0.3</priority>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
    <lastmod>2004-11-23</lastmod>
  </url>
</urlset>
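A Sitemap like the one above can also be generated programmatically. The following is a minimal Python sketch, not a reference implementation: the function name `build_sitemap` and the dictionary layout of each entry are assumptions made for illustration. It relies on the standard `xml.etree.ElementTree` module, which XML-encodes characters such as the ampersand when serializing.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"

def build_sitemap(entries):
    """Return Sitemap XML as UTF-8 bytes.

    entries is a list of dicts (a hypothetical layout) with a required
    'loc' key and optional 'lastmod', 'changefreq' and 'priority' keys.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit only the tags that were supplied, in a fixed order.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # Serialization XML-encodes text content, so '&' becomes '&amp;'.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Passing the unencoded URL `http://www.yoursite.com/catalog?item=12&desc=vacation_hawaii` as `loc` should yield a `loc` element whose serialized text contains `&amp;`.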
You can compress your Sitemap files using gzip. Compressing your Sitemap files will reduce your bandwidth requirement. Please note that your uncompressed Sitemap file may not be larger than 10MB.
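As a rough illustration of the compression step, the sketch below gzips a Sitemap held in memory and refuses input whose uncompressed size exceeds the limit. The function name `compress_sitemap` is an assumption, and the limit is taken as 10,485,760 bytes, the figure given in the FAQ.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 10 * 1024 * 1024  # 10MB, i.e. 10,485,760 bytes

def compress_sitemap(xml_bytes):
    """Gzip Sitemap bytes after checking the uncompressed size limit."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("uncompressed Sitemap exceeds 10MB; split it into several files")
    return gzip.compress(xml_bytes)
```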
Note: Your Sitemap files must use UTF-8 encoding.
This section provides details about the XML tags that can appear in your Sitemap(s). In the "Subtags" section of some of the XML tag definitions, a question mark ("?") appearing after the name of an XML tag indicates that the tag is optional.
| changefreq | |
|---|---|
| Definition | Optional. This value indicates how frequently the content at a particular URL is likely to change. The value must be either "always", "hourly", "daily", "weekly", "monthly", "yearly" or "never". The value "always" should be used to describe documents that change each time they are accessed. The value "never" should be used to describe archived URLs. Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers consider this information when making decisions, they may crawl pages marked "hourly" less frequently than that, and they may crawl pages marked "yearly" more frequently than that. It is also likely that crawlers will periodically crawl pages marked "never" so that they can handle unexpected changes to those pages. |
| Constraints | Enumerated list. Valid values are "always", "hourly", "daily", "weekly", "monthly", "yearly" and "never". |
| Example | <changefreq>monthly</changefreq> |
| Subtag of | url |
| Content Format | Text |
| lastmod | |
|---|---|
| Definition | Optional. The time the URL was last modified. You should specify the timestamp using ISO 8601; for example, 2005-02-21T18:00:15+00:00. |
| Constraints | Value must be in ISO 8601 format. |
| Example | <lastmod>2005-02-21</lastmod> or <lastmod>2005-02-21T18:00:15+00:00</lastmod> |
| Subtag of | url |
| Content Format | Text |
| loc | |
|---|---|
| Definition | Required. A URL for a page on your site. |
| Constraints | Value must be <= 2048 characters. |
| Example | <loc>http://www.yoursite.com/catalog?item=1&amp;desc=vacation_hawaii</loc> |
| Subtag of | url |
| Content Format | Text |
| priority | |
|---|---|
| Definition | Optional. The priority of a particular URL relative to other pages on the same site. The value for this tag is a number between 0.0 and 1.0, where 0.0 identifies the lowest priority page(s) on your site and 1.0 identifies the highest priority page(s) on your site. The default priority of a page is 0.5. Please note that the priority you assign to a page has no influence on the position of your URLs in a search engine's result pages. Search engines use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your more important pages are present in a search index. Also, please note that assigning a high priority to all of the URLs on your site will not help you. Since the priority is relative, it is only used to select between URLs on your site; the priority of your pages will not be compared to the priority of pages on other sites. |
| Constraints | Value must be between 0.0 and 1.0 inclusive. |
| Example | <priority>0.7</priority> |
| Subtag of | url |
| Content Format | Text |
| url | |
|---|---|
| Definition | Encapsulates information about a particular URL. |
| Subtags | changefreq?, lastmod?, loc, priority? |
| Subtag of | urlset |
| Content Format | Empty |
| urlset | |
|---|---|
| Definition | Encapsulates information about all of the URLs in a Sitemap file. |
| Subtags | url |
| Content Format | Empty |
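The constraints in the tag definitions above can be checked before a Sitemap is written. The following Python sketch is one possible way to do so; the function name `validate_url_entry` and the dictionary representation of a url element are assumptions for illustration, and lastmod is deliberately not checked here.

```python
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def validate_url_entry(entry):
    """Return a list of problems found in a dict describing one url element."""
    problems = []
    loc = entry.get("loc")
    if not loc:
        problems.append("loc is required")
    elif len(loc) > 2048:
        problems.append("loc must be <= 2048 characters")
    if "changefreq" in entry and entry["changefreq"] not in VALID_CHANGEFREQ:
        problems.append("changefreq is not one of the enumerated values")
    if "priority" in entry:
        try:
            in_range = 0.0 <= float(entry["priority"]) <= 1.0
        except (TypeError, ValueError):
            in_range = False
        if not in_range:
            problems.append("priority must be between 0.0 and 1.0 inclusive")
    return problems
```

An empty list means the entry passed these particular checks; it does not guarantee that the resulting file validates against the schema.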
| Providing Multiple Sitemap Files | [Contents] |
Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.mysite.com or http://yourhost.yoursite.com.
The following example shows a Sitemap index in XML format. The Sitemap index lists two Sitemaps:
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>http://www.mysite.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.mysite.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
Note: Sitemap URLs, like all values in your XML files, must be XML-encoded.
Sitemap Index XML Tag Definitions
- The loc tag is required and identifies the location of the Sitemap.
- The lastmod tag is an optional tag that identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value for the lastmod tag should be in ISO 8601 format. By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index; for example, a crawler could retrieve only the Sitemaps that were modified after a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.
- The sitemap tag encapsulates information about an individual Sitemap.
- The sitemapindex tag encapsulates information about all of the Sitemaps in the file.
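The incremental fetching idea described for the lastmod tag can be sketched from a crawler's point of view. The Python below is only an illustration under stated assumptions: the function names are hypothetical, a lastmod value without a time zone offset is treated as UTC, and a Sitemap with no lastmod is always selected because nothing is known about its age.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.google.com/schemas/sitemap/0.84"}

def parse_lastmod(value):
    """Parse a date or date-time lastmod value; assume UTC if no offset is given."""
    parsed = datetime.fromisoformat(value.strip())
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def sitemaps_modified_since(index_xml, cutoff):
    """Return loc values of Sitemaps modified after cutoff, or lacking a lastmod."""
    root = ET.fromstring(index_xml)
    selected = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", namespaces=NS)
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or parse_lastmod(lastmod) > cutoff:
            selected.append(loc)
    return selected
```

Note that `datetime.fromisoformat` accepts the two timestamp shapes used in this document but not every ISO 8601 variant on older Python versions.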
| Location of Sitemap Files | [Contents] |
The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://yoursite.com/catalog/sitemap.gz can include any URLs starting with http://yoursite.com/catalog/ but cannot include URLs starting with http://yoursite.com/images/.
If you have the permission to change "http://site.org/path/sitemap.gz", it is safe to assume that you also have permission to provide information for URLs with the prefix "http://site.org/path/". Examples of URLs considered valid in http://yoursite.com/catalog/sitemap.gz include:
http://yoursite.com/catalog/show?item=23
http://yoursite.com/catalog/show?item=233&user=3453
URLs not considered valid in http://yoursite.com/catalog/sitemap.gz include:
http://yoursite.com/image/show?item=23
http://yoursite.com/image/show?item=233&user=3453
http://mysite.com/catalog/show?item=24
URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your HTTP Web server is at yoursite.com, then your Sitemap index file would be at "http://yoursite.com/sitemap.gz". In certain cases, you may need to produce different Sitemaps for different paths — e.g. if security permissions in your organization compartmentalize write access to different directories.
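The location rule amounts to a prefix test against the directory that holds the Sitemap file. A minimal Python sketch follows; the function names `sitemap_scope` and `url_allowed` are assumptions for illustration, and the comparison is a plain string comparison with no URL normalization.

```python
def sitemap_scope(sitemap_url):
    """Return the URL prefix that a Sitemap at sitemap_url may describe."""
    # Everything up to and including the last '/' is the containing directory.
    return sitemap_url[: sitemap_url.rfind("/") + 1]

def url_allowed(sitemap_url, url):
    """True if url starts with the directory prefix of the Sitemap's own URL."""
    return url.startswith(sitemap_scope(sitemap_url))
```

With the Sitemap at http://yoursite.com/catalog/sitemap.gz, this check accepts the two catalog URLs listed above and rejects the image and mysite.com URLs.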
| Frequently Asked Questions | [Contents] |
Does it matter which character encoding method I use to generate my Sitemap files?
How do I specify time?
How do I compute lastmod date?
Where do I place my Sitemap?
How big can my Sitemap be?
My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
What happens after I produce my Sitemap?
Do URLs in the Sitemap need to be completely specified?
My site has both "http" and "https" version of URLs. Do I need to list both?
URLs on my site have session IDs in them. Do I need to remove them?
Does position of a URL in a Sitemap influence its use?
Some of the pages on our site use frames. Should we include the frameset URLs or the URLs of the frame contents?
Can I zip my Sitemaps or do they have to be gzipped?
Will the "priority" hint in the XML Sitemap change the ranking of my pages in search results?
Is there an XML schema that I can validate my XML Sitemap against?
To properly encode your URLs, follow the procedure recommended by the HTML 4.0 specification, section B.2.1. Convert the string to UTF-8 and then URL-escape the result. For details about Internationalized Resource Identifiers, also see RFC2396 (sections 2.3 and 2.4) and RFC3987.
The following is an example python script for XML encoding a URL:
$ python
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
>>> import xml.sax.saxutils
>>> xml.sax.saxutils.escape("http://www.test.org/view?widget=3&count>2")
The encoded URL from the example above is:
http://www.test.org/view?widget=3&amp;count&gt;2
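The two steps described above, URL-escaping the UTF-8 form and then XML-encoding the result, can be combined. The following Python 3 sketch is one way to do this; the function names and the set of characters left unescaped by `quote` are assumptions, and the path `/ümlat.html` in the usage note is a made-up example.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

def url_escape(url):
    """Percent-encode characters outside the safe set, using UTF-8 bytes."""
    # The safe set keeps URL delimiters and existing %-escapes; it is an assumption.
    return quote(url, safe=":/?&=#%+;,@")

def xml_encode(text):
    """XML-encode &, < and >, plus both quote characters."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})

def encode_for_sitemap(url):
    """URL-escape first, then XML-encode, as the procedure above describes."""
    return xml_encode(url_escape(url))
```

For example, `encode_for_sitemap("http://www.test.org/ümlat.html?widget=3&count=2")` percent-encodes the non-ASCII character as `%C3%BC` and turns the ampersand into `&amp;`.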
Q: Does it matter which character encoding method I use to generate my Sitemap files?
Yes. Your Sitemap files must use UTF-8 encoding.
Q: How do I specify time?
Use ISO 8601 encoding for the lastmod timestamps and all other dates and times in this protocol. For example, 2004-09-22T14:12:14+00:00.
If you wish, you can omit the time portion of the ISO 8601 format; for example, 2004-09-22 is also valid. However, if your site changes frequently, you are encouraged to include the time portion so crawlers have more complete information about your site.
Q: How do I compute lastmod date?
For static files, this is the actual file update date. You can use the UNIX date command to get this date:
>> 2004-10-26T08:56:39+00:00
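The same timestamp can be derived in Python from a file's modification time. This is a small sketch rather than a prescribed method; the function names are assumptions, and the output is always expressed in UTC.

```python
import os
from datetime import datetime, timezone

def format_lastmod(timestamp):
    """Format a POSIX timestamp as ISO 8601 in UTC with second precision."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.isoformat(timespec="seconds")

def file_lastmod(path):
    """Return a lastmod value for a static file, based on its modification time."""
    return format_lastmod(os.path.getmtime(path))
```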
For many dynamic URLs, you may be able to easily compute a lastmod date based on when the underlying data was changed or by using some approximation based on periodic updates (if applicable). Using even an approximate date or timestamp can help crawlers avoid crawling URLs that have not changed. This will reduce the bandwidth and CPU requirements for your Web servers.
Q: Where do I place my Sitemap?
It is strongly recommended that you place your Sitemap at the root directory of your Web server; that is, place it at http://yoursite.com/sitemap.gz.
In some situations, you may want to produce different Sitemaps for different paths on your site — e.g. if security permissions in your organization compartmentalize write access to different directories.
If you have the permission to change http://site.org/path/sitemap, then it is generally safe to assume that you also have permission to report metadata under http://site.org/path/.
Q: How big can my Sitemap be?
Search engines will not process Sitemaps larger than 10MB (10,485,760 bytes) in length when uncompressed or that contain more than 50,000 URLs. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 10MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a filesize of 10MB.
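Splitting a large URL list by the 50,000-URL limit can be sketched as follows. The function name `split_urls` is an assumption, and the sketch covers only the URL count: each resulting Sitemap must still be checked separately against the 10MB uncompressed size limit.

```python
MAX_URLS_PER_SITEMAP = 50000

def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most limit entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]
```

Each chunk would then be written to its own Sitemap file and listed in a Sitemap index file.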
Q: My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
You can list the updated URLs in a small number of Sitemaps that change frequently and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines will then incrementally crawl only the changed Sitemaps.
Q: What happens after I produce my Sitemap?
After you produce your Sitemap, you will need to notify search engines of the Sitemap's location. The search engines that you notify will then retrieve your Sitemap and make the URLs available to their crawlers.
Q: Do URLs in the Sitemap need to be completely specified?
Yes. Search engines will crawl the URLs exactly as you provide them. (Search engines will XML decode your URLs if they are XML-encoded.) You do need to include the protocol — e.g. http — in your URL; you also need to include a trailing slash in your URL if your Web server requires one. For example, http://www.google.com/ is a valid URL for a Sitemap, whereas www.google.com is not.
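A quick check for whether a URL is completely specified might look like the Python sketch below. The function name is an assumption, and the test only confirms that a protocol and a host are present; it cannot tell whether your Web server requires a trailing slash.

```python
from urllib.parse import urlsplit

def is_fully_specified(url):
    """True if the URL names both a protocol and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme) and bool(parts.netloc)
```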
Q: My site has both "http" and "https" version of URLs. Do I need to list both?
No. Please list only one version of a URL in your Sitemaps. Including multiple versions of URLs may result in incomplete crawling of your site.
Q: URLs on my site have session IDs in them. Do I need to remove them?
Yes. Including session IDs in URLs may result in incomplete and redundant crawling of your site.
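One way to remove session IDs before listing URLs is to drop the relevant query parameters. The Python sketch below assumes the session ID travels in the query string under one of a few parameter names; those names are hypothetical and need to be replaced with whatever your site actually uses. The function works on the plain URL, before any XML encoding is applied.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; adjust to match your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}

def strip_session_ids(url):
    """Return url without query parameters that carry a session ID."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```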
Q: Does position of a URL in a Sitemap influence its use?
No. The position of a URL in the Sitemap has no impact on how it is used or regarded by search engines.
Q: Some of the pages on our site use frames. Should we include the frameset URLs or the URLs of the frame contents?
Please include both URLs.
Q: Can I zip my Sitemaps or do they have to be gzipped?
Please use gzip to compress your Sitemaps.
Q: Will the "priority" hint in the XML Sitemap change the ranking of my pages in search results?
No. The "priority" hint in your Sitemap only indicates the importance of a particular URL relative to other URLs on your own site.
Q: Is there an XML schema that I can validate my XML Sitemap against?
An XML schema is available for Sitemap files at http://www.google.com/schemas/sitemap/0.84/sitemap.xsd, and a schema for Sitemap index files is available at http://www.google.com/schemas/sitemap/0.84/siteindex.xsd.

