add an autogenerated sitemap #14295

Closed: adrinjalali wants to merge 2 commits from the mnt/sitemap branch.

adrinjalali (Member)

Fixes #13518.

Uses sphinx-sitemap to generate a sitemap from the stable version only.
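For context, enabling sphinx-sitemap comes down to a couple of lines in the Sphinx `conf.py`. The excerpt below is a minimal sketch, not the actual diff from this PR: the base URL pointing at the stable docs is an assumption based on the description above, and the project's existing extension list is elided. As far as I know, sphinx-sitemap builds each page URL from Sphinx's `html_baseurl` setting and writes `sitemap.xml` into the HTML output directory.

```python
# doc/conf.py (excerpt): hypothetical sketch, not the diff from this PR.
# Requires the sphinx-sitemap package: pip install sphinx-sitemap
extensions = [
    # ... the project's existing Sphinx extensions would be listed here ...
    "sphinx_sitemap",
]

# sphinx-sitemap prefixes every page URL in sitemap.xml with html_baseurl.
# Pointing it at /stable/ (assumed here) keeps dev and older-version URLs
# out of the generated sitemap.
html_baseurl = "https://scikit-learn.org/stable/"
```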

thomasjpfan (Member)

Here is the sitemap generated by this PR: https://63867-843222-gh.circle-artifacts.com/0/doc/sitemap.xml

jnothman (Member) commented Jul 8, 2019 via email

adrinjalali (Member, Author)

The sitemap is not going to prevent Google from indexing the older pages or the dev version, since there are links to them on the web anyway. However, it tells Google which pages we intend to have indexed, and that does influence its algorithm [link]:

Using a sitemap doesn't guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you'll never be penalized for having one.

Also, the sphinx-sitemap plugin supports multiple versions, so we could later put each version in its own sitemap file and let it tell Google about all of them, but I think this is a good start.
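For reference, a sitemap under the sitemaps.org protocol is a `<urlset>` of `<url><loc>` entries, and separate per-version sitemap files would be tied together by a `<sitemapindex>`. The minimal Python sketch below builds both with the standard library; it is not how sphinx-sitemap works internally, and the page list and per-version sitemap URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, pages):
    """Return a <urlset> sitemap listing `pages` under `base_url`.

    `base_url` must end with "/" so urljoin appends to it instead of
    replacing its last path segment.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = urljoin(base_url, page)
    return ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> pointing at one sitemap file per version."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="unicode")


# Hypothetical inputs; a real build would take the page list from Sphinx.
stable_xml = build_sitemap(
    "https://scikit-learn.org/stable/",
    ["index.html", "modules/grid_search.html"],
)
index_xml = build_sitemap_index([
    "https://scikit-learn.org/stable/sitemap.xml",
    "https://scikit-learn.org/0.21/sitemap.xml",
])
print(stable_xml)
```

Listing only the stable file in the index reproduces this PR's "stable only" scope; adding more entries would be the multi-version extension mentioned above.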

adrinjalali (Member, Author)

Not sure how to proceed from here.

rth (Member) commented Aug 7, 2019

We could add a sitemap, but I don't think it would directly help with the parent issue. https://webmasters.stackexchange.com/questions/99867/how-to-correctly-mark-up-different-versions-of-the-same-document-which-are-non-c hints that it's a non-trivial problem.

rth (Member) commented Aug 7, 2019

Generally scikit-learn pages are strongly interlinked, and search engines should be able to reconstruct the sitemap from links. I would rather rely on Sphinx producing a search-engine-friendly structure than do part of the search engines' work with a sitemap.

As for the older scikit-learn versions, maybe a banner at the top indicating that a newer version exists would be better? That's what RTD (Read the Docs) is doing, I think, or at least some other documentation sites that use Sphinx.

adrinjalali (Member, Author)

The issue is that search engines find the pages too well, and as a result they list the old versions. Ideally a search engine would deprioritize the old versions, which is the purpose of this PR. I don't think this would stop search engines from indexing the old pages, but it may keep them from ranking those pages so high in the results.

adrinjalali (Member, Author)

I also don't think it matters much whether search engines can see the old versions, since we don't really actively support them anyway.

thomasjpfan (Member)

Maybe generate a robots.txt to discourage the crawler from going to older versions?

adrinjalali (Member, Author)

I think that would be more drastic. We probably still want people to find things which are only available on the older versions, don't we? (I'm really not sure myself TBH)

thomasjpfan (Member)

Hmm how many people google "DecisionTreeClassifier 0.18" D:

Could we restrict the crawler to the last few versions?
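The robots.txt idea discussed here (keep the crawler away from older versions, but allow the last few) could be generated at deploy time. Below is a minimal Python sketch, assuming old docs live under top-level directories named after plain `X.Y` versions (e.g. `/0.16/`); the version list and the `keep_last` default are hypothetical. `/stable/` and `/dev/` never get a `Disallow` line, so they stay crawlable. As adrinjalali points out, this is more drastic than a sitemap, because disallowed pages are not crawled at all.

```python
from urllib import robotparser


def make_robots_txt(versions, keep_last=3):
    """Build robots.txt text disallowing all but the newest `keep_last` versions.

    `versions` are version directory names such as "0.16" (assumed layout).
    """
    # Sort numerically so "0.9" comes before "0.16".
    ordered = sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
    old = ordered[:-keep_last] if keep_last > 0 else ordered
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{v}/" for v in old]
    if not old:
        # An empty Disallow value means "allow everything".
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


robots_txt = make_robots_txt(["0.16", "0.17", "0.18", "0.19", "0.20", "0.21"])
print(robots_txt)

# Sanity check with the standard-library robots.txt parser.
parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "https://scikit-learn.org/0.16/modules/grid_search.html"))
```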

adrinjalali (Member, Author) commented Aug 7, 2019 via email

jnothman (Member) commented Aug 8, 2019

@rth wrote:

As for the older scikit-learn versions, maybe a banner at the top indicating that a newer version exists would be better? That's what RTD (Read the Docs) is doing, I think, or at least some other documentation sites that use Sphinx.

See scikit-learn/scikit-learn.github.io#13

adrinjalali (Member, Author)

This doesn't seem to be a priority; closing.

adrinjalali closed this on Sep 8, 2019.
adrinjalali deleted the mnt/sitemap branch on September 8, 2019 at 17:11.

Successfully merging this pull request may close: Google shows 0.16 doc when searching GridSearchCV (#13518)
4 participants