• 1 Installation
  • 2 Quick Start
  • 3 Introduction
    • 3.1 What is WP Content Crawler
    • 3.2 What is a CSS selector
    • 3.3 How to open developer tools
    • 3.4 How to find CSS selectors for an element
  • 4 Sites
    • 4.1 Category settings
      • 4.1.1 Automatically adding category URLs
      • 4.1.2 Finding post URLs
      • 4.1.3 Removing unnecessary elements
      • 4.1.4 Saving featured images
      • 4.1.5 Finding next page URL of a category
      • 4.1.6 Finding and replacing in the HTML at first load
    • 4.2 Post settings
      • 4.2.1 Getting list-type posts
      • 4.2.2 Getting paginated posts
      • 4.2.3 Custom meta selectors
      • 4.2.4 Other settings
    • 4.3 Template settings
      • 4.3.1 Main post template
      • 4.3.2 List item template
      • 4.3.3 Find-and-replaces
    • 4.4 Using customized general settings for a site
    • 4.5 Importing/exporting settings
  • 5 Tester
  • 6 Tools
    • 6.1 Manually crawling and saving a post
    • 6.2 Deleting URLs
  • 7 General settings
    • 7.1 Post settings
    • 7.2 SEO settings
    • 7.3 Scheduling settings
    • 7.4 Advanced settings
  • 8 Understanding find-and-replaces
  • 9 Lifecycle of events
    • 9.1 Category page
    • 9.2 Post page
  • 10 FAQ
    • 10.1 What can it be used for?
 
  • 1 Installation

    To install WP Content Crawler, follow the steps below.

    1. Go to your WordPress admin panel and, using the admin menu on the left of the page, select Plugins > Add New.

    2. On the “Add New Plugin” page, click the Upload Plugin button.

    3. Next, click Choose File and select the plugin zip file you downloaded.

    4. Then, click Install Now and wait a few moments.

    5. When you see that the plugin is installed, click the Activate Plugin link.

    6. Then, go to Settings > Content Crawler License Settings from your admin menu.

    7. Finally, enter your license key in the input field and click the Save Changes button.

    8. You can now reach all plugin features under the Content Crawler menu in the admin menu.

    9. Congratulations! You have completed the setup. You can now start crawling almost any site you want. Please read the following sections of this documentation to learn how to use WP Content Crawler, or watch our introduction tutorial. Thank you for using WP Content Crawler!
  • 2 Quick Start

    Check out our introduction tutorial for a quick start. The tutorial shows how to open and use developer tools, how to use CSS selectors, how to create a site in WP Content Crawler and configure its settings, and how to activate automated post and category crawling.

  • 3 Introduction
    • 3.1 What is WP Content Crawler

      WP Content Crawler is a WordPress plugin that can import content from almost any site into your WordPress blog. It locates the content by using CSS selectors.

    • 3.2 What is a CSS selector

      A CSS selector is a text pattern used to find an element in the HTML of a web page. CSS selectors use attributes of HTML elements such as class and id. To write a CSS selector for an HTML element, you should first look at the source code of the web page. You can use the developer tools available in your web browser to see the source of the page and find specific elements.

    • 3.3 How to open developer tools

      In Chrome:
      View > Developer > Developer Tools

      In Safari:
      First, go to Safari > Preferences, open the Advanced tab, and select the checkbox next to the Show Develop menu in menu bar option.
      Next, go to Develop in the menu bar, and select Show Web Inspector.

      In Firefox:
      Tools > Web Developer > Inspector

      In Opera:
      First, select View > Show Developer Menu.
      Next, go to Developer > Developer Tools 

      In Internet Explorer:
      Press F12 or click Tools > Developer Tools

    • 3.4 How to find CSS selectors for an element

      To find a CSS selector for an HTML element, you should look at the attributes of the element, as well as the attributes of elements that contain the element. Let’s see some examples.

      <a class="web-page-link" href="https://wordpress.org">WordPress</a>

      Here are valid CSS selectors for the element above:

      a.web-page-link
      .web-page-link
      
      <div id="content">Content…</div>

      Here are valid CSS selectors for the element above:

      div#content
      #content

      <div class="pagination"><a class="next-page" href="https://wordpress.org/news/page/2">Next Page</a></div>

      Here are a few valid CSS selectors for the link (<a>) element above:

      .pagination .next-page
      div.pagination a.next-page
      .pagination a.next-page
      div.pagination a
      .pagination > .next-page
      div.pagination > a.next-page
      .pagination > a

      You can learn more about CSS selectors here: http://www.w3schools.com/cssref/css_selectors.asp
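
To make the matching rules concrete, here is a minimal Python sketch of how a selector built from tag names, classes, ids, and spaces (the descendant combinator) picks elements. It is only an illustration, not the plugin's implementation: the names `parse_compound`, `matches`, `SelectorFinder`, and `select` are hypothetical, and the child combinator (>) is not handled.

```python
import re
from html.parser import HTMLParser


def parse_compound(token):
    """Split 'a.next-page' or 'div#content' into (tag, id, set of classes)."""
    tag = re.match(r"[A-Za-z][\w-]*", token)
    ident = re.search(r"#([\w-]+)", token)
    classes = set(re.findall(r"\.([\w-]+)", token))
    return (tag.group(0) if tag else None,
            ident.group(1) if ident else None,
            classes)


def matches(element, compound):
    """Check one element, given as (tag, attrs), against one selector part."""
    tag, ident, classes = compound
    el_tag, attrs = element
    if tag and tag != el_tag:
        return False
    if ident and attrs.get("id") != ident:
        return False
    return classes <= set((attrs.get("class") or "").split())


class SelectorFinder(HTMLParser):
    """Collects elements matched by a selector made of space-separated parts."""

    VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}

    def __init__(self, selector):
        super().__init__()
        self.compounds = [parse_compound(part) for part in selector.split()]
        self.open_elements = []  # ancestors of the current position
        self.found = []

    def handle_starttag(self, tag, attrs):
        element = (tag, dict(attrs))
        if self._is_selected(element):
            self.found.append(element)
        if tag not in self.VOID_TAGS:
            self.open_elements.append(element)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break

    def _is_selected(self, element):
        *ancestor_parts, last_part = self.compounds
        if not matches(element, last_part):
            return False
        # Walk outwards through the ancestors, consuming parts right to left.
        for ancestor in reversed(self.open_elements):
            if ancestor_parts and matches(ancestor, ancestor_parts[-1]):
                ancestor_parts.pop()
        return not ancestor_parts


def select(html, selector):
    """Return every (tag, attrs) pair in the HTML that the selector matches."""
    finder = SelectorFinder(selector)
    finder.feed(html)
    return finder.found


html = ('<div id="content"><div class="pagination">'
        '<a class="next-page" href="https://wordpress.org/news/page/2">Next Page</a>'
        '</div></div>')

for selector in (".pagination .next-page", "div.pagination a", "#content a.next-page"):
    print(selector, "->", [attrs["href"] for _, attrs in select(html, selector)])
```

All three selectors in the loop find the same link, which is why several different selectors can be valid for one element.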

  • 4 Sites

    A site is a special kind of post that holds the settings for a specific website. Create one site for each website you want to take content from. To create a site, select Content Crawler > Add New from the admin menu of your WordPress blog.

    • 4.1 Category settings

      Category settings are required for programmatically collecting post URLs and featured images from the target site.

      • 4.1.1 Automatically adding category URLs

        You can either add the category URLs of the target website manually, or use CSS selectors to add them automatically. To add them automatically, open the target website’s source code and find the HTML elements that contain category URLs. Next, after filling in the category page URL, write CSS selectors for the elements you have just found. Finally, click the “plus” button next to the CSS selector input. The links are then added to the category map. After that, you can select a category from your blog for each category link, so that the posts taken from each category are placed in the category you want.

      • 4.1.2 Finding post URLs

        If you want to programmatically check each category for new posts, write a CSS selector (or selectors) for the post URLs in category pages. First, fill in the test category URL field with the full URL of a category from the target website. Next, open that category page and find a CSS selector (or selectors) for the hyperlink (<a>) elements that contain the post URLs. Then, enter the CSS selector in an input field for category post URL selectors. You can use the test button next to each input to check whether the CSS selector works properly.

      • 4.1.3 Removing unnecessary elements

        You can remove certain elements from the target category page by using the unnecessary element selectors setting. The elements matched by the CSS selectors you write for this setting are removed from the target category page. To use this option properly, please check the “Lifecycle of events” section.

      • 4.1.4 Saving featured images

        You can save a featured image for each post from the target category page. To do this, go to the source code of the target category page and find a CSS selector (or selectors) for the featured image URLs. The selectors should match image (<img>) elements. After you find the selector(s), check the save featured images option and add a new selector to the featured image selectors option. You can check whether the selector works properly by using the test button next to its input field. Finally, check whether the post URLs come before the featured image elements in the source code of the page. Compare the start positions of the elements when you do this. If the start position of the post link comes before that of the image, select the checkbox for the “Post links come before featured images?” option.
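
The following Python sketch illustrates why the start positions matter when images are matched to posts. It is an assumption about how such pairing could work, not the plugin's actual algorithm: the function `pair_images` and its `links_first` parameter are hypothetical, and regular expressions stand in for CSS selectors.

```python
import re
from bisect import bisect_left, bisect_right


def pair_images(html, links_first=True):
    """Pair each image with a post link by comparing start positions."""
    links = [(m.start(), m.group(1))
             for m in re.finditer(r'<a\b[^>]*\bhref="([^"]+)"', html)]
    images = [(m.start(), m.group(1))
              for m in re.finditer(r'<img\b[^>]*\bsrc="([^"]+)"', html)]
    link_positions = [position for position, _ in links]

    pairs = {}
    for position, src in images:
        if links_first:
            # Nearest link that starts before the image.
            index = bisect_right(link_positions, position) - 1
        else:
            # Nearest link that starts after the image.
            index = bisect_left(link_positions, position)
        if 0 <= index < len(links):
            pairs.setdefault(links[index][1], src)
    return pairs


category_html = (
    '<article><a href="/post-1">One</a><img src="/img/1.jpg"></article>'
    '<article><a href="/post-2">Two</a><img src="/img/2.jpg"></article>'
)

print(pair_images(category_html, links_first=True))
print(pair_images(category_html, links_first=False))
```

In this sample each link starts before its image, so `links_first=True` pairs them correctly, while `links_first=False` attaches the first image to the second post and leaves the second image unmatched.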

        You can also modify the image URLs found by the CSS selectors you provide. To do this, first fill in the test image URL option with the full URL of an image. Next, go to the find-and-replace setting and write what to find and what to replace it with. For more information about find-and-replaces, please check the “Understanding find-and-replaces” section.

      • 4.1.5 Finding next page URL of a category

        In order for WP Content Crawler to check all of the pages of the target category, you should provide a CSS selector for the next page URL. To do this, go to the target category page’s source code and find a CSS selector (or selectors) for the next page URL. By default, this setting gets the “href” attribute of the found elements. However, you can provide a different attribute as well. After you write the selector(s), you can use the test button next to the input fields to check whether the selector works as expected.
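
As a rough illustration of how a next page URL lets a crawler walk through every page of a category, here is a minimal Python sketch. Everything in it is hypothetical (the names `NextLinkParser`, `next_page_url`, and `walk_category`, the `max_pages` limit, and the example.com pages), and it looks up a class name directly instead of evaluating a full CSS selector.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Records one attribute of the first <a> element carrying a given class."""

    def __init__(self, css_class, attribute="href"):
        super().__init__()
        self.css_class = css_class
        self.attribute = attribute
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and self.value is None:
            self.value = attrs.get(self.attribute)


def next_page_url(html, page_url, css_class="next-page", attribute="href"):
    """Return the absolute next page URL, or None when there is no such link."""
    parser = NextLinkParser(css_class, attribute)
    parser.feed(html)
    return urljoin(page_url, parser.value) if parser.value else None


def walk_category(start_url, fetch, max_pages=10):
    """Follow next page links until they run out or the page limit is hit."""
    visited = []
    url = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.append(url)
        url = next_page_url(fetch(url), url)
    return visited


# Stand-in for real HTTP requests: a few in-memory pages.
pages = {
    "https://example.com/news/":
        '<div class="pagination"><a class="next-page" href="/news/page/2">Next</a></div>',
    "https://example.com/news/page/2":
        '<div class="pagination"><a class="next-page" href="/news/page/3">Next</a></div>',
    "https://example.com/news/page/3":
        "<p>Last page, no next link.</p>",
}

print(walk_category("https://example.com/news/", pages.__getitem__))
```

Passing a different `attribute` (for example a data attribute) mirrors the option to read something other than “href” from the found element.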

      • 4.1.6 Finding and replacing in the HTML at first load

        Sometimes you want to change things in the HTML of the target page. You can do this by using this option. For more information about find-and-replaces, please check the “Understanding find-and-replaces” section. You should also check the “Lifecycle of events” section to get the most out of this option.

    • 4.2 Post settings

      Post settings are for the post pages of the target website. These settings are applied to each post taken from each category. Most of the options in post settings are used in the same way as those in category settings, so this section covers only the options specific to posts. To follow the explanations below, you should know how to use CSS selectors and developer tools, and how to write CSS selectors for options and test them. If you do not know these yet, please check the category settings section first.

      • 4.2.1 Getting list-type posts

        Sometimes the posts on the target page are written as a list. You can get these posts as a list, provide a template for each list item, and reverse the list. To enable retrieval of posts as a list, select the “posts are list type?” checkbox. The options related to list-type posts are then shown. To learn more about an option, click the information symbol next to its label.

      • 4.2.2 Getting paginated posts

        If the target posts are paginated, you can save them to your site as paginated posts. To do this, first select the “paginate posts” checkbox. The related options are then shown. Next, provide a CSS selector (or selectors) for the next page URL and/or all page URLs. If the target post has a next page URL, you should write a CSS selector for the next page URL. On the other hand, if the first page of the post (or each page) includes links to all of the other pages, you can provide a CSS selector (or selectors) for those elements. For more information, click the information symbol next to the label of each option.

      • 4.2.3 Custom meta selectors

        You can save something from the target post’s HTML as the value of a post meta key. Post meta keys are used to store information about posts in the database. This option can be handy if you use another plugin that has its own type of posts (such as WooCommerce’s products). You can, for instance, write a CSS selector for the price shown on the target post page, and save it as the price of the post to be created (in this case, a product). To be able to do this, you should find out which post meta keys are used to store the price of the product. This is only one example; many other uses are possible.

      • 4.2.4 Other settings

        For other settings in the post configuration page, you can read the information for each option on the settings page, or the explanations in the category section. When you read the category section, you will be able to understand the basics of the settings.

    • 4.3 Template settings

      Templates let you place specific data anywhere in the posts that will be saved to your site. The buttons above the template editors show which short codes are available, and hovering over a button shows its explanation. Click a button to copy its short code, then paste it anywhere in the template.

      • 4.3.1 Main post template

        To be able to save a post, you must prepare the main post template and place the main content short code in it. You can place the title, excerpt, content, source URL, and the list inside the post in this template. If you expect a list from the target page, you must place the list in the template. Otherwise, the list will not be shown. In addition, if you do not place the main content, nothing will be shown.
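
The effect of leaving a short code out of the template can be sketched in a few lines of Python. The short code names used here ([title], [content], [list]) and the function `render_template` are placeholders invented for this illustration; copy the real short codes from the buttons above the template editor.

```python
def render_template(template, values):
    """Replace each short code that appears in the template with its value."""
    for short_code, value in values.items():
        template = template.replace(short_code, value)
    return template


# Hypothetical short codes and the data retrieved for one post.
values = {
    "[title]": "Ten tips",
    "[content]": "<p>Intro text</p>",
    "[list]": "<ol><li>First tip</li></ol>",
}

with_list = render_template("<h2>[title]</h2>[content][list]", values)
without_list = render_template("<h2>[title]</h2>[content]", values)

print(with_list)
print(without_list)
```

The second template has no list short code, so the list never reaches the saved post even though it was retrieved.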

      • 4.3.2 List item template

        This template is used for each list item of a list-type post. You can place the title, position, and content of the list item in this template. If you do not place the content of the list item, the list items in the post on your site will have no content.

      • 4.3.3 Find-and-replaces

        Please refer to the “Understanding find-and-replaces” section.

    • 4.4 Using customized general settings for a site

      You can use customized general settings for a site. To do this, go to the main settings of the site and check the “Use custom general settings?” option. A new tab named “settings” then appears among the other tabs. Go to that tab and configure the options. For further information about general settings, please refer to the general settings section.

    • 4.5 Importing/exporting settings

      You can import and export the settings of a site. Go to the import/export tab. You will see two text areas. One is empty, and you can paste exported settings into it. The other contains the current settings of the site. To move settings to another site, copy the settings from the second text area, go to the other site, and paste them into its text area for importing settings. Next, save the settings. You will see that all of the settings are imported.

  • 5 Tester

    The tester is a tool that helps you test the category and post pages of a site. To test a site’s page, go to the tester by selecting Content Crawler > Tester from the admin menu. Then, select a site, select the type of page you want to test, and enter the URL of the page. Next, press the test button.

    If you test a category page, you will see the post links, the featured images, and the next page URL, each of them only if its CSS selector is defined. You can use the test buttons next to each post URL or next page URL to test it.

    If you test a post, you will see the template of the post. If you choose to save images to your server, the images will be saved to your server before showing you the template.

    In each test, you can go to details section at the bottom and check other details. The details section will include the time spent and memory used for the test for both post and category pages. The details are more comprehensive for the post test. You can see meta keywords, meta description, meta keywords as tags, post title, post excerpt, next page URL(s), saved images and featured image URL. Note that the featured image will not be saved to your server for the test.

  • 6 Tools

    You can reach the tools by selecting Content Crawler > Tools from the admin menu.

    • 6.1 Manually crawling and saving a post

      You can manually crawl and save a post by providing the site from which the post will be taken, the category in which the post will be saved, and the URL of the post. After you have filled in this information, click the “crawl and save” button. After a while, you will see a link to the post that has just been saved to your site. The waiting time depends on whether the images will be saved, whether the target post is paginated, the number of pages of the target post, the target site’s speed, and your server’s capabilities. You can check how many milliseconds are required to save a certain post by testing the post with the tester and checking the details section after the test completes.

    • 6.2 Deleting URLs

      When the scheduling is active, target site’s categories will be visited to collect post URLs. These URLs will be stored in the database, so that they can be saved when the time comes. You can delete these URLs by using this tool.

  • 7 General settings

    General settings are applied to all of the sites. You can reach general settings by selecting Content Crawler > General Settings from the admin menu.

    • 7.1 Post settings

      You can configure a few options for the posts that will be saved to your site. You can allow comments, directly publish the post or keep it as draft, select the type and author of the post, and set a password for it.

    • 7.2 SEO settings

      If you want to get meta keywords and description from the target site and save them as your post’s meta keywords and description, you should provide the post meta keys under which these meta values are saved. These post meta keys depend on your SEO plugin. If you do not know these keys, you should ask your SEO plugin’s authors or support forums. For the find-and-replace setting, please refer to the “Understanding find-and-replaces” section.

    • 7.3 Scheduling settings

      You can configure scheduling options here. Scheduling is good for automatically collecting URLs and saving posts from the active sites. When you activate scheduling, post URLs will be collected from active sites uniformly and saved to the database. In addition, when the time comes, the post URLs in the database will be visited and saved to your site as posts. You can select time intervals for URL collection and post crawling. You can also define a limit to the number of category pages to be analyzed. In addition, you can limit the number of pages to check if there is no new URL found. To understand this option better, please read the explanation of this option by clicking the information symbol next to the label of the option.

    • 7.4 Advanced settings

      In this section, you can set HTTP user agent and HTTP accept headers. You can also allow/disallow cookies when browsing the target page.

  • 8 Understanding find-and-replaces

    Find-and-replace options are a major part of WP Content Crawler. Using these options, you can manipulate the HTML of the target page. If the content you want to find does not exist in the page in the form you need, you can reshape it into that form. Moreover, you can change words to boost SEO. You can find and replace in two different ways: using plain text and using regular expressions.

    Finding and replacing by using plain text is straightforward. As in most other applications, you directly write what to find and what to replace it with. Note that this option is case sensitive, so keep this in mind when you use it.
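
The case sensitivity of plain-text replacement can be seen in this short Python sketch. The function name `replace_plain` and the sample HTML are made up for illustration, and the sketch only mimics the behavior described above.

```python
def replace_plain(text, find, replace):
    """Case-sensitive plain-text replacement of every occurrence."""
    return text.replace(find, replace)


html = "<h1>Breaking News</h1><p>More news below.</p>"

# "News" and "news" are different search terms.
print(replace_plain(html, "News", "Stories"))
print(replace_plain(html, "news", "stories"))
```

The first call changes only the heading and the second changes only the paragraph, so both spellings need their own find-and-replace entry if both should change.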

    Using regular expressions is far more powerful than using plain text. You can change the HTML however you want. For instance, if there is a pagination with page URLs inside it, but there is no next page URL, you can find the hyperlink next to the active element and add a “next-page” class to it. Next, you can write “a.next-page” as a CSS selector for the next page URL. This is just one example of what is possible. Regular expressions are too complex to cover here, so you need to learn them from an external source. You can start by searching YouTube for “regular expressions”.
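
The pagination example above can be sketched in Python as follows. The markup and the pattern are assumptions chosen for illustration; a real target site will need its own pattern. Note that Python writes the first captured group as \1 in the replacement, while the plugin's replace box uses $1.

```python
import re

# Hypothetical pagination markup without a dedicated "next page" link.
pagination = (
    '<ul class="pagination">'
    '<li><a href="/page/1">1</a></li>'
    '<li class="active"><a href="/page/2">2</a></li>'
    '<li><a href="/page/3">3</a></li>'
    '</ul>'
)

# Capture everything from the active item up to the start of the next link,
# then re-emit it followed by the new class attribute.
find = r'(<li class="active">.*?</li>\s*<li>\s*<a)'
replace = r'\1 class="next-page"'

result = re.sub(find, replace, pagination, flags=re.DOTALL)
print(result)
```

After this replacement the link to page 3 carries the “next-page” class, so the selector “a.next-page” can find it.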

    [Screenshot: find-and-replace example]

    You can check out the example above. There are three find-and-replace options defined. The first one is used to remove the links of images. $1 in the replace box represents the image element. So, if there is a link with an image inside, the image will stay, while the link is removed. The second one is to remove the p elements whose content starts with More info. As you can see, the replace box is empty, meaning that the found values will be replaced with nothing, i.e. they will be removed. Finally, the third one is to remove script elements from the post.
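
Since the exact patterns in the screenshot are not reproduced in this text, the following Python sketch uses assumed patterns that achieve the same three effects. The variable `rules` and the sample HTML are hypothetical, and \1 is Python's spelling of the $1 used in the replace box.

```python
import re

# (find, replace) pairs, applied in order. These patterns are assumptions.
rules = [
    # 1. Remove links around images but keep the image itself.
    (r'<a\b[^>]*>\s*(<img\b[^>]*>)\s*</a>', r'\1'),
    # 2. Remove p elements whose content starts with "More info".
    (r'<p\b[^>]*>\s*More info.*?</p>', ''),
    # 3. Remove script elements.
    (r'<script\b[^>]*>.*?</script>', ''),
]

post_html = (
    '<p>Hello</p>'
    '<a href="/big.jpg"><img src="/small.jpg"></a>'
    '<p>More info: visit us</p>'
    '<script>track();</script>'
)

cleaned = post_html
for find, replace in rules:
    cleaned = re.sub(find, replace, cleaned, flags=re.DOTALL)

print(cleaned)
```

Only the first paragraph and the bare image remain after the three rules run.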

  • 9 Lifecycle of events

    There are several options for finding and replacing in the HTML of the target site and for removing HTML elements. To use these options effectively, you should understand the lifecycle of events. In other words, you should know which option is applied before or after which other option.

    • 9.1 Category page
      1. Find-and-replaces set from general settings page
      2. Find-and-replaces set from “Find and replace in HTML at first load” setting
      3. Removal of unnecessary elements
      4. Retrieval of post links
      5. Retrieval of featured image URLs
      6. Find-and-replaces for featured images
      7. Retrieval of next page URL
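
The order above matters because each step works on the output of the previous one. The Python sketch below walks through steps 1 to 4 only, under loud assumptions: the function `run_category_lifecycle` is hypothetical, and regular expressions stand in for the CSS selectors used for element removal and post links.

```python
import re


def run_category_lifecycle(html, general_rules, first_load_rules,
                           unnecessary_patterns, post_link_pattern):
    """Apply steps 1 to 4 of the category page lifecycle in order."""
    # Steps 1 and 2: general find-and-replaces, then the "first load" ones.
    for find, replace in general_rules + first_load_rules:
        html = re.sub(find, replace, html, flags=re.DOTALL)
    # Step 3: removal of unnecessary elements (approximated with patterns).
    for pattern in unnecessary_patterns:
        html = re.sub(pattern, "", html, flags=re.DOTALL)
    # Step 4: retrieval of post links from what is left.
    return re.findall(post_link_pattern, html)


category_html = (
    '<div class="sidebar"><a class="post" href="/ad">Ad</a></div>'
    '<div class="list"><a class="entry" href="/post-1">One</a></div>'
)

links = run_category_lifecycle(
    category_html,
    general_rules=[],
    first_load_rules=[(r'class="entry"', 'class="post"')],
    unnecessary_patterns=[r'<div class="sidebar">.*?</div>'],
    post_link_pattern=r'<a class="post" href="([^"]+)"',
)
print(links)
```

The class added at first load is visible to the link retrieval, and the sidebar link is gone before links are collected, which is the practical consequence of the ordering.
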
    • 9.2 Post page
      1. Find-and-replaces set from general settings page
      2. Find-and-replaces set from “Find and replace in HTML at first load” setting
      3. Retrieval of next page URL or all page URLs
      4. Removal of unnecessary elements
      5. Retrieval of title
      6. Find-and-replaces for title
      7. Retrieval of excerpt
      8. Find-and-replaces for excerpt
      9. Retrieval of content
      10. Retrieval of list in post
      11. Retrieval of meta keywords
      12. Find-and-replaces for meta keywords
      13. Retrieval of meta description
      14. Find-and-replaces for meta description
      15. Retrieval of image URLs
      16. Find-and-replaces for image URLs
      17. Retrieval of featured image URL
      18. Find-and-replaces for featured image URL
      19. Retrieval of custom post meta
      20. Template preparation
      21. Find-and-replaces for content
  • 10 FAQ

    Frequently asked questions and their answers are in this section.

    • 10.1 What can it be used for?
      • Create a personal site which collects news, posts, etc. from your favorite sites to see them in one place
      • Use it with WooCommerce to collect products from shopping sites
      • Collect products from affiliate programs to make money
      • Collect posts to create a test environment for your plugin/theme
      • Collect plugins, themes, apps, images from other sites to create a collection of them
      • Keep track of competitors

      These are just a few examples. You can use WP Content Crawler for any purpose you want!