{"id":138,"date":"2021-11-13T14:52:44","date_gmt":"2021-11-13T14:52:44","guid":{"rendered":"https:\/\/usersearch.org\/blog\/?p=138"},"modified":"2021-11-15T15:20:41","modified_gmt":"2021-11-15T15:20:41","slug":"how-to-build-a-web-crawler-for-osint","status":"publish","type":"post","link":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/","title":{"rendered":"How to build a Web-Crawler for OSINT"},"content":{"rendered":"\n<p>How exactly can you build web crawlers to help with OSINT Investigations? What an amazing question.  This is a question we asked ourselves over a decade ago.  Hence&#8230;the birth of UserSearch.org!<\/p>\n\n\n\n<p>In this article, we&#8217;re going to talk a little about how you can build your own web crawler using Python (don&#8217;t worry, it&#8217;s going to be basic).  After 10 minutes of reading, you may even have a Python web crawler to call your own!<\/p>\n\n\n\n<p class=\"western\">We&#8217;ve been asked a number of times how <a href=\"https:\/\/usersearch.org\" target=\"_blank\" rel=\"noreferrer noopener\">our reverse username search works<\/a>.&nbsp; Trying to explain it usually draws a blank face.&nbsp; Instead, we&#8217;re going to do a post on how web crawlers can be used for open-source research (OK, and how our website works).<\/p>\n\n\n\n<p class=\"western\"><em>Health Warning<\/em>: Very technical.&nbsp; Requires knowledge of PHP and Python.&nbsp; If you don&#8217;t have it, keep reading and you may still get an idea of how search engines work (or ours at least).&nbsp; We strongly recommend you read our previous posts first (<em><a href=\"https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/28\/does-a-dating-website-know-you\/\" target=\"_blank\" rel=\"noopener\">Does a website know you?<\/a>, <a 
href=\"https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/26\/can-you-find-my-hidden-email-address\/\" target=\"_blank\" rel=\"noopener\">Can you find my hidden email address?<\/a> and <a href=\"https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/26\/new-search-engine-on-the-block\/\" target=\"_blank\" rel=\"noopener\">New Search engine on the block<\/a>).&nbsp; They will give you a grounding in how the manual techniques work, so when you start reading our code, it will click!<br><\/em><\/p>\n\n\n\n<h2 id=\"h-what-is-a-web-crawler\"><strong>What is a Web-Crawler?<\/strong><\/h2>\n\n\n\n<p class=\"western\">First, what is a web crawler? Wikipedia (https:\/\/en.wikipedia.org\/wiki\/Web_crawler) defines it as follows.<\/p>\n\n\n\n<p class=\"western\">&#8220;A Web crawler is an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Internet_bot\">Internet bot<\/a> which systematically browses the <a href=\"https:\/\/en.wikipedia.org\/wiki\/World_Wide_Web\">World Wide Web<\/a>, typically for the purpose of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Web_indexing\">Web indexing<\/a>. 
A Web crawler may also be called a Web spider,<a href=\"https:\/\/en.wikipedia.org\/wiki\/Web_crawler#cite_note-spekta-1\">[1]<\/a> an ant, an automatic indexer,<a href=\"https:\/\/en.wikipedia.org\/wiki\/Web_crawler#cite_note-2\">[2]<\/a> or (in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/FOAF_%28software%29\">FOAF<\/a> software context) a Web scutter.<a href=\"https:\/\/en.wikipedia.org\/wiki\/Web_crawler#cite_note-3\">[3]<\/a>&#8221;<\/p>\n\n\n\n<h2><strong>Types of Crawlers<\/strong><\/h2>\n\n\n\n<p class=\"western\">This explanation is by no means official, but going by how our site works, we believe there are three distinct types of crawler bot.&nbsp; First, there are generic crawlers, which are extremely sophisticated: they crawl, index, and remember where they have been (Google, for example).&nbsp; Second, there are directional crawlers.&nbsp; Directional crawlers are more specific and have a particular task that needs completing.&nbsp; They are still very sophisticated, but designed for a handful of specific, targeted results.&nbsp; <\/p>\n\n\n\n<p class=\"western\">Usersearch.org has been designed mostly around directional crawlers.&nbsp; We&#8217;ve built the crawlers from the ground up, starting with a blank page.&nbsp; Our web crawlers have been specifically designed to find usernames, pseudonyms, email addresses, phone numbers, and website social stats across approximately 500 social networks and forums.&nbsp; And finally, there are omni-directional crawlers, which are extremely specific with no wiggle room, a bit like accessing an API with very few moving parts.&nbsp; We use a few of these too.<\/p>\n\n\n\n<h2 class=\"has-text-align-right\"><strong>Techniques<\/strong><\/h2>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignleft size-medium is-resized\"><a href=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/tumblr_ljmeveF10t1qf00w4.gif\"><img src=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/tumblr_ljmeveF10t1qf00w4-300x135.gif\" alt=\"Reverse lookups are the 
cornerstone to search engines.  They all started from web crawling technologies.\" class=\"wp-image-142\" width=\"331\" height=\"149\"\/><\/a><figcaption>Reverse lookups are the cornerstone to search engines. They all started from web crawling technologies.<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"western\">If you&#8217;ve read our previous posts (<em><a href=\"https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/28\/does-a-dating-website-know-you\/\" target=\"_blank\" rel=\"noopener\">Does a website know you?<\/a>, <a href=\"https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/26\/can-you-find-my-hidden-email-address\/\" target=\"_blank\" rel=\"noopener\">Can you find my hidden email address?<\/a> and <a href=\"https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/26\/new-search-engine-on-the-block\/\" target=\"_blank\" rel=\"noopener\">New Search engine on the block<\/a><\/em>) you will see that the manual process of working out whether a website knows a particular email address is really not that difficult.&nbsp; But if you start doing that process manually on 10, 15, 20, or 100+ websites &#8211; it gets boring, fast.&nbsp; <\/p>\n\n\n\n<p class=\"western\">The solution to this, of course, is to build a directional web crawler (our own definition!).&nbsp; Think of a directional satellite link, where a line of sight must exist for two satellites to communicate.&nbsp; Our directional web crawlers know exactly where to go, where to look, what to do, and where to put the results.&nbsp; Of course, we need to put some safety measures in place for various conditions, such as an unexpected change in the targeted web page or a page that is temporarily not responding.&nbsp; But that&#8217;s just part of the fun.&nbsp; We also need to tell the crawler what to do if it is fed data it did not expect (such as a space between two words, like Fred Hammer rather than Fred_Hammer).<\/p>\n\n\n\n<h2><strong>Basic Python Modules for 
web-crawling<\/strong><\/h2>\n\n\n\n<p>We are not going to cover how to install Python or how to test the modules.&nbsp; We&#8217;re hoping you already know this (if you don&#8217;t, we can do a future post if you ask us).&nbsp; We&#8217;re jumping right in.<\/p>\n\n\n\n<p>Good web crawler technologies:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignright size-medium\"><a href=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/25.png\"><img width=\"300\" height=\"300\" src=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/25-300x300.png\" alt=\"Anyone can build an OSINT crawler for reverse lookup tool with a little coding\" class=\"wp-image-143\" srcset=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/25-300x300.png 300w, https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/25-150x150.png 150w, https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/25.png 480w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption>Anyone can build an OSINT lookup tool with a little coding<\/figcaption><\/figure><\/div>\n\n\n\n<p>-Scrapy<\/p>\n\n\n\n<p>-Selenium<\/p>\n\n\n\n<p>-Mechanize<\/p>\n\n\n\n<p class=\"western\">We don&#8217;t like Scrapy too much, as it tries to do everything for you.&nbsp; We&#8217;d class it as a generic web crawler (not directional \/ omni-directional).&nbsp; So it&#8217;s not much use for us, but it&#8217;s good for mass web-crawling projects.&nbsp; <\/p>\n\n\n\n<p class=\"western\">Selenium is great if you want a &#8216;point and click&#8217; interface (omni-directional).&nbsp; You can build a little program in a matter of minutes that will do simple actions like entering your credentials into a web-based email account, signing in, and sending email repeatedly.&nbsp; It&#8217;s pretty cool, as you actually see the actions taking place (mouse movement, Firefox opening, page loading, email being typed, etc.).&nbsp; Good for presentations.&nbsp; <\/p>\n\n\n\n<p 
class=\"western\">Mechanize (omni-directional and directional) is a module that allows you to interact with websites much as Selenium does, but in the background.&nbsp; This means you can run thousands of iterations in parallel, independently of each other (we use Mechanize).&nbsp; You can then take the data captured by the crawler and use the power of Python to process it or store it in a database.&nbsp; It&#8217;s probably the best free solution on the market if you can use Python.<\/p>\n\n\n\n<h2><strong>Python Mechanize Basics<\/strong> &#8211; crawling the web<\/h2>\n\n\n\n<p class=\"western\">Mechanize can browse to a web page, access a specific web form, enter details into that form, and submit it.&nbsp; It can then take the result of that form submission and do something else with it &#8211; whether that is storing the result in a database or filling in the next page of a form and continuing.<\/p>\n\n\n\n<p class=\"western\">The Python code below starts by creating a mechanize browser object (<code class=\"western\">br = mechanize.Browser()<\/code>) and then opens a website (<code class=\"western\">response = br.open('some_site')<\/code>).&nbsp; The program then goes on to list all links on that page (<code class=\"western\">br.links()<\/code>).<\/p>\n\n\n\n<p class=\"western\">From here it starts a &#8216;for&#8217; loop that iterates through each link found on that page, opens that link, and lists all the links on the opened page.&nbsp; There you have it: in under 10 lines you have a web crawler that will open a web page, follow all the links on that page, and then follow the links on each of those pages in turn.&nbsp; It&#8217;s basically walking through every page linked from a particular page.<\/p>\n\n\n\n<p>**************************************************************************************************************<\/p>\n\n\n\n<pre class=\"wp-block-code lang-py prettyprint prettyprinted\"><code><span class=\"kwd\">import<\/span><span class=\"pln\"> mechanize\n\nbr <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> 
mechanize<\/span><span class=\"pun\">.<\/span><span class=\"typ\">Browser<\/span><span class=\"pun\">() # creates the browser object<\/span><span class=\"pln\">\nresponse <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">open<\/span><span class=\"pun\">(<\/span><span class=\"str\">'some_site'<\/span><span class=\"pun\">) # opens the site and stores the result in the 'response' variable<\/span><span class=\"pln\">\n\ncurrent_links <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> list<\/span><span class=\"pun\">(<\/span><span class=\"pln\">br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">links<\/span><span class=\"pun\">()) # list the links on the page<\/span>\n\n<span class=\"kwd\">for<\/span><span class=\"pln\"> link <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> current_links<\/span><span class=\"pun\">:<\/span><span class=\"pln\">\n  br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">follow_link<\/span><span class=\"pun\">(<\/span><span class=\"pln\">link<\/span><span class=\"pun\">) # opens the link<\/span><span class=\"pln\">\n  sub_links <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> list<\/span><span class=\"pun\">(<\/span><span class=\"pln\">br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">links<\/span><span class=\"pun\">()) # get the links from the opened page<\/span>\n  <span class=\"kwd\">for<\/span><span class=\"pln\"> sub_link <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> sub_links<\/span><span class=\"pun\">: # walk the next lot of links<\/span><span class=\"pln\">\n    br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">follow_link<\/span><span class=\"pun\">(<\/span><span class=\"pln\">sub_link<\/span><span class=\"pun\">) # follow each of them<\/span><\/code><\/pre>\n\n\n\n<p>**************************************************************************************************************<\/p>\n\n\n\n<h2><strong>Cheatsheet for building quick web crawler 
functions<\/strong><\/h2>\n\n\n\n<p class=\"western\">The above may be a little confusing, so below is a step-by-step guide to creating a crawler that will enter some details into a form and then submit it.&nbsp; As you will see, you need some HTML knowledge to locate the form field names.<\/p>\n\n\n\n<ul><li>Create a browser object and give it some optional settings.<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"kwd\">import<\/span><span class=\"pln\"> mechanize\nbr <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> mechanize<\/span><span class=\"pun\">.<\/span><span class=\"typ\">Browser<\/span><span class=\"pun\">()<\/span><span class=\"pln\">\n<\/span><span class=\"com\"># br.set_all_readonly(False)  # form method: call it after select_form() to make every control writable<\/span><span class=\"pln\">\nbr<\/span><span class=\"pun\">.<\/span><span class=\"pln\">set_handle_robots<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">False<\/span><span class=\"pun\">)<\/span>   <span class=\"com\"># ignore robots.txt<\/span><span class=\"pln\">\nbr<\/span><span class=\"pun\">.<\/span><span class=\"pln\">set_handle_refresh<\/span><span class=\"pun\">(<\/span><span class=\"kwd\">False<\/span><span class=\"pun\">)<\/span>  <span class=\"com\"># can sometimes hang without this<\/span><span class=\"pln\">\nbr<\/span><span class=\"pun\">.<\/span><span class=\"pln\">addheaders <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> <\/span><span class=\"pun\">[(<\/span><span class=\"str\">'User-agent'<\/span><span class=\"pun\">,<\/span><span class=\"pln\"> <\/span><span class=\"str\">'Firefox'<\/span><span class=\"pun\">)]<\/span>\n\n\n<\/pre>\n\n\n\n<ul><li>Open a webpage and inspect its contents<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"pln\">response <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span 
class=\"pln\">open<\/span><span class=\"pun\">(<\/span><span class=\"pln\">url<\/span><span class=\"pun\">)<\/span>\n<span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">response<\/span><span class=\"pun\">.<\/span><span class=\"pln\">read<\/span><span class=\"pun\">())<\/span>     <span class=\"com\"># the text of the page<\/span><span class=\"pln\">\nresponse1 <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">response<\/span><span class=\"pun\">()<\/span>  <span class=\"com\"># get the response again<\/span>\n<span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">response1<\/span><span class=\"pun\">.<\/span><span class=\"pln\">read<\/span><span class=\"pun\">())<\/span>    <span class=\"com\"># can apply lxml.html.fromstring()\n\n<\/span><\/pre>\n\n\n\n<ul><li>List the forms that are in the page<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"kwd\">for<\/span><span class=\"pln\"> form <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">forms<\/span><span class=\"pun\">():<\/span>\n    <span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"str\">\"Form name: %s\"<\/span> <span class=\"pun\">%<\/span><span class=\"pln\"> form<\/span><span class=\"pun\">.<\/span><span class=\"pln\">name<\/span><span class=\"pun\">)<\/span><span class=\"pln\">\n    <\/span><span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">form<\/span><span class=\"pun\">)<\/span><span class=\"pln\">\n\n<\/span><\/pre>\n\n\n\n<ul><li>To go on, the mechanize browser object must have a form selected<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"pln\">br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">select_form<\/span><span class=\"pun\">(<\/span><span class=\"str\">\"form1\"<\/span><span class=\"pun\">)<\/span>         <span class=\"com\"># works when form has a name<\/span><span class=\"pln\">\nbr<\/span><span class=\"pun\">.<\/span><span class=\"pln\">form <\/span><span 
class=\"pun\">=<\/span><span class=\"pln\"> list<\/span><span class=\"pun\">(<\/span><span class=\"pln\">br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">forms<\/span><span class=\"pun\">())[<\/span><span class=\"lit\">0<\/span><span class=\"pun\">]<\/span>  <span class=\"com\"># use when form is unnamed\n<\/span><\/pre>\n\n\n\n<ul><li>Iterate through the controls in the form.<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"kwd\">for<\/span><span class=\"pln\"> control <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">form<\/span><span class=\"pun\">.<\/span><span class=\"pln\">controls<\/span><span class=\"pun\">:<\/span>\n    <span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">control<\/span><span class=\"pun\">)<\/span><span class=\"pln\">\n    <\/span><span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"str\">\"type=%s, name=%s value=%s\"<\/span> <span class=\"pun\">%<\/span> <span class=\"pun\">(<\/span><span class=\"pln\">control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">type<\/span><span class=\"pun\">,<\/span><span class=\"pln\"> control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">name<\/span><span class=\"pun\">,<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">[<\/span><span class=\"pln\">control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">name<\/span><span class=\"pun\">]))<\/span><\/pre>\n\n\n\n<ul><li>Controls can be found by name<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"pln\">control <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">form<\/span><span class=\"pun\">.<\/span><span class=\"pln\">find_control<\/span><span class=\"pun\">(<\/span><span class=\"str\">\"controlname\"<\/span><span class=\"pun\">)\n<\/span><\/pre>\n\n\n\n<ul><li>Having a select control tells you what values can be selected<\/li><\/ul>\n\n\n\n<pre 
class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"kwd\">if<\/span><span class=\"pln\"> control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">type <\/span><span class=\"pun\">==<\/span> <span class=\"str\">\"select\"<\/span><span class=\"pun\">:<\/span>  <span class=\"com\"># means it's a SelectControl<\/span>\n    <span class=\"kwd\">for<\/span><span class=\"pln\"> item <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">items<\/span><span class=\"pun\">:<\/span>\n        <span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"str\">\" name=%s values=%s\"<\/span> <span class=\"pun\">%<\/span> <span class=\"pun\">(<\/span><span class=\"pln\">item<\/span><span class=\"pun\">.<\/span><span class=\"pln\">name<\/span><span class=\"pun\">,<\/span><span class=\"pln\"> str<\/span><span class=\"pun\">([<\/span><span class=\"pln\">label<\/span><span class=\"pun\">.<\/span><span class=\"pln\">text <\/span><span class=\"kwd\">for<\/span><span class=\"pln\"> label <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> item<\/span><span class=\"pun\">.<\/span><span class=\"pln\">get_labels<\/span><span class=\"pun\">()])))\n\n<\/span><\/pre>\n\n\n\n<ul><li>Because &#8216;Select&#8217; type controls can have multiple selections, they must be set with a list, even if it has only one element.<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">value<\/span><span class=\"pun\">)<\/span><span class=\"pln\">\n<\/span><span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">control<\/span><span class=\"pun\">)<\/span>  <span class=\"com\"># selected value is starred<\/span><span class=\"pln\">\ncontrol<\/span><span class=\"pun\">.<\/span><span class=\"pln\">value <\/span><span class=\"pun\">=<\/span> <span class=\"pun\">[<\/span><span class=\"str\">\"ItemName\"<\/span><span class=\"pun\">]<\/span>\n<span 
class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">control<\/span><span class=\"pun\">)<\/span><span class=\"pln\">\nbr<\/span><span class=\"pun\">[<\/span><span class=\"pln\">control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">name<\/span><span class=\"pun\">]<\/span> <span class=\"pun\">=<\/span> <span class=\"pun\">[<\/span><span class=\"str\">\"ItemName\"<\/span><span class=\"pun\">]<\/span>  <span class=\"com\"># equivalent and more normal\n<\/span><\/pre>\n\n\n\n<ul><li>Controls can have their readonly and disabled flags set.<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"pln\">control<\/span><span class=\"pun\">.<\/span><span class=\"kwd\">readonly<\/span> <span class=\"pun\">=<\/span> <span class=\"kwd\">False<\/span><span class=\"pln\">\ncontrol<\/span><span class=\"pun\">.<\/span><span class=\"pln\">disabled <\/span><span class=\"pun\">=<\/span> <span class=\"kwd\">True\n<\/span><\/pre>\n\n\n\n<ul><li>Or disable every submit control like so<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\"><span class=\"kwd\">for<\/span><span class=\"pln\"> control <\/span><span class=\"kwd\">in<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">form<\/span><span class=\"pun\">.<\/span><span class=\"pln\">controls<\/span><span class=\"pun\">:<\/span>\n    <span class=\"kwd\">if<\/span><span class=\"pln\"> control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">type <\/span><span class=\"pun\">==<\/span> <span class=\"str\">\"submit\"<\/span><span class=\"pun\">:<\/span><span class=\"pln\">\n        control<\/span><span class=\"pun\">.<\/span><span class=\"pln\">disabled <\/span><span class=\"pun\">=<\/span> <span class=\"kwd\">True\n<\/span><\/pre>\n\n\n\n<ul><li>When your form is complete you can submit\n<pre class=\"prettyprint prettyprinted\"><span class=\"pln\">response <\/span><span class=\"pun\">=<\/span><span class=\"pln\"> br<\/span><span class=\"pun\">.<\/span><span class=\"pln\">submit<\/span><span 
class=\"pun\">()<\/span>\n<span class=\"kwd\">print<\/span><span class=\"pun\">(<\/span><span class=\"pln\">response<\/span><span class=\"pun\">.<\/span><span class=\"pln\">read<\/span><span class=\"pun\">())<\/span><span class=\"pln\">\nbr<\/span><span class=\"pun\">.<\/span><span class=\"pln\">back<\/span><span class=\"pun\">()<\/span>   <span class=\"com\"># go back<\/span><\/pre>\n<\/li><\/ul>\n\n\n\n<h2><strong>Our example:<\/strong><\/h2>\n\n\n\n<p>Below is an example we&#8217;ve created. It&#8217;s not part of our website but it works just fine.<br>This crawler is designed to jump through several web forms, filling in data to produce a result at the end<br>(Hint: https:\/\/usersearch.org\/blog\/index.php\/2015\/09\/28\/does-a-dating-website-know-you\/). It&#8217;s only lightly commented I&#8217;m afraid, but it&#8217;s self-explanatory if you&#8217;ve read the cheat sheet above. <\/p>\n\n\n\n<pre class=\"wp-block-preformatted prettyprint prettyprinted\">import mechanize\n\ndef email_check_complex(email, location, site_name, form_selection, search_term, number_of_inputs, input_one, input_two, input_three, success_value, page_jump_through): # click-through site check\n&nbsp;&nbsp; &nbsp;# Browser\n&nbsp;&nbsp; &nbsp;site = mechanize.Browser(factory=mechanize.RobustFactory())\n&nbsp;&nbsp; &nbsp;# Cookie Jar\n&nbsp;&nbsp; &nbsp;# Browser options\n&nbsp;&nbsp; &nbsp;site.set_handle_equiv(True)\n&nbsp;&nbsp; &nbsp;site.set_handle_gzip(False)\n&nbsp;&nbsp; &nbsp;site.set_handle_redirect(True)\n&nbsp;&nbsp; &nbsp;site.set_handle_referer(True)\n&nbsp;&nbsp; &nbsp;site.set_handle_robots(False)\n&nbsp;&nbsp; &nbsp;# Follow refresh 0 but don't hang on refresh &gt; 0\n&nbsp;&nbsp; &nbsp;site.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=5)\n&nbsp;&nbsp; &nbsp;# User-Agent\n&nbsp;&nbsp; &nbsp;site.addheaders = [('User-agent', 'Mozilla\/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko\/2008071615 Fedora\/3.0.1-1.fc9 Firefox\/3.0.1')]&nbsp;&nbsp; &nbsp;\n&nbsp;&nbsp; &nbsp;\n&nbsp;&nbsp; 
&nbsp;site_opened = site.open(location) # Open site\n&nbsp;&nbsp; &nbsp;#site.select_form(nr=form_selection) # Select form by the index supplied\n\n&nbsp;&nbsp; &nbsp;if page_jump_through == 1: # jump through 1 page\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.form[input_three] = email\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;elif page_jump_through == 2: # jump through 2 pages\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.form[input_three] = email\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;elif page_jump_through == 3: # jump through 3 pages\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.form[input_three] = email\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;elif page_jump_through == 4: # jump through 4 pages\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.submit()\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.select_form(nr=form_selection)\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;site.form[input_three] = email\n&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; 
&nbsp;site.submit()\n\n<\/pre>\n\n\n\n<p class=\"western\">The above script has sent some commands to a website form, and the response will be the result of that submission (in our case a result saying &#8216;Email already registered&#8217;).&nbsp; Now you need to build something to retrieve this response and do something clever with it (this is where your OSINT skills come in handy).<\/p>\n\n\n\n<h2>Now you&#8217;re an OSINT crawler coder!<\/h2>\n\n\n\n<p class=\"western\">If you&#8217;ve read this far and only 5-10% has sunk in, then well done, we&#8217;re happy.&nbsp; If you&#8217;ve reached this far and, like us, find that web crawling makes you want to get up in the morning and code&#8230;you may want to read our next post.&nbsp; <\/p>\n\n\n\n<p class=\"western\">We&#8217;ll build on the examples we&#8217;ve shown you and put some code together showing how you can retrieve the final response, do some snazzy stuff with it, and check it for particular keywords that determine whether your expected email exists at a given location or not.&nbsp; THEN you can continue and even automate what you would do next (you&#8217;re making an automated open-source searcher, well done you!).&nbsp; Who needs a team when you can code!<\/p>\n\n\n\n<p class=\"western\">Give it a go yourself and compare your code with ours next week!<\/p>\n\n\n\n<p class=\"western\">And that&#8217;s all we have time for, I&#8217;m afraid. 
Any questions, just post\/email and we&#8217;ll try and answer.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-medium\"><a href=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/smiley-face-thumbs-up-clipart-acqbqAzcM.png\"><img width=\"300\" height=\"208\" src=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/smiley-face-thumbs-up-clipart-acqbqAzcM-300x208.png\" alt=\"Reverse email and username searching with python osint webcrawlers.\" class=\"wp-image-161\" srcset=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/smiley-face-thumbs-up-clipart-acqbqAzcM-300x208.png 300w, https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/10\/smiley-face-thumbs-up-clipart-acqbqAzcM.png 400w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><figcaption>Reverse email and username searching with python<\/figcaption><\/figure><\/div>\n","protected":false},"excerpt":{"rendered":"<p>How exactly can you build web crawlers to help with OSINT Investigations? What an amazing question. This is a question we asked ourselves over a decade ago. Hence&#8230;the birth of UserSearch.org! 
In this article, we&#8217;re going to talk a little about how&hellip;<\/p>\n","protected":false},"author":1,"featured_media":88,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[26],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v16.8 (Yoast SEO v18.4.1) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to build a Web-Crawler for OSINT -<\/title>\n<meta name=\"description\" content=\"We show you how to build osint web crawlers to perform open-source research (OSINT), to automate your online investigations.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to build a Web-Crawler for OSINT\" \/>\n<meta property=\"og:description\" content=\"We show you how to build osint web crawlers to perform open-source research (OSINT), to automate your online investigations.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/\" \/>\n<meta property=\"article:published_time\" content=\"2021-11-13T14:52:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-11-15T15:20:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"768\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" 
\/>\n\t<meta name=\"twitter:data1\" content=\"Jamie\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/usersearch.org\/blog\/#organization\",\"name\":\"UserSearch\",\"url\":\"https:\/\/usersearch.org\/blog\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/usersearch.org\/blog\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2021\/08\/find-users-online-user-lookup-tool-reverse-user-search.jpg.webp\",\"contentUrl\":\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2021\/08\/find-users-online-user-lookup-tool-reverse-user-search.jpg.webp\",\"width\":285,\"height\":179,\"caption\":\"UserSearch\"},\"image\":{\"@id\":\"https:\/\/usersearch.org\/blog\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/usersearch.org\/blog\/#website\",\"url\":\"https:\/\/usersearch.org\/blog\/\",\"name\":\"\",\"description\":\"Usersearch Blog - Helping you stay safe online\",\"publisher\":{\"@id\":\"https:\/\/usersearch.org\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/usersearch.org\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg\",\"contentUrl\":\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg\",\"width\":1024,\"height\":768,\"caption\":\"build a webcrawler for 
osint\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#webpage\",\"url\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/\",\"name\":\"How to build a Web-Crawler for OSINT -\",\"isPartOf\":{\"@id\":\"https:\/\/usersearch.org\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#primaryimage\"},\"datePublished\":\"2021-11-13T14:52:44+00:00\",\"dateModified\":\"2021-11-15T15:20:41+00:00\",\"description\":\"We show you how to build osint web crawlers to perform open-source research (OSINT), to automate your online investigations.\",\"breadcrumb\":{\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/usersearch.org\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to build a Web-Crawler for OSINT\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#webpage\"},\"author\":{\"@id\":\"https:\/\/usersearch.org\/blog\/#\/schema\/person\/7e34b69a5ae2693e33082d4d643a37e5\"},\"headline\":\"How to build a Web-Crawler for 
OSINT\",\"datePublished\":\"2021-11-13T14:52:44+00:00\",\"dateModified\":\"2021-11-15T15:20:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#webpage\"},\"wordCount\":1536,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/usersearch.org\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg\",\"keywords\":[\"webcrawler\"],\"articleSection\":[\"Open Source Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/usersearch.org\/blog\/#\/schema\/person\/7e34b69a5ae2693e33082d4d643a37e5\",\"name\":\"Jamie\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/usersearch.org\/blog\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/db798ccc802968f420608212f5301ca1?s=96&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/db798ccc802968f420608212f5301ca1?s=96&r=g\",\"caption\":\"Jamie\"},\"sameAs\":[\"https:\/\/www.usersearch.org\"],\"url\":\"https:\/\/usersearch.org\/blog\/index.php\/author\/jamie\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"How to build a Web-Crawler for OSINT -","description":"We show you how to build osint web crawlers to perform open-source research (OSINT), to automate your online investigations.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/","og_locale":"en_US","og_type":"article","og_title":"How to build a Web-Crawler for OSINT","og_description":"We show you how to build osint web crawlers to perform open-source research (OSINT), to automate your online investigations.","og_url":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/","article_published_time":"2021-11-13T14:52:44+00:00","article_modified_time":"2021-11-15T15:20:41+00:00","og_image":[{"width":1024,"height":768,"url":"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jamie","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/usersearch.org\/blog\/#organization","name":"UserSearch","url":"https:\/\/usersearch.org\/blog\/","sameAs":[],"logo":{"@type":"ImageObject","@id":"https:\/\/usersearch.org\/blog\/#logo","inLanguage":"en-US","url":"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2021\/08\/find-users-online-user-lookup-tool-reverse-user-search.jpg.webp","contentUrl":"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2021\/08\/find-users-online-user-lookup-tool-reverse-user-search.jpg.webp","width":285,"height":179,"caption":"UserSearch"},"image":{"@id":"https:\/\/usersearch.org\/blog\/#logo"}},{"@type":"WebSite","@id":"https:\/\/usersearch.org\/blog\/#website","url":"https:\/\/usersearch.org\/blog\/","name":"","description":"Usersearch Blog - Helping you stay safe online","publisher":{"@id":"https:\/\/usersearch.org\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/usersearch.org\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"ImageObject","@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#primaryimage","inLanguage":"en-US","url":"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg","contentUrl":"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg","width":1024,"height":768,"caption":"build a webcrawler for osint"},{"@type":"WebPage","@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#webpage","url":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/","name":"How to build a Web-Crawler for OSINT 
-","isPartOf":{"@id":"https:\/\/usersearch.org\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#primaryimage"},"datePublished":"2021-11-13T14:52:44+00:00","dateModified":"2021-11-15T15:20:41+00:00","description":"We show you how to build osint web crawlers to perform open-source research (OSINT), to automate your online investigations.","breadcrumb":{"@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/usersearch.org\/blog\/"},{"@type":"ListItem","position":2,"name":"How to build a Web-Crawler for OSINT"}]},{"@type":"Article","@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#article","isPartOf":{"@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#webpage"},"author":{"@id":"https:\/\/usersearch.org\/blog\/#\/schema\/person\/7e34b69a5ae2693e33082d4d643a37e5"},"headline":"How to build a Web-Crawler for 
OSINT","datePublished":"2021-11-13T14:52:44+00:00","dateModified":"2021-11-15T15:20:41+00:00","mainEntityOfPage":{"@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#webpage"},"wordCount":1536,"commentCount":0,"publisher":{"@id":"https:\/\/usersearch.org\/blog\/#organization"},"image":{"@id":"https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#primaryimage"},"thumbnailUrl":"https:\/\/usersearch.org\/blog\/wp-content\/uploads\/2015\/09\/3_code-matrix-9449691.jpeg","keywords":["webcrawler"],"articleSection":["Open Source Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/usersearch.org\/blog\/index.php\/2021\/11\/13\/how-to-build-a-web-crawler-for-osint\/#respond"]}]},{"@type":"Person","@id":"https:\/\/usersearch.org\/blog\/#\/schema\/person\/7e34b69a5ae2693e33082d4d643a37e5","name":"Jamie","image":{"@type":"ImageObject","@id":"https:\/\/usersearch.org\/blog\/#personlogo","inLanguage":"en-US","url":"https:\/\/secure.gravatar.com\/avatar\/db798ccc802968f420608212f5301ca1?s=96&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/db798ccc802968f420608212f5301ca1?s=96&r=g","caption":"Jamie"},"sameAs":["https:\/\/www.usersearch.org"],"url":"https:\/\/usersearch.org\/blog\/index.php\/author\/jamie\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/posts\/138"}],"collection":[{"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=138"}],"version-history":[{"count":41,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/w
p\/v2\/posts\/138\/revisions"}],"predecessor-version":[{"id":383,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/posts\/138\/revisions\/383"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/media\/88"}],"wp:attachment":[{"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=138"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=138"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/usersearch.org\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=138"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}