I will be running full-day workshops on crawling with StormCrawler. Please find the program below:
In this workshop, we will explore StormCrawler, a collection of resources for building low-latency, large-scale web crawlers on Apache Storm. After a short introduction to Apache Storm and an overview of what StormCrawler provides, we'll put it to use straight away for a simple crawl before moving on to the deployed mode of Storm.
In the second part of the session, we will introduce metrics, index documents with Elasticsearch and Kibana, and dive into data extraction. Finally, we'll cover recursive crawls and scalability. This course will be hands-on: attendees will run the code on their own machines.
This course will suit Java developers with an interest in big data, stream processing, web crawling and search. It will provide a practical introduction to Apache Storm and Elasticsearch, as well as to StormCrawler itself, and should not require advanced programming skills.
Duration: 2 x 3 hours
The first workshop should take place on 21st February in Berlin. I am planning to run a similar event in Bristol, UK in February or March. The cost depends on the number of attendees.
You can enroll for the Berlin workshop at https://endoctus.com/course?id=284
For the Bristol workshop, please let me know (firstname.lastname@example.org) if you are interested and I will keep you updated.
PS: Do you follow DigitalPebble or StormCrawler on Twitter? Announcements and updates are made there (as well as all sorts of interesting news of course!)