      UMY Repository > 03. DISSERTATIONS AND THESIS > Students > Undergraduate Thesis > Faculty of Engineering > Department of Information Technology

      SCRAPING AND CRAWLING TECHNIQUES FOR EXTRACTING ONLINE HOTEL REVIEWS FROM THE TRAVELOKA WEBSITE (AJAX-BASED)

      Files
      COVER (362.2Kb)
      TITLE PAGE (672.4Kb)
      APPROVAL PAGE (122.5Kb)
      ABSTRACT (517.2Kb)
      CHAPTER I (646.0Kb)
      CHAPTER II (832.5Kb)
      CHAPTER III (780.0Kb)
      CHAPTER IV (2.420Mb)
      CHAPTER V (388.1Kb)
      REFERENCES (280.3Kb)
      PUBLICATION MANUSCRIPT (306.4Kb)
      Date
      2018-09-06
      Author
      OKTARIA, SELVI
      Abstract
The internet is a source of public data spread across many websites. Retrieving data from a website requires particular techniques because that data is unstructured. The technique for retrieving, or extracting, such data is known as scraping. A website also consists of many interlinked web pages, so a second technique is needed to visit every page from which data will be taken; this technique for accessing linked web pages is called crawling. Because further processing of the extracted data requires it to be structured, a scraping and crawling system that produces structured data from a website is needed. This final project describes scraping and crawling techniques for extracting data from a website; the data extracted is hotel review data from the Traveloka website.

Traveloka uses JavaScript and AJAX, which let a page load data without refreshing the entire page and display it more interactively. Crawling such a website therefore requires techniques that allow the crawler to interact with AJAX, so that the scraping process can retrieve all of the data on a web page. The scraping and crawling techniques were developed by integrating several existing technologies. Scrapy, a scraping and crawling framework, was chosen as the basis of the system. Selenium and ChromeDriver are used to interact with the AJAX-based pages, and Elasticsearch stores the scraped data, which reaches it through Scrapy's item pipeline.

Development proceeded in several stages, beginning with an evaluation of the source website to identify the elements that contain the data. These elements are selected with XPath selectors, which are used in the scraping and crawling logic implemented as a spider in the Scrapy framework.
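The crawl-then-scrape flow described above (visit linked pages, select review elements by XPath, emit structured records) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the thesis's Scrapy spider: the page markup, the class names (`review`, `author`, `score`, `next`) and the URLs are hypothetical, and the standard-library `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup, stands in for Scrapy's `response.xpath()` selectors. The `fetch` argument is assumed to return a page's markup after its AJAX content has loaded; in a setup like the one described, that would be the DOM rendered through Selenium and ChromeDriver, which this sketch does not include.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, Dict, List

# Hypothetical selectors: real Traveloka markup differs. ElementTree supports
# only a subset of XPath, e.g. tag[@attr='value'] predicates and .// descent.
REVIEW_PATH = ".//div[@class='review']"
NEXT_PATH = ".//a[@class='next']"


def extract_reviews(page: ET.Element) -> List[Dict[str, str]]:
    """Scraping step: turn one page's review elements into structured dicts."""
    reviews = []
    for node in page.findall(REVIEW_PATH):
        reviews.append({
            # findtext returns the default if the child element is missing
            "author": node.findtext("span[@class='author']", default="").strip(),
            "score": node.findtext("span[@class='score']", default="").strip(),
            "text": node.findtext("p", default="").strip(),
        })
    return reviews


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 100) -> List[Dict[str, str]]:
    """Crawling step: follow 'next' links breadth-first, skipping visited URLs.

    `fetch` is a hypothetical callable returning already-rendered,
    well-formed markup for a URL (e.g. a page after AJAX has run).
    """
    queue = deque([start_url])
    visited = set()
    results: List[Dict[str, str]] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        page = ET.fromstring(fetch(url))
        for review in extract_reviews(page):
            review["url"] = url  # keep provenance for each record
            results.append(review)
        for link in page.findall(NEXT_PATH):
            href = link.get("href")
            if href and href not in visited:
                queue.append(href)
    return results
```

Passing `fetch` in as a parameter keeps the crawl loop independent of how pages are rendered, so the same logic can be exercised against saved pages and later wired to a browser driver. The `visited` set keeps the crawler from looping when pages link back to each other, and `max_pages` bounds a run.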
All of these techniques were implemented in the Python programming language. The result is a scraping and crawling system that extracts hotel review data from the Traveloka website. The system runs stably and has collected millions of hotel reviews, and the review data is stored in and displayed correctly by Elasticsearch.
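For the storage step, a Scrapy item pipeline is a plain class, enabled through the `ITEM_PIPELINES` setting, whose `process_item(item, spider)` method is called for every scraped item, with optional `open_spider`/`close_spider` hooks. The sketch below is a hypothetical pipeline, not the one from the thesis: it buffers items and serialises them into the newline-delimited JSON body expected by Elasticsearch's `_bulk` API. The index name, the batch size and the `review_id` field are assumptions, and the HTTP request that would send each body to Elasticsearch is deliberately omitted.

```python
import json
from typing import Any, Dict, List


class ElasticsearchBulkPipeline:
    """Hypothetical Scrapy-style pipeline that builds Elasticsearch _bulk bodies.

    Items are buffered and flushed every `batch_size` items and when the
    spider closes. Sending each body (e.g. POST /_bulk) is left out here.
    """

    def __init__(self, index: str = "hotel-reviews", batch_size: int = 500):
        self.index = index            # hypothetical index name
        self.batch_size = batch_size
        self.buffer: List[Dict[str, Any]] = []
        self.bodies: List[str] = []   # flushed NDJSON bodies, kept for inspection

    def process_item(self, item, spider):
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item  # returning the item passes it on to any later pipeline

    def close_spider(self, spider):
        self.flush()  # write out whatever is left in the buffer

    def flush(self):
        if not self.buffer:
            return
        lines = []
        for doc in self.buffer:
            action = {"index": {"_index": self.index}}
            # Assumed field: a stable review id makes re-runs overwrite, not duplicate.
            if "review_id" in doc:
                action["index"]["_id"] = str(doc["review_id"])
            lines.append(json.dumps(action))
            lines.append(json.dumps(doc, ensure_ascii=False))
        # The bulk API expects every line, including the last, to end in "\n".
        self.bodies.append("\n".join(lines) + "\n")
        self.buffer.clear()
```

Batching suits the volume reported above: sending each review as its own request adds per-request overhead that bulk indexing avoids, and deriving `_id` from a review identifier keeps repeated crawls from storing the same review twice.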
      URI
      http://repository.umy.ac.id/handle/123456789/22653
      Collections
      • Department of Information Technology

      DSpace software copyright © 2002-2015  DuraSpace
      Contact Us | Send Feedback
      Theme by 
      @mire NV