CENTIPEDE is a high-performance, cloud-based web harvesting platform built using NLP tools and techniques. CENTIPEDE provides:
- NLP-based data crawling for targeted websites
- Automated and accurate data collection through a quality monitoring layer
- Output data in a custom file format
Quality Monitoring Layer
A quality monitoring layer on top of the crawler engine delivers up to 96% completeness and accuracy of data.
Data Aggregator - Hybrid Approach
- For certain websites such as Ticketmaster, the initial data aggregation checks whether an API exists and, if so, uses it to get event data
- If the API is unavailable or the call fails, it falls back to crawling the data with IP masking (to avoid blacklisting)
- The hybrid approach maximizes data collection where APIs are available, reducing crawling effort and improving performance
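The API-first, crawl-on-failure flow described above can be pictured with a minimal Python sketch. It is an illustration only, not CENTIPEDE's actual code: the `harvest` function and the two fetcher callables are hypothetical names, and IP masking is assumed to live inside whatever crawler function is passed in.

```python
from typing import Callable, List, Optional, Tuple

Fetcher = Callable[[str], List[dict]]  # hypothetical: takes a site, returns event records


def harvest(
    site: str,
    fetch_via_api: Optional[Fetcher],
    fetch_via_crawler: Fetcher,
) -> Tuple[str, List[dict]]:
    """Try the site's API first; fall back to crawling if it is missing or fails.

    Returns the name of the source that was used together with the records.
    """
    if fetch_via_api is not None:
        try:
            return "api", fetch_via_api(site)
        except Exception:
            # Any API failure (assumed here to surface as an exception)
            # triggers the crawler fallback.
            pass
    return "crawler", fetch_via_crawler(site)
```

Passing `None` as the API fetcher models a website that has no API at all, so the same function covers both API-backed and crawl-only sites.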
CENTIPEDE can be customized to your specific needs
CENTIPEDE can be configured to crawl a specific set of websites
Data crawling/extraction at regular intervals
Covers a specific set of “mandatory” and “nice to have” fields provided by the client
Quality Monitoring Layer to ensure up to 96% completeness and accuracy
An automated quality monitoring layer checks the extracted data for completeness and accuracy
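As a rough illustration of what a completeness check over “mandatory” and “nice to have” fields could look like, the Python sketch below scores a batch of records. The function name, the field names in the usage note, and the scoring rule are assumptions made for the example; the actual quality monitoring layer may measure completeness differently.

```python
def completeness(records, mandatory, nice_to_have):
    """Score a batch of extracted records.

    Returns a pair of fractions between 0 and 1:
    - share of records in which every mandatory field is filled
    - share of nice-to-have field slots that are filled
    """
    def filled(record, field):
        value = record.get(field)
        return value is not None and str(value).strip() != ""

    if not records:
        return 0.0, 0.0

    complete = sum(all(filled(r, f) for f in mandatory) for r in records)
    optional_total = len(records) * len(nice_to_have)
    optional_filled = sum(filled(r, f) for r in records for f in nice_to_have)
    # With no nice-to-have fields configured, nothing optional is missing.
    optional_share = optional_filled / optional_total if optional_total else 1.0
    return complete / len(records), optional_share
```

For example, with hypothetical mandatory fields `name` and `date` and a nice-to-have field `price`, a batch in which one of two records lacks a date would score 0.5 on mandatory completeness.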
Stores the crawled/extracted data on your FTP/SFTP server
The extracted data is stored in CSV format with a date and timestamp
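A small Python sketch of the CSV output step is shown below. It assumes the date and timestamp go into the file name; the `centipede_` prefix, the naming pattern, and the function name are hypothetical, and the upload to FTP/SFTP is left out.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def write_snapshot(records, fields, out_dir, now=None):
    """Write records to a CSV file whose name carries a date and timestamp."""
    now = now or datetime.now(timezone.utc)
    # Hypothetical naming pattern, e.g. centipede_20240102_030405.csv
    path = Path(out_dir) / f"centipede_{now:%Y%m%d_%H%M%S}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        # Missing fields are written as empty cells; unknown keys are ignored.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
    return path
```

Putting the timestamp in the file name keeps each scheduled run in its own file, so earlier extractions are not overwritten.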
Components considered for pricing
- Hosted on AWS for high availability
- Uses high-performance clusters to run the scraping process in parallel
- Beyond the infrastructure that provides speed and availability, we have built a quality monitoring layer that compares the scraped data with screenshots captured during scraping. It uses OCR techniques to verify that the scraped data is accurate
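The screenshot comparison can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the production check: the OCR step itself is assumed to have already produced plain text, the function names and the 0.85 threshold are hypothetical, and fuzzy matching with the standard library's `difflib` stands in for whatever comparison the platform actually uses.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse everything except letters and digits to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def field_score(value, ocr_text):
    """Best similarity (0 to 1) between a scraped value and any window of
    OCR words that has the same number of words as the value."""
    target = normalize(value)
    words = normalize(ocr_text).split()
    if not target or not words:
        return 0.0
    n = len(target.split())
    windows = (" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1)))
    return max(SequenceMatcher(None, target, window).ratio() for window in windows)


def verify(record, ocr_text, threshold=0.85):
    """Flag each scraped field as confirmed (True) or not (False) by the screenshot text."""
    return {
        field: field_score(str(value), ocr_text) >= threshold
        for field, value in record.items()
    }
```

Fuzzy matching is used rather than exact matching because OCR output often contains small character errors, such as a `1` read in place of an `i`.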
Request a Demo
Leave us your details to schedule a quick demo.
Key Contacts
The secret of getting ahead is starting. Get in touch with our practice heads to initiate a productive exchange of ideas.
Senior Manager, Business Development
chaitanya.email@example.com
Chaitanya has more than 13 years of professional experience in the IT services industry. Throughout his career, Chaitanya’s sales target geography has been the USA.
Ankur has worked at the intersection of mobile app development, computational analytics and deep learning for almost 10 years. Ankur specializes in the AI domain.
Associate Technical Architect
karthikeyan.firstname.lastname@example.org
Karthik is the technology head for mobile development services at CES, Chennai. Karthik has over 7 years of experience in the architecture, design and development of mobile applications.
Rohit Vipin Mathew
Rohit is a results-driven, customer-focused developer with over 7 years of managerial and software development expertise across diverse domains. Currently, he is a Technical Architect at CES IT.