The Joint ALMA Observatory (JAO) decided some years ago to become a data-centric operational facility, basing its operational decision-making on evidence and pursuing several efforts to adopt data science practices in its daily operations. Key non-profit collaborations allowed ALMA to work with Dataiku, empowering us to design projects that explore high data volumes and prepare solutions that enable informed operational decisions. To increase the capabilities of the data science platform, JAO invested in in-house infrastructure, providing a Hadoop ecosystem that allowed large datasets to be processed in reasonable time. Provisioning such ecosystems is laborious and expensive in terms of system administration effort, which highlighted the need to explore alternatives. JAO therefore sought to collaborate with cloud providers and decided to experiment with Amazon Web Services (AWS). Key elements in this decision were the flexibility offered and the practical, hands-on, exploratory approach, which was close to JAO's vision. The relationship, formalized through a Memorandum of Understanding, enabled the development of a proof of concept (PoC) aiming to replicate the existing system in the cloud. Although the PoC may not appear to be an ambitious goal, designing an architecture in which the broad set of technologies offered by AWS works seamlessly with Dataiku was a non-trivial challenge, compounded by the limited six weeks available to complete it and the need to learn new technologies and concepts along the way. This paper summarizes our results, lessons learned, and key insights gained during our focused and successful rapid prototyping effort.