Configurations

Docker Image

We use the official Airflow image. We need to install the necessary libraries and packages into the Airflow container, so we create a Dockerfile:

```dockerfile
FROM apache/airflow:2.10.2

USER airflow
COPY requirements.txt /requirements.txt
RUN pip install -r /requirements.txt
```

requirements.txt:

```
confluent-kafka
cassandra-driver
pymongo
```

This Dockerfile starts from airflow:2.10.2 and then installs all the libraries listed in requirements.txt.
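To confirm the requirements actually made it into the image, a quick smoke check can be run inside the container. This is a minimal sketch of my own, not part of the official image; note that the importable module names differ from the pip package names:

```python
import importlib.util

# Importable module names for the pinned requirements. The
# confluent-kafka distribution is imported as `confluent_kafka`,
# and cassandra-driver as `cassandra`.
REQUIRED = ["confluent_kafka", "cassandra", "pymongo"]

def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_packages(REQUIRED)` inside the built container should return an empty list; anything it returns is a package that failed to install.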
Chapter 9: Design a web crawler A web crawler is also known as a robot or spider. It is widely used by search engines to discover new or updated content on the web. Content can be a web page, an image, a video, a PDF file, etc. A web crawler starts by collecting a few web pages and then follows links on those pages to collect new content.
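The crawl loop described above (start from a few seed pages, follow their links, collect newly discovered content) is essentially a breadth-first traversal. A minimal sketch, where `fetch_links` is a stand-in for real HTTP fetching and link extraction:

```python
from collections import deque

def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl: start from seed URLs, follow the links on
    each fetched page, and return every distinct URL discovered."""
    seen = set(seeds)          # URLs already discovered (dedup)
    frontier = deque(seeds)    # URLs waiting to be fetched
    while frontier and len(seen) < max_pages:
        url = frontier.popleft()
        for link in fetch_links(url):   # download page, extract outgoing links
            if link not in seen:        # skip content we have already found
                seen.add(link)
                frontier.append(link)
    return seen
```

Swapping the real fetcher for an in-memory link graph makes the traversal easy to check: `crawl(["a"], lambda u: graph.get(u, []))` walks every page reachable from the seed.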
Install Helm chart

```shell
brew install helm
```

Install the chart:

```shell
$ minikube start
$ helm repo add apache-airflow https://airflow.apache.org
"apache-airflow" has been added to your repositories
$ helm repo list
NAME            URL
apache-airflow  https://airflow.apache.org
```

Upgrade the chart:

```shell
$ helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace
$ kubectl get pods -n airflow -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP    NODE   NOMINATED NODE   READINESS GATES
airflow-postgresql-0   1/1     Running   0          9m10s   10.
```
Install and start Minikube

Install Minikube:

```shell
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64
sudo install minikube-darwin-amd64 /usr/local/bin/minikube
```

Start the minikube cluster and check its status:

```shell
$ minikube start
minikube v1.33.0 on Darwin 14.6.1 (arm64)
Using the docker driver based on existing profile
Starting "minikube" primary control-plane node in "minikube" cluster
Pulling base image v0.
```
Introduction

Apart from when I first learned Django as an undergraduate, I had not had a chance to try web crawling until recently. I could have used BeautifulSoup or Selenium, but while researching I found that Scrapy is widely used for data crawling in big data and deep learning work, so I decided to take this opportunity to try it and built a simple data crawler.
Installation and setup

```shell
pip install scrapy
scrapy startproject arxiv_crawling
```

Running the startproject command makes Scrapy automatically generate a template folder. Move into the generated project directory and create a project suited to the target URL.
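Before writing the spider itself, the extraction step can be prototyped with the standard library. This is a rough sketch of the kind of parse logic a spider's `parse()` callback would perform (in a real Scrapy spider it would use `response.css` instead); the markup shape and the `title` class name are made-up placeholders, not arXiv's real HTML:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text of every <p class="title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Enter "capture" mode on a paragraph carrying the title class.
        if tag == "p" and dict(attrs).get("class") == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())

def extract_titles(html):
    """Return all title texts found in the given HTML fragment."""
    parser = TitleExtractor()
    parser.feed(html)
    return parser.titles
```

Once the selectors are verified against the real page, the same logic moves into the spider's `parse()` method.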
Introduction

I studied Kubernetes, but it bothered me that I had never properly studied Docker itself, so I am taking this opportunity to understand Docker thoroughly and make it my own. As I read the official documentation, I will summarize mainly the parts I find hard to understand and the core code I will keep using going forward.
Creating a Docker Image

```dockerfile
# Spark Docker
# builder step used to download and configure spark environment
FROM openjdk:11.
```