[ํŽธ์˜์  ํฌ๋กค๋ง] EMERT24

์จ๋ฐ 2023. 3. 24. 13:15

🚩 PLAN

1. Use Python to fetch data from EMART24's convenience store location web page.

2. Turn the fetched data into a DataFrame and save it locally as a pickle file.
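Step 2 comes down to pandas' to_pickle and read_pickle. Here is a minimal sketch of that round trip; the two rows and their column names (store_name, latitude, longitude) are made-up placeholders, not the API's real fields, and the file is written to a temporary directory instead of the working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows standing in for the store records fetched from the API
rows = [
    {"store_name": "Store A", "latitude": 37.5, "longitude": 127.0},
    {"store_name": "Store B", "latitude": 35.1, "longitude": 129.0},
]
store_df = pd.DataFrame(rows)

# Write the DataFrame to a pickle file, then load it back
path = os.path.join(tempfile.mkdtemp(), "stores.pkl")
store_df.to_pickle(path)
restored = pd.read_pickle(path)

# The round trip keeps values, dtypes and column order intact
print(restored.equals(store_df))  # True
print(list(restored.columns))     # ['store_name', 'latitude', 'longitude']
```

Unlike a CSV file, the pickle keeps the dtypes as they were, which makes it convenient for reloading the data in a later step.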

[ Target site ]

https://www.emart24.co.kr/store

First, I looked at how the target site is structured.

On the emart24 main page, store information is reached through "Find a Store".

Fortunately, emart24's region selector has an 'All' option, which lets us retrieve the information for every store in one search.

With the browser's developer tools open, selecting 'All' and running the search lets you inspect the API request and its payload, as in the image on the right.
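The payload is simply the query string of a GET request. The sketch below rebuilds the request URL from the payload with the standard library's urllib.parse.urlencode, which is essentially the encoding the requests library applies to its params= argument. page_url is a hypothetical helper name; only the endpoint and the parameter names come from the developer tools.

```python
from urllib.parse import urlencode

# Endpoint and parameter names as captured in the developer tools;
# empty values mean "no filter", i.e. the 'All' region search
BASE_URL = "https://www.emart24.co.kr/api1/store"
payload = {
    "page": 1,
    "search": "",
    "AREA1": "",
    "AREA2": "",
    "SVR_24": "",
    "SVR_AUTO": "",
    "SVR_PARCEL": "",
    "SVR_ATM": "",
    "SVR_WINE": "",
    "SVR_COFFEE": "",
    "SVR_SMOOTH": "",
    "SVR_APPLE": "",
    "SVR_TOTO": "",
}


def page_url(page: int) -> str:
    """Hypothetical helper: request URL for one page, every other key left empty."""
    params = dict(payload, page=page)  # copy, so the shared payload is not modified
    return f"{BASE_URL}?{urlencode(params)}"


print(page_url(2))
```

Only the page value differs from one URL to the next, which is what makes a page-by-page loop possible.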

์ด๋ฒˆ์—๋Š” page key ๋งŒ ๋ฐ”๊พธ์–ด์ฃผ๋ฉฐ ๋ฐ˜๋ณตํ•ด์„œ ํŽ˜์ด์ง€๋ฅผ ์กฐํšŒํ•˜๋ฉด ๋ฐ์ดํ„ฐ๋ฅผ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Œ์„ ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋‹ค.

(Each page returns 10 stores, so the last page holds 10 or fewer, and requesting a page past the end returns no data at all.)
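This stopping rule can be checked offline. The sketch assumes a hypothetical fetch_page(page) function that returns the list of stores on one page; fake_fetch slices 23 dummy records into pages of 10 to stand in for the real API.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 10  # the store API returns 10 stores per page


def collect_all(fetch_page: Callable[[int], List[Dict]]) -> List[Dict]:
    """Request page 1, 2, 3, ... until a page comes back empty."""
    rows: List[Dict] = []
    page = 1
    while True:
        data = fetch_page(page)
        if not data:  # past the last page: nothing left to read
            break
        rows.extend(data)
        page += 1
    return rows


# Hypothetical stand-in for the real API: 23 fake stores split into pages
fake_stores = [{"id": i} for i in range(23)]


def fake_fetch(page: int) -> List[Dict]:
    start = (page - 1) * PAGE_SIZE
    return fake_stores[start:start + PAGE_SIZE]


result = collect_all(fake_fetch)
print(len(result))  # 23
```

Stopping on the first empty page costs one extra request. Checking len(data) < PAGE_SIZE would usually save it, but the empty-page check stays correct even when the store count is an exact multiple of 10.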

The Preview tab also lets you check in advance what the response will look like.

๊น”๋”ํ•˜๊ฒŒ ํ•„์š”ํ•œ ์ •๋ณด๋“ค์ด ๋”•์…”๋„ˆ๋ฆฌ ํ˜•ํƒœ๋กœ ์ž˜ ์ •๋ฆฌ๋˜์–ด ์žˆ์Œ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค.

๋ฐ์ดํ„ฐ๋ฅผ ์š”์ฒญํ•ด์„œ ๋ฐ›๊ฒŒ ๋˜๋ฉด, ์œ„์™€ ๊ฐ™์ด dictionary ํ˜•ํƒœ, ์ฆ‰ json ์„ ์‚ฌ์šฉํ•ด์„œ ์ •๋ณด๋ฅผ ๋ฐ›์„ ์ˆ˜ ์žˆ๊ฒ ๋‹ค๊ณ  ์ƒ๊ฐํ–ˆ๋‹ค.

I approached the task with this flow and wrote the code below.

โŒจ๏ธ Code

import requests
import pandas as pd

# Store-search API endpoint found in the developer tools
url = "https://www.emart24.co.kr/api1/store"

# Query parameters: only "page" changes between requests; the empty values
# correspond to the 'All' region with no service filters
payload = {
    "page": 1,
    "search": "",
    "AREA1": "",
    "AREA2": "",
    "SVR_24": "",
    "SVR_AUTO": "",
    "SVR_PARCEL": "",
    "SVR_ATM": "",
    "SVR_WINE": "",
    "SVR_COFFEE": "",
    "SVR_SMOOTH": "",
    "SVR_APPLE": "",
    "SVR_TOTO": "",
}

rows = []  # one dict per store, collected across all pages

while True:
    # requests turns payload into the query string (?page=1&search=&AREA1=...)
    r = requests.get(url, params=payload, timeout=10)
    r.raise_for_status()
    data = r.json()['data']

    # A page past the last one returns no data, so stop there
    if not data:
        break

    rows.extend(data)
    payload['page'] += 1

# Build the DataFrame once; concatenating inside the loop copies the whole frame for every row
store_df = pd.DataFrame(rows)

# Rename the DataFrame columns
# (positional: this relies on the API returning its 25 fields in this order)
store_df.columns = [
    'Parcel', 'Apple Accessories', 'Post', 'Longitude', 'Phone',
    'Reserved Pickup', 'Closure Date', 'Instant Ramen', 'Chicken', 'Toto',
    'Coffee', 'CODE', 'Wine', 'Location Detail', '24 Hours', 'Address',
    'Business Hours End', 'Store Name', 'Latitude', 'Open Date',
    'Business Hours Start', 'Unmanned Store', 'Smoothie', 'ATM', 'KIND',
]

# Save as .pkl
store_df.to_pickle("./emart24_store_all.pkl")

📋 DataFrame

Running the code above produces and saves a final DataFrame like the one below.