Merge branch 'main' of https://github.com/NanmiCoder/MediaCrawler

# Conflicts:
#	README.md

commit 71c07c7d36

.github/workflows/main.yaml (vendored, new file, 17 lines)
@@ -0,0 +1,17 @@
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  contrib-readme-job:
+    runs-on: ubuntu-latest
+    name: A job to automate contrib in readme
+    permissions:
+      contents: write
+      pull-requests: write
+    steps:
+      - name: Contribute List
+        uses: akhilmhdh/contributors-readme-action@v2.3.10
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
README.md (231 lines changed)
@@ -22,12 +22,11 @@
 |-----|-------|----------|-----|--------|-------|-------|-------|
 | 小红书 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | 抖音 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| 快手 | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
+| 快手 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | B 站 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 | 微博 | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
-
 
 
 ## 使用方法
 
 ### 创建并激活 python 虚拟环境
@@ -87,12 +86,19 @@
 ## 开发者服务
 - 知识星球:沉淀高质量常见问题、最佳实践文档、多年编程+爬虫经验分享,提供付费知识星球服务,主动提问,作者会定期回答问题
 <p>
-<img alt="星球图片" src="static/images/xingqiu.jpg" style="width: auto;height: 400px" >
-<img alt="星球图片" src="static/images/xingqiu_yh.png" style="width: auto;height: 400px" >
+<img alt="xingqiu" src="https://nm.zizhi1.com/static/img/8e1312d1f52f2e0ff436ea7196b4e27b.15555424244122T1.webp" style="width: auto;height: 400px" >
 </p>
-前20个入驻星球的小伙伴,将获得新人券50元,还剩14张。
-<br>
-- 视频课程:
+星球精选文章:
+- [【独创】使用Playwright获取某音a_bogus参数流程(包含加密参数分析)](https://articles.zsxq.com/id_u89al50jk9x0.html)
+- [【独创】使用Playwright低成本获取某书X-s参数流程分析(当年的回忆录)](https://articles.zsxq.com/id_u4lcrvqakuc7.html)
+- [ MediaCrawler-基于抽象类设计重构项目缓存](https://articles.zsxq.com/id_4ju73oxewt9j.html)
+- [ 手把手带你撸一个自己的IP代理池](https://articles.zsxq.com/id_38fza371ladm.html)
+- 每天 1 块钱订阅我的知识服务
+
+
+
+- MediaCrawler视频课程:
 > 如果你想很快入门这个项目,或者想了解具体实现原理,我推荐你看看这个视频课程,从设计出发一步步带你如何使用,门槛大大降低,同时也是对我开源的支持,如果你能支持我的课程,我将会非常开心~<br>
 > 课程售价非常非常的便宜,几杯咖啡的事儿.<br>
 > 课程介绍飞书文档链接:https://relakkes.feishu.cn/wiki/JUgBwdhIeiSbAwkFCLkciHdAnhh
@@ -115,7 +121,7 @@
 > 7天有效期,自动更新, 如果人满了可以加作者wx拉进群: yzglan,备注来自github.
 
 <div style="max-width: 200px">
-<p><img alt="10群二维码" src="static/images/10群二维码.JPG" style="width: 200px;height: 100%" ></p>
+<p><img alt="11群二维码" src="static/images/11群二维码.JPG" style="width: 200px;height: 100%" ></p>
 </div>
 
 
@@ -144,11 +150,220 @@
 ## 手机号登录说明
 ➡️➡️➡️ [手机号登录说明](docs/手机号登录说明.md)
 
+<<<<<<< HEAD
 ## 词云图相关操作说明
 
 ➡️➡️➡️ [词云图相关说明](docs/关于词云图相关操作.md)
 
 
+=======
+## 项目贡献者
+<!-- readme: contributors -start -->
+<table>
+<tbody>
+<tr>
+    <td align="center">
+        <a href="https://github.com/NanmiCoder">
+            <img src="https://avatars.githubusercontent.com/u/47178017?v=4" width="100;" alt="NanmiCoder"/>
+            <br />
+            <sub><b>程序员阿江-Relakkes</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/leantli">
+            <img src="https://avatars.githubusercontent.com/u/117699758?v=4" width="100;" alt="leantli"/>
+            <br />
+            <sub><b>leantli</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/BaoZhuhan">
+            <img src="https://avatars.githubusercontent.com/u/140676370?v=4" width="100;" alt="BaoZhuhan"/>
+            <br />
+            <sub><b>Bao Zhuhan</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/nelzomal">
+            <img src="https://avatars.githubusercontent.com/u/8512926?v=4" width="100;" alt="nelzomal"/>
+            <br />
+            <sub><b>zhounan</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Hiro-Lin">
+            <img src="https://avatars.githubusercontent.com/u/40111864?v=4" width="100;" alt="Hiro-Lin"/>
+            <br />
+            <sub><b>HIRO</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/PeanutSplash">
+            <img src="https://avatars.githubusercontent.com/u/98582625?v=4" width="100;" alt="PeanutSplash"/>
+            <br />
+            <sub><b>PeanutSplash</b></sub>
+        </a>
+    </td>
+</tr>
+<tr>
+    <td align="center">
+        <a href="https://github.com/Ermeng98">
+            <img src="https://avatars.githubusercontent.com/u/55784769?v=4" width="100;" alt="Ermeng98"/>
+            <br />
+            <sub><b>Ermeng</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Rosyrain">
+            <img src="https://avatars.githubusercontent.com/u/116946548?v=4" width="100;" alt="Rosyrain"/>
+            <br />
+            <sub><b>Rosyrain</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/henryhyn">
+            <img src="https://avatars.githubusercontent.com/u/5162443?v=4" width="100;" alt="henryhyn"/>
+            <br />
+            <sub><b>Henry He</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Akiqqqqqqq">
+            <img src="https://avatars.githubusercontent.com/u/51102894?v=4" width="100;" alt="Akiqqqqqqq"/>
+            <br />
+            <sub><b>leonardoqiuyu</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/jayeeliu">
+            <img src="https://avatars.githubusercontent.com/u/77389?v=4" width="100;" alt="jayeeliu"/>
+            <br />
+            <sub><b>jayeeliu</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/ZuWard">
+            <img src="https://avatars.githubusercontent.com/u/38209256?v=4" width="100;" alt="ZuWard"/>
+            <br />
+            <sub><b>ZuWard</b></sub>
+        </a>
+    </td>
+</tr>
+<tr>
+    <td align="center">
+        <a href="https://github.com/Zzendrix">
+            <img src="https://avatars.githubusercontent.com/u/154900254?v=4" width="100;" alt="Zzendrix"/>
+            <br />
+            <sub><b>Zendrix</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/chunpat">
+            <img src="https://avatars.githubusercontent.com/u/19848304?v=4" width="100;" alt="chunpat"/>
+            <br />
+            <sub><b>zhangzhenpeng</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/tanpenggood">
+            <img src="https://avatars.githubusercontent.com/u/37927946?v=4" width="100;" alt="tanpenggood"/>
+            <br />
+            <sub><b>Sam Tan</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/xbsheng">
+            <img src="https://avatars.githubusercontent.com/u/56357338?v=4" width="100;" alt="xbsheng"/>
+            <br />
+            <sub><b>xbsheng</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/yangrq1018">
+            <img src="https://avatars.githubusercontent.com/u/25074163?v=4" width="100;" alt="yangrq1018"/>
+            <br />
+            <sub><b>Martin</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/zhihuiio">
+            <img src="https://avatars.githubusercontent.com/u/165655688?v=4" width="100;" alt="zhihuiio"/>
+            <br />
+            <sub><b>zhihuiio</b></sub>
+        </a>
+    </td>
+</tr>
+<tr>
+    <td align="center">
+        <a href="https://github.com/renaissancezyc">
+            <img src="https://avatars.githubusercontent.com/u/118403818?v=4" width="100;" alt="renaissancezyc"/>
+            <br />
+            <sub><b>Ren</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Tianci-King">
+            <img src="https://avatars.githubusercontent.com/u/109196852?v=4" width="100;" alt="Tianci-King"/>
+            <br />
+            <sub><b>Wang Tianci</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Styunlen">
+            <img src="https://avatars.githubusercontent.com/u/30810222?v=4" width="100;" alt="Styunlen"/>
+            <br />
+            <sub><b>Styunlen</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Schofi">
+            <img src="https://avatars.githubusercontent.com/u/33537727?v=4" width="100;" alt="Schofi"/>
+            <br />
+            <sub><b>Schofi</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/Klu5ure">
+            <img src="https://avatars.githubusercontent.com/u/166240879?v=4" width="100;" alt="Klu5ure"/>
+            <br />
+            <sub><b>Klu5ure</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/keeper-jie">
+            <img src="https://avatars.githubusercontent.com/u/33612777?v=4" width="100;" alt="keeper-jie"/>
+            <br />
+            <sub><b>Kermit</b></sub>
+        </a>
+    </td>
+</tr>
+<tr>
+    <td align="center">
+        <a href="https://github.com/kexinoh">
+            <img src="https://avatars.githubusercontent.com/u/91727108?v=4" width="100;" alt="kexinoh"/>
+            <br />
+            <sub><b>KEXNA</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/aa65535">
+            <img src="https://avatars.githubusercontent.com/u/5417786?v=4" width="100;" alt="aa65535"/>
+            <br />
+            <sub><b>Jian Chang</b></sub>
+        </a>
+    </td>
+    <td align="center">
+        <a href="https://github.com/522109452">
+            <img src="https://avatars.githubusercontent.com/u/16929874?v=4" width="100;" alt="522109452"/>
+            <br />
+            <sub><b>tianqing</b></sub>
+        </a>
+    </td>
+</tr>
+</tbody>
+</table>
+<!-- readme: contributors -end -->
+>>>>>>> 86a88f72602fe3f692acc628427888487554b716
 
 ## star 趋势图
 - 如果该项目对你有帮助,star一下 ❤️❤️❤️
@@ -1,4 +1,5 @@
 import argparse
+
 import config
 from tools.utils import str2bool
 
@@ -3,8 +3,10 @@ PLATFORM = "xhs"
 KEYWORDS = "python,golang"
 LOGIN_TYPE = "qrcode"  # qrcode or phone or cookie
 COOKIES = ""
-# 具体值参见media_platform.xxx.field下的枚举值,展示只支持小红书
+# 具体值参见media_platform.xxx.field下的枚举值,暂时只支持小红书
 SORT_TYPE = "popularity_descending"
+# 具体值参见media_platform.xxx.field下的枚举值,暂时只支持抖音
+PUBLISH_TIME_TYPE = 0
 CRAWLER_TYPE = "search"  # 爬取类型,search(关键词搜索) | detail(帖子详情)| creator(创作者主页数据)
 
 # 是否开启 IP 代理
@@ -103,6 +105,13 @@ BILI_CREATOR_ID_LIST = [
     # ........................
 ]
 
+# 指定快手创作者ID列表
+KS_CREATOR_ID_LIST = [
+    "3x4sm73aye7jq7i",
+    # ........................
+]
+
+
 #词云相关
 #是否开启生成评论词云图
 ENABLE_GET_WORDCLOUD = False
@@ -118,5 +127,3 @@ STOP_WORDS_FILE = "./docs/hit_stopwords.txt"
 
 #中文字体文件路径
 FONT_PATH= "./docs/STZHONGS.TTF"
-
-
@@ -106,6 +106,7 @@ class BilibiliCrawler(AbstractCrawler):
                     page += 1
                     continue
 
+                utils.logger.info(f"[BilibiliCrawler.search] search bilibili keyword: {keyword}, page: {page}")
                 video_id_list: List[str] = []
                 videos_res = await self.bili_client.search_video_by_keyword(
                     keyword=keyword,
@@ -126,7 +127,6 @@ class BilibiliCrawler(AbstractCrawler):
                     if video_item:
                         video_id_list.append(video_item.get("View").get("aid"))
                         await bilibili_store.update_bilibili_video(video_item)
-
                 page += 1
                 await self.batch_get_video_comments(video_id_list)
 
@@ -12,8 +12,8 @@ from playwright.async_api import BrowserContext, Page
 from tenacity import (RetryError, retry, retry_if_result, stop_after_attempt,
                       wait_fixed)
 
-from base.base_crawler import AbstractLogin
 import config
+from base.base_crawler import AbstractLogin
 from tools import utils
 
 
@@ -1,5 +1,6 @@
 import asyncio
 import copy
+import json
 import urllib.parse
 from typing import Any, Callable, Dict, List, Optional
 
@@ -119,14 +120,19 @@ class DOUYINClient(AbstractApiClient):
         params = {
             "keyword": urllib.parse.quote(keyword),
             "search_channel": search_channel.value,
-            "sort_type": sort_type.value,
-            "publish_time": publish_time.value,
             "search_source": "normal_search",
-            "query_correct_type": "1",
-            "is_filter_search": "0",
+            "query_correct_type": 1,
+            "is_filter_search": 0,
             "offset": offset,
             "count": 10  # must be set to 10
         }
+        if sort_type != SearchSortType.GENERAL or publish_time != PublishTimeType.UNLIMITED:
+            params["filter_selected"] = urllib.parse.quote(json.dumps({
+                "sort_type": str(sort_type.value),
+                "publish_time": str(publish_time.value)
+            }))
+            params["is_filter_search"] = 1
+            params["search_source"] = "tab_search"
         referer_url = "https://www.douyin.com/search/" + keyword
         referer_url += f"?publish_time={publish_time.value}&sort_type={sort_type.value}&type=general"
         headers = copy.copy(self.headers)
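The `filter_selected` branch added in the hunk above can be exercised in isolation. The sketch below uses illustrative stand-in enums (not the project's actual `field` module) and shows, under that assumption, how non-default sort/time filters are folded into a URL-encoded JSON parameter while the search source switches to `tab_search`:

```python
import json
import urllib.parse
from enum import Enum

# Stand-ins for the project's douyin field enums, for illustration only.
class SearchSortType(Enum):
    GENERAL = 0
    MOST_LIKE = 1
    LATEST = 2

class PublishTimeType(Enum):
    UNLIMITED = 0
    ONE_DAY = 1

def build_filter_params(sort_type: SearchSortType, publish_time: PublishTimeType) -> dict:
    """Mirror the hunk: non-default filters go into a URL-encoded JSON
    blob under `filter_selected`, and the search source changes."""
    params = {"search_source": "normal_search", "is_filter_search": 0}
    if sort_type != SearchSortType.GENERAL or publish_time != PublishTimeType.UNLIMITED:
        params["filter_selected"] = urllib.parse.quote(json.dumps({
            "sort_type": str(sort_type.value),
            "publish_time": str(publish_time.value),
        }))
        params["is_filter_search"] = 1
        params["search_source"] = "tab_search"
    return params

print(build_filter_params(SearchSortType.GENERAL, PublishTimeType.UNLIMITED)["is_filter_search"])  # 0
```

Keeping the default path untouched means existing "general search" requests stay byte-identical to what the endpoint saw before the change.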
@@ -90,9 +90,10 @@ class DouYinCrawler(AbstractCrawler):
                     page += 1
                     continue
                 try:
+                    utils.logger.info(f"[DouYinCrawler.search] search douyin keyword: {keyword}, page: {page}")
                     posts_res = await self.dy_client.search_info_by_keyword(keyword=keyword,
-                                                                            offset=page * dy_limit_count,
-                                                                            publish_time=PublishTimeType.UNLIMITED
+                                                                            offset=page * dy_limit_count - dy_limit_count,
+                                                                            publish_time=PublishTimeType(config.PUBLISH_TIME_TYPE)
                                                                             )
                 except DataFetchError:
                     utils.logger.error(f"[DouYinCrawler.search] search douyin keyword: {keyword} failed")
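The offset change above shifts page numbering: assuming the loop's `page` counter is 1-based, page N should request items starting at (N-1) * limit, which `page * dy_limit_count` got wrong by one page. A minimal check of that arithmetic:

```python
def page_to_offset(page: int, limit: int = 10) -> int:
    # Mirrors the fix above: `page * limit` skipped the first page's items;
    # `page * limit - limit` maps page 1 -> offset 0, page 2 -> offset 10, ...
    return page * limit - limit

print(page_to_offset(1), page_to_offset(2))  # 0 10
```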
@@ -12,13 +12,12 @@ class SearchChannelType(Enum):
 class SearchSortType(Enum):
     """search sort type"""
     GENERAL = 0  # 综合排序
-    LATEST = 1  # 最新发布
-    MOST_LIKE = 2  # 最多点赞
+    MOST_LIKE = 1  # 最多点赞
+    LATEST = 2  # 最新发布
 
-
 class PublishTimeType(Enum):
     """publish time type"""
     UNLIMITED = 0  # 不限
     ONE_DAY = 1  # 一天内
-    ONE_WEEK = 2  # 一周内
-    SIX_MONTH = 3  # 半年内
+    ONE_WEEK = 7  # 一周内
+    SIX_MONTH = 180  # 半年内
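Constructing an enum member from a config integer, as `PublishTimeType(config.PUBLISH_TIME_TYPE)` does in the douyin hunk earlier, relies on `Enum`'s lookup-by-value call syntax, so the member values must match what the config (and the API) actually use. A stand-in mirroring the updated values:

```python
from enum import Enum

class PublishTimeType(Enum):
    """Stand-in with the updated values from the hunk above."""
    UNLIMITED = 0    # 不限 (no limit)
    ONE_DAY = 1      # within one day
    ONE_WEEK = 7     # within one week
    SIX_MONTH = 180  # within six months

# Enum(value) resolves a member by value; unknown ints raise ValueError,
# which is why stale values like 2 or 3 had to be replaced.
assert PublishTimeType(7) is PublishTimeType.ONE_WEEK
```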
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-
 import asyncio
 import json
-from typing import Any, Callable, Dict, Optional
+from typing import Any, Callable, Dict, List, Optional
 from urllib.parse import urlencode
 
 import httpx
@@ -67,7 +67,7 @@ class KuaiShouClient(AbstractApiClient):
             "variables": {
                 "ftype": 1,
             },
-            "query": self.graphql.get("vision_profile")
+            "query": self.graphql.get("vision_profile_user_list")
         }
         res = await self.post("", post_data)
         if res.get("visionProfileUserList", {}).get("result") == 1:
@@ -129,17 +129,60 @@ class KuaiShouClient(AbstractApiClient):
                 "pcursor": pcursor
             },
             "query": self.graphql.get("comment_list")
 
         }
         return await self.post("", post_data)
 
-    async def get_video_all_comments(self, photo_id: str, crawl_interval: float = 1.0, is_fetch_sub_comments=False,
-                                     callback: Optional[Callable] = None):
+    async def get_video_sub_comments(
+            self, photo_id: str, rootCommentId: str, pcursor: str = ""
+    ) -> Dict:
+        """get video sub comments
+        :param photo_id: photo id you want to fetch
+        :param pcursor: last you get pcursor, defaults to ""
+        :return:
+        """
+        post_data = {
+            "operationName": "visionSubCommentList",
+            "variables": {
+                "photoId": photo_id,
+                "pcursor": pcursor,
+                "rootCommentId": rootCommentId,
+            },
+            "query": self.graphql.get("vision_sub_comment_list"),
+        }
+        return await self.post("", post_data)
+
+    async def get_creator_profile(self, userId: str) -> Dict:
+        post_data = {
+            "operationName": "visionProfile",
+            "variables": {
+                "userId": userId
+            },
+            "query": self.graphql.get("vision_profile"),
+        }
+        return await self.post("", post_data)
+
+    async def get_video_by_creater(self, userId: str, pcursor: str = "") -> Dict:
+        post_data = {
+            "operationName": "visionProfilePhotoList",
+            "variables": {
+                "page": "profile",
+                "pcursor": pcursor,
+                "userId": userId
+            },
+            "query": self.graphql.get("vision_profile_photo_list"),
+        }
+        return await self.post("", post_data)
+
+    async def get_video_all_comments(
+            self,
+            photo_id: str,
+            crawl_interval: float = 1.0,
+            callback: Optional[Callable] = None,
+    ):
         """
         get video all comments include sub comments
         :param photo_id:
         :param crawl_interval:
-        :param is_fetch_sub_comments:
         :param callback:
         :return:
         """
@@ -158,7 +201,107 @@ class KuaiShouClient(AbstractApiClient):
 
             result.extend(comments)
             await asyncio.sleep(crawl_interval)
-            if not is_fetch_sub_comments:
-                continue
-            # todo handle get sub comments
+            sub_comments = await self.get_comments_all_sub_comments(
+                comments, photo_id, crawl_interval, callback
+            )
+            result.extend(sub_comments)
+        return result
+
+    async def get_comments_all_sub_comments(
+            self,
+            comments: List[Dict],
+            photo_id,
+            crawl_interval: float = 1.0,
+            callback: Optional[Callable] = None,
+    ) -> List[Dict]:
+        """
+        获取指定一级评论下的所有二级评论, 该方法会一直查找一级评论下的所有二级评论信息
+        Args:
+            comments: 评论列表
+            photo_id: 视频id
+            crawl_interval: 爬取一次评论的延迟单位(秒)
+            callback: 一次评论爬取结束后
+        Returns:
+
+        """
+        if not config.ENABLE_GET_SUB_COMMENTS:
+            utils.logger.info(
+                f"[KuaiShouClient.get_comments_all_sub_comments] Crawling sub_comment mode is not enabled"
+            )
+            return []
+
+        result = []
+        for comment in comments:
+            sub_comments = comment.get("subComments")
+            if sub_comments and callback:
+                await callback(photo_id, sub_comments)
+
+            sub_comment_pcursor = comment.get("subCommentsPcursor")
+            if sub_comment_pcursor == "no_more":
+                continue
+
+            root_comment_id = comment.get("commentId")
+            sub_comment_pcursor = ""
+
+            while sub_comment_pcursor != "no_more":
+                comments_res = await self.get_video_sub_comments(
+                    photo_id, root_comment_id, sub_comment_pcursor
+                )
+                vision_sub_comment_list = comments_res.get("visionSubCommentList", {})
+                sub_comment_pcursor = vision_sub_comment_list.get("pcursor", "no_more")
+
+                comments = vision_sub_comment_list.get("subComments", {})
+                if callback:
+                    await callback(photo_id, comments)
+                await asyncio.sleep(crawl_interval)
+                result.extend(comments)
+        return result
+
+    async def get_creator_info(self, user_id: str) -> Dict:
+        """
+        eg: https://www.kuaishou.com/profile/3x4jtnbfter525a
+        快手用户主页
+        """
+
+        visionProfile = await self.get_creator_profile(user_id)
+        return visionProfile.get("userProfile")
+
+    async def get_all_videos_by_creator(
+            self,
+            user_id: str,
+            crawl_interval: float = 1.0,
+            callback: Optional[Callable] = None,
+    ) -> List[Dict]:
+        """
+        获取指定用户下的所有发过的帖子,该方法会一直查找一个用户下的所有帖子信息
+        Args:
+            user_id: 用户ID
+            crawl_interval: 爬取一次的延迟单位(秒)
+            callback: 一次分页爬取结束后的更新回调函数
+        Returns:
+
+        """
+        result = []
+        pcursor = ""
+
+        while pcursor != "no_more":
+            videos_res = await self.get_video_by_creater(user_id, pcursor)
+            if not videos_res:
+                utils.logger.error(
+                    f"[KuaiShouClient.get_all_videos_by_creator] The current creator may have been banned by ks, so they cannot access the data."
+                )
+                break
+
+            vision_profile_photo_list = videos_res.get("visionProfilePhotoList", {})
+            pcursor = vision_profile_photo_list.get("pcursor", "")
+
+            videos = vision_profile_photo_list.get("feeds", [])
+            utils.logger.info(
+                f"[KuaiShouClient.get_all_videos_by_creator] got user_id:{user_id} videos len : {len(videos)}"
+            )
+
+            if callback:
+                await callback(videos)
+            await asyncio.sleep(crawl_interval)
+            result.extend(videos)
         return result
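The new creator and sub-comment methods above all share one pagination shape: request with the last `pcursor`, stop when the API answers `"no_more"`, and optionally hand each page to a callback. A self-contained sketch of that loop, where the `fetch` callable and the `pcursor`/`feeds` field names stand in for the client's real GraphQL responses:

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

async def paginate(
    fetch: Callable[[str], Awaitable[Dict]],
    crawl_interval: float = 0.0,
    callback: Optional[Callable[[List[Dict]], Awaitable[None]]] = None,
) -> List[Dict]:
    """Cursor loop used by the new client methods: keep requesting with
    the last `pcursor` until the API returns the sentinel "no_more"."""
    result: List[Dict] = []
    pcursor = ""  # empty cursor means "first page"
    while pcursor != "no_more":
        page = await fetch(pcursor)
        pcursor = page.get("pcursor", "no_more")  # missing cursor ends the loop
        items = page.get("feeds", [])
        if callback:
            await callback(items)          # stream each page out as it arrives
        await asyncio.sleep(crawl_interval)  # be polite between requests
        result.extend(items)
    return result
```

Streaming pages through the callback while also accumulating `result` matches how `get_all_videos_by_creator` both persists each page and returns the full list for comment fetching.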
@@ -65,11 +65,14 @@ class KuaishouCrawler(AbstractCrawler):
 
         crawler_type_var.set(config.CRAWLER_TYPE)
         if config.CRAWLER_TYPE == "search":
-            # Search for notes and retrieve their comment information.
+            # Search for videos and retrieve their comment information.
             await self.search()
         elif config.CRAWLER_TYPE == "detail":
             # Get the information and comments of the specified post
             await self.get_specified_videos()
+        elif config.CRAWLER_TYPE == "creator":
+            # Get creator's information and their videos and comments
+            await self.get_creators_and_videos()
         else:
             pass
 
@@ -89,7 +92,7 @@ class KuaishouCrawler(AbstractCrawler):
                 utils.logger.info(f"[KuaishouCrawler.search] Skip page: {page}")
                 page += 1
                 continue
-
+            utils.logger.info(f"[KuaishouCrawler.search] search kuaishou keyword: {keyword}, page: {page}")
             video_id_list: List[str] = []
             videos_res = await self.ks_client.search_info_by_keyword(
                 keyword=keyword,
@@ -135,7 +138,7 @@ class KuaishouCrawler(AbstractCrawler):
                 utils.logger.error(f"[KuaishouCrawler.get_video_info_task] Get video detail error: {ex}")
                 return None
             except KeyError as ex:
-                utils.logger.error(f"[KuaishouCrawler.get_video_info_task] have not fund note detail video_id:{video_id}, err: {ex}")
+                utils.logger.error(f"[KuaishouCrawler.get_video_info_task] have not fund video detail video_id:{video_id}, err: {ex}")
                 return None
 
     async def batch_get_video_comments(self, video_id_list: List[str]):
@@ -145,7 +148,7 @@ class KuaishouCrawler(AbstractCrawler):
         :return:
         """
         if not config.ENABLE_GET_COMMENTS:
-            utils.logger.info(f"[KuaishouCrawler.batch_get_note_comments] Crawling comment mode is not enabled")
+            utils.logger.info(f"[KuaishouCrawler.batch_get_video_comments] Crawling comment mode is not enabled")
             return
 
         utils.logger.info(f"[KuaishouCrawler.batch_get_video_comments] video ids:{video_id_list}")
@@ -200,10 +203,10 @@ class KuaishouCrawler(AbstractCrawler):
         return playwright_proxy, httpx_proxy
 
     async def create_ks_client(self, httpx_proxy: Optional[str]) -> KuaiShouClient:
-        """Create xhs client"""
+        """Create ks client"""
         utils.logger.info("[KuaishouCrawler.create_ks_client] Begin create kuaishou API client ...")
         cookie_str, cookie_dict = utils.convert_cookies(await self.browser_context.cookies())
-        xhs_client_obj = KuaiShouClient(
+        ks_client_obj = KuaiShouClient(
             proxies=httpx_proxy,
             headers={
                 "User-Agent": self.user_agent,
@@ -215,7 +218,7 @@ class KuaishouCrawler(AbstractCrawler):
             playwright_page=self.context_page,
             cookie_dict=cookie_dict,
         )
-        return xhs_client_obj
+        return ks_client_obj
 
     async def launch_browser(
             self,
|
|||||||
)
|
)
|
||||||
return browser_context
|
return browser_context
|
||||||
|
|
||||||
|
async def get_creators_and_videos(self) -> None:
|
||||||
|
"""Get creator's videos and retrieve their comment information."""
|
||||||
|
utils.logger.info("[KuaiShouCrawler.get_creators_and_videos] Begin get kuaishou creators")
|
||||||
|
for user_id in config.KS_CREATOR_ID_LIST:
|
||||||
|
# get creator detail info from web html content
|
||||||
|
createor_info: Dict = await self.ks_client.get_creator_info(user_id=user_id)
|
||||||
|
if createor_info:
|
||||||
|
await kuaishou_store.save_creator(user_id, creator=createor_info)
|
||||||
|
|
||||||
|
# Get all video information of the creator
|
||||||
|
all_video_list = await self.ks_client.get_all_videos_by_creator(
|
||||||
|
user_id = user_id,
|
||||||
|
crawl_interval = random.random(),
|
||||||
|
callback = self.fetch_creator_video_detail
|
||||||
|
)
|
||||||
|
|
||||||
|
video_ids = [video_item.get("photo", {}).get("id") for video_item in all_video_list]
|
||||||
|
await self.batch_get_video_comments(video_ids)
|
||||||
|
|
||||||
|
async def fetch_creator_video_detail(self, video_list: List[Dict]):
|
||||||
|
"""
|
||||||
|
Concurrently obtain the specified post list and save the data
|
||||||
|
"""
|
||||||
|
semaphore = asyncio.Semaphore(config.MAX_CONCURRENCY_NUM)
|
||||||
|
task_list = [
|
||||||
|
self.get_video_info_task(post_item.get("photo", {}).get("id"), semaphore) for post_item in video_list
|
||||||
|
]
|
||||||
|
|
||||||
|
video_details = await asyncio.gather(*task_list)
|
||||||
|
for video_detail in video_details:
|
||||||
|
if video_detail is not None:
|
||||||
|
await kuaishou_store.update_kuaishou_video(video_detail)
|
||||||
|
|
||||||
async def close(self):
|
async def close(self):
|
||||||
"""Close browser context"""
|
"""Close browser context"""
|
||||||
await self.browser_context.close()
|
await self.browser_context.close()
|
||||||
|
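The `fetch_creator_video_detail` method above uses a common asyncio pattern: fan out one task per item with `asyncio.gather`, but cap in-flight work with a shared `asyncio.Semaphore`. A minimal, self-contained sketch of that pattern — `fetch_detail`, `MAX_CONCURRENCY_NUM`, and the video ids are hypothetical stand-ins for the real client call and config value:

```python
import asyncio
from typing import Dict, List, Optional

MAX_CONCURRENCY_NUM = 4  # assumed stand-in for config.MAX_CONCURRENCY_NUM

async def fetch_detail(video_id: str, semaphore: asyncio.Semaphore) -> Optional[Dict]:
    # At most MAX_CONCURRENCY_NUM coroutines run this body concurrently,
    # even though gather() schedules every task at once.
    async with semaphore:
        await asyncio.sleep(0)  # placeholder for the real network request
        return {"photo": {"id": video_id}}

async def fetch_all(video_ids: List[str]) -> List[Dict]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY_NUM)
    tasks = [fetch_detail(vid, semaphore) for vid in video_ids]
    details = await asyncio.gather(*tasks)
    return [d for d in details if d is not None]  # drop failed fetches

details = asyncio.run(fetch_all(["v1", "v2", "v3"]))
print(len(details))  # 3
```

The semaphore lives in the caller rather than the worker so one limit governs the whole batch, mirroring how the method above passes it into each `get_video_info_task`.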
@ -11,7 +11,7 @@ class KuaiShouGraphQL:
         self.load_graphql_queries()

     def load_graphql_queries(self):
-        graphql_files = ["search_query.graphql", "video_detail.graphql", "comment_list.graphql", "vision_profile.graphql"]
+        graphql_files = ["search_query.graphql", "video_detail.graphql", "comment_list.graphql", "vision_profile.graphql", "vision_profile_photo_list.graphql", "vision_profile_user_list.graphql", "vision_sub_comment_list.graphql"]

         for file in graphql_files:
             with open(self.graphql_dir + file, mode="r") as f:
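The loader above reads each `.graphql` file once at startup and keys the query text by name. A self-contained sketch of that pattern — the helper name and the throwaway directory are illustration only, not the project's exact API:

```python
import os
import tempfile

def load_graphql_queries(graphql_dir: str, graphql_files: list) -> dict:
    # Read each .graphql file once and key its text by the file stem,
    # so callers can look queries up by name instead of re-reading disk.
    queries = {}
    for file in graphql_files:
        with open(os.path.join(graphql_dir, file), mode="r") as f:
            queries[file.removesuffix(".graphql")] = f.read()
    return queries

# Demo with a temporary directory (the real project ships the files).
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "vision_profile.graphql"), "w") as f:
        f.write("query visionProfile($userId: String) { ... }")
    queries = load_graphql_queries(d, ["vision_profile.graphql"])
    print("vision_profile" in queries)  # True
```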
@ -1,16 +1,27 @@
-query visionProfileUserList($pcursor: String, $ftype: Int) {
-    visionProfileUserList(pcursor: $pcursor, ftype: $ftype) {
+query visionProfile($userId: String) {
+    visionProfile(userId: $userId) {
         result
-        fols {
-            user_name
-            headurl
-            user_text
+        hostName
+        userProfile {
+            ownerCount {
+                fan
+                photo
+                follow
+                photo_public
+                __typename
+            }
+            profile {
+                gender
+                user_name
+                user_id
+                headurl
+                user_text
+                user_profile_bg_url
+                __typename
+            }
             isFollowing
-            user_id
             __typename
         }
-        hostName
-        pcursor
         __typename
     }
 }
@ -0,0 +1,110 @@
+fragment photoContent on PhotoEntity {
+    __typename
+    id
+    duration
+    caption
+    originCaption
+    likeCount
+    viewCount
+    commentCount
+    realLikeCount
+    coverUrl
+    photoUrl
+    photoH265Url
+    manifest
+    manifestH265
+    videoResource
+    coverUrls {
+        url
+        __typename
+    }
+    timestamp
+    expTag
+    animatedCoverUrl
+    distance
+    videoRatio
+    liked
+    stereoType
+    profileUserTopPhoto
+    musicBlocked
+    riskTagContent
+    riskTagUrl
+}
+
+fragment recoPhotoFragment on recoPhotoEntity {
+    __typename
+    id
+    duration
+    caption
+    originCaption
+    likeCount
+    viewCount
+    commentCount
+    realLikeCount
+    coverUrl
+    photoUrl
+    photoH265Url
+    manifest
+    manifestH265
+    videoResource
+    coverUrls {
+        url
+        __typename
+    }
+    timestamp
+    expTag
+    animatedCoverUrl
+    distance
+    videoRatio
+    liked
+    stereoType
+    profileUserTopPhoto
+    musicBlocked
+    riskTagContent
+    riskTagUrl
+}
+
+fragment feedContent on Feed {
+    type
+    author {
+        id
+        name
+        headerUrl
+        following
+        headerUrls {
+            url
+            __typename
+        }
+        __typename
+    }
+    photo {
+        ...photoContent
+        ...recoPhotoFragment
+        __typename
+    }
+    canAddComment
+    llsid
+    status
+    currentPcursor
+    tags {
+        type
+        name
+        __typename
+    }
+    __typename
+}
+
+query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {
+    visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {
+        result
+        llsid
+        webPageArea
+        feeds {
+            ...feedContent
+            __typename
+        }
+        hostName
+        pcursor
+        __typename
+    }
+}
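`visionProfilePhotoList` returns a page of `feeds` plus a `pcursor` to request the next page, which is how `get_all_videos_by_creator` can walk a whole profile. A sketch of that cursor loop over invented in-memory pages — the `"no_more"` sentinel mirrors how these loops are typically written, but the page contents here are assumptions:

```python
from typing import Dict, List

# Fake paged responses keyed by cursor; the shape loosely mirrors the
# visionProfilePhotoList result above (feeds + pcursor).
PAGES: Dict[str, Dict] = {
    "": {"feeds": [{"photo": {"id": "a"}}, {"photo": {"id": "b"}}], "pcursor": "cur1"},
    "cur1": {"feeds": [{"photo": {"id": "c"}}], "pcursor": "no_more"},
}

def get_all_photos() -> List[Dict]:
    photos: List[Dict] = []
    pcursor = ""
    while pcursor != "no_more":
        resp = PAGES[pcursor]      # placeholder for the real GraphQL call
        photos.extend(resp["feeds"])
        pcursor = resp["pcursor"]  # server hands back the next cursor
    return photos

print([p["photo"]["id"] for p in get_all_photos()])  # ['a', 'b', 'c']
```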
@ -0,0 +1,16 @@
+query visionProfileUserList($pcursor: String, $ftype: Int) {
+    visionProfileUserList(pcursor: $pcursor, ftype: $ftype) {
+        result
+        fols {
+            user_name
+            headurl
+            user_text
+            isFollowing
+            user_id
+            __typename
+        }
+        hostName
+        pcursor
+        __typename
+    }
+}
@ -0,0 +1,22 @@
+mutation visionSubCommentList($photoId: String, $rootCommentId: String, $pcursor: String) {
+    visionSubCommentList(photoId: $photoId, rootCommentId: $rootCommentId, pcursor: $pcursor) {
+        pcursor
+        subComments {
+            commentId
+            authorId
+            authorName
+            content
+            headurl
+            timestamp
+            likedCount
+            realLikedCount
+            liked
+            status
+            authorLiked
+            replyToUserName
+            replyTo
+            __typename
+        }
+        __typename
+    }
+}
```
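On the wire, each of these `.graphql` files becomes an ordinary JSON POST body with `operationName`, `variables`, and the query text. A payload-building sketch — the id values and the truncated query string are placeholders, not the real API contract:

```python
import json

def build_graphql_payload(operation_name: str, query_text: str, variables: dict) -> str:
    # A GraphQL HTTP request body is just these three keys; query_text is
    # the contents of one of the .graphql files loaded at startup.
    return json.dumps({
        "operationName": operation_name,
        "variables": variables,
        "query": query_text,
    })

payload = build_graphql_payload(
    "visionSubCommentList",
    "mutation visionSubCommentList($photoId: String, ...) { ... }",  # truncated for the sketch
    {"photoId": "3x0001", "rootCommentId": "100", "pcursor": ""},
)
print(json.loads(payload)["operationName"])  # visionSubCommentList
```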
@ -7,6 +7,7 @@ from playwright.async_api import BrowserContext, Page
 from tenacity import (RetryError, retry, retry_if_result, stop_after_attempt,
                       wait_fixed)

+import config
 from base.base_crawler import AbstractLogin
 from tools import utils

@ -57,7 +58,7 @@ class KuaishouLogin(AbstractLogin):

         # click login button
         login_button_ele = self.context_page.locator(
-            "xpath=//p[text()=' 登录 ']"
+            "xpath=//p[text()='登录']"
         )
         await login_button_ele.click()
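The login-button fix above is a whitespace issue: an XPath text predicate compares the node's text exactly, so `' 登录 '` with padding spaces matches nothing once the page renders `登录` unpadded. The same behavior can be shown with the stdlib `ElementTree` XPath subset on a tiny hypothetical document:

```python
import xml.etree.ElementTree as ET

# XPath text predicates match the full string exactly, whitespace included,
# which is why the locator above had to drop the spaces around 登录.
html = "<div><p>登录</p></div>"
root = ET.fromstring(html)

print(len(root.findall(".//p[.=' 登录 ']")))  # 0 -- padded text matches nothing
print(len(root.findall(".//p[.='登录']")))    # 1 -- exact text matches
```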
@ -108,7 +108,7 @@ class WeiboCrawler(AbstractCrawler):
                 utils.logger.info(f"[WeiboCrawler.search] Skip page: {page}")
                 page += 1
                 continue
+            utils.logger.info(f"[WeiboCrawler.search] search weibo keyword: {keyword}, page: {page}")
             search_res = await self.wb_client.get_note_by_keyword(
                 keyword=keyword,
                 page=page,
@ -12,6 +12,7 @@ from playwright.async_api import BrowserContext, Page
 from tenacity import (RetryError, retry, retry_if_result, stop_after_attempt,
                       wait_fixed)

+import config
 from base.base_crawler import AbstractLogin
 from tools import utils

@ -102,6 +102,7 @@ class XiaoHongShuCrawler(AbstractCrawler):
                     continue

                 try:
+                    utils.logger.info(f"[XiaoHongShuCrawler.search] search xhs keyword: {keyword}, page: {page}")
                     note_id_list: List[str] = []
                     notes_res = await self.xhs_client.get_note_by_keyword(
                         keyword=keyword,
Binary file not shown. Before: 169 KiB
static/images/11群二维码.JPG (new binary file)
Binary file not shown. After: 171 KiB
Binary file not shown. Before: 175 KiB
Binary file not shown. Before: 242 KiB, After: 241 KiB
Binary file not shown. Before: 115 KiB
@ -13,9 +13,9 @@ import aiofiles

 import config
 from base.base_crawler import AbstractStore
-from tools import utils
+from tools import utils, words
 from var import crawler_type_var
-from tools import words


 def calculate_number_of_files(file_store_path: str) -> int:
     """计算数据保存文件的前部分排序数字,支持每次运行代码不写到同一个文件中
@ -11,10 +11,10 @@ from typing import Dict

 import aiofiles

-from base.base_crawler import AbstractStore
-from tools import utils,words
-from var import crawler_type_var
 import config
+from base.base_crawler import AbstractStore
+from tools import utils, words
+from var import crawler_type_var


 def calculate_number_of_files(file_store_path: str) -> int:
@ -76,3 +76,22 @@ async def update_ks_video_comment(video_id: str, comment_item: Dict):
     utils.logger.info(
         f"[store.kuaishou.update_ks_video_comment] Kuaishou video comment: {comment_id}, content: {save_comment_item.get('content')}")
     await KuaishouStoreFactory.create_store().store_comment(comment_item=save_comment_item)
+
+
+async def save_creator(user_id: str, creator: Dict):
+    ownerCount = creator.get('ownerCount', {})
+    profile = creator.get('profile', {})
+
+    local_db_item = {
+        'user_id': user_id,
+        'nickname': profile.get('user_name'),
+        'gender': '女' if profile.get('gender') == "F" else '男',
+        'avatar': profile.get('headurl'),
+        'desc': profile.get('user_text'),
+        'ip_location': "",
+        'follows': ownerCount.get("follow"),
+        'fans': ownerCount.get("fan"),
+        'interaction': ownerCount.get("photo_public"),
+        "last_modify_ts": utils.get_current_timestamp(),
+    }
+    utils.logger.info(f"[store.kuaishou.save_creator] creator:{local_db_item}")
+    await KuaishouStoreFactory.create_store().store_creator(local_db_item)
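The `ownerCount`/`profile` flattening in `save_creator` above can be factored into a pure function so the visionProfile-to-row mapping is testable without the store. This sketch keeps the same dict keys as the query result; the sample values are invented:

```python
from typing import Dict

def flatten_creator(user_id: str, creator: Dict) -> Dict:
    # Pure version of the mapping in save_creator above: pick apart the
    # nested ownerCount/profile objects into one flat record.
    owner_count = creator.get("ownerCount", {})
    profile = creator.get("profile", {})
    return {
        "user_id": user_id,
        "nickname": profile.get("user_name"),
        "gender": "女" if profile.get("gender") == "F" else "男",
        "avatar": profile.get("headurl"),
        "desc": profile.get("user_text"),
        "follows": owner_count.get("follow"),
        "fans": owner_count.get("fan"),
        "interaction": owner_count.get("photo_public"),
    }

# Sample shaped like the visionProfile query result above (invented values).
sample = {
    "ownerCount": {"fan": 1200, "follow": 30, "photo_public": 88},
    "profile": {"user_name": "demo", "gender": "F", "headurl": "", "user_text": "hi"},
}
row = flatten_creator("3xabc", sample)
print(row["fans"], row["gender"])  # 1200 女
```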
@ -11,10 +11,11 @@ from typing import Dict

 import aiofiles

-from base.base_crawler import AbstractStore
-from tools import utils,words
-from var import crawler_type_var
 import config
+from base.base_crawler import AbstractStore
+from tools import utils, words
+from var import crawler_type_var


 def calculate_number_of_files(file_store_path: str) -> int:
     """计算数据保存文件的前部分排序数字,支持每次运行代码不写到同一个文件中
@ -205,3 +206,14 @@ class KuaishouJsonStoreImplement(AbstractStore):

         """
         await self.save_data_to_json(comment_item, "comments")
+
+    async def store_creator(self, creator: Dict):
+        """
+        Kuaishou creator JSON storage implementation
+        Args:
+            creator: creator dict
+
+        Returns:
+
+        """
+        await self.save_data_to_json(creator, "creator")
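`store_creator` above delegates to `save_data_to_json`, which appends each record to a JSON array per store type. A stdlib-only model of that pattern — the real implementation uses `aiofiles` for non-blocking writes and numbered file names, so treat this class as a simplified sketch, not the project's code:

```python
import asyncio
import json
import os
import tempfile

class JsonStoreSketch:
    # Minimal stand-in for the JSON store: one array file per store_type,
    # re-read and re-written on every save (blocking I/O, unlike aiofiles).
    def __init__(self, save_dir: str):
        self.save_dir = save_dir

    async def save_data_to_json(self, item: dict, store_type: str):
        path = os.path.join(self.save_dir, f"{store_type}.json")
        data = []
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                data = json.load(f)
        data.append(item)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)

    async def store_creator(self, creator: dict):
        await self.save_data_to_json(creator, "creator")

with tempfile.TemporaryDirectory() as d:
    store = JsonStoreSketch(d)
    asyncio.run(store.store_creator({"user_id": "3xabc"}))
    with open(os.path.join(d, "creator.json"), encoding="utf-8") as f:
        saved = json.load(f)
print(saved)  # [{'user_id': '3xabc'}]
```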
@ -11,10 +11,11 @@ from typing import Dict

 import aiofiles

-from base.base_crawler import AbstractStore
-from tools import utils,words
-from var import crawler_type_var
 import config
+from base.base_crawler import AbstractStore
+from tools import utils, words
+from var import crawler_type_var


 def calculate_number_of_files(file_store_path: str) -> int:
     """计算数据保存文件的前部分排序数字,支持每次运行代码不写到同一个文件中
@ -113,7 +113,7 @@ async def save_creator(user_id: str, creator: Dict):
         'gender': '女' if user_info.get('gender') == 1 else '男',
         'avatar': user_info.get('images'),
         'desc': user_info.get('desc'),
-        'ip_location': user_info.get('ip_location'),
+        'ip_location': user_info.get('ipLocation'),
         'follows': follows,
         'fans': fans,
         'interaction': interaction,
@ -11,10 +11,11 @@ from typing import Dict

 import aiofiles

-from base.base_crawler import AbstractStore
-from tools import utils,words
-from var import crawler_type_var
 import config
+from base.base_crawler import AbstractStore
+from tools import utils, words
+from var import crawler_type_var


 def calculate_number_of_files(file_store_path: str) -> int:
     """计算数据保存文件的前部分排序数字,支持每次运行代码不写到同一个文件中
|
|||||||
import aiofiles
|
|
||||||
import asyncio
|
import asyncio
|
||||||
import jieba
|
|
||||||
from collections import Counter
|
|
||||||
from wordcloud import WordCloud
|
|
||||||
import json
|
import json
|
||||||
|
from collections import Counter
|
||||||
|
|
||||||
|
import aiofiles
|
||||||
|
import jieba
|
||||||
import matplotlib.pyplot as plt
|
import matplotlib.pyplot as plt
|
||||||
|
from wordcloud import WordCloud
|
||||||
|
|
||||||
import config
|
import config
|
||||||
from tools import utils
|
from tools import utils
|
||||||
|
|
||||||
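The reordered imports above belong to the word-cloud helper, whose core is: segment the comment text, drop stopwords, count frequencies, and hand the counts to `WordCloud`. A stdlib sketch of the counting stage — `split()` stands in for `jieba` segmentation, and the stopword set is invented:

```python
from collections import Counter

# Tokenize -> filter stopwords -> count; the real pipeline feeds these
# counts to WordCloud.generate_from_frequencies.
STOP_WORDS = {"the", "a", "is"}  # invented; the project loads its own list

def word_frequencies(texts: list) -> Counter:
    counter = Counter()
    for text in texts:
        counter.update(w for w in text.lower().split() if w not in STOP_WORDS)
    return counter

freq = word_frequencies(["the crawler is fast", "fast crawler"])
print(freq.most_common(2))  # [('crawler', 2), ('fast', 2)]
```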