A PHP (Python) solution for the Toutiao audience-package protobuf format

码农天地 - 2020-11-28 03:47:33

Continuing from the previous post: Using protobuf in PHP

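PHP can serialize and deserialize protobuf messages, but Toutiao would not accept the result, so I ended up doing the serialization in a Python script. That soon exposed a problem: it was far too slow. Serializing a few tens of thousands of device IDs took more than 2 hours. The main cause was that I was short on time and serialized the device IDs one at a time, so most of the time went to switching between PHP and Python for every ID. The script in the previous post works, but it does not suit even moderately large volumes. So, with my limited Python skills, I wrote a new script that takes a file and writes out a new file with the serialized data. It is much faster: in my tests, roughly 1,000 device IDs per second.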
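The script that follows reads one record per line in the form `type|device_id`, where the type must match an enum name on `IdItem` (for example `IDFA`). As a rough sketch of how such an input file could be prepared, the helper below writes the pairs out and skips records the script would ignore anyway. The function name `write_input_file`, the file name, and the sample device ID are assumptions made up for illustration; they are not part of the original solution.

```python
import os
import tempfile


def write_input_file(path, records):
    """Write (data_type, device_id) pairs as 'data_type|device_id' lines."""
    written = 0
    with open(path, 'w') as out:
        for data_type, device_id in records:
            data_type = data_type.strip()
            device_id = device_id.strip()
            # Skip empty fields and fields containing the separator,
            # since the serializer expects exactly two '|'-separated parts.
            if not data_type or not device_id or '|' in data_type + device_id:
                continue
            out.write(data_type + '|' + device_id + '\n')
            written += 1
    return written


# Hypothetical sample data; the device ID is invented for illustration.
records = [
    ('IDFA', 'AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE'),
    ('IDFA', ''),  # empty ID, skipped
]
sample_path = os.path.join(tempfile.mkdtemp(), 'devices.txt')
count = write_input_file(sample_path, records)
```

Filtering bad records up front keeps the line count of the input file equal to the number of lines expected in the serialized output.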

from __future__ import print_function

import base64
import os
import sys

import DmpDataProtoV2_pb2

# Expect one argument: the path of the input file.
if len(sys.argv) <= 1:
    print('no input file given')
    sys.exit(1)
path = sys.argv[1]
if not path.strip():
    print('input file path is empty')
    sys.exit(1)
if not os.path.exists(path):
    print('input file does not exist')
    sys.exit(1)

base_name = os.path.splitext(path)[0]
target_file = base_name + '-ProtoBuf.txt'
# The caller (PHP) reads the output path from stdout.
print(target_file)

# Opening with 'w' already truncates an existing file, and the with
# block closes both files even if a line fails to serialize.
with open(path) as f, open(target_file, 'w') as t:
    # Iterating over the file avoids the original bug where `continue`
    # skipped the next readline() and looped forever on a bad line.
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Each line has the form "<dataType>|<device id>".
        arr = line.split('|')
        if len(arr) != 2:
            continue
        dtype, dev_id = arr
        dmp_data = DmpDataProtoV2_pb2.DmpData()
        id_item = dmp_data.idList.add()
        # Look up the enum value by name, e.g. DmpDataProtoV2_pb2.IdItem.IDFA
        id_item.dataType = getattr(DmpDataProtoV2_pb2.IdItem, dtype)
        id_item.id = dev_id.lower()
        id_item.tags.append(dtype)
        # id_item.timestamp = int(time.time())

        binary_string = dmp_data.SerializeToString()
        # b64encode returns bytes; decode so the write works on Python 2 and 3.
        t.write(base64.b64encode(binary_string).decode('ascii') + '\n')
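Before uploading, it can help to sanity-check the generated file. The following is a minimal sketch, assuming Python 3 and assuming only that each output line should be valid base64; the helper name `count_valid_lines` and the demo payload are hypothetical stand-ins, not real serialized `DmpData` messages.

```python
import base64
import binascii
import os
import tempfile


def count_valid_lines(path):
    """Return (valid, invalid) counts of base64 lines in an output file."""
    valid = invalid = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                raw = base64.b64decode(line, validate=True)
            except (binascii.Error, ValueError):
                invalid += 1
                continue
            # With the generated module importable, `raw` could also be
            # parsed back here, e.g. DmpDataProtoV2_pb2.DmpData.FromString(raw).
            valid += 1
    return valid, invalid


# Demo file with one well-formed line and one broken line.
demo_path = os.path.join(tempfile.mkdtemp(), 'devices-ProtoBuf.txt')
with open(demo_path, 'w') as f:
    f.write(base64.b64encode(b'\x0a\x03abc').decode('ascii') + '\n')
    f.write('not base64!\n')
result = count_valid_lines(demo_path)
```

Comparing the valid count against the number of usable input lines gives a quick signal that no records were silently dropped.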

Calling it from PHP

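// Rewritten to go through the Python script, which prints the path of the generated file.
$script = base_path()."/scripts/python/base64DmpItemByFile.py";
// Escape the arguments and trim the trailing newline from the script's output.
$protobuf_path = trim((string) shell_exec("python ".escapeshellarg($script)." ".escapeshellarg($file_path)));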

Done!

