Sr. Platform Engineer at Comcast responsible for optimizing Kubernetes infrastructure and managing large-scale Spark workloads. Collaborating with teams to ensure performance and reliability in data processing environments.
Responsibilities
Building, managing, and optimizing the underlying infrastructure and tools for large-scale data processing workloads.
Designing systems for collecting metrics (Prometheus) and visualizing data (Grafana).
Architecting and managing the platforms where Spark runs, such as Kubernetes clusters or cloud services like AWS (EKS).
Packaging Spark workloads and integrating them with orchestration systems.
Deploying infrastructure via Terraform/Ansible and troubleshooting job failures.
Building automation and tooling in languages such as Python, Java, or Scala, and in Linux shell scripting (Bash).
Implementing and maintaining systems for monitoring, logging, and alerting.
Developing and optimizing the data catalog and table format platform (e.g., Apache Iceberg).
Collaborating with Data Stewards, Analysts, and Scientists to address data needs and issues.
Creating and maintaining documentation for Kubernetes infrastructure and providing training to team members.
Requirements
Bachelor's degree in computer science or a related field, or equivalent experience; typically 7 years of experience in a DevOps or systems engineering role.
Expertise in Apache Spark: Deep understanding of Spark architecture, including RDDs, DataFrames, execution hierarchy, lazy evaluation, shuffling, and fault tolerance.
Proficiency in languages used for Spark development and automation, such as Python (PySpark), Scala, and Java.
Proficiency in Linux shell scripting (Bash).
Proficiency in writing SQL.
Experience with CI/CD tools and GitHub.
Experience setting up and using observability tools such as Prometheus and Grafana.
Strong knowledge of networking (TCP/IP, DNS, load balancing, etc.) and hardware components.
Experience automating infrastructure with Terraform and Ansible.
Hands-on experience with on-prem environments, major cloud providers (AWS, Azure, GCP), and containerization and orchestration tools such as Docker and Kubernetes.
Hands-on experience setting up cloud resources such as IAM, VPC, and EC2.
Familiarity with related technologies and formats like Delta Lake, Apache Iceberg, Apache Kafka, Hadoop, and various data storage systems (S3, HDFS, etc.).
Hands-on experience with Databricks, Snowflake, Apache Iceberg, Unity Catalog, or similar tools.
Solid understanding of data lakes and governance.
Experience setting up and maintaining caching layers such as Alluxio.
Strong analytical skills for debugging complex distributed systems issues.
Strong communication and collaboration abilities.
Benefits
Best-in-class benefits for eligible employees
Expert guidance and always-on tools
Support physically, financially and emotionally during big milestones and in everyday life