
IT CONSULTING JOBS
This blog broadcasts the IT jobs we have with our direct clients, vendors and customers. Please check back, as it is updated regularly with new job postings. Respond to chandra.atholi@outlook.com for client submission, or call 936 591 2990 for a rapid response.

Thursday, March 10, 2016

 

Job Title: Operations Reliability Engineer (3102016- BIAS- IOM-SANR)
Location: San Ramon, CA

 

Duration: Long Term

Job Function/Responsibilities:
As an Operations Reliability Engineer, you will be a key member of the ORE team, working to improve the reliability and performance of our Operations. The ORE works as a first responder and is ultimately responsible for ensuring our cloud infrastructure services are up and running. You will work shoulder-to-shoulder with our engineering teams to deliver, build and operate the next generation of IaaS, PaaS and SaaS Cloud infrastructure and services, focusing on automation, availability and performance. You will diagnose and resolve latent and systemic reliability issues across the entire stack: hardware, software, services, database, application and network. You will also drive standardization efforts across multiple disciplines and services.

  • Willing to roll up your sleeves and debug/tune/code/fix
  • Strong background and experience in scripting and automation
  • Represent the ORE organization in design reviews and operational readiness exercises for new and existing services with other teams.
  • Work with internal operational teams on driving availability, latency, scalability and efficiency of services and applications by instilling reliability into the operational life cycle, with a focus on fault-tolerant approaches
  • Ensure the IaaS, PaaS & SaaS Cloud infrastructure and services platform meets or exceeds organization goals for availability, capacity, efficiency, scalability, and performance by engineering reliability into software and systems
  • Perform proactive daily system monitoring including reviewing system and application logs as well as responding to, triaging, troubleshooting and remediating incidents
  • Repair and recover from hardware or software failures. Coordinate and communicate with impacted stakeholders and clients, escalating where appropriate
  • Work closely with Infrastructure services, software support, security, development and engineering teams, helping to build, maintain and extend the IaaS, PaaS & SaaS "live" services. Contribute to new and ongoing technology projects covering performance, high availability and scalability, including partitioning, sharding, dynamic provisioning and de-provisioning of systems for current load, etc.
  • Review the entire environment and execute initiatives to reduce failures and defects and improve overall performance.
  • Design, develop and execute automated tests to validate solutions and environments.
  • Monitor and troubleshoot issues across the entire stack - hardware, software, application and network.
  • Participate in performance analysis and tuning, service capacity planning and demand forecasting
  • Document current and future configuration processes and policies.
  • Assist with the implementation and development of SRE tools and applications
  • Manage and support SRE tools and applications
  • Participate in a 24x7 rotation for production issue escalations.

 

Qualifications & Requirements:

  • BS or MS degree in Computer Science, or a related field
  • 3 - 5 years of experience administering Linux systems and infrastructure in a SaaS/Cloud production environment – AWS/Private Cloud
  • Good understanding of service orientation methodology
  • Strong working knowledge of networking, packet tracing, understanding latency and throughput.
  • Strong working knowledge of Linux operating systems, their underlying components, system statistics, performance tuning, filesystems and I/O.
  • Solid understanding of systems and application design, including the operational trade-offs of various designs
  • Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software service design practices
  • Specialist in at least 2-3 of the following: Pivotal Cloud Foundry, OpenStack, Hadoop, Pivotal HD, HAWQ, MSQL, RabbitMQ, Redis, Jenkins, IaaS [Compute – Linux, Storage, Network – SDN – Juniper Contrail, Palo Alto Networks FW, F5 load balancers]
  • Experience administering in customer-facing, high-availability, large scale environments.
  • Experience in one or more of the following languages: Shell, Python, PHP or Perl
  • Must have an understanding of building and managing large-scale systems and application architectures
  • Prior experience with configuration and maintenance of common applications such as Apache, MySQL, DHCP, SSH, DNS, etc.
  • Proficient in one or more of the following monitoring and logging tools: New Relic, AppDynamics, Neustar, Gomez, Nimsoft, Zabbix, Nagios, Ganglia, Cacti, Splunk, Logstash, Graphite.
  • Working knowledge of Linux, TCP/IP, and web services
  • Prior experience with one or more of the following tools: Chef, Puppet, BOSH
  • Experience working in Agile environments
  • Solid verbal and written communication skills.

To apply, please click here: APPLY NOW

 



 

 
