
Commit 27cdfb8

Merge pull request #19 from pndaproject/RELEASE-0.3.0
Release 0.3.0
2 parents 77edba8 + b7acc4f commit 27cdfb8

16 files changed, 272 additions and 204 deletions

CHANGELOG.md

Lines changed: 57 additions & 49 deletions
@@ -1,49 +1,57 @@
-# Change Log
-All notable changes to this project will be documented in this file.
-
-## [0.2.1] 2016-12-12
-### Changed
-- Externalized build logic from Jenkins to shell script so it can be reused
-- Refactored the information returned by the Application Detail API to include the YARN application state and also to return information for jobs that have ended. Made the implementation more performant by using the YARN Resource Manager REST API instead of the CLI.
-
-## [0.2.0] 2016-10-21
-### Added
-- PNDA-2233 Jupyter notebook plugin added to deployment manager
-
-## [0.1.1] 2016-09-13
-### Changes
-- Improvements to documentation
-- Enhanced CI support
-
-## [0.1.0] 2016-07-01
-### First version
-
-## [Pre-release]
-
-### Added
-
-- Add hue endpoint to environment endpoints API
-- Application names checked to only contain alphanumeric characters (a-z A-Z 0-9 - and _) because they are used directly in file paths.
-- Added ability to discover HDFS namedservices
-- Added information field to status reports
-- Using an external pacakge repository API instead of internal swift integration
-- Application detail API (GET /applications/<application>/detail) now returns YARN IDs assigned to the running tasks for that application.
-- Oozie error messages are reported when querying for status of an application creation call.
-- Packages are validated on deployment and the error messages reported when querying for status of a package deployment call.
-- Added support for opentsdb.json descriptor for creating metrics when deploying applications.
-- Callback events are sent to the console data logger.
-- Application detail API now completed to return Yarn IDs for any Yarn applications associated with a PNDA application.
-
-### Fixed
-
-- Return IP address for webhdfs/HTTPFS endpoint instead of hostname
-- Timeout calls to package repository at 120 seconds.
-- Deploying a package that does not exist in the package repository now results in a useful error message being returned to the caller.
-- Fixed defect preventing '-' being used in application names.
-- Fix Zookeeper quorum bug issue
-- Improve package validation to catch packages without 3 point version numbers and where the folder inside the tar does not match the package name.
-- Add list of zookeeper nodes to quorum
-- Remove port=8020 for named service
-- Oozie creator plugin sets 'oozie.wf.application.path' and 'oozie.coord.application.path' to point at the folder not the xml files.
-- Removed some stdout printouts
-- Fixed bug preventing recency parameter being used on the repository/packages API.
+# Change Log
+All notable changes to this project will be documented in this file.
+
+## [0.3.0] 2017-01-20
+### Fixed
+- PNDA-2498: Application package data is now stored in HDFS with a reference to the path only held in the HBase record. This prevents HBase being overloaded with large packages (100MB+).
+
+### Changed
+- PNDA-2485: Pinned all python libraries to strict version numbers
+- PNDA-2499: Return all exceptions to API caller
+
+## [0.2.1] 2016-12-12
+### Changed
+- Externalized build logic from Jenkins to shell script so it can be reused
+- Refactored the information returned by the Application Detail API to include the YARN application state and also to return information for jobs that have ended. Made the implementation more performant by using the YARN Resource Manager REST API instead of the CLI.
+
+## [0.2.0] 2016-10-21
+### Added
+- PNDA-2233 Jupyter notebook plugin added to deployment manager
+
+## [0.1.1] 2016-09-13
+### Changed
+- Improvements to documentation
+- Enhanced CI support
+
+## [0.1.0] 2016-07-01
+### First version
+
+## [Pre-release]
+
+### Added
+
+- Add hue endpoint to environment endpoints API
+- Application names checked to only contain alphanumeric characters (a-z A-Z 0-9 - and _) because they are used directly in file paths.
+- Added ability to discover HDFS namedservices
+- Added information field to status reports
+- Using an external pacakge repository API instead of internal swift integration
+- Application detail API (GET /applications/<application>/detail) now returns YARN IDs assigned to the running tasks for that application.
+- Oozie error messages are reported when querying for status of an application creation call.
+- Packages are validated on deployment and the error messages reported when querying for status of a package deployment call.
+- Added support for opentsdb.json descriptor for creating metrics when deploying applications.
+- Callback events are sent to the console data logger.
+- Application detail API now completed to return Yarn IDs for any Yarn applications associated with a PNDA application.
+
+### Fixed
+
+- Return IP address for webhdfs/HTTPFS endpoint instead of hostname
+- Timeout calls to package repository at 120 seconds.
+- Deploying a package that does not exist in the package repository now results in a useful error message being returned to the caller.
+- Fixed defect preventing '-' being used in application names.
+- Fix Zookeeper quorum bug issue
+- Improve package validation to catch packages without 3 point version numbers and where the folder inside the tar does not match the package name.
+- Add list of zookeeper nodes to quorum
+- Remove port=8020 for named service
+- Oozie creator plugin sets 'oozie.wf.application.path' and 'oozie.coord.application.path' to point at the folder not the xml files.
+- Removed some stdout printouts
+- Fixed bug preventing recency parameter being used on the repository/packages API.
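
The PNDA-2498 entry for 0.3.0 describes moving package payloads out of HBase and keeping only an HDFS path in the HBase record. A minimal sketch of that pattern follows, using plain dictionaries as stand-ins for HDFS and HBase. The class name `PathOnlyRegistrar`, the column names and the `/pnda/packages` root are hypothetical and not taken from the project.

```python
import json


class PathOnlyRegistrar(object):
    """Hypothetical registrar: bulk data goes to a file store, the record keeps only its path."""

    def __init__(self, file_store, record_store, root):
        self._files = file_store      # stand-in for HDFS: path -> bytes
        self._records = record_store  # stand-in for an HBase table: row key -> dict
        self._root = root

    def set_package(self, name, data, metadata):
        path = "%s/%s" % (self._root, name)
        # The large payload is written to the file store only.
        self._files[path] = data
        # The record holds small values: the path and the metadata.
        self._records[name] = {
            "cf:package_data": path,
            "cf:metadata": json.dumps(metadata),
        }

    def get_package_data_path(self, name):
        return self._records[name]["cf:package_data"]


files, records = {}, {}
registrar = PathOnlyRegistrar(files, records, "/pnda/packages")
registrar.set_package("example-1.0.0", b"x" * 1024, {"version": "1.0.0"})
```

With this layout the record size stays roughly constant regardless of package size, which is the property the changelog entry is after.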

api/src/main/resources/app.py

Lines changed: 12 additions & 4 deletions
@@ -83,7 +83,11 @@ def finish():
                 self.finish(ex.msg)
             else:
                 self.set_status(500)
-                self.finish()
+                if "information" in str(ex):
+                    msg = str(ex)
+                else:
+                    msg = {"status": "UNKNOWN", "information": str(ex)}
+                self.finish(msg)
 
         IOLoop.instance().add_callback(callback=finish)
 
@@ -297,12 +301,16 @@ def main():
 
     deployer_utils.fill_hadoop_env(config['environment'])
 
-    package_repository = PackageRepoRestClient(config['config']["package_repository"])
+    package_repository = PackageRepoRestClient(config['config']["package_repository"], config['config']['stage_root'])
     dm = deployment_manager.DeploymentManager(package_repository,
                                               package_registrar.HbasePackageRegistrar(
-                                                  config['environment']['hbase_rest_server']),
+                                                  config['environment']['hbase_thrift_server'],
+                                                  config['environment']['webhdfs_host'],
+                                                  'hdfs',
+                                                  config['environment']['webhdfs_port'],
+                                                  config['config']['stage_root']),
                                               application_registrar.HbaseApplicationRegistrar(
-                                                  config['environment']['hbase_rest_server']),
+                                                  config['environment']['hbase_thrift_server']),
                                               config['environment'],
                                               config['config'])
 
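
The first hunk above changes the 500 branch so that the caller always receives a body (PNDA-2499). The same decision can be isolated as a small function, shown here as a sketch; `build_error_body` is a hypothetical name and is not part of `app.py`.

```python
def build_error_body(ex):
    """Return the exception text unchanged if it already mentions "information",
    otherwise wrap it in a status dictionary."""
    text = str(ex)
    if "information" in text:
        return text
    return {"status": "UNKNOWN", "information": text}


# An arbitrary exception is wrapped.
wrapped = build_error_body(ValueError("disk full"))

# Text that already carries an "information" field is passed through as a string.
passed_through = build_error_body(
    Exception('{"status": "FAILED", "information": "bad package"}'))
```

Tornado's `RequestHandler.finish` accepts either a string or a dictionary and serialises a dictionary as JSON, so both return types can be handed to it. Note that the substring test matches any message containing the word "information", not only JSON bodies.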

api/src/main/resources/application_creator.py

Lines changed: 4 additions & 7 deletions
@@ -22,7 +22,6 @@
 
 import tarfile
 import os
-import io
 import json
 import re
 
@@ -46,14 +45,14 @@ def __init__(self, config, environment, service):
                                 environment['webhdfs_port'],
                                 'hdfs')
 
-    def create_application(self, package_data, package_metadata, application_name, property_overrides):
+    def create_application(self, package_data_path, package_metadata, application_name, property_overrides):
 
         logging.debug("create_application: %s", application_name)
 
         if not re.match('^[a-zA-Z0-9_-]+$', application_name):
             raise FailedCreation('Application name %s may only contain a-z A-Z 0-9 - _' % application_name)
 
-        stage_path = self._stage_package(package_data)
+        stage_path = self._stage_package(package_data_path)
 
         # create each class of components in the package, aggregating any
         # component specific return data for destruction
@@ -164,16 +163,14 @@ def _load_creator(self, component_type):
 
         return creator
 
-    def _stage_package(self, package_data):
+    def _stage_package(self, package_data_path):
 
         logging.debug("_stage_package")
 
         if not os.path.isdir(self._config['stage_root']):
             os.mkdir(self._config['stage_root'])
 
-        file_like_object = io.BytesIO(package_data)
-        tar = tarfile.open(fileobj=file_like_object)
+        tar = tarfile.open(package_data_path)
         stage_path = "%s/%s" % (self._config['stage_root'], uuid.uuid4())
         tar.extractall(path=stage_path)
-        file_like_object.close()
         return stage_path
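
The `_stage_package` change replaces an in-memory `io.BytesIO` buffer with a path that `tarfile.open` reads directly from disk. A standalone sketch of staging from a path is given below. The function `stage_package` and the sample archive are illustrative only, and the sketch departs from the hunk in two small ways: it uses `os.makedirs` and closes the archive with a `with` block.

```python
import io
import os
import tarfile
import tempfile
import uuid


def stage_package(package_data_path, stage_root):
    """Extract the archive at package_data_path into a fresh directory under stage_root."""
    if not os.path.isdir(stage_root):
        os.makedirs(stage_root)
    stage_path = os.path.join(stage_root, str(uuid.uuid4()))
    # Opening by path lets tarfile read from disk rather than from a buffer
    # holding the whole package in memory.
    with tarfile.open(package_data_path) as tar:
        tar.extractall(path=stage_path)
    return stage_path


# Build a small archive on disk to exercise the function.
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "example.tar.gz")
payload = b"hello"
with tarfile.open(archive_path, "w:gz") as tar:
    info = tarfile.TarInfo(name="example/README")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

staged = stage_package(archive_path, os.path.join(work_dir, "stage"))
```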

api/src/main/resources/application_registrar.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
 import logging
 import json
 import happybase
-from happybase.hbase.ttypes import AlreadyExists
+from Hbase_thrift import AlreadyExists
 
 from lifecycle_states import ApplicationState
 
api/src/main/resources/deployer_utils.py

Lines changed: 29 additions & 17 deletions
@@ -97,16 +97,14 @@ def fill_hadoop_env(env):
                     env['yarn_resource_manager_mr_port%s' % rm_instance] = '8032'
                 if role.type == "NODEMANAGER":
                     if 'yarn_node_managers' in env:
-                        env['yarn_node_managers'] = '%s,%s' % (
-                            env['yarn_node_managers'], api.get_host(role.hostRef.hostId).hostname)
+                        env['yarn_node_managers'] = '%s,%s' % (env['yarn_node_managers'], api.get_host(role.hostRef.hostId).hostname)
                     else:
                         env['yarn_node_managers'] = '%s' % api.get_host(
                             role.hostRef.hostId).hostname
         elif service.type == "MAPREDUCE":
             for role in service.get_all_roles():
                 if role.type == "JOBTRACKER":
-                    env['job_tracker'] = '%s:8021' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['job_tracker'] = '%s:8021' % api.get_host(role.hostRef.hostId).hostname
                     break
         elif service.type == "ZOOKEEPER":
             for role in service.get_all_roles():
@@ -119,42 +117,37 @@ def fill_hadoop_env(env):
         elif service.type == "HBASE":
             for role in service.get_all_roles():
                 if role.type == "HBASERESTSERVER":
-                    env['hbase_rest_server'] = '%s' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['hbase_rest_server'] = '%s' % api.get_host(role.hostRef.hostId).hostname
                     env['hbase_rest_port'] = '20550'
-                    break
+                elif role.type == "HBASETHRIFTSERVER":
+                    env['hbase_thrift_server'] = '%s' % api.get_host(role.hostRef.hostId).hostname
         elif service.type == "OOZIE":
             for role in service.get_all_roles():
                 if role.type == "OOZIE_SERVER":
-                    env['oozie_uri'] = 'http://%s:11000/oozie' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['oozie_uri'] = 'http://%s:11000/oozie' % api.get_host(role.hostRef.hostId).hostname
                     break
         elif service.type == "HIVE":
             for role in service.get_all_roles():
                 if role.type == "HIVESERVER2":
-                    env['hive_server'] = '%s' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['hive_server'] = '%s' % api.get_host(role.hostRef.hostId).hostname
                     env['hive_port'] = '10000'
                     break
         elif service.type == "IMPALA":
             for role in service.get_all_roles():
                 if role.type == "IMPALAD":
-                    env['impala_host'] = '%s' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['impala_host'] = '%s' % api.get_host(role.hostRef.hostId).hostname
                     env['impala_port'] = '21050'
                     break
         elif service.type == "KUDU":
             for role in service.get_all_roles():
                 if role.type == "KUDU_MASTER":
-                    env['kudu_host'] = '%s' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['kudu_host'] = '%s' % api.get_host(role.hostRef.hostId).hostname
                     env['kudu_port'] = '7051'
                     break
         elif service.type == "HUE":
             for role in service.get_all_roles():
                 if role.type == "HUE_SERVER":
-                    env['hue_host'] = '%s' % api.get_host(
-                        role.hostRef.hostId).hostname
+                    env['hue_host'] = '%s' % api.get_host(role.hostRef.hostId).hostname
                     env['hue_port'] = '8888'
                     break
 
@@ -240,6 +233,25 @@ def create_file(self, data, remote_file_path):
                                sio,
                                overwrite=True)
 
+    def append_file(self, data, remote_file_path):
+
+        logging.debug('append to: %s', remote_file_path)
+
+        self._hdfs.append_file(canonicalize(remote_file_path), data)
+
+
+    def stream_file_to_disk(self, remote_file_path, local_file_path):
+        chunk_size = 10*1024*1024
+        offset = 0
+        with open(local_file_path, 'wb') as dest_file:
+            data = self._hdfs.read_file(canonicalize(remote_file_path), offset=offset, length=chunk_size)
+            while True:
+                dest_file.write(data)
+                if len(data) < chunk_size:
+                    break
+                offset += chunk_size
+                data = self._hdfs.read_file(canonicalize(remote_file_path), offset=offset, length=chunk_size)
+
     def read_file(self, remote_file_path):
 
         data = self._hdfs.read_file(canonicalize(remote_file_path))
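
The new `stream_file_to_disk` method copies a remote file in fixed-size reads and stops at the first short read. The loop can be exercised without HDFS, as in the sketch below. `stream_to_file` and `make_reader` are hypothetical helpers, and the reader is an in-memory stand-in for the ranged `read_file` call, not the real client.

```python
import io


def stream_to_file(read_chunk, dest_file, chunk_size):
    """Copy data in fixed-size reads until a short read signals the end."""
    offset = 0
    while True:
        data = read_chunk(offset, chunk_size)
        dest_file.write(data)
        if len(data) < chunk_size:
            break
        offset += chunk_size


def make_reader(blob):
    # Hypothetical stand-in for a remote ranged read over a byte string.
    def read_chunk(offset, length):
        return blob[offset:offset + length]
    return read_chunk


uneven, exact = io.BytesIO(), io.BytesIO()
stream_to_file(make_reader(b"abcdefghij"), uneven, 4)  # reads of 4, 4 and 2 bytes
stream_to_file(make_reader(b"abcdefgh"), exact, 4)     # reads of 4, 4 and 0 bytes
```

When the file length is an exact multiple of the chunk size, the loop issues one extra read at the end-of-file offset and relies on it returning no data. The stand-in does so; whether the real client behaves the same way is not verified here.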
