
LIVY-294. HiveContext is always created instead of SQLContext for pyspark. (#270)

HiveContext is always created regardless of whether we enable it through spark.repl.enableHiveContext. The root cause is that we depend on Spark's shell.py, and unfortunately HiveContext does not initialize itself when created, but defers its initialization until a method is called. This change calls sqlContext.tables() to check whether HiveContext works properly.
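The probe-and-fallback pattern described above can be sketched in isolation. This is a hypothetical, self-contained illustration, not the actual Livy code: `BrokenHiveContext`, `StubSQLContext`, and `pick_sql_context` are stand-in names invented here to mimic a lazily-initialized context whose failure only surfaces when a method is first invoked.

```python
class BrokenHiveContext:
    """Stand-in (hypothetical) for a HiveContext whose deferred
    initialization fails, e.g. because the Hive metastore is unreachable."""
    def tables(self):
        # Initialization happens lazily inside the first method call,
        # so construction succeeds but the first use raises.
        raise RuntimeError("Hive metastore unavailable")

class StubSQLContext:
    """Stand-in (hypothetical) for pyspark.sql.SQLContext."""
    def tables(self):
        return []

def pick_sql_context(candidate):
    """Probe the candidate context with a cheap method call; if the
    deferred initialization blows up, fall back to a plain SQLContext."""
    try:
        candidate.tables()
        return candidate
    except Exception:
        return StubSQLContext()

# A broken HiveContext is silently replaced by the fallback context.
ctx = pick_sql_context(BrokenHiveContext())
print(type(ctx).__name__)
```

The actual patch below applies the same idea with `sqlContext.tables()` as the probe and `py4j.protocol.Py4JError` as the specific failure it catches.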
Jeff Zhang 8 years ago
parent
commit
89099b0d5d
1 changed file with 14 additions and 0 deletions:
  repl/src/main/resources/fake_shell.py

+ 14 - 0
repl/src/main/resources/fake_shell.py

@@ -30,6 +30,7 @@ import threading
 import tempfile
 import shutil
 import pickle
+import textwrap
 
 if sys.version >= '3':
     unicode = str
@@ -534,8 +535,21 @@ def main():
             exec('from pyspark.sql import HiveContext', global_dict)
             exec('from pyspark.streaming import StreamingContext', global_dict)
             exec('import pyspark.cloudpickle as cloudpickle', global_dict)
+
             if spark_major_version >= "2":
                 exec('from pyspark.shell import spark', global_dict)
+            else:
+                # LIVY-294, need to check whether HiveContext can work properly,
+                # fallback to SQLContext if HiveContext can not be initialized successfully.
+                # Only for spark-1.
+                code = textwrap.dedent("""
+                    import py4j
+                    from pyspark.sql import SQLContext
+                    try:
+                      sqlContext.tables()
+                    except py4j.protocol.Py4JError:
+                      sqlContext = SQLContext(sc)""")
+                exec(code, global_dict)
 
             #Start py4j callback server
             from py4j.protocol import ENTRY_POINT_OBJECT_ID